Topics

LMSys is a platform that publishes benchmarks and has data contamination checks in place. (Open question: how do these checks work?) Lately, many models have scored very well on benchmarks but performed poorly when tested in the wild, e.g. the Phi series, Mistral, etc. This is mainly due to data contamination, i.e. benchmark test data leaking into the training data.
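On the "how" question: one widely used style of contamination check is n-gram overlap, where a benchmark item is flagged if it shares a long enough word sequence with any training document. The sketch below is only an illustration of that general idea, not LMSys's actual method. The function names (`ngrams`, `contamination_rate`), the whitespace tokenization, and the default window `n=8` are all assumptions made for the example.

```python
def ngrams(text, n=8):
    """Return the set of word-level n-grams in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    # Texts shorter than n tokens yield an empty set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(benchmark_items, training_docs, n=8):
    """Fraction of benchmark items sharing at least one n-gram with the training docs.

    Hypothetical helper for illustration; real checks typically add
    normalization, fuzzy matching, or paraphrase detection on top.
    """
    if not benchmark_items:
        return 0.0
    # Collect every n-gram seen anywhere in the training data.
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    # An item is flagged if any of its n-grams also appears in training data.
    flagged = [item for item in benchmark_items if ngrams(item, n) & train_grams]
    return len(flagged) / len(benchmark_items)


if __name__ == "__main__":
    benchmark = [
        "what is the capital of france",
        "compute two plus two quickly",
    ]
    training = ["a quiz asked what is the capital of france today"]
    print(contamination_rate(benchmark, training, n=3))
```

Exact overlap like this only catches verbatim leakage. A rephrased or translated copy of a test item would pass unflagged, which is one reason a model can look clean on such a check and still be contaminated.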