PawBench

Slice Analysis

Fine-grained comparison of model performance along task-tag dimensions. Rows are always model × harness.

1. Slice dimension · Pick a dimension and one value to see model × harness performance on that subset
Slice dimension
Value
Legend: < 25 · 25–40 · 40–55 · 55–70 · ≥ 70
Cells = mean score (×100) on the tasks in that bucket
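
The cell computation described above can be sketched in a few lines of Python. This is a minimal illustration, not PawBench's actual code: the record fields (`model`, `harness`, `tags`, `score`), the sample rows, and the function names are all hypothetical. It also assumes scores are stored on a 0–1 scale and that legend ranges are half-open (25–40 means 25 ≤ x < 40).

```python
from collections import defaultdict

# Hypothetical per-task results; field names and values are illustrative only.
RESULTS = [
    {"model": "model-a", "harness": "harness-x", "tags": {"difficulty": "hard"}, "score": 0.5},
    {"model": "model-a", "harness": "harness-x", "tags": {"difficulty": "hard"}, "score": 0.25},
    {"model": "model-a", "harness": "harness-x", "tags": {"difficulty": "easy"}, "score": 1.0},
    {"model": "model-a", "harness": "harness-y", "tags": {"difficulty": "hard"}, "score": 1.0},
    {"model": "model-a", "harness": "harness-y", "tags": {"difficulty": "hard"}, "score": 0.5},
]

# Upper bounds (exclusive) and labels taken from the legend; half-open ranges are an assumption.
LEGEND = [(25, "< 25"), (40, "25–40"), (55, "40–55"), (70, "55–70")]


def bucket(cell):
    """Map a cell value (mean score x 100) to its legend label."""
    for upper, label in LEGEND:
        if cell < upper:
            return label
    return "≥ 70"


def slice_table(results, dimension, value):
    """Keep tasks whose tag `dimension` equals `value`, then average per (model, harness) row."""
    scores = defaultdict(list)
    for r in results:
        if r["tags"].get(dimension) == value:
            scores[(r["model"], r["harness"])].append(r["score"])
    # Each cell is the mean score on the selected subset, scaled by 100.
    return {row: 100 * sum(vals) / len(vals) for row, vals in scores.items()}


table = slice_table(RESULTS, "difficulty", "hard")
for (model, harness), cell in sorted(table.items()):
    print(f"{model} × {harness}: {cell:.1f} ({bucket(cell)})")
```

Selecting a different dimension or value only changes the filter. The row key stays (model, harness), matching the fixed row layout of the view.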