PawBench

Slice Analysis

Fine-grained model comparison along task label dimensions; rows are always model × harness

1. Slice by · Pick a dimension + 1 value → see model × harness on that subset
Legend: < 25 · 25–40 · 40–55 · 55–70 · ≥ 70 · Cells = mean score (×100) on the tasks in that bucket
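
The slice-then-average step described above can be sketched roughly as follows. This is only an illustration and not PawBench's actual code. The record layout, the `RESULTS` sample data, and the function names `slice_means` and `legend_bucket` are all hypothetical. It also assumes that scores lie in [0, 1] and that each legend range includes its lower bound, which the legend itself does not state.

```python
from collections import defaultdict

# Hypothetical records: one per (model, harness, task), with a score in [0, 1]
# and a dict of task labels. All names and values are made up for illustration.
RESULTS = [
    {"model": "model-a", "harness": "h1", "labels": {"difficulty": "hard"}, "score": 0.5},
    {"model": "model-a", "harness": "h1", "labels": {"difficulty": "hard"}, "score": 0.75},
    {"model": "model-a", "harness": "h1", "labels": {"difficulty": "easy"}, "score": 1.0},
    {"model": "model-b", "harness": "h2", "labels": {"difficulty": "hard"}, "score": 0.25},
]


def slice_means(results, dimension, value):
    """Keep tasks whose label `dimension` equals `value`, then return the
    mean score (x100) for each (model, harness) row."""
    sums = defaultdict(lambda: [0.0, 0])  # (model, harness) -> [score total, task count]
    for record in results:
        if record["labels"].get(dimension) != value:
            continue  # task is outside the selected slice
        cell = sums[(record["model"], record["harness"])]
        cell[0] += record["score"]
        cell[1] += 1
    return {row: 100.0 * total / count for row, (total, count) in sums.items()}


def legend_bucket(cell):
    """Map a cell value to its legend range (lower bound inclusive: an assumption)."""
    if cell < 25:
        return "< 25"
    if cell < 40:
        return "25–40"
    if cell < 55:
        return "40–55"
    if cell < 70:
        return "55–70"
    return "≥ 70"


if __name__ == "__main__":
    for row, cell in sorted(slice_means(RESULTS, "difficulty", "hard").items()):
        print(row, round(cell, 1), legend_bucket(cell))
```

Rows with no tasks in the selected slice are simply absent from the result, which avoids dividing by zero. A real view might instead render them as empty cells.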