Slice Analysis

Fine-grained model comparison along task label dimensions; rows are always model × harness

1. Slice by · Pick a dimension + 1 value → see model × harness on that subset


Legend: < 25 · 25–40 · 40–55 · 55–70 · ≥ 70
Cells = mean score (×100) on the tasks in that bucket
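The cell computation described above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual code: the record fields (`model`, `harness`, `labels`, `score`), the sample data, and the function names `slice_table` and `legend_band` are all hypothetical. It also assumes that per-task scores lie in the 0–1 range and that each legend band includes its lower bound, which the legend itself does not state.

```python
from collections import defaultdict

# Hypothetical per-task records; field names and values are illustrative only.
RECORDS = [
    {"model": "model-a", "harness": "h1", "labels": {"difficulty": "hard"}, "score": 0.5},
    {"model": "model-a", "harness": "h1", "labels": {"difficulty": "hard"}, "score": 1.0},
    {"model": "model-a", "harness": "h1", "labels": {"difficulty": "easy"}, "score": 0.0},
    {"model": "model-b", "harness": "h1", "labels": {"difficulty": "hard"}, "score": 0.25},
]


def slice_table(records, dimension, value):
    """Return mean score (x100) per (model, harness) on the selected subset."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        # Keep only tasks whose label on the chosen dimension matches the value.
        if rec["labels"].get(dimension) != value:
            continue
        key = (rec["model"], rec["harness"])
        sums[key] += rec["score"]
        counts[key] += 1
    return {key: 100.0 * sums[key] / counts[key] for key in sums}


def legend_band(cell):
    """Map a cell value to a legend band (lower bound inclusive: an assumption)."""
    if cell < 25:
        return "< 25"
    if cell < 40:
        return "25–40"
    if cell < 55:
        return "40–55"
    if cell < 70:
        return "55–70"
    return "≥ 70"


if __name__ == "__main__":
    table = slice_table(RECORDS, "difficulty", "hard")
    for (model, harness), cell in sorted(table.items()):
        print(model, harness, round(cell, 1), legend_band(cell))
```

Each key of the returned dictionary corresponds to one model × harness row, and the value is the cell shown for the chosen slice.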