Mean Level Score

A continuous alternative to the binary Level 3/4 measure: weighted average of level percentages, range [1, 4].

EQAO reports four performance levels. The mean level score summarises a school's full achievement distribution as a single number by assigning ordinal scores to each level:

mean_level = (1·L1% + 2·L2% + 3·L3% + 4·L4%) / 100

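The result lies in [1, 4]. Unlike L3/L4%, which collapses the four bins to a binary threshold, the mean level also captures how students are distributed within each half of the scale (L1 vs L2 below the standard, L3 vs L4 at or above it).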

Two schools with the same L34% will differ in mean level when their within-half distributions differ.
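
As a minimal sketch of the formula above, the following computes the mean level for two hypothetical schools that share the same L34% but differ within each half. The function name, the tolerance parameter, and the example percentages are illustrative assumptions, not part of the dashboard code.

```python
def mean_level(l1, l2, l3, l4, tol=0.5):
    """Weighted average of level percentages, scored 1 to 4.

    Inputs are percentages of assessed students. `tol` is an assumed
    allowance for rounding in published percentages.
    """
    total = l1 + l2 + l3 + l4
    if abs(total - 100.0) > tol:
        raise ValueError(f"level percentages sum to {total}, expected 100")
    return (1 * l1 + 2 * l2 + 3 * l3 + 4 * l4) / 100.0


# Hypothetical schools: both have L34% = 60.
school_a = (10, 30, 50, 10)  # most L3+L4 students sit at L3
school_b = (30, 10, 20, 40)  # most L3+L4 students sit at L4

mean_a = mean_level(*school_a)
mean_b = mean_level(*school_b)
print(f"School A: {mean_a:.2f}, School B: {mean_b:.2f}")
```

In this example the stronger L4 concentration in School B outweighs its larger L1 share, so its mean level is higher despite the identical L34%.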


Section 1: Distribution of Mean Level

Summary by year:


Section 2: Mean Level vs L3/L4%

L34% and mean level are highly correlated, but not equivalent. The scatter below shows all school-years; the OLS regression line and R² quantify how much distributional information L34% alone captures.

Theoretical basis. With levels scored 1–4, mean_level decomposes as:

mean_level = 1 + L₂/100 + 2·(L₃+L₄)/100 + L₄/100

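where L₂, L₃, L₄ are percentages and L₁ = 100 − L₂ − L₃ − L₄ has been substituted out. Because L₃+L₄ = L34%, the L34% term alone contributes a slope of 0.02 per percentage point. The fitted OLS slope differs from 0.02 by [Cov(L₂, L34%) + Cov(L₄, L34%)] / (100·Var(L34%)), because L₂ and L₄ are themselves correlated with L34% in the data.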
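
The decomposition and its effect on the OLS slope can be checked numerically. The sketch below uses synthetic level percentages (randomly generated, not EQAO data) and hand-written covariance helpers; all names and the generating weights are assumptions chosen for illustration.

```python
import random

random.seed(0)


def synthetic_school():
    """Return hypothetical (L1, L2, L3, L4) percentages summing to 100."""
    weights = [random.gammavariate(alpha, 1.0) for alpha in (1.0, 3.0, 8.0, 2.0)]
    total = sum(weights)
    return [100.0 * w / total for w in weights]


schools = [synthetic_school() for _ in range(500)]
l2 = [s[1] for s in schools]
l4 = [s[3] for s in schools]
l34 = [s[2] + s[3] for s in schools]
mean_level = [(s[0] + 2 * s[1] + 3 * s[2] + 4 * s[3]) / 100.0 for s in schools]

# Decomposed form: 1 + L2/100 + 2*L34/100 + L4/100.
decomposed = [1 + b / 100 + 2 * x / 100 + d / 100 for b, x, d in zip(l2, l34, l4)]
max_gap = max(abs(m - d) for m, d in zip(mean_level, decomposed))


def cov(x, y):
    """Population covariance of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


var_l34 = cov(l34, l34)
ols_slope = cov(l34, mean_level) / var_l34
# 0.02 from the L34% term, plus the adjustment from L2 and L4.
predicted_slope = 0.02 + (cov(l2, l34) + cov(l4, l34)) / (100.0 * var_l34)
r_squared = cov(l34, mean_level) ** 2 / (var_l34 * cov(mean_level, mean_level))

print(f"OLS slope {ols_slope:.5f}, predicted {predicted_slope:.5f}, R^2 {r_squared:.3f}")
```

The two slope values agree to floating-point precision on any input, since the identity is algebraic. The size of the adjustment and the R² depend entirely on the data, so the synthetic figures say nothing about the real school-years.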


Section 3: What Drives the Residual?

After removing the L34% trend, the residual reflects two distinct within-half effects:

  1. Above-standard quality — share of L3+L4 students at L4 vs L3 (L4 share).
  2. Below-standard severity — share of L1+L2 students at L2 vs L1 (L2 share).

Both push the residual positive when students concentrate at the upper end of their respective half.
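
A minimal sketch of the two shares, assuming level percentages as inputs. The function names and example values are hypothetical. The second function rebuilds the mean level from L34% and the two shares, which shows why a higher share in either half raises the mean level at a fixed L34%.

```python
def within_half_shares(l1, l2, l3, l4):
    """Return (L4 share, L2 share) as fractions, or None for an empty half."""
    above = l3 + l4
    below = l1 + l2
    l4_share = l4 / above if above > 0 else None
    l2_share = l2 / below if below > 0 else None
    return l4_share, l2_share


def mean_level_from_shares(l34, l4_share, l2_share):
    """Rebuild mean level from L34% (0-100) and the two within-half shares."""
    above = l34 / 100.0
    below = 1.0 - above
    # An empty half has weight zero, so a missing share contributes nothing.
    return 1.0 + 2.0 * above + above * (l4_share or 0.0) + below * (l2_share or 0.0)


# Hypothetical school with L34% = 60.
l1, l2, l3, l4 = 10, 30, 50, 10
l4_share, l2_share = within_half_shares(l1, l2, l3, l4)
rebuilt = mean_level_from_shares(l3 + l4, l4_share, l2_share)
direct = (1 * l1 + 2 * l2 + 3 * l3 + 4 * l4) / 100.0
print(l4_share, l2_share, rebuilt, direct)
```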


Section 4: Residual Distribution and Outliers (2025)

Largest positive residuals: mean level well above the L34% prediction, driven by high L4 concentration among L3+L4 students and/or high L2 concentration among L1+L2 students.

Largest negative residuals: mean level well below the L34% prediction, driven by high L1 concentration among L1+L2 students and/or high L3 (not L4) concentration among L3+L4 students.
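
One way to produce such a ranking is to fit a simple OLS line of mean level on L34% and sort schools by residual. The sketch below assumes that approach and uses five hypothetical schools. The names, percentages, and helper function are illustrative and do not reproduce the dashboard's actual pipeline.

```python
def ols_fit(x, y):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical (L1, L2, L3, L4) percentages per school.
schools = {
    "A": (10, 30, 50, 10),
    "B": (30, 10, 20, 40),
    "C": (5, 15, 60, 20),
    "D": (20, 30, 40, 10),
    "E": (10, 10, 50, 30),
}

names = list(schools)
l34 = [schools[k][2] + schools[k][3] for k in names]
mean_level = [
    (schools[k][0] + 2 * schools[k][1] + 3 * schools[k][2] + 4 * schools[k][3]) / 100.0
    for k in names
]

slope, intercept = ols_fit(l34, mean_level)
residuals = {
    k: m - (slope * x + intercept) for k, x, m in zip(names, l34, mean_level)
}

# Sort from largest positive to largest negative residual.
ranked = sorted(residuals.items(), key=lambda item: item[1], reverse=True)
for name, resid in ranked:
    print(f"{name}: {resid:+.4f}")
```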


Methodology Notes

  • Score assignment: Ordinal scores 1–4 assume equal spacing between levels. This is the simplest defensible choice given that we observe counts per bin rather than underlying continuous scores. Converting to a 0–100 scale via (mean_level − 1) / 3 × 100 gives an equivalent metric compatible with the noise model used in Metric Stability.
  • Data availability: All four level columns (L1–L4) are present in schools_g{3,6}.parquet and are computed from the same assessed-student denominator as L34%. No additional ETL is required.
  • Using mean level in model validation: The metric is included in Metric Stability Section 5b as "Mean level (0–100)" — the normalized version on a 0–100 scale so RMSE is directly comparable to L34% results.
  • Noise model: At the 0–100 normalized scale, a uniform distribution across the 4 equally spaced levels gives a per-student score SD of ≈37. The theoretical maximum is 50, reached when students split evenly between L1 and L4, which matches a Bernoulli at p=0.5. In either case the conservative 100/√n noise ceiling used throughout the dashboard remains an upper bound.
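
The normalization and the SD figures in these notes can be reproduced with a short sketch. The function names are assumptions. The two distributions evaluated are the uniform case and an even split between L1 and L4, which is the largest SD possible on a 0–100 scale.

```python
import math

# Level scores 1-4 mapped onto the 0-100 scale: 0, 33.3, 66.7, 100.
LEVEL_SCORES_0_100 = [(level - 1) / 3 * 100 for level in (1, 2, 3, 4)]


def normalize(mean_level):
    """Convert a mean level in [1, 4] to the 0-100 scale."""
    return (mean_level - 1) / 3 * 100


def score_sd(probs):
    """Per-student SD of the 0-100 score for level probabilities (L1..L4)."""
    mean = sum(p * s for p, s in zip(probs, LEVEL_SCORES_0_100))
    variance = sum(p * (s - mean) ** 2 for p, s in zip(probs, LEVEL_SCORES_0_100))
    return math.sqrt(variance)


def noise_ceiling(n):
    """Conservative noise ceiling 100/sqrt(n) for n assessed students."""
    return 100 / math.sqrt(n)


uniform_sd = score_sd([0.25, 0.25, 0.25, 0.25])
split_sd = score_sd([0.5, 0.0, 0.0, 0.5])

n = 100
print(f"uniform SD {uniform_sd:.1f}, L1/L4 split SD {split_sd:.1f}")
print(f"SE of school mean at n={n}: {split_sd / math.sqrt(n):.1f} "
      f"vs ceiling {noise_ceiling(n):.1f}")
```

Because the per-student SD never exceeds 50, the standard error of a school's normalized mean level is at most 50/√n, which stays below the 100/√n ceiling for any n.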