Multilevel Model: Variance Decomposition

How much of the variation in school achievement is persistent school quality, shared cohort effects, or subject-specific? A multilevel binomial model decomposes the variance properly — accounting for school size, bounded outcomes, and province-level test difficulty shifts.

Method: We model L3/4 counts directly (binomial likelihood approximated via logit-normal) with information weights n·p·(1−p). Fixed effects for each year×subject combination absorb province-level test difficulty changes. Random intercepts at the school level (M1), school×subject level (M1b), and school×year level (M2) decompose variance into four components via subtraction. Sample: English-language schools with complete, non-suppressed data across three years (2022–23 through 2024–25). 2021–22 is excluded: participation-rate columns are not reliably populated that year.
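The logit-scale response and its information weight can be sketched as follows. This is a minimal illustration, not the fitted model: the cell counts are made up, and the key fact it encodes is that n·p·(1−p) is the delta-method inverse variance of logit(p̂) for a binomial proportion, so larger cells with mid-range proportions carry more weight.

```python
import math

def logit(p):
    """Map a proportion to the logit scale."""
    return math.log(p / (1.0 - p))

def info_weight(n, p):
    """Information weight n*p*(1-p): the delta-method inverse variance
    of logit(p_hat) for a binomial proportion, Var ~ 1/(n*p*(1-p))."""
    return n * p * (1.0 - p)

# Hypothetical school x year x subject cell: 40 of 60 students at L3/4
n, successes = 60, 40
p = successes / n
y = logit(p)            # response on the logit scale
w = info_weight(n, p)   # large cells with mid-range p get the most weight
```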

Variance decomposition

On the logit scale, total variance in school×year×subject achievement splits into four components:

% School: persistent school quality, stable across years and subjects; driven by SES, demographics, staffing.

% Subject profile: persistent school×subject divergence, e.g. a school that is consistently stronger in Math than its overall level.

% Cohort: year-specific school effect shared across all three subjects; the general cohort factor.

% Noise: observation-level variation; sampling noise plus any true within-cell fluctuation.
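The subtraction logic behind the four-way split can be sketched as below. All variance numbers are illustrative stand-ins, not fitted values; the point is only the bookkeeping: each added random intercept absorbs part of the previous model's residual, so differencing the residuals isolates each component.

```python
# Illustrative (made-up) logit-scale variance estimates from the nested fits
var_school = 0.90   # school-intercept variance (M1)
resid_m1   = 0.30   # M1 residual  = subject profile + cohort + noise
resid_m1b  = 0.20   # M1b residual = cohort + noise (school x subject added)
resid_m2   = 0.12   # M2 residual  = noise only (school x year added)

var_subject = resid_m1 - resid_m1b   # persistent school x subject component
var_cohort  = resid_m1b - resid_m2   # year-specific shared cohort component
var_noise   = resid_m2               # observation-level variation

total = var_school + var_subject + var_cohort + var_noise
shares = {name: v / total for name, v in [
    ("school", var_school), ("subject", var_subject),
    ("cohort", var_cohort), ("noise", var_noise)]}
```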

The general cohort factor is smaller than PCA suggested. The previous PCA analysis of year-over-year differences found that ~75% (G3) and ~67% (G6) of change variance was shared; the multilevel model attributes only % (G3) and % (G6) of within-school variance to the cohort factor. PCA overstated the shared component because (a) differencing removed the dominant school component, and (b) raw differences gave equal weight to noisy small schools.

Within-school decomposition

Zooming in on the within-school variance only (excluding persistent school quality):


Shrinkage: raw vs model-estimated school quality

The multilevel model pulls noisy small-school estimates toward the grand mean: the classic shrinkage effect of partial pooling. Each dot is one school; the x-axis is the raw mean L3/4 proportion (averaged across all years and subjects), and the y-axis is the model's partial-pooling estimate.

Small schools (light dots) are pulled furthest from the diagonal — the model recognises their extreme proportions are dominated by sampling noise. Large schools (dark dots) stay near the diagonal because their raw estimates are already reliable.
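The shrinkage mechanism can be sketched with the standard empirical-Bayes blend. The numbers below (grand mean, between-school variance, sampling variances) are made up for illustration; the behaviour they demonstrate is the one in the plot: the same raw logit is pulled much harder toward the grand mean when its sampling variance is large.

```python
def shrink(raw_logit, grand_logit, tau2, se2):
    """Partial-pooling estimate: a precision-weighted blend of the
    school's raw logit and the grand mean (empirical-Bayes shrinkage)."""
    reliability = tau2 / (tau2 + se2)   # ~1 for large schools, ~0 for tiny ones
    return grand_logit + reliability * (raw_logit - grand_logit)

# Illustrative numbers: grand-mean logit and between-school variance
grand, tau2 = 0.4, 0.9

# A small school has a large sampling variance, so it is pulled hard
small = shrink(raw_logit=1.5, grand_logit=grand, tau2=tau2, se2=0.8)
# A large school has a small sampling variance, so it barely moves
large = shrink(raw_logit=1.5, grand_logit=grand, tau2=tau2, se2=0.05)
```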

Cohort effects over time

The cohort effect (school×year random intercept minus school random intercept) captures the year-specific shared component — how much better or worse a school's entire cohort performed relative to the school's long-run average, across all subjects.

The cohort effects are centred near zero each year (by construction — the fixed effects absorb year-level means). The spread shows how much schools vary in their year-to-year fortunes. Wider distributions indicate more cohort-driven volatility.
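The cohort-effect computation itself is a simple difference of fitted intercepts. The values below are hypothetical BLUPs for a single made-up school, used only to show the arithmetic described above.

```python
# Hypothetical fitted intercepts (BLUPs) for one school, on the logit scale
school_intercept = {"A": 0.50}
school_year_intercept = {("A", 2023): 0.62, ("A", 2024): 0.41, ("A", 2025): 0.47}

# Cohort effect: school x year intercept minus the school's long-run intercept
cohort = {(s, y): v - school_intercept[s]
          for (s, y), v in school_year_intercept.items()}
# A positive value means that year's cohort beat the school's usual level
```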


Model progression


Subject profile effects

A school's subject profile is its persistent subject-specific deviation — how much stronger or weaker it is in one subject relative to its own overall level. A school with a positive Math profile is consistently better at Math than its school intercept alone would predict, across all years.

Each dot below is one school×subject pair; x-axis is the subject profile on the logit scale (positive = relatively stronger, negative = relatively weaker). Schools near zero have uniform profiles across subjects.

The distribution of subject profiles is centred at zero for each subject (by construction — the M1b model absorbs overall school effects). The spread shows how much persistent subject-specific variation exists: a wider distribution means schools differ more in their subject strengths. Math tends to have a wider spread than Reading or Writing.
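The per-subject spread comparison reduces to a standard deviation over each subject's profile estimates. The profile values below are made up (mean-zero by construction, as in the model); they merely illustrate the kind of pattern described, with Math spread wider than the literacy subjects.

```python
from statistics import pstdev

# Hypothetical subject-profile estimates (logit scale), one value per school
profiles = {
    "Math":    [0.30, -0.25, 0.10, -0.40, 0.25],
    "Reading": [0.08, -0.05, 0.02, -0.10, 0.05],
    "Writing": [0.06, -0.07, 0.03, -0.09, 0.07],
}

# Per-subject spread: a wider SD means more persistent subject-specific variation
spread = {subject: pstdev(values) for subject, values in profiles.items()}
```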

Interpretation

The dominant source of achievement variation is persistent school quality (% for Grade ). This is the stable school factor — SES, demographics, staffing, neighbourhood — that first-differencing deliberately removes. By modelling levels directly, we see it clearly.

Persistent subject profiles account for % of total variance. Some schools are consistently stronger in one subject relative to their overall level — e.g. a school that persistently outperforms in Math but underperforms in Writing. This is stable across years and distinct from both the school intercept and the year-to-year cohort factor.

The "general cohort factor" accounts for % of total variance (% of within-school variance). When a school has a good year, it tends to be good across all subjects. But this is a smaller effect than PCA on raw differences suggested, because PCA inflated the common component by: (a) removing the dominant school effect via differencing, and (b) treating all schools equally regardless of size.

Observation-level noise is % of total — sampling variation plus any true within-cell fluctuation. Grade 6 tends to have more subject-specific structure (% subject profile vs % in G3), consistent with G6 Math operating more independently from literacy.

Practical implications: