This article by Gordon McKenzie first appeared in Research Professional.

“And those whom he predestined he also called; and those whom he called he also justified; and those whom he justified he also glorified.” (Romans 8:30). The Teaching Excellence Framework touches on the fundamental question of predestination versus free will.

A full set of the TEF metrics, calculated by the Higher Education Funding Council for England, requires three years of reportable, benchmarked data for each of the core metrics: solid, immovable, unchangeable data. So are the gold, silver and bronze recipients determined already? Or does the opportunity for institutions to submit evidence putting the metrics in context allow them to shape their own fate?

Any rational person reading the Department for Education’s Teaching Excellence Framework: year two specification would conclude that the TEF assessment is driven by the metrics. Assessment will be “based on both core and split metrics” and “supplemented” by the additional evidence of the provider submission.

The specification describes a sequential process for assessment. The first step is a review of core metrics (on teaching quality, learning environment and student outcomes), assessing whether performance is significantly better or worse than might be expected. This initial judgement, or “hypothesis”, is “based on distance from benchmarks using the system of significance”. And while “system of significance” as a phrase has a flavour of arcane knowledge, reserved for senior initiates, using its results is simple: a provider with three or more positive flags and no negative flags (where a “flag” means a statistically significant difference from benchmark) should be considered initially as “gold”. A provider with two or more negative flags should be considered initially as “bronze”. Everyone else is “silver”.
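That first step is mechanical enough to sketch in a few lines of code. The following Python fragment is purely illustrative, assuming the positive and negative flag counts have already been produced by the significance tests; the function name and the example counts are mine, not the specification’s.

```python
def initial_hypothesis(positive_flags: int, negative_flags: int) -> str:
    """Return the starting TEF rating suggested by the core-metric flags.

    A flag here stands for a statistically significant difference from the
    provider's benchmark on one of the core metrics (illustrative sketch only).
    """
    if positive_flags >= 3 and negative_flags == 0:
        return "gold"
    if negative_flags >= 2:
        return "bronze"
    return "silver"

# Example: two positive flags alongside one negative flag start as silver.
print(initial_hypothesis(positive_flags=2, negative_flags=1))  # silver
```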

Step two is to test the initial hypothesis against the split metrics – metrics for a series of sub-groups, mainly those from disadvantaged backgrounds. But these will only count when there are positive or negative flags (and, given the smaller sample sizes, the specification acknowledges there will be fewer of these).

Step three is to consider the additional evidence provided by the provider submission. And the weight given to the evidence in the submission depends on the strength of the initial hypothesis, which in turn depends on the metrics. The specification states: “The more clear-cut performance is against the core metrics, the less likely it is that the initial hypothesis will change in either direction in light of the further evidence.” Here “clear-cut” means the number of positive or negative flags.

So, the TEF is driven by the metrics. But is it determined by the metrics too? Do they, in St Paul’s words, predestine, call, justify and glorify? In some cases, yes. According to the specification: “In the unusual case of a provider having six positive flags, we anticipate it will be highly unlikely that an initial hypothesis of gold would not be maintained, regardless of the content of the additional evidence.” Indeed. It would have to be a suicidal, ill-judged and self-defeating submission to overturn that judgement.

But these are the “unusual” cases – just how unusual (and what they are) we don’t yet know. For everyone else, do we conclude that the act of free will expressed through the submission will make a difference? Probably, yes. There isn’t a binary divide between “unusual” and everyone else. The further a provider sits from “clear-cut”, the more the submission will matter; it will matter most to those on the borderline and to those with a mix of positive and negative flags. The specification gives other examples too – very small providers that are unlikely to differ from benchmarks in a statistically significant way, and institutions in which positive or negative flags are concentrated in one aspect of assessment.

How free is the free will? The specification suggests not entirely. The primary purpose of the submission is to address shortcomings in performance against – yes – the metrics. The specification states: “In looking at the provider submission, assessors will be looking for evidence of factors that could have affected performance against the core and split metrics.” Of course, they will also make judgements on evidence that addresses the criteria in other ways. But the sense is that this is secondary. “A provider is not required to address each criterion or to use them as a checklist,” it goes on. “Rather, they may wish to focus on areas of strength and areas where there are weaknesses in performance against the core and split metrics.” Free will exercised within a pre-determined framework. No room for the Pelagian heresy.

All this may just be the logical consequence of metrics that may be the best we have but are not a perfect proxy for teaching excellence; if the measure is inherently vulnerable then the narrative has to concentrate on shoring it up. But it is also a bit of a shame. While the specification does touch on examples of the rich activity that makes for an excellent learning environment and the highest quality teaching, I fear this richness will get squeezed out of the 15 pages to which submissions are limited and will fall victim to the need to feed the metrics. The structure of any performance assessment framework tends to shape the responses and behaviour of those being assessed. As teachers teach to the test, so providers will submit to the metrics.

The reasons behind such a defined, even trammelled process are understandable; while God may be inscrutable, the decisions of the TEF panel will be examined under many microscopes. Providers will be able to appeal their TEF outcome only on the basis of a “significant procedural irregularity”, and anyone disgruntled by the result will have every incentive to do so. Being clear, in detail, about what constitutes a regular procedure is therefore essential. Does that leave room for the holistic assessment that the DfE specification promises? No doubt the TEF chair and panel, a great strength of this process, will do their best to provide it.