News

Education data reuse for AI development

Today the government announced that pupil assessment data will be used in a ‘store’ to build AI products.

We have written to both departments, DSIT and the Department for Education, with questions regarding the data sourcing and where this fits into their strategy for the education sector.

While resources for SEND provision and teacher retention are priorities for the sector, we hope the government is not fooled into thinking fifty-year-old tech offers shortcuts around serious thinking on the sector’s immediate needs and a long-term, holistic education strategy. This is, after all, one of the first spending announcements in education, and it is not going to schools.

It was unclear from the announcement whether the total spend is cumulative: whether the three and four million pounds mentioned are separate sums, or simply do not add up.

“The project, backed by £4 million of government investment, will pool government documents including curriculum guidance, lesson plans and anonymised pupil assessments which will then be used by AI companies to train their tools so they generate accurate, high-quality content, like tailored, creative lesson plans and workbooks, that can be reliably used in schools.”

“The content store, backed by £3 million, is a first-of-its-kind approach to processing government data for AI, as the UK government forges ahead with using technology to transform public services and improve the lives of people across the country.”

Is it three, four, or seven million in total? (*It is now clearer that the £4m total is split into £3m for the data store and £1m for other companies.)

We at Defend Digital Me have yet to see any product claiming to use AI that offers independent evidence of improved learning outcomes. In recent years we have seen a lot of companies with, shall we say generously, “over-hyped marketing” selling their products to schools: for example, a well-known seating-plan company that claimed to be selling an AI product when it was not, as the ICO discovered in its assessment of the company for misuse of pupil data to train an AI product without parental permission.

Furthermore, the terms used in the announcement, “personalised learning” and “cost cutting,” are contested in education and across the public sector, and it is concerning that education is being used as a guinea pig for “first of its kind” plans.

“Using technology to transform public services and improve the lives of people across the country” is a string of meaningless buzzwords. A whole range of public sector services and people have been harmed by tech projects done badly by governments, including in the UK, whether directly or through the diversion of effort from other things.

Generative AI models are already losing their appeal to investors as accuracy and reliability issues become ever more apparent, and they cannot legally use children’s data, be used by children without parental permission, or be used easily at all in the public sector**. It is unsurprising, therefore, if companies are seeking urgently to embed their products into new markets and to exploit new data sources, so far protected from re-use by not being online.

But it is unclear whether this plan involves “government data” as the announcement claims (i.e. data about government and administrative processes) or our public administrative data (i.e. data about people and our interactions with public services, and therefore personal data). The term “assessment data” suggests it is pupil data. It is also interesting that the post calls the spend “government investment” when of course the money comes from the taxpayer.

Any personal data processing (including separating data to make parts of it non-personal or anonymous) needs a lawful basis that is necessary and proportionate to the task. That basis cannot just be invented retrospectively to fit how the data users have chosen to do things, especially for product development, which does not enjoy the same exemptions as research would, for example.

Data controllers must inform people, at the time their data are *collected*, how the data will be used and why. Purpose limitation protects people from retrospective changes to that purpose which they neither agreed to nor were told to reasonably expect at the time of collection.

Will the government inform past and present pupils and their families, whose data may be processed to create this store, of their rights, and about who will have access to what data about them to build it, or use it later?

Families don’t send their children to school to be turned into data products. Of the 1,004 parents Survation polled in 2018, 69% didn’t know a National Pupil Database even existed, never mind that they are in it or how to exercise their connected legal rights.

We want to see more information about what is planned for this “data store”, and hope the government engages soon and widely with the sector and civil society.

Jen Persson said: “I’m keen to see technology used well, but so far the shiny promises and hype of AI companies seem to be steering UK government tech policy towards their own interests in new product development more than the reality of what is needed by the education sector: a sector that, when it comes to technology, needs initial teacher training and CPD, and a rights-based data-management infrastructure put in place first to fix what’s not working now, before any new tech initiatives.”

“We are further disappointed with the current government’s lack of engagement to date with expert civil society in data and technology, which is surprisingly poorer than under their predecessors. But it will be for the schools sector to decide how it responds to protect the rights of the pupils and families who entrust their data to them for the purposes of their children’s education, and not for profit.”

“More information must follow soon to ensure that public and professional trust in such new initiatives is not harmed, and that this does not become toxic to trust in the new government overall.”

Addendum and Other references:

Notes added Sept. 1st and 4th to clarify:

*It appears the total planned spend is £4 million, £1 million of which is to be awarded, in ways still to be decided, to “those who bring forward the best ideas to put the data into practice to reduce teacher workload.”

**OpenAI’s “Terms and Conditions” for everyday commercial use, for example, require 13-18 year olds to have parental permission before use, and under-13s are not to use it at all. This means it is not generally usable by children in the public sector, whether stand-alone or built into other products (individual use cases need to be examined): “consent” is not valid where it cannot be withdrawn again, where use is compulsory in the classroom or for homework, or where its freely given nature is affected by the power imbalance of authorities asking pupils aged 13+ to use it, such that it would be detrimental to decline.

The separately published report, Use Cases for Generative AI in Education: Building a proof of concept for Generative AI feedback and resource generation in education contexts: Technical report (August 2024) (.pdf), provides more context and opens up further questions on permissions, IP, and the lawfulness of the proof-of-concept build.