In a recent article published by the Communications of the ACM — the flagship publication of the Association for Computing Machinery — OpenMined’s Executive Director, Andrew Trask, was featured as a key voice in the growing conversation around synthetic data, AI training, and the critical importance of controlling how data shapes model behavior.
The Growing Role of Synthetic Data in AI
The article, titled “AI Goes Synthetic to Get Real,” explores how synthetic data — data created by humans or algorithms to simulate real-world information — is rapidly becoming a cornerstone of AI development. With high-quality human-generated data increasingly scarce, AI developers are turning to synthetic datasets to train large language models across fields including finance, medicine, criminal justice, and engineering.
While synthetic data offers significant benefits, such as enabling organizations to build more equitable and resilient AI models without navigating privacy constraints, the article highlights a crucial concern: the risk of data manipulation and degraded model quality. As synthetic and real data increasingly blend together, subtle errors can compound into a process researchers describe as “model collapse.”
Who Controls the Data Controls the Model
The article presents Andrew’s perspective on the value of AI training data. As Trask explains in the piece:
“Whoever controls an AI’s training data gets to decide how that model will behave.”
This insight underscores a central challenge in AI development: without proper governance and transparency mechanisms, training data can be manipulated, whether inadvertently or intentionally, to produce deceptive or biased results. Andrew’s remarks highlight the need for technical infrastructure that gives stakeholders meaningful control over how data influences AI systems.
Attribution-Based Control: A Path Forward
The article also spotlights OpenMined’s work on Attribution-Based Control as a promising remedy for these challenges. As described in the piece, Attribution-Based Control uses cryptographic and deep learning techniques to allow AI users to choose which sources influence each prediction or model, while also enabling data owners to decide how their data will be used.
A secondary benefit of this approach is improved management of hallucinations, which remains a persistent challenge in large language models. As Andrew notes, if you can choose the sources that inform a model’s outputs, you can also determine whether those sources are appropriate for the task at hand.
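The core idea described above, that users select which sources may inform an output while data owners set the terms of use, can be illustrated with a small sketch. This is purely hypothetical pseudocode for the concept, not OpenMined’s Attribution-Based Control implementation: every name and structure here is invented for illustration.

```python
# Illustrative sketch only: a toy "source gating" check, NOT OpenMined's
# Attribution-Based Control. All names and structures here are invented.
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    owner_allows: set  # tasks the data owner has permitted, e.g. {"medical"}


@dataclass
class GatedRetriever:
    sources: list

    def eligible(self, task: str, user_selection: set) -> list:
        """Return only sources that (a) the user chose for this query and
        (b) whose owners permit use for this task."""
        return [
            s for s in self.sources
            if s.name in user_selection and task in s.owner_allows
        ]


sources = [
    Source("clinical_notes", {"medical"}),
    Source("market_data", {"finance"}),
    Source("web_scrape", {"finance", "medical"}),
]
retriever = GatedRetriever(sources)

# The user selects two sources, but only one is permitted for a medical task.
chosen = retriever.eligible("medical", {"clinical_notes", "market_data"})
print([s.name for s in chosen])  # → ['clinical_notes']
```

In a real system the gating would operate on a model’s predictions via cryptographic and deep learning techniques, as the article describes; the sketch only shows the two-sided control structure, where both the user’s selection and the owner’s permissions must agree before a source can influence an output.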
Why This Matters
This recognition in Communications of the ACM places OpenMined’s work at the center of one of AI’s most pressing challenges: ensuring that the data powering AI systems is governed transparently and responsibly. As synthetic data grows from roughly 60% of all AI training data in 2024 to potentially surpassing real data by 2030, the need for tools like Attribution-Based Control will only intensify.
OpenMined remains committed to building the technical infrastructure that ensures data governance serves the public interest so that the future of AI is shaped by accountability, not opacity.
Read the full article on the Communications of the ACM website.
