Epoch AI Faces Criticism Over OpenAI Funding Transparency for FrontierMath
Epoch AI, a nonprofit focused on developing benchmarks for artificial intelligence, has faced backlash for not disclosing OpenAI’s financial involvement in its work until December 20, 2024. The organization revealed that OpenAI had provided funding for the creation of FrontierMath, a benchmark designed to test an AI’s mathematical abilities with expert-level problems. FrontierMath was one of the benchmarks OpenAI used to demo its upcoming flagship AI, o3.
Transparency Concerns Spark Controversy
Critics, including contributors to the benchmark, expressed frustration over Epoch AI’s handling of OpenAI’s involvement. A contractor for Epoch AI, known as “Meemi” on the forum LessWrong, criticized the organization’s lack of transparency, stating:
“In my view, Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark.”
Some contributors were reportedly unaware that OpenAI would have exclusive access to the benchmark, raising concerns about FrontierMath’s credibility as an objective evaluation tool. On X, Stanford mathematics PhD student Carina Hong alleged that six mathematicians involved in the project told her they had not been informed about OpenAI’s privileged access and would have reconsidered their participation if they had known.
Epoch AI Responds to Criticism
In response, Tamay Besiroglu, associate director and co-founder of Epoch AI, admitted to shortcomings in transparency and acknowledged that the organization “made a mistake.” He explained that contractual restrictions prevented Epoch AI from disclosing the partnership earlier but conceded that contributors deserved to be informed sooner.
“Even though we were contractually limited in what we could say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI,” Besiroglu wrote.
Besiroglu also emphasized that OpenAI has verbally agreed not to train its AI models on FrontierMath's problem set, since doing so would compromise the benchmark's integrity. He noted that Epoch AI maintains a separate "holdout set" of problems to allow independent verification of results.
Independent Verification Pending
However, questions remain about the validity of OpenAI's results. Elliot Glazer, Epoch AI's lead mathematician, stated on Reddit that the organization has not yet independently verified OpenAI's o3 scores on FrontierMath.
“My personal opinion is that [OpenAI’s] score is legit, and that they have no incentive to lie about internal benchmarking performances,” Glazer said. “However, we can’t vouch for them until our independent evaluation is complete.”
Broader Implications for AI Benchmarking
The controversy highlights the challenge of developing empirical benchmarks for AI while avoiding conflicts of interest. Building benchmarks requires significant resources, which often leads benchmarking organizations into partnerships with the very companies whose technologies they evaluate. As Epoch AI's case shows, this dynamic can undermine a benchmark's credibility.
Besiroglu expressed hope that FrontierMath’s integrity remains intact, emphasizing that OpenAI has supported the use of safeguards like the holdout set. Still, the incident raises important questions about transparency and objectivity in the rapidly evolving field of AI evaluation.
What This Means
Epoch AI’s misstep underscores the importance of transparency in AI benchmarking. As organizations collaborate with major industry players like OpenAI, maintaining trust and objectivity will be critical to ensuring that benchmarks remain credible and widely accepted.
Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.