ChatGPT Put to the Test Against Students—With Concerning Results
Graduate students at Harvard outperformed a ChatGPT model by more than two letter grades in a study conducted by researchers at the university.
The researchers expected that OpenAI’s chatbot would “perform similarly to doctoral students on lower cognitive levels,” hypothesizing that ChatGPT could memorize material well but would struggle with critical-thinking problems.
However, the students significantly outperformed ChatGPT, which struggled with “remember” and “apply” tasks. The researchers were able to improve its performance with prompts, though.
“We found a striking deficit in ChatGPT’s ability to interpret scientific graphs and raw data in both short-answer and multiple-choice questions, even when using a version specifically designed for image interpretation,” the researchers wrote.

The researchers conducted the study with students from Harvard’s Principles of Molecular Biology course, a 200-level class that spans the full semester.
Over the course of the study, the students were expected to maintain a minimum grade of 80 percent, which is a passing grade for doctoral students.
The AI’s responses, meanwhile, were produced using GPT-4o, which was released by OpenAI in May 2024.
To make sure the students hadn’t used artificial intelligence themselves, the researchers used out-of-class assignments from 2022, before AI tools were widely available and adopted.
The Findings
Doctoral students outperformed ChatGPT at every level.
The chatbot did well on “remember” questions, though significantly worse than the students. The researchers noted that these questions are not meant to be challenging; they are intended to encourage students to summarize techniques.
Students outperformed ChatGPT 98 percent to 82 percent.
Meanwhile, students significantly outperformed ChatGPT on long-answer design questions. They also outperformed it on fill-in-the-blank questions.
ChatGPT was particularly poor at “understand,” “apply” and “analyze” questions, where it earned a 66 percent average, compared to 87 percent for the doctoral students.
With that average, below the 80 percent minimum expected of the students, ChatGPT would have “failed.” According to the researchers, the poor results were “largely driven by the algorithm’s markedly poor performance on the ‘apply’ level, which refers to identifying, rationalizing and describing experimental controls that students had previously learned through their coursework.”
‘Is this really surprising?’
Commentators on Reddit’s r/science forum were not shocked.
“Anyone who has spent time using [large language models] should know that they are still a long way from being as good as an experienced human,” a critic wrote.
“Even for really focused tasks like coding you need to be very attentive in watching out for hallucinations or bad practices in the code.”
Another contributor asked, “Is this really surprising? The only people claiming that LLMs operate at the ‘PhD level’ are LLM marketers. They constantly fail to solve introductory physics and chemistry questions, so no doubt research level biology is beyond them.”
However, several commenters pointed out that the ChatGPT model used in the experiment was outdated.
“I think it’s critical to point out that when this study was done, LLMs like ChatGPT were nowhere near where they are now,” one user posted.
“As someone who uses LLMs daily and runs a significant research group, we have found that the difference between now and even just one year ago is an order of magnitude. It can solve many science and engineering problems without significant prompt.”
Newsweek has reached out to the researchers and OpenAI for comment via email.