
A recently published study found ChatGPT could pass a junior-level engineering course without much context.
Since the release of ChatGPT in 2022, instructors have worried that students might circumvent learning by using the chatbot to complete homework and other assignments. In the years since, large language models have grown steadily more capable of answering complex questions, but can a chatbot replace a student’s efforts entirely?
Researchers at the University of Illinois at Urbana-Champaign’s Grainger College of Engineering integrated a large language model into an undergraduate aerospace engineering course to evaluate its performance compared to the average student’s work.
The researchers, Gokul Puthumanaillam and Melkior Ornik, found that ChatGPT earned a passing grade in the course without much prompt engineering, but the chatbot didn’t demonstrate understanding or comprehension of high-level concepts. Their work illustrating its capabilities and limitations was published on the open-access platform arXiv, operated by Cornell Tech.
The background: LLMs can tackle a variety of tasks, including creative writing and technical analysis, prompting concerns over students’ academic integrity in higher education.
A significant number of students admit to using generative artificial intelligence to complete their course assignments (and professors admit to using generative AI to give feedback, create course materials and grade academic work). According to a 2024 survey from Wiley, most students say it’s become easier to cheat, thanks to AI.
The researchers sought to understand how a student who invests minimal effort, offloading work to ChatGPT, would perform in a course.
The evaluated class, Aerospace Control Systems, which was offered in fall 2024, is a required junior-level course for aerospace engineering students. During the term, students submit approximately 115 deliverables, including homework problems, two midterm exams and three programming projects.
“The course structure emphasizes progressive complexity in both theoretical understanding and practical application,” the research authors wrote in their paper.
They copied and pasted questions, or uploaded screenshots of them, into a free version of the chatbot without additional guidance, mimicking a student who is investing minimal time in their coursework.
The results: At the end of the term, ChatGPT achieved a B grade (82.2 percent), slightly below the class average of 85 percent. But it didn’t excel at all assignment types.
On practice problems, the LLM earned a 90.4 percent average (compared to the class average of 91.4 percent), performing the best on multiple-choice questions. ChatGPT received a higher exam average (89.7 percent) compared to the class (84.8 percent), but it faltered much more on the written sections than on the autograded components.
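The overall grade reported above is a weighted blend of category scores. A minimal sketch of that arithmetic, using ChatGPT’s reported homework and exam averages alongside assumed values (the course’s actual category weights and the programming-project score are not given in the article, so those numbers are hypothetical placeholders):

```python
# Sketch of how an overall course percentage emerges from category averages.
# The weights and the programming score below are ASSUMPTIONS for
# illustration, not the actual Aerospace Control Systems grading scheme.

def weighted_grade(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-category percentages into an overall course percentage."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(scores[cat] * weights[cat] for cat in weights)

# Homework and exam figures are ChatGPT's reported averages; the
# programming figure is a hypothetical placeholder.
scores = {"homework": 90.4, "exams": 89.7, "programming": 55.0}
weights = {"homework": 0.4, "exams": 0.4, "programming": 0.2}  # assumed split

print(f"{weighted_grade(scores, weights):.1f}")  # prints 83.0 with these assumed numbers
```

The sketch shows how strong homework and exam scores can carry a grade into B territory even when programming work lags badly, which matches the overall pattern the study describes.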
ChatGPT performed worst on the programming projects. While its mathematical reasoning on theoretical questions was sound, the model’s explanations were rigid and template-like, failing to adapt to the specific nuances of each problem, the researchers wrote. Its programming solutions were also inefficient or overly complex, lacking “the optimization and robustness considerations that characterize high-quality student submissions,” according to the article.
The findings demonstrate that AI is capable of passing a rigorous undergraduate course, but also that LLM systems rely on pattern recognition rather than deep understanding. The results further indicated to the researchers that well-designed coursework can still distinguish genuine student capability in engineering.
So what? Based on their findings, the researchers recommend that faculty members integrate project work and open-ended design challenges to evaluate students’ understanding and technical capabilities, particularly in synthesizing information and making practical judgments.
In the same vein, they suggested that faculty should design questions that evaluate human expertise by requiring students to explain their rationale or justify their response, rather than just arrive at the correct answer.
ChatGPT also struggled with system integration, robustness and optimization beyond basic implementation, so assignments that emphasize these requirements would provide better evaluation metrics.
Researchers also noted that because ChatGPT is capable of answering practice problems, instruction should focus less on routine technical work and more on higher-level engineering concepts and problem-solving skills. “The challenge ahead lies not in preventing AI use, but in developing educational approaches that leverage these tools while continuing to cultivate genuine engineering expertise,” researchers wrote.