A child interacts with a service robot. (Andy Kelly/Unsplash)
Artificial intelligence systems like ChatGPT have been shown to outperform humans in some basic tasks, including English comprehension, image categorization, and visual reasoning.
This was the conclusion Stanford University’s Institute for Human-Centered Artificial Intelligence (HAI) drew in its 2024 AI Index Report. While stating that “benchmarks and tests [are] quickly becoming obsolete due to rapid advancements,” the HAI also suggested the need for a new method of evaluating AI’s performance on complex tasks such as understanding and reasoning.
Nestor Maslej, the editor-in-chief of the report, stated, “A decade ago, benchmarks would serve the community for 5–10 years, but now they often become irrelevant in just a few years.”
The report did note that AI “trails behind on more complex tasks like competition-level mathematics, visual commonsense reasoning and planning.”
AI projects on code-sharing platform grow from 800 to 1.8 million
HAI’s AI Index Report, which has been published annually since 2017, is compiled by academics and industry experts who evaluate the state of the art, costs, ethics, and other aspects of the field. AI was also utilized in the writing and editing of the report itself, which runs over 400 pages.
The report stated that advances in AI began in the early 2010s, based on neural networks and machine learning algorithms, and have since proliferated rapidly. The increase in the number of AI-related projects on the code-sharing platform GitHub was used as an example, with the number growing from around 800 in 2011 to 1.8 million in 2023. The number of academic journal articles on AI also nearly tripled during this period, the report added.
Correct answer rates rivaling those of Ph.D.-level researchers
According to the report, most of the research at the forefront of AI is coming from industry. In 2023, industry developed 51 notable machine learning systems, compared to a mere 15 from academia. Raymond Mooney, the director of the AI Lab at the University of Texas at Austin, told scientific journal Nature, “Academic work is shifting to analyzing the models coming out of companies — doing a deeper dive into their weaknesses.”
Last year, for example, researchers at New York University developed a large language model (LLM) performance evaluation tool called GPQA. This graduate-level benchmark, which consists of more than 400 multiple-choice questions, is so challenging that Ph.D.-level researchers correctly answered only 65% of the questions in their own field. The same scholars correctly answered only 34% of the questions outside their field of study, despite having access to the internet.
As of last year, AI’s accuracy rate was in the 30%-40% range, but this year, Claude 3, the latest chatbot from San Francisco-based AI company Anthropic, scored around 60%. “The rate of progress is pretty shocking to a lot of people, me included,” New York University researcher David Rein told Nature.
Ethical concerns related to the rising development costs and increased energy usage
The rapid evolution of AI has gone hand in hand with rising costs for its development and maintenance.
OpenAI’s LLM GPT-4, for instance, which was released in March 2023, reportedly cost around US$78 million to train. Google’s chatbot Gemini Ultra, released nine months later, in December, cost US$191 million to develop. Nature predicted that “within years, large AI systems are likely to need as much energy as entire nations.”
“Moreover,” Nature noted, “AI systems need enormous amounts of fresh water to cool their processors and generate electricity.”
Currently, one of the main methods for improving the capabilities of AI systems is simply to make them larger. Better performance therefore means higher costs and greater energy use.
Additionally, improving any AI system requires massive amounts of data, including text and images. The report pointed to a looming shortage of such data as an obstacle to making AI more accurate and effective than it already is, since the pool of text and images is ultimately created and uploaded by humans. Epoch, a non-profit research institute that investigates key trends and issues related to the trajectory and governance of AI, declared in a report last year that “exhausting the stock of data is unavoidable.”
“Our projections predict that we will have exhausted the stock of low-quality language data by 2030 to 2050, high-quality language data before 2026, and vision data by 2030 to 2060,” Epoch stated.
According to Nature, Epoch has since revised its projection for when high-quality language data will run out to 2028.
Ethical concerns about the way AI is designed and used are also growing. The report noted that in 2016, the number of regulations in the US that directly referenced AI technology was only one. Last year, however, that figure jumped to 25.
“After 2022, there’s a massive spike in the number of AI-related bills that have been proposed,” said Maslej.
Concerns are undoubtedly coupled with hopes and expectations. According to the report, a survey of 22,816 people (aged 16-74) in 31 countries indicated that 52% of respondents felt anxiety over AI. This was a sharp increase from the 39% who reported anxiety the previous year. The proportion of respondents who felt more optimistic than pessimistic about AI grew slightly from 52% to 54% in the same period. Two of every three respondents (66%) predicted that, for better or for worse, AI is going to drastically change their lives within three to five years.
By Kwak No-pil, senior staff writer