GPT-4.5 passes the Turing Test with flying colors: What does this milestone mean for the evolution of artificial intelligence?

Last update: 07/04/2025

  • GPT-4.5 managed to convince 73% of participants that it was human in a revised Turing test.
  • The experiment revealed that the success of AI depends largely on instructions and the adoption of a "personality."
  • Other models like LLaMa-3.1 had lower success rates, and without personalization, results plummeted significantly.
  • The study raises questions about the current limits of artificial intelligence and the risks associated with its increasing conversational realism.
GPT-4.5 passes the Turing-0 test

Would you be able to differentiate between a conversation with a real human and a machine? The question, first posed more than seven decades ago by Alan Turing, has gained renewed traction following a recent experiment that tested the most advanced models of artificial intelligence in a challenging context.

Researchers at the University of California, San Diego have conducted a A study that challenges our ability to distinguish between the human and the artificial.l. At the center of the experiment, the GPT-4.5 model OpenAI's not only managed to pass the Turing Test under specific conditions, but He was even identified as human more often than real people themselves.

An experiment under rigorous conditions

AI Interviews During the Turing Test

The study, led by researchers Cameron Jones and Benjamin Bergen, focused on a modified version of the classic Turing Test, which involved more than 300 people, including university students and online platform users. During five-minute sessions, each participant held two simultaneous conversations: one with a human and one with an artificial intelligence system. At the end, They had to judge which of the interlocutors was human.

Exclusive content - Click Here  Xiaomi SU7 Ultra conquers the Nürburgring and arrives in Gran Turismo 7

The result was striking: The GPT-4.5 model was perceived as human in 73% of cases when assigned a defined personality—for example, that of a young person with knowledge of digital culture. This percentage exceeded not only the statistical threshold for passing the test, but also the ability of real humans to appear human in this experiment. This approach reinforces the idea of ​​how la artificial intelligence has evolved in recent decades.

However, when the AI ​​was not given a specific personality and was simply asked to appear human without further prompting, its effectiveness plummeted to 36%. This underscores the Importance of context and personalization in prompts given to language models.

Comparison between models and unexpected results

Various AI models in testing

Along with GPT-4.5, the experiment included other models such as LLaMa-3.1-405B of Meta, which achieved a 56% accuracy rate when also given a fictitious personality. Although this figure is lower than that of GPT-4.5, it's still above chance. In contrast, other versions such as GPT-4 or the veteran chatbot ELIZA, from the 60s, failed to exceed 21% and 23% respectively, highlighting the gap between current and primitive technologies.

Estos resultados show that the success of an AI in a task like the Turing Test depends much more on how it is instructed than on the model itselfThe key is to adopt a believable role, not to consciously simulate human intelligence. If you want to delve deeper into how the computer Over time, you will find interesting information.

Exclusive content - Click Here  What is the Agentic AI Foundation and why does it matter for open AI?

Furthermore, it was found that even with sophisticated instructions, some models were unable to maintain a sufficiently convincing conversation. GPT-4o admitted to being an AI with little challenge., which quickly lost credibility among human interlocutors.

Cheating or Thinking? The Turing Test Controversy

Discussion on cognition in AI

Passing the Turing Test does not imply that an AI understands what it says or is aware of its words. This is one of the major debates among experts. While some celebrate this achievement as a significant advance in simulating human behavior, others consider it This type of test is no longer reliable for measuring the "real intelligence" of an artificial system..

Experts such as François Chollet, a Google engineer, have pointed out that The Turing Test is more of a philosophical experiment than a currently useful measurement.. According to this view, just because an AI deceives us doesn't mean it reasons or has a deep understanding of the world. Rather, it leverages patterns learned from millions of texts to construct plausible responses. To better understand this field, you can check out who the founder of AI.

The worrying thing, then, is not so much what these AIs can do, but what we think they do. The human tendency to anthropomorphize conversational systems, as was the case with ELIZA in the 60s, seems not to have disappeared over time. Today, the phenomenon is magnified by much more sophisticated models.

Applications and risks of an AI that sounds too human

The fact that an AI can pass for a human in a short conversation presents opportunities, but also poses significant risks in terms of security, education and social relations.

  • Identity theft: Convincing AI could be used in scam or social engineering campaigns.
  • Desinformación: Models capable of generating human speech could be effective tools for manipulating or spreading fake news.
  • Automatización laboral: Sectors such as customer service or technical support could be replaced by these conversational AIs, affecting human employment.
  • Education and assessment: Detecting whether a text was written by a person or an AI becomes a complicated task, with consequences in the academic field.
Exclusive content - Click Here  Edge Computing: What it is, how it works, and its real-life applications

Researchers have also warned about how The standardization of these technologies may make their detection more difficult. In the future, as we become more accustomed to interacting with automated systems, we may let our guard down, making it easier for these models to become indistinguishable from a human interlocutor without us even realizing it.

Another recurring concern is the ethics of its implementation. To what extent should an AI pretend to be human without disclosing its artificial nature? Should there be clear limits on how and when it can be used in real-life contexts?

GPT-4.5 has not shown that machines reason like us, but it has made it clear that they can imitate us in ways that make them difficult to distinguish. This milestone marks a turning point, not because of what the machine is, but because of what it makes us question: our own ideas about what it means to be "human" in a digital age where the artificial merges with the real.