ZDNET’s key takeaways
- AI is getting better at small tasks, but still lags on long-form analysis.
- The consequences of extended interactions with AI can be disastrous.
- Use AI as a tool for well-defined tasks, and avoid falling down a rabbit hole.
Better to do a little well than a great deal badly. So said the great philosopher Socrates, and his advice can apply to your use of artificial intelligence, including chatbots such as OpenAI’s ChatGPT, or Perplexity, as well as the agentic AI programs increasingly being tested in business.
AI research increasingly shows that the safest and most effective course with AI is to use it for small, limited tasks, where outcomes can be well defined and results can be verified, rather than pursuing extensive interactions with the technology over hours, days, and weeks.
Also: Asking AI for medical advice? There's a right and a wrong way, one doctor explains
Extended interactions with chatbots such as ChatGPT and Perplexity can lead to misinformation at the very least, and in some cases, delusion and death. The technology is not yet able to take on the most sophisticated demands of reasoning, logic, common sense, and deep analysis, areas where the human mind reigns supreme.
(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
We’re not yet at AGI (artificial general intelligence), the supposed human-level capability of AI, so you’d do well to keep the technology’s limitations in mind when using it.
Put simply, use AI as a tool rather than letting yourself be sucked down a rabbit hole and lost in endless rounds of AI conversation.
What AI does well – and not so well
AI tends to do well at simple tasks, but poorly at complex and deep kinds of analysis.
The latest examples of that are the main takeaways from this week’s release of the Annual AI Index 2026 from the scholars at Stanford University’s Institute for Human-Centered AI.
On the one hand, editor-in-chief Sha Sajadieh and her collaborators explain that agentic AI is increasingly successful at tasks such as looking up information on the web. In fact, agents are close to human level on routine online processes.
Also: 10 ways AI can inflict unprecedented damage
Across three benchmark tests (GAIA, OSWorld, and WebArena), Sajadieh and team found that agents are approaching human-level performance on multi-step tasks such as opening a database, applying a policy rule, and then updating a customer record. On the GAIA test, agents have an accuracy rate of 74.5%, still below the 92% of human performance but way up from the 20% of a year ago.
On the OSWorld test, “Computer science students solve about 72% of these tasks with a median time of roughly two minutes,” whereas Anthropic’s Claude Opus 4.5, until recently its most powerful model, reaches 66.3%. That means “the best model [is] within 6 percentage points of human performance.”
WebArena shows AI models “now within 4 percentage points of the human baseline of 78.2%” accuracy.
Agentic AI is getting better at online tasks such as web browsing but still falls short of human-level accuracy.
Stanford
While Claude Opus and other LLMs aren't perfect, they show rapid progress in at least reaching benchmark levels that come closer to human-level performance.
That makes sense, as manipulating a web browser or looking something up in a database should be among the easier scenarios in which a natural-language prompt can plug into APIs and external sources. In other words, AI should have most of the tools it requires to interface with applications in limited ways and carry out tasks.
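To picture what that pattern looks like, here is a minimal, hypothetical sketch in Python. None of these names come from a real agent framework; `call_model` merely stands in for an LLM that turns a prompt into a structured call against a small set of narrow, verifiable tools.

```python
# Hypothetical sketch: an agent restricted to a few narrow, checkable tools.
# call_model is a stand-in for the LLM; no real vendor API is used here.

DATABASE = {"cust-42": {"status": "active"}}

def lookup_record(customer_id):
    # Narrow read action: fetch one customer record.
    return DATABASE.get(customer_id, {})

def update_record(customer_id, field, value):
    # Narrow write action: change one field and return the result for checking.
    DATABASE.setdefault(customer_id, {})[field] = value
    return DATABASE[customer_id]

TOOLS = {"lookup_record": lookup_record, "update_record": update_record}

def call_model(prompt):
    # Stand-in for the LLM: a real model would emit this structured tool
    # call after reading the natural-language prompt.
    return {"tool": "update_record",
            "args": {"customer_id": "cust-42", "field": "status",
                     "value": "suspended"}}

def run_agent(prompt):
    step = call_model(prompt)
    tool = TOOLS[step["tool"]]    # the agent can only invoke the tools it is given
    return tool(**step["args"])   # a small result a human can verify

print(run_agent("Apply the suspension policy to customer cust-42."))
# -> {'status': 'suspended'}
```

The narrower each tool and the smaller each step, the easier it is to verify what the agent actually did, which is part of why these benchmark tasks are within reach.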
Also: 40 million people globally are using ChatGPT for healthcare – but is it safe?
Note that even with well-defined, limited tasks, it helps to check what you're getting from a bot, as the average score on these benchmarks still falls short of human ability. And that's in benchmark tests, a kind of simulated performance; in real-world settings, your results may vary, and not to the upside.
AI can't handle the hard stuff
When they dug into deeper kinds of work, the Stanford scholars found much less encouraging results.
Research has found, they noted, that “models handle simple lookups well but struggle when asked to find multiple pieces of matching information or to apply conditions across a very long document — tasks that would be straightforward for a human scanning the same text.”
That finding aligns with my own anecdotal experience using ChatGPT to draft a business plan. Answers were fine in the first few rounds of prompting, but then degraded as the model snuck in facts and figures I had not specified, or that may have been relevant earlier in the process but had no business being included in the present context.
The lesson, I concluded, was that the longer your ChatGPT sessions run, the more errors sneak in. That makes the technology infuriating.
Also: I built a business plan with ChatGPT and it turned into a cautionary tale
The results of unchecked bot elaboration can get more serious. An article last week in Nature describes how scientist Almira Osmanovic Thunström, a medical researcher at the University of Gothenburg, and her team invented a disease, “bixonimania,” which they described as an eye condition resulting from excessive exposure to blue light from computer screens.
They wrote formal research papers on the made-up condition, then published them online. The papers got picked up in bot-based searches. Many of the large language models, including Google's Gemini, began to faithfully relate the condition bixonimania in chats, pointing to the faked research papers of Thunström and team.
The fact that bots will confidently assert the existence of the fake bixonimania speaks to a lack of oversight of the technology's access to information. Without proper checking, you can't know whether a model has verified what it's spitting out. As one scholar who wasn't involved in the research noted, “We should evaluate [the AI model] and have a pipeline for continuous evaluation.”
Consequences can be serious
A more serious variant, where an individual appears to have gone down a rabbit hole of confiding in a bot, is described in a recent New York Times article by Teddy Rosenbluth about the case of an older man grappling with white blood cell cancer.
Rather than following his oncologist's advice, the patient, Joe Riley, relied on extensive interaction with chatbots, especially Perplexity, to refute the doctor's diagnosis. He insisted his AI research showed he had what's called Richter's Transformation, a complication of the cancer that would be made worse by the recommended treatment.
Also: Using Google AI Overviews for health advice? It's 'really dangerous,' investigation finds
Despite emails from experts on Richter's questioning the material in the Perplexity summaries of the condition, Riley stuck with his belief in his AI-generated reports and resisted his doctor's and his family's pleas. He missed the window for proper treatment, and by the time he relented and agreed to try it, it was too late.
Rosenbluth draws the connection between the story of Joe Riley and the case of Adam Raine last year, who died by suicide after extensive chats with ChatGPT about his inclination to end his life.
Riley's son, Ben Riley, wrote his own account of his father's journey with AI. While the younger Riley doesn't blame the technology per se, he points out that getting immersed in chats and losing perspective can have consequences.
“The fact remains that AI does exist in our world,” writes Riley, “and just as it can serve as fuel to those suffering manic psychosis, so too may it affirm or amplify our mistaken understanding of what’s happening to us physically and medically.”
Staying sane with unreliable AI
The inclination to engage in long-form discussions about depression, suicide, and serious health conditions is understandable. People have been habituated to long-form engagements of hours at a time on social media. Some people are lonely, and a natural-language conversation with a bot is better than no conversation at all.
Also: Your chatbot is playing a character – why Anthropic says that's dangerous
Bots have a tendency toward sycophancy, research has shown, which can make hours of engagement with a bot more fulfilling than the ordinary give and take with a person.
And the companies that make the technology, while warning users to verify bot output, have tended to place less emphasis on negative reports from individuals such as Riley and Raine.
Four rules for avoiding the rabbit hole
A few rules can help mitigate the worst effects of placing too much reliance on the technology.
- Define what you'll use a chatbot for. Is there a well-defined task with a limited scope, one for which the bot's predictions can be fact-checked against other sources?
- Maintain a healthy skepticism. It's well known that chatbots are prone to confabulation, confidently asserting falsehoods. It doesn't matter how many chatbots you use to try to balance the good and the bad; all of them should be treated with healthy skepticism as having only part of the truth, if any.
- Regard chatbots as tools, not friends or confidants. They're digital instruments, like Word or Excel. You're not trying to have a relationship with a bot but rather to complete a task.
- Use proven digital-overload skills. Take stretch breaks. Step away from the computer for a non-digital human interaction, such as playing card games with a friend or going for a walk.
Also: Stop saying AI hallucinates – it doesn't. And the mischaracterization is dangerous
Falling down the rabbit hole happens partly as a result of simply being parked in front of a screen with no downtime.