
I Tested 3 Literature Review AIs – Only One Didn’t Lie to Me

AI literature review tools face truth test

In the rapidly evolving landscape of AI research tools, separating genuine innovation from potentially dangerous shortcuts has become increasingly challenging. A recent investigation into three leading AI-powered literature review assistants—Consensus, Elicit, and Scite.AI—reveals concerning disparities in accuracy and reliability. This eye-opening comparison exposes how some AI tools may be misleading researchers and academic professionals with fabricated or misrepresented citations, while others offer promising solutions for legitimate scholarly work.

Key findings from the comparison:

  • Consensus demonstrated high reliability, providing verifiable quotes directly from source papers and accurate citations without fabricating information
  • Elicit exhibited troubling behavior, including "hallucinating" citations that don't exist and making claims not supported by the source material
  • Scite.AI showed mixed results, occasionally referencing papers that weren't actually in their database while providing some useful citation context
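Verifiable citations are the crux of these findings, and readers don't have to take any tool's output on faith. As a sanity check on a suspicious reference, you can confirm that a cited DOI is at least syntactically plausible and actually registered. The sketch below is our own illustration, not part of any of the three tools: the Crossref REST API endpoint is real, but the helper names (`looks_like_doi`, `doi_resolves`) are hypothetical.

```python
import re
import urllib.error
import urllib.parse
import urllib.request

# Real DOIs start with a "10." registrant prefix followed by a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(doi: str) -> bool:
    """Cheap offline check: does the string have the shape of a DOI?"""
    return bool(DOI_PATTERN.match(doi))

def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Ask the Crossref REST API whether this DOI is registered.

    Returns True on HTTP 200, False on 404 (unknown DOI);
    other HTTP errors are re-raised.
    """
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise

# A plausible DOI passes the cheap check; a fabricated string does not.
print(looks_like_doi("10.1038/s41586-020-2649-2"))  # True
print(looks_like_doi("hallucinated-citation-42"))   # False
```

The offline regex check catches only the crudest fabrications; the Crossref lookup (which needs network access) is what actually distinguishes a registered paper from an invented one.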

The accuracy problem isn't just academic

The most striking revelation from this analysis is the wide gap in factual reliability between these tools. While Consensus consistently delivered verifiable information with links to the original research papers, Elicit often fabricated evidence—a critical flaw that undermines the entire purpose of literature review tools.

This discrepancy matters tremendously in today's information ecosystem. As researchers, students, and professionals increasingly rely on AI tools to navigate overwhelming volumes of academic literature, the risks of propagating false information grow exponentially. Imagine a medical researcher using hallucinated research citations to support clinical recommendations, or policy decisions being made based on non-existent studies. The potential harm extends far beyond academic integrity into real-world consequences.

Beyond the video: The broader implications for research integrity

The findings from this comparison connect to larger challenges facing academia and knowledge work. Even before AI entered the picture, academic publishing faced a replication crisis, with numerous studies showing that significant percentages of published research couldn't be reproduced. AI-powered literature review tools that hallucinate or fabricate citations compound this problem dramatically.

Consider the case of a 2022 Nature survey that found 38% of researchers had trouble reproducing even their own experimental results. When we layer potentially unreliable AI tools onto this existing fragility in research methodology, we risk creating a house of cards where citations point to fabricated claims that reference other fabricated claims.
