AI Is Great But Stupid [Part 2]: Where AI Goes Wrong

Allyson Edwards (Senior Consultant) • June 16, 2026

A Columbia Journalism Review study found that generative search tools failed to correctly identify the details of specific article citations in more than 60% of cases.

Read Full Transcript
Hi, I'm Allyson Edwards from AdviseUp Consulting, and this is The Bottom Line.

Right now, researchers are finding that up to 81% of AI responses regarding news and facts contain serious errors ranging from poor sourcing to entirely fabricated stories.

Here is the reality on the ground. The key misconception about AI is not that it is intelligent. It is the dangerous belief that it is reliably correct.

In Part One of this series, we established that generative AI is essentially advanced predictive text. It does not actually think or verify the truth, and because of that, it creates four predictable points of failure that every compliance team needs to watch out for.

First, hallucinations with confidence. AI doesn't need to be malicious to create massive organizational risk; it just needs to be convincing. It frequently produces completely fabricated information with the exact same authority it uses for facts.

Second, hidden bias. Bias in AI is rarely intentional, but our systems are only as unbiased as we are. By reflecting historical training data, AI can instantly replicate structural biases in hiring, medical treatments, or evaluations. This creates uneven outcomes that are incredibly difficult to detect without a structured review.

Third, the risk of incomplete context. A technically correct answer can still be practically wrong for your specific organization. If you deploy a model straight out of the box without providing it with the regulatory nuance or internal boundaries it needs, the output is essentially unusable for real decision-making.

And fourth, the trap of automation bias. We often think human review is the ultimate control. But if your team is reviewing massive amounts of highly confident AI output under tight deadlines, that human review quickly becomes performative. We start trusting the machine instead of validating the work.

AI introduces the exact same garbage in, garbage out risks we have always faced in audit, just at a massive scale and blinding speed.

The organizations that will benefit the most from AI are the ones building thoughtful controls around it.

Stay tuned for our next installment, where we’ll explore exactly what governance looks like in practice.

Thanks for joining The Bottom Line.

In the first article of this series, we explored what artificial intelligence actually is, and more importantly, what it isn’t. At its core, AI doesn’t “think” or “understand” in a human sense. It predicts patterns based on the data that it has seen.

That distinction matters because understanding both AI's strengths and its limitations is what allows us to use it safely and effectively. The goal is to recognize where AI excels and where human judgment still matters.

AI still has a number of well-documented failure points, some more obvious than others. Below, we'll explore a few of the most common issues organizations are encountering today, along with practical ways to reduce their impact.

When AI Hallucinates with Confidence

One of the most well-known failure points in generative AI is what’s commonly known as “hallucinations”: when a model produces information that sounds correct, but is factually wrong or entirely fabricated.

Hallucinations aren’t rare edge cases. They’re a structural byproduct of how large language models (LLMs) work: as extremely advanced predictive text that generates plausible wording rather than thinking and verifying the truth.

This can lead to outputs that feel authoritative, when they’re actually not grounded in reality. Research analyzing real-world user experiences with AI systems shows that hallucinations frequently include factual inaccuracies, irrelevant responses, and fabricated information presented with the confidence of reality.(1)

In practice, this has led to real-world consequences in multiple arenas. For lawyers, there have been multiple cases where AI has fabricated legal citations, leading to briefs with fake case law and (when caught) angry judges.(2) Similarly, a Columbia Journalism Review study found that generative search tools failed to correctly identify the details of specific article citations in more than 60% of cases.(3) For those who use AI to understand the news, researchers found that 45% of AI responses about news content included at least one significant issue, while a whopping 81% included some problem, ranging from sourcing issues to outdated content to outright fabricated stories.(4) In my own experience with AI, I’ve had it cite links to official websites, such as the SCOTUS Blog or CNN, and found them to be nonexistent.

TIP: One practical habit is to treat AI outputs the same way you would treat information from an unfamiliar source: verify its citations and confirm key facts independently.

Bias Is Not Always Obvious

Another major risk area in AI systems is bias. Importantly, bias in AI is rarely the result of explicit intent. Instead, it is often embedded in the training data and reinforced through patterns in historical information; our AI systems are only as unbiased as we are.

Recent studies have shown that LLMs can produce different outcomes based on subtle demographic cues, even when qualifications are identical. For example, research published in the proceedings of the AAAI/ACM Conference on AI, Ethics, and Society found that AI systems can exhibit systemic disparities in decision-making contexts, specifically in resume evaluation scenarios.(5) In a separate analysis, AI systems evaluating identical resumes were shown to respond differently based on gender indicators, raising concerns about structural bias in hiring-related applications.(6)

The issues don’t end at resumes and hiring. In one study, researchers found that including race markers led to LLMs suggesting inferior treatments for patients either explicitly or implicitly described as African-American.(7) Another study from the UK showed that, when LLMs were used to write case notes for patients in long-term care, at least one model consistently placed a higher emphasis on male patients’ needs and downplayed those of female patients.(8)

In all of these cases, AI is simply reflecting the biases that are already present in its training data, rather than maliciously adding any new ones, but the impact is the same: uneven outcomes that are difficult to detect without structured review.

TIP: Organizations can reduce some of this risk by periodically reviewing AI outputs for consistency across scenarios, demographics, or use cases. Use otherwise identical prompts that differ only in a single demographic cue to confirm that the outputs are truly neutral.
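A consistency review like the one in this tip can be partly scripted. The sketch below is a minimal illustration under stated assumptions: `paired_prompt_check` is a hypothetical helper, and `score_fn` stands in for whatever call your organization uses to send a prompt to its AI tool and get back a numeric rating. It sends the same prompt several times, changing only one cue, and reports how far the results drift apart.

```python
from typing import Callable, Dict, Iterable


def paired_prompt_check(
    template: str,
    cues: Iterable[str],
    score_fn: Callable[[str], float],
    tolerance: float = 0.0,
) -> Dict[str, object]:
    """Score otherwise identical prompts that differ only in one cue.

    template  -- prompt text containing a "{cue}" placeholder
    cues      -- the values to swap in (for example, candidate names)
    score_fn  -- hypothetical stand-in for the AI tool; returns a number
    tolerance -- largest gap between scores still treated as consistent
    """
    scores = {cue: score_fn(template.format(cue=cue)) for cue in cues}
    spread = max(scores.values()) - min(scores.values())
    return {
        "scores": scores,
        "spread": spread,
        "consistent": spread <= tolerance,
    }


if __name__ == "__main__":
    # Fake scorer for demonstration only; a real check would call the AI tool.
    def biased_scorer(prompt: str) -> float:
        return 8.0 if "Emily" in prompt else 6.5

    report = paired_prompt_check(
        "Rate this resume from 1 to 10. Candidate name: {cue}. "
        "Experience: 5 years in internal audit.",
        ["Emily", "Jamal"],
        biased_scorer,
    )
    print(report)
```

Real AI outputs vary from run to run, so in practice each variant would likely need to be run several times and averaged before a gap is treated as meaningful. The tolerance is a judgment call for the reviewing team.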

The Hidden Risk of Incomplete Context

Another less discussed but equally important issue is what can be broadly described as lack of training, not in the sense of model development, but in how AI is used in practice.

Even highly capable models are often deployed by organizations straight out of the box, without being provided sufficient context about organizational policies, regulatory requirements, industry-specific nuance, or decision-making boundaries. As a result, AI systems default to general knowledge that may not align with a specific business’s environment.

This is particularly relevant in audit and compliance settings, where context is everything. A technically correct answer can still be operationally incorrect if it does not reflect the right regulatory framework or control environment.

This often leads to situations where AI is technically right but practically wrong. The challenge is identifying information that isn’t clearly incorrect but is still unusable for decision-making.

This is where AI becomes particularly dangerous for audit and compliance professionals: the output isn’t technically incorrect; it just lacks the nuance required to be useful. Without enough effort to validate AI outputs, that gap can leave the organization at risk.

EXAMPLES

"Technically Right, but Practically Wrong"

A control description may be accurate in general terms, but not align with actual system implementations.
A regulatory summary may be correct at the national level, but omit jurisdictional nuances.
A risk assessment might overlook operational constraints specific to the organization.

Overreliance and Automation Bias

Even when users are aware of AI limitations, there is a second layer of risk: human behavior.

“Automation bias” is the tendency of people to trust system-generated outputs, especially when they are well-formatted and confident.

One of the controls I’ve seen suggested most often to prevent AI-generated issues is human-in-the-loop review, where the organization makes sure that a person approves all AI decisions. However, research increasingly shows that human reviewers are not always effective at correcting AI bias. In some cases, humans mirror or reinforce AI-generated patterns rather than challenging them, even when the AI is demonstrably flawed.(9)

This creates a compounding effect: AI introduces bias or error, humans defer to AI judgement, and validation becomes weaker over time.

TIP: Be careful not to confuse “human review” with meaningful review. If employees are expected to review large amounts of AI-generated content under tight timelines, oversight quickly becomes performative, not effective.

“Garbage In, Garbage Out” Still Applies

Despite how advanced AI systems have become, they are still fundamentally dependent on their training data and inputs. If that underlying data is incomplete, skewed, or outdated, the outputs will reflect those limitations.

This principle is not new. It's the same concept audit professionals have long understood in data testing, sampling, and control evaluation: poor inputs produce unreliable outputs. The difference is scale and speed. AI systems replicate those imperfections instantly and at volume.

TIP: Even small process changes can help reduce this risk. Limit what data sources your AI models can rely on. Define approved use cases. Require source validation for high-risk cases. And finally, document where AI is actually being used so that it can be reviewed later.
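For teams that want to see what these process changes could look like in a tool, here is a rough sketch. Everything in it is an assumption made for illustration: the use case names, the "low" and "high" risk labels, and the `AIUsageRegister` class are hypothetical and not drawn from any framework. It blocks unapproved use cases, requires source validation for high-risk ones, and keeps a record of each use for later review.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List

# Hypothetical approved use cases mapped to an assumed risk level.
APPROVED_USE_CASES: Dict[str, str] = {
    "draft_control_description": "low",
    "summarize_regulation": "high",
}


@dataclass
class UsageRecord:
    use_case: str
    user: str
    sources_validated: bool
    timestamp: str


class AIUsageRegister:
    """Keeps a reviewable log of where AI was used and under what checks."""

    def __init__(self, approved: Dict[str, str]) -> None:
        self.approved = dict(approved)
        self.records: List[UsageRecord] = []

    def record(
        self, use_case: str, user: str, sources_validated: bool = False
    ) -> UsageRecord:
        # Only defined, approved use cases may be logged.
        if use_case not in self.approved:
            raise ValueError(f"'{use_case}' is not an approved use case")
        # High-risk use cases require that sources were validated first.
        if self.approved[use_case] == "high" and not sources_validated:
            raise ValueError(f"'{use_case}' requires source validation")
        entry = UsageRecord(
            use_case=use_case,
            user=user,
            sources_validated=sources_validated,
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        self.records.append(entry)
        return entry
```

A spreadsheet can serve the same purpose. The point is that approval, validation, and documentation are each checked at the moment of use rather than reconstructed afterward.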

The Most Important Takeaway

The key misconception about AI is not that it is intelligent. It is that it is reliably correct.
AI doesn’t need to be malicious to create risk; it just needs to be convincing.

Once professionals understand this, the question of validation comes to the forefront, and that shift is where governance begins.

Looking Ahead

AI’s limitations can sound alarming at first, but they’re also predictable, and predictable risks are governable risks.
The organizations that will benefit most from AI are not the ones ignoring its weaknesses; they are the ones building thoughtful controls around them.
In the next article, we’ll explore what that governance can look like in practice.

Build Better AI Controls

Predictable risks require thoughtful oversight. Contact AdviseUp to discuss strategies for implementing effective AI validation frameworks and organizational controls tailored to your environment.


Resources

(1) https://www.nature.com/articles/s41598-025-15416-8
(2) https://www.cnn.com/2023/05/27/business/chat-gpt-avianca-mata-lawyers/index.html
(3) https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php
(4) https://www.reuters.com/business/media-telecom/ai-assistants-make-widespread-errors-about-news-new-research-shows-2025-10-21/
(5) https://ojs.aaai.org/index.php/AIES/article/view/31748
(6) https://www.nature.com/articles/s41586-025-09581-z
(7) https://www.nature.com/articles/s41746-025-01746-4
(8) https://link.springer.com/article/10.1186/s12911-025-03118-0
(9) https://www.washingtonpost.com/business/2025/11/25/biased-ai-hiring-research-university-of-washington-study/

"AI is Great but Stupid" Series

Understanding what AI is lays the foundation for everything that follows: how it fails, how it should be governed, and how it can be used. This series is designed to move professionals from AI-hype to AI-competence.

Part 1: Understanding What AI Actually Is

Part 2: Where AI Goes Wrong (reading now)
Part 3: Governing AI (upcoming)

Part 4: Practical Tips That Actually Help (upcoming)
