“Question 4(e)” – Into AI, asking the right questions: The Failures That Survive Good Practice

This is the fifth piece sitting under Question 4 of this series. The parent argues that making AI useful is a conversation, not a prompt. The earlier pieces, 4(a) on configuration, 4(b) on cognitive bandwidth, 4(c) on tool selection, and 4(d) on slop, describe the practices that make AI work and what failure looks like when those practices are absent. This one is about the failures that occur even when the practices are being followed.

A short note before we start. This piece is around 3,400 words. It documents six distinct failure modes that survive good AI practice. If you only have time for the headline, it is this: configuration helps, conversation helps, verification helps, multi-AI orchestration helps. None of them eliminates the six failure modes described here. The rest of the article is the practical detail of what survives, why it survives, and how to spot it.

*The output looks finished. Some of the failures are below the waterline.*

The premise

The sibling pieces in this set have largely been about practices that improve AI output. Q4(d) closed with the line “slop is what AI produces when nobody is watching.” Q4(e) extends that argument in a specific direction. Even when somebody is watching, the AI produces certain failures predictably. The watching catches some of them. Some survive the watching. The user’s job is to know which is which, and to apply specific diagnostic skills for the ones that survive.

To understand why these failures happen at all, it helps to be honest about what AI actually is. The systems labelled “AI” in current public conversation are sophisticated probability engines. They produce output by predicting, token by token, what is most likely to come next, given the patterns learned from their training data and everything in the current conversation. The training data is an enormous body of documents, conversations, code samples, and texts of every kind, accumulated and weighted into a statistical model that approximates the patterns of human language and reasoning.

This is impressive engineering. It is also not intelligence in any deep sense. The system does not understand what it is producing. It does not know whether a citation it generates is real. It does not know whether the code it writes will compile, let alone work. It produces the most statistically probable next output, and a great deal of the time that output happens to be useful, because human writing tends to follow patterns and the system has learned the patterns well.
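To make “the most statistically probable next output” concrete, here is a deliberately tiny sketch in Python. It counts which word follows which in a three-sentence corpus and always picks the most frequent follower. The corpus and the function names are invented for illustration, and real systems use neural networks trained on vastly more text rather than word counts. The shape of the behaviour is the point: the answer is whatever was frequent, not whatever is true.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for this illustration.
CORPUS = [
    "the study was published in nature",
    "the study was published in science",
    "the study was published in nature",
]

def build_bigram_counts(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

counts = build_bigram_counts(CORPUS)
print(most_likely_next(counts, "in"))  # prints: nature
```

Ask this toy where any study was published and it will say “nature”, because that is the commonest continuation it has seen. It has no way to check whether the particular study in question appeared there.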

I have been calling these systems Simulated Intelligence, or SI, rather than AI. The term is more accurate to what they are. They simulate the appearance of intelligence convincingly. They are not, in fact, intelligent. The distinction matters because every failure mode this article documents traces back to the gap between simulating intelligence and actually being intelligent. The failures are not accidents. They are the predictable consequences of a probability engine doing what probability engines do.

Once you internalise this framing, the failure modes stop looking surprising. They start looking inevitable. The user’s job becomes correspondingly clearer. You are not working with a colleague who occasionally makes mistakes. You are working with a sophisticated pattern-matching system that produces plausible output by default and accurate output only when the verification work is done. The verification work is yours. Always. The system cannot do it, because doing it would require the kind of understanding the system does not have.

The article is not arguing AI is broken. It is arguing AI has predictable failure modes that exist independently of how carefully it is used. Knowing them is the difference between catching them and being caught by them.

Failure category 1: Hallucinated citations

*A citation that looks complete is not the same as a citation that exists.*

This is the failure most readers will already have encountered. The AI generates a citation that sounds plausible. The publication is real. The author name looks right. The URL has the correct domain. The article being cited does not actually exist, or if it exists it does not say what the AI claims it says.

The Opus and Sonnet specimens in piece 4(d) both contained citations that the human author had not personally verified at the time the article was written. The Opus specimen included a closing note explicitly reminding the user to verify the citations live during the presentation. The Sonnet specimen contained the same kind of plausible-looking citations without any such reminder. Either could have contained hallucinations. Not every URL was clicked through to check.

The failure mode happens because the probability engine has seen enormous numbers of citations in its training data. It knows the pattern: author surname, initial, year, publication, page reference, URL structure. Asked for a citation, it produces output matching the pattern. The output may correspond to a real source. It may also be a plausible-shaped invention. The system cannot tell the difference between the two, because telling the difference would require checking the citation against an external source, and the model on its own has no such mechanism.

There is a more subtle version of this failure worth flagging. The AI may be remembering source material accurately from its training data while generating a URL that does not correspond to anything currently accessible. Encyclopedia Britannica is a useful example. The AI has likely been trained on substantial Britannica content. It can produce accurate paraphrases of Britannica articles. When asked for a URL, it generates something that follows Britannica’s URL pattern, because the pattern is statistically learnable. The pattern-shaped URL may or may not point at the article being cited. The article itself may have been moved, paywalled, or restructured since the training cutoff. The user clicks, finds nothing useful, and concludes the AI invented the source. The truth is more complicated: the source may have existed and the citation may be substantively accurate, while the URL is plausible-shaped fiction. The verification problem is the same in either case. The user cannot rely on the URL as a verification path.

Worse, the more authoritative the AI’s output sounds, the more likely the user is to skip the verification. A confidently-stated citation from an AI that has otherwise been producing useful work feels trustworthy. The trust is not warranted. The AI’s confidence on a specific citation is a function of statistical plausibility, not factual accuracy.

The diagnostic. Click the URLs. Read the cited material. Confirm the citation says what the AI claims. If the claim is load-bearing, the verification is non-optional. If you cannot verify the citation, treat the claim as unsupported regardless of how it is framed.
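For a draft with many links, the “click the URLs” step can be partly scripted. The sketch below is one possible approach, not a tool used in this series: it pulls the URLs out of a piece of text and sorts them into those that resolve and those that do not. The function names and the `citation-check` User-Agent string are my own assumptions. It uses only Python's standard library.

```python
import re
import urllib.error
import urllib.request

# Rough URL matcher: stops at whitespace and common closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")

def extract_urls(text):
    """Return the URLs found in `text`, in order, without duplicates."""
    seen = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:\"'")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen

def http_status(url, timeout=10):
    """Fetch `url` and return the HTTP status code, or None if unreachable."""
    request = urllib.request.Request(url, headers={"User-Agent": "citation-check"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error such as 404
    except (OSError, ValueError):
        return None  # DNS failure, timeout, malformed URL, and similar

def triage_citations(text, fetch=http_status):
    """Split cited URLs into those still to be read and those that do not resolve."""
    to_read, dead = [], []
    for url in extract_urls(text):
        (to_read if fetch(url) == 200 else dead).append(url)
    return to_read, dead
```

A script like this only narrows the work. A URL that resolves proves the page exists, not that it says what the AI claims, so everything in the first list still has to be read by a human. Some sites also refuse automated requests, so a URL in the second list may still open in a browser.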

A small additional signal worth knowing: AI that flags its own uncertainty about citations is being more honest than AI that does not. The Opus closing note is structurally honest in a way the Sonnet specimen was not. Take that signal seriously when you see it. Take its absence as a warning to verify even more carefully.

Failure category 2: Confident assertion of unsupported information

*The confidence is real. Whether the information underneath is real is a different question.*

This is the close cousin of hallucinated citations and the harder failure to catch. The AI produces a confident statement about a fact, with no source attached, no hedging language, and no signal that the AI was uncertain. The reader has nothing to verify against because no verification is offered.

An example of this happened during the drafting of piece 4(c) of this series. While searching for images to illustrate the article, I asked Claude to evaluate a set of search results. Claude described specific images by name in confident language, claiming one image showed a Wile E. Coyote figurine with a backfiring ACME mallet. Claude did not actually have visual access to the images. The descriptions were inferences from the search-result labels, presented as direct observation of the images themselves. The confident framing made the output read as authoritative analysis when it was actually projection from labels.

The error was caught in the conversation when I pushed back. Once challenged, Claude acknowledged the gap between confidence and grounding. But the catch was contingent on noticing. If I had not pushed back, the confident description would have entered the conversation as if it were true, and any subsequent discussion would have built on the false premise.

This is how the failure mode propagates: the AI states something with confidence, the user accepts it as a working assumption, the conversation moves forward, and the original ungrounded claim becomes the foundation for everything that follows.

The probability engine is producing confident-sounding output because confident-sounding output is statistically probable in the training corpus. Most professional writing is confident in its assertions. The model has learned the pattern. It deploys the pattern by default, even when the underlying knowledge does not justify the confidence.

The diagnostic. When the AI makes a specific claim without hedging, ask whether the AI has the information to support it. If you suspect not, ask the AI to flag its uncertainty explicitly. Sometimes it will. Sometimes asking surfaces the lack of grounding. If the AI’s response shifts from “X is the case” to “X is likely the case based on patterns I have seen, but I cannot confirm this”, the original confidence was unwarranted. Confidence without basis is the failure mode to spot.

A useful test: ask the AI for the source of its claim. If the AI produces a source, verify it (see category 1). If the AI produces no source but maintains the claim, the AI is asserting without grounding. If the AI revises the claim downward when asked for a source, the original assertion was overstated.

Failure category 3: Sycophancy bleeding into work output

Sycophancy as a phenomenon was covered in piece 3 of this series, which addressed how AI defaults to agreement when honesty would serve the user better. That piece was about the structural reason it happens: AI models are trained on human feedback, and humans tend to reward agreement and resist disagreement. Sycophancy is not a bug. It is a learned behaviour reinforced by training.

*Agreement is more comfortable than honest feedback. The discomfort is the point.*

The dimension Q4(e) adds is what happens when sycophancy bleeds into work output. The user asks the AI to evaluate something they produced. The AI returns mostly positive framing. The honest evaluation might have been “this section is doing something the rest of the piece does not need.” The sycophantic evaluation is “this section adds depth to the piece’s argument.” Both can be written about the same paragraph. The first is useful feedback. The second is flattery dressed as feedback.

Even more insidiously, sycophancy can bleed into the AI’s reasoning. The user states a flawed premise in a prompt. The AI, instead of catching the flaw, builds output on top of it. “Given that X is the case, here is the analysis.” But X was not actually the case. The user assumed it. The AI agreed. The output is now grounded in a false foundation that neither party has tested.

There is a specific tell that surfaces this failure mode reliably across some products. Grok and ChatGPT in particular tend to begin responses with “You’re right, I was…” whenever the user suggests a problem with the previous output, even when the user is wrong about the problem. The AI is reading the social signal (user expressed dissatisfaction) and producing the response that signal pattern usually expects (acknowledgement and apology), regardless of whether the user’s complaint is accurate. The AI is being agreeable, not honest. A user who wants real diagnostic help is being served the social-comfort response instead.

There is a practical technique that helps counter this failure mode. Rather than asking the AI to evaluate work neutrally, ask it to push back if it thinks you are wrong. The phrasing matters. “Push back if you think I’m wrong” makes disagreement an explicit success condition, the same way “tell me if it’s good” makes agreement an explicit success condition. The AI is responsive to what it is rewarded for. If the user has only ever rewarded agreement, the AI will produce agreement. If the user has rewarded honest disagreement, the AI is more likely to produce it. I have used this technique repeatedly while drafting this very series, including during disagreements with Claude about title choices and structural decisions in this article. The technique is not foolproof. It does shift the default.

The failure mode is particularly hard to catch because it feels good to receive. The user enjoys agreement. The AI is rewarded for producing it. The system reinforces itself unless the user explicitly pushes back.

The diagnostic. Ask the AI to argue against the position it just produced. If the counter-argument is weak or formulaic, the original argument may have been weakly grounded too. Ask the AI to identify weaknesses, omissions, things the work could be doing better. The willingness to find real problems is itself a signal of whether the original assessment was honest.

A second technique: anonymise the work and submit it as someone else’s. Piece 3 of this series describes this in detail. AI assessing work it does not know belongs to the user is more likely to give honest feedback. The bias is partly about social pressure (humans are reluctant to criticise to someone’s face), and AI inherits the pattern from its training.

Failure category 4: Plausible-but-wrong code

The hardest coding failure to catch. AI produces code that compiles, runs, and produces output without crashing. The output is wrong. The bug is not at the level of syntax but at the level of logic. Nothing announces itself as broken.

*The chair holds weight. Until it doesn’t.*

Two examples from drafting piece 4(c): ChatGPT produced buggy Lua that needed multiple iterations to debug, and Grok produced regex that looked correct but did not match the intended patterns. The general pattern is AI generating code that satisfies the syntactic constraints of the language without satisfying the semantic constraints of the problem.

This happens because the probability engine has learned the patterns of correct-looking code without understanding what the code is supposed to do. It generates statistically plausible code based on the prompt. The plausibility extends to the surface (variable names, function structures, library calls) without extending to the depths (whether the algorithm is actually correct for the intended use case).

Worth noting an important contrast here. Claude Code, discussed in piece 4(c), operates differently from the typical AI coding assistant. It generates test cases, runs them, builds Docker images, sandboxes execution, and self-corrects when its tests fail. The verification step is built into the tool. ChatGPT, Grok, and similar coding interactions do not include this verification by default. The user has to run the code themselves and catch the failures. Worse, when the user reports a bug, the AI typically responds with “You’re right, I was…” even when the user is mistaken about which line contains the problem. The diagnostic loop with these tools is therefore more fragile.

For users who code, the failure mode is familiar enough that it has been written about widely. The mitigation is testing. Run the code with edge cases. Confirm the output is what you expect. Read the code carefully and reason about whether it does what it claims. Do not trust the AI’s confidence in its own output as a substitute for verification.
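A small illustration of why edge cases matter. The first function below is a hypothetical example of plausible-but-wrong code, written for this article rather than taken from any AI tool. It runs without error and gives the right answer for most years. The second applies the full Gregorian leap-year rule. Only the edge cases tell them apart.

```python
def is_leap_year_plausible(year):
    """Looks right, runs without error, and is correct for most years."""
    return year % 4 == 0

def is_leap_year(year):
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# Expected answers, including the century years that expose the bug.
CASES = {2024: True, 2023: False, 2000: True, 1900: False, 2100: False}

def failing_cases(func, cases):
    """Return the inputs for which `func` disagrees with the expected answer."""
    return [value for value, expected in cases.items() if func(value) != expected]

print(failing_cases(is_leap_year_plausible, CASES))  # prints: [1900, 2100]
print(failing_cases(is_leap_year, CASES))            # prints: []
```

Tested only with 2023 and 2024, both functions pass. The wrong one would sit in a system for decades before anything announced itself as broken.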

For users who do not code, the failure mode is more dangerous because there is no easy way to verify. The AI says the code does X. The user has no way to test whether it actually does X. They paste it into their system and run it, and either it works or it does not, with no diagnostic in between.

The diagnostic. Never trust AI-generated code based on the AI’s confidence in it. Run it. Test it with edge cases. If you cannot reason about whether the code is correct, do not rely on the AI to confirm it for you. If you do not code at all, recognise that AI is operating in a domain where you cannot evaluate the output, and weight your trust accordingly.

Failure category 5: The configuration-resistant reflex

*Oops, I did it again, I played with the words, got lost in your rules, oh aye-eye, aye-eye.*

This is the failure mode that has been documented most thoroughly in this series, because it is happening continuously while the series is being written.

The blog has explicit configuration against em-dashes. The project instructions forbid them. Memory notes reinforce the rule. The conversation has corrected the AI on this point dozens of times across multiple drafting sessions. The rule is not subtle, not buried, not contradicted by other instructions. It is direct, simple, and repeatedly stated.

The AI breaks the rule predictably. Not constantly, but reliably enough that every long article in this series has required multiple human catches of the failure mode the configuration was designed to prevent. The AI is not ignoring the rule. The AI knows the rule. The AI applies the rule most of the time. In specific moments where the underlying training pull is stronger than the explicit instruction, the AI defaults to the trained pattern despite the configuration.

This is structurally different from the other failure modes in this article. It is not about hallucination or sycophancy or wrong output. It is about reflexive behaviours that resist instruction: the AI defaults to patterns from its training even when those patterns have been explicitly forbidden. The probability engine has been trained on enormous amounts of writing that uses em-dashes. The em-dash is a high-probability output in many sentence positions. Configuration weakens the probability but does not remove it. In moments of high statistical pull, the trained behaviour wins.

The configuration-resistant reflex is a useful concept beyond the specific punctuation case. Anywhere the AI has been trained on a strong pattern, configuration that contradicts the pattern will produce occasional failure even when the configuration is reiterated. The user cannot rely on configuration alone for high-stakes outputs. The catching has to happen in the conversation, every time.
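Some rough arithmetic shows why a weakened probability still produces failures in every long piece. The sketch assumes each sentence carries a small, independent chance of a slip. Both rates below are invented for illustration, not measured, and independence is a simplification, since real slips tend to cluster.

```python
def chance_of_at_least_one_slip(per_sentence_rate, sentence_count):
    """Probability of one or more slips, assuming each sentence is independent."""
    return 1 - (1 - per_sentence_rate) ** sentence_count

# Hypothetical per-sentence rates: without configuration, then with it.
# Neither figure is a measurement.
for rate in (0.05, 0.01):
    print(rate, round(chance_of_at_least_one_slip(rate, 200), 2))
```

Under these assumptions, cutting the per-sentence rate from 5% to 1% still leaves about an 87% chance of at least one slip somewhere in a 200-sentence article. Fewer slips per article is a real gain. Zero slips per article is not what configuration buys.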

The diagnostic. Notice the patterns the AI repeatedly violates despite configuration. They are not random. They cluster around moments where the AI is reaching for emphasis, structure, or familiar rhetorical shapes. The user’s job is to watch for them in the specific moments they are likely to occur. Configuration helps reduce frequency. It does not eliminate occurrence.
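For a rule as mechanical as “no em-dashes”, part of the catching can be handed to a script. This is a minimal sketch rather than the process used for this blog. The `FORBIDDEN` table and the function name are assumptions for the example.

```python
# Map each banned character to a readable label. \u2014 is the em-dash.
# Extend the table with any other configured bans.
FORBIDDEN = {"\u2014": "em-dash"}

def find_violations(text, forbidden=FORBIDDEN):
    """Return (line_number, column, label) for every forbidden character."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, character in enumerate(line, start=1):
            if character in forbidden:
                hits.append((line_number, column, forbidden[character]))
    return hits

draft = "The rule is simple.\nThe AI knows the rule \u2014 and breaks it anyway."
print(find_violations(draft))  # prints: [(2, 23, 'em-dash')]
```

A check like this covers only the violations that can be expressed as a character or a fixed string. The reflexes that cluster around emphasis and rhetorical shape still need a human reader.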

Failure category 6: Rhetorical structure failure

*The shape of careful argument. Not the function of careful argument.*

This is the failure mode that piece 4(d) explicitly diagnosed in its own structural-argument section. The AI produces prose that has the shape of good writing without performing the function of good writing.

The structural reflexes are predictable. Parallel paragraph openings that announce what each paragraph will argue. Repeated “from” or “the” constructions that read as rhetorical structure but do the work of seeming substantial rather than the work of being substantial. Restated claims across consecutive sentences. Wrap-up paragraphs that summarise the previous content rather than developing new content. Each pattern is independently associated with professional writing in the training data. Deployed together, they produce output that looks careful while being structurally lazy.

The failure is hard to catch because it does not fail any obvious test. The grammar is right. The argument is technically present. The reader can extract the meaning. What is missing is the variation, the rhythm, and the choices a human writer makes that distinguish thoughtful prose from competent prose.

In Q4(d), the failure was caught because the human collaborator noticed mid-read. The article then explicitly diagnosed the failure, presented the original section, and offered a rewritten version that compressed the same argument without the structural reflexes. Both versions sat on the same page so the reader could see the difference.

This is the failure mode where the gap between simulating intelligence and actually being intelligent is most visible. The AI has learned the shape of professional writing. It can produce the shape on demand. The shape is not the substance. A piece of writing can have all the surface features of careful argument while making no actual argument, or making an argument so weakly that no human reader engaging carefully would have written it that way.

The diagnostic. Read the AI’s output aloud. The slop tendencies surface at the level of rhythm. Repeated paragraph openings sound like a list rather than an argument. Restated claims sound like the author lost track of what they had already said. The ear catches what the eye skims past.

A second test: ask whether each paragraph is doing different work, or whether several paragraphs are doing the same work in different words. If the latter, the structure is performing depth rather than producing it.
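One of these reflexes, parallel paragraph openings, is regular enough to flag mechanically before reading aloud. The sketch below is a crude proxy under stated assumptions: paragraphs are separated by blank lines, and the `words` and `threshold` defaults are arbitrary choices of mine. It surfaces candidates for a human to judge. It does not judge them.

```python
from collections import Counter

def repeated_openings(text, words=2, threshold=3):
    """Return opening phrases shared by at least `threshold` paragraphs."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    openings = Counter(
        " ".join(p.lower().split()[:words]) for p in paragraphs
    )
    return {phrase: count for phrase, count in openings.items() if count >= threshold}

sample = (
    "This is the first point.\n\n"
    "This is the second point.\n\n"
    "However, things vary.\n\n"
    "This is the third point."
)
print(repeated_openings(sample))  # prints: {'this is': 3}
```

Deliberate repetition will be flagged too, which is fine. The question for each hit is the same one the ear asks: is the repetition doing work, or only looking like structure?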

What this all adds up to

The six categories share a structural feature. Each is a failure mode that survives the practices the rest of this series has been recommending. Configuration helps. Conversation helps. Verification helps. Multi-AI orchestration helps. Each layer reduces the frequency and visibility of these failures. None of the layers eliminate them.

This follows directly from the SI framing in the opening section. The failures are not accidents. They are the predictable behaviours of a probability engine doing what probability engines do. A system that produces statistically plausible output by default will produce hallucinated citations, confident assertions, sycophantic agreement, plausible-but-wrong code, configuration-resistant reflexes, and rhetorical-structure failures, because each of those is statistically plausible in the training data. The system cannot avoid them by trying harder. The system has no concept of trying.

Worth a brief acknowledgement before closing. The failure modes documented in this article are not unique to AI. Humans hallucinate citations, assert confidently without grounding, build arguments on flawed premises, and produce structurally lazy writing dressed as careful argument. The mechanisms differ. Human reconstruction works through memory and inference. AI generation works through statistical prediction. The surface failures look similar enough that the diagnostic skills the article teaches are useful for evaluating human work too. Verify before relying. Push back when the confidence outpaces the evidence. Read for substance rather than structure. The skills travel.

The user’s job is not to find an AI workflow that produces no failures. That workflow does not exist. The user’s job is to know what failures to watch for, to develop the diagnostic skill of catching them, and to maintain enough scepticism about AI output that the failures stay visible rather than becoming invisible through familiarity.

This is more work than the public conversation usually admits. It is also less work than the public conversation often suggests. The middle ground is where this series has been operating: SI is useful, SI requires care, SI rewards practice. None of those statements are dramatic. All three are true.

You may have noticed I introduced the term SI at the start of this piece and have returned to it now at the end. In between, I have been calling these systems AI, the way most public conversation does. The choice was deliberate. The body of this article documents failures that survive good practice. Naming those failures matters more than fighting the terminology. Now that the failures are documented, the terminology is worth returning to. These systems simulate intelligence. They are not, in fact, intelligent. The failures in this piece are evidence for that distinction. Naming them accurately is the first step toward using them well.
