Applied Intelligence
Module 5: Output Validation and Iteration

Package Confabulation

The dependency that doesn't exist

API confabulation makes code fail when it runs. Package confabulation makes code fail before it runs, and sometimes opens the door to attack.

When an agent generates pip install fastapi-cache-toolkit or npm install react-form-utils, developers often execute the command without verifying the package exists. The agent suggested it with confidence. The name sounds reasonable. The syntax is correct. But the package is fabricated.

Research analyzing 576,000 code samples generated by AI found that 19.7% of package recommendations reference packages that don't exist. This isn't occasional noise. Nearly one in five dependency suggestions points to nothing.

The fabrication rate varies dramatically by model:

Model Category                 Confabulation Rate  Notes
Commercial models (average)    5.2%                GPT-4 Turbo at 3.59% performs best
Open-source models (average)   21.7%               DeepSeek, WizardCoder, Mistral
CodeLlama                      Over 33%            Highest observed rate
Best performers (2025-2026)    0.7-1.5%            Gemini 2.0 Flash, o3-mini, GPT-4o

The gap matters. A team using one of the best current commercial models encounters fabricated packages roughly one-twentieth as often as a team using CodeLlama. Model selection directly affects how much time goes into dependency verification.

Why confabulation, not hallucination

The previous page introduced confabulation as the term for non-existent API methods. The same reasoning applies to fabricated packages, and understanding the distinction changes how developers respond.

Hallucination, in clinical terms, refers to sensory experiences without external stimuli. The hallucinating person perceives something that doesn't correspond to reality. LLMs have no sensory experience. They cannot perceive, so they cannot misperceive.

Confabulation describes memory filling: producing details that seem plausible but have no factual basis, often to complete an expected narrative. A patient with memory impairment might describe a vacation that never happened, not lying but genuinely believing the confabulated memory. LLMs operate similarly. They generate text that statistically fits the pattern without grounding in external truth.

The distinction matters practically. Calling fabricated packages "hallucinations" implies the model is somehow seeing wrong. Calling them confabulations highlights the actual mechanism: the model fills gaps by pattern-matching against training data. It has seen millions of package names like fastapi-* and react-*-utils, so it generates plausible completions regardless of whether those specific packages exist.

This course uses confabulation consistently. When "package confabulation" appears, understand it as: the model generated a dependency name that follows real naming conventions but references nothing real.

The slopsquatting attack

Fabricated packages create a supply chain attack vector called slopsquatting. The attack exploits the consistency of confabulations and the speed of AI-assisted development.

The attack works in four steps:

  1. Harvest confabulated names. An attacker runs common development prompts through multiple AI models, collecting the fabricated package names that appear. "Build a FastAPI app with caching," "Create a React form with validation," "Set up a Node.js logging system." Each prompt may produce packages that don't exist.

  2. Identify repeatable patterns. Research found that 58% of confabulated package names reappear across multiple runs with the same prompt. Of those, 43% appeared in all ten test iterations. Confabulations are predictable artifacts of how models complete patterns, not random noise.

  3. Register malicious packages. The attacker publishes packages with confabulated names to public registries (npm, PyPI, RubyGems). The package contains malicious code: credential theft, backdoor installation, data exfiltration, or supply chain infection.

  4. Wait for installation. A developer accepts agent-generated code, runs the install command without verifying the package, and downloads the attacker's payload.

The attack requires no sophistication from the victim. A developer following normal AI-assisted workflow, accepting suggestions and running install commands, becomes compromised.

Proof-of-concept experiments demonstrate the scale. One researcher registered a package with a confabulated name: huggingface-cli, a name AI models frequently suggest even though no official package exists under it. The dummy package accumulated over 30,000 downloads in three months. A malicious version would have compromised 30,000 development environments.

Another experiment registered a RubyGem with an explicit warning in its description: "Do not use this! This could be a malicious gem because you didn't check if the code ChatGPT wrote for you referenced a real gem or not." Despite the warning, the package was installed over 2,000 times. Developers don't read package descriptions when running install commands from agent suggestions.

Recognizing confabulated packages

Confabulated packages share characteristics that signal fabrication. The pattern-matching that creates them also makes them identifiable.

Plausible naming that follows conventions too closely. Real packages have quirks, history, and naming decisions that predate current conventions. Confabulated names are often suspiciously well-formed: fastapi-cache-toolkit, react-form-validator-utils, django-auth-middleware. They combine popular framework names with common functionality words. A package named exactly what you'd expect may be a package that doesn't exist.

No established presence. Confabulated packages lack GitHub stars, Stack Overflow questions, tutorials, and mentions in official framework documentation. A quick search reveals absence. Real packages accumulate digital footprint over time.

Registry verification fails. The definitive check is to query the registry directly before running install commands.

# npm: check whether the package exists before installing
npm view react-form-utils
# npm ERR! code E404 (404 Not Found - react-form-utils@latest)

# pip: query PyPI directly (requires pip 21.2+; the subcommand is
# still marked experimental)
pip index versions fastapi-cache-toolkit
# ERROR: No matching distribution found for fastapi-cache-toolkit

# Or search the PyPI website:
# https://pypi.org/project/fastapi-cache-toolkit/
# A 404 page confirms the confabulation

Suspiciously short or missing version history. A package created last week to exploit a confabulated name has no version history. Established packages have release histories spanning months or years. Check the "Release history" or "Versions" tab on the package registry page.

Anomalous download counts. Either zero (just registered) or suspiciously high without corresponding community presence (infected packages spreading through compromised systems). Compare download counts against GitHub activity and community size.

Verification before installation

Every dependency added by agent-generated code requires verification before installation. The verification process takes seconds and prevents compromise.

Check existence first. Before running any install command from agent-generated code, verify the package exists in the official registry. Copy the package name, search the registry, confirm it returns a result.
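
The existence check can also be scripted. The sketch below queries the public PyPI and npm registry JSON endpoints, which return HTTP 404 for unregistered names; the function name and structure are illustrative, not a prescribed tool.

```python
import urllib.request
import urllib.error

# Public registry endpoints that return HTTP 404 for names that don't exist
REGISTRY_URLS = {
    "pypi": "https://pypi.org/pypi/{name}/json",
    "npm": "https://registry.npmjs.org/{name}",
}

def package_exists(name: str, registry: str = "pypi") -> bool:
    """Return True if `name` resolves in the given public registry."""
    url = REGISTRY_URLS[registry].format(name=name)
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:  # nothing registered under this name
            return False
        raise  # rate limits or outages deserve a human look, not a guess
```

Gating install commands on a check like this turns "the agent suggested it" into "the registry confirms it" at the cost of one HTTP request.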

Examine package metadata. Once confirmed to exist, review:

  • Creation date: packages created recently may be slopsquatting attempts
  • Author/maintainer: is this a known developer or organization?
  • Repository link: does the package have a linked source repository?
  • Description: does it describe functionality matching the use case?
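
The metadata checks above can be partly automated against the PyPI JSON API response. This is a minimal sketch; the red-flag thresholds and function name are illustrative choices, not established tooling.

```python
from datetime import datetime

def metadata_red_flags(pypi_json: dict, max_age_days: int = 90) -> list:
    """Scan a parsed PyPI JSON API response for slopsquatting warning signs."""
    flags = []
    info = pypi_json.get("info", {})
    if not info.get("author") and not info.get("maintainer"):
        flags.append("no named author or maintainer")
    urls = info.get("project_urls") or {}
    if not any("github" in u.lower() or "gitlab" in u.lower() for u in urls.values()):
        flags.append("no linked source repository")
    # The earliest upload_time across all releases approximates creation date
    uploads = [f["upload_time"]
               for files in pypi_json.get("releases", {}).values()
               for f in files]
    if not uploads:
        flags.append("no release history")
    else:
        first = min(datetime.fromisoformat(t) for t in uploads)
        age_days = (datetime.now() - first).days
        if age_days < max_age_days:
            flags.append(f"first release only {age_days} days ago")
    return flags
```

A non-empty result doesn't prove malice, only that the package deserves the manual review described above before it reaches an install command.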

Cross-reference with documentation. If the agent suggested a package as part of a framework ecosystem (a FastAPI extension, a Django middleware), check the framework's official documentation. Official plugins and recommended packages are listed. Undocumented packages claiming framework integration deserve suspicion.

Use lockfiles and dependency review. For existing projects, dependency updates should flow through pull requests with lockfile changes visible. Tools like npm audit, pip-audit, and Dependabot flag known vulnerable packages. They don't catch newly-registered malicious packages, but they add a layer of verification.

Sandbox untrusted installations. If a package must be tested before trusting it, install it in an isolated environment. Containers, virtual machines, or ephemeral cloud environments limit the blast radius if the package contains malicious code. Note that a Python virtual environment isolates only the dependency tree, not execution: install-time scripts still run with your user's privileges.

# Create an isolated Python environment for testing unknown packages
# (a venv contains dependency clutter, not malicious code; prefer a
# container or VM when the package itself is untrusted)
python -m venv ./untrusted-test
source ./untrusted-test/bin/activate
pip install suspicious-package-name
# Test in isolation, then discard the environment
deactivate
rm -rf ./untrusted-test

The repeatability problem

Confabulation might seem random, a one-off generation mistake. Research disproves this assumption.

When the same prompt goes to the same model repeatedly, confabulated package names recur with high consistency. The 58% recurrence rate means that most confabulations are stable patterns in the model's learned associations, not random noise.

This repeatability makes slopsquatting reliable. An attacker doesn't need luck. They identify prompts that produce specific confabulated names, register those names, and wait. Any developer using similar prompts will generate the same confabulated dependency and potentially install the attacker's package.

Repeatability also means that once a confabulation is identified, expect it again. If a team's workflow involves similar prompts across developers, the same fabricated packages will appear. Documenting known confabulations in team knowledge bases prevents repeated investigation.
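
A team knowledge base of known confabulations can be as simple as a shared denylist consulted before any install command. This sketch is illustrative; the entries are the fabricated names used as examples on this page.

```python
# Hypothetical shared denylist of package names a team has already
# confirmed to be confabulated (entries from this page's examples)
KNOWN_CONFABULATIONS = {
    "pypi": {"fastapi-cache-toolkit"},
    "npm": {"react-form-utils"},
}

def is_known_confabulation(name: str, registry: str) -> bool:
    """Check a suggested dependency against the team's denylist."""
    return name.lower() in KNOWN_CONFABULATIONS.get(registry, set())
```

Because confabulations recur, a single documented entry saves every teammate who later receives the same suggestion from repeating the investigation.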

When existing packages are wrong

Package confabulation, inventing packages that don't exist, is one failure mode. But agents also suggest packages that exist but don't fit.

Wrong package, similar name. The agent generates requests-cache when the task requires cachecontrol. Both exist. Both do caching. One fits the architecture better. The agent pattern-matched to a plausible name without evaluating fit.

Outdated or deprecated packages. Training data includes packages that have since been deprecated. The agent suggests urllib2 (Python 2) instead of urllib.request (Python 3). The package exists in some form but shouldn't be used.

Typosquatting victims. The agent generates a name close to a real package. If a malicious typosquat exists (reqeusts instead of requests), the misspelling becomes a vector. This isn't confabulation; the agent generated a real (malicious) package name by accident.

All three cases require the same discipline: verify before install. Check that the suggested package is the right package, not just a package.

Integrating verification into workflow

Dependency verification belongs in the same review checklist as code logic and security. The previous pages established review patterns. Add dependency verification as a mandatory check.

For individual developers:

  • Never run pip install or npm install directly from agent suggestions without registry verification
  • Copy package names, search registries, confirm existence
  • For unfamiliar packages, read the package description and check download counts

For code review:

  • Treat new dependencies in agent-generated code as higher-scrutiny items
  • Require justification for packages not in the project's existing dependency tree
  • Verify packages exist and are maintained before approving

For team standards:

  • Document approved packages for common use cases
  • Create templates that reference verified packages instead of asking agents to suggest them
  • Consider allowlists for dependency additions in CI pipelines
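
An allowlist gate in CI can be sketched as a small check that fails the build when a requirements file names an unreviewed dependency. This is a simplified assumption of how such a gate might work; it does not parse the full requirements syntax (extras, markers, URLs).

```python
import re

def unapproved_packages(requirements_txt: str, allowlist: set) -> list:
    """Return requirement names not covered by the team's reviewed
    allowlist (a hypothetical CI gate, simplified parsing)."""
    unapproved = []
    for line in requirements_txt.splitlines():
        line = line.split("#")[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip options like -r, --hash
            continue
        # Take the bare name ahead of extras or version specifiers
        match = re.match(r"[A-Za-z0-9._-]+", line)
        if match and match.group(0).lower() not in allowlist:
            unapproved.append(match.group(0).lower())
    return unapproved
```

Wired into CI, a non-empty result blocks the merge until a human verifies the new package and adds it to the allowlist.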

The 19.7% confabulation rate makes this non-optional. One in five agent-suggested packages doesn't exist. Verification catches confabulations before they reach install commands, stopping both wasted time and potential supply chain attacks.
