Applied Intelligence
Module 7: Data Privacy and Compliance

What Data Leaves Your Environment

The data transmission question

Enterprise developers adopting AI coding tools face a practical question: what exactly leaves the development environment when these tools run? The answer depends on the tool, how it is configured, and how it is used. Before any security or compliance conversation can be productive, this needs to be clear.

This module covers enterprise concerns around AI coding tools: security, intellectual property, and regulatory compliance. The place to start is data flow. Misconceptions here cause problems in both directions. Some teams refuse adoption based on imagined risks. Others proceed without understanding genuine exposure.

The transmission model

AI coding tools send data in three categories: inference content, operational metadata, and telemetry. Each has different security implications.

Inference content is everything the model needs to respond: your prompts, files the tool reads, conversation history, and project structure. This is the core data flow. Without it, the tool cannot do anything useful.

Operational metadata includes file paths, directory structures, and usage patterns that help the tool function. Some tools bundle this with inference requests; others separate it.

Telemetry data includes performance metrics, error logs, and usage analytics for the provider. This is typically optional and can be turned off.

Claude Code data transmission

Claude Code is a CLI that talks to Anthropic's servers for all inference. No local model exists. Every interaction requires network communication.

When Claude Code reads a file, the full file contents go to Anthropic's servers. When it searches with Glob or Grep, matching file paths and relevant content are sent. Conversation history accumulates in the session and travels with each request.

What Claude Code transmits:

  • All user prompts and model responses
  • Complete contents of files Claude reads (not your entire codebase, only files it accesses)
  • File names, directory structures, and project organization
  • Conversation history within the active session

What stays local:

  • Files Claude Code does not read
  • Databases, external APIs, and running applications (unless you share them explicitly)
  • System configuration and installed packages (unless you share them)
  • Git history and untracked files (unless Claude accesses them)

The boundary is explicit: if Claude Code's tools do not touch something, it does not leave. This is different from tools that index your entire codebase upfront.

Cloud execution mode (Claude Code on claude.ai/code) clones your repository to an Anthropic-managed VM. In this mode, your codebase does leave your environment and resides temporarily in Anthropic's infrastructure. GitHub credentials pass through a secure proxy and never enter the sandbox directly. All outbound traffic routes through a security proxy for audit logging.

Encryption and storage

Data in transit uses TLS encryption. Data at rest on Anthropic servers uses provider-managed encryption, not customer-managed keys by default. Zero Data Retention configurations are available for enterprises requiring additional controls.

Codex data transmission

OpenAI's Codex CLI has two operating modes with different transmission characteristics.

Local CLI mode keeps your source files on your machine. Only prompts and context snippets go to OpenAI's API. When Codex reads files to understand context, those excerpts travel to the API, but your complete project does not.

Cloud mode (codex cloud exec) uploads your repository to OpenAI's isolated containers. The codebase leaves your environment. Tasks run in sandboxed environments with a copy of your code. This mode generates pull requests for review and requires GitHub cloud-hosted repositories.

For compliance purposes: local CLI mode can satisfy data residency requirements that cloud mode cannot.

Sandbox architecture:

Codex uses OS-level sandboxing on local machines:

  • macOS: Seatbelt policies via sandbox-exec
  • Linux: Landlock + seccomp
  • Windows: WSL (preferred) or an experimental native sandbox

The default sandbox restricts access to the current working directory and /tmp. Network access is disabled by default during execution. The agent cannot make outbound connections unless you explicitly enable them. This prevents scenarios where generated code might attempt to exfiltrate your data.

GitHub Copilot data transmission

GitHub Copilot transmits different data depending on the feature.

For code completions:

  • Content in the file being edited
  • Neighboring or related files in the project
  • Repository URLs and file paths
  • Lines before and after the cursor position

For Copilot Chat:

  • Highlighted or selected code
  • Previous questions and responses in the conversation
  • Context from open files

Suggestions for code completions are discarded once returned to your IDE. They are not stored server-side for Business and Enterprise plans. However, prompts from web, mobile, and CLI interfaces are retained for 28 days even on enterprise plans.

Training data distinction:

GitHub explicitly states that Business and Enterprise data is not used to train models. This extends to the third-party model providers (Anthropic, Google, OpenAI) that power Copilot's various features. Individual and Pro users have an opt-in training setting, which is currently disabled by default.

What telemetry sends

Beyond inference content, these tools collect operational metrics. For security-conscious environments, understanding telemetry and how to disable it matters.

Claude Code telemetry:

  • Statsig (operational metrics): latency, reliability, and usage patterns; no code or file paths. Opt out with DISABLE_TELEMETRY=1.
  • Sentry (error logging): operational errors, encrypted at rest. Opt out with DISABLE_ERROR_REPORTING=1.
  • Bug reports (/bug command): full conversation history, including code; retained for 5 years. Opt out with DISABLE_BUG_COMMAND=1.
  • All non-essential traffic: combined opt-out for all of the above with CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1.
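Because these switches are ordinary environment variables, the opt-out can be scripted. A minimal sketch for a shell profile, using the variable names documented above:

```shell
# Disable each category of Claude Code's non-essential traffic.
export DISABLE_TELEMETRY=1          # Statsig operational metrics
export DISABLE_ERROR_REPORTING=1    # Sentry error logging
export DISABLE_BUG_COMMAND=1        # /bug command submissions

# Or use the single combined switch for all of the above:
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
```

Placing these lines in a managed shell profile (or a machine image) makes the opt-out the default for every session rather than a per-developer choice.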

Users connecting via Amazon Bedrock or Google Vertex AI have non-essential traffic disabled by default.

Codex telemetry:

Anonymous usage and health data collection is enabled by default. It covers feature usage patterns, configuration options, and model performance indicators, but not personally identifiable information. Disable it by setting enabled = false under [analytics] in ~/.codex/config.toml.
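A sketch of scripting that opt-out, assuming a POSIX shell and that the config file does not already contain an [analytics] table (if it does, edit that table instead of appending a duplicate):

```shell
# Write the Codex telemetry opt-out to ~/.codex/config.toml.
mkdir -p "$HOME/.codex"
cat >> "$HOME/.codex/config.toml" <<'EOF'
[analytics]
enabled = false
EOF
```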

Copilot telemetry:

User engagement data (pseudonymous identifiers, accepted/dismissed completions, error messages) is retained for 2 years. Feedback data (thumbs up/down) is retained as long as needed. These cannot be fully disabled, though the data does not include code content.

The file reading behavior

There is a real difference between tools that proactively index your codebase and those that read files on demand.

Claude Code and Codex read files reactively. They access files when their internal logic determines context is needed, or when you explicitly reference a file. Your 10,000-file monorepo does not automatically upload to provider servers. Only files the agent actually opens get transmitted.

This matters for compliance. The question is not "can this tool access my proprietary code?" but "what triggers file access, and which files get accessed?"

For Claude Code, you can observe what files are read during a session. The tool logs its actions, making the transmission boundary visible. For security-sensitive work, this observability allows verification of what left your environment.

Implications for enterprise adoption

These transmission patterns inform policy decisions:

Data classification should account for the difference between entire-codebase access and on-demand file reading. A blanket prohibition on "sending code to AI services" may be too broad. A nuanced policy might permit AI tool use on lower-classification codebases while restricting use on crown-jewel intellectual property.

Network controls can limit which AI services developers access. If only Claude Code is approved, DNS or proxy rules can block Codex endpoints. If cloud modes are prohibited, only local CLI access might be permitted.
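As one illustration of such a control, the sketch below generates a hosts-style sinkhole list. The hostname shown (api.openai.com) is an assumption about the endpoint Codex contacts, not a verified inventory; a real deployment would confirm the current endpoint list and enforce the block at the DNS resolver or egress proxy rather than per machine.

```shell
# Illustrative sketch: build a hosts(5)-format sinkhole list for
# unapproved AI endpoints. Hostname is an assumed example; verify the
# actual hosts your tools contact before relying on a list like this.
BLOCKLIST=./blocked-ai-endpoints.txt
{
  echo "# AI endpoints blocked by policy"
  echo "0.0.0.0 api.openai.com"
} > "$BLOCKLIST"
cat "$BLOCKLIST"
```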

Audit capability varies by tool. Claude Code sessions can be reviewed to see what files were accessed. Enterprise configurations may enable additional logging. Compliance teams should understand what audit trails exist before approving tools.

The next section examines local versus cloud processing options, including deployment models that keep data within your infrastructure.
