Why Do LLMs Write Code More Easily Than They Modify It?

Olivier Vitrac, PhD, HDR — Adservio Innovation Lab, November 2025



Abstract

Large Language Models (LLMs) demonstrate a clear asymmetry between generation and modification tasks. They can generate code fluently from concise specifications, yet they struggle to revise or refactor large, structured codebases. This limitation is not merely practical — it is theoretical: editing involves higher information entropy and conditional complexity than writing from scratch.

In short, writing is a linear act of construction, whereas editing is a branched act of reconstruction. It requires maintaining the coherence of dependencies, names, and states — comparable to reweaving a Turing machine’s tape rather than writing it anew.

In simpler words

LLMs are brilliant architects but clumsy electricians: they design clean new systems from short briefs but struggle to rewire existing ones without tripping over their dependencies.

This asymmetry stems from information entropy and cognitive load, not raw computational power. It reflects a fundamental constraint rooted in computation theory and verified experimentally across recent benchmarks.


1. From a coding experience to complexity theory

💡 NOTE: Two formal measures of complexity are used in information theory: entropy and conditional Kolmogorov complexity. Both are defined and illustrated in Appendices A and B. Before discussing them, it is useful to see the problem.

Let us consider two very small programs: one extended sequentially, the other edited internally. Both end up producing the same visible effect, yet their token-level complexity for an LLM is drastically different.


1.1 Minimal illustration: extension vs. revision

Case A – Extension (linear writing)

Let the original file contain m = 1 line, to which we append n = 2 lines. Tokenization (GPT-2-style, approximate):

| Line | Code fragment | Tokens |
|------|---------------|--------|
| 1 | `print("Hello, world!")` | 5 |
| 2 | `print("Welcome to Adservio Lab.")` | 7 |
| 3 | `print("Enjoy your day.")` | 5 |
|   | Total | 17 |

The model performs pure linear generation: each new token follows the previous one with minimal uncertainty. Entropy is dominated by local lexical choices, and positional encoding is monotonic. Formally, H(P') ≈ n · H(X).


Case B – Revision (contextual editing)

Here, the final output is similar (greetings), but the operation is an edit of Program A: it introduces a loop, branching, and state variables.

Approximate tokenization:

| Code fragment | Tokens | Contextual links |
|---------------|--------|------------------|
| `names = ["Alice", "Bob", "Charlie"]` | 9 | introduces variable `names` |
| `for name in names:` | 6 | depends on `names` |
| `if name.startswith("A"):` | 8 | adds conditional branch |
| `print(f"Hello, {name}!")` | 9 | depends on branch variable |
| `else:` | 1 | contextual token |
| `print("Welcome to Adservio Lab.")` | 7 | reused literal |
| Total | ≈ 40 | multiple cross-dependencies |

Although the visible code only doubled, the effective token count more than doubles, and several tokens now carry contextual meaning (variable scopes, conditions, indentation, string reuse). Each of these relationships must be re-evaluated by the model, inflating the conditional entropy H(P'|P) and, equivalently, the conditional complexity K(P'|P).
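These counts can be reproduced approximately with an off-the-shelf tokenizer. The sketch below assumes the tiktoken library and its GPT-2 encoding; exact counts vary from one tokenizer to another.

import tiktoken  # assumed available; any BPE tokenizer gives similar orders of magnitude

enc = tiktoken.get_encoding("gpt2")

program_a = (
    'print("Hello, world!")\n'
    'print("Welcome to Adservio Lab.")\n'
    'print("Enjoy your day.")\n'
)

program_b = (
    'names = ["Alice", "Bob", "Charlie"]\n'
    'for name in names:\n'
    '    if name.startswith("A"):\n'
    '        print(f"Hello, {name}!")\n'
    '    else:\n'
    '        print("Welcome to Adservio Lab.")\n'
)

print(len(enc.encode(program_a)))  # extension: on the order of 20 tokens
print(len(enc.encode(program_b)))  # revision: roughly twice as many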


1.2 Immediate observation

Appending n lines to an m-line codebase mainly increases the lexical sequence length. Editing n lines inside an m-line codebase forces the model to re-interpret all tokens that might depend on the modified region. Hence, although fewer characters are produced, more information is processed.

(1)  K(P'|spec) ≈ n,  whereas  K(P'|P) ≈ n · log(dependencies),

where n is the number of lines written or edited and "dependencies" counts the tokens whose interpretation depends on the modified region.
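As a back-of-envelope illustration of Eq. (1), the sketch below compares the two cost models; the 6-tokens-per-line constant and the logarithmic coupling term are assumptions for illustration, not measured values.

import math

TOKENS_PER_LINE = 6  # assumed average; real values depend on code style

def cost_generate(n_lines: int) -> float:
    """K(P'|spec): cost grows linearly with the number of new lines."""
    return n_lines * TOKENS_PER_LINE

def cost_edit(n_lines: int, dependencies: int) -> float:
    """K(P'|P): each edited line also pays to re-encode its dependencies."""
    return n_lines * TOKENS_PER_LINE * math.log2(1 + dependencies)

print(cost_generate(2))   # extension of 2 lines -> 12 token-equivalents
print(cost_edit(2, 30))   # revision with 30 coupled tokens -> ~59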


1.3 From illustration to general principles

Figure 1 contrasts the cognitive and computational asymmetry between writing a few lines of code and revising those same lines within a complex environment. The discrepancy arises from the extra information — i.e., additional tokens — required to describe what must be changed, where, and how dependencies are preserved.

Linear generation (extension): Spec (short) → Write new lines (low entropy) → Output P' (≈ n tokens).

Contextual editing (revision): Existing code P (m lines) → Identify targets (≈ m tokens) → Apply edits (≈ n tokens) → Re-encode dependencies (≈ n · log m tokens) → Output P' (revised).

Figure 1 — Entropy in code generation vs. modification

LLMs consume more tokens to maintain coherence than to produce text. Editing forces them to recompute positional, syntactic, and semantic dependencies—an operation that scales faster than the visible diff.

The remainder of this note generalizes this observation. Sections 2–4 and the appendices formalize it using information entropy and conditional Kolmogorov complexity, providing a quantitative basis—and thermodynamic analogy—for the energetic cost of reasoning during code modification.


1.4 What complexity theory tells us

When an LLM writes from scratch, it generates P directly from its specification, and the informational cost is the generation complexity K(P|spec). When it edits existing code, the model must transform P into P', which involves the conditional complexity K(P'|P): how much new information must be injected while preserving all prior constraints.

In simple terms:

Generation = write everything anew → linear reasoning.
Editing = modify while preserving coherence → contextual reasoning.

  1. Information-theoretic asymmetry. The informational cost of editing is captured by the conditional Kolmogorov complexity K(P'|P), the minimal description required to transform an existing program P into its revised version P'. When edits propagate through shared dependencies or hidden registers, K(P'|P) increases sharply, whereas the generation complexity K(P'|spec) (from a blank specification) remains low.

    Editing is nonlinear: its cost scales with the entropy of the dependency graph, not merely with the size of the change.

    Figure 2 summarizes this behavior:

    • As dependency density increases, the minimal description length K(P'|P) grows linearly or faster.

    • The corresponding token cost (context + generation) grows in parallel.

    • Editing reliability decreases roughly inversely with conditional complexity as attention and memory saturate.

  2. Time–memory duality. In transformer architectures, reasoning cost increases with context length (O(n²) attention). Maintaining long-range dependencies across files, classes, or configurations demands both more tokens and more memory — echoing the classical time–space trade-off formalized by Hartmanis & Stearns (1965).

  3. Software-evolution entropy. Empirical studies show that source-code entropy spikes during major refactors or architectural shifts. These are precisely the conditions under which LLMs falter: high entropy yields unpredictable propagation of changes and reduced determinism in dependency resolution.

[Figure 2: plot — Growth of conditional Kolmogorov complexity with coupling; x-axis: repository coupling / dependency density (Low → Very High); y-axis: relative values 0.1–1.2.]

Figure 2 — Evolution of relative conditional Kolmogorov complexity K(P'|P) (orange), token cost (blue), and editing reliability (green) as repository coupling and dependency density increase.
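The dependency-propagation argument behind Figure 2 can be made concrete with a toy dependency graph. A minimal sketch, assuming the networkx library; the module names and edges are invented for illustration.

import networkx as nx

# Edges point from a definition to the code that depends on it.
G = nx.DiGraph([
    ("config", "db"), ("config", "api"),
    ("db", "models"), ("models", "api"),
    ("api", "tests"), ("models", "tests"),
])

def must_recheck(module: str) -> set:
    """Everything downstream of an edited module must be re-validated."""
    return nx.descendants(G, module)

print(must_recheck("config"))  # editing a hub: {'db', 'models', 'api', 'tests'}
print(must_recheck("api"))     # editing a near-leaf module: {'tests'}

Editing a hub module forces every downstream dependent to be re-validated, which is exactly the coupling-driven growth of K(P'|P) plotted above.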

2. What experiments show (empirical evidence made simple)

The theoretical arguments can be tested in practice. Over the last two years, several research teams have built benchmarks—large collections of real programming tasks—to measure how well LLMs perform when generating, debugging, or editing code. The table below summarizes five of the most widely cited ones.

| Benchmark | What it tests | Key finding |
|-----------|---------------|-------------|
| HumanEval | Small, isolated coding tasks such as "write a function that reverses a string." | LLMs perform very well (above 85% correct for GPT-3.5/4). These tasks require no external context. |
| SWE-bench | Real-world bug fixes drawn from open-source repositories. | Success rates drop below 25%, even for top models. Once multiple files and dependencies are involved, reasoning collapses. |
| RepoBench (ICLR 2024) | Understanding and editing entire repositories rather than single files. | Performance decreases sharply with project size and cross-file links. |
| CodePlan (TOSEM 2024) | Planning multi-step code edits (understanding, proposing, modifying, and verifying). | Models must "think in steps." Without planning or memory, they get lost mid-edit. |
| Lost in the Middle (TACL 2024) | How well models use very long contexts (tens of thousands of tokens). | Models tend to ignore information located in the middle of long inputs—critical for editing long codebases. |

In plain terms: Models are great when they can focus on a single self-contained problem (a function, a paragraph, an equation). They struggle as soon as they must reason about interconnections—the very fabric of software engineering.

These results empirically confirm what complexity theory predicts:

The more intertwined the context, the more information must be recalled, recomputed, and rewritten—raising both entropy and computational cost.


3. Conceptual analogy (why it feels harder)

The difference between writing and editing large codebases can be understood through analogies that bridge computer science, physics, and engineering.

| Perspective | Linear generation | Repository-level editing |
|-------------|-------------------|--------------------------|
| Turing machine | Writing tape sequentially—each symbol depends only on the previous one. | Rewriting linked cells while preserving state—one change ripples through the whole tape. |
| Transformer attention | Sparse and local: focus on a few relevant tokens. | Dense and global: attention must cover many tokens and dependencies at once. |
| Information flow | Low entropy—information flows in one direction. | High conditional entropy H(P'∣P)—information must be preserved and recombined. |
| Engineering metaphor | Drafting a clean new blueprint. | Rewiring an entire factory while it's still running. |

Reading this table

Each row is a way of describing the same asymmetry from a different angle.

From a thermodynamic perspective, editing is like maintaining order in a system full of moving parts: it requires energy just to avoid chaos.


4. Practical implications for AI-driven development

1. Strategic level — Favor modular regeneration

Instead of asking an LLM to rewrite existing code line by line, it is often more efficient to generate a clean replacement module from its specification and re-integrate it. This keeps the conditional complexity K(P'|P) low and avoids dependency explosions.
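A minimal sketch of this strategy, assuming a pytest test suite is available; generate_module stands in for any LLM code-generation call and is hypothetical.

import subprocess
from pathlib import Path

def generate_module(spec: str) -> str:
    """Hypothetical LLM call: writes a fresh module from its specification."""
    raise NotImplementedError

def regenerate(spec: str, target: Path) -> bool:
    """Replace a module wholesale instead of patching it line by line."""
    backup = target.read_text()
    target.write_text(generate_module(spec))        # regenerate from spec
    ok = subprocess.run(["pytest", "-q"]).returncode == 0
    if not ok:
        target.write_text(backup)                   # revert: the old module stays intact
    return ok

The swap is all-or-nothing: either the regenerated module passes the global test suite, or the original is restored untouched.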

2. Architectural level — Control the context

Enhance editing workflows with tooling that limits how much context the model must hold at once.

3. Operational level — Plan, then edit, then validate

Adopt a three-step loop:

  1. Plan — identify what must change and which files are involved.

  2. Edit locally — apply minimal, well-scoped changes.

  3. Validate globally — run tests or consistency checks.

This sequence prevents entropy from spreading through the system—exactly as a good thermodynamic process prevents heat loss.
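The loop can be written down directly. In the sketch below, plan_edits and apply_edit are hypothetical placeholders for the model-driven steps; validation assumes a pytest suite.

import subprocess

def plan_edits(goal: str) -> list:
    """Hypothetical planner: returns the files/regions that must change."""
    return []

def apply_edit(target: str) -> None:
    """Hypothetical local edit: a minimal, well-scoped change to one target."""

def edit_cycle(goal: str) -> None:
    for target in plan_edits(goal):                 # 1. Plan
        apply_edit(target)                          # 2. Edit locally
        passed = subprocess.run(["pytest", "-q"]).returncode == 0
        if not passed:                              # 3. Validate globally
            raise RuntimeError(f"Edit to {target} broke the build; revert it.")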

Closing insight

All these principles exploit the same asymmetry:

Generation lowers entropy by building from first principles; editing increases entropy by juggling dependencies.

Minimizing the number and scope of edits is therefore not only good software practice—it is good energy practice as well.


5. Conclusion and takeaways

Conclusion — Complexity-Conservation Principle

Every additional token has a cost — not only in computation, but in energy. When an LLM edits existing code, it must re-evaluate dependencies, positions, and states: this increases the conditional complexity K(P'|P) and consumes disproportionately more resources than simple linear generation. In information-theoretic terms, added entropy becomes extra work; in thermodynamic terms, repeated unnecessary edits accumulate as heat and emissions.

At scale, millions of avoidable “just one more edit” requests translate into significant power usage. Editing is therefore not only a matter of correctness or productivity — it is also a matter of computational and environmental responsibility.

“Ask only when entropy deserves it.”

Complexity–Conservation Rule (entropy-aware editing)

# Before requesting or applying an edit, estimate the change in conditional
# complexity per edited token: delta_K = K(P'|P)_after - K(P'|P)_before.
# (reject_or_rethink_edit and perform_edit are the project's own decision hooks.)
def entropy_aware_edit(delta_K_per_token: float) -> None:
    if delta_K_per_token > 0:
        reject_or_rethink_edit()   # edit increases global coupling / dependencies
    else:
        perform_edit()             # edit simplifies, modularizes, or localizes effects

Or in natural language, suitable for both humans and LLMs:
“Does this edit reduce dependencies, or just shift them?
Will it increase K(P'|P) or reduce it?”

In other words, we cannot enforce strict conservation — the second law of thermodynamics still holds — but we can enforce a design bias: only edits that lower conditional complexity should be favored and automated. Anything else is not only bad engineering; it is wasted entropy on the planetary budget.


References

  1. Hartmanis J., Stearns R. E. (1965). On the Computational Complexity of Algorithms. Trans. AMS, 117, 285–306. doi:10.2307/1994208

  2. Vitányi P. M. B. (2022). Information, Complexity, and Meaning. Springer.

  3. Jain S. et al. (2023). SWE-bench: Can LLMs Fix Real Bugs? NeurIPS 2023.

  4. Zhu Z. et al. (2024). RepoBench: A Repository-Level Benchmark for LLMs. ICLR 2024.

  5. Bairi R. et al. (2023). CodePlan: Repository-Level Code Editing with LLMs. ACM TOSEM.

  6. Liu N. et al. (2024). Lost in the Middle: LLMs Struggle with Long Contexts. TACL.

  7. Torres R. et al. (2022). Entropy of Source Code as a Predictor of Software Evolution. Empirical Software Engineering, 27, 45.

  8. Dao T. et al. (2023). FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. arXiv:2205.14135.




Appendix A. Entropy and its meanings

A.1 Definition

In information theory, the entropy of a random variable X represents the minimal number of bits required—on average—to describe one outcome of X:

(2)  H(X) = −Σ_i p_i log₂ p_i   [bits]

If X denotes the next token to generate, p_i is the model's predicted probability of token i. The expected token cost is then:

(3)  Bits per token = H(X)

and, since most tokenizers encode roughly 1 token ≈ 4 characters, we can approximate the information load of a sequence of n tokens as:

(4)  Information load ≈ n · H(X) bits.
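Equations (2)–(4) can be checked numerically. A minimal sketch; the two next-token distributions are invented for illustration.

import math

def entropy(p):
    """Shannon entropy in bits: H(X) = -sum_i p_i log2 p_i (Eq. 2)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

confident = [0.90, 0.05, 0.03, 0.02]  # model nearly sure of the next token
uncertain = [0.25, 0.25, 0.25, 0.25]  # four equally plausible continuations

print(entropy(confident))        # ~0.62 bits/token: cheap, linear generation
print(entropy(uncertain))        # 2.00 bits/token: costly, branched editing
print(100 * entropy(uncertain))  # Eq. (4): load of 100 such tokens, in bits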

When the model must reason about positions and dependencies (cross-file references, scopes, imports), the entropy grows because each token depends on a larger conditional context. This corresponds to conditional entropy:

(5)  H(P'|P) = H(P', P) − H(P),

which quantifies the extra bits—or extra tokens—needed to describe a modified program P' given the existing one P. As dependencies increase, H(P'|P) rises super-linearly: more tokens are consumed merely to re-express known structure and maintain positional consistency.


A.2 Concrete example with text and token counts

Consider the English word sequence:

“The cat sleeps.”

Using a standard tokenizer (e.g., GPT-2 BPE), it is 4 tokens: ["The", " cat", " sleeps", "."]

If we want to generate this sentence from scratch, the information load is roughly 4 tokens × 6 bits/token ≈ 24 bits (taking ≈ 6 bits of decision entropy per token, consistent with the totals below).

Now imagine an edit requiring insertion of an adjective (“black”) between The and cat, plus agreement on verb tense:

“The black cat was sleeping.”

The model must:

  1. Insert two tokens (" black", " was") in correct order → additional content entropy ≈ 12 bits.

  2. Re-encode all subsequent tokens with updated positions → each positional vector (≈ 8 bytes/token) must be recomputed.

  3. Propagate tense consistency (“sleeps” → “was sleeping”) → another 8 bits of conditional decision entropy.

Thus, even for this tiny edit, total information load nearly doubles (from 24 → ~44 bits) and positional recomputation affects all following tokens. In long documents—or codebases with hundreds of linked identifiers—the same proportional inflation applies: entropy compounds with the number of tokens that must stay coherent.
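The same accounting in code, assuming the ≈ 6 bits of decision entropy per token used above (a rough average, not a measured constant):

BITS_PER_TOKEN = 6                    # assumed average decision entropy per token

original = 4 * BITS_PER_TOKEN         # "The cat sleeps."  -> 24 bits
inserted = 2 * BITS_PER_TOKEN         # " black", " was"   -> +12 bits
tense = 8                             # sleeps -> sleeping: conditional decision
print(original, original + inserted + tense)  # 24 -> 44 bits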


A.3 Common-sense interpretation

Entropy measures how many questions must be answered to make something unambiguous.

In human terms, entropy is the cognitive and computational load of preserving order while changing detail.


| Situation | Typical tokens processed | Effective entropy | Computational implication |
|-----------|--------------------------|-------------------|---------------------------|
| Generate new file from prompt | 200–800 | Low (few dependencies) | Fast, cheap inference |
| Edit function with cross-refs | 2,000–8,000 | Moderate | Quadratic attention cost |
| Refactor multi-module repo | 20,000+ | High H(P'∣P) | Very expensive and slow |

Hence, entropy translates directly into token count × bits per token, which drives the energy, time, and memory required by the model. Large or branched edits therefore consume far more computational entropy than linear code generation.


Appendix B. Conditional Kolmogorov Complexity K(P'|P) and its practical meaning

B.1 Formal definition

Kolmogorov complexity K(X) is the length (in bits) of the shortest possible program that outputs X on a universal Turing machine. The conditional Kolmogorov complexity of P' given P is defined as:

(6)  K(P'|P) = min_π { |π| : U(π, P) = P' },

where U is a universal computer (or model), and |π| is the size of the minimal description π—essentially the edit program that transforms P into P'.

Intuitively:

If the modification touches deeply coupled regions, K(P'|P) can approach K(P') (it cannot exceed it by more than an additive constant); that is, editing may be as hard as rewriting.
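K(P'|P) is uncomputable in general, but compression gives a practical upper bound: the extra compressed bytes needed to describe P' once P is known, the idea behind normalized compression distance. A minimal sketch with the standard zlib module; absolute numbers depend on the compressor and the toy snippet is invented.

import zlib

def c(s: str) -> int:
    """Compressed size in bytes, a crude stand-in for Kolmogorov complexity."""
    return len(zlib.compress(s.encode()))

def conditional_cost(p: str, p_new: str) -> int:
    """Approximates K(P'|P) as C(P + P') - C(P)."""
    return c(p + p_new) - c(p)

p = "def area(r):\n    return 3.14159 * r * r\n"
local = p.replace("3.14159", "3.14159265")   # isolated constant change
rename = p.replace("area", "disk_area")      # identifier change that must propagate

print(conditional_cost(p, local), conditional_cost(p, rename))

On a real repository, the rename variant grows with every file that references the identifier, while the constant change stays flat.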


B.2 Relationship with entropy and tokens

Shannon entropy and Kolmogorov complexity coincide in expectation for computable sources:

(7)  E[K(P)] ≈ H(P).

Thus, we can interpret K(P'|P) as the expected number of bits (or tokens) the model must process to make all dependent edits consistent. When a codebase is large, the minimal "edit program" grows because:

  1. Dependencies must be re-specified explicitly in text (imports, signatures, docstrings).

  2. Positional changes cascade—adding one symbol often forces dozens of updated references.

  3. The model must check consistency (syntactic and semantic), implying extra reasoning tokens.

Hence, K(P'|P) directly scales with the token budget required for context + generation + validation.


B.3 Example: small vs. coupled edit

Consider two tasks:

  1. Local edit: change a numerical constant

    • Minimal description length: “replace 0.95 → 0.97” (≈ 3 tokens).

    • K(P'|P) is constant, almost independent of file size.

  2. Coupled refactor: rename a class and propagate it

    • All calls (UserSession()), docstrings, imports, tests, and configuration keys must change.

    • For a 50 k-token repository, the LLM must locate and regenerate 500–2000 token spans with consistent semantics.

    • K(P'|P) ≈ O(affected tokens × log dependencies) → often thousands of tokens.

Thus, while K(P) (writing a simple class) may be ~200 tokens, K(P'|P) (editing across dependencies) can exceed 2,000 tokens—an order of magnitude more information.
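The contrast in B.3 can be measured directly on a toy file with the standard difflib module; the class and the counts are illustrative only.

import difflib

old = """class UserSession:
    TIMEOUT = 0.95

session = UserSession()
print(UserSession.TIMEOUT)
"""

local = old.replace("0.95", "0.97")                    # task 1: local constant edit
coupled = old.replace("UserSession", "ClientSession")  # task 2: coupled rename

def diff_lines(a: str, b: str) -> int:
    """Length of the unified diff ~ size of the minimal edit description."""
    return len(list(difflib.unified_diff(a.splitlines(), b.splitlines())))

print(diff_lines(old, local))    # small: one changed line
print(diff_lines(old, coupled))  # larger: every reference changes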


B.4 Operational analogy for LLMs

| Operation | Description | Approx. computational complexity |
|-----------|-------------|----------------------------------|
| Generate a new file from prompt | The model writes code directly from a specification; there are few prior constraints or dependencies. | K(P'∣spec): O(1) per token, i.e., linear growth with sequence length only. |
| Edit a single isolated function | The model must reason locally within a bounded scope and preserve syntax; minimal propagation of side effects. | K(P'∣P) ≈ O(local scope): modest increase with the number of dependent tokens. |
| Refactor interdependent modules | The edit touches multiple files, type hierarchies, or APIs; requires cross-module reasoning and positional re-encoding. | K(P'∣P) ≈ O(n log n): super-linear growth with repository size and coupling. |
| System-wide migration (e.g., API version bump) | The entire repository must remain consistent; imports, configs, and tests are rewritten coherently. | K(P'∣P) → K(P'): editing cost approaches that of rewriting from scratch. |

This progression formalizes why LLMs lose efficiency when editing complex systems: the edit description itself becomes as large as the new code.


B.5 Common-sense interpretation

Kolmogorov complexity measures the compressibility of a transformation. If a small, elegant "diff" suffices to move from P to P', the system is modular and predictable (low K(P'|P)). If every edit triggers ripple effects, the system is entangled (high K(P'|P)).

For LLMs, high K(P'|P) means larger prompts, more regenerated tokens, and lower editing reliability.


B.6 Linking back to Appendix A

Entropy H(P'|P) measures the uncertainty of the change, while K(P'|P) measures the shortest possible message describing it. Both are expressed in bits and, when mapped through a tokenizer, in expected token counts. The scaling relation is:

(8)  Expected tokens for edit ≈ K(P'|P) / (bits per token).

Therefore, high conditional complexity directly translates into larger prompts, longer inference times, and increased cost—precisely the symptoms observed when LLMs attempt large-scale code modifications.


Adservio Innovation Lab – November 2025