Olivier Vitrac, PhD, HDR — Adservio Innovation Lab, November 2025

Large Language Models (LLMs) demonstrate a clear asymmetry between generation and modification tasks. They can generate code fluently from concise specifications, yet they struggle to revise or refactor large, structured codebases. This limitation is not merely practical — it is theoretical: editing involves higher information entropy and conditional complexity than writing from scratch.
In short, writing is a linear act of construction, whereas editing is a branched act of reconstruction. It requires maintaining the coherence of dependencies, names, and states — comparable to reweaving a Turing machine’s tape rather than writing it anew.
– In simpler words –
LLMs are brilliant architects but clumsy electricians: they design clean new systems from short briefs but struggle to rewire existing ones without tripping over their dependencies.
This asymmetry stems from information entropy and cognitive load, not raw computational power. It reflects a fundamental constraint rooted in computation theory and verified experimentally across recent benchmarks.
💡 NOTE: Two formal measures of complexity are used in information theory: entropy and conditional Kolmogorov complexity. Both are defined and illustrated in Appendices A and B. Before discussing them, it is useful to see the problem.
Let us consider two very small programs: one extended sequentially, the other edited internally. Both end up producing the same visible effect, yet their token-level complexity for an LLM is drastically different.
```python
# Program A: sequential extension
print("Hello, world!")
print("Welcome to Adservio Lab.")
print("Enjoy your day.")
```

Let the original file contain m = 1 line, to which we append n = 2 lines. Tokenization (GPT-2-style, approximate):
| Line | Code fragment | Tokens |
|---|---|---|
| 1 | print("Hello, world!") | 5 |
| 2 | print("Welcome to Adservio Lab.") | 7 |
| 3 | print("Enjoy your day.") | 5 |
| Total |  | 17 |
The model performs pure linear generation: each new token follows the previous one with minimal uncertainty.
Entropy is dominated by local lexical choices, and positional encoding is monotonic.
Formally, the cost of pure extension grows only with the appended text,

$$T_{\text{extend}} \approx n \ \text{tokens},$$

independently of the size $m$ of the existing file.
```python
# Program B: internal revision
names = ["Alice", "Bob", "Charlie"]
for name in names:
    if name.startswith("A"):
        print(f"Hello, {name}!")
    else:
        print("Welcome to Adservio Lab.")
```

Here, the final output is similar (greetings), but the operation is an edit of Program A: it introduces a loop, branching, and state variables.
Approximate tokenization:
| Code fragment | Tokens | Contextual links |
|---|---|---|
names = ["Alice", "Bob", "Charlie"] | 9 | introduces variable names |
for name in names: | 6 | depends on names |
if name.startswith("A"): | 8 | adds conditional branch |
print(f"Hello, {name}!") | 9 | depends on branch variable |
else: | 1 | contextual token |
print("Welcome to Adservio Lab.") | 7 | reused literal |
| Total | ≈ 40 tokens | multiple cross-dependencies |
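A quick way to check such counts is to run a GPT-2-style BPE tokenizer over both programs. The sketch below uses the tiktoken package (an assumption; any equivalent tokenizer works), and the resulting counts may differ slightly from the approximations in the tables above.

```python
# Sketch: count GPT-2-style BPE tokens for Programs A and B.
# Assumes the `tiktoken` package is installed; exact counts may differ
# slightly from the approximations quoted in the tables.
import tiktoken

program_a = '''print("Hello, world!")
print("Welcome to Adservio Lab.")
print("Enjoy your day.")
'''

program_b = '''names = ["Alice", "Bob", "Charlie"]
for name in names:
    if name.startswith("A"):
        print(f"Hello, {name}!")
    else:
        print("Welcome to Adservio Lab.")
'''

enc = tiktoken.get_encoding("gpt2")
for label, src in [("Program A", program_a), ("Program B", program_b)]:
    print(label, "->", len(enc.encode(src)), "tokens")
```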
Although the visible code only doubles in size, the effective token count more than doubles, and several tokens now carry contextual meaning (variable scopes, conditions, indentation, string reuse). Each of these relationships must be re-evaluated by the model, inflating the conditional entropy $H(P' \mid P)$.
Appending n lines to an m-line codebase mainly increases the lexical sequence length. Editing n lines inside an m-line codebase forces the model to re-interpret all tokens that might depend on the modified region. Hence, although fewer characters are produced, more information is processed:

$$T_{\text{extend}} \approx n, \qquad T_{\text{edit}} \approx m + n + n\log m,$$

where $m$ is the number of tokens (or lines) already present and $n$ the number of tokens written or modified; the $n\log m$ term accounts for re-encoding the dependencies that cross the edited region (cf. Figure 1).
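A back-of-the-envelope comparison makes the asymmetry concrete. The sketch below plugs representative values into the approximation above; the constants and the log base are indicative only.

```python
# Sketch: rough token-budget comparison between appending n lines to an
# m-token file and editing n lines inside it, using the approximation above.
# Constants and log base are indicative only, not measured costs.
import math

def extend_cost(n: int) -> float:
    return n                               # write only the new tokens

def edit_cost(m: int, n: int) -> float:
    return m + n + n * math.log2(m)        # re-read, rewrite, re-encode dependencies

n = 200
for m in (1_000, 10_000, 100_000):
    print(f"m={m:>7}: extend ≈ {extend_cost(n):>6.0f} tokens, "
          f"edit ≈ {edit_cost(m, n):>8.0f} tokens")
```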
Figure 1 contrasts the cognitive and computational asymmetry between writing a few lines of code and revising those same lines within a complex environment. The discrepancy arises from the extra information — i.e., additional tokens — required to describe what must be changed, where, and how dependencies are preserved.
[Figure 1, schematic] Contextual editing (revision): existing code P (m lines) → identify targets (≈ m tokens) → apply edits (≈ n tokens) → re-encode dependencies (≈ n·log m tokens) → output P′ (revised). Linear generation (extension): short spec → write new lines (low entropy) → output P′ (≈ n tokens).
LLMs consume more tokens to maintain coherence than to produce text. Editing forces them to recompute positional, syntactic, and semantic dependencies—an operation that scales faster than the visible diff.
The remainder of this note generalizes this observation. Sections 2–4 and the appendices formalize it using information entropy and conditional Kolmogorov complexity, providing a quantitative basis—and thermodynamic analogy—for the energetic cost of reasoning during code modification.
When an LLM writes from scratch, it generates $P'$ directly from a short specification: the cost is essentially the length of the output, roughly $n$ tokens. When it edits, every new token must also be conditioned on the existing program $P$, so the whole context has to be read, attended to, and kept coherent.
In simple terms:
Generation = write everything anew → linear reasoning. Editing = modify while preserving coherence → contextual reasoning.
Information-theoretic asymmetry
The informational cost of editing is captured by the conditional Kolmogorov complexity $K(P' \mid P)$: the length of the shortest description that transforms the existing program $P$ into the revised program $P'$ (see Appendix B).
Editing is nonlinear: its cost scales with the entropy of the dependency graph, not merely with the size of the change.
Figure 2 summarizes this behavior:
As dependency density increases, the minimal description length $K(P' \mid P)$ grows.
The corresponding token cost (context + generation) grows in parallel.
Editing reliability decreases roughly inversely with conditional complexity as attention and memory saturate.
Time–memory duality
In transformer architectures, reasoning cost increases with context length (self-attention scales roughly as $O(L^2)$ in compute), while memory grows with the cached keys and values ($O(L)$ per layer). Editing a large codebase therefore pays on both axes: more tokens must be read and attended to, and more state must be held to keep the edit coherent.
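To make the duality concrete, the sketch below tabulates the relative growth of attention compute (quadratic in context length) and of the key–value cache (linear in context length); the figures are relative orders of magnitude, not hardware measurements.

```python
# Sketch: relative growth of attention compute (~L^2) and KV-cache memory (~L)
# with context length L. Values are relative, not absolute hardware costs.
for L in (1_000, 8_000, 32_000, 128_000):
    attn_ops = L * L      # pairwise token interactions per layer
    kv_cache = L          # cached keys/values per layer, in token units
    print(f"L={L:>7}: attention ~ {attn_ops:.2e} ops, KV cache ~ {kv_cache:,} tokens")
```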
Software-evolution entropy
Empirical studies show that source-code entropy spikes during major refactors or architectural shifts. These are precisely the conditions under which LLMs falter: high entropy yields unpredictable propagation of changes and reduced determinism in dependency resolution.
The theoretical arguments can be tested in practice. Over the last two years, several research teams have built benchmarks—large collections of real programming tasks—to measure how well LLMs perform when generating, debugging, or editing code. The table below summarizes five of the most widely cited ones.
| Benchmark | What it tests | Key finding |
|---|---|---|
| HumanEval | Small, isolated coding tasks such as “write a function that reverses a string.” | LLMs perform very well; the strongest recent models solve more than 85% of the problems. These tasks require no external context. |
| SWE-bench | Real-world bug fixes drawn from open-source repositories. | Success rates drop below 25%, even for top models. Once multiple files and dependencies are involved, reasoning collapses. |
| RepoBench (ICLR 2024) | Repository-level tasks: completing and modifying code using cross-file context rather than a single file. | Performance decreases sharply with project size and cross-file links. |
| CodePlan (TOSEM 2024) | Planning multi-step code edits (understanding, proposing, modifying, and verifying). | Models must “think in steps.” Without planning or memory, they get lost mid-edit. |
| Lost in the Middle (TACL 2024) | How well models use very long contexts (tens of thousands of tokens). | Models tend to ignore information located in the middle of long inputs—critical for editing long codebases. |
In plain terms: Models are great when they can focus on a single self-contained problem (a function, a paragraph, an equation). They struggle as soon as they must reason about interconnections—the very fabric of software engineering.
These results empirically confirm what complexity theory predicts:
The more intertwined the context, the more information must be recalled, recomputed, and rewritten—raising both entropy and computational cost.
The difference between writing and editing large codebases can be understood through analogies that bridge computer science, physics, and engineering.
| Perspective | Linear generation | Repository-level editing |
|---|---|---|
| Turing machine | Writing tape sequentially—each symbol depends only on the previous one. | Rewriting linked cells while preserving state—one change ripples through the whole tape. |
| Transformer attention | Sparse and local: focus on a few relevant tokens. | Dense and global: attention must cover many tokens and dependencies at once. |
| Information flow | Low entropy—information flows in one direction. | High conditional entropy $H(P' \mid P)$—every change is constrained by what already exists. |
| Engineering metaphor | Drafting a clean new blueprint. | Rewiring an entire factory while it’s still running. |
Each row is a way of describing the same asymmetry:
Sequential systems (writing) move forward smoothly: one decision after another.
Contextual systems (editing) are constrained backward and sideways by everything that already exists.
From a thermodynamic perspective, editing is like maintaining order in a system full of moving parts: it requires energy just to avoid chaos.
Instead of asking an LLM to rewrite existing code line by line, it is often more efficient to generate a clean replacement module from its specification and re-integrate it.
This keeps conditional complexity $K(P' \mid P)$ low: the model reasons from a specification rather than against the full dependency graph of the legacy code.
Enhance editing workflows with:
Retrieval-augmented editing (RAE): dynamically fetch only the parts of the repository relevant to the change.
Repository-graph embeddings: pre-compute dependency maps so the model “sees” structure, not just raw text (a sketch of this idea follows below).
Both methods reduce unnecessary token consumption and memory use.
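As a sketch of the repository-graph idea, the snippet below builds a coarse import graph of a Python repository with the standard ast module and keeps only the modules reachable from the one being edited; file and module names are hypothetical, and a production system would track finer-grained symbols than imports.

```python
# Sketch: build a coarse import graph of a Python repository with `ast`,
# then keep only the modules reachable from the module being edited.
# Paths and module names below are hypothetical examples.
import ast
from pathlib import Path

def import_graph(repo: Path) -> dict[str, set[str]]:
    graph: dict[str, set[str]] = {}
    for path in repo.rglob("*.py"):
        deps: set[str] = set()
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                deps.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module.split(".")[0])
        graph[path.stem] = deps
    return graph

def relevant_modules(graph: dict[str, set[str]], target: str) -> set[str]:
    """Modules the edit target imports, directly or transitively."""
    seen: set[str] = set()
    stack = [target]
    while stack:
        mod = stack.pop()
        if mod in seen or mod not in graph:
            continue
        seen.add(mod)
        stack.extend(graph[mod])
    return seen

# Example usage (hypothetical names):
# graph = import_graph(Path("my_repo"))
# print(relevant_modules(graph, "auth_session"))
```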
Adopt a three-step loop (a minimal sketch follows):
Plan — identify what must change and which files are involved.
Edit locally — apply minimal, well-scoped changes.
Validate globally — run tests or consistency checks.
This sequence prevents entropy from spreading through the system—exactly as a good thermodynamic process prevents heat loss.
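A minimal sketch of this loop is shown below; propose_plan and apply_local_edit are placeholders for whatever LLM interface is used, and pytest is just one possible global check.

```python
# Sketch: Plan -> Edit locally -> Validate globally.
# `propose_plan` and `apply_local_edit` stand in for LLM calls (hypothetical);
# the global check shells out to pytest as one possible validation step.
import subprocess

def propose_plan(change_request: str) -> list[str]:
    """Return the list of files the edit should touch (LLM call, stubbed)."""
    raise NotImplementedError

def apply_local_edit(path: str, change_request: str) -> None:
    """Apply a minimal, well-scoped edit to one file (LLM call, stubbed)."""
    raise NotImplementedError

def validate() -> bool:
    """Run the global test suite; failing edits are rolled back upstream."""
    return subprocess.run(["pytest", "-q"]).returncode == 0

def edit_loop(change_request: str) -> bool:
    for path in propose_plan(change_request):      # Plan
        apply_local_edit(path, change_request)     # Edit locally
    return validate()                              # Validate globally
```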
All these principles exploit the same asymmetry:
Generation lowers entropy by building from first principles; editing increases entropy by juggling dependencies.
Minimizing the number and scope of edits is therefore not only good software practice—it is good energy practice as well.
Every additional token has a cost, not only in computation but in energy. When an LLM edits existing code, it must re-evaluate dependencies, positions, and states: this increases the conditional complexity $K(P' \mid P)$ and consumes disproportionately more resources than simple linear generation. In information-theoretic terms, added entropy becomes extra work; in thermodynamic terms, repeated unnecessary edits accumulate as heat and emissions.
At scale, millions of avoidable “just one more edit” requests translate into significant power usage. Editing is therefore not only a matter of correctness or productivity — it is also a matter of computational and environmental responsibility.
“Ask only when entropy deserves it.”
Complexity–Conservation Rule (entropy-aware editing)
```python
# Before requesting or applying an edit:
# estimate ΔK = K(P'|P)_after - K(P'|P)_before per edited token.
if ΔK_per_token > 0:
    reject_or_rethink_edit()   # edit increases global coupling / dependencies
else:
    perform_edit()             # edit simplifies, modularizes, or localizes effects
```
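The rule above is conceptual, since K(P'|P) is uncomputable. The sketch below substitutes a crude, computable proxy (the compressed size of the source) so that the sign of the change can at least hint at whether an edit simplifies or entangles the code; the proxy and the per-token normalization are assumptions, not part of the rule itself.

```python
# Sketch: operationalize the rule with a computable proxy.
# True K(P'|P) is uncomputable; compressed size is a crude stand-in for
# description length, so the sign of the change is only a heuristic signal.
import zlib

def description_cost(source: str) -> int:
    """Proxy for minimal description length: compressed size in bytes."""
    return len(zlib.compress(source.encode("utf-8"), level=9))

def delta_k_per_edited_token(before: str, after: str) -> float:
    edited_tokens = max(1, len(after.split()))   # crude token count of the result
    return (description_cost(after) - description_cost(before)) / edited_tokens

def entropy_aware_edit(before: str, after: str) -> bool:
    """Favor edits that reduce (or at least do not inflate) description cost."""
    if delta_k_per_edited_token(before, after) > 0:
        return False   # edit increases coupling / description length: rethink it
    return True        # edit simplifies, modularizes, or localizes effects
```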
Restating the rule in natural language, suitable for both humans and LLMs:
“Does this edit reduce dependencies, or just shift them?
Will it increase K(P'|P) or reduce it?”
In other words, we cannot enforce strict conservation (the second law of thermodynamics still holds), but we can enforce a design bias: only edits that lower conditional complexity should be favored and automated. Anything else is not only bad engineering; it is wasted entropy on the planetary budget.
Hartmanis J., Stearns R. E. (1965). On the Computational Complexity of Algorithms. Trans. AMS, 117, 285–306. doi:10.2307/1994208
Vitányi P. M. B. (2022). Information, Complexity, and Meaning. Springer.
Jimenez C. E. et al. (2024). SWE-bench: Can Language Models Resolve Real-World GitHub Issues? ICLR 2024.
Liu T., Xu C., McAuley J. (2024). RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems. ICLR 2024.
Bairi R. et al. (2023). CodePlan: Repository-Level Coding Using LLMs and Planning. ACM TOSEM.
Liu N. F. et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. TACL 2024.
Torres R. et al. (2022). Entropy of Source Code as a Predictor of Software Evolution. Empirical Software Engineering, 27, 45.
Dao T., Fu D. Y., Ermon S., Rudra A., Ré C. (2022). FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. NeurIPS 2022; arXiv:2205.14135.

In information theory, the entropy of a random variable $X$ with probability distribution $p(x)$ is defined as

$$H(X) = -\sum_{x} p(x)\,\log_2 p(x) \quad \text{(bits)}.$$
If each of the $N$ tokens in a sequence carries on average $H_{\text{token}}$ bits (a few bits per token for natural text and code), the total information load is about $N \cdot H_{\text{token}}$ bits, and, since most tokenizers encode roughly 1 token ≈ 4 characters, we can approximate the information load of a sequence of $c$ characters as

$$I \approx \frac{c}{4}\,H_{\text{token}} \ \text{bits}.$$
When the model must reason about positions and dependencies (cross-file references, scopes, imports), the entropy grows because each token depends on a larger conditional context. This corresponds to conditional entropy:

$$H(P' \mid P) = H(P', P) - H(P),$$

which quantifies the extra bits, or extra tokens, needed to describe a modified program $P'$ given the original program $P$.
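As a small illustration of the definition above, the sketch below computes the empirical entropy of a token stream; whitespace tokenization stands in here for a real BPE tokenizer.

```python
# Sketch: empirical Shannon entropy of a token stream, per the definition above.
# Whitespace tokenization is a simplification of a real BPE tokenizer.
import math
from collections import Counter

def empirical_entropy_bits(tokens: list[str]) -> float:
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

tokens = "the cat sleeps and the dog sleeps".split()
h = empirical_entropy_bits(tokens)
print(f"H ≈ {h:.2f} bits/token -> ≈ {h * len(tokens):.1f} bits total")
```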
Consider the English word sequence:
“The cat sleeps.”
Using a standard tokenizer (e.g., GPT-2 BPE), it is 4 tokens:
["The", " cat", " sleeps", "."]
If we want to generate this sentence from scratch, the information load is roughly:
Each token has entropy ≈ 6 bits → total ≈ 24 bits.
Now imagine an edit requiring insertion of an adjective (“black”) between The and cat, plus agreement on verb tense:
“The black cat was sleeping.”
The model must:
Insert two tokens (" black", " was") in correct order → additional content entropy ≈ 12 bits.
Re-encode all subsequent tokens with updated positions → each positional vector (≈ 8 bytes/token) must be recomputed.
Propagate tense consistency (“sleeps” → “was sleeping”) → another 8 bits of conditional decision entropy.
Thus, even for this tiny edit, total information load nearly doubles (from 24 → ~44 bits) and positional recomputation affects all following tokens. In long documents—or codebases with hundreds of linked identifiers—the same proportional inflation applies: entropy compounds with the number of tokens that must stay coherent.
Entropy measures how many questions must be answered to make something unambiguous.
Low-entropy writing: starting from nothing, each new token narrows uncertainty; once written, it fixes its own structure.
High-entropy editing: changing one token reopens many questions—where it fits, what it breaks, and how to reconcile all references.
In human terms, entropy is the cognitive and computational load of preserving order while changing detail.
| Situation | Typical tokens processed | Effective entropy | Computational implication |
|---|---|---|---|
| Generate new file from prompt | 200–800 | Low (few dependencies) | Fast, cheap inference |
| Edit function with cross-refs | 2,000–8,000 | Moderate | Quadratic attention cost |
| Refactor multi-module repo | 20,000+ | High | Very expensive and slow |
Hence, entropy translates directly into token count × bits per token, which drives the energy, time, and memory required by the model. Large or branched edits therefore consume far more computational entropy than linear code generation.
Kolmogorov complexity
The conditional Kolmogorov complexity of an edit is

$$K(P' \mid P) = \min\{\,|d| : U(d, P) = P'\,\},$$

where $U$ is a universal machine and $d$ is the shortest description (a “diff”) that transforms the original program $P$ into the revised program $P'$.
Intuitively:
If the modification touches deeply coupled regions, the description $d$ must spell out every affected dependency, so $K(P' \mid P)$ grows with the coupling rather than with the visible size of the change.
Shannon entropy and Kolmogorov complexity coincide in expectation for computable sources:

$$\mathbb{E}_{x \sim p}\,[K(x)] = H(p) + O(1).$$

Thus, we can interpret $K(P' \mid P)$ as the number of bits, and hence roughly the number of tokens, that an LLM must process to describe and apply the edit. In practice the cost is even higher, because:
Dependencies must be re-specified explicitly in text (imports, signatures, docstrings).
Positional changes cascade—adding one symbol often forces dozens of updated references.
The model must check consistency (syntactic and semantic), implying extra reasoning tokens.
Hence, the effective cost of an edit observed in practice exceeds the theoretical minimum $K(P' \mid P)$, often by a large factor.
Consider two tasks:
Local edit: change a numerical constant
```python
# Original
threshold = 0.95

# Modified
threshold = 0.97
```

Minimal description length: “replace 0.95 → 0.97” (≈ 3 tokens).
Coupled refactor: rename a class and propagate it
```python
# Before
class UserSession:
    ...

# After
class AuthSession:
    ...
```

All calls (UserSession()), docstrings, imports, tests, and configuration keys must change.
For a 50 k-token repository, the LLM must locate and regenerate 500–2000 token spans with consistent semantics.
Thus, while the visible diff is tiny in both cases, $K(P' \mid P)$ for the coupled refactor scales with the number of dependent references, not with the size of the textual change.
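The contrast can be made tangible by measuring the size of the textual diff for each case. The sketch below uses difflib on a toy repository (an illustrative stand-in, not real project code): the constant change produces a few bytes of diff, while the rename drags every call site along.

```python
# Sketch: compare the "diff description length" of a local constant change
# versus a rename that ripples through many call sites.
# The toy repository below is an illustrative stand-in, not real project code.
import difflib

def diff_size(before: str, after: str) -> int:
    return sum(len(line) for line in
               difflib.unified_diff(before.splitlines(), after.splitlines()))

local_before = "threshold = 0.95\n"
local_after  = "threshold = 0.97\n"

repo_before = "\n".join(["class UserSession: ..."] +
                        [f"s{i} = UserSession()" for i in range(50)])
repo_after  = repo_before.replace("UserSession", "AuthSession")

print("local edit diff size:      ", diff_size(local_before, local_after))
print("coupled refactor diff size:", diff_size(repo_before, repo_after))
```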
| Operation | Description | Approx. computational complexity |
|---|---|---|
| Generate a new file from prompt | The model writes code directly from a specification; there are few prior constraints or dependencies. | ≈ O(n): proportional to the output length only |
| Edit a single isolated function | The model must reason locally within a bounded scope and preserve syntax; minimal propagation of side effects. | ≈ O(n + ℓ), where ℓ is the size of the enclosing scope |
| Refactor interdependent modules | The edit touches multiple files, type hierarchies, or APIs; requires cross-module reasoning and positional re-encoding. | ≈ O(m + n·log m): cross-repository dependencies must be re-encoded |
| System-wide migration (e.g., API version bump) | The entire repository must remain consistent; imports, configs, and tests are rewritten coherently. | ≈ O(m) or worse: the whole repository is in play |
This progression formalizes why LLMs lose efficiency when editing complex systems: the edit description itself becomes as large as the new code.
Kolmogorov complexity measures the compressibility of a transformation.
If a small, elegant “diff” suffices to move from $P$ to $P'$, then $K(P' \mid P)$ is small and the edit is cheap; if the change entangles many parts of the program, no short description exists and the complexity is high.
For LLMs, high $K(P' \mid P)$ means that:
More tokens must be re-emitted (higher compute cost).
Longer context windows are required (memory cost).
Higher risk of coherence loss or hallucinated rewrites (entropy cost).
Entropy and conditional complexity thus quantify the same burden from two angles, statistical uncertainty per token and minimal description length of the edit; in expectation, $H(P' \mid P) \approx \mathbb{E}[K(P' \mid P)]$.
Therefore, high conditional complexity directly translates into larger prompts, longer inference times, and increased cost—precisely the symptoms observed when LLMs attempt large-scale code modifications.