On (the future of) Git

Published on Jul 10, 2025

Written by Shri Kolanukuduru

The Semantic Chasm in a Text-Based World

Any developer who has navigated a complex merge conflict has felt the chasm between their logical intent and the textual reality that version control systems manage. Git, for all its power, is a master of text, not meaning. This foundational principle, once a source of elegant simplicity, is starting to feel like a bottleneck.

Take a straightforward refactoring example: an AI agent renames a utility function from utils.get_user() to auth.fetch_profile() and updates all 15 references across the codebase. For a developer, this is logically one atomic action. For Git, however, it's a scattering of independent line edits across many files, each of which can collide with anyone else's concurrent edit to the same lines, creating unnecessary friction and merge complexity.
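To make the contrast concrete, here is a minimal sketch that uses Python's standard difflib as a stand-in for Git's line-based diff (Git's own diff algorithms may group lines somewhat differently). The module and its two call sites are hypothetical, trimmed down from the 15-reference example.

```python
import difflib
from typing import Tuple

# Hypothetical module before the agent's rename (two call sites instead of 15).
BEFORE = """\
from utils import get_user

def profile_page(uid):
    user = get_user(uid)
    return render(user)

def settings_page(uid):
    user = get_user(uid)
    return render_settings(user)
"""

# The same module after renaming utils.get_user() to auth.fetch_profile().
AFTER = """\
from auth import fetch_profile

def profile_page(uid):
    user = fetch_profile(uid)
    return render(user)

def settings_page(uid):
    user = fetch_profile(uid)
    return render_settings(user)
"""


def line_diff_stats(before: str, after: str) -> Tuple[int, int, int]:
    """Return (hunks, removed_lines, added_lines) for a zero-context unified diff."""
    diff = list(
        difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="", n=0)
    )
    hunks = sum(1 for line in diff if line.startswith("@@"))
    # Skip the "---"/"+++" file header lines; count only changed content lines.
    removed = sum(1 for line in diff if line.startswith("-") and not line.startswith("---"))
    added = sum(1 for line in diff if line.startswith("+") and not line.startswith("+++"))
    return hunks, removed, added


if __name__ == "__main__":
    print(line_diff_stats(BEFORE, AFTER))
```

Even this trimmed-down version produces three separate hunks (three lines removed, three added) for one logical rename. With 15 call sites spread across files, every one of those hunks is another place where a concurrent edit can collide.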

This isn't just an inconvenience; it's a fundamental conflict between how modern tools think and how our foundational tools work. As we enter an era where agents can write, refactor, and ship code at scale, Git’s text-centric model is on a collision course with agents that operate on the abstract syntax tree (AST) and the semantic meaning of code.

From Textual Reconciliation to Semantic Merging

At its heart, Git is a content-addressable filesystem: every version of a file is stored as an object named by a hash of its bytes, and diffs are computed line by line when requested. This is a remarkably efficient model for tracking changes in text, but its content-agnostic design means Git has no intrinsic understanding of code structure. It cannot discern that renaming a function's parameters is one logical operation; it registers a series of unrelated line changes instead. For a human developer, this is a manageable annoyance. For an agent attempting to reconcile its work with a human’s concurrent changes, it’s a source of frequent, avoidable merge conflicts.
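The content-addressable model is easy to see in miniature. The sketch below reproduces how Git names a blob: the SHA-1 of a "blob <size>" header, a NUL byte, and the file's raw bytes. This should match `git hash-object` in a repository using the default SHA-1 object format, for files with no attribute-driven filters applied. The two source snippets are hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id Git gives a blob with these exact bytes.

    Git hashes a header of the form "blob <size>" plus a NUL byte, followed by
    the raw content; the id depends only on bytes, never on meaning.
    """
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# Two hypothetical versions of the same function that differ only in spacing.
original = b"def get_user(uid):\n    return db.load(uid)\n"
reformatted = b"def get_user( uid ):\n    return db.load(uid)\n"

if __name__ == "__main__":
    print(git_blob_id(original))
    print(git_blob_id(reformatted))  # a different id for logically identical code
```

To Git these are two unrelated objects; nothing in the object model records that the change was purely cosmetic.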

This gap has spurred the creation of a new semantic layer, and its first building block is the structural diff tool. Tools like difftastic parse code into syntax trees (using tree-sitter grammars) and compare those trees rather than raw lines. They ignore formatting-only changes such as re-indentation or line wrapping and highlight only the tokens that actually changed, so a function rename reads as what it is: a small, comprehensible modification rather than a wall of rewritten lines.
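Python's built-in ast module is enough to sketch the core idea, although difftastic itself uses tree-sitter and a far more sophisticated tree-matching algorithm. The hypothetical `classify_change` function below compares two versions of a module by their syntax trees: comments and whitespace never reach the tree, so formatting-only edits compare equal, and a change is classified as a rename when renaming exactly one function (and its call sites) in the old tree reproduces the new tree.

```python
import ast
from typing import Set


class _Renamer(ast.NodeTransformer):
    """Rewrite the definition and every bare-name use of `old` to `new`."""

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)  # keep walking into the body and defaults
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id == self.old:  # call sites such as get_user(uid)
            node.id = self.new
        return node


def _shape(tree: ast.AST) -> str:
    # ast.dump omits line/column attributes by default, so only structure is compared.
    return ast.dump(tree)


def _defined_functions(tree: ast.AST) -> Set[str]:
    # Sync functions only, to keep the sketch short.
    return {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}


def classify_change(old_src: str, new_src: str) -> str:
    old_tree, new_tree = ast.parse(old_src), ast.parse(new_src)
    if _shape(old_tree) == _shape(new_tree):
        return "formatting-only"
    removed = _defined_functions(old_tree) - _defined_functions(new_tree)
    added = _defined_functions(new_tree) - _defined_functions(old_tree)
    if len(removed) == 1 and len(added) == 1:
        old_name, new_name = removed.pop(), added.pop()
        renamed = _Renamer(old_name, new_name).visit(old_tree)
        if _shape(renamed) == _shape(new_tree):
            return f"rename {old_name} -> {new_name}"
    return "other"


if __name__ == "__main__":
    old = "def get_user(uid):\n    return uid\n\ndef page(uid):\n    return get_user(uid)\n"
    new = old.replace("get_user", "fetch_profile")
    print(classify_change(old, new))
```

A real structural tool also has to handle renames that cross module boundaries (as utils.get_user to auth.fetch_profile does) and changes that mix a rename with other edits; this sketch simply reports "other" in those cases.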

But this only scratches the surface. The deeper challenge lies in building true semantic awareness into the conflict resolution process itself. This isn't a hypothetical future; it's an active field of development. For years, tools like SemanticMerge have offered language-aware three-way merging. They parse the code to identify refactors, moved methods, and other changes, allowing developers to resolve conflicts at a logical level. This is the jump from textual reconciliation to true semantic merging, enabling a future where a version control system can reason: "The agent's goal was to rename get_user. The human's goal was to add a new parameter to it. I will apply both changes to the new function, auth.fetch_profile."
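The reasoning in that quote can be sketched with a toy version of operational transformation, one well-known technique for merging concurrent edits. This is an illustrative assumption, not a description of how SemanticMerge or any other specific tool works internally. Edits are represented as structured operations on function signatures instead of line diffs, and an edit from one branch is rewritten ("transformed") so that it still means the same thing after the other branch's edits have been applied. `Rename`, `AddParam`, `transform`, and `merge` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Rename:
    old: str
    new: str


@dataclass(frozen=True)
class AddParam:
    func: str
    param: str


Op = Union[Rename, AddParam]
Signatures = Dict[str, List[str]]  # function name -> parameter names


class MergeConflict(Exception):
    """Raised when two intents cannot both be honoured automatically."""


def transform(op: Op, applied: Op) -> Optional[Op]:
    """Rewrite `op` so it still applies after `applied` has run (None = no-op)."""
    if isinstance(applied, Rename):
        if isinstance(op, AddParam) and op.func == applied.old:
            return AddParam(applied.new, op.param)  # follow the function to its new name
        if isinstance(op, Rename) and op.old == applied.old:
            if op.new == applied.new:
                return None  # both sides made the same rename
            raise MergeConflict(f"{op.old} renamed to both {applied.new} and {op.new}")
    return op


def apply(sigs: Signatures, op: Op) -> Signatures:
    out = {name: list(params) for name, params in sigs.items()}  # never mutate the input
    if isinstance(op, Rename):
        out[op.new] = out.pop(op.old)
    else:
        out[op.func].append(op.param)
    return out


def merge(base: Signatures, ours: List[Op], theirs: List[Op]) -> Signatures:
    """Apply `ours`, then each op in `theirs` transformed past every op in `ours`."""
    result = base
    for op in ours:
        result = apply(result, op)
    for op in theirs:
        pending: Optional[Op] = op
        for done in ours:
            if pending is None:
                break
            pending = transform(pending, done)
        if pending is not None:
            result = apply(result, pending)
    return result


if __name__ == "__main__":
    base = {"utils.get_user": ["uid"]}
    agent_ops = [Rename("utils.get_user", "auth.fetch_profile")]
    human_ops = [AddParam("utils.get_user", "include_email")]
    print(merge(base, agent_ops, human_ops))
```

Because the human's edit targets the function rather than a range of lines, it follows the function to its new name; only genuinely incompatible intents, such as renaming the same function to two different names, surface as a conflict.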

Signal vs. Noise: Curating the Agent-Generated Commit History

A subtler tension emerges around the integrity of version history. This is best illustrated by contrasting two modern workflows. A developer using Cursor still acts as the primary author, leveraging a model for generation and refactoring but ultimately crafting their own clean, descriptive commits, thus preserving the "signal" of intent.

In stark contrast, an autonomous agent like Devin might be given a high-level task and work for hours, generating hundreds of internal micro-steps (explorations, tests, and corrections). Its final contribution could be a single, massive pull request that represents the squashed sum of that chaotic work. For a human reviewer, this "noise" turns finding a subtle regression into a needle-in-a-haystack problem and leaves the commit history almost useless for auditing or debugging. The solution is not to abandon autonomous agents but to hold them to human-centric standards, which is why I think something like an "agent-curated history" is essential.

In modern CI/CD, tools like semantic-release already automate versioning and release notes from structured commit messages. The same idea can run in the other direction for agents: rather than consuming structured messages, a higher-level system would observe an agent's workflow, then squash and synthesize its micro-commits into a single, coherent commit whose message describes the feature or fix. This isn't reinventing the wheel; it's targeting existing standards like Conventional Commits (feat:, fix:, refactor:). The ability of an agent to produce a clean, machine-readable commit will become a non-negotiable requirement for its adoption in any serious engineering organization.
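A minimal sketch of that synthesis step follows, assuming the agent logs each internal step as a record tagged with a Conventional Commits type. The `MicroStep` record format, the type precedence, and the function names are all hypothetical. The sketch squashes the steps into one Conventional Commits message, dropping exploratory dead ends, and then shows why the format matters downstream: a version bump in the spirit of semantic-release's default rules can be computed from the resulting messages alone.

```python
import re
from dataclasses import dataclass
from typing import List

# Conventional Commits header: type, optional (scope), optional "!", then ": description".
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\s]+)\))?(?P<bang>!)?: (?P<desc>\S.*)$"
)

# Hypothetical precedence for choosing a single type for a squashed change.
TYPE_PRECEDENCE = ["feat", "fix", "perf", "refactor", "test", "docs", "chore"]


@dataclass
class MicroStep:
    """One internal agent step (hypothetical record format)."""
    kind: str  # a Conventional Commits type, or "explore" for dead ends
    note: str
    breaking: bool = False


def synthesize_commit(steps: List[MicroStep], scope: str, summary: str) -> str:
    """Squash an agent's micro-steps into one Conventional Commits message."""
    kinds = {step.kind for step in steps}
    ctype = next((t for t in TYPE_PRECEDENCE if t in kinds), "chore")
    bang = "!" if any(step.breaking for step in steps) else ""
    header = f"{ctype}({scope}){bang}: {summary}"
    # Exploration noise is dropped; the remaining steps become the commit body.
    body = [f"- {step.note}" for step in steps if step.kind != "explore"]
    return header + ("\n\n" + "\n".join(body) if body else "")


def next_version(current: str, messages: List[str]) -> str:
    """Roughly semantic-release's defaults: breaking -> major, feat -> minor, fix/perf -> patch."""
    major, minor, patch = (int(part) for part in current.split("."))
    level = 0  # 0 = no release, 1 = patch, 2 = minor, 3 = major
    for message in messages:
        match = HEADER_RE.match(message.splitlines()[0]) if message else None
        if match is None:
            continue  # not a Conventional Commit, so it cannot drive a release
        if match["bang"] or "BREAKING CHANGE:" in message:
            level = max(level, 3)
        elif match["type"] == "feat":
            level = max(level, 2)
        elif match["type"] in ("fix", "perf"):
            level = max(level, 1)
    if level == 3:
        return f"{major + 1}.0.0"
    if level == 2:
        return f"{major}.{minor + 1}.0"
    if level == 1:
        return f"{major}.{minor}.{patch + 1}"
    return current


if __name__ == "__main__":
    steps = [
        MicroStep("explore", "tried caching profiles in memory"),
        MicroStep("test", "add tests for profile lookup"),
        MicroStep("feat", "add auth.fetch_profile"),
    ]
    message = synthesize_commit(steps, "auth", "add profile lookup")
    print(message)
    print(next_version("1.4.2", [message]))
```

A production system would summarize with far more judgment than a precedence list, but the contract is the point: whatever the agent did internally, what lands in history is one parseable header plus a readable body.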

The Architectural Path: Git as a Kernel and Control Plane

This brings us to the core architectural question: should we evolve Git, layer intelligence on top of it, or rebuild from the ground up? Radical alternatives like Pijul, whose "theory of patches" elegantly sidesteps entire classes of merge conflicts, are theoretically appealing, but Git's massive incumbency makes a ground-up replacement impractical.

The most pragmatic and proven path is the "Git as a kernel" model. When faced with the challenge of versioning large binaries and ML models, the community didn’t abandon Git; it built powerful layers like Git LFS and DVC on top of it. These tools use Git to version lightweight pointers while storing the heavy assets elsewhere, proving that Git's core can serve as a robust foundation for specialized, high-level tooling.
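The pointer idea behind Git LFS fits in a few lines. The sketch below writes a pointer in the Git LFS v1 pointer-file format (a version line, a SHA-256 `oid`, and a `size` in bytes) and keeps the real bytes in a plain dictionary that stands in for the LFS server. `check_in`, `check_out`, and the in-memory store are hypothetical simplifications of what LFS's clean and smudge filters do; real LFS transfers objects to and from a server over HTTP.

```python
import hashlib
from typing import Dict

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(oid: str, size: int) -> str:
    """Format a pointer file: version line, then SHA-256 oid and size in bytes."""
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {size}\n"


def check_in(content: bytes, store: Dict[str, bytes]) -> str:
    """Roughly what the clean filter does on `git add`: stash bytes, return a pointer."""
    oid = hashlib.sha256(content).hexdigest()
    store[oid] = content  # the heavy bytes live outside Git
    return lfs_pointer(oid, len(content))  # only this small text is committed


def check_out(pointer: str, store: Dict[str, bytes]) -> bytes:
    """Roughly what the smudge filter does on checkout: resolve a pointer to bytes."""
    fields = dict(line.split(" ", 1) for line in pointer.splitlines())
    oid = fields["oid"].split(":", 1)[1]
    content = store[oid]
    if len(content) != int(fields["size"]):
        raise ValueError("size in pointer does not match the stored object")
    return content


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    weights = bytes(5_000_000)  # hypothetical 5 MB model file
    pointer = check_in(weights, store)
    print(pointer)
    print(len(pointer), "bytes versioned by Git instead of", len(weights))
```

Git diffs, merges, and hashes only the pointer text, which is exactly the "Git as a kernel" split: Git keeps doing what it is good at, and a layer above it handles the specialized content.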

This layered approach also reframes Git's role from a simple versioning tool to a critical control plane for AI governance. In this model, Git's own features, together with the pull request and branch protection machinery that hosting platforms such as GitHub and GitLab build around it, become the non-negotiable API for managing agent collaborators. A pull request from Devin or Claude Code would be blocked by default via branch protection rules. Merging would only be permitted after a series of automated checks pass: unit tests, security scans, and, crucially, a sign-off from a review agent like Graphite or Greptile. These AI reviewers, with their full-repository context, act as the semantic gatekeepers. This ensures that Git, augmented with these intelligent layers, becomes the final, trusted arbiter of quality, enforcing standards on human and agent contributors alike.
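Hosting platforms express this as required status checks, configured differently on each platform, but the policy itself can be sketched in a few lines. The check names, the `CheckRun` record, and the `merge_gate` function below are hypothetical. The important property is "blocked by default": a check that never reported, failed, or passed only on an older commit counts as missing, and the rule is identical for human and agent authors.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical check names; in practice they are whatever your CI jobs and review bots report.
REQUIRED_CHECKS = ("unit-tests", "security-scan", "ai-review")


@dataclass(frozen=True)
class CheckRun:
    name: str
    head_sha: str    # the commit the check ran against
    conclusion: str  # e.g. "success", "failure", "pending"


def merge_gate(head_sha: str, runs: Iterable[CheckRun]) -> Tuple[bool, List[str]]:
    """Allow a merge only if every required check succeeded on head_sha.

    Simplification: a real gate would also use only the most recent run of each check.
    """
    passed = {
        run.name
        for run in runs
        if run.head_sha == head_sha and run.conclusion == "success"
    }
    missing = [name for name in REQUIRED_CHECKS if name not in passed]
    return (not missing, missing)


if __name__ == "__main__":
    runs = [
        CheckRun("unit-tests", "c0ffee1", "success"),
        CheckRun("security-scan", "c0ffee1", "success"),
        CheckRun("ai-review", "badf00d", "success"),  # approved an older push only
    ]
    print(merge_gate("c0ffee1", runs))
```

Tying each result to the head commit matters for agents in particular: an agent that pushes a fix after review has to earn the reviewer's sign-off again.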

This layered evolution allows us to benefit from decades of investment in Git's stability and tooling while seamlessly integrating the semantic intelligence required for an agent-driven future. It's a pragmatic path that harnesses the elegance of new theories without forcing the entire industry to abandon the stable, universal foundation that Git provides. The future of version control isn't a replacement for Git; it's a smarter, semantic layer built upon it.

Let's build something together.

build@categoryvc.com

Disclosures