The New Software Development Lifecycle (SDLC): From Vibe Coding to Agentic Engineering
From vibe coding to agentic engineering: how AI rewrites the software development lifecycle, making specs, verification and judgment the new craft.
For most of computing history, programming has been an act of translation: understand the problem, design a solution, then render it in syntax a machine can execute. Every step introduced friction. That friction is now collapsing - and a recent Google whitepaper (The New SDLC With Vibe Coding, by Addy Osmani, Shubham Saboo and Sokratis Kartakis) argues that the most significant shift in our field isn't a new language or framework.
It's the move from writing code to expressing intent, and trusting capable systems to turn that intent into working software.
If you lead a team or own architecture, this is the part worth internalizing: the bottleneck is no longer typing speed 😄.
It's how well you specify, constrain, and verify what the machine produces.
I've spent years of leading engineering teams and owning architecture for SaaS platforms, and over the past year this shift has quietly rewritten how I spend my days. So this isn't theory for me - it's what I've watched change firsthand.
Here's a condensed, architect-oriented walkthrough of the ideas that matter most 👇
The spectrum, not the binary
"Vibe coding" entered the vocabulary in early 2025: describe what you want in natural language, accept the output, paste the error back when something breaks, repeat.
The term went viral because it named something many developers were already doing - but it got stretched until it lost meaning.
By 2026 even its originator conceded the framing was too narrow and introduced agentic engineering for the disciplined end of the scale.
The useful mental model is a spectrum rather than a switch. The differentiator is not whether you use AI - almost everyone does now - but how much structure, verification, and human judgment surround the output:
- Vibe coding: casual prompts, "does it seem to work?", minimal codebase understanding. Fine for prototypes, scripts, hackathons, throwaway code.
- Structured AI-assisted coding: detailed prompts with constraints, manual testing, selective review of critical paths. Suited to features inside an established codebase.
- Agentic engineering: formal specs, architecture docs, memory files, automated test suites, CI/CD gates and LLM judges, comprehensive architectural review. The mode for production systems at team scale.
The single biggest separator between the ends is how outputs get verified.
Tests cover the deterministic parts (this input yields that output). Evaluations ("evals") cover the non-deterministic parts: did the agent take a sensible trajectory, pick the right tools, and meet the quality bar?
Without both, you're vibe coding - no matter how sophisticated your prompts look.
The practical takeaway: the right spot on the spectrum depends on the stakes.
A weekend prototype can be pure vibes. A payment API cannot.
The skill is knowing where to draw that line for each task - and making the boundary explicit on your team, so prototypes don't ship to production by accident.
Context engineering is the real skill
As the field matured, one insight kept surfacing: output quality depends far less on clever prompts and far more on the quality of the context you provide.
The discipline of supplying agents with rich, structured information about your codebase, conventions and intent has a name now - context engineering - and it's arguably the core competency of this era.
There are six kinds of context worth deliberately managing: instructions (role and boundaries), knowledge (docs, diagrams, domain data), memory (session logs plus long-term project state), examples (reference patterns), tools (the APIs and scripts the agent can call), and guardrails (hard constraints and safety rules).
The key architectural decision is what's static versus dynamic:
- Static context is always loaded - system instructions, rule files (
AGENTS.md,CLAUDE.md,GEMINI.md), global memory, persona. Reliable, but expensive: every token is present in every interaction. - Dynamic context is loaded on demand - skill instructions triggered by the task, tool results, retrieved documents, windowed history. Efficient: you pay the token cost only when you need it.
Too much static context wastes tokens and dilutes signal; too little and the agent forgets critical rules. Treat that boundary as a first-class design decision - reviewed and versioned like any other configuration.
The most powerful pattern for the dynamic side is Agent Skills: portable packages of procedural knowledge the agent loads only when a task calls for it.
Through progressive disclosure, a lightweight generalist can carry dozens of specialist capabilities while paying for only the one it's actively using.
The mental shift is from "how do I trick the AI into writing good code?" to "what would a new team member need to know to contribute well, and how do I encode that?"
How AI reshapes each SDLC phase
AI compresses the life cycle dramatically but unevenly.
Implementation that took weeks can take hours; requirements, architecture and verification remain stubbornly human-paced.
The result isn't a faster waterfall - it's a different shape of work:
- Requirements & planning stop being a hand-off document and become a conversation that produces a spec and a working prototype simultaneously. Specification quality becomes the new bottleneck.
- Design & architecture stay the most human-centric phase. The hard trade-offs - consistency vs. availability, build vs. buy - depend on business context AI can't fully grasp. AI excels at implementing an architecture once it's decided; the developer's job shifts to making and documenting those structural decisions.
- Implementation sees real gains, but the picture is nuanced. One METR study found experienced developers were sometimes slower on familiar tasks, because the time moved into verifying, debugging and correcting output. AI doesn't remove implementation work so much as transform it from writing into reviewing and guiding.
- Testing & QA now means evaluating both what was produced and how it got there - output evaluation plus trajectory evaluation. Tests and evals become the primary way to communicate intent to an agent.
- Code review & deployment gain AI as a first-pass reviewer for bugs, style and security, reducing reviewer load without replacing human judgment on design and maintainability.
- Maintenance is the most underrated win. Legacy code that was "too risky to touch" because only its authors understood it can now be navigated, refactored and migrated - work that was previously so tedious it simply never happened.
The factory model and the harness
Two framings tie all of this together.
- The factory model: the developer's primary output is no longer code - it's the system that produces code. Specs and context define what to build; agents translate them into implementation; tests and quality gates verify; feedback loops route failures back; guardrails keep behavior safe. A factory manager doesn't hand-assemble every widget - they design the line and own quality control. Give agents success criteria, not step-by-step instructions, then let them iterate.
- The harness: a raw model is not an agent. It becomes one when scaffolding gives it state, tool execution, feedback loops and enforceable constraints. The whitepaper's tidy equation is Agent = Model + Harness. The harness includes instruction and rule files, tools and MCP servers, sandboxes, orchestration logic, guardrails/hooks, and observability - and crucially, all of that is your surface area, not the model provider's.
Why this matters for leaders: the harness effect is large and measurable. On the Terminal Bench 2.0 benchmark, one team moved a coding agent from outside the Top 30 into the Top 5 by changing only the harness - no model swap. So when an agent misbehaves, the reflex to blame the model is usually wrong. More often the failure traces to a missing tool, a vague rule, an absent guardrail, or a context window stuffed with noise. Most agent failures, examined honestly, are configuration failures.
Two modes for the developer: conductor and orchestrator
The role is evolving, not disappearing. Most developers now move fluidly between two modes:
- Conductor - real-time, in-IDE, keystroke-level control. You watch code appear and steer with prompts and corrections. Best for complex logic, tricky debugging, and unfamiliar codebases. The risk: if you direct every keystroke, you become the bottleneck.
- Orchestrator - asynchronous, higher abstraction, multi-agent. You define goals, delegate to agents working in the background (often in parallel), and review outcomes rather than keystrokes. Best for well-specified work: bug fixes, migrations, test generation.
Orchestrating well demands a different skill set than syntax fluency: specification, decomposition, evaluation, and system design.
Both modes run into the 80% problem: agents quickly generate ~80% of a feature, but the remaining 20% - edge cases, error handling, integration points, subtle correctness - needs deep context the models often lack. And the nature of AI errors has shifted from obvious syntax mistakes to insidious conceptual ones: wrong business-logic assumptions, missed edge cases, architectural choices that quietly accrue maintenance debt. These are harder to catch precisely because the code looks right and may pass basic tests. The developers who thrive use AI for what it's good at and reserve their own attention for what it isn't.
The economics: it's about TCO, not velocity
For engineering leaders, the more honest metric than "how fast can we ship?" is total cost of ownership - and in the AI era, operational cost is dominated by the token economy.
- Vibe coding looks cheap (low CapEx, just a subscription) but hides a compounding OpEx burden: a high token burn rate from low first-pass success, a "maintenance tax" from unstructured AI-generated spaghetti, and expensive security remediation when un-evaluated code ships vulnerabilities to production.
- Agentic engineering flips this (high CapEx, low OpEx): you invest upfront in schemas, deterministic tests and structured context, and the marginal cost of shipping and maintaining each feature drops sharply.
Two levers move the OpEx needle: context engineering (a dense, high-signal payload raises first-pass success and avoids costly retry loops) and intelligent model routing (use frontier models for hard tasks like architecture, route deterministic work like test generation and CI monitoring to smaller, cheaper models).
Where to start
If you're a tech lead or architect, a few moves compound quickly:
- Make context engineering a first-class practice. Treat
AGENTS.md, system prompts, eval suites and skill libraries as code - reviewed in PRs, versioned, owned by named engineers. Otherwise the harness drifts and behavior becomes irreproducible. - Set the bar at the eval, not the demo. A demo proves an agent can succeed once; a passing eval suite with an explicit rubric proves it succeeds reliably. Gate shared-workflow agents on eval coverage the way you gate services on test coverage.
- Re-shape code review for generated code - extra scrutiny on hallucinated dependencies, weak error handling, and correctness gaps that look fine at a glance.
- Distinguish prototyping from production in your norms, so exploration stays fast and production stays disciplined.
- Invest in the harness as a shared asset. Reusable prompts, skill libraries, MCP connections and eval harnesses compound across projects. The teams that win build the harness once and refine it many times.
The bottom line
Three principles look durable: structure scales, vibes don't; AI amplifies your existing engineering culture - multiplying both strengths and weaknesses; and the human role is evolving, not shrinking - toward specification, evaluation, architectural judgment and system design.
Or, as the whitepaper puts it: Generation is largely solved. Verification, judgment, and direction are the new craft.
Comments ()