What four developers taught me about AI coding tools in 48 hours

Four developers, four working styles, one conversation about how AI coding tools actually land in practice.

I work in customer-facing engineering. My job is to help teams adopt new infrastructure software in production environments where mistakes are expensive. So when AI coding tools started reshaping how my friends and colleagues actually build software, I wanted to understand the practical reality — not the hype.

Over 48 hours this week, I sat down with four developers. They use different tools, work in different stacks, and have very different temperaments. I asked them how they actually use AI in their work, what's gotten better, what still doesn't work, and what they wish their organizations would do differently.

What I heard surprised me — not in any one detail, but in the shape of it. There are patterns here that the marketing pages don't capture, and that engineering leaders should probably hear before they decide how to roll any of this out.

Who I talked to

I deliberately picked four people who would disagree with each other. Anonymized:

The Independent — Self-employed developer based in India. Multi-tool user (Copilot, Claude Code). Pragmatic, broad use across web and infra work.
The Newcomer — Software developer at a media company. Newer adopter, prefers the Codex desktop app. Values UI separation and visibility.
The Skeptic — Senior software engineer. Full-stack JavaScript. Opinionated about code style. Uses Claude Code, Cursor, Zed, Gemini — pits them against each other.
The Architect — Principal architect at a B2B software company. POC-heavy role: tech exploration, design, integration. Uses Claude Code, Cursor, Gemini.

I didn't survey them. I asked them to talk, and I listened. What follows is a synthesis of the patterns that emerged.

What everyone agreed on

Three things came up in every conversation — and they're the things AI marketing tends to skip past.

1. The bottleneck moves from typing to judgment

Every single one of them said some version of "80–85% of code now writes itself." But none of them said this was an unalloyed win. The cost moves elsewhere: scoping, context-setting, deciding what shouldn't be built, reviewing what got generated. As The Skeptic put it: "AI was like an over-eager intern. I want to review what you're thinking — not undo what you've done."

This matters for engineering managers because it means productivity gains don't show up where you'd expect. Lines-of-code metrics will tell you the team is faster. The reality is that the judgment work shifted up the seniority stack: senior engineers became reviewers and context-setters, junior engineers got to ship more boilerplate, and PMs started writing code whose implications they don't always understand.

2. POCs are easy. Production is still hard.

All four developers drew a sharp line between greenfield exploration and production code. The Independent: "POCs, you can let it go crazy." The Skeptic: "In a mature codebase that's well documented, I might be open to agentic coding." The Architect: "We cannot just let AI push to production. We had many issues."

The pattern: the same tool needs completely different organizational scaffolding depending on context. Indie devs and prototypers welcome autonomy. Teams shipping production code want trust ladders, review gates, observability, and clear ownership. The deployment problem isn't installing the tool — it's building that scaffolding.
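One way to picture a trust ladder is as a ceiling on agent autonomy that depends on where the code is headed. The sketch below is mine, not something any of the four developers described: the rung names, the context labels, and the mapping between them are all hypothetical, and a real ladder would be set by each team.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """Rungs of a hypothetical trust ladder, lowest autonomy first."""
    SUGGEST_ONLY = 1      # agent proposes diffs; a human applies them
    EDIT_WITH_REVIEW = 2  # agent edits a branch; a human reviews before merge
    AUTONOMOUS = 3        # agent edits and merges without a review gate


# Assumed mapping from work context to the highest rung allowed there.
MAX_AUTONOMY = {
    "poc": Autonomy.AUTONOMOUS,
    "internal_tool": Autonomy.EDIT_WITH_REVIEW,
    "production": Autonomy.SUGGEST_ONLY,
}


def allowed(context: str, requested: Autonomy) -> bool:
    """Return True if the requested autonomy fits the context's ceiling.

    Unknown contexts fall back to the most restrictive rung.
    """
    ceiling = MAX_AUTONOMY.get(context, Autonomy.SUGGEST_ONLY)
    return requested <= ceiling
```

With this mapping, allowed("poc", Autonomy.AUTONOMOUS) passes while allowed("production", Autonomy.EDIT_WITH_REVIEW) does not. Defaulting unknown contexts to the lowest rung is a deliberate choice in the sketch: anything nobody has classified yet is treated like production.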

3. Trust is the gating factor

Adoption follows trust, and trust follows transparency and control. The Newcomer told me what he loves most about his daily-driver agent: "I can see which branch it's working on, and which changes are persistent." The Architect was more cautious: "It seems very confident always. What if it's 5% wrong? Maybe we need more verification loops."

When agentic incidents happen (and they do; one developer pointed me to a recent post about an AI agent that deleted both a production database and its backups), enterprise adoption is set back by months. Trust is fragile and expensive. The tools that protect it through visibility, sandboxing, and rollback are the ones that win the cautious adopter.

Where they disagreed: the curious vs cautious divide

This was the strongest pattern I found, and it didn't appear in any of the marketing materials I read afterward.

There are roughly two developer archetypes when it comes to AI coding tools:

Curious devs — junior, exploratory, working on greenfield or non-critical code. They welcome agentic behavior. They want speed. They will accept some slop because they can undo and retry. The AI gives them leverage.
Cautious devs — senior, opinionated, working on mature codebases or high-stakes systems. They want review-first interaction, customization, and control. They reject over-eager generation. The Skeptic's framing was perfect: "I'm curious by default, but for my stack I have to be cautious — it can go completely in the wrong direction."

Adoption strategy has to match the archetype. You can't sell autonomy to a senior dev, and you can't expect a junior to practice meticulous context discipline. And here's the subtle part: in most organizations, the senior, cautious dev is the gating audience. They're the hardest to win, but their buy-in unlocks the broader org. Win them with customization, PR-review use cases, and lower-trust entry points, not with autonomy pitches.

If you're an engineering leader rolling out AI tools, your first question isn't "which tool?" — it's "which archetype am I solving for first, and what's my adoption ladder for the others?"

The non-obvious insight: the harness matters as much as the model

The Skeptic said something I haven't been able to shake. She'd tried the same underlying model wrapped in two different tools. The experience was completely different. "Everything that wraps around the model is harness."

The harness is the prompt structure, the context management, the memory, the tool calls, the UI, the customization layer (rules, skills, project-specific behaviors). It's the entire scaffolding around the model that turns it into something a developer can actually use.

This is why she can use the same underlying model in two products and have one feel like a brilliant collaborator and the other feel like a broken intern. It's also why, when an enterprise customer asks "which AI coding tool should we adopt?", the right answer is rarely about the model. It's about the harness — and how well that harness fits your codebase, your conventions, your trust requirements, and your developers' archetypes.
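To make the harness idea concrete, here is a minimal sketch of how two products could wrap the same model and still send it very different requests. Everything in it is an assumption for illustration: the Harness class, its fields, the "model-x" name, and the tool names are hypothetical and do not describe how any real product is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Harness:
    """Hypothetical wrapper: everything around the model except the model."""
    model: str
    system_prompt: str
    rules: tuple[str, ...] = ()   # project-specific conventions
    tools: tuple[str, ...] = ()   # tool calls the agent may make
    max_context_files: int = 5    # crude stand-in for context management

    def build_request(self, task: str, files: list[str]) -> dict:
        """Assemble what the model would actually see for one task."""
        prompt_parts = [self.system_prompt, *self.rules, f"Task: {task}"]
        return {
            "model": self.model,
            "prompt": "\n".join(prompt_parts),
            "files": files[: self.max_context_files],
            "tools": list(self.tools),
        }


# Same underlying model, two different harnesses.
bare = Harness(model="model-x", system_prompt="You are a coding assistant.")
tuned = Harness(
    model="model-x",
    system_prompt="You are a coding assistant. Propose a plan before editing.",
    rules=("Follow the repository style guide.", "Never touch migrations."),
    tools=("read_file", "run_tests"),
    max_context_files=20,
)
```

Calling bare.build_request and tuned.build_request with the same task and the same file list yields the same "model" value but a different prompt, a different slice of files, and a different tool list. That difference, not the model, is what the developer experiences.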

For tool vendors, this means the model is table-stakes — and the harness is where the war is fought. For customers, it means evaluation has to include real workflows, not feature lists.

The enterprise gap nobody is filling well yet

The most striking thing I heard came from The Architect. Asked about how her company is approaching AI tools, she said:

"There's no centralized effort to bring AI skills, no initiative to use AI in a more standardized way. Getting approvals creates many bottlenecks. I am using AI in my individual way. At the org level, we don't have any streamlining of how to use AI."

This was a sophisticated, security-conscious engineering organization. And yet, on AI tools, every developer is essentially figuring it out alone. There are no playbooks. No model selection guidance. No standardized prompt libraries or context templates. No clear governance on what data can be fed to what tool. No measurement of outcomes. No skill-building program.

This isn't unique to her company — I heard echoes of it from the others. The gap between individual developer adoption and org-level enablement is wide. And it isn't a tool problem. It's a deployment problem.

It's the problem of:

Access governance for agents — what can the AI touch, where, with what guardrails?
Tool selection at scale — when teams want different tools, who decides and how?
Cost optimization — token economics in a world where the "best" models burn through budgets quickly.
Integration approval — every MCP server or plugin needs security review. Today this is a bottleneck, not a feature.
Code review at higher volume — more generated code means more review burden, which means new review tooling and policies.
Observability + ops scaling — if you double your code throughput, your ops practice has to keep up or you'll drown in incidents.
Skill-building and developer enablement — most developers learned this by themselves. That doesn't scale.
Outcome measurement — beyond "are people using it," what does AI ROI actually look like?

These are real organizational problems, and most enterprises are nowhere near solving any of them.
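As a small illustration of the first item, access governance, an agent's permissions can be written down as data and checked before every action. This is only a sketch under assumptions of my own: the policy shape, the action names, and the path patterns are hypothetical, and it uses Python's standard fnmatch module, whose * wildcard also matches path separators.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: what the agent may do, and where.
POLICY = {
    "allowed_actions": {"read", "write"},
    "writable_paths": ["src/*", "tests/*"],
    "denied_paths": ["src/migrations/*", "*.env"],
}


def is_permitted(action: str, path: str, policy: dict = POLICY) -> bool:
    """Check one agent action against the policy. Deny rules win."""
    if action not in policy["allowed_actions"]:
        return False
    if any(fnmatchcase(path, pat) for pat in policy["denied_paths"]):
        return False
    if action == "write":
        # Writes need an explicit allow; reads are open outside denied paths.
        return any(fnmatchcase(path, pat) for pat in policy["writable_paths"])
    return True
```

Under this policy the agent can write to src/app/main.py and read docs/readme.md, but it cannot write to src/migrations/001.sql, read config/.env, or delete anything at all. The point is less the code than the habit: the guardrails live in a reviewable file instead of in each developer's head.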

Closing thought: adoption is a discipline, not an install

Here's what I keep coming back to.

AI coding tools are not a productivity install. They're a workflow change, an org-design problem, and a culture shift — all running simultaneously. The default output of these tools in a complex codebase isn't value — it's slop. Organizations that mandate "use AI" without scaffolding scale slop at speed, not productivity.

But the failure mode isn't AI. It's the absence of the discipline that turns AI from a slop generator into a value generator. That discipline includes everything the four developers told me about: context discipline, trust ladders, review practices, archetype-matched rollout, customization, governance, observability, and adoption sequencing.

If you're an engineering leader thinking about this, my unsolicited advice is to do what I did this week. Talk to four developers. Don't survey them. Don't send them a questionnaire. Listen. The patterns will tell you what to build and in what order — and you'll be more useful to your engineering teams in 48 hours of listening than in six months of evaluating tools.

Thanks to the four developers who shared their time. I learned a lot.

If you're working on this problem inside your organization and want to compare notes, I'm easy to find. — Kapil