LLM Coding Agents: What Claude and Codex Change for Engineers
A noticeable shift has happened in software engineering.
AI coding tools are no longer only autocomplete helpers. The better agents can now inspect a codebase, make changes across files, run tests, fix failures, and continue working towards a goal. That changes the developer’s role from writing every line to specifying, supervising, reviewing, and correcting.
This is not a small productivity tweak. For many engineers, it is the biggest workflow change in years.
The uncomfortable part is that the new workflow works well enough to be hard to ignore, but not well enough to be trusted blindly. That is the main tension.
LLM agents can save hours. They can also produce confident, complicated, subtly wrong code. The teams that benefit will be the ones that learn how to use them without giving up engineering judgement.
The workflow has moved from typing to directing
Earlier AI coding tools mostly helped inside the developer’s existing flow.
You typed code. The tool completed a line, suggested a function, or explained an error. The developer still drove the work through the IDE.
Coding agents change the unit of work.
Instead of asking for a function, you can ask for a task:
“Add password reset using the existing email service. Follow the current auth patterns. Add tests. Do not change login behaviour.”
The agent may read the auth module, edit several files, add tests, run them, and summarise the change.
That is a different style of programming. The developer is still responsible, but the work starts to feel less like writing and more like directing a fast assistant.
A practical example:
Suppose you need an admin feature to export employee data as CSV.
A weak instruction would be:
“Add employee CSV export.”
A better instruction would be:
“Add CSV export for the employee list in the admin dashboard. Include employee ID, name, department, location, and joining date. Use the existing admin authorisation pattern. Do not include salary, PAN, personal email, or phone number. Add tests for access control and CSV escaping.”
The second prompt works better because it carries engineering judgement. It tells the agent what matters, what to avoid, and how success should be checked.
This is where experienced developers get more out of these tools. They know which details are dangerous to leave unstated.
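As a rough illustration, the constraints in the stronger prompt map quite directly onto code. The sketch below is in Python and makes several assumptions: the field names, the `export_employees_csv` function, and the sample record are all hypothetical, and the authorisation check is left out because it depends on the existing admin pattern.

```python
import csv
import io

# Hypothetical allowlist: only these columns may ever appear in the export.
EXPORT_FIELDS = ["employee_id", "name", "department", "location", "joining_date"]


def export_employees_csv(employees):
    """Return CSV text containing only the allowlisted fields."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops any key that is not in EXPORT_FIELDS,
    # so a stray "salary" or "phone" key never reaches the output.
    writer = csv.DictWriter(buffer, fieldnames=EXPORT_FIELDS, extrasaction="ignore")
    writer.writeheader()
    for employee in employees:
        writer.writerow(employee)
    return buffer.getvalue()


# Hypothetical record: the name contains a comma and quotes, and the
# record carries a salary that must not be exported.
sample = [{
    "employee_id": "E001",
    "name": 'Rao, "Anu"',
    "department": "Finance",
    "location": "Pune",
    "joining_date": "2021-04-01",
    "salary": 1200000,
}]
csv_text = export_employees_csv(sample)
```

An allowlist is used rather than a blocklist, so a sensitive field added to the employee record later is excluded by default. The `csv` module handles quoting of commas and quotes; it does not neutralise spreadsheet formulas, which would need a separate decision.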
“Programming in English” still requires engineering skill
It is tempting to describe this as programming in English. That is partly true, but casual English is not enough.
Good agent instructions are closer to short technical specifications.
For example:
“Add retry logic.”
This leaves too much open. Retry what? Which failures? How many times? What about duplicate requests?
A stronger version:
“Add retry logic for payment gateway calls. Retry only on network timeouts and gateway 5xx errors. Do not retry declined cards or validation errors. Use exponential backoff with a maximum of three attempts. Keep requests idempotent. Add tests for timeout, gateway failure, declined card, and duplicate callback.”
That is not prompt cleverness. That is clear engineering.
The agent can fill in syntax, framework details, and repetitive code. It cannot reliably infer business policy, operational constraints, or risk tolerance unless those are given to it.
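The stronger retry instruction is specific enough to sketch. The Python below is one possible reading of it, not a definitive implementation: the exception classes, the `call_with_retry` helper, and the idea that the gateway accepts an idempotency key are all assumptions made for illustration.

```python
import time


class GatewayTimeout(Exception):
    """Hypothetical: network timeout while calling the gateway."""


class GatewayServerError(Exception):
    """Hypothetical: the gateway answered with a 5xx status."""


class CardDeclined(Exception):
    """Hypothetical: a business outcome that must never be retried."""


RETRYABLE = (GatewayTimeout, GatewayServerError)


def call_with_retry(send, idempotency_key, max_attempts=3, base_delay=0.5,
                    sleep=time.sleep):
    """Call send(idempotency_key), retrying only on retryable failures.

    The same key is reused on every attempt so that the gateway can
    recognise a repeated request (assumes the gateway supports this).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send(idempotency_key)
        except RETRYABLE:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Anything outside `RETRYABLE`, such as `CardDeclined`, propagates on the first attempt. The `sleep` parameter is injectable so that tests can run without real delays. Handling of duplicate callbacks is a separate piece of work and is not covered here.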
When the instruction is vague, the agent does not always stop and ask. It often chooses an assumption and continues.
That is useful for speed. It is dangerous for correctness.
The mistakes are more subtle now
Mistakes from the older generation of AI coding tools were easy to spot: broken syntax, missing imports, wrong API calls.
The newer mistakes are more like the mistakes of a rushed junior developer.
The code may compile. It may pass basic tests. It may look clean in a quick review. The problem is usually conceptual.
Common failure modes include:
- assuming a business rule that was never confirmed
- changing unrelated code while fixing a small issue
- adding abstractions that are not needed
- removing comments or checks it does not understand
- creating a large solution for a small problem
- following the wrong pattern from the codebase
- writing tests that confirm the current implementation rather than the intended behaviour
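The last failure mode is easy to miss in review, so a small sketch may help. The Python below uses a hypothetical rule (an invoice is overdue only after its due date) and a deliberately buggy `is_overdue` function. One check recomputes the expected value with the same logic as the code; the other states the expected value from the rule.

```python
from datetime import date


def is_overdue(due_date, today):
    # Deliberate bug for illustration: the hypothetical rule says an invoice
    # is overdue only AFTER its due date, but this also flags the due date.
    return today >= due_date


def implementation_mirroring_check():
    """Recomputes the expectation with the code's own logic, so it always passes."""
    due, today = date(2024, 3, 10), date(2024, 3, 10)
    expected = today >= due
    return is_overdue(due, today) == expected


def intended_behaviour_check():
    """States the expectation from the rule: not overdue on the due date."""
    due, today = date(2024, 3, 10), date(2024, 3, 10)
    return is_overdue(due, today) is False
```

Here `implementation_mirroring_check()` returns `True` even though the code is wrong, while `intended_behaviour_check()` returns `False` and exposes the bug. Only the second kind of test protects the intended behaviour.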
Consider this instruction:
“Add support for multiple addresses per customer.”
An agent may create tables, APIs, UI changes, migrations, and tests. On paper, it has done the task.
But an engineer still has to ask:
- Is billing address different from delivery address?
- What happens to old orders?
- Does invoice generation expect one legal address?
- Do downstream ERP or finance systems read the old field?
- Which address is used for tax calculation?
- Can a customer delete an address used in a past order?
These are not syntax questions. They are system questions.
Current agents often do not surface these trade-offs unless pushed. They act before they clarify.
That is why the “no IDE needed” view is premature. If the code matters, you still need a proper IDE, a diff view, and a human watching carefully.
A sensible setup: agent on one side, IDE on the other
The most practical workflow today is not to hand over the codebase and walk away.
A better setup is:
- agent sessions in terminal windows or tabs
- IDE open beside them
- agent handles bounded code actions
- developer reviews diffs, edits manually, and runs checks
A good sequence looks like this:
1. Ask the agent to inspect the relevant files.
2. Ask for a plan before it edits anything.
3. Review the plan.
4. Let it make a small change.
5. Inspect the diff.
6. Ask it to simplify if it overbuilds.
7. Run tests yourself.
8. Commit only what you understand.
This sounds slower than full automation. In production work, it is usually faster because it prevents the agent from spending thirty minutes building the wrong thing.
A useful prompt is:
“Inspect the code and propose a plan. Do not edit files yet. Mention any assumptions or unclear requirements.”
Even if the agent does not catch everything, this step forces a pause before code starts changing.
That pause is valuable.
Agents are unusually good at persistence
One reason these tools feel different is stamina.
A human developer gets tired, bored, or frustrated. An agent does not. It can keep running tests, reading failures, editing code, and trying again.
This helps in tasks that have a clear success condition.
For example:
“Write tests for this parser using these five input-output examples. Then update the parser until all tests pass. Do not weaken the tests.”
Or:
“Reproduce the login issue in the browser, inspect the console error, fix the cause, and verify the login flow works.”
When the agent has access to tools (test runner, browser automation, logs, file search), it can loop. That loop is often where the value appears.
The trick is to give it a target rather than micromanaging every step.
Instead of:
“Change this line, then change that function.”
Try:
“The API must return these exact responses for these cases. Add tests first. Then make the smallest implementation change needed.”
This uses the agent better. You define success. It works towards it.
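One way to define success is a table of input-output examples that the agent is not allowed to weaken. The Python sketch below assumes a hypothetical duration parser; the `CASES` table, `parse_duration`, and `failing_cases` are illustrative names, not taken from any real codebase.

```python
import re

# Hypothetical input-output examples that define success for the parser.
CASES = [
    ("45s", 45),
    ("2m", 120),
    ("1h", 3600),
    ("1h30m", 5400),
    ("1h2m3s", 3723),
]

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text):
    """Parse strings such as '1h30m' into a number of seconds."""
    if not re.fullmatch(r"(\d+[hms])+", text):
        raise ValueError(f"invalid duration: {text!r}")
    parts = re.findall(r"(\d+)([hms])", text)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)


def failing_cases():
    """Return (input, expected, actual) for each case the parser gets wrong."""
    return [
        (text, expected, parse_duration(text))
        for text, expected in CASES
        if parse_duration(text) != expected
    ]
```

The agent's loop then has a clear exit condition: keep changing `parse_duration` until `failing_cases()` returns an empty list, with `CASES` left untouched.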
The gain is not only speed
People often ask, “How much faster does this make developers?”
That question misses part of the change.
Yes, agents can reduce the time needed for routine work. But the bigger shift is that engineers start doing things they would previously have skipped.
Examples:
- writing a small internal dashboard instead of repeating manual queries
- adding tests around legacy code before touching it
- building a throwaway prototype to check a product idea
- writing a migration helper script
- exploring an unfamiliar library or framework
- cleaning up low-grade duplication that nobody had time for
This is expansion, not just acceleration.
Work that felt too small to justify now becomes worth trying. Work that felt blocked by lack of familiarity becomes approachable.
That is useful, but it has a cost.
More generated software means more software to maintain. Every internal tool can become someone’s dependency. Every quick script can fail at month-end. Every prototype can quietly become production.
So the question should not be only:
“Can we build this faster?”
It should also be:
“Should this exist, and who will own it after it exists?”
Where coding agents work well
Agents are strongest when the task is clear and the codebase already has patterns to copy.
They are useful for repetitive implementation:
- adding another CRUD endpoint
- creating request validators
- wiring a new route
- generating DTOs
- adding standard logging
- writing first-pass tests
- converting one data format to another
For example:
“Add a vendors module following the structure of the existing customers module. Include list, create, update, and deactivate endpoints. Use the same validation and error response patterns. Do not add new dependencies.”
This is a good task because the agent has a nearby model to follow.
Agents are also helpful for small refactors:
“Extract invoice due-date calculation into a pure function. Keep behaviour unchanged. Add tests for monthly, quarterly, and custom payment terms.”
The scope is clear. The expected result is reviewable.
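A sketch of what the result of such a refactor might look like, in Python. The term lengths (30 and 90 days) and the `invoice_due_date` signature are assumptions for illustration; the real rules would come from the existing code, since the instruction says to keep behaviour unchanged.

```python
from datetime import date, timedelta

# Hypothetical term lengths; the real values belong to the business rules.
TERM_DAYS = {"monthly": 30, "quarterly": 90}


def invoice_due_date(issued_on, terms, custom_days=None):
    """Pure function: the same inputs always give the same due date, with no I/O."""
    if terms == "custom":
        if custom_days is None or custom_days < 0:
            raise ValueError("custom terms need a non-negative custom_days")
        return issued_on + timedelta(days=custom_days)
    if terms not in TERM_DAYS:
        raise ValueError(f"unknown payment terms: {terms!r}")
    return issued_on + timedelta(days=TERM_DAYS[terms])
```

Because the function takes everything it needs as arguments and touches no database or clock, each payment term can be tested with a one-line assertion.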
They are useful for onboarding too:
“Explain how order cancellation works in this codebase. List the main files, state transitions, and external systems. Do not make changes.”
This can save time. Still, treat the answer as a starting map. Verify it against the code before making decisions.
Where they need close control
Some areas are too risky for casual agent use.
Business rules
If the rule is not documented, the agent will often invent a clean version.
Take late payment charges. The code may need to know:
- whether weekends count
- whether enterprise customers have different terms
- whether GST applies
- how partial payments are handled
- whether support can waive charges
- whether old contracts follow older rules
The agent cannot reliably guess these answers. Someone in the organisation must know or decide.
Security
Authentication, authorisation, payments, encryption, and privacy require strict review.
An agent may write code that works for the normal case but fails under abuse.
Examples:
- checking whether the user is logged in, but not whether they own the record
- exposing fields that should remain private
- trusting client-side roles
- logging tokens or personal data
- skipping webhook signature verification
- adding broad admin access
Generated security code should be treated as a draft, never as proof.
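The first item in that list is worth seeing side by side. In the Python sketch below, the `User`, `Invoice`, and `Forbidden` types and both functions are hypothetical; the point is only the difference between checking authentication and checking ownership.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    is_authenticated: bool


@dataclass
class Invoice:
    id: int
    owner_id: int


class Forbidden(Exception):
    pass


def get_invoice_weak(user, invoice):
    # Only checks that someone is logged in: any user can read any invoice.
    if not user.is_authenticated:
        raise Forbidden("login required")
    return invoice


def get_invoice(user, invoice):
    # Checks authentication AND that the record belongs to this user.
    if not user.is_authenticated:
        raise Forbidden("login required")
    if invoice.owner_id != user.id:
        raise Forbidden("not your invoice")
    return invoice
```

Both versions pass a test in which a user fetches their own invoice. Only a test with a second user reveals the gap, which is why access-control tests need at least one "wrong user" case.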
Data migrations
Migrations are risky because mistakes can be hard to reverse.
A migration that works on a local database may lock a large production table, mishandle null values, or fail halfway through.
A safer instruction:
“Propose a migration plan first. Include batching, rollback, locking risk, null handling, and verification steps. Do not write the migration yet.”
For data changes, the plan matters as much as the code.
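To make "batching" and "verification steps" concrete, here is a minimal Python sketch of a batched backfill using the standard `sqlite3` module. The `customers` table, the `country` column, and the `'UNKNOWN'` default are hypothetical, and SQLite is only a stand-in: locking behaviour on a production database will differ and still needs its own analysis.

```python
import sqlite3


def backfill_in_batches(conn, batch_size=1000):
    """Set country to 'UNKNOWN' where it is NULL, one batch at a time.

    Each batch is its own short transaction, so a failure part-way through
    leaves earlier batches committed and the job can simply be re-run.
    """
    total = 0
    while True:
        with conn:  # commits on success, rolls back on an exception
            cursor = conn.execute(
                "UPDATE customers SET country = 'UNKNOWN' "
                "WHERE id IN ("
                "  SELECT id FROM customers WHERE country IS NULL LIMIT ?"
                ")",
                (batch_size,),
            )
        if cursor.rowcount == 0:
            return total  # nothing left to update
        total += cursor.rowcount


def remaining_nulls(conn):
    """Verification step: how many rows still need the backfill?"""
    row = conn.execute(
        "SELECT COUNT(*) FROM customers WHERE country IS NULL"
    ).fetchone()
    return row[0]
```

The update only touches rows that are still NULL, so running it twice is harmless. `remaining_nulls` gives a simple check to run before and after the job.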
Architecture
Agents can edit many files. That does not mean they understand the system.
“Make this scalable” is not a meaningful instruction.
Scalable for what?
- more users
- more tenants
- larger data
- lower latency
- lower cost
- better fault isolation
- faster deployments
Each answer leads to a different architecture.
Agents can help compare options, but humans must make the trade-off.
The bloat problem is real
A common issue with agent-generated code is overbuilding.
You ask for a small change. The agent creates a new service, interface, factory, config layer, helper class, and error hierarchy.
The result looks organised, but it is harder to maintain.
For example:
“Validate an optional referral code.”
This may need a simple function. The agent may produce a mini-framework.
A useful response is:
“This is too much. Reduce it to the smallest change that fits the existing code style. Do not introduce new abstractions unless required.”
Often the agent will immediately cut the solution down.
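For a sense of scale, the reduced version might be no more than this Python sketch. The code format (6 to 10 upper-case letters or digits) and the function name are assumptions made for illustration.

```python
import re

# Hypothetical format: 6 to 10 upper-case letters or digits.
REFERRAL_CODE = re.compile(r"[A-Z0-9]{6,10}")


def validate_referral_code(code):
    """Return the normalised code, or None when no code was supplied.

    Raises ValueError for a code that is present but malformed.
    """
    if code is None or code.strip() == "":
        return None  # the field is optional
    normalised = code.strip().upper()
    if not REFERRAL_CODE.fullmatch(normalised):
        raise ValueError("invalid referral code")
    return normalised
```

One function and one pattern, with no service, factory, or error hierarchy. If more structure is needed later, it can be added when there is a concrete reason.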
That tells us something important: agents can generate complexity faster than humans can review it. Simplicity has to be actively demanded.
Good review questions:
- Why did this file change?
- Can this be done with less code?
- Is this abstraction needed today?
- Did it remove or alter unrelated code?
- Is there dead code left behind?
- Does this follow the existing style?
The best AI-assisted code is often not the first output. It is the second or third version after a human has pushed it towards clarity.
Skill atrophy should be taken seriously
There is a personal risk here.
Writing code and reading code are different mental skills. If an engineer mostly reviews generated code, their ability to write from scratch may weaken.
That may seem acceptable until production breaks and the agent is also confused.
Debugging still needs first-principles thinking. You may need to understand memory, concurrency, database locks, network failures, runtime behaviour, or build systems. If those muscles weaken, the tool becomes a crutch.
The answer is not to avoid agents. The answer is to practise deliberately.
Useful habits:
- write small pieces manually from time to time
- debug for a few minutes yourself before asking the agent
- read every generated diff carefully
- explain the code before merging it
- ask the agent to teach, not only produce
- keep learning the language and runtime
A calculator does not remove the need for number sense. Coding agents do not remove the need for system sense.
More output will make judgement more valuable
As generation gets cheaper, the volume of digital material will rise: code, tutorials, packages, papers, posts, demos, internal tools, and half-finished projects.
Some of it will be useful. Much of it will be polished rubbish.
For software teams, this means the filtering problem gets harder.
There will be more GitHub projects that look credible but are weak. More libraries that solve narrow problems badly. More documentation that sounds clean but misses operational detail. More portfolios that show output without proving understanding.
The scarce skill will be discrimination.
Can you tell whether the code is safe?
Can you see when a tutorial is shallow?
Can you spot a library that will become a maintenance burden?
Can you separate working demos from production-quality systems?
AI increases production. Engineering leaders must increase review quality.
If code volume rises but review discipline stays the same, systems will get worse.
What happens to strong engineers?
There is a popular question: will AI reduce the gap between average and excellent engineers?
It may help weaker engineers produce more working code. It may help juniors understand unfamiliar areas faster. It may help generalists cross boundaries.
But it can also widen the gap.
Strong engineers know what to ask for. They see subtle errors. They simplify bloated designs. They understand production consequences. They can connect a small code change to a larger system risk.
A weaker engineer may accept generated code because it looks right.
A stronger engineer asks why it is right, where it fails, and whether it should exist.
The same tool in different hands produces different outcomes.
That is likely to become more visible, not less.
How teams should adapt
Teams do not need dramatic new rituals. They need better discipline around the workflow.
Ask for plans before large edits
Before the agent changes code, ask what it intends to do.
This catches wrong assumptions early.
Keep changes small
AI makes large diffs easy. Large diffs are hard to review.
A simple rule helps:
If the diff is hard to review, the task was too large.
Break work into smaller changes.
Review generated code like human code
Generated code does not get a lower standard.
Check correctness, simplicity, security, test quality, naming, maintainability, and unwanted side effects.
Improve internal documentation
Agents perform better when the codebase gives clear signals.
Useful documents include:
- architecture notes
- API conventions
- testing patterns
- error handling rules
- security guidelines
- examples of preferred implementations
If the codebase is inconsistent, the agent will learn the inconsistency.
Make ownership explicit
The rule should be clear:
If you merge it, you own it.
It does not matter whether the code came from Claude, Codex, autocomplete, Stack Overflow, or your own hands.
Production responsibility stays with the engineer and the team.
A practical way to use coding agents
For most serious work, use four steps.
1. Specify
Write the task clearly.
Example:
“Add invoice reminder emails at 7, 15, and 30 days overdue. Use the existing daily billing job. Send only one reminder per stage. Do not create a new scheduler. Add tests for each reminder stage and for already-sent reminders.”
2. Plan
Ask the agent to inspect and propose.
Example:
“Show the files you plan to change and why. Do not edit yet.”
3. Implement in small pieces
Let the agent change a limited part. Review before moving on.
Example:
“Implement only the reminder selection logic first. Do not send emails yet.”
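A first small piece of that kind could look like the Python sketch below. The `reminder_stage_due` function and its parameters are hypothetical. It also assumes that when the daily job has missed a stage, only the highest stage reached is sent; that is a policy choice someone would need to confirm.

```python
from datetime import date

REMINDER_STAGES = (7, 15, 30)  # days overdue, from the task description


def reminder_stage_due(due_date, today, stages_already_sent, paid=False):
    """Return the reminder stage to send today, or None.

    Picks the highest stage the invoice has reached and returns None if
    that stage was already sent, so each stage goes out at most once.
    """
    if paid:
        return None
    days_overdue = (today - due_date).days
    reached = [stage for stage in REMINDER_STAGES if days_overdue >= stage]
    if not reached:
        return None  # below the first overdue threshold
    stage = max(reached)
    return None if stage in stages_already_sent else stage
```

The function only decides; it sends nothing and reads no database. That keeps the diff small and makes the duplicate, already-paid, and below-threshold cases easy to test before any email code exists.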
4. Verify
Run tests. Add missing cases. Check the behaviour manually where needed.
Example:
“Add tests for duplicate reminders, already-paid invoices, and invoices below the overdue threshold.”
This keeps the agent useful without letting it take over the engineering decision.
The real change
The important point is not that AI can write code. That has been true for a while.
The important point is that coding agents can now work across a codebase with enough coherence to change how serious engineers spend their day.
They can take larger tasks. They can loop on failures. They can reduce drudgery. They can make unfamiliar code approachable. They can also create subtle bugs, unnecessary abstractions, and more software than teams are ready to maintain.
The future workflow is not “AI replaces engineers”.
It is closer to this:
- humans define the right problem
- agents produce candidate solutions
- humans simplify, verify, and own the result
The engineer’s value moves upward. Less time typing every line. More time deciding what should be built, how it should behave, what can go wrong, and whether the generated solution is good enough to carry into production.
