The real question isn't "can it generate code?"
We all know it can. That stopped being impressive a while ago. The question is: where does the intelligence actually live? Most people's instinct is to dump everything (context, requirements, constraints, the works) into one giant prompt and let a heavy model sort it out. In practice, that produces inconsistent results and a process that is impossible to trace or reproduce.
A better approach is to spread the logic across the workflow. Smaller, focused prompts. Skills and agent instruction files that define how tasks should be executed. Artifacts at every step that you can actually go back to. The prompts stay compact; the guidance lives in the structure.
Artifacts are the thing most people skip
Here's the uncomfortable truth about AI chat sessions: they're temporary. No memory, no trail, no way to hand off to a colleague or pick up after a context window runs out.
The fix is treating every major step as something that produces a real, inspectable output. Requirements refinement produces a PRD. Planning produces an implementation plan. Code review produces a review document. Pull request generation consumes all of the above.
This isn't overhead. It's what makes the workflow reproducible. If a developer joins a project mid-stream, or a session breaks, or you want to know why something was built a certain way, the artifacts are the answer. Traceability and auditability, without the post-hoc documentation nobody ever writes.
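To make the idea concrete, here is a minimal sketch of artifacts as plain files on disk. The step names, file names, and helper functions are our own illustrative assumptions, not a prescribed layout; the only point is that each step writes something a later step (or a colleague) can read back.

```python
from pathlib import Path

# Hypothetical mapping of workflow step to artifact file name.
# The names are assumptions for illustration only.
ARTIFACTS = {
    "refinement": "prd.md",
    "planning": "implementation-plan.md",
    "review": "review.md",
}


def write_artifact(root: Path, step: str, body: str) -> Path:
    """Persist one step's output so it outlives the chat session."""
    path = root / ARTIFACTS[step]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body, encoding="utf-8")
    return path


def pr_inputs(root: Path) -> dict[str, str]:
    """Collect every earlier artifact for the pull request step.

    Fails loudly if a step was skipped, instead of letting the PR
    description be generated from an incomplete trail.
    """
    missing = [name for name in ARTIFACTS.values() if not (root / name).exists()]
    if missing:
        raise FileNotFoundError(f"missing artifacts: {missing}")
    return {
        step: (root / name).read_text(encoding="utf-8")
        for step, name in ARTIFACTS.items()
    }
```

Keeping the artifacts in the repository (or next to it) means a new session can start by reading files rather than relying on whatever the previous chat remembered.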
What the workflow actually looks like
We walked through a real example: adding an advisory recommendation layer to an existing paint calculator app. Simple feature, but the point was to show every step, not just the code generation part.
1. Refinement: Start from a vague stakeholder request and run it through a requirements refinement prompt. Output: a structured PRD with scope, acceptance criteria, open questions, and explicit out-of-scope constraints. That last part matters more than it sounds. AI has a strong instinct toward scope creep. Defining what you're not building is as important as defining what you are.
2. Tickets: Generate Jira tasks from the PRD using the Atlassian MCP. The goal here isn't to have AI write tickets so you don't have to. It's consistency. If everyone on the team uses the same skill to generate tickets, the tickets look the same, contain the same structure, carry the same labels. Predictable input for the next step.
3. Implementation plan: Before a single line of code gets written, generate a plan document from the ticket. This becomes your source of truth. You can check it, correct it, regenerate it with a different model if needed. The implementation agent works from this plan, it doesn't improvise.
4. Implementation: The orchestrator delegates to a specialized coder agent. It runs tests as it goes. If something fails, it tries to fix it in the same session. When it's done, it automatically marks the ticket in progress, because that's in the skill definition.
5. Local review: Before pushing anything, a review agent reads the artifacts from earlier steps alongside the git diff. It checks the code against the acceptance criteria, flags scope issues, notes risks. It’s not just a code review but a delivery review, grounded in what was actually agreed.
6. Pull request: A dedicated PR skill generates a description that includes what changed, why, how to test it, and what review artifacts exist. The reviewer sees everything they need without having to go hunting.
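The six steps above can be read as a chain where each step consumes artifacts that an earlier step produced. The sketch below models that chain and checks it for gaps. It is an illustration under our own assumptions: the `Step` class, the artifact labels, and `unmet_inputs` are hypothetical names, and "code changes" stands in for the git diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    consumes: tuple[str, ...]  # artifacts this step reads
    produces: str              # artifact this step writes


# Artifact labels are illustrative; "code changes" stands in for the git diff.
WORKFLOW = [
    Step("refinement", ("stakeholder request",), "PRD"),
    Step("tickets", ("PRD",), "Jira tasks"),
    Step("implementation plan", ("Jira tasks",), "plan document"),
    Step("implementation", ("plan document",), "code changes"),
    Step("local review", ("PRD", "plan document", "code changes"), "review document"),
    Step(
        "pull request",
        ("PRD", "plan document", "code changes", "review document"),
        "PR description",
    ),
]


def unmet_inputs(steps: list[Step], initial: set[str]) -> list[tuple[str, str]]:
    """Return (step name, missing artifact) pairs, in workflow order."""
    available = set(initial)
    problems = []
    for step in steps:
        for needed in step.consumes:
            if needed not in available:
                problems.append((step.name, needed))
        available.add(step.produces)
    return problems
```

Calling `unmet_inputs(WORKFLOW, {"stakeholder request"})` returns an empty list, while starting from the third step without the earlier artifacts reports exactly which inputs are missing. That is the practical meaning of "predictable input for the next step".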

Human in the loop isn't optional
None of this runs unattended. At every step, a person verifies the output before moving forward. The refinement document gets checked. The implementation plan gets read. The review findings get acted on.
This is intentional. The value isn't in removing human judgment; it's in reducing the cognitive overhead so judgment can be applied where it actually matters. You stop worrying about whether the ticket has all the right fields. You focus on whether the plan makes sense. The agents handle consistency. You handle correctness.
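One way to picture the approval gates is a loop that refuses to start the next step until a person has signed off on the current output. This is a minimal sketch, not how any particular agent tool implements it; `run_with_gates`, `produce`, and `approve` are hypothetical names, and in practice `approve` would be a human reading the document rather than a function.

```python
from typing import Callable, Optional

# A step is a name plus a function that builds its output
# from the outputs approved so far.
Produce = Callable[[dict[str, str]], str]
Approve = Callable[[str, str], bool]


def run_with_gates(
    steps: list[tuple[str, Produce]],
    approve: Approve,
) -> tuple[dict[str, str], Optional[str]]:
    """Run steps in order, stopping at the first output a person rejects.

    Returns the approved outputs and the name of the rejected step
    (or None if everything was approved).
    """
    approved: dict[str, str] = {}
    for name, produce in steps:
        output = produce(approved)
        if not approve(name, output):
            return approved, name
        approved[name] = output
    return approved, None
```

The design choice worth noting is that later steps only ever see approved outputs, so a rejected plan can never leak into implementation.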
Skills and agent files are the infrastructure layer
An agents.md file in your repository is where you define the rules your agents follow: coding conventions, architectural decisions, testing expectations, naming standards. Write it once, and every prompt that runs in that project inherits those constraints. You don't need to re-explain your stack on every interaction.
Skills are reusable prompt templates for specific tasks: create a PR, generate an implementation plan, refine requirements. The vision is a shared skill library across Two Point O projects: consistent ticket structure, consistent PR descriptions, consistent review format, regardless of which project or which developer triggers them. We're not there yet. But the architecture for it is clear.
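As a rough illustration of why a shared skill gives consistent output, here is a PR description reduced to a fixed template with required fields. This is a simplification under our own assumptions: real skills are usually instruction files read by an agent, and the section titles and the `render_pr` helper are hypothetical. The four fields mirror what the PR step is described as including.

```python
from string import Template

# Hypothetical PR description template; section titles are illustrative.
PR_SKILL = Template(
    "## What changed\n$what\n\n"
    "## Why\n$why\n\n"
    "## How to test\n$how_to_test\n\n"
    "## Review artifacts\n$artifacts\n"
)


def render_pr(fields: dict[str, str]) -> str:
    """Fill the template; Template.substitute raises KeyError if a field is missing."""
    return PR_SKILL.substitute(fields)
```

Because `substitute` fails on a missing field, every PR produced this way has the same sections in the same order, regardless of who triggered it.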
What this actually changes
Teams that apply AI only at the code generation step get incremental speed improvements on writing code. Teams that apply it across the full delivery workflow get something different: fewer handoff failures, better traceability, more consistent output, and a junior developer who can follow a structured process that a senior developer designed.
The code generation part is almost a side effect. The real gain is in the workflow scaffolding around it. That's where we're focusing next.
Curious how this could work in your team's delivery cycle? We're happy to walk you through it.