Agentic Product Management
When the system refuses to let you ship a feature you haven't actually decided on.
A few weeks ago our refinement system blocked our team for a day.
We'd been working on our in-app notification center. The human PM had written the proposal, the agents had analyzed feasibility and explored the codebase. We were ready to create issues and start implementation. The system stopped us. It produced a list of unresolved ambiguities and refused to proceed.
The list was short:
Which activity types trigger notifications in-app vs just lock-screen push?
Should tapping a notification open the in-app notification center first, or go straight to the target screen, depending on the notification type?
Does the copy need to be translated?
Should users configure notification preferences per type?
These were all things we'd been about to leave implicit. We'd assumed our coding agents would figure it out. They would have figured something out, but it wouldn't have been the answer the team would have arrived at if we'd actually had the conversation.
So we had the conversation. It took an hour. Then the system unblocked.
If refinement is no longer a series of handoffs, what is the new shape of product management, and what new failure mode does it produce?
This is the second piece in a five-part series. The first argued documentation is context infrastructure. This part is about the layer upstream of implementation: where refined issues come from, and what changes about product management when refinement happens in real time with agents.
The PRD problem
The traditional model is well-known. A product manager writes a PRD or fills out a ticket template. A designer makes mockups. An engineer reads both, finds the gaps, asks questions, waits for answers, and starts coding with incomplete context.
Implementation issues surface mid-sprint. The team scrambles into urgent meetings to figure out what to do, and the result is often a half-baked feature.
The model has been broken for a long time. Knowledge degraded at every handoff. PRDs were specifications written before the people who'd implement them had a chance to ask questions about feasibility, challenge the complexity of certain requirements, or warn about risky migrations. Engineers built what was specified, even when what was specified was internally inconsistent or technically expensive in ways the PM didn't realize.
The fix wasn't more thorough PRDs. We've all written those. They get longer, not better. The fix was changing who's in the room when refinement happens.
The new refinement model
Refinement now happens in a single session with three parties: the human PM, the Product Manager agent acting as a sparring partner, and on-demand specialist agents that bring the rest of the team's knowledge, the codebase, and current metrics.
The human PM opens a session with a prompt. Something like: "We believe we have under-invested in re-engagement strategies. Users open the app once and drop because they forget about the app. Can you design an effective notification system to re-engage users in the first few weeks?"
The Product Manager agent reads the vision document, the strategy, and the current product metrics. It asks clarifying questions:
Is this worth building?
Why now?
Is there anything more important?
Are we confident it will work?
These aren't new questions. They've been on every "good PM" checklist for two decades. What's different is that they get asked in the moment, against the full context of the team's strategic situation, rather than surfacing in retrospectives after the wrong feature shipped.
The strategy gate
The Product Manager agent's job here isn't to be deferential. It synthesizes the strategic context (vision, strategy, prior decisions) and challenges the human PM's thinking. The first barrier against becoming a feature factory is the agent that says "your strategy says you prioritize X, and this isn't X. Explain why we're doing it anyway."
The decision stays with the human PM. The agent surfaces the case for and against; the call to proceed (or to drop the work) belongs to the human.
When the why is settled, the conversation shifts to scope and approach. Other agents enter as needed:
The Architect agent assesses feasibility against the existing system. It reads the building block view and the relevant ADRs, identifies which components are affected, flags conflicts with prior decisions, and produces an effort estimate.
The Developer agent explores the codebase. It identifies existing patterns the feature could reuse, surfaces dependencies, and flags edge cases that aren't visible from the outside.
The Designer agent extracts relevant design tokens and patterns from Figma. It links existing components that could be reused.
The Product Analytics agent reviews the current tracking plan and proposes changes to measure adoption and overall success of the feature.
The QA agent creates instructions for automated end-to-end tests, integration tests, and unit tests.
The Product Manager agent synthesizes all of this in real time. The human PM sees technical complexity emerge as they describe the feature.
The agents aren't autonomous. Each specialist on the team shapes the agent for their role around their thinking, knowledge, and non-negotiable rules. The Architect agent answers the way our staff engineer would. The Designer agent applies the same principles our designer would push back with. Those agents then ship to the whole team through a shared package, so any human PM can run sessions against the same Architect agent without pulling the engineering team into long refinement meetings.
This is the structural change: the same conversation, the same artifact, with all the relevant context loaded for each role. The engineers, designers, and analysts still get involved. By the time they do, the plan is already aligned with the strategy and anchored in the reality of the current system.
What often happens is that the change turns out too big to ship as one piece. The Project Manager agent (distinct from the Product Manager agent driving refinement) proposes splitting it into a sequence of shippable slices: one or two go out now, the rest land in later iterations.
This doesn't mean the result is always perfect. Specialists can't easily watch each other's refinement sessions, so we lean on an auto-healing loop instead: when a human spots a bug in how an agent handled something, the agent analyses its own process and proposes a fix to its own workflow.
That's the real leverage: humans see the friction, agents codify the fix, and productivity compounds over time.
The ambiguity gate
When the team feels the refinement is done, the system runs an ambiguity check.
It looks at every claim in the proposed scope and tags entries as resolved, assumed, or open. Resolved entries cite a source: an ADR, a section of documentation, a decision the human PM made during the session. Assumed entries are flagged for explicit confirmation. Open entries block the workflow.
The system will not produce issues until all entries are either resolved or explicitly deferred. There's no way to silently proceed with unresolved questions. This is structural, not advisory.
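The gate described above can be sketched in a few lines of Python. This is a minimal illustration, not our actual implementation: the names `ScopeEntry`, `Status`, `blocking_entries`, and `can_create_issues` are hypothetical, the example entries and their statuses are made up for the demo, and the sketch assumes that an assumed entry keeps blocking until someone explicitly confirms it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    RESOLVED = "resolved"
    ASSUMED = "assumed"
    OPEN = "open"


@dataclass
class ScopeEntry:
    claim: str
    status: Status
    source: Optional[str] = None  # ADR, documentation section, or session decision
    confirmed: bool = False       # assumption explicitly confirmed by a human
    deferred: bool = False        # explicitly deferred by the human PM


def blocking_entries(entries: List[ScopeEntry]) -> List[ScopeEntry]:
    """Return every entry that should stop issue creation."""
    blockers = []
    for entry in entries:
        if entry.deferred:
            continue  # an explicit deferral never blocks
        if entry.status is Status.RESOLVED and entry.source:
            continue  # resolved entries must cite a source
        if entry.status is Status.ASSUMED and entry.confirmed:
            continue  # assumption: confirmed assumptions pass the gate
        blockers.append(entry)
    return blockers


def can_create_issues(entries: List[ScopeEntry]) -> bool:
    """The gate is structural: any blocker means no issues are produced."""
    return not blocking_entries(entries)


# Hypothetical entries with made-up statuses, for illustration only.
scope = [
    ScopeEntry("Does the copy need to be translated?", Status.OPEN),
    ScopeEntry("Users configure preferences per type", Status.ASSUMED),
    ScopeEntry("Feature fits the current strategy", Status.RESOLVED,
               source="decision made during the session"),
]

print(can_create_issues(scope))  # False: one open and one unconfirmed entry

scope[0].deferred = True    # the human PM explicitly defers the question
scope[1].confirmed = True   # the assumption is explicitly confirmed
print(can_create_issues(scope))  # True
```

The point of the design is that `can_create_issues` has no override flag. The only ways past the gate are citing a source, confirming an assumption, or deferring on purpose.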
This is the part of the system that surprised us most. Most teams discover ambiguity during implementation, usually when an engineer hits a moment where the spec doesn't tell them what to do, asks Slack, gets contradictory answers from the human PM and the designer, and picks one. The ambiguity was always there; it just surfaced too late.
The gate moves the ambiguity discovery upstream. It feels painful in the moment (the team is blocked), but the cost of resolving the ambiguity is much lower at this stage than during implementation, where it would mean reverting code, re-running review, and renegotiating scope.
Complexity scoring
Once ambiguities are resolved, the Project Manager agent scores the change across six dimensions:
Components affected
Database migration complexity
Cross-cutting concerns
Number of acceptance criteria
Blast radius
Estimated effort
The score determines whether the change becomes a single issue or a project with multiple milestone-ordered issues. The threshold is conservative. We'd rather break something into two issues than try to ship a single issue that's actually two changes glued together.
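A rough sketch of how such a score could drive the single-issue-versus-project decision. The six fields mirror the dimensions listed above; everything else is an assumption made for illustration: the 1 to 5 scale, the unweighted sum, the `PROJECT_THRESHOLD` value, and the names `ComplexityScore` and `issue_shape`.

```python
from dataclasses import astuple, dataclass

PROJECT_THRESHOLD = 15  # hypothetical cut-off; lower means more conservative


@dataclass(frozen=True)
class ComplexityScore:
    # Each dimension is scored 1 (trivial) to 5 (severe); the scale is assumed.
    components_affected: int
    migration_complexity: int
    cross_cutting_concerns: int
    acceptance_criteria: int
    blast_radius: int
    estimated_effort: int

    def __post_init__(self) -> None:
        for value in astuple(self):
            if not 1 <= value <= 5:
                raise ValueError("each dimension must be scored from 1 to 5")

    def total(self) -> int:
        """Unweighted sum across the six dimensions."""
        return sum(astuple(self))


def issue_shape(score: ComplexityScore,
                threshold: int = PROJECT_THRESHOLD) -> str:
    """Decide whether the change is one issue or a multi-issue project."""
    return "project" if score.total() >= threshold else "single issue"


# Made-up scores for two hypothetical changes.
small_change = ComplexityScore(1, 1, 1, 2, 1, 1)
large_change = ComplexityScore(4, 3, 3, 4, 3, 4)

print(small_change.total(), issue_shape(small_change))  # 7 single issue
print(large_change.total(), issue_shape(large_change))  # 21 project
```

A conservative threshold in this sketch simply means a low `PROJECT_THRESHOLD`: borderline changes are split rather than shipped as one issue.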
The Product Manager agent and the human PM agree (or disagree and adjust). When they agree, the Project Manager agent creates the issues, or the project with its issues, populated with the full output of the refinement session.
What's in a refined issue
The output isn't a brief. It's a complete specification.
Why we're doing it: problem statement grounded in mission, vision, strategy
For whom: target user persona and their specific need
How: technical approach from the Architect agent, informed by the Developer agent's codebase exploration
Tests: BDD (Behavior-Driven Development) scenarios that double as acceptance criteria and test definitions, so any non-technical teammate can read what's being implemented and how it will be verified. Test responsibility is shared across every human on the team, not just engineering.
Tracking plan: just the analytics events and schemas this issue will add or change, scoped to the issue as it moves through backlog, in progress, and done
Release notes: draft notes ready to be posted on merge
Documentation references: the specific arc42 sections, ADRs, and glossary terms relevant to this change
Each BDD scenario describes observable behavior: no API status codes, no implementation details. They're conditions anyone in the company can verify and that agents translate directly into test cases.
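To make the shape of a refined issue concrete, here is a minimal sketch of it as a data structure. The field names follow the sections listed above, but the classes `RefinedIssue` and `Scenario`, the `missing_sections` helper, and the example scenario text are all hypothetical illustrations, not our real schema or a real scenario from the notification center work.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Scenario:
    """One BDD scenario, phrased as observable behavior only."""
    title: str
    given: str
    when: str
    then: str

    def render(self) -> str:
        return (
            f"Scenario: {self.title}\n"
            f"  Given {self.given}\n"
            f"  When {self.when}\n"
            f"  Then {self.then}"
        )


@dataclass
class RefinedIssue:
    why: str                      # problem statement tied to strategy
    for_whom: str                 # target persona and their need
    how: str                      # technical approach
    scenarios: List[Scenario] = field(default_factory=list)
    tracking_plan: List[str] = field(default_factory=list)
    release_notes: str = ""
    doc_references: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Names of sections that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical example content.
scenario = Scenario(
    title="Unread notification is visible",
    given="a user has one unread notification",
    when="they open the notification center",
    then="the notification is shown as unread",
)

issue = RefinedIssue(
    why="Re-engage users in their first few weeks",
    for_whom="New users who forget about the app",
    how="Approach proposed by the Architect agent",
    scenarios=[scenario],
)

print(scenario.render())
print(issue.missing_sections())
```

Note that the scenario mentions no status codes or internal components, only what a user would see. A completeness check like `missing_sections` is one way to keep an issue from being treated as refined while a section is still empty.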
This is what gets pasted into the AI tool when implementation begins. There's no follow-up "what did you mean by X?" message. Everything is in the issue. The agent reads it and starts building.
What changes for PMs
The metric for "good PM work" has shifted.
The old metric was throughput: how many tickets did you write, how detailed was the spec, how quickly did you respond to engineering questions. The PM was a translator, turning fuzzy product ideas into structured engineering work.
The new metric is closer to judgment quality. Did you challenge the right ideas? Did you resolve the ambiguities that mattered? Did you choose the right level of engineering for this feature based on how it would evolve later? Did you correctly identify the strategic alignment, the user need, the success metric? Did you bring the right context into the room? The agent handles the translation. The PM handles the thinking.
This sounds like a downgrade. It isn't. The human PM on our team is doing more product work, not less. The reason is that the operational overhead (managing tickets, chasing engineers for clarifications, triaging mid-sprint scope changes, writing release notes) has collapsed. What's left is the high-leverage work: deciding what to build, why, how much to invest, and how to know whether it worked.
There's more time for the upstream work: talking to users, digging through analytics, prototyping and testing with real users. These have always been the things PMs say they wish they had more time for. Removing the operational overhead actually creates that time.
Takeaways
Refinement is structural, not advisory. The ambiguity gate won't produce issues with unresolved questions, and the complexity score forces big changes into multi-issue projects. Both are guardrails the team can't talk past.
Agents synthesize and challenge; humans decide. The Product Manager agent reads the strategy, vision, and prior decisions, then pushes back when a feature doesn't fit. The call to proceed stays with the human PM.
Refined issues are complete specifications. The why, the who, the how, BDD tests anyone can read, the analytics events this issue will emit, the doc references that anchor it. There's no follow-up "what did you mean by X?" message.
The PM role shifts from translator to judge. Throughput stops being the metric; judgment quality replaces it. Did you challenge the right ideas, resolve the right ambiguities, bring the right context?
The system improves itself. When a human spots a bug in how an agent handled something, the agent analyses its own process and proposes a fix to its own workflow.
What we don't know yet
Refinement quality is downstream of human judgment. The agent challenges, scores, and blocks. But it can't tell you whether you're solving the right problem in the first place. If the human PM's strategic intuition is wrong, the refinement system will produce a beautifully specified version of the wrong feature. The bottleneck is judgment, not process.
The ambiguity gate has a calibration problem. Sometimes it blocks on questions that don't matter. Sometimes it lets through assumptions that should have been challenged. Tuning the gate is ongoing work, and there's no objective metric for "right calibration."
Refinement scales differently than execution. The build pipeline (the next piece) can run many issues in parallel. Refinement is a synchronous human conversation that can't trivially be parallelized. If the bottleneck moves to refinement, we'll need a different solution than "more agents."
This shifts what the PM hire looks like. A PM whose primary skill was writing detailed specs has less leverage in this model. A PM whose primary skill is asking the right questions, zooming in and out, exploring all options and making the right micro decisions has much more. We don't know yet how the hiring market absorbs this shift.
The honest claim is narrower than "PMs are obsolete" or "PMs are 10× more productive." It's: the operational overhead of product management collapses, and the judgment work becomes more visible. That's a meaningful change. Whether it's the right model for every team is a question we'll only answer after we've seen this work outside our own context.
Next in this series, in two weeks: Agentic DevOps. The build pipeline that takes a refined milestone and produces working, reviewed, merged code, with the same kind of structural reliability that CI/CD brought to deployment.



