The State of Engineering Teams in 2026: What DORA Data Actually Shows About AI-Assisted Development

The DORA Report 2025, Google’s annual analysis of 39,000+ software development professionals, documents something unusual. AI tool adoption among software developers has reached 90%. Individual metrics are up: developers using AI tools daily complete 21% more tasks and merge 98% more pull requests. Yet organizational delivery metrics are flat or declining: PR review time is up 441%, bugs per developer are up 54%, and incidents per pull request are up 242.7%.

Individual output is up. Team quality is down. This is the central engineering paradox of 2026, and understanding it changes how you structure teams, set review expectations, and evaluate the real productivity impact of AI tools.

What the Productivity Numbers Actually Mean

The 98% increase in merged pull requests sounds unambiguously positive until you read the associated metrics. PR size is up 51.3%. Review time is up 441%. Incidents per PR are up 242.7%. This describes a specific failure mode: AI tools are enabling individual developers to generate and merge code faster than teams can absorb and review it.
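To make the compounding concrete, here is a rough back-of-the-envelope sketch. It assumes the figures quoted above are measured on the same population and can be multiplied together, which is a simplification rather than something the report confirms; the function name and the baseline of 1.0 are illustrative only.

```python
def compound(*pct_increases: float) -> float:
    """Combine percentage increases multiplicatively into one multiplier."""
    multiplier = 1.0
    for pct in pct_increases:
        multiplier *= 1.0 + pct / 100.0
    return multiplier


# Figures quoted above; multiplying them together is an assumption.
code_volume = compound(98.0, 51.3)        # more PRs, each one larger
incident_volume = compound(98.0, 242.7)   # more PRs, each one riskier

print(f"Code to review: {code_volume:.2f}x baseline")   # 3.00x
print(f"Incidents:      {incident_volume:.2f}x baseline")  # 6.79x
```

Under that assumption, the same reviewers face roughly three times the code, and the incident count grows by a factor of almost seven.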

The bottleneck has shifted. Previously, the bottleneck in software development was writing code. AI coding tools have moved it downstream, to code review, integration, and quality verification. Teams that have adopted AI tools without updating their review practices are finding that code volume has increased while review capacity has remained constant. More code and the same reviewers mean longer review queues, less attention per line, and more errors.

An independent analysis by CodeRabbit published in December 2025 found that AI-coauthored pull requests contain approximately 1.7 times more issues than human-only pull requests when reviewed under existing review processes. The model generates more code but does not flag its own uncertainty on security-relevant logic, edge cases, or framework-specific anti-patterns as reliably as it flags syntax errors.

The 30% of developers who report little or no trust in AI-generated code are not technophobic. They are calibrated. They have seen AI models generate code that compiles, passes linters, and fails in production in ways that human-authored code with equivalent test coverage would not.

The Team Structure Implication: Judgment Is the New Bottleneck

Before AI tools, software development required roughly equivalent effort for specification, implementation, and review. AI tools have changed this ratio: implementation is dramatically faster, while specification (what do we actually need?) and review (is this what we specified, and is it correct?) have not sped up in proportion.

This has a direct implication for team composition. The skill that scales with AI tools is not the ability to write code quickly — AI handles that. It is the ability to:

Understand business requirements well enough to specify precisely what the AI should build
Review AI-generated output with enough architectural understanding to catch what the model got wrong
Make the judgment calls that determine whether a system design is appropriate, not just whether it compiles

These are senior engineering skills. They were always valuable. They are now the bottleneck.

The team composition implication: a smaller team of senior engineers with AI tools outperforms a larger team of junior engineers with the same tools. Not because the junior engineers are less capable, but because directing AI tools effectively and evaluating their output requires the contextual knowledge and architectural judgment that come from experience. A junior developer using an AI agent generates code faster. Without the judgment to evaluate what was generated, the velocity gain converts directly into technical debt.

This insight reshapes the hiring decision for engineering leaders. The question is not “how many engineers do I need?” It is “what is the senior-to-junior ratio that maximizes both throughput and quality given AI tool adoption levels?” The answer in 2026 is skewing more senior than it was in 2020.

Platform Engineering as the Prerequisite for AI Value

One of the most specific findings in DORA 2025 is a direct correlation between internal developer platform quality and an organization’s ability to extract value from AI coding tools. Organizations with high-quality internal platforms — consistent development environments, predictable deployment pipelines, standardized toolchains, self-service infrastructure — are extracting AI value. Organizations without them are not.

The mechanism is intuitive once stated: AI coding agents need consistent environments to operate in. When an agent generates code that depends on a specific version of a library, a deployment pipeline configuration, or an environment variable, it needs those to be predictable and documented. In organizations without a functioning internal developer platform, AI agents generate code that cannot be consistently tested or deployed because the surrounding infrastructure is inconsistent.

The practical implication: investment in platform engineering is not a prerequisite to adopting AI tools — it is a prerequisite to AI tools delivering their expected return. Organizations that have adopted AI tools and are not seeing the expected productivity gains should audit their developer platform quality before concluding the tools don’t work.

By 2026, 73% of platform engineering teams have integrated AI assistants into at least one developer workflow. The organizations that are getting the most value are the ones that treated platform engineering as infrastructure for AI adoption, not as a separate initiative.

The Quality Debt Accumulation Risk

The DORA 2025 data documents a specific quality degradation pattern: AI-assisted development is producing more code, merged faster, with less reviewer attention per line, at higher incident rates. If this pattern continues without intervention, organizations adopting AI tools broadly will keep accumulating quality debt until it surfaces as a significant reliability problem.

The signals to monitor:

Mean time to recovery (MTTR) trends. If MTTR is increasing while deployment frequency is increasing, you are shipping faster than your recovery capabilities are improving. This suggests review and testing are not keeping pace with output.
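One possible way to check this signal is to compare the trend of both series over the same window. The sketch below is a minimal illustration, not a prescribed method: the function names, the weekly granularity, and the sample numbers are all hypothetical, and a simple least-squares slope stands in for whatever trend measure a team prefers.

```python
from statistics import mean


def slope(values: list[float]) -> float:
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def recovery_lagging(weekly_deploys: list[float],
                     weekly_mttr_minutes: list[float]) -> bool:
    """True when deployment frequency and MTTR are both trending upward."""
    return slope(weekly_deploys) > 0 and slope(weekly_mttr_minutes) > 0


# Hypothetical data: six weeks of deploy counts and MTTR in minutes.
deploys = [12, 14, 15, 18, 21, 24]
mttr = [42, 45, 44, 51, 58, 63]
print(recovery_lagging(deploys, mttr))
```

A positive result is only a prompt to look closer; a short window or a single bad incident can tilt the slope on its own.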

Incident classification. What percentage of production incidents are traceable to recently merged AI-coauthored code? If this is disproportionate relative to the percentage of your codebase that is AI-generated, the review process for AI output is insufficient.
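The comparison described here can be reduced to a single ratio. The following is a sketch under stated assumptions: it presumes you can already attribute incidents and lines of code to AI-coauthored changes (which is the hard part in practice), and the function name and sample counts are hypothetical.

```python
def incident_disproportion(ai_incidents: int, total_incidents: int,
                           ai_lines: int, total_lines: int) -> float:
    """Ratio of AI-coauthored incident share to AI-coauthored code share.

    A value above 1.0 means AI-coauthored code causes more incidents
    than its share of the codebase would predict.
    """
    if total_incidents <= 0 or total_lines <= 0 or ai_lines <= 0:
        raise ValueError("totals and ai_lines must be positive")
    incident_share = ai_incidents / total_incidents
    code_share = ai_lines / total_lines
    return incident_share / code_share


# Hypothetical quarter: 18 of 40 incidents traced to AI-coauthored code,
# which makes up 30,000 of 120,000 lines.
ratio = incident_disproportion(18, 40, 30_000, 120_000)
print(f"{ratio:.2f}")  # 1.80
```

In this made-up example, AI-coauthored code accounts for 45% of incidents but 25% of the codebase, so it is overrepresented by a factor of 1.8.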

Test coverage trends on AI-coauthored code. AI tools are often better at generating implementation than tests. If AI-coauthored PRs have systematically lower test coverage than human-authored PRs, this is a leading indicator of future quality problems.

Code review throughput vs. code volume. If PR volume is increasing faster than reviewer throughput, either the review queue grows or the review process becomes a rubber stamp rather than a quality gate. The 441% increase in review time documented in DORA 2025 suggests reviewers are already saturated at many organizations.
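A simple way to see whether volume is outrunning reviewers is to track the pending-review backlog period by period. This is an illustrative sketch only: the function name, the weekly buckets, and the sample counts are hypothetical, and it ignores PR size, which a real analysis would likely weight in.

```python
def review_backlog(opened: list[int], reviewed: list[int]) -> list[int]:
    """PRs still awaiting review at the end of each period."""
    if len(opened) != len(reviewed):
        raise ValueError("opened and reviewed must cover the same periods")
    backlog = []
    pending = 0
    for new_prs, done in zip(opened, reviewed):
        # The backlog cannot go below zero even if reviewers have spare capacity.
        pending = max(0, pending + new_prs - done)
        backlog.append(pending)
    return backlog


# Hypothetical four weeks: PR volume climbs while review capacity barely moves.
opened = [40, 48, 55, 63]
reviewed = [40, 42, 44, 45]
print(review_backlog(opened, reviewed))  # [0, 6, 17, 35]
```

A backlog that grows every period means the team must either add review capacity or accept shallower reviews; a backlog that stays flat while volume climbs is worth checking for the second outcome.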

The Process Changes That Matter

The DORA data points to specific process adaptations that organizations using AI tools effectively have made. These are not adjustments around the edges — they represent meaningful changes to how engineering teams are structured and how review works.

Extended automated review pipelines. AI-generated code has different failure modes from human-generated code: it tends to be syntactically correct and to pass existing tests, but to fail in subtle ways related to security logic, edge case handling, and architectural anti-patterns. Organizations that are managing this well have extended their CI pipelines with automated security scanning, static analysis, and architecture linting that specifically targets the failure modes AI models are known to miss.

Tiered review requirements by code category. A generated boilerplate file and a generated payment processing function have different risk profiles. Organizations with effective AI governance apply different review requirements based on code sensitivity: security-relevant code, payment processing logic, and compliance-adjacent functionality require senior review regardless of origin; lower-risk code can flow through standard review processes.
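As a rough sketch of what such a tiering rule could look like, the snippet below routes a change to senior review based on the paths it touches. The path patterns and function name are hypothetical, and a real setup would more likely express this in the code host's own ownership or branch-protection configuration. Code origin is deliberately not an input, matching the "regardless of origin" rule above.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns; each repository would define its own.
SENIOR_REVIEW_PATTERNS = ("payments/*", "auth/*", "compliance/*")


def required_review(changed_paths: list[str]) -> str:
    """Return 'senior' if any changed file is in a sensitive area, else 'standard'."""
    for path in changed_paths:
        if any(fnmatchcase(path, pattern) for pattern in SENIOR_REVIEW_PATTERNS):
            return "senior"
    return "standard"


print(required_review(["docs/README.md", "payments/refund.py"]))  # senior
print(required_review(["docs/README.md"]))                        # standard
```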

Specification discipline as a practice. The bottleneck has moved from implementation to specification. Teams that have adapted their process have invested in specification quality: precise task definitions, explicit acceptance criteria, documented architectural constraints that the AI tool must operate within. The quality of the specification determines the quality of the AI output more than model selection does.

How we approach this at Insoftex

We have been using AI coding tools in production client work since Claude Code’s general availability in May 2025, with customer approval as a standing requirement, not an assumption. Our experience with the DORA paradox is direct: the tools are productive, and the risk they introduce is real.

The operational principle we apply: AI tools generate faster; senior engineers evaluate and own. Every AI-generated output that goes into client code has been reviewed by a senior engineer who takes architectural accountability for it — not reviewed in the sense of “it looks fine,” but reviewed against the specific constraints of the system it is entering, the compliance requirements it touches, and the failure modes that would not be visible from the code alone.

The team composition we apply for AI-assisted work is more senior-heavy than a traditional staffing model for the same scope. This is not because AI tools require more senior engineers (they require fewer engineers in total) but because the leverage they provide accrues to senior judgment, and the risk they introduce is concentrated in review rather than implementation. Three senior engineers with AI tools delivering the quality output of five or six traditional engineers is the ratio we see working consistently. A team of five junior engineers with AI tools is a technical debt accelerator.

The specification investment is the practice we have found most consistently valuable. Precise task boundaries, explicit constraints, documented acceptance criteria — written before the AI tool is invoked, not after. The overhead is real and the return is substantial: the AI output is better, the review is faster, and the rate of rework is lower.

Building or scaling an engineering team in the AI coding era? Our Scale & Evolve service delivers named senior engineering capacity with AI-assisted development under architectural oversight — the same ratio that the DORA data suggests works, without the hiring timeline. Book a technical call to discuss your specific situation.