--- title: "Can you prove AI ROI in Software Engineering? (120k Devs Study) – Yegor Denisov-Blanch, Stanford" type: "News" locale: "en" url: "https://longbridge.com/en/news/269440745.md" description: "Yegor Denisov-Blanch from Stanford addresses the challenge of proving AI ROI in software engineering. His research, presented at the AI Engineer Code Summit, highlights the pitfalls of relying on activity metrics and emphasizes the importance of measuring engineering outcomes. Analyzing data from 120,000 developers, the study reveals that strategic AI adoption and clean code environments amplify productivity gains. Denisov-Blanch proposes a framework focusing on engineering output and guardrail metrics to ensure effective AI integration, underscoring that AI ROI depends on thoughtful implementation and measurement." datetime: "2025-12-11T22:50:52.000Z" locales: - [zh-CN](https://longbridge.com/zh-CN/news/269440745.md) - [en](https://longbridge.com/en/news/269440745.md) - [zh-HK](https://longbridge.com/zh-HK/news/269440745.md) --- > Supported Languages: [简体中文](https://longbridge.com/zh-CN/news/269440745.md) | [繁體中文](https://longbridge.com/zh-HK/news/269440745.md) # Can you prove AI ROI in Software Engineering? (120k Devs Study) – Yegor Denisov-Blanch, Stanford The Quest for Measurable AI ROI in Software Engineering “Can you prove AI ROI in Software Engineering?” This question, posed by Yegor Denisov-Blanch, a researcher from Stanford, cuts to the heart of a critical challenge facing enterprises today. As companies pour millions into AI tools for software development, the ability to demonstrate tangible returns on this investment remains elusive for many. Denisov-Blanch’s presentation at the AI Engineer Code Summit aimed to demystify this complex issue, offering data-driven insights and a practical playbook for measuring AI’s true impact. Denisov-Blanch began by highlighting a common pitfall: the overreliance on activity metrics. 
While metrics like pull request (PR) counts or DORA scores can indicate increased activity, they often fail to prove actual improvement. “Benchmarks show models can write code, but in enterprise deployments ROI is hard to measure, easy to bias, and often distorted by activity metrics (PR counts, DORA) that say ‘more’ without proving ‘better’,” he stated. This disconnect between activity and genuine value creation is a key reason why many AI initiatives fall short of expectations.

To address this, Denisov-Blanch’s research team employed a sophisticated methodology. They analyzed data from over 120,000 software engineers across more than 600 companies, combining time-series analysis of Git historical data with cross-sectional analysis across companies. Crucially, they developed a machine learning model capable of replicating expert panel evaluations for every code commit. This model assessed factors such as implementation time, quality, maintainability, and complexity, and achieved “exceptional correlation” with expert judgments, with an R-squared value of 0.85.

One of the initial insights revealed was that “the rich get richer.” The data suggested a widening gap between early AI adopters who master its application and those who struggle. Denisov-Blanch illustrated this with a projection: “Illustrative AI Productivity Impact: Accelerating Divergence Through 2030.” It showed that top-performing teams, by effectively leveraging AI, could see their productivity gains accelerate, potentially creating a “10x gap” with laggard teams by 2030. This underscores the urgency for organizations not only to adopt AI but to do so strategically.

Further analysis revealed that simple metrics like “token spend” are a weak predictor of AI productivity gains. The research indicated a complex relationship, with a “sweet spot” for token usage.
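
The “exceptional correlation” reported for the commit-scoring model is summarized by an R-squared of 0.85, a statistic that is easy to make concrete. Below is a minimal sketch of that calculation; the per-commit scores and the 0–10 scale are hypothetical illustrations, not data from the study:

```python
def r_squared(expert, model):
    """Coefficient of determination between expert panel scores and
    model-predicted scores for the same set of commits."""
    n = len(expert)
    mean_e = sum(expert) / n
    ss_tot = sum((e - mean_e) ** 2 for e in expert)            # total variance
    ss_res = sum((e - m) ** 2 for e, m in zip(expert, model))  # residual variance
    return 1 - ss_res / ss_tot

# Hypothetical per-commit quality scores on a 0-10 scale (not study data)
expert_scores = [7.0, 4.5, 8.0, 6.0, 3.0, 9.0]
model_scores  = [6.8, 5.0, 7.6, 6.3, 3.4, 8.7]

print(round(r_squared(expert_scores, model_scores), 3))  # → 0.968
```

In the study the model predicts such scores from Git history; the direct comparison here is only meant to make the R-squared definition explicit.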
“AI usage quality matters more than usage volume,” Denisov-Blanch emphasized, pointing out that “token spend tells you who is using AI, not who is getting benefit.” This suggests that focusing solely on the quantity of AI interaction can be misleading.

The environment in which AI is deployed also plays a crucial role. Denisov-Blanch presented a “Task Composition by AI Involvement vs. Environment Cleanliness Index” chart, illustrating how clean engineering environments amplify AI’s benefits. He noted that “clean code amplifies AI gains,” allowing AI to “complete a larger share of sprint tasks.” Conversely, “AI use degrades cleanliness” in messy codebases, leading to negative outcomes. This highlights the importance of investing in code hygiene and good engineering practices as a foundation for successful AI adoption.

Measuring ROI effectively requires moving beyond simplistic metrics and focusing on engineering outcomes. Denisov-Blanch proposed a framework that ties AI usage to concrete engineering achievements. The primary metric should be “engineering output,” measured not just by lines of code or PRs, but by the quality and impact of the work. He stressed the importance of “guardrail metrics” to ensure that AI adoption doesn’t negatively affect critical aspects like code quality or introduce excessive rework. “Keep guardrail metrics healthy… while increasing the primary metric (engineering output),” he advised.

The research also identified different patterns of AI adoption. From “no observable AI use” (Level 0) to “orchestrated agentic workflows” (Level 4), the maturity of AI integration varied significantly. The data suggested that companies achieving the most substantial gains were those that moved beyond basic “systematized prompting” or “agent-backed development” to more sophisticated integrations where AI actively participates in complex workflows.

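
The “primary metric plus guardrails” advice can be expressed as a simple decision rule: count an AI rollout as a win only when engineering output rises while every guardrail stays healthy. A minimal sketch, with illustrative metric names and thresholds that are assumptions rather than values from the talk:

```python
from dataclasses import dataclass

@dataclass
class TeamMetrics:
    engineering_output: float  # primary metric, e.g. a weighted output score
    rework_rate: float         # share of merged code reworked soon after
    defect_rate: float         # escaped defects per release
    maintainability: float     # maintainability index, 0-100

# Illustrative guardrail thresholds (assumed, not from the study)
GUARDRAILS = {
    "rework_rate":     lambda m: m.rework_rate <= 0.15,
    "defect_rate":     lambda m: m.defect_rate <= 2.0,
    "maintainability": lambda m: m.maintainability >= 70,
}

def adoption_is_healthy(before: TeamMetrics, after: TeamMetrics) -> bool:
    """True only if the primary metric rose AND every guardrail holds."""
    output_up = after.engineering_output > before.engineering_output
    guardrails_ok = all(check(after) for check in GUARDRAILS.values())
    return output_up and guardrails_ok

before = TeamMetrics(100.0, 0.10, 1.5, 78)
after  = TeamMetrics(118.0, 0.12, 1.8, 74)  # more output, guardrails still healthy
print(adoption_is_healthy(before, after))   # prints True
```

A run where output jumps but rework exceeds its threshold would return `False`, which is exactly the failure mode the guardrails exist to catch.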

Ultimately, Denisov-Blanch’s findings underscore a critical truth: AI adoption is not a one-size-fits-all solution. The ROI of AI in software engineering depends heavily on how it is implemented, measured, and integrated into existing workflows and company culture. By focusing on clean code, thoughtful measurement of engineering outcomes, and strategic adoption patterns, organizations can move beyond the hype and unlock AI’s true potential to drive meaningful productivity gains.

### Related Stocks

- [C3.ai, Inc. (AI.US)](https://longbridge.com/en/quote/AI.US.md)

## Related News & Research

- [ROI-Does the AI business model have a fatal flaw?: Joachim Klement](https://longbridge.com/en/news/281309369.md)
- [Marc Andreessen Says 'Every Large Company Is Overstaffed:' AI Layoffs Are Just An Excuse, Not Job Loss Reality](https://longbridge.com/en/news/281342808.md)
- [TCS Rewires Enterprise Tech With AI](https://longbridge.com/en/news/280993412.md)
- [gateretail and JK Tech Partner to Advance AI-Powered Inflight Retail Intelligence](https://longbridge.com/en/news/280996267.md)
- [Navi AI emerges from stealth to accelerate pilot training with AI](https://longbridge.com/en/news/280443484.md)