Fall 2025: Identifying Soccer Talent by Estimating OBV using Standard KPIs

This work was conducted in collaboration with Denver Summit FC. As a result, certain technical details and proprietary aspects of the project cannot be shared publicly.

Background

On-Ball Value (OBV), developed by StatsBomb, is one of the most widely respected value metrics in soccer analytics. It estimates how much a player’s on-ball actions change a team’s chances of scoring, which lets us evaluate performance beyond traditional statistics like goals and assists.

StatsBomb data, however, covers fewer leagues and seasons than Wyscout and costs significantly more. Wyscout offers broader coverage across leagues and years, making it a more scalable data source for long-term talent identification.

The goal of this project was to bridge that gap by estimating OBV per 90 from standard Wyscout KPIs. Specifically, I aimed to project 2026 NWSL OBV per 90 for forwards, wingers, and attacking midfielders playing in Europe’s top women’s leagues (England, Spain, France, Germany, and Sweden), the NCAA, and the NWSL itself.

By adjusting for league strength and leveraging all available historical data, this approach provides a data-driven framework to identify high-impact attacking players worldwide who could translate their performance to the NWSL.

Methodology

The project followed a three-step approach to estimate OBV for players using standard Wyscout KPIs.

1. League Translation

Player-match data from Wyscout was used to translate performance KPIs from multiple leagues into their NWSL equivalents. This adjustment accounts for differences in league strength and style of play and was implemented using a Poisson regression framework. The result is a set of KPIs that are directly comparable across leagues, all expressed on an NWSL scale.
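
A minimal sketch of how such a translation could work, assuming a Poisson model in which a player's KPI count in a league has mean (90s played) x (player rate) x (league factor). The project's actual specification is not public, so the model structure, the function names fit_league_factors and to_nwsl_scale, and the toy data are all illustrative assumptions. For this two-factor model, each alternating update below is the exact conditional maximum-likelihood estimate, so no external library is needed.

```python
from collections import defaultdict


def fit_league_factors(rows, reference="NWSL", n_iter=200):
    """Fit count ~ Poisson(nineties * player_rate * league_factor).

    rows: iterable of (player, league, kpi_count, nineties_played).
    Returns league factors rescaled so that the reference league equals 1.0.
    Hypothetical helper, not the project's implementation.
    """
    rows = list(rows)
    players = {r[0] for r in rows}
    leagues = {r[1] for r in rows}
    factor = {league: 1.0 for league in leagues}
    for _ in range(n_iter):
        # Player update: observed count / expected count at a unit player rate.
        num, den = defaultdict(float), defaultdict(float)
        for player, league, count, nineties in rows:
            num[player] += count
            den[player] += nineties * factor[league]
        rate = {p: num[p] / den[p] for p in players}
        # League update: observed count / expected count at a unit league factor.
        num, den = defaultdict(float), defaultdict(float)
        for player, league, count, nineties in rows:
            num[league] += count
            den[league] += nineties * rate[player]
        factor = {lg: num[lg] / den[lg] for lg in leagues}
    ref = factor[reference]
    return {lg: f / ref for lg, f in factor.items()}


def to_nwsl_scale(kpi_per90, league, factors):
    """Divide out the league effect to express a per-90 KPI on the NWSL scale."""
    return kpi_per90 / factors[league]


# Toy data (made up): two players who each appear in both leagues.
toy_rows = [
    ("player_a", "NWSL", 20, 10.0),
    ("player_a", "League X", 30, 10.0),
    ("player_b", "NWSL", 10, 10.0),
    ("player_b", "League X", 15, 10.0),
]
toy_factors = fit_league_factors(toy_rows)
```

The league factors are identified only through players who appear in more than one league, so the sketch assumes the leagues are connected by such players and that every player and league has a nonzero count.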

2. KPI Projection

Each translated KPI was then projected forward to the 2026 NWSL season using a Marcel-style projection method. Historical player performance was aggregated from match-level data into player-season estimates. More recent seasons were weighted more heavily, and seasons with greater playing time received higher weight, allowing the projection to balance recency, sample size, and long-term performance trends while regressing appropriately toward league averages.
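
The weighting logic can be sketched as follows. This is a generic Marcel-style calculation, not the project's code: the 5/4/3 recency weights and the ballast of ten 90s of league-average play are placeholder assumptions, and marcel_projection is a hypothetical name.

```python
def marcel_projection(seasons, league_rate_per90,
                      weights=(5, 4, 3), ballast_nineties=10.0):
    """Project a per-90 KPI rate for next season.

    seasons: list of (kpi_total, nineties_played), most recent season first.
    Each season's totals are scaled by a recency weight, so a season counts
    for more when it is recent and when it contains more playing time.
    Adding ballast_nineties of league-average performance regresses the
    result toward league_rate_per90; players with little data regress most.
    Seasons beyond len(weights) are ignored.
    """
    weighted_total = ballast_nineties * league_rate_per90
    weighted_nineties = ballast_nineties
    for weight, (kpi_total, nineties) in zip(weights, seasons):
        weighted_total += weight * kpi_total
        weighted_nineties += weight * nineties
    return weighted_total / weighted_nineties


# Made-up example: 10 actions in 20 nineties last season, 6 in 15 the year before.
projected_rate = marcel_projection([(10, 20.0), (6, 15.0)], league_rate_per90=0.3)
```

A player with no history at all is projected at exactly the league rate, and the projection moves toward the player's own weighted rate as playing time accumulates.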

3. OBV Estimation

Finally, projected KPIs were combined to estimate On-Ball Value using a linear regression model trained on StatsBomb OBV. This required aligning player-season data between StatsBomb and Wyscout, enabling the model to learn how specific KPIs translate into overall on-ball impact. A carefully selected subset of influential KPIs was used to ensure interpretability while preserving predictive power.
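
As a rough illustration of this step, the sketch below fits an ordinary least squares model by solving the normal equations in plain Python. The two feature columns and all numbers are invented for the example; the KPIs actually used in the project are not public, and fit_ols and predict_obv are hypothetical names.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(features, targets):
    """Return [intercept, coef_1, ..., coef_k] minimizing squared error."""
    design = [[1.0] + list(row) for row in features]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_obv(coefs, kpi_row):
    """Apply fitted coefficients to one player's projected KPIs."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], kpi_row))


# Invented player-season rows: two hypothetical per-90 KPIs and a matched OBV per 90.
toy_kpis = [(1.0, 0.0), (0.0, 1.0), (2.0, 1.0), (1.0, 3.0), (3.0, 2.0)]
toy_obv = [0.15, 0.12, 0.22, 0.21, 0.29]
toy_coefs = fit_ols(toy_kpis, toy_obv)
toy_estimate = predict_obv(toy_coefs, (2.0, 2.0))
```

Keeping the feature set small, as the text describes, also keeps the normal equations well conditioned; with many correlated KPIs a regularized fit would be a more robust choice.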

Conclusion and Future Work

This project demonstrates how On-Ball Value can be approximated using widely available performance data, enabling talent identification across leagues that are not directly covered by proprietary models. By adjusting KPIs for league strength and projecting future performance, the framework provides a scalable way to compare players on a common baseline and identify high-impact talent ahead of time.

There are several clear avenues to further strengthen this approach. Adjusting KPIs for strength of schedule and incorporating possession-based normalization (for example, per 30 minutes of team possession) would help better isolate a player’s true impact from the opportunities provided by their team and opponents. This would allow the model to distinguish between volume driven by context and genuine efficiency.
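
To make the possession-based idea concrete, here is a small sketch. It assumes that possession time while a player is on the pitch can be approximated as minutes played multiplied by the team's possession share, which is a simplification; the function name kpi_per_30_possession is hypothetical.

```python
def kpi_per_30_possession(kpi_total, minutes_played, possession_share):
    """Rate of a KPI per 30 minutes of (approximate) team possession.

    possession_share is the team's share of possession as a fraction, e.g. 0.55.
    """
    possession_minutes = minutes_played * possession_share
    return 30.0 * kpi_total / possession_minutes


# Same raw output and minutes, different team context (made-up numbers).
low_possession_side = kpi_per_30_possession(6, 90.0, 0.40)
high_possession_side = kpi_per_30_possession(6, 90.0, 0.60)
```

Two players with identical per-90 output end up with different normalized rates: the one on the lower-possession team produced the same volume from fewer opportunities.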

Additionally, integrating off-ball information, such as off-ball runs, defensive engagement, and passing options from SkillCorner, would capture aspects of player contribution that are not reflected in traditional on-ball KPIs. These behaviors often shape attacking value before a touch is ever taken and could meaningfully influence OBV estimates. Incorporating this richer context would make the model more accurate, more predictive, and better aligned with how players are evaluated in real match environments.
