Analyzing data for the four major tennis tournaments

Between 2011 and 2017, I was working closely with the IBM group that was responsible for IBM's technical partnership with the four major tennis tournaments. These tournaments are the Australian Open, Roland-Garros, Wimbledon, and the US Open.

I pioneered the development of the “Keys to the Match” for the tournaments. I was solely responsible for the analytical work behind the solution, from the initial conceptualization through the data management to the development of the fully automated system that was in use during each of the tournaments from July 2011 through 2017.

The solution was part of IBM's SlamTracker application, alongside the real‑time match scores on each of the tournament websites, and it was used by ESPN in their broadcasts starting in 2012.

I was presented with IBM's Outstanding Technical Achievement Award in 2012 for my work on the project along with the team that was responsible for the technical infrastructure of the solution.

How it worked

The solution used the massive amounts of data that for many years have been collected for each point by courtside statisticians and by equipment such as radar guns that record the speed of serves and other shots. The data is acquired in real time and immediately made available to the fans through the SlamTracker application on the tournament websites.

The "Keys to the Match" (or just "Keys" for short) were an attempt to identify the most relevant of these many statistics for each player in a match and to find the value that the player should target for each of those statistics to increase their chances of winning a set. The idea was to identify something that the player does not always do, or that is not always important, but which will be significant against this particular opponent. The goal was to avoid the obvious objectives related to winning a set, such as "win more points" or "break the opponent's serve".

I started out by developing a profile for each player based on the available player statistics. These profiles were then used to train a clustering model that divided the players by their playing style, and another algorithm derived the players' ability based on their seedings and prior results. Each player was reassigned to the most appropriate playing style and ability level before each new match, as more recent information became available.
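A minimal sketch of how such a playing-style clustering could look in Python. The choice of k-means, the three profile statistics, the player names, and all numbers are assumptions made purely for illustration; the actual clustering algorithm and its inputs are not described here.

```python
from math import dist


def min_max_scale(rows):
    """Rescale every column to the 0-1 range so no single statistic dominates."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points to stay deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical profiles:
# (first-serve points won, net approaches per set, average rally length)
profiles = {
    "Player A": (0.80, 12.0, 3.1),
    "Player B": (0.62, 2.0, 6.8),
    "Player C": (0.78, 10.0, 3.4),
    "Player D": (0.60, 3.0, 7.2),
}

names = list(profiles)
scaled = min_max_scale(list(profiles.values()))
labels, _ = kmeans(scaled, k=2)
styles = dict(zip(names, labels))  # player name -> playing-style cluster id
print(styles)
```

In this toy data, the two players with strong serves and frequent net approaches end up in one cluster and the two players with long rallies in the other. Scaling the statistics first matters because raw percentages and raw counts live on very different ranges.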

For each upcoming match, the solution used data from past sets in matches between the two players and, if needed, also brought in data from sets between comparable players based on playing style and results. I first used a feature screening algorithm to rank the features by their correlation with the outcome of a set and then trained a propensity model on each of the ranked features, one at a time. Each propensity model was then interpreted, and the relevant cut-points were used as the "Keys to the Match".
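The screening and cut-point steps can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the production method: Pearson correlation stands in for the feature screening algorithm, a single-threshold rule stands in for the interpreted propensity model, and the statistic names and per-set values are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between a statistic and the 0/1 set outcome."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def rank_features(features, outcome):
    """Order statistic names by absolute correlation with winning the set."""
    scores = {name: abs(pearson(values, outcome)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


def best_cut_point(values, outcome):
    """Find the threshold and direction that best separate sets won from sets lost.

    Returns (accuracy, threshold, direction), or None if all values are equal.
    """
    candidates = sorted(set(values))
    best = None
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        for direction in (">=", "<"):
            predicted = [(v >= threshold) if direction == ">=" else (v < threshold)
                         for v in values]
            accuracy = sum(p == bool(o) for p, o in zip(predicted, outcome)) / len(values)
            if best is None or accuracy > best[0]:
                best = (accuracy, threshold, direction)
    return best


# Hypothetical per-set statistics for one player; outcome 1 = set won, 0 = set lost.
outcome = [1, 0, 1, 0, 1, 0, 1, 0]
features = {
    "first_serve_pct": [0.70, 0.58, 0.66, 0.55, 0.72, 0.60, 0.68, 0.57],
    "double_faults": [2, 3, 1, 2, 3, 2, 1, 3],
    "net_points_won": [4, 5, 5, 4, 5, 4, 4, 5],
}

ranked = rank_features(features, outcome)
accuracy, threshold, direction = best_cut_point(features[ranked[0]], outcome)
print(ranked[0], direction, threshold, accuracy)
```

The resulting rule for the top-ranked statistic is the kind of statement that could be phrased as a key, for example a target first-serve percentage to reach in a set. A real propensity model would also need enough sets and some form of validation before a cut-point could be trusted.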

In total, this approach generated around 5,500 distinct predictive models during each tournament and required no human supervision or intervention.

The visual presentation evolved over time

How the Keys to the Match were presented visually in the IBM SlamTracker changed over time along with the general design guidelines and the technology that powered it.

A look at how the IBM SlamTracker works

While I was at the All England Lawn Tennis and Croquet Club (AELTC) in 2012 to support both the IBM online presence and the ESPN broadcasting team, I was interviewed by British sports commentator Rob Walker for Live@Wimbledon. In the interview, I explain the fundamentals of the "Keys to the Match" in less than 5 minutes.

During the Roland-Garros tournament in 2012, I joined the ESPN production and graphics units in Bristol, CT, to find the best way to present the "Keys to the Match" to the TV viewers in the live broadcasts. It was a challenge to find a way to display such complex information on the screen without taking up too much real estate and for the commentators to present these results in the time available between two points.

At the following tournament at Wimbledon, we were ready for the tennis commentators to start using what had now been rebranded as the "IBM Insights" during the live broadcasts from the tournament.

I spent the duration of the 2012 Wimbledon tournament and the US Open the same year with the ESPN tennis commentators. I educated them on how the statistical models worked and how each of the metrics was defined, so that they could speak confidently and accurately about each of them. They gave some valuable feedback that I incorporated into the solution as quickly as I could, and they taught me more about tennis than I had ever hoped to learn. I primarily worked with Chris Fowler, Patrick McEnroe, Brad Gilbert, and Mike Tirico (now with NBC Sports), but also with Darren Cahill, Mary Joe Fernandez, and John McEnroe.

The video below shows the introduction of the "IBM Insights" as narrated by Tom Rinaldi from the fourth round match between Roger Federer (SUI) and Xavier Malisse (BEL) at the 2012 Wimbledon tournament and includes a post-match evaluation of the "IBM Insights" that were presented earlier in the match.

Working with ESPN

Sitting between Mike Tirico and Chris Fowler at ESPN's Wimbledon studio in 2012

Just as the visual presentation of the IBM SlamTracker evolved over time, so did the on-screen graphics used by ESPN.

"Kenneth and I worked together on a project called "Keys to the Match" which was a very innovative and exciting tennis analytics project. This project used existing tennis statistics to create entirely new tennis statistics using a complex mix of technologies and innovative approaches.

Kenneth was the perfect fit for this project. Not only was he an expert data scientist with remarkable analytics skills, he was also an expert in tennis statistics and an expert in several different technologies like SPSS and DB2. Kenneth's wide skillset allowed him to integrate with the team quickly and build a very impressive solution which more than accomplished the project goals. He was the perfect guy for the project.

Additionally, Kenneth was able to describe the solution and defend the science behind the solution. He did this with many different types of people ranging from tennis experts to data scientists that were not familiar with tennis and to the average person. The work was relatable to people who work in many different types of industries and was highly relevant for our project stakeholders.

Anyone who might search for "Kenneth Jensen Keys to the Match" will find references to this work. It generated a lot of interest at the time and also led to an internal company Outstanding Technical Achievement Award.

Kenneth is highly capable and was great to work with. I hope to have the opportunity to work with him again."

Stephen Hammer
Sports CTO and Distinguished Engineer
IBM Corp.

Media coverage

The New York Times (August 21, 2013)

As Tennis Stats Proliferate, Software Tries to Make Sense of It All

When fans think of statistics, baseball, football and basketball often come to mind first. But motion-capture cameras and other automated technologies are ubiquitous enough that a sport like tennis is being overrun with data ...

When the "Keys to the Match" were initially launched, IBM was able to use the solution more broadly as part of the marketing strategy to promote the advanced analytical software (IBM SPSS Modeler) that was the heart of the solution.

I spoke to many journalists and technology writers and explained the work that I had done to develop the underlying data structures and the complex analytics. However, most of the articles were fairly short and did not go into any depth or detail on the technical side.

ESPN Playbook (July 2, 2012)

Wimbledon Analyzed Like Never Before

Not a tennis expert? That’s OK. With the technology being utilized at Wimbledon, anyone can feel like a tennis expert -- even someone who still calls Anna Kournikova his favorite tennis player. Wimbledon has partnered with IBM to bring predictive and analytic technologies ...

ESPN (September 6, 2012)

Insightful ways to beat Serena

Although she still has to win two more matches, Serena Williams is considered the overwhelming favorite to win the 2012 U.S. Open.
Based on the numbers, you can understand why ...

Fast Company (February 14, 2014)

The world's top 10 most innovative companies in sports

With SlamTracker, IBM has created an online stats dashboard that cuts through reams of new data provided by motion-capture cameras and automated sensors.
The software’s specialty is its predictive analysis ...