Thursday, March 10, 2016

Is Kenpom biased in favor of significantly lower-ranked teams? (Yes.)

NOTE: Kenpom changed his rating system significantly for the 2017 season (after I did this study) so the results don't necessarily apply to his current system—indeed, there's reason to think the changes he made alleviate the issues discussed below.

One of the big problems in college basketball, particularly with regard to NCAA tourney selection, is giving proper weight to road games. It is difficult to intuitively grasp how much harder it is to win on the road against a lesser team than it is to win at home against a better team.

This question came to a head when Greg Shaheen was asked about Wichita State’s big home win over Utah, and specifically what he thought an equivalent road win would be. He answered somewhat absurdly: “Utah.”

Seth Burn set out to answer the question using Kenpom numbers, and what he found was that Wichita State playing No. 25 ranked Utah at home is equivalent to them playing No. 109 ranked New Mexico St. on the road. What this means is that the Kenpom system gives them the same chance (about 73%) of winning in each game.

This is kind of shocking, and my own investigation confirms that it is a correct statement about the Kenpom system. But that doesn’t mean it is necessarily a true fact about the universe. Because Kenpom could be wrong about these things. Importantly, it could be biased in favor of lower-rated home teams.

Indeed, it has been my anecdotal observation that this is in fact the case: Kenpom seems to have a systematic bias in favor of significantly lower rated teams when they play higher rated teams, particularly at home. So I set out to test this hypothesis.

First, let’s look at all games that Kenpom would predict the home team to have a margin of victory of +/- one point (favored by 1 or a 1-point underdog). By definition, these will be games where the home team is lower rated than the road team, because home-court advantage is built in. Here are the results:

Games where projected MOV is +/- 1 (Kenpom)
Total games: 372
Home team avg expected MOV: 0.01
Home team actual MOV: -1.81
Home team expected win %: .500
Home team actual win %: .438

What we see in these 372 games, which Kenpom would expect to be pretty much pick ‘ems, is that the better team (the road team) actually won 56.2% of the time and beat the projected margin by about 1.8 points on average.
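For anyone who wants to repeat this kind of check, here is a minimal sketch of how the four summary numbers above could be computed from a list of games. The `Game` record, its field names, and the `summarize` function are hypothetical names chosen for illustration; this is not the code used for this study, and it assumes you already have each system's projected margin and win probability for the home team.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Game:
    # Hypothetical record layout, assumed for this sketch.
    projected_home_margin: float    # system's projected home-team margin, in points
    projected_home_win_prob: float  # system's projected home-team win probability
    actual_home_margin: int         # final home-team margin (negative = home loss)


def summarize(games: Iterable[Game], lo: float = -1.0, hi: float = 1.0) -> Dict[str, float]:
    """Compare projections with results for games whose projected
    home margin falls within [lo, hi]."""
    selected = [g for g in games if lo <= g.projected_home_margin <= hi]
    n = len(selected)
    if n == 0:
        raise ValueError("no games fall inside the projected-margin window")
    return {
        "games": n,
        "expected_mov": sum(g.projected_home_margin for g in selected) / n,
        "actual_mov": sum(g.actual_home_margin for g in selected) / n,
        "expected_win_pct": sum(g.projected_home_win_prob for g in selected) / n,
        "actual_win_pct": sum(1 for g in selected if g.actual_home_margin > 0) / n,
    }
```

A well-calibrated system should show expected and actual values that sit close together in any reasonably large slice of games; a persistent gap in one direction is what I am calling bias.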

Now, this could just be a quirk of this season—maybe the home teams are just underperforming in close games. But when I run the same test using the T-Rank algorithm, the bias pretty much disappears:

Games where projected MOV is +/- 1 (T-Rank)
Total games: 345
Home team avg expected MOV: 0.06
Home team actual MOV: -0.21
Home team expected win %: .502
Home team actual win %: .487

This still shows the (better) road team slightly outperforming the projections, but by much less, and the gap looks much more like random variance.
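One rough way to put a number on "random variance" is a z-score on the win percentage. The sketch below uses a normal approximation and assumes the games are independent and that each one carries the group's average expected win probability. Both assumptions are simplifications, and this was not part of the original study, so treat the output as a sanity check rather than a formal test. The function name is made up for this example.

```python
import math


def win_pct_z_score(games: int, expected_pct: float, actual_pct: float) -> float:
    """Standard errors between actual and expected win percentage,
    using a normal approximation to the binomial (an assumption)."""
    standard_error = math.sqrt(expected_pct * (1.0 - expected_pct) / games)
    return (actual_pct - expected_pct) / standard_error


# Figures taken from the two +/- 1 point tables above.
kenpom_z = win_pct_z_score(372, 0.500, 0.438)
trank_z = win_pct_z_score(345, 0.502, 0.487)

print(f"Kenpom: {kenpom_z:.2f} standard errors")
print(f"T-Rank: {trank_z:.2f} standard errors")
```

With these inputs the Kenpom home teams land roughly 2.4 standard errors below expectation, while the T-Rank home teams land roughly 0.6 below, which is well within ordinary noise.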

Next I wanted to test my impression that Kenpom particularly breaks down when the spread between the teams is larger (e.g., games like the nearly 100-spot spread between Wichita St. and New Mexico St.). So I looked at games where the home team was rated more than 50 spots lower than the road team. Using Kenpom projections, here are the results:

Games where home team is more than 50 spots lower than road team (Kenpom)
Total games: 1227
Home team avg expected MOV: -4.34
Home team actual MOV: -6.65
Home team expected win %: .345
Home team actual win %: .271

In this rather large set of games, the home team performs 2.3 points worse, on average, than Kenpom’s system would project, and wins only about 79% as often as projected (.271 versus .345). Compare this to the same experiment with T-Rank:

Games where home team is more than 50 spots lower than road team (T-Rank)
Total games: 1150
Home team avg expected MOV: -6.05
Home team actual MOV: -6.80
Home team expected win %: .311
Home team actual win %: .272

Again, we still see the projections lean slightly in favor of the home team, but by much less than with the Kenpom system.

Overall, Kenpom has a very good record of prediction and projection, so if there is a systematic bias in these mismatches, we would expect it to be counterbalanced by good results between more evenly matched teams. And that is in fact the case:

Games where home and road teams are ranked within 50 spots of each other (Kenpom)
Total games: 1804
Home team avg expected MOV: 4.1
Home team actual MOV: 3.58
Home team expected win %: .645
Home team actual win %: .652

(For the record, T-Rank performs similarly in these games, but not quite as well.)

In this set of games, the Kenpom algorithm is extremely well calibrated. My speculation is that the shift from Kenpom 1.0 to Kenpom 2.0 made the algorithm more accurate in these (more common and more important) games, at the expense of some loss of calibration in more mismatched games. This seems worth it, and you can see why it would improve the overall performance of the system.

But I think it’s important to consider this evidence that Kenpom’s projections are systematically biased in favor of significantly lower ranked teams when doing this calculation about home/road equivalencies, because the results using Kenpom are not only unintuitive, but in all likelihood just plain wrong.

For example, using T-Rank the equivalent road game to Utah at home for Wichita State is a hypothetical team between No. 83 Temple and No. 84 Hawaii. Utah is ranked 25th in both systems, so this is a great comparison: Kenpom says the road equivalent is No. 109 New Mexico St., and T-Rank says it’s No. 83 Temple or No. 84 Hawaii. Since T-Rank gets better results in these kinds of games, and its results are more in line with most of our intuitions, I’m sticking with T-Rank.

***As to why these differences between T-Rank and Kenpom exist, I believe it has to do with the different "spread" the systems create, and the different "exponents" we use to calculate the Pythagorean Expectation (Kenpom uses 11.5, T-Rank uses 10.25). The different spread is caused mainly by the fact that Kenpom caps margin of victory and the effect of blowouts in mismatches much more aggressively than T-Rank does. My hypothesis is that by mostly ignoring those mismatched games, his system ends up being less accurate in mismatched games, but the upside is that it may be more accurate in games between teams of similar quality.
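To make the exponent point concrete, here is a small sketch of the Pythagorean Expectation with the two exponents mentioned above. The efficiency numbers are invented for illustration, and combining the two ratings with the log5 formula on a neutral court is an assumption of this sketch; it ignores home-court advantage and margin-of-victory capping, so it is not a reproduction of either system.

```python
def pythagorean_expectation(off_eff: float, def_eff: float, exponent: float) -> float:
    """Expected win percentage from offensive and defensive efficiency."""
    return off_eff ** exponent / (off_eff ** exponent + def_eff ** exponent)


def log5(p_a: float, p_b: float) -> float:
    """Chance that team A beats team B, given each team's expected
    win percentage (log5 formula; neutral court assumed)."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2.0 * p_a * p_b)


# Hypothetical teams: (points scored, points allowed) per 100 possessions.
strong_team = (115.0, 95.0)
weak_team = (100.0, 105.0)

for label, exponent in (("exponent 11.5", 11.5), ("exponent 10.25", 10.25)):
    p_strong = pythagorean_expectation(*strong_team, exponent)
    p_weak = pythagorean_expectation(*weak_team, exponent)
    print(f"{label}: strong {p_strong:.3f}, weak {p_weak:.3f}, "
          f"strong beats weak {log5(p_strong, p_weak):.3f}")
```

Given identical efficiencies, the larger exponent pushes ratings and matchup probabilities further from .500. The two systems do not produce identical efficiencies, though, which is where the different treatment of blowouts comes in.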


  1. I know this is a bit old, but very insightful. Thanks for the post.