Friday, September 30, 2016

Kenpom esoterica


As you may be aware, I've spent the college basketball offseason upgrading and backfilling the T-Rank website. I've now got player stats back to 2009-10 and team stats back to 2008-09.

For quality control, I checked some of the results against the stats on Kenpom.com. From 2013 forward, they are basically identical. But for 2012 and earlier, there are small but systematic differences. For example, the raw points per possession for each game (available on the T-Rank "results" page and the Kenpom "Game Plan" page for each team) are usually off by a point or two per 100 possessions.

Based on the nature of the discrepancies, I deduced the differences were likely caused by differences in calculated possessions. Although "possessions" is pretty much the foundation of tempo-free basketball statistics, it's not an officially kept stat, and has to be calculated from other box score stats. The basic formula that both T-Rank and Kenpom use is:

Possessions = Field Goal Attempts + Turnovers + (0.475 * Free Throw Attempts) - Offensive Rebounds
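As a minimal sketch, the formula above can be written as a small Python function. The function name, parameter names, and the sample inputs are illustrative only (they are not taken from either site's code or from a real game); the 0.475 free-throw weight is the one piece that comes from the formula itself.

```python
def possessions(fga, to, fta, orb, ft_weight=0.475):
    """Estimate one team's possessions from box score totals."""
    return fga + to + ft_weight * fta - orb


# Hypothetical box score line, for illustration only (not a real game)
example = possessions(fga=55, to=12, fta=20, orb=10)
print(round(example, 2))  # 66.5
```

Note that every offensive rebound subtracts a full possession, which is why missing or misassigned team offensive rebounds move the estimate so directly.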

The first thing I wanted to check was whether I had different underlying boxscore data than Kenpom was using. My box scores for those older games are from ESPN, and I spot checked them against official team boxscores to feel confident that they are correct. But I couldn't check what Kenpom was using, because he doesn't publish his box score data for games prior to 2013. I took this as a clue that his box score data for 2012 and earlier is somewhat lacking.

The main stats lacking in old box scores that can affect tempo-free statistics are team rebounds (particularly team offensive rebounds) and team turnovers. Sometimes you'll see box scores that show team totals which are just the sum of player totals, and therefore don't include team rebounds and team turnovers. Most games have a few team rebounds and a couple team turnovers, and if we don't have those stats the calculated possessions will be less accurate.

The ESPN boxscores I used do include team rebounds and team turnovers. Other old boxscores, like those available at Basketball Reference, do not. This gave me an opportunity to see if I could somehow calculate Kenpom's results using the incomplete boxscores.

And I did! What I figured out is that Kenpom's underlying boxscore data from that era apparently doesn't include team offensive rebounds or team defensive rebounds, but does include total rebounds that include team rebounds. So Kenpom knows how many team rebounds each team got, but doesn't know whether they are offensive or defensive team rebounds.

What he apparently did with this data was to assume half of the team rebounds were offensive and half were defensive. I am pretty much positive this is what he did, because doing this also solves another mystery, which is how Kenpom was calculating his rebounding percentages for these older games.

Just for fun, let's walk through an example picked relatively at random: Ohio State's 62-60 loss to Kentucky in the 2011 NCAA tournament. According to T-Rank, Ohio State's PPP that game was 101.6, and Kentucky's was 105.0, based on 59.05 calculated possessions per team:


Team       FGA  FTA  ORB  DRB  TRB  TO  Score  Poss   PPP
Ohio St.    58   22   16   20   36   7     60  59.45  101.6
Kentucky    48   14    7   25   32  11     62  58.65  105.0
Avg:                                           59.05

Ohio State's offensive rebounding percentage (ORB / (ORB + opponents DRB)) was 39% and Kentucky's was 25.9%.
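Those numbers can be reproduced with a short script. This is just a sketch of the arithmetic using the complete box score from the table above; the dictionary layout and helper names are my own shorthand for this example, not how T-Rank actually stores or computes anything.

```python
def possessions(fga, to, fta, orb):
    """FGA + TO + 0.475 * FTA - ORB for one team."""
    return fga + to + 0.475 * fta - orb


# Complete box score (team rebounds included), copied from the table above
ohio_st = {"fga": 58, "fta": 22, "orb": 16, "drb": 20, "to": 7, "pts": 60}
kentucky = {"fga": 48, "fta": 14, "orb": 7, "drb": 25, "to": 11, "pts": 62}


def game_possessions(a, b):
    """Average the two teams' individual possession estimates."""
    pa = possessions(a["fga"], a["to"], a["fta"], a["orb"])
    pb = possessions(b["fga"], b["to"], b["fta"], b["orb"])
    return (pa + pb) / 2


def ppp(team, poss):
    """Points per 100 possessions."""
    return 100 * team["pts"] / poss


def orb_pct(team, opp):
    """Offensive rebounding percentage: ORB / (ORB + opponent's DRB)."""
    return 100 * team["orb"] / (team["orb"] + opp["drb"])


poss = game_possessions(ohio_st, kentucky)
print(round(poss, 2))                       # 59.05
print(round(ppp(ohio_st, poss), 1))         # 101.6
print(round(ppp(kentucky, poss), 1))        # 105.0
print(round(orb_pct(ohio_st, kentucky), 1)) # 39.0
print(round(orb_pct(kentucky, ohio_st), 1)) # 25.9
```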

But if you look at Kenpom, it gives Ohio State a PPP of 99.5 and has Kentucky at 102.8 on 60 possessions. A fairly significant difference! But I can produce those numbers using the incomplete box score available at Basketball Reference (and at the old version of ESPN, which is secretly still accessible at "proxy.espn.com"). Here are the raw stats you can get there:

Team       FGA  FTA  ORB  DRB  TRB  TO
Ohio St.    58   22   10   20  36*   7
Kentucky    48   14    7   24  32*  11

I've put asterisks in the total rebound column because that's not actually the figure shown on the incomplete boxscores I have found (they show 30 and 31, just the sum of the listed offensive and defensive rebounds). I'm assuming that Kenpom must have had access to a total rebound figure that included team rebounds, and I have reason to believe that data used to be available. The next step is to divide those "missing rebounds" (6 for Ohio State and 1 for Kentucky) equally into the offensive and defensive columns, yielding:

Team       FGA  FTA  ORB   DRB   TRB  TO  Score  Poss   PPP
Ohio St.    58   22  13    23     36   7     60  62.45   99.5
Kentucky    48   14   7.5  24.5   32  11     62  58.15  102.8
Avg:                                             60.3

This exactly nails the Kenpom PPP for both teams by adding an extra 1.25 possessions per team, thanks to 2.5 fewer offensive rebounds. It also matches up with the rebounding percentages Kenpom has for this game: Ohio State at 34.7% (13 / (13 + 24.5)) and Kentucky at 24.6% (7.5 / (7.5 + 23)).
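Here is the same reconstruction as a sketch in code. It starts from the incomplete box score, takes the total rebound figures (36 and 32) as given, and splits the unattributed rebounds evenly. To be clear, the even split is my inference about what Kenpom did, and the function names are made up for this example.

```python
def possessions(fga, to, fta, orb):
    """FGA + TO + 0.475 * FTA - ORB for one team."""
    return fga + to + 0.475 * fta - orb


def split_team_rebounds(orb, drb, trb):
    """Assign half of the unattributed (team) rebounds to each column."""
    missing = trb - (orb + drb)
    return orb + missing / 2, drb + missing / 2


# Incomplete box score: player-only ORB/DRB plus the assumed total rebounds
osu_orb, osu_drb = split_team_rebounds(orb=10, drb=20, trb=36)  # 13.0, 23.0
uk_orb, uk_drb = split_team_rebounds(orb=7, drb=24, trb=32)     # 7.5, 24.5

# Average of the two teams' possession estimates
poss = (possessions(58, 7, 22, osu_orb) + possessions(48, 11, 14, uk_orb)) / 2

print(round(poss, 2))                                # 60.3
print(round(100 * 60 / poss, 1))                     # 99.5  (Ohio St. PPP)
print(round(100 * 62 / poss, 1))                     # 102.8 (Kentucky PPP)
print(round(100 * osu_orb / (osu_orb + uk_drb), 1))  # 34.7  (Ohio St. ORB%)
print(round(100 * uk_orb / (uk_orb + osu_drb), 1))   # 24.6  (Kentucky ORB%)
```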

This game had an unusually large number of team rebounds, and all but one were offensive. As a result, the Kenpom possession estimation is quite a bit off (over a full possession from that calculated using complete data) and the rebounding percentages are even more skewed.

Moral of the story: always trust T-Rank.


Tuesday, September 27, 2016

UW Michigan

I'll start by agreeing with my co-blogger on his last post. My attempts to predict the results of football games are entirely futile. I don't think that I have any special insight into this team, or any other team for that matter. I just like to throw out my predictions for fun and to track what I was thinking at the time of the games. If you look back at my record against the spread over the time I have posted predictions, (surprise, surprise) I get about 50% wrong. I'm now 2-2 on this season.
But that means I'm getting 50% right which is the only half that matters, so here I go.
Defense still rules the day in this matchup, with another low over/under of 45 points, but the Badgers are a 10.5 point underdog. I like double digit dogs when you have a low over/under, so I'm taking the Badgers and the points.

Sunday, September 25, 2016

How good is Wisconsin?

Last week Chorlton foolishly bet against the Badgers. I called him out for his lack of faith and predicted that the Badgers would win by 19.*

At this point I guess I should break something to the more earnest among you: all my predictions are jokes.

The truth is I don't have strong opinions about what will happen in college football games. The ancients called this "wisdom." Because no one knows what will happen in college football games. This is why when some accuse me of "overconfidence" about the Badgers I am genuinely befuddled. I have no confidence in the Badger football team.** I have no confidence in any football teams. I just watch the dang games and cheer and hope and drink and pass out. I do make predictions but -- and I can't emphasize this enough -- all my predictions are jokes.

But sometimes, very rarely, it happens that life makes my jokes unfunny. Saturday was such an occasion. My joke prediction* of a 19-point Badger win over MSU became a straight man: the Badgers won by 24, dominating all three phases. Well, at least two. But really three. (Is there a fourth phase? I think it's plasma.)

Are the Badgers 24-points-plus-home-field-advantage better than the Spartans? That seems unlikely. They benefitted from some freak plays -- a crazy fumble returned for an unlikely touchdown; a dropped snap on a punt from the 5-yard-line, etc. -- that made the score what it was. But college football is crazy plays. I mean, come on, you've watched it before, right? Sometimes you get em, sometimes they get you. MSU got got Saturday, and it was great.

But that leaves the question: how good is Wisconsin, really? It shouldn't surprise you to learn that I have no fucking idea. I think they're "pretty good." And they've already won enough games this year to prove that, objectively. This is the great thing about being a Badgers fan: beat a couple top-10 teams, and the season is a success. We don't feel entitled to national-championship contenders, and we don't feel bad when they don't materialize. We just cheer and hope and drink and pass out. Then we wake up in Spring to find the basketball team is in the Sweet 16 again.

Life is good.


*Later, my account was hack'd.

**As someone who came of age in the 80s, this is constitutional.

Tuesday, September 20, 2016

UW MSU

2-1 after the almost debacle last week. Bucky is beat up, has QB issues, and Sparty is tough at home. UW is only a 5.5 point underdog. I like Sparty at home to win and cover.

Thursday, September 15, 2016

UW Georgia State

2-0 after last week. This is going to be a quick one. Badgers are favored by 35 points. 

Odds makers are still not giving the UW offense much credit. The over/under for this game is only 50 points. I know the UW defense has been great which will keep total scoring down, but that is too low. In 2 losses so far this year Georgia State gave up 31 points at home to Ball State and 48 on the road to Air Force. UW put up 54 last week against a similar but maybe slightly better Akron defense. Seems very likely that UW goes over 50 again this week. 
I'm taking UW and giving the points. 

Wednesday, September 14, 2016

Who's gaming the RPI this year?

Among the RPI's well-known flaws is that it can be easily gamed. As Luke Winn explained several years ago:
"Seventy-five percent of the RPI formula is about strength of schedule (SOS), and because the RPI uses the flawed metric of raw winning percentage to assess SOS, it fails to provide a true measure of the quality of opponents. The truest measure available is kenpom.com's NCSOS ranking, which creates a pythagorean winning percentage based on opponents' adjusted efficiency, and even adjusts for home/neutral/road situations, which the SOS portion of RPI does not."
So in the RPI, your schedule is essentially your destiny. To show this, I set up a hypothetical bubble team (with a pythag of .8000 on a neutral court) and ran the RPI for that team playing every team's announced 2016-17 schedule. Obviously, this excludes later rounds of holiday tournaments and unannounced games, but these have just minor effects at this point.

The results are available on the T-Rank bubble-rpi page. About half the schedules produce a bubble-team rank between 40 and 60, which is what you'd expect since the hypothetical bubble team in question would be around #50 in the T-Rank. So a bubble-rpi rank around 50 shows that a team's current schedule is reasonably neutral for RPI purposes.

The "best" schedule for maximizing the RPI of a bubble team is North Carolina's. A bubble team playing North Carolina's announced schedule (notably missing two rounds in Maui, including a possible game against Wisconsin) would be expected to go 17-12 and rank 16th in the RPI. Would that be enough to get into the tournament? Assuming those 17 wins include a number of top 50 conquests, I think so. It compares to what Oregon State did last year: 18-12 on Selection Sunday with an RPI rank of 33 and a number of "good wins" got them a 7-seed (!) despite a Kenpom / T-Rank around 60th.

That said, a schedule like North Carolina's is probably not the most advisable for a true bubble team, because it comes to its high ranking rather honestly: by playing a lot of tough games. Sure, the average bubble team would win 17 games, but a bubble team that got a few bad bounces could easily miss the NIT with that schedule.

The schedule with the best mix of good projected record and good projected RPI rank is probably Rhode Island's. A bubble team playing Rhode Island's schedule would project to 21-8 with an RPI rank of 18 -- pretty much a sure thing for the tournament. Rhode Island will also play either Duke or Penn St. in their preseason tournament, in which case the projected RPI rank changes to 17 or 20, respectively. In any case, a bubble team playing that schedule is looking at very likely at least 19 regular season wins and a top 20 RPI. Well done Rams!

How did they do it? The old-fashioned way: lots of beatable mid-majors, and no worthless sub-250 cupcakes. The only downside of their schedule is that it doesn't provide a lot of opportunity for resume-building top-50 wins, and that's why T-Rank currently projects the Rams among the last 4 teams into the tournament (FWIW).

On the flip side, the schedule with the absolute worst RPI profile belongs to North Carolina Central out of the MEAC. A bubble team playing that schedule would be expected to go 22-3 but rank 128th in the RPI. The big problem for NC Central is the MEAC: it has no good teams, and they're all going to get ground to dust in the non-conference.
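To see how a 22-3 record can still produce a worse RPI than a 17-12 record, recall the basic RPI blend: 25% a team's own winning percentage (WP), 50% its opponents' winning percentage (OWP), and 25% its opponents' opponents' winning percentage (OOWP). Here is a rough sketch of that blend. It ignores the home/road weighting the NCAA applies to a team's own winning percentage, and the OWP and OOWP inputs are made-up placeholders for a weak and a strong schedule, not the actual numbers behind the bubble-rpi page.

```python
def rpi(wp, owp, oowp):
    """Basic RPI blend: 25% own record, 75% schedule (OWP and OOWP)."""
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp


# Records are the bubble-team projections quoted in this post.
# The OWP / OOWP values are hypothetical placeholders, NOT real schedule data.
cupcake_slate = rpi(wp=22 / 25, owp=0.42, oowp=0.45)  # 22-3 vs. weak opponents
tough_slate = rpi(wp=17 / 29, owp=0.62, oowp=0.56)    # 17-12 vs. strong opponents

print(round(cupcake_slate, 4), round(tough_slate, 4))
print(tough_slate > cupcake_slate)
```

With inputs like these, the five extra wins are more than wiped out by the schedule terms, because the team's own record only carries a quarter of the weight.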

But of course no potential bubble team plays a schedule like NC Central's, so let's look instead at the worst RPI schedules among high major teams that might have designs on a tournament berth. In that cohort, there are really just four teams that have unusually unfavorable schedules:


Texas Tech rather famously gamed the RPI last year, but Tubby's successor will have a much less favorable slate this year. Their non-conference schedule includes a pathetic seven games against sub-250 projected teams, plus #232 North Texas.  That said, they benefit from playing in the Big 12, which projects to be strong top to bottom, so even if they are a bubble-quality team this year (and T-Rank thinks they'll be slightly better than that) they should pick up enough quality wins in conference play to neutralize the stigma of a low raw RPI rank.

Two teams that could suffer from their unfavorable schedules are Utah and Northwestern. Of Utah's eight scheduled D-I non-conference games, six are of the RPI-killing cupcake variety. Throw in Utah's two games against non-D-I teams (which don't count for RPI) and 80% of Utah's scheduled non-conference is garbage. Utah does have two games TBD in the Diamond Head Classic, and if they play Illinois St. and San Diego St. that would lift the bubble-projected RPI to 58th. But they've got probably an equal chance of playing Hawaii and Tulsa, which would change the projection back to 68th. Given that Utah could well be a bubble team this year, this schedule could do them in.

Northwestern is less likely to be a bubble-quality team, but if it is, its schedule could be a limiting factor. Northwestern also plays six RPI-killing cupcakes. Even more respectable opponents like DePaul and Wake Forest are probably a negative, because those are major conference doormats likely to end up with a bad record -- but they're also very capable of pulling an upset. So Northwestern is taking on the risk of a loss without getting any SOS bump.

Ultimately, this exercise illustrates the most damning thing about the RPI: a hypothetical bubble team could finish anywhere between 16th and 128th, completely dependent on its schedule. In other words, the RPI is primarily a metric that measures schedule quality, not team quality.

Sunday, September 11, 2016

Wisconsin is better than michigan

This will be the first feature in what I hope to make an ongoing series on this blog. My girlfriend Tina often speaks of how great her alma mater is, and I often have to point out how much better UW is. Last season I heard quite a bit about how easy UW's schedule was as an excuse for why michigan couldn't win more games than UW. I have heard less of that this season, but still heard critiques when UW played Akron (never mind that michigan played Hawaii and UCF).

UW has had trouble scheduling top teams in the past, mostly because no one wants to play (and likely lose) at Camp Randall. I decided to check the past 10 years and see if UW or michigan had a tougher go of it. Over the last 10 years from 2007-2016 michigan played 34 teams ranked in the top 25 when the game was played, and UW played 35. michigan played 4 teams ranked in the top 5, and UW played 6.

Here's the best part. UW has always been criticized for being able to beat teams they should beat, but not beating the best teams with elite athletes. Over the 10 years from 2007-2016, UW was a respectable 16-19 vs the top 25, and 2-4 vs the top 5. Over the same period michigan was 10-24 vs the top 25, and 0-4 vs the top 5.

Suck on that michigan.

Thursday, September 8, 2016

UW Akron

Pretty fun game last week. Badgers covered the spread so I start off 1-0. Akron is 1-0 after a home win vs. VMI. Badgers are a 23.5 point favorite at home.

Akron put up 47 points and 576 yards in the win, with 425 coming through the air. Could have been better, as the Zips were rather undisciplined, committing 2 turnovers and 13 penalties for 111 yards. Despite all the offense, Akron held the ball for less than 24 minutes. Akron has had a similar high flying offense in the past and I don't anticipate UW will have much trouble stopping it. The only question is if UW pulls defensive starters late and if the Zips are able to put up some points against the backups. Will there be a hangover from LSU? I'm doubtful. This defense is high energy and I think they are more likely to lick their chops at a team that can't block Watt and Biegel, but still wants to throw it 50 times.

The O/U is a mere 47.5, which leads me to believe people are less than sold on the UW offense that only put up 16 against LSU. I am not as concerned. In a road game VMI put up 24 points and 386 yards, and had 20 first downs. It's not like they were getting blown out and racked them up late. The game was 26-24 going into the 4th before Akron put up 21 unanswered in the 4th quarter. UW should be able to run easily, and if Clement's speed is back he should have some long TD runs.

I am picking UW and giving the points. The question here may be more about Chryst than anything else. Bucky only had 2 huge blowout wins last season (58-0 over Miami OH, and 48-10 against Rutgers). I'm not sure if we know yet if Chryst has the same rack up the score mentality as Bret and Gary, but I'm guessing he does.