Monday, September 17, 2018

T-Rank Methodology Update

For the first time since 2014, I'm making significant changes to the T-Rank ratings. The output will be the same: adjusted offensive and defensive efficiencies, used to create the "Barthag" Pythagorean expectancy. But I've changed how I create the adjusted efficiencies.

Specifically, I'm going to incorporate the "GameScript +/-" stat that I derive from the play-by-play (where available), and I'm going to alter the GameScript stat so that it disregards anything that happens during "garbage time." The resulting ratings will therefore better reflect how well teams actually perform when the outcome is still in question.

In my backtesting, the new ratings perform slightly better than the old ones. But frankly I just thought it would be cool to incorporate the unique data I track, so I went ahead and did it. All the ratings on the site have been updated, back to 2008 (though I don't have any GameScript data for 2008 or 2009, so there are no substantive changes for those years).


For the past two seasons I have produced separate ratings, the "Implied T-Rank," using the GameScript stat. I take the GameScript stat (which represents a team's average lead or deficit during a game), infer a final score from it, and then use this derived final score instead of the actual final score to create the ratings.
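To make "average lead or deficit" concrete, here is a minimal Python sketch. It assumes the average is weighted by game time and that the play-by-play has been reduced to a list of scoring plays; the post does not spell out either detail, so the function name `average_lead`, the event format, and the 2400-second (40-minute) game length are illustrative assumptions, not the actual T-Rank code.

```python
def average_lead(events, game_seconds=2400):
    """Time-weighted average lead for team A (negative means average deficit).

    events: scoring plays in chronological order, each a tuple of
            (elapsed_seconds, points_for_a, points_for_b).
    game_seconds: length of the span to average over (assumed 40 minutes).
    """
    lead = 0          # team A's current lead
    last_t = 0        # time of the previous scoring play
    weighted = 0.0    # running sum of lead * seconds the lead was held
    for t, pts_a, pts_b in events:
        weighted += lead * (t - last_t)  # credit the lead held until this play
        lead += pts_a - pts_b
        last_t = t
    weighted += lead * (game_seconds - last_t)  # lead held until the end
    return weighted / game_seconds
```

For example, a team that leads by 2 for half the game, is tied for a quarter of it, and trails by 2 for the remaining quarter ends up with an average lead of +0.5.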

To explain how I get this derived final score, I'll use Wisconsin's home game against Michigan last year, which is a good example of a game where the actual final score (83-72, Michigan) gives a different picture than the GameScript (Michigan +14.5 when its lead became safe, which is equivalent to about a 29-point win):

1) Calculate the GameScript using play-by-play data. Going forward I will use the GameScript at the moment the winning team's lead becomes "safe" (using Bill James's famous formula), unless there is a miraculous comeback. Thus, the GameScript will not reflect any scoring during "garbage time," whether that's the winner running up the score or the scrubs coming in and making the final margin look more respectable. Because it reflects a team's average lead or deficit over the entire game, GameScript was already resistant to late-game shenanigans (it can change only so much in the last few minutes, no matter what happens), but this will make it even more so.

2) To derive a score, add up both teams' actual scores, divide that by two, then add the GameScript for the team with the positive GameScript and subtract it for the other team. In the case of the Michigan at Wisconsin game, there were 155 points scored, so Michigan's derived score would be 77.5 + 14.5 = 92 and Wisconsin's derived score would be 77.5 - 14.5 = 63. The derived score is Michigan 92, Wisconsin 63. That's the 29-point margin.
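The two steps above can be sketched in Python. The safe-lead test follows Bill James's published rule (lead minus 3, plus or minus half a point for possession, floored at zero, squared, then compared with the seconds remaining). The function names and argument layout are hypothetical, and how the real T-Rank code finds the "safe" moment in the play-by-play is not described in the post.

```python
def lead_is_safe(lead, seconds_left, leader_has_ball):
    """Bill James's safe-lead rule for college basketball."""
    adjusted = lead - 3 + (0.5 if leader_has_ball else -0.5)
    adjusted = max(adjusted, 0.0)          # values below zero become zero
    return adjusted ** 2 > seconds_left    # safe if the square exceeds time left


def derived_score(team_points, opp_points, gamescript):
    """Turn a GameScript into a derived final score.

    gamescript is from the first team's perspective (positive = average lead).
    Returns (derived_team_points, derived_opp_points).
    """
    midpoint = (team_points + opp_points) / 2
    return midpoint + gamescript, midpoint - gamescript


# Michigan at Wisconsin: actual score 83-72, GameScript Michigan +14.5
michigan, wisconsin = derived_score(83, 72, 14.5)
print(michigan, wisconsin)  # 92.0 63.0
```

Note that the derived margin is always twice the GameScript, which is why +14.5 corresponds to a 29-point win.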

The new T-Rank will use both the actual score and this GameScript-derived score (where available) from each game to calculate adjusted offensive and defensive efficiencies, and then everything else will be the same.

One potentially controversial aspect of this method is that a team can win the game but have a negative GameScript, and therefore have its "derived" score be a loss. To somewhat mitigate this, I divide the GameScript in half in those situations.
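A small Python sketch of that halving rule, under the assumption that it is applied to the GameScript before the derived score is computed. The function name `damped_derived_score` is made up for illustration.

```python
def damped_derived_score(team_points, opp_points, gamescript):
    """Derived score with the GameScript halved when it contradicts the result.

    gamescript is from the first team's perspective (positive = average lead).
    """
    team_won = team_points > opp_points
    contradicts = (team_won and gamescript < 0) or (not team_won and gamescript > 0)
    if contradicts:
        gamescript = gamescript / 2   # soften, but do not remove, the derived loss
    midpoint = (team_points + opp_points) / 2
    return midpoint + gamescript, midpoint - gamescript
```

For instance, a team that wins 70-68 with a GameScript of -6 would get a derived score of 66-72 rather than 63-75: still a derived loss, but a smaller one.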

Wonky note: the new ratings have a narrower "spread" in the adjusted efficiencies, so I've upped the "exponent" used to calculate the Barthag from 10.25 to 11.5. I'm sure this will cause some bugs on the site, so please let me know if you notice any weirdness.
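To show why the exponent matters, here is a sketch that assumes Barthag takes the standard Pythagorean expectancy form, with adjusted offensive efficiency in the numerator. The post only gives the exponent values, so the function name `barthag` and the exact formula are assumptions.

```python
def barthag(adj_off, adj_def, exponent=11.5):
    """Pythagorean expectancy from adjusted efficiencies (assumed standard form).

    adj_off: adjusted offensive efficiency (points scored per 100 possessions)
    adj_def: adjusted defensive efficiency (points allowed per 100 possessions)
    """
    off_pow = adj_off ** exponent
    def_pow = adj_def ** exponent
    return off_pow / (off_pow + def_pow)


# Same efficiencies, old exponent vs. new exponent
old = barthag(110.0, 100.0, exponent=10.25)
new = barthag(110.0, 100.0, exponent=11.5)
print(old < new)  # True
```

A larger exponent stretches the same efficiency gap into a more extreme Barthag, which offsets the narrower spread in the new adjusted efficiencies.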
