T-Rank FAQ

As I've added more interesting features to the T-Rank website, it's gotten some more attention. As a result, I've gotten a fair number of good questions about what T-Rank is supposed to be. The easy answer is that it's supposed to be fun, but this doesn't seem to satisfy people. Beyond that, I've explained much of it in old blog posts here, but even I have a hard time finding them. So I decided to change this page (which used to be a mirror of the T-Rank site for some stupid reason) into a FAQ. This is a work in progress.

How is T-Rank calculated?


The core of T-Rank is calculating offensive and defensive efficiency: points scored and points allowed per possession ("PPP" = points per possession, often rendered as points per 100 possessions). Although coaches like Dean Smith and Bo Ryan have long relied on PPP, it really hit the big time when Ken Pomeroy popularized it about a decade ago.

Kenpom's innovation was to separate out the offensive and defensive PPP and then adjust them for opponent quality and venue. Although Kenpom has made some changes over the years, that's still the core of his ratings, and that is also the core of T-Rank.

Calculating adjusted efficiency for a given game is fairly straightforward:

Game Adj. OE = PPPo / (Opponent's Adj. DE / Average PPP)
Game Adj. DE = PPPd / (Opponent's Adj. OE / Average PPP)

For example, assume the average PPP league wide is 100, and Team A scores 110 PPP against Team B, which has an Adj. DE of 90.0. Team A's Adj. OE for that game will be:

110 / (90 / 100) = 122.2 

This is for a game on a neutral court. If one team is at home, each team's Adj. OE and DE are adjusted by 1.4%. So if Team A were on the road, it would be:

110 / (90 * .986 / 100) = 124.0

And if Team A was at home it would be:

110 / (90 * 1.014 / 100) = 120.5
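
The three worked examples above can be reproduced with a few lines of Python. This is only a sketch of the formula as written here, not the actual T-Rank code; the function name, the `venue` argument, and the `hca` parameter are made up for illustration.

```python
def game_adj_oe(ppp, opp_adj_de, avg_ppp=100.0, venue="neutral", hca=0.014):
    """Adjusted offensive efficiency for a single game.

    venue is from the point of view of the team whose offense is rated.
    """
    # The venue factor scales the opponent's Adj. DE by 1.4% either way,
    # matching the road (.986) and home (1.014) examples in the text.
    factor = {"neutral": 1.0, "road": 1.0 - hca, "home": 1.0 + hca}[venue]
    return ppp / (opp_adj_de * factor / avg_ppp)


print(round(game_adj_oe(110, 90), 1))                # 122.2
print(round(game_adj_oe(110, 90, venue="road"), 1))  # 124.0
print(round(game_adj_oe(110, 90, venue="home"), 1))  # 120.5
```

The defensive side works the same way, with points allowed in place of points scored and the opponent's Adj. OE in place of its Adj. DE.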

The tricky part of calculating the efficiencies is that every result affects its own inputs. If a team comes into a game with an Adj. DE of 90.0 and it gives up more points than expected, its Adj. DE will go up, and then you have to put that new Adj. DE in as the source for calculating the game's efficiency. Fortunately, computers can do all that stuff relatively quickly: you just keep doing it and doing it until the numbers stop changing.
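
Here is a rough sketch of what "keep doing it until the numbers stop changing" can look like. It is a toy, not the real T-Rank code: it treats every game as neutral-court, ignores all the weighting described later, and the function name and input format are invented. It also moves each rating only halfway toward its new value on every pass, which is an assumption of this sketch to keep small examples from bouncing back and forth.

```python
from collections import defaultdict


def solve_adjusted(games, avg_ppp=100.0, tol=1e-9, max_iter=10_000):
    """Iterate adjusted efficiencies until they stop changing.

    games: list of (team_a, team_b, ppp_a, ppp_b) tuples, with PPP on a
    per-100-possessions scale and every game treated as neutral-court.
    """
    teams = sorted({team for game in games for team in game[:2]})
    oe = {t: avg_ppp for t in teams}  # start everyone at league average
    de = {t: avg_ppp for t in teams}

    for _ in range(max_iter):
        # Game-by-game adjusted efficiencies using the current ratings.
        game_oe, game_de = defaultdict(list), defaultdict(list)
        for a, b, ppp_a, ppp_b in games:
            game_oe[a].append(ppp_a / (de[b] / avg_ppp))
            game_de[a].append(ppp_b / (oe[b] / avg_ppp))
            game_oe[b].append(ppp_b / (de[a] / avg_ppp))
            game_de[b].append(ppp_a / (oe[a] / avg_ppp))

        change = 0.0
        new_oe, new_de = {}, {}
        for t in teams:
            target_oe = sum(game_oe[t]) / len(game_oe[t])
            target_de = sum(game_de[t]) / len(game_de[t])
            # Damped update (sketch assumption): step halfway to the target.
            new_oe[t] = 0.5 * oe[t] + 0.5 * target_oe
            new_de[t] = 0.5 * de[t] + 0.5 * target_de
            change = max(change, abs(new_oe[t] - oe[t]), abs(new_de[t] - de[t]))
        oe, de = new_oe, new_de
        if change < tol:  # the numbers have stopped changing
            break
    return oe, de


oe, de = solve_adjusted([("A", "B", 110.0, 95.0), ("B", "C", 105.0, 100.0), ("A", "C", 115.0, 90.0)])
```

Once the loop exits, each team's Adj. OE and Adj. DE equal the average of its game-level values computed from everyone else's final ratings, which is the self-consistency the paragraph above describes.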

Once the numbers have stopped changing, for each team you average their Adj. OE and Adj. DE from each game to get their overall adjusted efficiencies. From the adjusted efficiencies, I use Bill James' "pythagorean expectation" formula to calculate the actual rating, which I jokingly call its "Barthag" (a play on "pythag," which is the correct term). The Barthag is an estimate of what a team's chance of winning would be against the average DI team. So it is between 0 and 1, and higher is better.

There is a constant, called the exponent, used in calculating the Barthag. For my system, I have found that an exponent of 11.5 gives the best results from a predictive standpoint.
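
For the curious, the usual form of the pythagorean expectation looks like the sketch below. The formula isn't spelled out on this page, so treat the exact form as an assumption (it is the standard Bill James version with the 11.5 exponent plugged in); the function name is made up.

```python
def barthag(adj_oe, adj_de, exponent=11.5):
    """Pythagorean expectation from adjusted efficiencies.

    Assumes the standard form: OE^x / (OE^x + DE^x).
    """
    return adj_oe**exponent / (adj_oe**exponent + adj_de**exponent)


# An exactly average team (OE == DE) comes out at .5000.
print(barthag(100.0, 100.0))  # 0.5
```

A team that scores more efficiently than it allows lands above .5, and the larger the exponent, the faster the rating moves toward 0 or 1 as that gap grows.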

From each team's Barthag, we can use another Bill James creation, the log5 formula, to calculate their expected chance of winning against any other team. This allows me to do fun stuff like project records, and run simulations, etc.
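
As a sketch, the log5 formula and a simple projected record look like this in Python. The function names and the sample schedule are invented for illustration, and this ignores home court.

```python
def log5(p_a, p_b):
    """Chance that team A beats team B, given each team's Barthag.

    Undefined when both inputs are 0 or both are 1.
    """
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def expected_wins(team_barthag, opponent_barthags):
    """Projected win total: the sum of the single-game win probabilities."""
    return sum(log5(team_barthag, opp) for opp in opponent_barthags)


# A .8000 team against a hypothetical three-game schedule.
print(expected_wins(0.8, [0.5, 0.2, 0.8]))
```

Against an exactly average team (.5000), log5 just returns the team's own Barthag, which is consistent with the Barthag being a win probability against the average DI team.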

UPDATE:

I've made a small but significant change / addition to the ratings, which is that I now incorporate a metric I call "GameScript +/-" which is derived from play-by-play data and measures a team's average lead / deficit during a game. Also, for these purposes I lock this metric when the game is no longer in question. This adds a measure of "game control" and potentially weeds out some "garbage time" effects. More explanation here.

How is T-Rank different from Kenpom?


The short answer is that T-Rank is very similar to Kenpom, which is no surprise given that T-Rank is basically an offshoot of Kenpom. But there are three main sources of difference:

GameScript and Garbage Time

The incorporation of the GameScript stat, and its discounting of garbage time, gives T-Rank a somewhat distinctive aspect. Whether it's a good aspect is another question.


Pythags versus Efficiency Margins

Prior to the 2017 season, Kenpom switched away from the pythagorean expectancy / log5 method, to a still very similar system that uses adjusted "efficiency margins" (EMs) instead. The main difference is that instead of being multiplicative, the new Kenpom system is additive. So the basic formula is:

Game Adj. OE = (PPP - Average PPP) - (Opponent's Adj. DE - Average PPP) + Average PPP

For our neutral court example above that would be:

(110 - 100) - (90 - 100) + 100 = 120

So, similar, but a little different. When Kenpom decided to go to adjusted EMs, I decided to stick with the Barthag, for old time's sake.
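
To see the two approaches side by side, here is a small sketch of both neutral-court formulas as written on this page. The function names are made up, and this is not Kenpom's or T-Rank's actual code.

```python
def game_adj_oe_multiplicative(ppp, opp_adj_de, avg_ppp=100.0):
    """Ratio-based adjustment used with the pythag / Barthag approach."""
    return ppp / (opp_adj_de / avg_ppp)


def game_adj_oe_additive(ppp, opp_adj_de, avg_ppp=100.0):
    """Difference-based adjustment used with efficiency margins."""
    return (ppp - avg_ppp) - (opp_adj_de - avg_ppp) + avg_ppp


print(round(game_adj_oe_multiplicative(110, 90), 1))  # 122.2
print(game_adj_oe_additive(110, 90))                  # 120.0
```

The two methods agree when the opponent's Adj. DE is exactly average and drift apart as the opponent gets further from average.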

Secret Sauce

Here are the additional adjustments I make:
  • There's a recency bias—all games in the last 40 days count 100%, then degrade 1% per day until they're 80 days old, after which all games count 60%.
  • An adjustment that discounts blowouts in mismatches: if the margin of victory (MOV) is more than 10 points and the difference in Barthags is above a threshold, the game starts getting discounted. If the MOV is 20 points or higher, the discount is (Higher Barthag - Lower Barthag - .5) * 2. So if a team with a Barthag of .8000 is playing a team with a Barthag of .2000, and it wins by 20 points, the game value will be 1 - (.8 - .2 - .5) * 2, or 80%.
  • As with Kenpom, there is also a preseason component that is phased out once a team has played 13 adjusted games (since not all games count for 100% of a game, it typically sticks around for 15 or 16 games).
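
The first two adjustments in the list can be sketched as simple weight functions. This is an illustration of the rules as described, not the real code, and the function names are invented. The blowout function covers only the 20+ point case, since the ramp between 10 and 20 points isn't spelled out here, and it assumes the discount never goes below zero when the Barthag gap is .5 or less.

```python
def recency_weight(days_ago):
    """100% for the last 40 days, minus 1% per day after that, floor of 60%."""
    return max(0.6, 1.0 - 0.01 * max(0, days_ago - 40))


def blowout_weight_mov20(higher_barthag, lower_barthag):
    """Game value for a win by 20 or more points.

    Assumption: no discount (weight 1.0) when the Barthag gap is .5 or less.
    """
    discount = max(0.0, (higher_barthag - lower_barthag - 0.5) * 2)
    return 1.0 - discount


print(round(recency_weight(60), 2))             # 0.8
print(round(blowout_weight_mov20(0.8, 0.2), 2)) # 0.8
```
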
One other adjustment Kenpom makes that I do not is that later in the year he gooses the average efficiency and depresses the average tempo. He does this, I presume, because it is a consistent pattern that efficiency rises and tempo falls as the year goes on. Though sound in theory, it turns out to be kind of unnecessary, since the two adjustments pretty much cancel each other out. So I've never bothered with it, but it is a main reason why Kenpom's adjusted efficiencies are higher than T-Rank's and his adjusted tempo numbers are lower.

Ultimately, because of these differences, the final numbers are similar but different. Notably, T-Rank has a wider "spread" between top and bottom teams, probably because Kenpom has a much more significant cap on margin of victory.

What is T-Rank for?


I don't envision T-Rank as a competitor to or potential replacement for the Kenpom ratings. People should pay for a Kenpom subscription. Those ratings are deservedly the "industry standard," and I have no ambitions of displacing them. My work started by using the published Kenpom ratings to fill some gaps, specifically the fact that he doesn't publish adjusted efficiency margins for conference-only play. He could easily do so, which means he probably has a good reason (probably that there are fewer games, and the schedule mostly evens out in the end) for not doing so. But that didn't stop me!

Eventually, I figured out how to make a similar set of ratings, and making my own ratings from scratch allows me to fill more gaps and make more interesting tools for looking at college basketball. So the purpose of T-Rank is mainly to be the foundation for those tools; it's not an attempt to create a better or truer ranking of teams.

21 comments:

  1. Why do teams ranked 340-351 have Barthags higher than any other team below 330? Coppin St. .767?

    Replies
    1. Thanks for pointing that out -- it's a display error as the leading zero is being dropped. Will fix. E.g., Coppin St. is actually .0767

  2. As a freestanding analytic tool (now not tied to Kenpom) your T-Rankings will provide an excellent comparison tool (set). Thanks much.

  3. Can you explain WAB? I read the definition and then I see the higher numbers get greener. If a bubble quality team would win more games against the team's schedule, why would that be a good thing?

    Replies
    1. The WAB number isn't how many games a bubble team would win against that team's schedule, it's how many MORE (or fewer) games a team has won against its schedule than a bubble-quality team would be expected to win. So say a team has a schedule that a bubble quality team would be expected to go 10-10 against. If the team is actually 15-5, that's a WAB of +5.0. If they were 5-15, the team's WAB would be -5.0. If they are 10-10, it's par, 0.

  4. Can you add sortability for the team names column and maybe filters to view 1 or 2 teams at a time? Great site btw. I like that customized filter where you can limit the time frame and see not only the ranks, but the adjusted offense and defense during the time selected against top teams. Also, the team pages are great.

    Replies
    1. Thanks! I can add a teamname sort to the main page at last. As for 1 or 2 teams, will have to think about that from an interface perspective. Can filter by conference to narrow things down to a more manageable viewing experience.

    2. Can now look at & compare two teams on the main page at a time by clicking on a matchup on the schedule or team pages. You can also choose any two teams by manipulating the URL parameters (t1l and t2l -- those are ELLs on the end, short for "limit")

  5. Thanks for the explanation and for making T-rank. I really like the tools you have, like the ability to select games from a certain time period and the ability to compare tournament performance to expected wins.

  6. I'm really impressed with your T-Ranketology algorithm. It's the most accurate near real-time bracketology I know of, which should make for a great resource to follow during the conference tournaments. I have noticed that your live scores sometimes don't acknowledge that a game has ended for quite a while. For instance today's Louisville/Florida St game ended at 1:00 CT today, but it still shows the game being in progress as of now. It does however already acknowledge that subsequent Boston College/NC St game has ended. Is there a possible fix for that? It would be awesome to see how some of these games affect your seed list in real time.

    Replies
    1. Thanks! I'm pleased with the performance of the T-ranketology algorithm, though I'll probably try to improve it some more this offseason. Basically, the idea is to give a general idea of how a given game outcome will affect things, and I think it does a reasonable enough job of that. Ultimately, you can't model madness, but it's fun to try.

      As for the live scores, etc ... I update the site data every 15 minutes, but sometimes it takes quite a while for box scores to go officially final (which is when I pull the data). This seems to be especially common during tournaments, so that's probably what was going on with the Louisville/Florida St. game.

      All in all the live scoring feature is sort of a beta thing. If you like it, you can still see the live scores by putting live=1 in the URL, even though I took the checkbox away.

  7. Hey Bart big Mean Green fan here- can you please update your site and replace Tony with Grant McCasland please. Big fan of your work. GMG

    Replies
    1. Done - thanks for pointing out the error.

  8. Hi. Is it better to have a lower rank in the FUN? I assume the lower rank means less lucky because those are shaded green and green is good usually. Lucky would not be a good thing because it makes the team look better than they are.

    Replies
    1. As Socrates said, better lucky than good.

  9. Adam Swindlehurst, May 8, 2018 at 2:12 PM

    This is incredible, and I love everything on your website. It has helped me a lot with a project I'm working on. The only thing I would want to see is the inclusion of RPI and BPI, especially with the ability to filter by date. I can only seem to find current BPI rankings on other websites, but nothing with the ability to see BPI rankings, say, one week before the tournament started.

    Replies
    1. Thanks Adam! If you want to look at RPI as of certain dates, you can find that buried on the NCAA website in its archive of the "team sheets." For last year, the BPI is listed on those. For example, here are the team sheets as of March 4th last year: https://extra.ncaa.org/solutions/rpi/Stats%20Library/March%204,%202018%20Team%20Sheets.pdf

    2. Adam Swindlehurst, May 10, 2018 at 4:24 PM

      Found it. This is everything I could have dreamed of. Thanks Bart!

  10. Hey Bart, I'm a big fan of your work! I was looking at your 2019 player finder to compare some freshmen PRPG! projections w/ returning players and noticed freshmen have not yet been included in the 2019 player finder. Are you planning on adding them? Thanks!

    Replies
    1. Thanks, and sorry for the late reply. It would be sort of apples to oranges to put them in the player finder because for 2019 I've got a player's "returning" stats, and for Freshmen of course I can only do projections. I do have all the projections, including for freshmen, here: http://barttorvik.com/allrosters19.php?conlimit=&yvalue=Fr&type=All&s=15
