Saturday, October 21, 2017

T-Rank: New Stuff for 2017-18 ICYMI MEGAPOST

I've added a bunch of stuff to the T-Rank site this offseason (or at least since the start of last season) and thought it might be a good idea to inventory & explain it somewhere. Somewhere like here.

First, a little back-patting. Three different aspects of the T-Rank site were validated as pretty useful last year: the ratings system itself, the "T-Ranketology" tourney predicting algorithm, and the preseason projections.

The rating system performed well. In terms of predicting games, it was the best full-season predictor, at least according to this analysis. At the very least, T-Rank is in the same league with Kenpom (no surprise, since I copied Kenpom liberally) and Sagarin. This means the adjusted efficiencies that drive the ratings are likely sound as well, and all the fun stuff that I use those adjusted efficiencies to power has at least some relation to reality.

The T-Ranketology algorithm did a great job last year, finishing near the very top of the Bracket Matrix competition. Of course, this was probably lucky, and last year was a bit of a strange year in that most people were able to correctly pick the field. But again, the main takeaway is that T-Ranketology is "good enough" to use for other fun stuff, such as the new Teamcast and Tourneycast features, and feel confident that the results are at least worth taking seriously.

Finally, last year's preseason projections turned out to be the best 1 to 351 projections, at least according to Dan Hanner's analysis. This is somewhat of a bittersweet victory because, as I've explained in an update to this old post, I have actually totally revamped the preseason projections for this year. But I'm confident that the new system is even better, and I'm also quite certain that last year's "victory" was mostly luck. Still, coming out ahead of Hanner and Kenpom by any metric at least shows that the ideas behind the system aren't garbage.

So putting these three facts together, I think it's safe to say that T-Rank's underlying algorithms and ideas are reasonably sound, which makes the tools on the site even more fun to play around with. Speaking of those tools:

Teamcast


This is probably the coolest thing I've added: a tool to play around with a team's schedule (picking wins and losses, adding or dropping games) to see how it affects their tourney chances and seed, based on the T-Ranketology algorithm.


The last two columns there show how much of an effect a win or loss would have on a team's T-Ranketology score (in isolation from all other games). Teamcast is also retroactive back to 2008, so you can go back and look at what might have happened if a certain game had come out differently.

I'm excited to use Teamcast this year to play out various bubble scenarios, especially given the likelihood that the Badgers will be on the bubble.

Daycast


Daycast is similar to Teamcast, except that instead of playing around with a team's schedule, you play around with the games on a given day to see how they affect the tourney. As part of this, I've developed a new thrill quotient, the "Torvik Tourney Thrill Quotient" or "T3Q," which ranks games by their anticipated effect on the tourney. Basically, games between potential bubble teams reign supreme.


TourneyCast™


For TourneyCast, I run 10,000 simulations of the season (including simulations of every conference tournament), run the results of each sim through the T-Ranketology algorithm to get a projected field of 68, and then simulate the NCAA tournament. The output is odds for every team to get to the tournament (whether through an autobid or at-large) and then odds for advancing to various stages of the tourney.

ADDED: one cool feature of the TourneyCast is that if you filter by conference you get the projected number of bids, at-large chances (interesting for mid-majors), final four chances, and championship chances.
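
The simulation loop described above can be sketched roughly as follows. This is a toy illustration, not the actual TourneyCast code: the team names and ratings are made up, the log5 win probability is an assumed stand-in for the real game model, the field is picked by simulated win totals instead of the T-Ranketology algorithm, and conference tournaments are left out entirely.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical Barthag-style ratings (chance of beating an average team).
RATINGS = {"A": 0.95, "B": 0.90, "C": 0.80, "D": 0.70,
           "E": 0.60, "F": 0.50, "G": 0.40, "H": 0.30}

def win_prob(a, b):
    """Log5 estimate that a beats b (an assumption, not T-Rank's model)."""
    pa, pb = RATINGS[a], RATINGS[b]
    return pa * (1 - pb) / (pa * (1 - pb) + pb * (1 - pa))

def simulate_season(rng):
    """Play one single round robin and return each team's win total."""
    wins = Counter({t: 0 for t in RATINGS})
    for a, b in combinations(RATINGS, 2):
        winner = a if rng.random() < win_prob(a, b) else b
        wins[winner] += 1
    return wins

def select_field(wins, size=4):
    """Stand-in for T-Ranketology: take the teams with the most wins."""
    ranked = sorted(wins, key=lambda t: (wins[t], RATINGS[t]), reverse=True)
    return ranked[:size]

def simulate_bracket(field, rng):
    """Single elimination: top seed vs bottom seed, and so on inward."""
    alive = list(field)  # field size must be a power of two
    while len(alive) > 1:
        survivors = []
        for i in range(len(alive) // 2):
            a, b = alive[i], alive[-1 - i]
            survivors.append(a if rng.random() < win_prob(a, b) else b)
        alive = survivors
    return alive[0]

def tourneycast(n_sims=10_000, seed=0):
    """Return (odds of making the field, odds of winning it) per team."""
    rng = random.Random(seed)
    bids, titles = Counter(), Counter()
    for _ in range(n_sims):
        field = select_field(simulate_season(rng))
        bids.update(field)
        titles[simulate_bracket(field, rng)] += 1
    bid_odds = {t: bids[t] / n_sims for t in RATINGS}
    title_odds = {t: titles[t] / n_sims for t in RATINGS}
    return bid_odds, title_odds

bid_odds, title_odds = tourneycast(n_sims=2_000)
for team in RATINGS:
    print(team, round(bid_odds[team], 3), round(title_odds[team], 3))
```

Grouping the per-team odds by conference and summing the bid odds would give a projected number of bids per league, which is the idea behind the conference filter.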


Backfill to 2008


Not sure exactly when I did this, but the site is completely backfilled with team and player stats all the way back to 2008. This includes advanced stats boxscores, which certain other sites don't have back past 2011.

The only exception to this is play-by-play-based stats (win probability, game score). I have a complete set of PBP for last year (2017) and 95% coverage for 2015 and 2016. I will probably add more backfill on that in the coming months, but my impression is that available PBP peters out around 2011 anyhow.

This also includes conference-only advanced stats, and all the other various ways I've got to split the player stats, back to 2008. Notable other sites have conference-only advanced stats back to just the 2014 season.

More Player Stats Splits and Filters


As I just mentioned, I've added new ways to split player stats in the pretty awesome Player Finder tool. Now you can filter out the stats so you look at performance against only top 50 (adjusted for venue) opponents. You can even filter by date, to see who the stars of November were. I've also added a "max height" filter so you can see who the best short rebounder is, for example.

Player Histories and Game Logs


Part of the backfill is that I have advanced player stats back to 2008, including game logs. In addition, one of my major projects this summer was to create a linked database of player histories, so that when you click on a player's name, all his seasons come up. You cannot imagine what a supreme pain in the ass this was. The things I had to learn, I shit you not, were things like natural language processing. Woo boy. But at this point the player histories are about 99% linked. If you see any discrepancies, please let me know.

Charts and graphs! Charts and graphs!


I just learned how to make some awesome interactive charts and graphs, replacing the old jpeg-based graphs it took me a week to learn how to make. Pretty proud of myself. This new skill is on display in three areas:

Win probability charts

I blogged about my fun foray into creating a win probability model here. I've now learned how to make the result more useful and interactive.

Old:

New:

Also: Win probability calculator.


Team Trends Charts

Previously, I had team trends charts for just offensive and defensive efficiency, and they were pictures like the old win prob chart. Now I've got em for most team stats, and they are interactive and fabulous. Note: these charts are also available at the bottom of the "Team Results" page.


Player Stats Trends

Not to be outdone, the player stats department demanded access to the charts and graphs technology.

Strength of Schedule Stats & Page


Every team page has strength of schedule stats, and there's also a comprehensive page for strength of schedule. These are broken down by overall SOS and non-conference SOS, and also broken down by the SOS so far, and projected SOS for the entire season. (Conference SOS calculations are on the conference pages.)

Might as well say a word about my SOS metrics, since they're a bit unusual. The "basic" metric is just the average of the team's opponents' pythagorean expectancy (adjusted for the location of the game).

The "elite" metric is different, and better. This is the percentage of games an "elite" team (approx .9000 Barthag) would be expected to lose against a given schedule. This is better because it reflects how little difference there is between playing mediocre and bad teams at home, and really rewards playing games that a good team could realistically lose.

To see why this is better, imagine three potential opponents: one good team (Barthag of .9000), one mediocre team (.5000), and one terrible team (.1000). Under the basic method, playing two mediocre teams is the same as playing a good team and a terrible team because (.5 + .5) / 2 = (.9 + .1) / 2. But this really isn't right for most teams that we care about, because an elite team's chances of losing to the mediocre team (at home, at least) aren't really that different from its chances of losing to the terrible team, while its chances of losing to the good team are quite a bit different: (.05 + .05) / 2 < (.50 + .01) / 2.
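
Here is a minimal sketch of the two metrics applied to that example. It assumes the elite team's loss probability comes from the log5 formula on a neutral court, which is a guess at the method and not something the site confirms. There is no venue adjustment, so the mediocre opponent comes out at about .10 here where the example above uses a home-court figure of .05. The function names are invented for the illustration.

```python
from statistics import mean

ELITE = 0.90  # approximate Barthag of an "elite" team, per the text above

def basic_sos(opponents):
    """Average opponent rating (no venue adjustment in this sketch)."""
    return mean(opponents)

def loss_prob(team, opp):
    """Log5 chance that `team` loses to `opp` (assumed formula, neutral court)."""
    win = team * (1 - opp) / (team * (1 - opp) + opp * (1 - team))
    return 1 - win

def elite_sos(opponents, elite=ELITE):
    """Share of games an elite team would be expected to lose."""
    return mean(loss_prob(elite, opp) for opp in opponents)

two_mediocre = [0.50, 0.50]
good_and_terrible = [0.90, 0.10]

# The basic metric cannot tell the two schedules apart...
print(basic_sos(two_mediocre), basic_sos(good_and_terrible))
# ...while the elite metric rates the good-plus-terrible schedule as tougher.
print(elite_sos(two_mediocre), elite_sos(good_and_terrible))
```

Even without the home-court bump, the ordering matches the argument: the schedule with one genuinely losable game grades out as much harder than two games against middling teams.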

Coach History and Team History Pages


A concise way to look at team and coach performance back to 2008.

Advanced Analysis of Unbalanced Conf. Schedules


I blogged about the underlying analysis here: I run simulations to see how differently teams would perform against a true round robin schedule (both winning percentage and chance of winning the conference) and log the results. 
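
A toy version of that comparison might look like the sketch below. Everything specific in it is an assumption for illustration: the four teams and their ratings, the unbalanced schedule, the log5 win probability, the lack of home-court advantage, and the choice to credit every team tied for first with a title.

```python
import random
from collections import Counter
from itertools import permutations

RATINGS = {"A": 0.90, "B": 0.85, "C": 0.60, "D": 0.30}  # hypothetical

# Hypothetical unbalanced schedule: A draws the weakest team twice,
# while B draws the third-best team twice. Every team plays 4 games.
ACTUAL = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "D"),
          ("B", "C"), ("B", "C"), ("B", "D"), ("C", "D")]

# True double round robin: every ordered pair once, so each pair meets twice.
ROUND_ROBIN = list(permutations(RATINGS, 2))

def win_prob(a, b):
    """Log5 estimate that a beats b (assumed, no home-court advantage)."""
    pa, pb = RATINGS[a], RATINGS[b]
    return pa * (1 - pb) / (pa * (1 - pb) + pb * (1 - pa))

def simulate(schedule, n_sims, rng):
    """Return {team: (average win pct, share of sims finishing first)}."""
    games_played = Counter(t for game in schedule for t in game)
    total_wins, titles = Counter(), Counter()
    for _ in range(n_sims):
        wins = Counter()
        for a, b in schedule:
            wins[a if rng.random() < win_prob(a, b) else b] += 1
        pct = {t: wins[t] / games_played[t] for t in RATINGS}
        best = max(pct.values())
        for t in RATINGS:
            total_wins[t] += wins[t]
            if pct[t] == best:
                titles[t] += 1  # ties count as shared titles
    return {t: (total_wins[t] / (n_sims * games_played[t]),
                titles[t] / n_sims) for t in RATINGS}

rng = random.Random(0)
actual = simulate(ACTUAL, 5_000, rng)
round_robin = simulate(ROUND_ROBIN, 5_000, rng)
for t in RATINGS:
    print(t, "actual:", actual[t], "round robin:", round_robin[t])
```

In this made-up league, team A's winning percentage is inflated by the extra game against D, and the gap between the two columns is the kind of schedule effect the page logs.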

RPI Forecast


Here's one I forgot about: a page showing the projected RPIs of every team based on their current T-Rank rating. This page also shows projected records in each of the selection committee's newly delineated "quality win" buckets. If you hover over those records, you can see which teams are in each bucket.
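
For reference, sorting games into those buckets could be sketched as below. The rank cutoffs follow the four-tier definitions as commonly reported for the 2017-18 season (for example, the top bucket is home games against the RPI top 30, neutral-site games against the top 50, and road games against the top 75). They are an assumption here, not something taken from the T-Rank page, so check them against the official NCAA release. The function names and sample games are made up.

```python
# Upper RPI-rank bound of buckets 1-3 by venue (assumed 2017-18 cutoffs);
# anything ranked below the third bound falls in bucket 4.
CUTOFFS = {
    "home":    (30, 75, 160),
    "neutral": (50, 100, 200),
    "away":    (75, 135, 240),
}

def quality_bucket(opp_rpi_rank, venue):
    """Return the bucket (1-4) for a game against the given opponent."""
    for bucket, upper in enumerate(CUTOFFS[venue], start=1):
        if opp_rpi_rank <= upper:
            return bucket
    return 4

def bucket_records(games):
    """games: iterable of (opp_rpi_rank, venue, won) -> {bucket: [wins, losses]}."""
    records = {b: [0, 0] for b in (1, 2, 3, 4)}
    for rank, venue, won in games:
        records[quality_bucket(rank, venue)][0 if won else 1] += 1
    return records

# Hypothetical slate of four games.
sample = [(12, "home", True), (60, "away", False),
          (110, "neutral", True), (300, "home", True)]
print(bucket_records(sample))
```

The same road game against the 60th-ranked team lands in the top bucket, while hosting that team would only count toward the second one, which is the point of splitting the cutoffs by venue.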


1 comment:

  1. Wow - I knew you had done a bunch - that these are some amazing additions.
