Sacred Data

If you want to work with any data on the T-Rank site, please get in touch with me. I'm happy to share, and most of it is available in bulk on the site without the need to scrape.

Much of the data is available on the site as .csv and .json files named XXXX_team_results.csv (or .json), where XXXX is the year. For example, http://barttorvik.com/2019_team_results.csv gives final stats from last season. These files update constantly during the season.
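The naming pattern above can be turned into a small helper. This is a minimal sketch, not an official client: it assumes the site root is https://barttorvik.com (the post itself uses both http:// and https:// links), and it makes no assumption about whether the CSV has a header row, so it simply returns every non-empty row as a list of strings. The function names are hypothetical.

```python
import csv
import io
import urllib.request

BASE_URL = "https://barttorvik.com"  # assumption: site root, taken from the links in this post


def team_results_url(year: int, fmt: str = "csv") -> str:
    """Build the URL of the XXXX_team_results file for one season (fmt: 'csv' or 'json')."""
    if fmt not in ("csv", "json"):
        raise ValueError("fmt must be 'csv' or 'json'")
    return f"{BASE_URL}/{year}_team_results.{fmt}"


def parse_csv_rows(text: str) -> list[list[str]]:
    """Split CSV text into rows of string fields; no header row is assumed."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def fetch_team_results(year: int) -> list[list[str]]:
    """Download and parse one season's CSV (this makes a network request)."""
    with urllib.request.urlopen(team_results_url(year), timeout=30) as resp:
        return parse_csv_rows(resp.read().decode("utf-8"))
```

For example, `fetch_team_results(2019)` would return the 2019 file as a list of rows. Because the files update constantly during the season, re-download rather than reuse an old copy if you need current numbers.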

For player stats, see the first comment below.

Sometimes I notice mass scraping operations that degrade site performance, and I take steps to block them. If that happens to you and your aims were not malicious, let me know.
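One way to keep your own load on the site low, assuming you rerun a script against the same files, is to cache each download locally and only hit the server when no cached copy exists. The sketch below is a generic caching pattern, not something the site provides; the cache directory and function names are hypothetical.

```python
import hashlib
import pathlib
import urllib.request


def cache_path(url: str, cache_dir: pathlib.Path) -> pathlib.Path:
    """Map a URL to a stable file name inside cache_dir."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / digest


def fetch_cached(url: str, cache_dir: pathlib.Path) -> bytes:
    """Return the body for url, downloading it only if no cached copy exists."""
    path = cache_path(url, cache_dir)
    if path.exists():
        return path.read_bytes()  # no request sent
    with urllib.request.urlopen(url, timeout=30) as resp:
        body = resp.read()
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return body
```

Since the in-season files change constantly, delete the cached file (or the whole directory) when you deliberately want a fresh copy.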

28 comments:

  1. Hi. I wanted to pull player stats from 2009 to 2016 for a school project. Is there any way you could help me get the CSV files for each year?

    Thanks

    Replies
    1. CSVs for player stats are available on my site at getadvstats.php?year=2009&csv=1 (change the year for other years).

      The column header info is available here: https://www.dropbox.com/s/ryugeykvntto5ji/pstatheaders.xlsx?dl=0

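A minimal sketch of pulling 2009 through 2016 one season at a time. The getadvstats.php path and its year and csv parameters come from the reply above; the base URL https://barttorvik.com, the function names, and the two-second pause are illustrative assumptions. Spacing requests out keeps the load light, in the spirit of the note about mass scraping in the post.

```python
import time
import urllib.request

BASE_URL = "https://barttorvik.com"  # assumption: site root used elsewhere in this post


def player_stats_url(year: int) -> str:
    """CSV endpoint for one season's player stats, as given in the reply."""
    return f"{BASE_URL}/getadvstats.php?year={year}&csv=1"


def download_player_stats(first: int, last: int, pause: float = 2.0) -> dict[int, bytes]:
    """Fetch each season from first to last (inclusive), pausing between requests."""
    results: dict[int, bytes] = {}
    for year in range(first, last + 1):
        with urllib.request.urlopen(player_stats_url(year), timeout=60) as resp:
            results[year] = resp.read()
        time.sleep(pause)  # one request at a time, spaced out
    return results
```

Calling `download_player_stats(2009, 2016)` returns the raw CSV bytes keyed by season, which you can then write to one file per year. Match the columns against the header spreadsheet linked in the reply.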
  2. Hey, I am attempting to pull lineup/player efficiency numbers but cannot find a reliable box score API feed with substitutions. Can you share where you are pulling your data from?

    Replies
    1. I use a variety of sources. I have a paid subscription to the feed at natstat.com and also fill in gaps from stats.ncaa.org if necessary. But I don't parse play-by-play for substitutions (on/off), so I'm not exactly sure this will help you.

  3. Hi Bart, just want to say thanks very much for all your data. Your work is really engaging, and it has been a big hit for us over at No Bid Nation (the only William & Mary-focused basketball blog). I am hoping to put together a model to track the CAA this year, and I will be sure to give you credit!

  4. Hello,
    Do you have a returning production data point? I am happy to compile it myself from a csv file if the compiled data points are available.

    Sincerely,
    Kevin

    Replies
    1. I typically calculate "returning possession minutes" for preseason projections: https://www.barttorvik.com/rpms.php

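If you compile a returning-production number yourself from player CSVs, a simple starting point is the share of last season's minutes played by players who are back. This is a simplified stand-in, not the "returning possession minutes" formula used on the site: it weights players by raw minutes only. The function name and the input shapes (a name-to-minutes mapping and a set of returning names) are assumptions for illustration.

```python
def returning_minutes_share(minutes_last_season: dict[str, float], returning: set[str]) -> float:
    """Fraction of last season's team minutes played by players who return.

    Simplified stand-in, NOT the site's RPMs formula: raw minutes only,
    with no adjustment for possessions, usage, or efficiency.
    """
    total = sum(minutes_last_season.values())
    if total == 0:
        return 0.0
    kept = sum(m for player, m in minutes_last_season.items() if player in returning)
    return kept / total
```

With made-up inputs `{"A": 600.0, "B": 400.0, "C": 200.0}` and returners `{"A", "C"}`, the share is 800 / 1200, or about 0.67.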
  5. Check out the bigballR R package! Even if you aren't fluent in R programming, the package has functions that let you download play-by-play data, calculate stats (including lineup and on/off stats), and save the data as a CSV with only a couple lines of code. Check out the package's GitHub page (https://github.com/jflancer/bigballR), which includes a handful of examples that should be a big help.

  6. Hi! Is there an easy way to access "Today's Games" with each matchup and its predicted winner, spread, and probability? I'm looking to pull games from 2012-2019.

    Replies
    1. This information is available at YEAR_results.csv, but it only goes back to 2015.

    2. Bart, this is a great site, so kudos to you and the rest of the crew for compiling this information. I downloaded the YEAR_results.csv files and cannot figure out what the last two columns represent. Can you tell me, or point me to a column headers file? Thanks!

    3. I believe the last two columns are pregame "Torvik Thrill Quotient" and pregame projected tempo.

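Taking the reply above at its word (it is hedged with "I believe"), the last two fields of each row in YEAR_results.csv are the pregame Thrill Quotient and the pregame projected tempo. A minimal sketch that pulls just those two fields by position, without assuming anything about the other columns; the function name and the sample rows in the test are made up.

```python
import csv
import io


def pregame_extras(csv_text: str) -> list[tuple[str, str]]:
    """Return (thrill_quotient, projected_tempo) from the last two fields of each row.

    Assumes, per the reply, that the final two columns of YEAR_results.csv are the
    pregame Torvik Thrill Quotient and the pregame projected tempo.
    """
    extras = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2:
            extras.append((row[-2], row[-1]))
    return extras
```

Indexing from the end (`row[-2]`, `row[-1]`) means the helper still works if the number of leading columns differs between seasons.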
  7. Hello, big fan of your content. I run a sports betting YouTube channel, and a major focus is a Monte Carlo simulation model I use. I have used a scraper for ncaa.org for years; with the mass cancellations this year it's been a bit of a pain, but I've been able to work around it. However, some games are still missing data, such as Eastern Illinois-UW Green Bay from December 5: https://stats.ncaa.org/contests/1983012/box_score

    That game and UTEP-St. Mary's are the only ones I've found that return "Box Score Not Found". It's only 2 games, but it still bothers me. So I am interested in your thoughts about NatStat, since you said you subscribe. I don't need play-by-play data, just box score data. Is it worth it for just that? Or should I just let go of the very small percentage of games on ncaa.org that have no data and not worry about them?

    Thanks, William

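Whichever source fills the gaps, it helps to list the affected games automatically rather than spotting them by hand. A minimal sketch, assuming you already have each contest page's HTML keyed by its URL and that empty pages contain the exact text "Box Score Not Found" quoted in the comment; the function and constant names are hypothetical.

```python
MISSING_MARKER = "Box Score Not Found"  # text the commenter reports on empty pages


def missing_box_scores(pages: dict[str, str]) -> list[str]:
    """Return, sorted, the contest URLs whose fetched HTML contains the missing marker."""
    return sorted(url for url, html in pages.items() if MISSING_MARKER in html)
```

The returned URLs are the games to backfill from another source, or to drop from the simulation inputs if you decide the small gap is acceptable.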
  8. Bart: I just found out about your stats website. My bad! I am IndyStar's Butler beat writer and am surprised to see Aaron Thompson 19th in player rankings. It has long been evident how valuable he is, but somehow you have quantified that. If you don't mind, please send a short explanation: david.woods@indystar.com.

  9. Hi Bart! This data is awesome! I'm doing data analysis on home-field advantage during COVID, but there seems to be a slight problem with the first few columns of 2021_results.csv: it combines both teams and the date into a single column, so the first game of the 2021 season looks like this: McNeese St.Nebraska11-25. Do you have an easy fix for that?

    Replies
    1. Hi Carver. That is intentional, as that field is what I use as a unique gameID. There is a file at YEAR_super_sked.csv that has more information.

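The YEAR_super_sked.csv file mentioned in the reply is the better route. If you only have the combined ID, though, it can be split back apart given a list of the exact team names used in the file. A minimal sketch, assuming the ID is two team names followed by a month-day suffix, as in the example above; the function name is hypothetical, and the sketch does not try to say which team is home.

```python
import re

DATE_SUFFIX = re.compile(r"(\d{1,2}-\d{1,2})$")  # e.g. "11-25" at the end of the ID


def split_game_id(game_id: str, teams: list[str]) -> tuple[str, str, str]:
    """Split an ID like 'McNeese St.Nebraska11-25' into (team_a, team_b, month-day).

    teams must contain the exact team names used in the file. Longer names are
    tried first so that a name that is a prefix of another cannot match too early.
    """
    match = DATE_SUFFIX.search(game_id)
    if match is None:
        raise ValueError(f"no trailing date in {game_id!r}")
    pair = game_id[: match.start()]
    for first in sorted(teams, key=len, reverse=True):
        if pair.startswith(first) and pair[len(first):] in teams:
            return first, pair[len(first):], match.group(1)
    raise ValueError(f"could not split {pair!r} into two known teams")
```

With the example from the comment, `split_game_id("McNeese St.Nebraska11-25", ["Nebraska", "McNeese St."])` gives `("McNeese St.", "Nebraska", "11-25")`. The year is not part of the ID, so take it from the file name.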
  10. Hey Bart! Do you have a .csv file for all team stats?

  11. Hi Bart! Is there any way to download pre-tournament team statistics from the last few years?

    Replies
    1. A couple of ways to do this.

      1) You can use the T-Rank Time Machine (https://barttorvik.com/trank-time-machine.php) to get the actual ratings on the day after Selection Sunday. Those data files are available at /timemachine/team_results/YYYYMMDD_team_results.json.gz (compressed JSON files).

      2) You can filter the main page to just pre-tournament games by selecting only Regular Season games in the "type" drop-down. This doesn't give the exact pre-tourney adjusted efficiency, because it doesn't account for the recency bias that the actual ratings use. You can accomplish the same thing by setting the date range to end at Selection Sunday.

      This data can be pulled at, e.g., teamslicejson.php?year=2019&json=1&type=R (for 2019). Change "json=1" to "csv=1" for a CSV. (I leave it as a fun project for you to figure out the columns.)

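Both options in the reply come down to building a URL. A minimal sketch, assuming the site root is https://barttorvik.com: the Time Machine path pattern and the teamslicejson.php parameters (year, json or csv, type=R) come from the reply, while the function names are hypothetical. Only type=R is documented here, and the parameter names for the date-range approach are not given, so they are left out.

```python
import datetime
import gzip
import json
from urllib.parse import urlencode

BASE_URL = "https://barttorvik.com"  # assumption: site root, as in the Time Machine link


def time_machine_url(day: datetime.date) -> str:
    """Option 1: URL of the compressed ratings snapshot for a given day."""
    return f"{BASE_URL}/timemachine/team_results/{day:%Y%m%d}_team_results.json.gz"


def load_snapshot(raw: bytes):
    """Decompress gzipped bytes and parse the JSON they contain."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))


def team_slice_url(year: int, fmt: str = "json", game_type: str = "R") -> str:
    """Option 2: teamslicejson.php query; type=R is the regular-season filter."""
    if fmt not in ("json", "csv"):
        raise ValueError("fmt must be 'json' or 'csv'")
    params = {"year": year, fmt: 1, "type": game_type}
    return f"{BASE_URL}/teamslicejson.php?{urlencode(params)}"
```

For option 1, pass the day after Selection Sunday for the season you want, download the bytes, and hand them to `load_snapshot`. If your HTTP client has already undone the gzip compression, call `json.loads` on the body directly instead.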
  12. Hey Bart, is it possible to get the T-Ranketology Now data in json format?

    Replies
    1. There is a file at now_inprob.json

    2. Thank you. Is there a way to include the seed, or to sort it by the seed?

    3. The "score" is in there (the sixth element for each team), so if you can manipulate the data in your programming language of choice, it should be trivial to sort by that.

    4. Thanks! Indeed, sorting on the sixth element for each team puts the data in the correct order for almost all of the 1-12 seeds.

      Maybe you can help me further, as I am trying to build a visual representation of the T-Ranketology Now bracket. I can sort on the score element to get most of the 1-12 seeded teams. However, it makes sense that many of the First Teams Out have higher scores than the teams that would be seeded 13-16. Do you know if it's possible to use this data to seed teams 13-16 correctly as well?

    5. I've created a new file at now_seeding.json that has the projected tourney teams in order of score.

    6. amazing, thank you so much!!!!

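Sorting the T-Ranketology Now data on the sixth element, as suggested in the replies above, takes only a few lines. A minimal sketch, assuming now_inprob.json is a JSON array with one array per team and that a higher score means a better projected position (the thread implies this but does not state it outright). The function name and the sample data in the test are hypothetical.

```python
import json


def sort_by_score(raw_json: str) -> list[list]:
    """Sort teams by their sixth element (index 5), highest score first.

    Assumes the file is a JSON array with one array per team; float() handles
    scores stored either as numbers or as numeric strings.
    """
    teams = json.loads(raw_json)
    return sorted(teams, key=lambda team: float(team[5]), reverse=True)
```

As the thread notes, score order alone will not place the 13-16 seeds correctly, since some First Teams Out outscore them; now_seeding.json already lists the projected tourney teams in score order.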