Sacred Data

If you want to work with any data on the T-Rank site, please get in touch with me. I'm happy to share, and most of it is available in bulk on the site without the need to scrape.

For example, much of the data is available at the site in .csv and .json files in the format of XXXX_team_results.csv (or .json) where XXXX = the year. So, for example, http://barttorvik.com/2019_team_results.csv gives final stats from last season. These files update constantly during the season.
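A minimal sketch of working with these files in Python, using only the standard library. The function names are hypothetical, and the parser assumes the first row of the CSV is a header row, which the post does not confirm. Downloading once and caching the file locally keeps the load on the site low.

```python
import csv
import io
import urllib.request

BASE_URL = "http://barttorvik.com"  # host taken from the example URL in the post


def team_results_url(year: int, fmt: str = "csv") -> str:
    """Build the URL for one season's team results file (csv or json)."""
    if fmt not in ("csv", "json"):
        raise ValueError("fmt must be 'csv' or 'json'")
    return f"{BASE_URL}/{year}_team_results.{fmt}"


def parse_team_results(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into row dicts. Assumes the first row is a header."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_text(url: str) -> str:
    """Download a URL once; call sparingly and cache the result locally."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Typical use would be `rows = parse_team_results(fetch_text(team_results_url(2019)))`.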

For player stats, see the first comment below.

Sometimes I notice mass scraping operations that are detrimental to site performance, and I make efforts to block those. If that happens to you and your aims were not malicious, let me know.

54 comments:

  1. Hi. I wanted to pull player stats from 2009 to 2016 for a school project. Is there any way you could help me get the csv files for each year?

    Thanks

    Replies
    1. csvs for player stats are available on my site at getadvstats.php?year=2009&csv=1 (change the year for other years)

      The column header info is available here: https://www.dropbox.com/s/ryugeykvntto5ji/pstatheaders.xlsx?dl=0
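A rough sketch of pairing the player CSV with the separately published headers. It assumes the CSV itself has no header row (since the headers live in a separate spreadsheet) and that the relative path resolves against barttorvik.com. The function names and the column names in the usage line are hypothetical; the real names come from pstatheaders.xlsx.

```python
import csv
import io
from urllib.parse import urlencode

BASE_URL = "https://barttorvik.com"  # assumption: host for the relative path


def player_stats_url(year: int) -> str:
    """Build the player stats CSV URL for one season."""
    query = urlencode({"year": year, "csv": 1})
    return f"{BASE_URL}/getadvstats.php?{query}"


def label_rows(csv_text: str, headers: list[str]) -> list[dict[str, str]]:
    """Attach caller-supplied column names to headerless CSV rows."""
    rows = []
    for line_no, values in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not values:
            continue  # skip blank lines
        if len(values) != len(headers):
            raise ValueError(
                f"line {line_no}: expected {len(headers)} columns, got {len(values)}"
            )
        rows.append(dict(zip(headers, values)))
    return rows
```

For example, `label_rows(text, ["name", "team", "ppg"])` would label a three-column file. A column-count mismatch raises an error rather than silently shifting the labels.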

  2. Hey I am attempting to pull lineup/player efficiency numbers but cannot find a reliable boxscore API feed with substitutions. Can you share where you are pulling your data?

    Replies
    1. I use a variety of sources. I've a paid subscription to the feed at natstat.com and also fill in gaps from stats.ncaa.org if necessary. But I don't parse play-by-play for subs (on/off) so not exactly sure if this will help you.

  3. Hi Bart, just want to say thanks very much for all your data. Your work is really engaging, and it has been a big hit for us over at No Bid Nation (the only William & Mary-focused basketball blog). I am hoping to put together a model to track the CAA this year, and I will be sure to give you credit!

  4. Hello,
    Do you have a returning production data point? I am happy to compile it myself from a csv file if the compiled data points are available.

    Sincerely,
    Kevin

    Replies
    1. I typically calculate "returning possession minutes" for preseason projections https://www.barttorvik.com/rpms.php

  5. Check out the bigballR R package! Even if you aren't familiar/fluent in R programming, the package has functions that will enable you to download/calculate play-by-play/stats (including lineup and on/off stats) and save the data as a csv with only a couple lines of code. Check out the package's GitHub page (https://github.com/jflancer/bigballR), which includes a handful of examples that should be a big help.

  6. Hi! Is there an easy way to access "Today's Games" with each matchup and its predicted winner, spread, and probability? I'm looking to pull games from 2012-2019.

    Replies
    1. This information is available at YEAR_results.csv - but it only goes back to 2015.

    2. Bart- this is a great site so kudos to you and the rest of the crew for compiling this information. I downloaded the YEAR_results.csv files and cannot figure out what the last two columns represent. Can you tell me or point me to a column headers file? Thanks!

    3. I believe the last two columns are pregame "Torvik Thrill Quotient" and pregame projected tempo.

  7. Hello, big fan of your content. I run a sports betting YouTube channel where a major focus point is a Monte Carlo simulation model I use. I have used a scraper for ncaa.org for years, but with the mass cancellations this year, it's been a bit of a pain, though I've been able to work around it. However, there are still some games missing data, such as Eastern Illinois-UW Green Bay from December 5: https://stats.ncaa.org/contests/1983012/box_score

    I've only found that game and UTEP-St. Mary's that have returned a "Box Score Not Found". It's only 2 games, but still, it bothers me. So I am interested in your thoughts about NatStat, as you said you subscribe. I don't need play-by-play data, just box score data. Is it worth it for just that? Or should I just let go of the very small percentage of games on ncaa.org that have no data and not worry about them?

    Thanks, William

  8. Bart: I just found out about your stats website. My bad! I am IndyStar's Butler beat writer and am surprised to see Aaron Thompson 19th in player rankings. It has long been evident how valuable he is, but somehow you have quantified that. If you don't mind, please send a short explanation: david.woods@indystar.com.

  9. Hi Bart! This data is awesome! I'm doing data analysis on home field advantages during COVID, but it looks like there is a slight problem with the first few columns of 2021_results.csv. It looks like it is combining both teams and the date into a single column, so the first game of the 2021 season looks like this: McNeese St.Nebraska11-25. Do you have an easy fix for that?

    Replies
    1. Hi Carver. That is intentional, as that field is what I use as a unique gameID. There is a file at YEAR_super_sked.csv that has more information.
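For anyone who still needs the pieces of that gameID field, here is a sketch of one way to split it. It assumes the ID is the two team names followed by a month-day suffix, as in the McNeese St.Nebraska11-25 example, and that you supply your own set of known team names (the names cannot be separated reliably without one). The function name is hypothetical.

```python
import re

# Trailing "M-D" or "MM-DD" date suffix, as in "...Nebraska11-25".
_DATE_SUFFIX = re.compile(r"(\d{1,2})-(\d{1,2})$")


def split_game_id(game_id: str, team_names: set[str]) -> tuple[str, str, int, int]:
    """Split a combined game ID into (team1, team2, month, day).

    team_names must contain both teams exactly as they appear in the ID.
    Returns the first split where both halves are known team names.
    """
    match = _DATE_SUFFIX.search(game_id)
    if match is None:
        raise ValueError(f"no date suffix found in {game_id!r}")
    month, day = int(match.group(1)), int(match.group(2))
    teams_part = game_id[: match.start()]
    for i in range(1, len(teams_part)):
        first, second = teams_part[:i], teams_part[i:]
        if first in team_names and second in team_names:
            return first, second, month, day
    raise ValueError(f"could not split team names in {game_id!r}")
```

If a team name ever ends in a digit, the date suffix match would need adjusting, so treat this as a starting point rather than a complete parser.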

  10. Hey Bart! do you have a .csv file for all team stats?

  11. Hi Bart! Is there any way to download pre-tournament team statistics from the last few years?

    Replies
    1. Couple ways to do this.

      1) You can use the T-Rank Time Machine (https://barttorvik.com/trank-time-machine.php) to get the actual ratings on the day after Selection Sunday. Those data files are available at /timemachine/team_results/YYYYMMDD_team_results.json.gz (compressed json files).

      2) You can filter the main page to just pre-tournament games by selecting only Regular Season games in the "type" drop down. This doesn't give the exact pre-tourney adjusted efficiency because it doesn't account for the recency bias that the actual ratings use. You can accomplish the same thing by setting the date ranges to end at Selection Sunday.

      This data can be pulled at, e.g. teamslicejson.php?year=2019&json=1&type=R (for 2019). Change "json=1" to "csv=1" for a csv. (I leave it as a fun project for you to figure out the columns.)
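A small sketch of handling the Time Machine files from option 1, assuming the relative path lives on barttorvik.com and that the payload is UTF-8 JSON once decompressed. The function names are hypothetical. The decoder checks the gzip magic bytes first, in case your HTTP client has already decompressed the response.

```python
import gzip
import json

BASE_URL = "https://barttorvik.com"  # assumption: host for the relative path
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def time_machine_url(date_yyyymmdd: str) -> str:
    """Build the archive URL for one day, given as eight digits (YYYYMMDD)."""
    if len(date_yyyymmdd) != 8 or not date_yyyymmdd.isdigit():
        raise ValueError("date must be eight digits in YYYYMMDD form")
    return f"{BASE_URL}/timemachine/team_results/{date_yyyymmdd}_team_results.json.gz"


def decode_team_results(raw: bytes):
    """Decode downloaded bytes, decompressing only if they are still gzipped."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))
```

The decoded structure is returned as-is; working out what each column means is left to the reader, as above.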

  12. Hey Bart, is it possible to get the T-Ranketology Now data in json format?

    Replies
    1. There is a file at now_inprob.json

    2. thank you, is there a way to include the seed or to sort it by the seed?

    3. the "score" is in there (the sixth element for each team) so if you can manipulate the data in your programming language of choice it should be trivial to sort by that.
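In Python, that sort could look something like the sketch below. It assumes the file decodes to a list of per-team lists, that "sixth element" means index 5 when counting from zero, and that a higher score means a better seed. The function name and sample rows are hypothetical.

```python
SCORE_INDEX = 5  # "sixth element", assuming positions are counted from zero


def sort_by_score(teams: list[list], descending: bool = True) -> list[list]:
    """Return a new list of team rows ordered by their score element."""
    return sorted(teams, key=lambda team: team[SCORE_INDEX], reverse=descending)


# Hypothetical rows: only the position of the score matters here.
sample = [
    ["Team A", 0, 0, 0, 0, 71.2],
    ["Team B", 0, 0, 0, 0, 88.5],
    ["Team C", 0, 0, 0, 0, 79.9],
]
ordered = sort_by_score(sample)
```

If the file turns out to be an object keyed by team name instead, sorting its `.items()` by the same element of each value would work the same way.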

    4. thanks! indeed it does appear that sorting on the sixth element for each team manipulates the data into the correct order for almost all of the 1-12 seeds.

      maybe you can help me further, as i am trying to build a visual representation of the T-Ranketology Now bracket. i can sort on the score element to get most of the 1-12 seeded teams. however, it seems natural that a lot of the First Teams Out have higher scores than the teams that would be seeded 13-16... do you know if it might be possible to use this data to seed teams 13-16 correctly as well?

    5. I've created a new file at now_seeding.json that has the projected tourney teams in order of score

    6. amazing, thank you so much!!!!

  13. Hi Bart,

    Thank you for all you do for the CBB community. Do you have a JSON/CSV file with information on quad 1/2/3/4 wins that includes who team x has beaten in each quadrant?

    Replies
    1. The closest thing I have set up is a file at columns_now.json - it's a poorly organized json file but elements 8 - 11 are dictionary/objects that show who each team has played in each quadrant (8 is Q1, 9 is Q2, etc) but it is not broken down by wins & losses.
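A very rough sketch of pulling those four objects out of one team's entry. It assumes each team is a list and that "elements 8 - 11" are zero-based positions; if the numbering is one-based, change the constant to 7. The function name and the sample row are hypothetical, and the contents of each quadrant object are passed through untouched.

```python
QUADRANT_START = 8  # assumption: position of the Q1 object, counted from zero


def quadrant_opponents(team_row: list) -> dict[str, dict]:
    """Map 'Q1'..'Q4' to the opponent objects found in one team's row."""
    return {f"Q{q + 1}": team_row[QUADRANT_START + q] for q in range(4)}


# Hypothetical row: eight leading fields, then one object per quadrant.
sample_row = ["Team A", 0, 0, 0, 0, 0, 0, 0,
              {"Opponent 1": None}, {"Opponent 2": None}, {}, {"Opponent 3": None}]
by_quadrant = quadrant_opponents(sample_row)
```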

    2. Okay, that's a start. Thanks. Is there JSON for each team's schedule with results? Maybe I could map the quadrant names from columns_now.json to values in the results file.

  14. Hey Bart! big fan of the website and thanks so much for making all of that data available to us! I'm trying to use your super_sked dataset for a class I'm in, and I was just wondering though if you'd possibly be able to share what the column headers are for that dataset? Some are pretty self-explanatory but others I'm not quite sure, thanks again!

    Replies
    1. Sorry, I don't actually have this easily accessible in a way that would make much more sense, so I prefer to leave it as a little puzzle ;)

  15. Hi Bart - This is so cool. Is the data from the Teamsheets Rank page available in a .csv?

  16. Hi Bart! Is there a CSV or JSON file for a team's schedule and the result of each matchup? We found this page, https://barttorvik.com/results.php?team=Memphis&begin=20081101&end=20090501&conlimit=All&year=2009&top=0&hteam=&quad=5&rpi=&f=1, and we're hoping to find a source of this data without having to scrape it. The statistics you post are really awesome!

    Replies
    1. getgamestats.php?year=2008&tvalue=Memphis will get you most/all of those stats in json.

  17. Hi, Bart. Fantastic website! I am doing a school project on NCAA Tournament teams and would love to download your data for just NCAA Tournament teams each year from 2008-2019. Is there a CSV file for that? For instance, I would like to download all data from a page like this for each tournament: https://barttorvik.com/trank.php?year=2008&sort=&top=0&conlimit=All&venue=All&type=T&lastx=0#

    Thanks so much.

    Replies
    1. if you put "&json=1" or "&csv=1" into the URL, you should get the data.
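One thing to watch: the example URL in the question ends with "#", and anything pasted after that character becomes part of the fragment, which is not sent to the server. A sketch of adding the flag safely with the standard library follows; the function name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_export_flag(url: str, fmt: str = "csv") -> str:
    """Return url with csv=1 or json=1 in its query string and no fragment."""
    if fmt not in ("csv", "json"):
        raise ValueError("fmt must be 'csv' or 'json'")
    parts = urlsplit(url)
    # Keep blank values such as "sort=" so the other filters are unchanged.
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Drop any existing export flag so the two formats never conflict.
    query = [(key, value) for key, value in query if key not in ("csv", "json")]
    query.append((fmt, "1"))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))
```

Calling it on the tournament-only page from the question gives the same URL with `&csv=1` at the end of the query and the trailing "#" dropped.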

  18. Hi Bart, is there any way to view team strength of schedule ranks over a multi-year span? (specifically looking for the 3 seasons from 2018-2021)

    Replies
    1. Here is one way: https://barttorvik.com/program-maps.php?tvalue=Wisconsin&year=2021&sort=&t2value=None&avg=all&top=0&quad=4&venue=All&type=All&xax=99&yax=38

  19. Noticed some missing data from Wichita's last game: https://www.barttorvik.com/box.php?muid=CincinnatiWichita+St.3-13&year=2021
    Not sure how this affects anything else related to your ratings.

    Replies
    1. Weird, thanks for letting me know. Shouldn't affect the ratings, but does affect player stats.

  20. Hi Bart,

    Great site. Love all the work you do. I'm curious if versioned Team data is available for download? That is, do you have and would you make available the team data from each day of the past few seasons (e.g. 2/17/2019, etc.)?

    Replies
    1. data files are available at /timemachine/team_results/YYYYMMDD_team_results.json.gz (compressed json files)

  21. Bart,

    Thanks for an amazing resource.

    Any chance you could leave players on the transfer page after they have committed to a new school? It would be interesting to be able to compare incomings based on Porpagatu! (or whatever else you want).

    Replies
    1. Stats for committed transfers are here: https://barttorvik.com/playerstat.php?link=y&year=trans&minmin=0&start=-11101&end=trans0501

  22. Hi Bart,

    Big fan of the site.

    I have been getting the advanced game stats for each game using getgamestats.php?year=2021 and I was wondering if there is any way to get the raw totals for each game (like total turnovers, total rebounds, etc.) in a similar format as well.

    Thanks in advance

    Replies
    1. those are available in the year_super_sked.json file or the year_season.json file.

  23. Hey Bart, thanks for a great resource and being responsive.

    I was wondering if there is a strength of schedule data point? I know you adjust several things based on schedule strength, but I was looking for SOS as a specific number and maybe I'm dumb, but I'm unable to find it.

    If it is available, I am looking for it for multiple years as well.

    Appreciate any help you can provide.

    Replies
    1. Hello,

      There are SOS metrics on the team page, and a summary table here:

      https://barttorvik.com/sos.php?year=2021

    2. OK I could be going brain dead again, but I was able to load the CSV of this for 2021, but 2020 is not working. Or I have just forgotten how to do it.

      Appreciate any help. I was trying to put player PORPAGATU! by year with SOS by year dating back to 2009 (but probably didn't really need to go back that far; that's just what I saw on a previous question, so for some reason I picked it).

    3. Feel free to not post this msg. Just to clarify the previous.

      Actually no, it wasn't the schedule data I got, it was a copy of something I had already loaded.

      I am attempting to use the year=xxxx&csv=1 method.

    4. Hi, not sure I'm following completely but there is no CSV available for that SOS page - you can just pull down the table though.
