Wednesday, July 20, 2016

T-Rank website note

Over the past few days I've been transitioning the T-Rank website (at barttorvik.com) over to a different server on the backend. Short story: everything should work the same and all the old links (well, most of them) should forward automatically to the new corresponding page. So please let me know if you run into any problems.

Long story:

Previously, I used a nifty service called Site44, which turns a folder in your Dropbox into a web server. So I would just create all the team pages, conference pages, date pages, etc., every time I ran the T-Rank program, save all the files to a local folder, and they'd magically appear on the web.

The only real problem with this was that some bad people use the Site44 service for nefarious purposes, apparently. So Site44's server IP address was getting banned by some aggressive firewalls. I finally decided to bite the bullet and set up my own hosted web server on the Amazon cloud service (EC2).

The only problem with this is that my old way of doing things -- creating hundreds of small html files and uploading them every time I made a change -- was no longer practicable, because getting the files to my server in the cloud would take a half-hour or more every time. Instead, I had to learn a new programming language (php) so that I could set up a few template pages which will dynamically create the actual webpages when called. Now I'll just upload a few data files whenever I make changes, and the server takes care of the rest.
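For the curious, here is a rough sketch of the template idea, written in Python rather than php since that's what the T-Rank program itself uses. Everything in it is made up for illustration (the template, the field names, and the sample data); the real pages are not shown here.

```python
from string import Template

# Hypothetical page template. One template serves every team,
# instead of one pre-built html file per team.
TEAM_PAGE = Template(
    "<html><body><h1>$team</h1><p>Rank: $rank</p></body></html>"
)

def render_team_page(team, data):
    """Build one team's page on request from the uploaded data."""
    row = data[team]
    return TEAM_PAGE.substitute(team=team, rank=row["rank"])

# Made-up data standing in for an uploaded data file.
data = {"Example U": {"rank": 42}}
page = render_team_page("Example U", data)
```

The point of the design is that only the small data file changes between updates; the pages are built when someone asks for them.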

Anyhow, we can now add php to python, javascript, jquery, css, and html on the list of languages and tools I've (sort of) learned in bringing T-Rank to the web. Next on the list is SQL. In theory I could potentially turn this new knowledge into something practical someday!

Sunday, June 12, 2016

Happ could get UW career rebounding record.........in 3 seasons.

If Happ stays all 4 years and rebounds anywhere near the level he did as a freshman he will not only own the career rebounding record, he will crush it. A more interesting question is whether he can get the record in just 3 seasons. This is not very far-fetched.

The record is currently held by Claude Gregory at 904 from 1978-81. Happ had 278 in his freshman year, which was also the 9th best single season tally in Badger history (record is Jim Clinton 344 in 1951). 

To get to 905, Happ would need 905 - 278 = 627 more rebounds over his next 2 seasons, or 313.5 per season. At 35 games per season (assuming the same number of games as his freshman year), that works out to 313.5 / 35 = 9.0 rebounds per game.

Happ averaged 7.9 rebounds per game as a freshman, so an increase of 1 per game doesn't seem unreasonable with a small increase in minutes played in his next 2 seasons over his freshman season. 

If Happ were to average about 9 per game (313.5 per season) for the next 3 seasons (35 games per season on average), he would end up with about 1219 career boards.
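Here is a small sketch of the same arithmetic, in case anyone wants to plug in different assumptions. The function name is just something I made up, and the 35 games per season is the same guess used above.

```python
def required_per_game(target, current, seasons, games_per_season):
    """Rebounds per game needed to reach `target` from `current`."""
    return (target - current) / (seasons * games_per_season)

# Passing Claude Gregory's 904 means reaching 905; Happ has 278 so far.
happ_needed = required_per_game(905, 278, seasons=2, games_per_season=35)

# Career total if he keeps that same pace for 3 more seasons instead of 2.
projected_career = 278 + happ_needed * 3 * 35

print(f"needed: {happ_needed:.2f} per game")
print(f"projected 4-year total: {projected_career:.1f}")
```

The pace comes out just under 9 per game, and the 4-year total lands at about 1218.5, which is where the roughly 1219 figure comes from.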

Tuesday, May 24, 2016

2000 for Nigel

Now that Nigel is back, it's time to start looking at his potential for Badgers history in 2016-17. Nigel has a chance to become the 3rd leading scorer in Badgers history with a good season.
He has currently scored 292+497+551 = 1340 points. The Badger record is Alando's 2217, Finley is 2nd with 2147, and 3rd is Danny Jones with 1854.

The Badgers will play 13 non-conference games, 18 conference games plus at least one game in the Big Ten Tourney for 32, and likely additional post season games. I will be realistic and say that the Badgers will play about 35 games which is the same as they played last season.
In order to catch those guys with 35 games Nigel would need to average:

Alando: 2217 - 1340 = 877; 877 / 35 = 25.1 ppg
Finley: 2147 - 1340 = 807; 807 / 35 = 23.1 ppg
Jones: 1854 - 1340 = 514; 514 / 35 = 14.69 ppg

Since he averaged 15.7 last season, it would seem reasonable that he ends up in 3rd by the end of 2016-17 barring injury. Seems highly unlikely that he can catch either Finley or Alando. For some perspective, the greatest season in Badgers history (by my very limited research) was Clarence Sherrod in 1970-71 when he averaged 23.8 ppg. Alando's best season was 19.9 ppg, and Finley's was 22.1 ppg.

So what about 2000 points?
2000 - 1340 = 660; 660 / 35 = 18.86 ppg.

This seems like a stretch, but possible. Nigel may carry less of the scoring load with the young players developing and since Vitto and Showalter played so much better down the stretch last year. However, the Badgers weren't a very good scoring team last year, so maybe they score more and the rising tide lifts Nigel's boat enough to get there.

If Bucky makes some deep tourney runs his odds of getting there look much better. If they were to make the Big Ten Championship game and make an Elite 8 run, that would get them to 38 games.
660/38=17.37 ppg.
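For anyone who wants to track this during the season, here is a quick sketch that recomputes all of the targets above for both a 35-game and a 38-game season. The names are my own invention; the point totals are the ones quoted above.

```python
CURRENT_POINTS = 292 + 497 + 551  # Nigel's three seasons so far

TARGETS = {
    "Alando": 2217,
    "Finley": 2147,
    "Jones": 1854,
    "2000 points": 2000,
}

def ppg_needed(target, current, games):
    """Points per game needed to reach `target` over `games` games."""
    return (target - current) / games

for games in (35, 38):
    for name, target in TARGETS.items():
        ppg = ppg_needed(target, CURRENT_POINTS, games)
        print(f"{name}: {ppg:.2f} ppg over {games} games")
```

Updating CURRENT_POINTS and the number of games left is all it takes to rerun this midseason.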

I'm glad he came back so we will get to track this all season long.

Friday, May 20, 2016

Some thoughts on Nigel's Decision

As we all know, Nigel Hayes is contemplating whether to turn pro instead of returning for his senior year of college. The deadline for him to decide is May 25th.

Fans and pundits (including Dickie V himself) are nearly unanimous: Nigel, come back!

There are a few fans -- seemingly put off by Hayes's outspokenness on the NCAA's essential contradictions -- who think Nigel is gone. He's sick of college, they say. He'd rather do anything than play another year of basketball for free.

I think that's wrong. Hayes has been very open about his thought process: he wants to do whatever will give him the best chance of having a long NBA career. If that means coming back to college for another year, that's what he's going to do. He's not going to play in the Turkish league out of spite.

It would be an easy call if he was relatively assured of being drafted in the first round. That's a guaranteed tanker full of money to play basketball, and except in rare cases going pro in that situation is a no-brainer.

It would (will?) also be an easy call if Hayes is assured that no one will draft him at all. That's a virtually guaranteed ticket to an extended stint in the D-League or Europe, and Nigel has been pretty clear that's not his goal.

But the situation is this: Nigel may well get drafted in the second round. That's not ideal, but it's not necessarily a dead end, either. A few things have changed recently that make getting drafted in the second round potentially not-so-bad:

1) NBA teams are starting to realize the value of second-round picks. Players like Draymond Green are showing that there's plenty of talent still left. And teams are free to negotiate any deal they want with second-round picks, so they can be creative about structuring deals with players who are intriguing and may well develop into something. 

2) In Hayes's case in particular, his "type" is something of the flavor of the month. "Position-less basketball" is the watchword, as everyone tries to copy the magic of the Warriors. A few years ago, Hayes might have been ignored as a tweener. But now there's a chance teams may key in on this as an attribute -- particularly given his rather freakish 7'3" wingspan.

So if Hayes is given some indication that he'll be taken in the second round by a team that is willing to work with him, that is a very intriguing and tantalizing opportunity.

The flip side of this is: can he really prove anything to pro teams with one more year of college basketball? Of course, if he comes back and shoots 45% from three, he will raise his stock considerably. But how likely is that? And how much opportunity will he have in the strictures of the Wisconsin offense to show off the shooting guard skills that NBA teams would want to see out of him? Unless he has a great year next year, or at least a great tourney run, the second round may well be his destiny no matter what. In that case, why not get started now?

Ultimately, I think Hayes probably will be back, because I don't think he's going to get any assurance of being drafted. He knows he can play better than he played last year, and a good senior year should at least assure him of a spot in the draft. But it's not a slam dunk.


Saturday, April 16, 2016

Badger fans are spoiled.

WI State Journal had a good article today about the attendance at Badger games. They have some nice interactive charts and data.

The article shows the data about empty seats at Badger games which the university tracks with scanned tickets. I have been complaining about empty seats for a while as I have been going to Badger home games in football and basketball for about 20 years. I don't know that the percentages are a ton worse than they ever have been, but when your team wins a lot, as the Badgers do now, it seems like people would show up more.

This is old curmudgeon Adam at his worst. When I was a kid, going to a Badger game was a big deal, and this was when they sucked at everything but hockey. Buying season tickets for a team and then only going to a handful of games just doesn't make sense to me. I am also very cheap, so the idea that someone spends a minimum of $400-500 for one person for one season of football or basketball tickets (and likely much more), and then doesn't go, is crazy.

Thursday, March 17, 2016

Retrofitting T-Ranketology

I had fun with my hobby project T-Ranketology this year. The results are over at bracketmatrix.com, and I think they're acceptable -- better than major algorithmic projections like KPI and Team Rankings, but worse than most human bracketologists. This makes sense to me, because the tournament selection is a very human affair, and it's hard for a simple model like T-Ranketology to encompass all the vagaries of that process with any real fidelity. As I put it shortly after the bracket was announced: you can't model Madness.

But I will continue to try. To that end, I ran a bunch of experiments to see if I could retrofit T-Ranketology to produce a more accurate bracket. Here are the inputs to T-Ranketology:

RPI
WAB (wins against bubble)
Elo (my E-Rank, a simple elo rating seeded by T-Rank)
"Resume"

Each team was ranked in each of these categories, then their ranks were added up to get a total score (with lower being better). That was T-Ranketology this year.
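Here is a minimal sketch of that rank-sum idea. The category keys and the three sample teams are made up for illustration; this is not the actual T-Ranketology code.

```python
CATEGORIES = ("rpi", "wab", "elo", "resume")

def total_score(ranks):
    """Add up a team's rank in each category (lower is better)."""
    return sum(ranks[c] for c in CATEGORIES)

def order_field(all_ranks):
    """Sort teams from best (lowest total) to worst."""
    return sorted(all_ranks, key=lambda team: total_score(all_ranks[team]))

# Hypothetical ranks for three made-up teams.
teams = {
    "Team A": {"rpi": 10, "wab": 12, "elo": 8, "resume": 15},
    "Team B": {"rpi": 30, "wab": 25, "elo": 40, "resume": 20},
    "Team C": {"rpi": 5, "wab": 9, "elo": 11, "resume": 4},
}

print(order_field(teams))
```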

The "resume" rating needs a better name, and I've got my branding people looking into it. But it was clear to me that bracketology is impossible if you aren't paying attention to "top 50 wins" and the like. So to come up with the Resume rating I used the following point values:

Top 50 win: 10 points
other top 100 win: 3 points
sub-100 loss: -3 points
sub-200 loss: -6 points

Obviously these point values were assigned rather arbitrarily, though I did some experimentation to get a bracket that passed the eye test.
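A sketch of how those point values could be applied to a team's results. Two assumptions are baked in that aren't spelled out above: opponents are identified by a single ranking number (RPI rank, say), and a sub-200 loss counts only as -6 rather than stacking with the sub-100 penalty.

```python
def resume_points(win_ranks, loss_ranks):
    """Score a resume from the ranks of opponents beaten and lost to."""
    points = 0
    for rank in win_ranks:
        if rank <= 50:
            points += 10      # top 50 win
        elif rank <= 100:
            points += 3       # other top 100 win
    for rank in loss_ranks:
        if rank > 200:
            points -= 6       # sub-200 loss (assumed not to stack)
        elif rank > 100:
            points -= 3       # sub-100 loss
    return points

# Hypothetical team: wins over opponents ranked 12, 45, 80, and 150,
# losses to opponents ranked 120 and 230.
example = resume_points([12, 45, 80, 150], [120, 230])
```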

One other possible addition to the algorithm would be T-Rank itself. It's pretty clear that efficiency rating does come into the selection process, at least at the margins. For example, efficiency rating must have been the determining cause of Vanderbilt's inclusion. But it's also ignored a lot, particularly when it comes to seeding.

Anyhow, I've now done a bunch of experiments -- running thousands and thousands of brackets with different values of inputs for each of the T-Ranketology inputs -- to see what the ideal weight of the factors would be. Here are the results:

Resume x 3.5
RPI x 1.5
T-Rank x 1.5
Elo x 0.25
WAB x 0.2

With Resume being calculated as follows:

Top 25 wins: 16 points
other top 50 wins: 13 points
other top 100 wins: 5 points
sub 100 losses: -1 point
sub 200 losses: -5 points
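In code, the retrofitted version might look something like the sketch below. The assumption here is that each weight multiplies the team's rank in that category before the ranks are added up (lower still being better); the dictionary keys and the sample team are made up.

```python
# Retrofitted weights from the experiments described above.
WEIGHTS = {
    "resume": 3.5,
    "rpi": 1.5,
    "trank": 1.5,
    "elo": 0.25,
    "wab": 0.2,
}

def weighted_score(ranks):
    """Weighted sum of a team's category ranks (lower is better)."""
    return sum(weight * ranks[cat] for cat, weight in WEIGHTS.items())

# Hypothetical team ranked 10th in every category.
example_score = weighted_score(
    {"resume": 10, "rpi": 10, "trank": 10, "elo": 10, "wab": 10}
)
```

With these weights, one spot in the resume rank counts 14 times as much as one spot in elo.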

The original T-Ranketology got a score of 312 on bracketmatrix, by getting 65 teams right, nailing the seed on 31 and within 1 seed on 24 others.

This version of T-Ranketology gets a score of 351 (which would tie for first place this year), by getting 66 teams right, nailing the seed on 44 teams, and within 1 seed on 21 others. This algorithm gets Tulsa and Vanderbilt into the tournament, but leaves Providence and Wichita State as 2nd and 3rd teams out, respectively. (St. Bonaventure is the first team out.) Saint Mary's remains in the field, though as a 10-seed instead of an 8-seed. Florida also sneaks into the tournament.
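As a sanity check on those two scores, here is a small sketch of the scoring. The rule is an assumption on my part (3 points per correct team, 3 more for an exact seed, 1 more for being within one seed line), chosen because it reproduces both the 312 and the 351.

```python
def matrix_score(teams_right, exact_seed, within_one):
    """Bracket score under the assumed 3 / 3 / 1 point rule."""
    return 3 * teams_right + 3 * exact_seed + 1 * within_one

original = matrix_score(teams_right=65, exact_seed=31, within_one=24)
retrofit = matrix_score(teams_right=66, exact_seed=44, within_one=21)
```

Nearly all of the improvement comes from the 13 extra exact seeds, not from the one extra team in the field.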

If you want to see the bracket produced by this algorithm, it's here.

Clearly, this version of the algorithm is "over-fit" to this year's results. But I think this exercise does provide some insights. Most obviously, the "resume" rank is extremely important. This is how you get Tulsa into the tournament. You have to really value those wins against "top 25" and "top 50" teams. Bad losses don't matter too much, at least not much more than they already matter for the other ranks. The elo and WAB ratings add a little, but not much, to the analysis.

So, this is the algorithm I'll go with next year. The committee will probably do something entirely different and prove, once again, that you can't model Madness.

Tuesday, March 15, 2016

Thoughts on the bracket, part 2: The Snubs

The T-Ranketology algorithm had three teams in the tournament that the selection committee found undeserving: St. Bonaventure, Saint Mary's, and South Carolina.

It's pretty obvious what they have in common: they all start with the letter "S". Frankly, anti-S discrimination is as good a theory of why the committee does what it does as any other. But let's dig a little deeper into their resumes, and that of the other cause célèbre, Monmouth.

Monmouth

T-Ranketology was not surprised by Monmouth's exclusion, as they were the 12th team out according to the algorithm. This is because they got killed in the "resume" column because of their three bad losses to sub-200 teams.

Monmouth is a tough case because they had four great wins in the non-conference, against UCLA, USC, Notre Dame, and Georgetown -- all away from home. Since high-major teams have no incentive to play true mids or low-majors on the road, the only path for a team like Monmouth to an at-large bid is to go giant slaying on the road, and that's exactly what they did.

Unfortunately, the UCLA and Georgetown wins ended up not looking so great in the committee's eyes, because Georgetown was a sub-100 RPI team and UCLA was 99th. (Indeed, if they'd lost to Georgetown that would have counted as a "bad loss"!) This is true even though those were both true road games, which makes them impressive wins by any measure, except any measure the committee pays attention to.

In the end, it was the three sub-200 losses that killed them. I'm pretty sure no team has ever gotten an at-large bid with three losses in that category. This is somewhat unfair to Monmouth, of course, because most at-large contenders do not play very many sub-200 teams on the road. As a result, we don't have an intuitive feel for how often at-large contenders should really lose these games. Monmouth played 11, and they went 8-3. That's not good, but how bad is it?

Easy answer: too bad. I think Monmouth is in the tournament if they go 9-2 in those games. But it was three strikes you're out.

South Carolina

South Carolina was the last team in according to T-Ranketology, and no one is weeping over their omission from the field. They played a crappy schedule, which got them out to a 14-0 start. They were even 20-3, but went just 3-6 in their last nine games. Although their schedule was weak, they actually performed admirably against it, compiling +2.0 WAB, which means they won two more games than you'd expect an average bubble team to win. But (as we'll see) the committee comes down hard on teams that didn't "challenge" themselves during the non-conference, and South Carolina's OOC SOS was 271st according to the RPI.
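To make the WAB idea concrete, here is a minimal sketch. It assumes WAB is computed as actual wins minus the wins an average bubble team would be expected to get against the same schedule; the per-game win probabilities below are invented for illustration and are not South Carolina's.

```python
def wins_against_bubble(results, bubble_win_probs):
    """Actual wins minus expected wins for an average bubble team.

    results: 1 for a win, 0 for a loss, one entry per game.
    bubble_win_probs: the chance an average bubble team wins each game.
    """
    return sum(results) - sum(bubble_win_probs)

# Hypothetical four-game schedule: three wins against games a bubble
# team would be expected to win 2.5 of.
wab = wins_against_bubble([1, 1, 0, 1], [0.9, 0.8, 0.3, 0.5])
```

A positive number means the team did better than a typical bubble team would have against that schedule, however weak the schedule was.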

Since South Carolina is a major conference team that has no excuse for playing such a weak schedule, and they were right on the bubble by all metrics, no one cares about saying sayonara to South Carolina. Hmm, maybe I should write a song called "Sayonara, South Carolina."

Saint Mary's

In my opinion, Saint Mary's was the real snub this year. T-Ranketology had them into the field easily as an 8-seed, and they were in a majority of final brackets at bracketmatrix.com. They had a weak nonconference schedule, but Seth Burn has already detailed how well they performed against the schedule they played, in terms of Wins Against Bubble. They also did well in other metrics traditionally associated with good tourney resumes, such as elo.

Ultimately, they were done in by their lack of "good wins." Their record against the top 100 was great -- 6-3 -- but the committee cares most about number of wins, not winning percentage. And in the all-important "wins against top 50" they had just two. And both of those were against Gonzaga, which only squeaked into the top 50 after they beat Saint Mary's in the WCC championship game.

This is a case, I think, where the committee was forced to come face-to-face with the absurdity of its own metrics. Heading into the game against Gonzaga, Saint Mary's had a blank resume, highlighted by zero top-50 wins. They lost that game, convincingly. But now, because of that loss, they had an infinitely better resume, with two top-50 wins. How could a loss possibly improve their resume so much?!

That's a little thing called cognitive dissonance, Jack.

The committee did what anyone does when experiencing cognitive dissonance: it moved on as quickly as possible. Buh-bye, Saint Mary's.

St. Bonaventure

The Bonnies were the last of the S-nubs, and the most surprising, as they were in most every final bracket. But upon examination, their big calling card was a high RPI. The rest of their resume metrics were bubblicious, or worse: 49th in elo, 50th in "resume" (good wins minus bad losses), and 69th in WAB. I shed no tears for St. Bonaventure. Indeed, their exclusion is another sign that raw RPI is (appropriately) not much of an independent factor in the deliberations.