Post by eric on Mar 15, 2015 15:27:05 GMT -6
all data obtained from basketball-reference.com and boxscoregeeks.com
There are a lot of stats in basketball: points, points per game, points per minute, points per possession. This plurality has led many to try to create metrics that combine all that information into one number, with the obvious result that there are now many versions of this one number. They have been regrettably dubbed "advanced metrics" or "analytics", but I prefer the term "composite stats". There's nothing at all advanced about any of them: no algebra, no calculus, no programming. If you can do arithmetic, you can compute any of them. Additionally, many are publicly available at various sites. I will be focusing on four of these publicly available stats for this post.
PER
Player Efficiency Rating was invented by John Hollinger. It is far and away the easiest composite stat to compute, which is perhaps why it has penetrated the furthest into the public consciousness, though this has unfortunately led some to refer to it simply as "efficiency". It uses only the player's box score, as opposed to the player's and the team's, and is the only metric of the four to do so. It can be converted to wins by subtracting a position-dependent value and scaling by minutes (Estimated Wins Added, or EWA). These position-dependent values are all within 10% of each other.
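For anyone who wants to see the PER-to-wins conversion as arithmetic, here is a minimal sketch. The replacement levels and the 67 and 30 divisors are the values commonly cited for Hollinger's EWA; they are not given in this post, so treat them as assumptions. The function name and the example player are made up for illustration.

```python
# Position replacement levels (PRL) as commonly cited for Hollinger's EWA.
# ASSUMPTION: these constants are not taken from this post.
PRL = {"PG": 11.0, "SG": 10.5, "SF": 10.5, "PF": 11.5, "C": 10.6}

def estimated_wins_added(per, minutes, position):
    """Subtract the positional baseline from PER, then scale by minutes."""
    # ASSUMPTION: 67 is the commonly cited scaling constant for "value added".
    value_added = minutes * (per - PRL[position]) / 67.0
    # ASSUMPTION: roughly 30 points of value added per win.
    return value_added / 30.0

# Hypothetical player: a 25.0 PER over 3000 minutes at small forward.
ewa = estimated_wins_added(25.0, 3000, "SF")
print(round(ewa, 2))
```

Under these assumed constants the largest and smallest baselines (11.5 and 10.5) differ by a bit under 10%, which matches the "within 10%" remark above.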
WP
Wins Produced was invented by David Berri, Martin Schmidt, and Stacey Brook. It is the only stat of the four invented by academics, which is why it has a bunch of names. It uses both the player's and the team's box scores, and adjusts by position even more heavily than EWA, such that a shooting guard's production (point for point, rebound for rebound) is worth almost twice that of a center's.
WS
Win Shares was invented by Bill James for baseball, and the general methodology was applied to basketball by Justin Kubatko. It has no explicit position adjustment.
VORP
Value Over Replacement Player was invented by Daniel Myers. Like PER, it has to be scaled to produce a quantity of wins, and it has to be further scaled for those wins to be overall as opposed to over replacement, as a team full of replacement players would generate more than 0 wins.
There are (many) other composite stats, including Kevin Pelton's WARP and Jeremias Engelmann's RPM. I chose these four because they are the easiest to find historical databases for, and we want as big a sample size as possible to find out which one is best. Bigger samples are always better, and they're especially important when our competitors are all closely related to each other. These four stats take very, very different roads to produce their win totals, but they all end up in very much the same place. If we look at the players of the 2014 NBA season and run a linear regression of each stat against each of the others, we find R^2 values of...
0.8400 PS
0.8353 PV
0.7977 SV
0.6418 PE
0.8221 SE
0.7335 VE
2.3171 P
2.4598 S
2.3665 V
2.1974 E
Where P = Wins Produced, S = Win Shares, V = VORP wins, E = Estimated Wins Added. Each two-letter row is the R^2 between those two stats, and each single-letter row is the sum of the three R^2 values involving that stat. Clearly the four are all very closely related, which we would expect. EWA is the least like the others and WS the most, but that doesn't necessarily tell us anything. Like the four Gospels, John (Hollinger or the beloved) could be the most right or the least right. To figure out which, we should add up all the wins supposedly generated by players and see how well those totals match their team's actual wins. Let's do that for just the 2014 season and go team by team:
And you can see the flaws of PER on full display. The Pacers and Bulls were elite defensive teams (1st and 2nd in DRtg) but didn't generate many steals (27th and 21st), so PER thinks they were mediocre at defense, while every other stat (because they take team stats into account) correctly identifies them as elite. The Pelicans and Pistons were poor defensive teams (27th and 25th in DRtg), but the Pelicans led the league in blocks and the Pistons were 8th in steals, so they got big bumps in EWA.
The net result of all this is that the RMSE (root-mean-square error, in wins) in 2014 was 7.68 for EWA, 3.69 for WP, 3.32 for WS, and 5.02 for VW (VORP wins)... but it's only one year. What if we do five years? Well, the R^2 of each regression goes WS .9501, WP .9461, VW .9443, EWA .6993. One of these things is not like the others, so from now on we're just going to drop EWA.
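If you want to run this kind of check yourself, the sketch below shows the mechanics: sum each team's player win credits, then score the totals against actual team wins with RMSE and R^2. The team codes and every number in it are made-up placeholders, not data from this post, and the function names are my own.

```python
import math
from collections import defaultdict

def team_totals(player_rows):
    """Add up each team's player win credits. Rows are (team, wins) pairs."""
    totals = defaultdict(float)
    for team, wins in player_rows:
        totals[team] += wins
    return dict(totals)

def rmse(predicted, actual):
    """Root-mean-square error, in wins."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(x, y):
    """R^2 of a one-variable linear regression (squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# PLACEHOLDER data: hypothetical teams and win credits, for illustration only.
player_rows = [
    ("AAA", 12.0), ("AAA", 9.0), ("AAA", 31.0),
    ("BBB", 7.5), ("BBB", 20.5), ("BBB", 15.0),
    ("CCC", 5.0), ("CCC", 11.0), ("CCC", 6.0),
]
actual_wins = {"AAA": 55, "BBB": 40, "CCC": 25}

totals = team_totals(player_rows)
teams = sorted(actual_wins)
predicted = [totals[t] for t in teams]
actual = [actual_wins[t] for t in teams]

print("RMSE:", rmse(predicted, actual))
print("R^2: ", r_squared(predicted, actual))
```

The same r_squared function works for the player-level comparisons between two stats: pass in the two columns of per-player win totals instead of team totals.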
An intriguing possibility is that each of the three remaining stats is right in different ways, so while WS sees the most, maybe WP sees something it doesn't, and so on. We can do a linear weighting of each stat and come up with a Basketball Quotient (BQ), so...
BQ = x * WS + y * WP + z * VW
...and it turns out that the ideal weights are x = 6, y = 3, and z = 1. By popular acclaim, the top MVP candidates are Curry, Harden, Westbrook, Davis, and LeBron. When we put their values in with these weights, we get...
142.71 Harden
139.77 Curry
118.77 Davis
93.36 Westbrook
91.89 LeBron
...but some other interesting names are...
133.17 Chris Paul
122.61 DeAndre
105.72 Butler
89.97 Kyrie
85.05 Wall
84.69 Draymond Green
79.98 Love
74.94 Horford (highest ATL guy)
72.51 Klay
...for reference, putting up a 200 BQ over a full season is historically great, Jordan/LeBron territory. With about 15 games left in this season, obviously no one's getting there. With Curry and Harden about equal statistically, it comes down to whether you think Curry's much greater team success or Harden's much worse supporting cast is more relevant to the V in MVP. Historically it has been the former, so Curry has the edge for MVP.
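For anyone who wants to check the arithmetic, BQ is just a weighted sum of the three win totals. A minimal sketch with the 6/3/1 weights from above follows; the function name and the example player's numbers are hypothetical, not anyone's real season.

```python
# Weights from the post: 6 for Win Shares, 3 for Wins Produced, 1 for VORP wins.
WEIGHTS = {"WS": 6.0, "WP": 3.0, "VW": 1.0}

def basketball_quotient(ws, wp, vw, weights=WEIGHTS):
    """BQ = x * WS + y * WP + z * VW."""
    return weights["WS"] * ws + weights["WP"] * wp + weights["VW"] * vw

# HYPOTHETICAL player: 15 Win Shares, 14 Wins Produced, 12 VORP wins.
bq = basketball_quotient(ws=15.0, wp=14.0, vw=12.0)
print(bq)
```

Since the weights sum to 10, a player credited with 20 wins by all three stats lands exactly on the 200 mark.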
When in doubt, go with Win Shares.