Composite Confidence

eric
New York Knicks

Posts: 5,909

Likes: 2,131

Dump Bucks: 7,425

Joined: January 2015

Composite Confidence Jan 12, 2018 18:47:15 GMT -6

Post by eric on Jan 12, 2018 18:47:15 GMT -6

or:

There Is No Fate But What God Plays Dice With

or:

How I Learned to Stop Worrying and Feel the Noise

.

We recently talked about how on/off is a wildly noisy stat that, even after a full season, is almost entirely useless. Noise is everywhere, though, so I thought it would be worthwhile to investigate what the noise is like in other stats, because it was possible that on/off, for all its apparent uselessness, was actually the least unreliable option. Win Shares and Wins Produced are the two best composite stats, and their methods of calculation are publicly available, so I focused on them. This is going to get pretty long even for me, so I'm breaking it up into several posts.

First post: The Short Version
Just gives you the answer.

Second post: Of Coins and Clarity
Everyone knows the term small sample size. Show me a baby that claims not to and I will show you a liar in short pants. Small samples are less reliable, large samples are more reliable. Those are words. Weak, human words. Can we nail down the meaning a little better with cold, precise numbers? Preferably by analogy to flipping coins? We can. And we will.

Third post: Simulation of a League of the Basketball
I had to build a basketball simulation in order to get box score data for seasons with identical players. (Why? Because it wasn't there.) This'll go into how I did it, the assumptions I made, the rules I used, how Access is for nerds and my way is totally better.

Fourth post: Numbers Beget Numbers, Beget Greater Numbers
How the composite stats turned out, and how I went from there to the conclusions in the first post. A flat circle is math.

Last Edit: Jan 12, 2018 18:51:22 GMT -6 by eric

Post by eric on Jan 12, 2018 18:47:28 GMT -6

The Short Version

Win Shares per 48 for a single season are ± .035
Wins Produced per 48 for a single season are ± .060

For a player who plays 2700 minutes, you would therefore put a ± 2 on their total Win Shares and a ± 3.4 on their total Wins Produced if you knew what was good for you, because 2700 / 48 * .035 ≈ 2 and 2700 / 48 * .060 ≈ 3.4. This is a useful rule of thumb because a player who plays 34 minutes per game for 78 games hits 2652, so it's about what starters tend to play in the NBA.

Illustration: last year James Harden had 15 Win Shares and Jimmy Butler had 14. We can't say that either player was better to a statistically significant degree, because 15 - 2ish is less than 14. (Strictly speaking because 15 - sqrt(2ish^2 + 2ish^2) is less than 14, because two independent sources of error should be added in quadrature, but you get the idea.) (I swear this is the short version.)

Illustration: Nik Stauskas had 1 Win Share. We can say that both Harden and Butler were better than the Castilian Sauceman to a statistically significant degree, because both 15 and 14 are much more than 2ish higher than 1.
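For anyone who wants to check the arithmetic, here is a minimal sketch of the rule of thumb and the quadrature comparison from the two illustrations. The function names are made up for this example, and it assumes every player carries the 2700-minute error bar, which is a simplification (a low-minutes player would get a smaller total error).

```python
import math

def total_uncertainty(minutes, per48_uncertainty):
    """Scale a per-48-minute uncertainty up to a season total."""
    return minutes / 48 * per48_uncertainty

def significantly_different(value_a, value_b, err_a, err_b):
    """True if the gap is bigger than the two errors added in quadrature."""
    combined = math.sqrt(err_a ** 2 + err_b ** 2)
    return abs(value_a - value_b) > combined

ws_err = total_uncertainty(2700, 0.035)  # about 2 Win Shares
wp_err = total_uncertainty(2700, 0.060)  # about 3.4 Wins Produced

# Win Share totals quoted in the illustrations above.
harden_vs_butler = significantly_different(15, 14, ws_err, ws_err)
harden_vs_stauskas = significantly_different(15, 1, ws_err, ws_err)
print(ws_err, wp_err, harden_vs_butler, harden_vs_stauskas)
```

The first comparison comes back False (a one-win gap is inside the noise) and the second comes back True.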

Post by eric on Jan 12, 2018 18:47:48 GMT -6

Of Coins and Clarity

Suppose you have a coin, let's call it Coin A. You want to know if Coin A is fair or foul. If you flip it ten times and it comes up heads five times, that's probably not good evidence it's fair, because even a somewhat rigged coin could happen to come up five heads. If you flip it ten billion times and it comes up heads five billion times, that probably is good evidence it's fair. If you flip it a hundred times and it comes up heads fifty times, is that good evidence or not?

This is what we talk about when we talk about small sample size. At what point specifically is the sample no longer small? The technical answer is at the point when the noise is smaller than the signal we're trying to measure. In many cases, including flipping coins, we can directly calculate the noise and then it's easy. But let's suppose we can't, because we definitely can't for intricately calculated composite stats. What then?

Let's take a coin we know is fair, call it Coin B, and flip it a hundred times just like Coin A. And let's do this a hundred times so we can get a good understanding of how much variation noise can account for.

Illustration:

Coin B ranged from 32 heads to 63 heads out of 100 flips, with most results coming around 50. These kinds of ranges are usually not reported by going from minimum to maximum but with a calculation called the standard deviation. For roughly bell-shaped data like this, going plus or minus one standard deviation from the average captures about 68% of the observed values, plus or minus two captures about 95%, and plus or minus three captures about 99.7%. Turning this around, if you were to run a hundred and first trial you would give it a 68% chance of falling within one standard deviation of the average, 95% of falling within two, and so on. These escalating levels of confidence give rise to the term confidence interval, and for reasons the confidence interval usually reported in scientific literature is twice the standard deviation, or 95%.

In this case our average is 50 and our standard deviation is 5. Though we could never guess exactly how many entries would be in each bin, these two values are exactly what we would expect because 100 tries * 50% chance of coming up heads = 50, and 50% chance of coming up heads * 50% chance of not coming up heads * 100 tries all taken to the square root = 5. Thus, we should not be surprised if a fair coin came up with 40 or 60 heads in 100 tries... which is (approximately) the same as saying that we should not be surprised if a coin that's actually rigged to be 40% heads came up with 50 heads in 100 tries, which brings us back to Coin A: the best we can say from a hundred flips is that it's probably not rigged any further than ± 10%, but we can't reject the hypothesis that it's for example a 55% heads coin, or 41% heads.
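If you'd rather not flip a real coin ten thousand times, the Coin B experiment can be sketched in a few lines. This is only an illustration under my own assumptions: the function names and the fixed seed are invented here, and the simulated minimum and maximum will not match the 32 and 63 from the trials above.

```python
import math
import random

def binomial_mean_sd(n_flips, p_heads):
    """Expected number of heads and its standard deviation."""
    mean = n_flips * p_heads
    sd = math.sqrt(n_flips * p_heads * (1 - p_heads))
    return mean, sd

def simulate_trials(n_trials, n_flips, p_heads, seed=0):
    """Count heads in each of n_trials runs of n_flips flips."""
    rng = random.Random(seed)  # seeded only so the run is repeatable
    return [
        sum(rng.random() < p_heads for _ in range(n_flips))
        for _ in range(n_trials)
    ]

mean, sd = binomial_mean_sd(100, 0.5)   # 50 heads, standard deviation 5
heads = simulate_trials(100, 100, 0.5)  # a hundred trials of a hundred flips
within_two_sd = sum(abs(h - mean) <= 2 * sd for h in heads) / len(heads)
print(mean, sd, min(heads), max(heads), within_two_sd)
```

Change p_heads to 0.4 or 0.55 and compare the spread of results: the ranges overlap heavily with the fair coin's, which is the whole problem.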

I've rigged up a Google "Sheet" with a hundred flips of a fair coin, a pivot table, and a bar graph walk into a bar. The bartender says that Google has no recalculate button the way Excel does, but if you just change the letter next to the arrow where it says "change this letter" it'll self-update the (Microsoft) works. The pivot table bursts into tears and says "Sports fan, I am the great Santini!" Good joke. Everyone refreshes. Curtains. Play around with it and you can see how even a big sounding sample that would be very laborious to tally (i.e. flipping a coin a hundred stupid times) is nevertheless pretty small when it comes to statistical significance.

Last Edit: Jan 12, 2018 18:50:43 GMT -6 by eric

Post by eric on Jan 12, 2018 18:48:11 GMT -6

Simulation of a League of the Basketball

Alright. Let's get back to basketball. Win Shares and Wins Produced are quite intricate calculations that take box score values as inputs and spit out one number intended to reflect how much credit a player deserves for winning (or losing). We can't run an NBA season with identical players in real life because there are no identical players in real life. So let's build a simulation that mirrors the NBA but for the players and RUN IT!

I took the stats from the 2016-2017 NBA regular season to create the average point guard, shooting guard, and so on down the line.
I set up a rotation of ten players, an identical starter and backup for each position.
Starting perimeter players played exactly 34 minutes each game, starting bigs 32 minutes.

On every possession I utilized eight random numbers to model eight possible parameters. I used chances proportional to how frequently each event happened in the actual NBA. For example, there was a 12.4% chance for a possession to end in a turnover, and a 27.1% chance that that turnover would come from the point guard, and a 57.2% chance that that turnover would be due to a steal, and a 21.8% chance that steal would be from the opposing small forward. In the long run we would expect exactly 12.4% of possessions to end in a turnover, but in any given game or season that number could be more or less depending on where the random numbers happened to fall. (As it happened, the actual range of TOV% was 11.7% to 13.5%.) Here are the eight specific parameters:

-what kind of play it was (three point attempt, two point attempt, free throw attempt, turnover)
-who made the play
-how many points resulted
-whether the player was assisted
-whether the ball was stolen
-whether the shot was blocked
-whether the ball was rebounded
-whether the player was fouled

Where the last five also include who did the assisting, stealing, etc., and only occur on the relevant play type. For example: you can't assist a turnover, you can't block a free throw attempt, you can't triple stamp a double stamp. In this way we measure everything that goes into the box score.
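To make the chained probabilities concrete, here is a rough sketch of just the turnover branch, using the four percentages quoted above. It is not the actual simulation: the function names, the ordering of outcomes along the random number, and the catch-all "other" labels are placeholders I made up for illustration.

```python
import random

# The four probabilities quoted in the post; nothing else here is real data.
P_TURNOVER = 0.124           # possession ends in a turnover
P_TURNOVER_BY_PG = 0.271     # given a turnover, the point guard committed it
P_TURNOVER_IS_STEAL = 0.572  # given a turnover, it was a steal
P_STEAL_BY_OPP_SF = 0.218    # given a steal, the opposing small forward got it

def resolve_turnover_branch(draws):
    """Walk the turnover branch of one possession from four draws in [0, 1)."""
    play_draw, who_draw, steal_draw, stealer_draw = draws
    if play_draw >= P_TURNOVER:
        return {"play": "not a turnover"}
    result = {"play": "turnover"}
    result["by"] = "PG" if who_draw < P_TURNOVER_BY_PG else "other"
    result["stolen"] = steal_draw < P_TURNOVER_IS_STEAL
    if result["stolen"]:
        is_sf = stealer_draw < P_STEAL_BY_OPP_SF
        result["stolen_by"] = "opposing SF" if is_sf else "other opponent"
    return result

def simulated_turnover_rate(n_possessions, seed=0):
    """Share of simulated possessions that end in a turnover."""
    rng = random.Random(seed)
    turnovers = sum(rng.random() < P_TURNOVER for _ in range(n_possessions))
    return turnovers / n_possessions

print(resolve_turnover_branch((0.05, 0.10, 0.30, 0.10)))
print(simulated_turnover_rate(100_000))
```

With a hundred thousand possessions the rate lands close to 12.4%; shrink n_possessions to a single team-season's worth and it wanders around that value instead.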

Illustration: The eight random numbers .593 .688 .748 .658 .135 .695 .164 .142 produced the play a4fgaa40ptsa0astb0stlb0blka5orbb0pf. That is, the starting power forward (a4) took a two point attempt (a field goal attempt that is not a three) and generated 0 points because they missed, so by definition the assist went to no one (a0) and no one stole the ball (b0), it so happened that the shot was blocked by no one (b0), the starting center (a5) got the offensive rebound, and since it was a field goal attempt there was no foul (b0). Since there was an offensive rebound, team A had the ball for the next possession.

There are several ways the model was limited:

-no foul outs or ejections
-no flagrant or technical fouls
-no loose ball or otherwise non shooting fouls
-no fatigue, injuries, trades, or suspensions - players must be identical to themselves, that's the point
-no overtimes
-no change of possession at quarter end
-free throw attempts were limited to two shot situations: no and ones, no three shot fouls, no clear path or off ball fouls
-rebounding a missed second free throw was given the same chances as rebounding a missed field goal

Additionally, pace is a pretty complicated formula so it proved difficult to match exactly. My league ended up at around 94.6 pace instead of the goal of 96.4.

After every game was finished I recorded the resulting box score and started over until I reached 82 games, then I calculated each player's Win Shares and Wins Produced, then I started over until I had 100 seasons.

Then the real fun began. >:)

Post by eric on Jan 12, 2018 18:48:23 GMT -6

Numbers Beget Numbers, Beget Greater Numbers

Once you have the box score, calculating either composite stat is relatively simple:

Win Shares
ScPoss = (FG_Part + AST_Part + FT_Part) * (1 - (Team_ORB / Team_Scoring_Poss) * Team_ORB_Weight * Team_Play%) + ORB_Part
FG_Part = FGM * (1 - 0.5 * ((PTS - FTM) / (2 * FGA)) * qAST)
qAST = ((MP / (Team_MP / 5)) * (1.14 * ((Team_AST - AST) / Team_FGM))) + ((((Team_AST / Team_MP) * MP * 5 - AST) / ((Team_FGM / Team_MP) * MP * 5 - FGM)) * (1 - (MP / (Team_MP / 5))))
AST_Part = 0.5 * (((Team_PTS - Team_FTM) - (PTS - FTM)) / (2 * (Team_FGA - FGA))) * AST
FT_Part = (1-(1-(FTM/FTA))^2)*0.4*FTA
Team_Scoring_Poss = Team_FGM + (1 - (1 - (Team_FTM / Team_FTA))^2) * Team_FTA * 0.4
Team_ORB_Weight = ((1 - Team_ORB%) * Team_Play%) / ((1 - Team_ORB%) * Team_Play% + Team_ORB% * (1 - Team_Play%))
Team_ORB% = Team_ORB / (Team_ORB + (Opponent_TRB - Opponent_ORB))
Team_Play% = Team_Scoring_Poss / (Team_FGA + Team_FTA * 0.4 + Team_TOV)
ORB_Part = ORB * Team_ORB_Weight * Team_Play%

FGxPoss = (FGA - FGM) * (1 - 1.07 * Team_ORB%)
FTxPoss = ((1 - (FTM / FTA))^2) * 0.4 * FTA
TotPoss = ScPoss + FGxPoss + FTxPoss + TOV

PProd = (PProd_FG_Part + PProd_AST_Part + FTM) * (1 - (Team_ORB / Team_Scoring_Poss) * Team_ORB_Weight * Team_Play%) + PProd_ORB_Part
PProd_FG_Part = 2 * (FGM + 0.5 * 3PM) * (1 - 0.5 * ((PTS - FTM) / (2 * FGA)) * qAST)
PProd_AST_Part = 2 * ((Team_FGM - FGM + 0.5 * (Team_3PM - 3PM)) / (Team_FGM - FGM)) * 0.5 * (((Team_PTS - Team_FTM) - (PTS - FTM)) / (2 * (Team_FGA - FGA))) * AST
PProd_ORB_Part = ORB * Team_ORB_Weight * Team_Play% * (Team_PTS / (Team_FGM + (1 - (1 - (Team_FTM / Team_FTA))^2) * 0.4 * Team_FTA))

Stops = Stops1 + Stops2
Stops1 = STL + BLK * FMwt * (1 - 1.07 * DOR%) + DRB * (1 - FMwt)
FMwt = (DFG% * (1 - DOR%)) / (DFG% * (1 - DOR%) + (1 - DFG%) * DOR%)
DOR% = Opponent_ORB / (Opponent_ORB + Team_DRB)
DFG% = Opponent_FGM / Opponent_FGA
Stops2 = (((Opponent_FGA - Opponent_FGM - Team_BLK) / Team_MP) * FMwt * (1 - 1.07 * DOR%) + ((Opponent_TOV - Team_STL) / Team_MP)) * MP + (PF / Team_PF) * 0.4 * Opponent_FTA * (1 - (Opponent_FTM / Opponent_FTA))^2
Stop% = (Stops * Opponent_MP) / (Team_Possessions * MP)

DRtg = Team_Defensive_Rating + 0.2 * (100 * D_Pts_per_ScPoss * (1 - Stop%) - Team_Defensive_Rating)

Team_Defensive_Rating = 100 * (Opponent_PTS / Team_Possessions)
D_Pts_per_ScPoss = Opponent_PTS / (Opponent_FGM + (1 - (1 - (Opponent_FTM / Opponent_FTA))^2) * Opponent_FTA*0.4)

+((((2*(B4+0.5*D4)*(1-0.25*(P4-F4)/C4*(Q4/48*1.14*(K$29-K4)/B$29+(K$29/48*Q4-K4)/(B$29/48*Q4-B4)*(1-Q4/48))))+((B$29-B4+0.5*(D$29-D4))/(B$29-B4)*(P$29-F$29-P4+F4)/(C$29-C4)*K4/2)+F4)*(1-((B$29+(1-(1-F$29/G$29)^2)*G$29*0.4)/(C$29+0.4*G$29+N$29))*((1-H$29/J$29)*((B$29+(1-(1-F$29/G$29)^2)*G$29*0.4)/(C$29+0.4*G$29+N$29))/((1-H$29/J$29)*((B$29+(1-(1-F$29/G$29)^2)*G$29*0.4)/(C$29+0.4*G$29+N$29))+(H$29/J$29)*(1-((B$29+(1-(1-F$29/G$29)^2)*G$29*0.4)/(C$29+0.4*G$29+N$29)))))*H$29/(B$29+(1-(1-F$29/G$29)^2)*G$29*0.4))+(H4*((1-H$29/J$29)*((B$29+(1-(1-F$29/G$29)^2)*G$29*0.4)/(C$29+0.4*G$29+N$29))/((1-H$29/J$29)*((B$29+(1-(1-F$29/G$29)^2)*G$29*0.4)/(C$29+0.4*G$29+N$29))+(H$29/J$29)*(1-((B$29+(1-(1-F$29/G$29)^2)*G$29*0.4)/(C$29+0.4*G$29+N$29)))))*((B$29+(1-(1-F$29/G$29)^2)*G$29*0.4)/(C$29+0.4*G$29+N$29))*P$29/(B$29+(1-(1-F$29/G$29)^2)*0.4*G$29)))-0.92*((B4*(1-0.25*(P4-F4)/C4*(Q4/48*1.14*(K$29-K4)/B$29+(K$29/48*Q4-K4)/(B$29/48*Q4-B4)*(1-Q4/48)))+0.25*((P$29-F$29-P4+F4)/(C$29-C4))*K4+(1-(1-(F4/G4))^2)*0.4*G4)+(C4-B4)*(1-1.07*H$29/J$29)+(1-(F4/G4))^2*0.4*G4+N4)*(P$26/(C$26+N$26+G$26*0.4-1.07*H$26/(J$29+J$30)*(C$26-B$26))))/(0.32*(P$26/82/2))+Q4/48/5*(0.5*(C$29+0.4*G$29+N$29-1.07*H$29/J$29*(C$29-B$29)+C$30+0.4*G$30+N$30-1.07*H$30/J$30*(C$30-B$30)))*(1.08*(P$26/(C$26+N$26+G$26*0.4-1.07*H$26/(J$29+J$30)*(C$26-B$26)))-((P$30/(0.5*(C$29+0.4*G$29+N$29-1.07*H$29/J$29*(C$29-B$29)+C$30+0.4*G$30+N$30-1.07*H$30/J$30*(C$30-B$30))))+0.2*((P$30/(B$30+(1-(1-(F$30/G$30))^2)*G$30*0.4))*(1-(((L4+M4*((B$30/C$30*(1-H$29/J$29))/(B$30/C$30*(1-H$29/J$29)+(1-B$30/C$30)*H$29/J$29))*(1-1.07*H$29/J$29)+I4*(1-((B$30/C$30*(1-H$29/J$29))/(B$30/C$30*(1-H$29/J$29)+(1-B$30/C$30)*H$29/J$29))))+(((C$30-B$30-M$29)*((B$30/C$30*(1-H$29/J$29))/(B$30/C$30*(1-H$29/J$29)+(1-B$30/C$30)*H$29/J$29))*(1-1.07*H$29/J$29)+N$30-L$29)*Q4/48/5+O4/O$29*0.4*G$30*(1-(F$30/G$30))^2))*5*48/Q4/(0.5*(C$29+0.4*G$29+N$29-1.07*H$29/J$29*(C$29-B$29)+C$30+0.4*G$30+N$30-1.07*H$30/J$30*(C$30-B$30)))))-P$30/(0.5*(C$29+0.4*G$29+N$29-1.07*H$29/J$29*(C$29-B$29)+C$30+0.4*G$30+N$30-1.07*H$30/J$30*(C$30-B$30))))))/(0.32*(P$26/82/2))
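The team-level pieces that keep reappearing in the Win Shares formulas (Team_Scoring_Poss, Team_Play%, Team_ORB%, Team_ORB_Weight) are easier to read as small functions. This is just a sketch of those four lines as written above; the function names and the team totals in the example are hypothetical, not taken from the simulation.

```python
def scoring_possessions(fgm, ftm, fta):
    """Team_Scoring_Poss: made field goals plus free throw trips that score."""
    ft_pct = ftm / fta
    return fgm + (1 - (1 - ft_pct) ** 2) * fta * 0.4

def team_play_pct(fgm, fga, ftm, fta, tov):
    """Team_Play%: scoring possessions per play."""
    return scoring_possessions(fgm, ftm, fta) / (fga + fta * 0.4 + tov)

def team_orb_pct(team_orb, opp_trb, opp_orb):
    """Team_ORB%: share of available offensive rebounds the team collected."""
    return team_orb / (team_orb + (opp_trb - opp_orb))

def team_orb_weight(orb_pct, play_pct):
    """Team_ORB_Weight: how much credit an offensive rebound earns."""
    numerator = (1 - orb_pct) * play_pct
    return numerator / (numerator + orb_pct * (1 - play_pct))

# Hypothetical season totals, chosen only to exercise the formulas.
sc_poss = scoring_possessions(fgm=3200, ftm=1500, fta=2000)
play_pct = team_play_pct(fgm=3200, fga=7000, ftm=1500, fta=2000, tov=1100)
orb_pct = team_orb_pct(team_orb=800, opp_trb=3500, opp_orb=800)
orb_weight = team_orb_weight(orb_pct, play_pct)
print(sc_poss, play_pct, orb_pct, orb_weight)
```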

Wins Produced

PROD = 3FGM*0.064 + 2FGM*0.032 + FTM*0.017 + FGMS*-0.034 + FTMS*-0.015 + REBO*0.034 + REBD*0.034 + TO*-0.034 + STL*0.033 + FTM(opp.)*-0.017 + BLK*0.020
FTM(opp.) = PF / Team PF * Team FT allowed
TDRBPM = MP * .034 * .504 * (Team DRB – Player DRB) / (Team Minutes Played – Player Minutes Played)
TAPM = FGA * .032586 * 2 * .725 * (Team Assists – Player Assists) / (Team Minutes – Player Minutes)
Team Defensive Adjustment = [(3FGM(opp.)*-0.064 + 2FGM(opp.)*-0.031 + TO(opp.)*0.033 + TOTM*-0.034 + REBTM*0.033 – BLKTM*0.200)/Minutes Played]*48
DEFTM48 = League Average Team Defensive Adjustment – Team Defensive Adjustment
ADJ P48 = PROD * 48 / MP + (Team TDRBPM * Player DRB / Team DRB - TDRBPM) + Player AST / Team AST * Team TAPM + DEFTM48

Position	Average Adj. P48
Point Guards	0.191
Shooting Guards	0.158
Small Forwards	0.186
Power Forwards	0.256
Centers	0.296

WP48 = ADJ P48 - Average ADJ P48 + .099
Wins Produced = WP48 * MP / 48

(((((D4*0.064+E4*0.032+F4*0.017+(C4-B4)*-0.034+(G4-F4)*-0.015+H4*0.034+I4*0.034+N4*-0.034+L4*0.033+(O4/$O$29*$G$30)*-0.017+M4*0.02)+0.034*((S$2*I4/I$29)-S4))+K4/K$29*T$2-T4)*48/(Q4*82)+((D$30*-0.064+E$30*-0.031+N$30*0.033+N$29*-0.034+J$29*0.033-M$29*0.2)/(82*5)-(D$29*-0.064+E$29*-0.031+N$29*0.033+N$30*-0.034+J$30*0.033-M$30*0.2)/(82*5))/2-U4)+0.099)*Q4*82/48
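The Wins Produced side is mostly a weighted sum, so the PROD line and the last two lines (WP48 and Wins Produced) can be sketched directly from the formulas above. The stat line below is invented for illustration, the dictionary keys are my own spelling of the labels, and the team adjustments that go into ADJ P48 are left out, so adj_p48 is simply passed in.

```python
# Weights copied from the PROD formula above; FGMS and FTMS are misses.
PROD_WEIGHTS = {
    "3FGM": 0.064, "2FGM": 0.032, "FTM": 0.017,
    "FGMS": -0.034, "FTMS": -0.015,
    "REBO": 0.034, "REBD": 0.034, "TO": -0.034,
    "STL": 0.033, "FTM_opp": -0.017, "BLK": 0.020,
}

def prod(stats):
    """PROD: weighted sum of a player's box score totals."""
    return sum(weight * stats.get(key, 0) for key, weight in PROD_WEIGHTS.items())

def wins_produced(adj_p48, position_avg_adj_p48, minutes):
    """WP48 = ADJ P48 - position average + .099, then scaled by minutes."""
    wp48 = adj_p48 - position_avg_adj_p48 + 0.099
    return wp48 * minutes / 48

# Hypothetical season line, not a real or simulated player.
line = {
    "3FGM": 100, "2FGM": 400, "FTM": 300, "FGMS": 500, "FTMS": 100,
    "REBO": 100, "REBD": 400, "TO": 200, "STL": 100, "FTM_opp": 150, "BLK": 50,
}
player_prod = prod(line)

# A point guard exactly at the position average (.191) over 2400 minutes.
average_pg_wins = wins_produced(0.191, 0.191, 2400)
print(player_prod, average_pg_wins)
```

The second number shows what the .099 is doing: a player who is exactly average for his position still produces .099 wins per 48 minutes.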

Note that unlike in real life we never have to worry about which position a player really is for Wins Produced purposes, because we've defined it from the get go. As you've no doubt noticed, personal fouls only come into Win Shares and Wins Produced as a proportion of team fouls, so leaving out non-shooting fouls doesn't hurt us. Before we look at how the values change from season to season, I thought we could whet our appetite a little by looking at how a player's WS and WP in a given season were related, if at all:

That looks a lot like two distinct lines to me. What if we split them up so that bigs are blue, wings are green, and points are orange?

Yep. The wings and point distributions are slightly different but obviously much more similar to each other than the bigs. As you can see above these are wildly dissimilar approaches to assigning value and yet the different approaches turn out to be extremely well correlated with each other. There's more than one way to skin an apple, as the human saying goes.

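Alright so let's look at our stats, all values given as per 48 minutes (labels 1 through 5 are the starting point guard, shooting guard, small forward, power forward, and center; 6 through 10 are their backups in the same order):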

Labels	Avg ws	Max ws	Min ws	SD ws	Avg wp	Max wp	Min wp	SD wp
1	.0835	.1172	.0346	.0191	.1475	.2231	.0628	.0315
2	.0612	.1101	.0212	.0169	.1131	.1887	.0377	.0274
3	.0981	.1479	.0515	.0180	.1711	.2524	.1034	.0289
4	.1064	.1434	.0611	.0167	.1251	.1986	.0480	.0293
5	.1700	.2141	.1237	.0174	.2290	.3057	.1468	.0279
6	.0824	.1431	.0108	.0264	.1387	.2479	.0256	.0420
7	.0571	.1211	.0009	.0281	.1076	.2141	.0011	.0473
8	.0934	.1526	.0219	.0249	.1668	.2735	.0592	.0383
9	.1015	.1427	.0409	.0231	.1216	.1852	.0312	.0374
10	.1705	.2131	.1142	.0221	.2351	.3079	.1362	.0362
Total	.1024	.2141	.0009	.0430	.1556	.3079	.0011	.0557
								
Role (s=starter, b=backup)	Type (p=point, w=wing, b=big)	WS max-avg	WS avg-min	WS stdevp	MP	WP max-avg	WP avg-min	WP stdevp
s	p	.0337	.0489	.0191	2788	.0756	.0847	.0315
s	w	.0489	.0400	.0169	2788	.0756	.0754	.0274
s	w	.0498	.0466	.0180	2788	.0813	.0677	.0289
s	b	.0370	.0453	.0167	2624	.0735	.0771	.0293
s	b	.0441	.0462	.0174	2624	.0767	.0823	.0279
b	p	.0607	.0716	.0264	1148	.1092	.1131	.0420
b	w	.0640	.0562	.0281	1148	.1064	.1065	.0473
b	w	.0592	.0716	.0249	1148	.1067	.1076	.0383
b	b	.0412	.0606	.0231	1312	.0636	.0905	.0374
b	b	.0426	.0563	.0221	1312	.0728	.0990	.0362
								
Average	x	.0481	.0543	.0213	x	.0841	.0904	.0346

There's a lot going on here.

Most relevant to our interests, the standard deviation shrinks as minutes played go up: the more minutes a player plays, the less uncertain we should be about his composite stats. (It isn't a strict inverse proportion: backups play less than half the starters' minutes but their standard deviations are only about 40% larger, which is closer to an inverse square root.) That's good! It would be really disconcerting (not to mention annoying) if it was the opposite! At the same time, slight differences in minutes played show only small effects: all starters have about the same standard deviations, same for all backups.

For each stat the distribution is not quite symmetrical: on average the minimum sits slightly farther below the mean than the maximum sits above it. The computations are clearly nonlinear, so it's not surprising to see some asymmetry here, but it's nice to know that it's relatively small; some players even saw the opposite. If we had to have different rules for different player qualities it would be annoying.

Tagging on to that point, there is no relationship between overall values in either stat and standard deviations. The shooting guard produced about a third of the Win Shares per 48 minutes of the center, but their standard deviations are almost identical. There was no a priori reason to believe this would happen but it did, which is very nice.

The backups and starters are extremely well correlated in both stats, with R^2s of .9981 and .9955 respectively. We did have an a priori reason to believe this would be the case, because the backups and starters are identical to each other, so it's good that this happened. We didn't expect perfect correlation because the whole point of this exercise was that randomness was involved, but nearly perfect is great.
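Here is one way to check those R^2 values from the table, assuming they were computed on the five per-position averages (starter versus backup at each position). That assumption is mine, and the helper name is made up; under it, the results come out very close to the quoted figures.

```python
# Average per-48 values from the table, ordered PG, SG, SF, PF, C.
starters_ws = [0.0835, 0.0612, 0.0981, 0.1064, 0.1700]
backups_ws = [0.0824, 0.0571, 0.0934, 0.1015, 0.1705]
starters_wp = [0.1475, 0.1131, 0.1711, 0.1251, 0.2290]
backups_wp = [0.1387, 0.1076, 0.1668, 0.1216, 0.2351]

def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

r2_ws = r_squared(starters_ws, backups_ws)
r2_wp = r_squared(starters_wp, backups_wp)
print(round(r2_ws, 4), round(r2_wp, 4))
```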

Taking the average of the standard deviations for starters for each stat and doubling it gives us .0352 for Win Shares and .0580 for Wins Produced, which are pretty much .035 and .060. We can be 95% sure that any given player with X Win Shares per 48 minutes played has a true talent level in a range of X ± .035, and Y ± .060 for Wins Produced. This is a pretty big range, about one third of the average Win Shares per 48, but nowhere near the on/off figure of ± 11 that covers almost every player in the NBA. The values for backups are 41% higher for Win Shares and 39% for Wins Produced - it's probably 40% in each case, and it's another surprise and good sign that the methodology scales in the same way for each stat. But you may be asking, why does Win Shares have so much less uncertainty? I'll tell you.
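The headline numbers fall straight out of the standard deviation columns in the table. A short sketch of that arithmetic, with variable names invented for the example:

```python
# Standard deviations per 48 minutes from the table (PG, SG, SF, PF, C).
sd_ws_starters = [0.0191, 0.0169, 0.0180, 0.0167, 0.0174]
sd_ws_backups = [0.0264, 0.0281, 0.0249, 0.0231, 0.0221]
sd_wp_starters = [0.0315, 0.0274, 0.0289, 0.0293, 0.0279]
sd_wp_backups = [0.0420, 0.0473, 0.0383, 0.0374, 0.0362]

def mean(values):
    return sum(values) / len(values)

# Twice the average starter standard deviation is the 95% interval.
ci_ws = 2 * mean(sd_ws_starters)
ci_wp = 2 * mean(sd_wp_starters)

# How much wider the backups' spread is than the starters'.
backup_extra_ws = mean(sd_ws_backups) / mean(sd_ws_starters) - 1
backup_extra_wp = mean(sd_wp_backups) / mean(sd_wp_starters) - 1

print(round(ci_ws, 4), round(ci_wp, 4))
print(round(backup_extra_ws * 100), round(backup_extra_wp * 100))
```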

I don't know.

But it does, and now we know it does. I've always preferred Win Shares over Wins Produced anyway because having to argue about whether Magic Johnson or Ben Simmons is really a point guard is nobody's idea of a good time, but now we have another reason not to! So we've got that going for us.

Illustration: James Harden had 15 Win Shares last year, most in the league. How far down the leaderboard do we have to go before we find someone who was worse to a statistically significant degree? We take his WS/48 of .245, subtract .035*sqrt(2) because we have two independent noises, one on his value and one on whomever we are comparing him to, then turn that back into overall Win Shares and get 12. Next we take everyone else and add .035*sqrt(2) and multiply back through by their minutes, and see that the first player who doesn't get up to 12 turns out to be #22 (!!!) DeMar DeRozan. It is proven that if you ain't first you're last, but it turns out there are twenty some odd players who are indistinguishable for first place.
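The first step of that calculation (turning Harden's .245 into a floor of about 12 Win Shares) can be sketched as below. The minutes figure is backed out from the 15 Win Shares and .245 per 48 quoted above rather than looked up, so treat it as an approximation, and the function name is hypothetical.

```python
import math

PER48_ERR = 0.035  # single-season Win Shares per 48 uncertainty from above

def significance_floor(leader_ws48, leader_minutes, per48_err=PER48_ERR):
    """Leader's total after subtracting two independent errors in quadrature."""
    lowered_rate = leader_ws48 - per48_err * math.sqrt(2)
    return lowered_rate * leader_minutes / 48

# Approximate minutes implied by 15 Win Shares at .245 per 48.
harden_minutes = 15 / 0.245 * 48
floor_ws = significance_floor(0.245, harden_minutes)
print(round(harden_minutes), round(floor_ws, 1))
```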

Illustration: But 2017 was pretty blah stats-wise. Let's look at a historic stats year like 2016, when Steph Curry had one of the all time great regular seasons with 18 WS and .318 per 48. Surely he was first to a statistically significant degree? Nope. Six other players had statistically indistinguishable years: Kevin Durant, Russell Westbrook, Kawhi Leonard, LeBron James, James Harden, Chris Paul. Granted those players are all pretty solid too, but what I'm getting at is even in a year where one player seems to stand head and shoulders above the rest it could just as easily have been noise.

And that's it! The moral of the story is if anyone says "it's X games/weeks/months into the season, it's not a small sample anymore" they're almost certainly wrong, and it's your right and duty as an American to demand they show you their calculations.

In bed.

Last Edit: Jan 12, 2018 18:49:02 GMT -6 by eric