Jalen “Monstar” Reynolds is easily my favorite Xavier University basketball player and, in my opinion, has the greatest opportunity to grow in the coming season. They say there’s a ball he once blocked so hard that it’s still somewhere in the stratosphere above his hometown of Detroit. He’s earned his nickname within an incredibly small fraction of the student population by at times displaying athleticism that seems, like the Monstars, to have been stolen from Patrick Ewing, Larry Johnson, Charles Barkley, Muggsy Bogues, and Shawn Bradley. If he keeps out of foul trouble and is able to fill part of the hole left by Isaiah Philmore, then he could make a big leap.
Before doing any sort of predictive analysis, I need to know what I’m predicting. Different positions obviously have different roles and prioritize different actions on the court; Matt Stainbrook, center, isn’t kicking himself over not taking many 3’s and Dee Davis, guard, isn’t too worried about his blocking. I need to determine what stats are most important to a power forward, where Jalen plays the bulk of his minutes. I attempted to do this by creating a probit regression model, which is similar in spirit to a linear regression, except that it is used to predict binary outcomes (1 or 0). In my case, 1=being drafted during any year a player is eligible and 0=not being drafted by the time their college eligibility is used up. For my sample, I used all 2010-2011 freshmen who played significant minutes (>13 per game) in their first year. For predictors I used a wide range of in-game stats, combined with a variable that attempts to reflect innate talent by normalizing the Rivals150 rankings of top high school seniors from the year the class of players were seniors. Unranked players are given a value of 0 while others have between 1 and 100, with 100 being the highest ranked player.
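For anyone who wants to see the moving parts, here is a minimal sketch of how a probit model turns a weighted sum of stats into a draft probability, plus one way the recruiting rank could be normalized. The coefficients, the intercept, the sample player, and the exact normalization formula are all made up for illustration; they are not the values from my model.

```python
from statistics import NormalDist


def normalize_rank(rank, list_size=150):
    """Map a recruiting rank (1 = best) onto 1..100; unranked (None) gets 0.

    Assumption: a simple linear rescale. The post only says ranked players
    land between 1 and 100, with 100 being the top-ranked player.
    """
    if rank is None:
        return 0.0
    return 1 + 99 * (list_size - rank) / (list_size - 1)


def probit_probability(intercept, coefs, features):
    """P(drafted = 1) = Phi(intercept + sum(coef * stat)), Phi = normal CDF."""
    score = intercept + sum(coefs[name] * features[name] for name in coefs)
    return NormalDist().cdf(score)


# Hypothetical coefficients and player, for illustration only.
coefs = {"ppg": 0.10, "bpg": 0.40, "talent": 0.01}
player = {"ppg": 8.0, "bpg": 1.0, "talent": normalize_rank(30)}
p_drafted = probit_probability(-3.0, coefs, player)
print(f"Estimated draft probability: {p_drafted:.3f}")
```

The larger a positive coefficient, the harder that stat pushes the score, and therefore the probability, towards 1.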
The model was very exciting in theory, but in practice, due to a combination of multicollinearity and statistical insignificance, the model is not predictive. Simply put, there is very little that a freshman basketball player does in their first year that affects whether they will make the NBA. An illustrative example is Doug McDermott, who had fairly middling stats in his early years at Creighton, but made his way to an NBA roster largely on the back of a highly efficient blockbuster of a senior year. You can also think about most ‘one-and-done’ players, who would have entertained bypassing college for the NBA if they could. They make the NBA less because of collegiate accomplishment than because of physical skills they already possessed in high school.
It’s okay, I’m not at ¯\_(ツ)_/¯ yet. In trying to find a combination of significant variables you can get a decent sense of which variables are driving the probability towards 1 by noting the largest positive coefficients across multiple iterations of the model. Combining these observations with some basic intuition about forwards in basketball gives me five stats I can use to compare Reynolds to his peers:
PPWS – Points Per Weighted Shot [PTS / (FGA + (0.475 x FTA))]
A/TO – Assist to Turnover Ratio
PPG – Points per Game
BPG – Blocks per Game
RPG – Rebounds per game
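The two ratio stats in the list above are easy to compute from a box score. Here is a small sketch; the function names and the sample game line are hypothetical, and returning None for a zero-turnover game is my own choice, not a standard convention.

```python
def ppws(points, fga, fta):
    """Points Per Weighted Shot: PTS / (FGA + 0.475 * FTA)."""
    weighted_shots = fga + 0.475 * fta
    return points / weighted_shots if weighted_shots else 0.0


def assist_to_turnover(assists, turnovers):
    """Assist to Turnover Ratio; undefined (None) when there are no turnovers."""
    return assists / turnovers if turnovers else None


# Hypothetical game line: 12 points on 8 field goal attempts and 4 free throws.
print(round(ppws(12, 8, 4), 3))
print(assist_to_turnover(3, 2))
```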
Listing blocks as more important than rebounds for forwards who are typically expected to haul in balls seemed wrong so I did some digging. The variance, how far a set of numbers are spread out, in the RPG sample is larger than that of my BPG sample, so it’s not explained by NBA caliber forwards getting more blocks than their non-NBA peers. More likely is that because I don’t have another statistic measuring defensive effectiveness other than BPG and RPG, BPG is soaking up the increased likelihood of making the NBA that more specific defensive measures would show.
There are three methods of prediction I will use to estimate what Jalen’s second season will look like. The first relies on a linear regression based on a sample of his games that has had all outliers removed, good and bad, to track improvement over the season. The second will be another linear regression based only on the games he played the most in and preseason games; this will be the statistical equivalent of hopeful thinking. The final method will incorporate historical data on power forwards from 2011-2013 who had first years similar to Reynolds’ and apply their average progress to him, forming a sort of bootleg Bayesian prior.
METHOD 1
To start off, here’s Jalen’s actual stat-line using my prioritized forward stats
1. Outliers in basketball stats look like a player performing uncharacteristically well or poorly. I calculated outliers using a set of stats that excludes games with 0 minutes played. The first method only turned up one outlier, the game against St. John’s where he scored a double-double, though that was also his season high for minutes and shots attempted. While on a per game basis it’s a pretty harsh outlier, my intuition tells me that per minute it’s probably not that bad. This is backed up by a PPWS outside of his top 5 PPWSs for the year, meaning that even with higher shooting volume he is still more or less as effective at getting buckets as he is in games where he doesn’t get as many minutes.
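One common way to screen for outliers like this is the 1.5 x IQR rule, sketched below. This is an assumption for illustration, not necessarily the rule used here, and the game log is invented rather than Jalen’s actual box scores.

```python
from statistics import quantiles


def iqr_outliers(games, stat="points", k=1.5):
    """Return games whose stat falls outside [Q1 - k*IQR, Q3 + k*IQR].

    Games with 0 minutes played are dropped before the quartiles are computed.
    """
    played = [g for g in games if g["minutes"] > 0]
    q1, _, q3 = quantiles([g[stat] for g in played], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [g for g in played if not low <= g[stat] <= high]


# Hypothetical game log: one big night among a run of ordinary ones.
points = [2, 4, 3, 0, 5, 4, 2, 3, 17, 4, 3, 2]
games = [{"minutes": 10, "points": p} for p in points]
games.append({"minutes": 0, "points": 0})  # a DNP, ignored by the filter

flagged = iqr_outliers(games)
print([g["points"] for g in flagged])
```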
2013-2014 stats adjusted for outliers
And now, using the updated set of game stats, I create linear regression equations that predict the change in each stat as Jalen gets more experience. X is the number of games played and Y is the statistic being predicted.
PPWS | y=0.006x+.8399 |
AST | y=.0093x+.0774 |
TO | y=-.0048x+.6258 |
PPG | y=.0265x+2.9563 |
BPG | y=-.0109x+.6581 |
RPG | y=.0478x+2.692 |
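Each equation above is an ordinary least-squares line of a stat against game number. A minimal sketch of that fit and of evaluating one of the fitted lines follows; the function names are hypothetical, and game 30 is an arbitrary choice of x, not the game count behind any statline in this post.

```python
def fit_line(ys):
    """Least-squares fit of y on x = 1..n (game number); returns (slope, intercept)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two games to fit a line")
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope, intercept, x):
    """Evaluate y = slope * x + intercept."""
    return slope * x + intercept


# Evaluating the PPG line from the table at a hypothetical game number.
ppg_at_game_30 = predict(0.0265, 2.9563, 30)
print(round(ppg_at_game_30, 2))  # about 3.75
```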
While the negative sign on the unit change of turnovers makes intuitive sense, the negative sign on blocks may not. There are two ways to explain blocks decreasing with experience: either blocks decrease as rebounds increase and defensive positioning adjusts to prioritize rebounds, or it shows model error ¯\_(ツ)_/¯. Either way, here is his theoretical statline for next season.
This shows somewhat minor progress and is probably not indicative of a player that will be able to consistently step into starter’s minutes. Who knows, maybe in this alternate timeline he develops his J and becomes a threat on both sides of the pick n pop, but probably not. Luckily, this is the most conservative of my prediction methods.
METHOD 2
Two of the biggest pitfalls of the first prediction method are that it places equal weight on games where a player is asked to come off the bench for short stints, and that linear progress doesn’t intuitively jibe with what we know: players don’t come into the preseason exactly as they left the postseason. My second method attempts to correct for both of these by removing from my sample any game where the player played fewer than 8 minutes and by factoring in the four preseason games Xavier played in Brazil less than a month ago.
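The sample selection described above can be sketched in a few lines. The function name, the dictionary layout, and the games themselves are hypothetical; the minutes cutoff is left as a parameter, and appending the preseason games after the regular season assumes they should sit at the end of the timeline.

```python
def build_sample(regular_season, preseason, min_minutes=8):
    """Drop short-stint games, then append preseason games in order."""
    kept = [g for g in regular_season if g["minutes"] >= min_minutes]
    return kept + preseason


# Hypothetical games, for illustration only.
season = [{"minutes": 5, "points": 0}, {"minutes": 20, "points": 8}]
brazil = [{"minutes": 25, "points": 12}]
sample = build_sample(season, brazil)
print(len(sample))
```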
Here’s Jalen during Xavier’s trip to Brazil, where he averaged a double-double
And here’s what his adjusted statline looks like using games where he played >= 15mpg and preseason
This method is definitely more optimistic than the first. One caveat: even if Jalen is as efficient in limited playing time as he is with more of it, there’s a causality dilemma here that hinders the predictive ability of this method. Does a player play better because they have more time to adjust to the game, or are they getting more time because they are playing better? That being said, here are the new linear regression equations for each stat.
PPWS | y=-.0293x+1.4771 |
AST | y=0.3529 |
TO | y=.0049x+.8971 |
PPG | y=.424x+4.0662 |
BPG | y=-.0098x+1.0294 |
RPG | y=.2721x+4.7279 |
Interestingly, the decreasing PPWS and increasing PPG means ya theoretical boy is getting buckets, but with abysmal efficiency. This is a man redefining ‘volume shooter’. Here’s the predicted statline based on these.
For comparison here’s Julius Randle, the first power forward selected in the 2014 NBA draft
AW FRICK YEAH
METHOD 3
Using linear regressions to predict future performance has some significant issues. One of the biggest is that basketball players historically don’t get sequentially statistically better with each game and they don’t pick up exactly where they left off the previous season. There’s a huge difference between the JaMarcus Russell off-season plan and the Giannis Antetokounmpo plan.
In this method I use historical data on freshman forwards from the 10/11-12/13 seasons who averaged 15 or more minutes per game, meaning I again use a truncated set of Jalen Reynolds’ games. Since my data set of historical freshman stats only includes players with >13.5 MPG, I use a subset of Jalen’s stats from games where he played >= 14 minutes. I identify the top ten players most similar to Jalen using the top five stats I’ve been following throughout this analysis. Ideally I would have something like a similarity score, but for my purposes generally eyeballing it isn’t too difficult. For reference, the most similar player I found is probably Rico Gathers, who played his freshman year in the 12-13 season for the Baylor Bears.
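I eyeballed the comparison, but a similarity score could look something like the sketch below: standardize each of the five stats across the pool, then rank players by Euclidean distance from the target. This is a swapped-in technique, not what I actually did, and the players and numbers are invented.

```python
from math import sqrt
from statistics import mean, pstdev

STATS = ("ppws", "ato", "ppg", "bpg", "rpg")


def most_similar(target, pool, k=10):
    """Return the k players in pool closest to target in z-scored stat space."""
    scale = {}
    for s in STATS:
        values = [p[s] for p in pool]
        scale[s] = (mean(values), pstdev(values) or 1.0)  # avoid divide-by-zero

    def z(player, s):
        mu, sd = scale[s]
        return (player[s] - mu) / sd

    def distance(p):
        return sqrt(sum((z(p, s) - z(target, s)) ** 2 for s in STATS))

    return sorted(pool, key=distance)[:k]


# Hypothetical pool and target, for illustration only.
pool = [
    {"name": "Player A", "ppws": 1.00, "ato": 0.5, "ppg": 10.0, "bpg": 0.3, "rpg": 4.0},
    {"name": "Player B", "ppws": 1.20, "ato": 0.4, "ppg": 6.0, "bpg": 1.0, "rpg": 6.0},
    {"name": "Player C", "ppws": 0.90, "ato": 1.5, "ppg": 12.0, "bpg": 0.1, "rpg": 2.5},
]
target = {"ppws": 1.18, "ato": 0.45, "ppg": 6.5, "bpg": 0.9, "rpg": 5.8}
closest = most_similar(target, pool, k=2)
print([p["name"] for p in closest])
```

Standardizing first keeps a high-magnitude stat like PPG from drowning out a low-magnitude one like BPG.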
I use this set of similar players to calculate how much, on average, a player with stats like Jalen Reynolds’ changes from their first to their second year of college ball. Below is the average change in each stat from the players’ first to second year.
RPG | 45% |
BPG | 46% |
A/TO | 25% |
PPWS | -2% |
PPG | 64% |
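Applying these average changes is just a multiplication per stat. Here is a sketch; the first-year baseline is a placeholder (only the 1.2 PPWS echoes a figure quoted in this post, the rest is invented), so the output is not the actual projection.

```python
# Average first-to-second-year change from the table above, as fractions.
AVERAGE_CHANGE = {"rpg": 0.45, "bpg": 0.46, "ato": 0.25, "ppws": -0.02, "ppg": 0.64}


def project(first_year):
    """Scale each first-year stat by (1 + average change) for that stat."""
    return {s: round(v * (1 + AVERAGE_CHANGE[s]), 3) for s, v in first_year.items()}


# Placeholder baseline, for illustration only.
baseline = {"rpg": 4.0, "bpg": 1.0, "ato": 0.4, "ppws": 1.2, "ppg": 5.0}
projection = project(baseline)
print(projection)
```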
There are of course some players that got better than these averages suggest, and others that didn’t get much better or even had their flaws exposed by increased playing time. Within the set of players, Jalen’s ceiling may look something like Kansas’ Perry Ellis or Louisville’s Montrezl Harrell. Based on the average change in stats, here’s what this method predicts for Jalen this year.
These numbers fall nicely between the previous two prediction methods, except for PPWS. According to my set of games where Jalen got significant minutes, he’s very efficient: a PPWS of 1.2 puts him easily in the top 20% of his peers in that regard. They also look like the stats he put up during the preseason in Brazil, but adjusted for quality of play. It’s important to note that while this isn’t as blatantly optimistic as the second method, this stat-line still reflects a player who would be making a HUGE impact playing alongside a healthy Matt Stainbrook. Having two big men capable of getting the double-double any given night is not a bad formula for success.
SUMMARY
Predicting the success of a player from only one season of data is very difficult and, judging by the number of unsuccessful probit regression formulations I attempted, may well be a fool’s errand. Predicting subsequent-year ability may not be much easier, but it at least requires understanding just two things: 1. how good was the player in the previous year, and 2. what kind of improvement will they experience over the summer.
My analysis answers the first problem by dividing Jalen Reynolds’ season up a number of ways to grasp what kind of game could truly be expected of him on average at the end of the season. For the second problem, this analysis used linear gains and historical averages as stand-ins for whatever improvements have been made.
Finally, here are the three predicted statlines stacked for comparison
1.
2.
3.
Want my data? You’re going to have to email me at murraym6@xavier.edu, there’s a lot of it. God help your soul.