A New Test of the Moneyball Hypothesis
### Abstract
It is our intention to show that Major League Baseball (MLB) general managers, bound by tradition, reward hitters in a manner that does not reflect the relative importance of two measures of producing offense: on-base percentage and slugging percentage. In particular, slugging is overcompensated relative to its contribution to scoring runs. This creates an inefficiency in run production, because the same runs (and wins) could be produced at lower cost. We first estimate a team run production model to determine the run production weights of team on-base percentage and team slugging. Next we estimate a player salary model to determine the individual salary weights given to these same two statistics. By tying these two sets of results together we find that slugging is overcompensated relative to on-base percentage, i.e., sluggers are paid more than they are worth in terms of contributing to team runs. These results suggest that, if run production is the objective, teams acquiring talent should pay more attention to players with high on-base percentage and less attention to players with high slugging percentage.
**Key words:** Moneyball, strategy, quantitative analysis, economics
### Introduction
It is our intention to show that Major League Baseball (MLB) general managers did not immediately embrace the new statistical methods for choosing players and strategies described in Michael Lewis's 2003 book _Moneyball_. In particular, we show that three years after _Moneyball_ was published, on-base percentage was still undercompensated relative to slugging percentage, given their contributions to scoring runs. This contradicts a study by two economists (3) who claim that Moneyball's innovations had diffused throughout MLB only one season after the book's publication.
#### Background
In the 2003 publication of _Moneyball_, Michael Lewis (4) describes the journey of a small-market team, the Oakland Athletics, and its unorthodox general manager, Billy Beane. The team was remarkable for attaining high winning percentages in the American League despite the low payroll that comes with being a small-market team. Lewis followed the team to discover how it managed to use its resources more efficiently than any other MLB team. Moneyball practice included the use of statistical analysis for acquiring players and evaluating strategies in a way that was allegedly not recognized before 2003 by baseball players, coaches, managers, and fans. Central to this analysis is determining the relative importance of on-base percentage versus slugging percentage. By buying more of the undervalued input, on-base percentage, Billy Beane could assemble a roster of hitters that produced more wins on the field while staying within the team's modest payroll. Although the book discusses many other aspects of Moneyball (e.g., scouting, drafting players, and game strategy), this paper focuses on whether a team can improve its on-field performance for a given budget by sacrificing some more expensive slugging performance for more, but less expensive, on-base performance. This is what we call the Moneyball test: efficient use of resources requires that productivity per dollar be equal for on-base percentage and slugging percentage.
Hakes and Sauer (3) were the first researchers to use regression analysis to demonstrate at the MLB level what Beane and Lewis had suggested: 1) on-base percentage and slugging percentage (more so than batting average) are highly predictive of team wins, and 2) before the current Moneyball era (beginning around 2003), players were not paid in line with their contributions on these measures. In particular, on-base percentage was underpaid relative to its value. They used four statistics to predict team wins: own-team on-base percentage, opposing-team on-base percentage, own-team slugging percentage, and opposing-team slugging percentage. The regression coefficients for team on-base percentage and slugging percentage give the weight each factor carries in determining team wins. A second regression for player salaries assigns a dollar value to each unit of a hitter's on-base percentage (OBP) and slugging percentage (SLUG). The player salary equation used the following variables: OBP, SLUG, fielding position, arbitration and free-agent status, and years of MLB experience. They estimated salary models for each of the four MLB seasons before the release of _Moneyball_ and for the first season after. The regression coefficients of OBP and SLUG give the weight each factor carries in player salary. By comparing the salary cost of OBP versus SLUG with the effect of each factor on wins, the authors determined whether teams were undervaluing OBP relative to SLUG. Their results showed that in the years before _Moneyball_, managers and owners undervalued on-base percentage relative to slugging percentage. In other words, a team could have raised its winning percentage by trading some SLUG inputs for an equivalent amount of spending on OBP inputs. However, Hakes and Sauer report that in the year after the book's publication, on-base percentage was suddenly no longer undercompensated.
A team could no longer exploit the higher win productivity per dollar of OBP, because the ratio of win productivity to cost was now the same for OBP and SLUG. They concluded that this aspect of Moneyball analysis had diffused throughout MLB.
The speed of this diffusion is surprising, and it raises questions about their methodology. For example, what if this test of the Moneyball hypothesis is misdirected? Hitters are paid to produce runs, not wins, and a misspecified statistical model can lead to erroneous conclusions. In this paper we propose a more direct test of the Moneyball hypothesis: comparing the run productivity per dollar of cost for OBP and SLUG. In other words, does a dollar spent on a small increment of on-base percentage buy the same increase in runs scored as a dollar spent on a small increment of slugging percentage? If not, a team can exploit the difference and score more runs on the same payroll by acquiring more units of OBP in place of SLUG. If the ratios are equal, on the other hand, MLB is in equilibrium with respect to the run productivity of the last units of OBP and SLUG acquired.
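Stated compactly, write $\Delta R_{OBP}$ and $\Delta R_{SLG}$ for the additional team runs from a small increment of each statistic, and $C_{OBP}$, $C_{SLG}$ for the salary cost of acquiring that increment (notation introduced here for illustration only). The equilibrium condition being tested is

$$\frac{\Delta R_{OBP}}{C_{OBP}} = \frac{\Delta R_{SLG}}{C_{SLG}}$$

If the left-hand side is larger, shifting payroll from SLG toward OBP raises runs at no extra cost; if it is smaller, the reverse holds.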
### Methods
This study differs from Hakes and Sauer in three ways: 1) the focus is on run production rather than win production, 2) the designated hitter difference between the National League and the American League is controlled for, and 3) more recent data from the MLB website are used.
#### Team Run Production Model
An MLB general manager should seek the most effective combination of on-base and slugging attributes given the amount of money the team is able to spend. Doing so maximizes the team's run production subject to its budget constraint. The team run production model takes the form:
$$RPS_{it} = \beta_1 + \beta_2\,OBP_{it} + \beta_3\,SLG_{it} + \beta_4\,NL_i + e_{it}$$
– $RPS_{it}$ = **number of runs produced by team i in season t.** This is the team's total runs over the 162 games of a season. If a team played fewer than 162 games, the total is adjusted to make it equivalent to a 162-game season.
– $OBP_{it}$ = **on-base percentage of team i in season t.** This is the total number of times the team's hitters reached base (or hit a home run) on a hit, walk, or hit batsman, divided by the number of plate appearances (including walks and hit batsmen) for the season. This proportion is then multiplied by 1,000 to make it more relatable. For example, a team that reached base 350 times per 1,000 plate appearances would have a 350 “on-base percentage.”
– $SLG_{it}$ = **slugging percentage of team i in season t.** This is the number of bases achieved on hits (one for a single, two for a double, three for a triple, four for a home run) in a season divided by the number of at bats (which exclude walks and hit batsmen). This proportion is multiplied by 1,000 to make it more relatable. For example, a team with 175 singles, 40 doubles, 5 triples, and 35 home runs per 1,000 at bats would have 175 + 80 + 15 + 140 = 410 bases per 1,000 at bats and therefore a 410 “slugging percentage.”
– $NL_i$ = **dummy variable = 1 if team i is in the National League, 0 otherwise.** The American League and National League do not play under exactly the same rules. One difference is the American League designated hitter rule, which allows a non-fielding hitter to bat in place of the pitcher.
– $e_{it}$ = **random error for team i in season t.** This component allows for the fact that runs produced cannot be perfectly predicted from the above variables.
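The three constructed team variables can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the study: the function names are hypothetical, and the 162-game adjustment is assumed to be simple proportional scaling, since the text says only that totals are adjusted.

```python
"""Team batting measures on the per-1,000 scale used in this paper (a sketch)."""

SEASON_GAMES = 162


def runs_per_162(runs: float, games: int) -> float:
    """Scale a team's runs to a 162-game season.

    Assumption: simple proportional scaling; the paper does not state the method.
    """
    return runs * SEASON_GAMES / games


def obp_x1000(times_on_base: int, plate_appearances: int) -> float:
    """Times reached base (hits, walks, hit batsmen) per 1,000 plate appearances."""
    return 1000 * times_on_base / plate_appearances


def slg_x1000(singles: int, doubles: int, triples: int, home_runs: int, at_bats: int) -> float:
    """Total bases per 1,000 at bats (walks and hit batsmen are not at bats)."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return 1000 * total_bases / at_bats


if __name__ == "__main__":
    # The worked examples from the variable definitions above.
    print(obp_x1000(350, 1000))             # 350.0
    print(slg_x1000(175, 40, 5, 35, 1000))  # 410.0
```

Keeping each measure on the per-1,000 scale means a "10-unit" change later in the paper corresponds to 0.010 on the conventional decimal scale.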
#### Player Salary Model
The second regression shows how much team management rewards individual players for their on-base percentage and slugging percentage. Position dummies were tried, but only catcher and shortstop showed statistically significant increases in pay due to their contributions to fielding, so the other position dummies were dropped. Player experience, measured by lifetime MLB game appearances, is also included, along with a National League dummy. Experience enters in quadratic form to allow for diminishing returns toward the end of a player's career. This specification follows the economic literature on salary models beginning with Mincer (7):
$$M_j = \beta_1 + \beta_2\,G_j + \beta_3\,G_j^2 + \beta_4\,OBP_j + \beta_5\,SLG_j + \beta_6\,CT_j + \beta_7\,SS_j + \beta_8\,NL_j + e_j$$
– $M_j$ = **salary of player j.** 2006 MLB salary in thousands of dollars.
– $G_j$ = **MLB career games played by player j.** This captures the improvement in a player due to experience.
– $G_j^2$ = **MLB career games squared.** Combined with $G_j$, a negative coefficient on $G_j^2$ allows for a diminishing rate of improvement as experience accumulates, and even permits a decline in performance at the end of a player's career.
– $OBP_j$ = **on-base percentage of player j.** This is averaged over the three MLB seasons before the season in which the player's salary takes effect (2003-2005).
– $SLG_j$ = **slugging percentage of player j.** This is averaged over the three MLB seasons before the season in which the player's salary takes effect (2003-2005).
– $CT_j$ = **dummy variable = 1 if the player is a catcher, 0 otherwise.** This variable is included to see whether any special value is attributed to this skilled fielding position.
– $SS_j$ = **dummy variable = 1 if the player is a shortstop, 0 otherwise.** This variable is included to see whether any special value is attributed to this skilled fielding position.
– $NL_j$ = **dummy variable = 1 if player j is in the National League, 0 otherwise.**
– $e_j$ = **random error.** This component allows for the fact that player salaries cannot be perfectly predicted from the above variables.
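Differentiating the salary equation with respect to career games makes the role of the squared term explicit:

$$\frac{\partial M_j}{\partial G_j} = \beta_2 + 2\beta_3 G_j$$

With $\beta_2 > 0$ and $\beta_3 < 0$, this marginal effect shrinks as experience accumulates and reaches zero at

$$G^{*} = -\frac{\beta_2}{2\beta_3},$$

beyond which additional career games are associated with a lower predicted salary.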
#### Sample Selection
For the team run production model, five seasons of data (2002-2006) are collected for each of the 30 MLB teams, for a total sample of 150 observations. Descriptive statistics for five seasons of 16 National League teams and 14 American League teams are given in Table 1. Mean runs scored per team during this period are 765 per season, or 4.7 per game. The standard deviation is 76 runs, so a typical team's season total differs from the mean by about 76 runs, or roughly 0.5 runs per game. Of particular note are the means and standard deviations of on-base percentage and slugging percentage. The mean team OBP is 333, with a standard deviation across teams of 12. For SLUG the mean is 423 and the standard deviation is 23.5.
Players' batting statistics are averaged over the three most recent MLB seasons so that recent performance and salary are matched more closely. To be included in the salary regression, a player must have played in at least two of the last three MLB seasons (2003-2005), with at least 100 games in each of those seasons. Another important restriction is that every player in the sample must have played at least six seasons at the Major League level. Before six seasons, MLB players cannot become free agents, which matters greatly for their salary. Free agents may seek employment with any team, which commonly results in competitive bidding for their services and a free-market determination of wages. These restrictions yield our sample of 154 hitters (free-agent-eligible starting players). The 2006 salaries of players and their three-year MLB performance averages (prior to 2006) are given in Table 2. The highest salary in the sample is $25,681,000 and the lowest is $400,000. The mean salary is $6.2 million, with a standard deviation across players of $4.89 million. The mean OBP for the players is 347, with a standard deviation of 34. The mean SLUG is 450, with a standard deviation of 65.5.
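The selection rules can be expressed as a simple filter. The sketch below is illustrative only: the `Player` record and its field names are hypothetical, and it reads the games requirement as at least 100 games in each of at least two of the 2003-2005 seasons.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Selection thresholds taken from the text.
WINDOW = (2003, 2004, 2005)   # the three seasons before the 2006 salaries
MIN_QUALIFYING_SEASONS = 2    # must appear in at least two of them...
MIN_GAMES = 100               # ...with at least 100 games in each
MIN_MLB_SEASONS = 6           # free-agent eligibility


@dataclass
class Player:
    """Hypothetical player record; field names are illustrative only."""
    name: str
    mlb_seasons: int
    games_by_season: dict[int, int] = field(default_factory=dict)


def is_eligible(p: Player) -> bool:
    """True if the player meets the salary-sample selection rules."""
    qualifying = sum(
        1 for season in WINDOW if p.games_by_season.get(season, 0) >= MIN_GAMES
    )
    return p.mlb_seasons >= MIN_MLB_SEASONS and qualifying >= MIN_QUALIFYING_SEASONS


veteran = Player("veteran", 8, {2003: 150, 2004: 40, 2005: 140})
short_career = Player("short career", 4, {2003: 150, 2004: 150, 2005: 150})
print(is_eligible(veteran), is_eligible(short_career))  # True False
```

The second example fails only the six-season rule, which is the restriction that limits the sample to free-agent-eligible players.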
### Results and Discussion
#### Team Run Production Model
Applying ordinary least squares, the following team runs regression was estimated for the five seasons:
$$RPS = -908 + 2.85\,OBP + 1.74\,SLG - 23.0\,NL + e$$
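As a check on the fitted equation, it can be evaluated directly. The sketch below hard-codes the Model 1 point estimates reported here; `predict_runs` is a hypothetical helper name. Because an OLS fit with an intercept passes through the sample means, plugging in the Table 1 means should approximately recover the mean of about 765 runs (not exactly, since the coefficients are rounded).

```python
# Model 1 point estimates (team runs per 162-game season).
INTERCEPT = -908.0
B_OBP = 2.85
B_SLG = 1.74
B_NL = -23.0


def predict_runs(obp: float, slg: float, nl: float) -> float:
    """Fitted runs per season.

    obp and slg are on the per-1,000 scale; nl is 1 for the National League,
    0 for the American League (a fraction gives a league-weighted average).
    """
    return INTERCEPT + B_OBP * obp + B_SLG * slg + B_NL * nl


# Table 1 means: OBP 332.927, SLG 423.27, NL share 0.53.
at_means = predict_runs(332.927, 423.27, 0.53)
print(round(at_means, 1))  # 765.1
```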
Table 3 gives fuller statistical details for the above equation (Model 1) and for other versions of the run production model. Model 1, the version used in our test of the Moneyball hypothesis, explains 92 percent of the variance in team runs scored. This confirms that team OBP and SLUG are highly predictive of team runs scored. The fit of this runs equation is also better than that of Hakes and Sauer's wins equation. Model 2 drops the National League dummy, and Model 3 adds interaction terms of NL with OBP and SLG. The differences from Model 1 are small, so this sensitivity analysis supports retaining Model 1.
We now interpret each slope coefficient in Model 1, holding the other included factors constant. A 10-unit increase in team OBP (e.g., from 330 to 340) brings an additional 10(2.85) = 28.5 team runs per season, on average. A 10-unit increase in SLUG brings 10(1.74) = 17.4 more runs, on average. National League teams score about 23 fewer runs per season than American League teams with the same OBP and SLUG. Each regression coefficient, including the one for NL, is statistically significant at the 1% level. These estimates identify the relative importance of each hitting factor: for an increment of 10 units, getting on base more frequently has a bigger impact on scoring runs than getting more bases per at bat. What remains is to determine what these factors cost the team in salary.
#### Player Salary Model
Applying ordinary least squares, the following player salary regression was estimated for the 154 free-agent-eligible starting players in 2006:
$$M = -30164 + 10.28\,G - 0.00321\,G^2 + 37.05\,OBP + 36.98\,SLG + 1748.1\,CT + 2024.87\,SS - 876.96\,NL + e$$
Table 4 gives fuller statistical details for the above equation (Model 4) and for other versions of the player salary model. Model 4 is the salary model used in the subsequent test of the Moneyball hypothesis. It explains about 56 percent of the variance in salaries (adjusted R² = 0.557), roughly the same as the salary equations of Hakes and Sauer. In Model 5 the NL dummy is removed, and in Model 6 the position dummies are removed. The remaining coefficients change only slightly from Model 4, so this sensitivity analysis supports retaining Model 4.
We now interpret each slope coefficient of Model 4, holding the other included factors constant. A 10-unit increase in a player's OBP raises 2006 salary by 37.05(10) = 370.5 thousand dollars ($370,500), on average, and a 10-unit increase in a player's SLUG raises salary by 36.98(10) = 369.8 thousand dollars ($369,800), on average. The coefficients on G and G² show that experience increases salary at a decreasing rate. Holding OBP and SLUG constant, catchers and shortstops earn higher salaries than players at the other fielding positions. The experience and hitting coefficients are statistically significant at the 1% level, the position dummies at the 5% level, and the NL dummy at the 10% level.
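Taken at face value, the Model 4 estimates for G and G² imply a turning point at $G^* = -\beta_2 / (2\beta_3)$, where the marginal salary effect of an additional career game reaches zero. The sketch below computes it along with the salary cost of 10-unit hitting increments; the variable and function names are illustrative, and salary is in thousands of dollars as in the model.

```python
# Model 4 point estimates (salary in thousands of 2006 dollars).
B_G = 10.28
B_G2 = -0.00321
B_OBP = 37.05
B_SLG = 36.98


def marginal_salary_per_game(games: float) -> float:
    """Derivative of predicted salary with respect to career games: B_G + 2 * B_G2 * G."""
    return B_G + 2 * B_G2 * games


# Career games at which the fitted quadratic peaks (marginal effect = 0).
peak_games = -B_G / (2 * B_G2)
print(f"peak at about {peak_games:.0f} career games")

# Convert thousands of dollars to dollars for a 10-unit increase.
print(f"10 units of OBP: ${B_OBP * 10 * 1000:,.0f}")
print(f"10 units of SLG: ${B_SLG * 10 * 1000:,.0f}")
```

Table 2 reports career games as high as 2,730, so on these point estimates some players in the sample lie past the turning point.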
#### The Moneyball Hypothesis
According to _Moneyball_, small-market teams such as the Oakland Athletics can compete against larger-market teams if they acquire run production factors that provide more runs per dollar spent. This occurs when OBP is undervalued relative to SLUG. To see whether this was the case in 2006, we compare the two main models (Models 1 and 4). A 10-unit increase in team OBP brings an additional 28.5 runs, and a 10-unit increase in team SLUG yields an additional 17.4 runs. The salary equation reveals that a 10-unit increase in individual OBP costs $370,500, and a 10-unit increase in individual SLUG costs $369,800. For essentially the same increase in salary (at the player level), an increase in OBP therefore brings in 11.1 more runs than the same increase in SLUG (28.5 versus 17.4). Teams can thus achieve higher run production at essentially the same cost by swapping 10 units of SLUG for 10 units of OBP. The ratio of run production to cost favors OBP, and the Moneyball hypothesis that slugging percentage is overvalued relative to on-base percentage still holds three seasons after the publication of _Moneyball_.
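The comparison above can be reproduced from the reported point estimates. The sketch below follows the text in setting the team-level run effect of a 10-unit increase against the player-level salary cost of the same increase; the dictionary and function names are illustrative. An efficiency ratio of 1.0 would indicate equilibrium in the sense of the Moneyball test.

```python
# Team runs per unit (Model 1) and salary per unit in $ thousands (Model 4).
RUNS_PER_UNIT = {"OBP": 2.85, "SLG": 1.74}
COST_PER_UNIT = {"OBP": 37.05, "SLG": 36.98}
STEP = 10  # the 10-unit increment used in the text


def runs_per_thousand(stat: str) -> float:
    """Runs gained per $1,000 of salary for a STEP-unit increase in `stat`."""
    return (RUNS_PER_UNIT[stat] * STEP) / (COST_PER_UNIT[stat] * STEP)


obp = runs_per_thousand("OBP")
slg = runs_per_thousand("SLG")
run_gap = (RUNS_PER_UNIT["OBP"] - RUNS_PER_UNIT["SLG"]) * STEP

print(f"OBP: {obp:.4f} runs per $1,000; SLG: {slg:.4f} runs per $1,000")
print(f"extra runs from OBP at nearly equal cost: {run_gap:.1f}")
print(f"OBP-to-SLG efficiency ratio: {obp / slg:.2f}")  # 1.0 would mean equilibrium
```

Because the 10-unit step cancels, the result depends only on the ratio of each run coefficient to its salary coefficient.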
Why do our results differ from those of Hakes and Sauer, who argue that slugging was no longer overvalued one season after _Moneyball_ appeared? Our methodology differs from theirs in three respects: 1) we use a run production model instead of a win production model, because players are paid to produce runs, not wins; 2) we include a variable to distinguish the National League from the American League; and 3) we use more recent data.
### Conclusions
In this paper we propose a new test of the Moneyball hypothesis using team run production in place of team wins. We clearly show that in producing runs baseball managers continue to overpay for slugging versus on-base percentage. In the 2006 MLB season, for the same payroll, a team could generate more runs by trading some SLUG for OBP. The question is, why don’t general managers recognize these results in their roster and payroll decisions? We propose several possible reasons:
1. Only small revenue market teams need to be efficient in their labor decisions.
2. Sluggers are paid for more than just their ability to score runs.
3. Moneyball techniques will take time before all teams adopt them.
Each of these answers is discussed in turn. Large-market teams are profligate partly in response to pressure from their fan base to produce a winner at whatever cost. Acquiring well-known free agents at high cost, rather than bargain free agents unknown to home fans, seems a safe way to operate, even if it cuts into profits. These well-known players tend to be the sluggers. The second reason for slugger overcompensation is that sluggers are crowd-pleasers, and it may be more profitable (through higher gate attendance and television viewership) to have more home run hitters. This study does not attempt to test this alternative hypothesis. Finally, Hakes and Sauer concluded that OBP and SLUG reached equilibrium in the player market just one year after _Moneyball_ was published, but it is doubtful that such an innovation could spread throughout MLB so quickly.
> “Given the A’s success, why hasn’t a scientific approach come to dominate baseball? The answer, of course, is the existence of a deeply entrenched way of thinking….Generally accepted practices have been developed over one-and-a-half centuries, practices that are based on experience rather than analytical rigor.” (1, p. 80)
Behavioral patterns in MLB change slowly. For example, it took twelve years after Jackie Robinson joined the Brooklyn Dodgers before every MLB team had African-American players on its roster, despite the large pool of talent in the Negro Leagues. A similarly slow pace of diffusion can be seen in the more recent arrival of Asian players in MLB. And more to the point, batting average still receives more attention than on-base percentage in the evaluation of talent.
Finally, the adoption of Moneyball is not limited to baseball. General managers in hockey (6), basketball (8), football (5), and soccer (2) are beginning to see the same advantages in using statistical analysis to supplement or replace conventional wisdom in making decisions on personnel and strategy. Despite the Oakland Athletics’ more recent lackluster performance, Moneyball is here to stay.
### Applications in Sport
The increased use of quantitative analysis in coaching and managing sports teams allows college and professional teams to base decisions more on data-driven results than on tradition alone. “Moneyball” is often the term used for this decision-making approach, particularly when financial resources, allocated efficiently, can improve on-field performance (scoring, wins) on a limited budget.
The advantage of adopting Moneyball techniques before rival teams do may be short-lived, however, because widespread adoption eliminates opportunities (e.g., the acquisition of underrated players) that exist only while other teams in the sport fail to see them. But this study shows that Moneyball techniques are diffusing slowly, creating advantages for managers who are open to this approach.
### References
1. Boyd, E. A. (2004). Math works in the real world: (You just have to prove it again and again). Operations Research/Management Science, 31(6), 81.
2. Carlisle, J. (2008). Beane brings moneyball approach to MLS. ESPNsoccer. Retrieved from <http://soccernet.espn.go.com/columns/story?id=495270&cc=5901>
3. Hakes, J. K., and R. D. Sauer (2006). An economic evaluation of the moneyball hypothesis. Journal of Economic Perspectives, 20, 173-185.
4. Lewis, M. (2003). Moneyball: the art of winning an unfair game. New York: W.W. Norton & Company.
5. Lewis, M. (2008). The blind side. New York: W.W. Norton & Company.
6. Mason, D. S. and W. M. Foster (2007). Putting moneyball on ice? International Journal of Sport Finance, 2, 206-213.
7. Mincer, J. (1974). Schooling, experience, and earnings. New York: Columbia University Press.
8. Ostfield, A. J. (2006). The moneyball approach: basketball and the business side of sport. Human Resource Management, 45, 36-38.
### Tables
#### Table 1. Descriptive Statistics for the Team Run Production Sample
Statistic | RPS | OBP | SLG | NL
---|---|---|---|---|
Mean | 765.04 | 332.927 | 423.27 | 0.53 |
Median | 760.34 | 332.000 | 423.00 | 1 |
Standard Deviation | 76.43 | 12.168 | 23.52 | 0.50 |
Range | 387.00 | 63.000 | 123.00 | 1 |
Minimum | 574.00 | 300.000 | 368.00 | 0 |
Maximum | 961.00 | 363.000 | 491.00 | 1 |
Count | 150 | 150 | 150 | 150 |
#### Table 2. Descriptive Statistics for the Player Salary Sample
Statistic | G | OBP | SLG | NL | CT | SS
---|---|---|---|---|---|---|
Mean | 1146.1 | 347.3 | 450.0 | 0.552 | 0.130 | 0.12 |
Median | 1070.5 | 346.5 | 446.5 | 1 | 0 | 0 |
Standard Deviation | 462.1 | 34.0 | 65.5 | 0.499 | 0.337 | 0.322 |
Range | 2345.0 | 237.9 | 432.0 | 1 | 1 | 1 |
Minimum | 385.0 | 276.1 | 310.7 | 0 | 0 | 0 |
Maximum | 2730.0 | 514.0 | 742.7 | 1 | 1 | 1 |
Count | 154 | 154 | 154 | 154 | 154 | 154 |
#### Table 3. Coefficients for the Team Run Production Models
Variable | Model 1 Coefficient | Model 1 t Stat | Model 2 Coefficient | Model 2 t Stat | Model 3 Coefficient | Model 3 t Stat
---|---|---|---|---|---|---
Intercept | -908.00*** | -17.16 | -941.72*** | -18.46 | -861.67*** | -13.73
OBP | 2.85*** | 11.21 | 2.69*** | 13.22 | 2.86*** | 10.26
SLG | 1.74*** | 15.42 | 1.92*** | 15.30 | 1.62*** | 10.37
NL | -23.00*** | -6.26 | | | -134.3* | -1.34
(NL)(OBP) | | | | | 0.275 | 1.06
(NL)(SLG) | | | | | 0.241 | 0.06
Adj. R-Squared | 0.921 | | 0.900 | | 0.923 |
F | 568.9 | | 661.6 | | 343.3 |
*** .01 level ** .05 level * .10 level
#### Table 4. Coefficients for the Player Salary Models
Variable | Model 4 Coefficient | Model 4 t Stat | Model 5 Coefficient | Model 5 t Stat | Model 6 Coefficient | Model 6 t Stat
---|---|---|---|---|---|---
Intercept | -30164*** | -9.38 | -30952*** | -9.67 | -27182.6*** | -8.73 |
G | 10.28*** | 4.21 | 10.24*** | 4.18 | 9.75*** | 3.95 |
G2 | -0.00321*** | -3.67 | -0.00323*** | -3.68 | -0.00304*** | -3.42 |
OBP | 37.05*** | 3.32 | 38.08*** | 3.39 | 35.30*** | 3.10 |
SLG | 36.98*** | 6.47 | 37.01*** | 6.43 | 33.58*** | 5.88 |
CT | 1748.10** | 2.14 | 1798.21** | 2.19 | ||
SS | 2024.87** | 2.34 | 2048.73** | 2.35 | ||
NL | -876.96* | -1.65 | -929.14* | -1.71 | ||
Adj. R-Squared | 0.557 | | 0.552 | | 0.532 |
F | 28.48 | | 32.39 | | 44.44 |
*** .01 level ** .05 level * .10 level
### Corresponding Author
#### Thomas H. Bruggink, Ph.D.
Department of Economics
Lafayette College
Easton PA 18042
<bruggint@lafayette.edu>
610-330-5305
### All Authors
#### Anthony Farrar
Brinker Capital
Berwyn, PA
#### Thomas H. Bruggink
Lafayette College
Easton, PA