Ranking Ruth

February 26, 2020

I recently unfairly baited the great baseball statistics thinker Bill James into responding at vast length to one of my snarky tweets, which got me thinking about what we could learn from baseball history about how the media often gets The Narrative wrong.

The intellectual rigor of baseball discourse, a formerly overly sentimental field, has improved dramatically over my lifetime. Unfortunately, discussion of many other more important subjects, such as immigration policy, has gotten schmaltzier.

You might think, for example, that the success of new baseball stats suggests we should use “Moneyball” techniques to pick the best immigrants, the talented few whose presence will most benefit existing Americans, out of the 7 billion non-Americans, in much the way that sports teams today carefully draft the most promising young athletes.

And, indeed, after three years in office, the Trump administration has finally managed to get into effect this week a new, improved “public charge” rule to better screen out immigrants likely to wind up on welfare. But this seemingly simple concept of America trying to more intelligently select promising immigrants and keep out ill-omened immigrants has outraged and baffled the great and the good.

Baseball statistics has been a Safe Space for white guys who like to engage in acts of pattern recognition ever since a boiler-room attendant named Bill James began revolutionizing the study of baseball stats with his first self-published Baseball Abstract in 1975.

Over the past 45 years, the progress in baseball analysis has been testimony to the value of intellectual freedom. There has been a revolution among baseball intellectuals because nobody has gotten canceled for saying something deemed immoral. The old order was overthrown because, mirabile dictu, they lost a long series of arguments, and eventually jobs. Mr. James, for example, was hired by the Boston Red Sox front office, from which he recently retired after seventeen years and four world championships.

On the other hand, as with any historic change, the victors are likely to exaggerate the backwardness of the defeated. Hence, I like to tweak advocates of the triumphant advanced analytics (or “sabermetrics”) by pointing out that many of their findings about who were the best baseball players in history were anticipated generations ago by little boys. So I tweeted that the glass hadn’t been half empty, it had been half full:

Sabermetricians congratulate themselves for noticing in retrospect what fans in the stands noticed at the time. E.g., Advanced Analytics has determined that the greatest baseball player of all time was…Babe Ruth! Who was also the most popular player ever.

I went on to point out that the eight highest-achieving careers, according to the composite Wins Above Replacement (WAR) metric for estimating the best all-around players at the popular Baseball Reference website, belong to Ruth, Walter Johnson, Cy Young, Barry Bonds, Willie Mays, Ty Cobb, Hank Aaron, and Roger Clemens, all of whom were extremely famous during their careers among fans who had never heard of WAR or James’ own rival composite statistic: Win Shares.

Interestingly, James avoided creating an all-around statistic for rank-ordering all the players in baseball until 2002 due to the inherent difficulties in doing it right, long feeling that it was more useful to answer many small questions well than one big question badly.

Even today there are multiple versions of WAR that mostly but not always agree.

So caution is advised in thinking about these new numbers. But people love using all-around statistics like WAR and Win Shares to determine who is better than whom. I’m reminded of Charles Murray’s reassurance in his new book of social-science stats Human Diversity that:

Nothing we learn will justify rank-ordering human groups from superior to inferior—the bundles of qualities that make us human are far too complicated for that.

But, judging by the popularity of these new baseball stats for ranking who is best, males love ranking. One reason so many intellectuals go berserk with rage over Murray’s publication of IQ statistics is that they recognize that, limited as IQ is, it’s the closest thing to an all-around statistic for rank-ordering that the social sciences have.

The creation of these new synthetic statistics unveiled a paradox: Most of the great careers in baseball history had already been obvious to observers. In fact, the highest-ranked player whose career has been rescued from obscurity by the newer statistics is likely workhorse pitcher Bert Blyleven at No. 40 in WAR. Meanwhile, the most famous player who doesn’t rank high in WAR is Joe DiMaggio at only No. 68.

Why don’t the new numbers show DiMaggio being as great as the man in the street, Marilyn Monroe, and Simon and Garfunkel all believed? Joltin’ Joe played in only thirteen major-league seasons instead of the twenty or more a player of his caliber could be expected to enjoy, due to a late start, war, and early retirement.

James responded to my Babe Ruth sally by tweeting:

But that’s not something that anybody, ever, claimed to have “discovered.” That was accepted truth, and we acknowledged it without controversy. What was discovered by our field was more along the lines of “we think that Larry Doby was actually more valuable than Hack Wilson.”

Doby was the Jackie Robinson of the American League, the first black player in the younger circuit, and he was an All-Star from 1949 to 1955, so he wasn’t totally underrated. Hack Wilson was a slugger semi-famous for setting the single-season record in 1930 with 191 runs batted in (RBIs) before drinking himself out of the National League.

But Hack’s big year was the most inflated offensive season in history, with the NL averaging .303 in 1930. Plus, the RBI statistic was traditionally overrated as a contributor to winning ballgames. (Briefly, if, say, you come to bat with three men on base and hit a home run, you are credited with four RBIs but only one run scored.) Not surprisingly, the more teammates adept at getting on base you have batting ahead of you, the more RBIs you will ring up, all else being equal.

My impression, however, is that James is underrating his movement’s accomplishment in securing Ruth’s reputation. I recollect that the condescending opinion of old-time baseball intellectuals in the 1960s was often: Well, sure, ill-read fans think Ruth was the greatest because he hit a lot of vulgar home runs, but the real aficionados know that Ty Cobb’s record batting average (the most prestigious statistic) of .366 shows he was better than Ruth with his measly .342.

As late as 1974, Robert Creamer’s biography Babe needed to emphasize that the more obscure slugging average proved Ruth’s supremacy.

Today, however, 45 years into the era introduced by James, few doubt Ruth’s superiority over Cobb.

But that raises the question, why did regular fans rightly see Ruth as the greatest ballplayer way back in the statistical dark ages?

As Yogi Berra said, you can observe a lot just by watching. If you watched a lot of games, Ruth’s value at helping his team win was hard to miss.

On the other hand, if you respected the reputable baseball stats of the time, Ruth seemed meretricious. As statistics professor Andrew Gelman wrote:

Responding to a comment by some humanist type who was yammering on about how there are all sorts of truths that aren’t in the numbers, James pointed out that the alternative to good statistics is not “no statistics,” it’s bad statistics. People who argue against statistical reasoning often end up backing up their arguments with whatever numbers they have at their command….

The traditional batting average isn’t a bad statistic—it’s particularly good for tracking whether a player is streaking or slumping—but it became overly enshrined in baseball lore in the 19th century and became less relevant after Ruth introduced home run slugging as a viable strategic alternative to line drive hitting in 1919.

James then followed up with a 6,765-word response on his website titled “Three Looks at the MVPs.”

MVP stands for Most Valuable Player. It’s the top annual award, one in each league, and is voted on by two sportswriters from each baseball city, typically baseball beat reporters. Modern statistical gurus frequently point to particularly dumb choices in the MVP voting in the past to show how much more enlightened we are now. James wrote:

I tried to ask the gentleman [me] how he would explain these discrepancies, if the way that we evaluate players has not really changed. Yes, Babe Ruth was always recognized as great, and yes, Babe Ruth WAS great, but what about Don Baylor in 1979, or Jackie Jensen in 1958, to name just a couple of MVP selections from the past which are not likely to be mirrored in the present?

While a baseball player’s long-term fame is a pretty good proxy for his value, single-season judgments are often silly, especially when they are worsened by the unconscious biases and faddish groupthink of journalists.

Jensen and Baylor are examples of the old habit of handing the MVP trophy to the league leader in RBIs.

Although Jackie Jensen led the American League in RBIs in 1958 with 122, today’s sophisticated statistics show that the best player that year was instead…Mickey Mantle, the legendary Yankee superstar who was the favorite of 9-year-old boys across the country.

Why did the sportswriters vote Jensen first and Mantle fifth?

Well, for one reason, Jensen beating out Mantle made a terrific narrative. Jensen, a college football All-American at Cal, had been slated to replace Joe DiMaggio in the Yankee outfield. But he was passed up by the younger Mantle and traded in 1952. Moreover, reporters knew Jensen was playing under terrible stress because of his worsening fear of flying, which soon ended his career when the American League expanded to the West Coast in 1961.

Also, the writers had given Mantle the 1956 and 1957 MVPs when the Mick had played out of this world, so 1958 seemed like a disappointment when he returned to being merely by far the best player in the league. Plus, Mantle was only 26 and he might just deserve to win all the trophies for the next half-dozen years, which would be boring, so let’s get creative and pick somebody else! (Much the same happened to Willie Mays in the National League, who had to wait from 1954 until 1965 to win his second MVP.)

Another reason was that Mantle drove in only 97 runs that year. Was he choking in the clutch?

No, pitchers were terrified of his power (he led the American League in homers for the third time in four seasons), so they walked him 129 times, thirty more bases on balls than Jensen’s runner-up total. Mantle thus led the league with 127 runs scored versus only 83 for Jensen.

But sportswriters before the James Era had a bias in favor of players who drove in runs over players who scored runs themselves. A good example of this was the 1985 voting when two Yankees had strong claims to be the AL MVP. Leadoff batter Rickey Henderson scored 146 runs, while slugger Don Mattingly drove in 145 runs. Mattingly finished first and Henderson third in the balloting even though today’s stats say that Rickey, the greatest leadoff man in baseball history, was much the better of the two that year.

My suspicion is that reporters valued RBIs over runs scored because RBIs come more in clumps (you can drive in up to four runs in one at-bat, but you can’t score more than one run), so the RBI man is more likely to get the headline the next morning.

For instance, in 1985 Donnie Ballgame had fourteen games with three or more RBIs, while Rickey had only eight games with three or more runs scored. Henderson scored in 100 of his 143 games, while Mattingly drove in runs in 84 of his 159 games. You could say Rickey was more consistent, but “Henderson Scores in Yet Another Game” is a boring headline compared with “Mattingly Clears the Bases in the Clutch.”
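For readers who want to check the arithmetic, here is a minimal sketch that turns the game counts quoted above into percentages. The figures are the ones cited in this column; the dictionary layout and the share() helper are just illustrative names, not anyone's official metric.

```python
# Game counts quoted in the paragraph above (1985 season).
# The structure and field names here are illustrative assumptions.
players = {
    "Henderson (runs scored)": {"games": 143, "games_with_any": 100, "games_with_3_plus": 8},
    "Mattingly (RBIs)": {"games": 159, "games_with_any": 84, "games_with_3_plus": 14},
}


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# For each player: how often he contributed at all, and how often he had
# a "headline" game of three or more.
summary = {
    name: {
        "any_pct": share(s["games_with_any"], s["games"]),
        "big_game_pct": share(s["games_with_3_plus"], s["games"]),
    }
    for name, s in players.items()
}

for name, pct in summary.items():
    print(f"{name}: contributed in {pct['any_pct']}% of games, "
          f"3 or more in {pct['big_game_pct']}% of games")
```

On these numbers, Henderson scored in roughly 70 percent of his games against Mattingly's 53 percent with an RBI, while Mattingly had the three-or-more game nearly 9 percent of the time against Henderson's roughly 6 percent. That is the consistency-versus-clumps contrast in miniature.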

The player who got the write-up the next morning for the biggest play in the game tended to be the specialist not in scoring runs but in driving them in. In other words, much of the RBI fetish wasn’t due to politics or even ideology; it just happened to be a byproduct of how journalism works.

We should keep this lesson in mind when evaluating more serious questions as well: Much of what might seem like, depending upon whose ox is being gored, either bias in journalism or the indisputable truth is often an unintentional side effect of the structures and incentives of the news and opinion business.

James then goes on to write several thousand technical words about how the dominant WAR statistic that has emerged in the past decade as the most popular way to rank-order baseball players for awards has problems that his own Win Shares was designed to get around. He says he hopes to have time for a major reform of his all-purpose metric this year.

I won’t presume to comment on these complex matters, but I do hope my impudent tweet about Babe Ruth winds up helping inspire Mr. James to take this next step.