Statistical Toolbox: Pitching

We’re more than halfway there. Two weeks ago we covered a variety of useful hitting statistics, and last week we addressed defense, which remains the most difficult part of the game to measure (at least for now). Before concluding next week with a look at attempts to quantify value on a macro and micro scale, we’ll delve into pitching metrics today. I think the best way to review the pitching portion of our statistical toolbox is to break the statistics into three distinct families. Still, we have to remind ourselves that these metrics, like those in batting, interact with and influence one another.

The Classical Family

The Questions: on average, how many runs has a pitcher given up per nine innings? Similarly, how many base runners has a pitcher allowed per inning?

The Statistics: Earned Run Average (ERA), calculated by dividing earned runs allowed by innings pitched and multiplying that number by nine; and Walks plus Hits per Inning Pitched (WHIP), calculated by adding walks and hits and dividing that sum by total innings pitched.
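For anyone who wants to see the arithmetic, here is a minimal sketch of both formulas as described above. The function names and the season line are hypothetical, made up for illustration. The one wrinkle is that box scores write a third of an inning as ".1" and two thirds as ".2", so the sketch assumes that notation and converts it before dividing.

```python
def innings_from_box_score(ip: str) -> float:
    """Convert box-score innings ("197.1" = 197 and one third) to a true number."""
    whole, _, outs = ip.partition(".")
    return int(whole) + int(outs or 0) / 3


def era(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs divided by innings pitched, multiplied by nine."""
    return 9 * earned_runs / innings_pitched


def whip(walks: int, hits: int, innings_pitched: float) -> float:
    """Walks plus hits, divided by innings pitched."""
    return (walks + hits) / innings_pitched


# Hypothetical season line, not a real pitcher.
ip = innings_from_box_score("180.0")
print(round(era(70, ip), 2))        # 3.5
print(round(whip(50, 170, ip), 2))  # 1.22
```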

What they do and do not tell us: ERA and WHIP are past perfect statistics. ERA tells us, with exact precision, how many earned runs a pitcher has given up per nine innings. What it does not do is account for the circumstances of those runs or predict future performance. ERA does not consider the defense playing behind a pitcher, which affects the runs he might give up. The metric is dependent upon not only defense, but also the whims of the scorekeeper, as runs that score because of errors are not counted. WHIP likewise tells us what a pitcher has done in the past; it is not very good at predicting how a pitcher will do in the future because, again, defense plays a central role in preventing hits. Finally, WHIP is a strange statistic that has lingered despite the fact that it was developed solely to provide another pitching category for fantasy baseball. Its value is quite literally fanciful.

In context: ERA equalizes outs in a fashion similar to the way AVG equalizes hits, but outs are not created equal. Imagine a three up, three down inning that unfolds in two different scenarios, the only difference being the manner in which the batters got out. Let’s say that Tyler Chatwood is on the mound. In the first instance, Chatwood strikes out Carl Crawford, Yasiel Puig, and Adrian Gonzalez: an excellent inning. In the second case, Crawford leads off with a slow roller to Nolan Arenado, who bare-hands it and throws Crawford out. Then, Puig hits a line drive in between third base and shortstop, but Troy Tulowitzki makes a diving catch for the second out. Finally, Gonzalez hits a line drive to right-center field, but the speedy Drew Stubbs tracks it down for out number three. Both innings produced the same result: three outs. But in the second scenario, the Rockies defense greatly assisted in getting the outs. Now let’s say that in this given inning Nolan Arenado had the day off, and he was replaced by Josh Rutledge; Tulowitzki took a day off in an otherwise healthy and productive season (obviously), so DJ LeMahieu was at shortstop; Stubbs (obviously) sat because a right-hander started for the Dodgers, so Charlie Blackmon was in center field. The defensive portion of our statistical toolbox taught us that poor defense is not always recorded as an error, which destabilizes the “earned” portion of ERA. The overall defensive downgrade in this hypothetical might result in hits and runs, thus inflating Chatwood’s ERA as well as his WHIP.

So ERA and WHIP tell us precisely what a pitcher has done in the past, but that does not translate to how well a pitcher will do in the future. This is because variables outside of the pitcher’s control contribute to both statistics. But there is still value in knowing what a pitcher has done, even if the metric is flawed. A good pitcher is highly unlikely to have a 5.00 ERA simply because he cannot control what happens to batted balls, and terrible pitchers don’t usually have sub-3.00 ERAs. Finally, the same ERA does not mean the same thing in every home park. Rockies pitchers, for example, simply have to eat a higher ERA because of Coors Field, but that does not necessarily mean they are poorer pitchers. Talent remains most important. So ERA and WHIP should be used, but not in isolation, and not without considering outside factors.

Because I’m running out of sensory adjectives, we’ll retread our tables en route to describing a more satisfying baseball experience. Objections welcome:

Metric	Stunning	Beautiful	Attractive	Quotidian	Drab	Unseemly
ERA	below 2.90	2.90-3.25	3.25-3.75	3.75-4.15	4.15-4.75	above 4.75
WHIP	below 1.00	1.00-1.15	1.15-1.30	1.30-1.40	1.40-1.50	above 1.50

The DIPS Family

The Question: acknowledging the flaws of ERA, how can we find out what a pitcher’s ERA should have been, and what it will look like in the future?

The Statistics: opponent Batting Average on Balls in Play (BABIP), Fielding Independent Pitching (FIP and xFIP), and Skill Interactive ERA (SIERA)

What they do and do not tell us: the theoretical foundation of these metrics is Defense Independent Pitching Statistics (DIPS), developed by baseball analyst and possible Harry Potter villain Vörös McCracken (and you thought I was being facetious when I said these statistics were products of the mystical). In answering the above question, McCracken determined that “there is little if any difference among major league pitchers in their ability to prevent hits on balls hit in the field of play.” That should sound familiar, as should the measure deployed to find out what happens to those balls in play: BABIP. BABIP can be used to put ERA in context in the same way it puts a batter’s average in context. If hitters bat .350 on balls in play against a pitcher, it will likely inflate the pitcher’s ERA, and for reasons outside of the pitcher’s control. So if a pitcher’s BABIP is much higher or lower than league average, which is around .300, then his ERA and WHIP were likely pushed in the same direction.
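The post doesn’t spell out the BABIP formula, so here is a small sketch using the commonly published version: hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies. The function name and the opponent line are hypothetical, and the numbers are invented for illustration only.

```python
def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play, using the commonly cited formula.

    Home runs and strikeouts are removed because neither ends with a
    fielder having a chance to make a play.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play


# Hypothetical opponent line against one pitcher.
print(round(babip(hits=180, home_runs=20, at_bats=700,
                  strikeouts=150, sac_flies=5), 3))  # 0.299
```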

But that still looks at the past. FIP is the pivot to the future because it focuses on outcomes that a pitcher can control (home runs, walks, and strikeouts), under the assumption that the pitcher will continue to be able to control those things in a similar fashion. We can call it future imperfect. The metric assumes a league average BABIP, weights giving up a home run as worse than giving up a walk, and credits the pitcher for strikeouts. The result is what a pitcher’s ERA should have looked like. Another way to think about FIP is to consider the rates at which a pitcher records strikeouts (K/9), issues walks (BB/9), and gives up home runs (HR/9) per nine innings. The complement to FIP, xFIP, was developed by smart person (but non-magically named) Dave Studeman of The Hardball Times. The principle is the same, except that it estimates how many home runs the pitcher in question should have given up based on league average home run to fly ball ratio (HR/FB): eXpected FIP. The difference between the FIPs and ERA is that while ERA is poor at predicting the future but precise in narrating the past, the FIPs predict the future pretty well but are not very good at measuring the present. They are subject to severe fluctuation in small samples. So if we want to project how well Tyler Chatwood will do in 2014 after a couple of starts, it’s better to look at his 2013 FIP and xFIP rather than his FIP and xFIP through two starts.
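To make the weighting concrete, here is a sketch of FIP and xFIP using the formula as it is commonly published: thirteen times home runs, plus three times walks and hit batters, minus two times strikeouts, all divided by innings pitched, plus a constant that puts the result on the ERA scale. Hit batters are part of the published formula even though they aren’t discussed above. The constant (3.10) and the league HR/FB rate (10 percent) are assumptions on my part; both change from season to season, so look them up for the year in question. The pitcher’s line is hypothetical.

```python
ASSUMED_FIP_CONSTANT = 3.10     # assumption: varies by season
ASSUMED_LEAGUE_HR_PER_FB = 0.10  # assumption: varies by season


def fip(hr: float, bb: int, hbp: int, k: int, ip: float,
        constant: float = ASSUMED_FIP_CONSTANT) -> float:
    """Home runs hurt most, walks and hit batters hurt less, strikeouts help."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls: int, bb: int, hbp: int, k: int, ip: float,
         league_hr_per_fb: float = ASSUMED_LEAGUE_HR_PER_FB,
         constant: float = ASSUMED_FIP_CONSTANT) -> float:
    """Same as FIP, but with home runs replaced by an expected total."""
    expected_hr = fly_balls * league_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Hypothetical pitcher who allowed 12 home runs on 200 fly balls.
print(round(fip(hr=12, bb=60, hbp=4, k=130, ip=200.0), 2))           # 3.54
print(round(xfip(fly_balls=200, bb=60, hbp=4, k=130, ip=200.0), 2))  # 4.06
```

The hypothetical pitcher allowed fewer home runs than his fly ball total would suggest at the assumed league rate, so his xFIP comes out higher than his FIP.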

Another statistic, Skill Interactive ERA (SIERA), takes balls in play into account (unlike the FIPs), but it does not equalize outs (unlike ERA). It’s designed as an estimator, but its predictive value is as good as that of the FIPs. Not only that, but it stabilizes more quickly, so it can estimate what a pitcher’s ERA should be after just a couple of starts.

In context: the great thing about FIP, xFIP, and SIERA is that they are scaled like ERA. So a putrid ERA would also be a putrid FIP:

Metric	Ambrosial	Perfumed	Fragrant	Odoriferous	Malodorous	Putrid
ERA	below 2.90	2.90-3.25	3.25-3.75	3.75-4.15	4.15-4.75	above 4.75
FIP	below 2.90	2.90-3.25	3.25-3.75	3.75-4.15	4.15-4.75	above 4.75
xFIP	below 2.90	2.90-3.25	3.25-3.75	3.75-4.15	4.15-4.75	above 4.75
SIERA	below 2.90	2.90-3.25	3.25-3.75	3.75-4.15	4.15-4.75	above 4.75

Thanks, magicians! Keep the caveats of scaling in mind, though: just as we observed that wOBA looks like OBP, and on the defensive side DRS looks like UZR, they aren’t equivalents. The distinction is even starker for pitching, as FIP, xFIP, and SIERA are designed to correct the failings of ERA, not reproduce the number. Still, let’s appreciate how comprehensible they are. Not only that, but these fine, magical people have provided us with statistics that weight ERA, FIP, and xFIP against league average. Remember the way to read OPS+ and wRC+: 100 is league average, so a wRC+ of 105 is five percent better than league average, and 95 is five percent below. The pitching equivalents are ERA-, FIP-, and xFIP- (of FanGraphs stock). According to immortal baseball logic, lower numbers for pitchers are good, whereas higher numbers are bad. So the true statements that Jhoulys Chacin’s 2013 ERA- was 81 and his FIP- was 80 mean that he had a very good year. Another way to say that is that his ERA was 19 percent better than league average, and that his FIP was 20 percent better.
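Here is a rough sketch of how to read a “minus” stat. The real FanGraphs versions also adjust for park, which this simplified version deliberately ignores, so treat `simple_minus` as a hypothetical illustration of the scaling rather than the actual calculation. The 3.60 and 4.00 figures are invented.

```python
def simple_minus(pitcher_value: float, league_value: float) -> float:
    """Scale a stat so that 100 is league average (no park adjustment)."""
    return 100 * pitcher_value / league_value


def percent_better_than_average(minus_stat: float) -> float:
    """For minus stats, each point below 100 is one percent better."""
    return 100 - minus_stat


# Hypothetical: a 3.60 ERA in a league that averages 4.00.
print(round(simple_minus(3.60, 4.00)))   # 90
# Reading the ERA- of 81 and FIP- of 80 quoted above.
print(percent_better_than_average(81))   # 19
print(percent_better_than_average(80))   # 20
```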

The Rate Family

The Question: if a statistic like SIERA accounts for balls in play, how does it avoid the caveats of BABIP, as well as the idiosyncrasies of pitchers?

The Statistics: batted ball data—line drive rate (LD%), fly ball rate (FB%), ground ball rate (GB%)—and home run to fly ball ratio (HR/FB)

What they do and do not tell us: batted ball data is self-explanatory, as it measures the types of balls in play allowed by a pitcher. However, the numbers don’t tell us anything unless they are attached to a name and repertoire. Pitchers tend to be lumped into two groups: ground ball pitchers and fly ball pitchers. “Line drive” pitchers don’t really exist, but if they do they are instead called “sucky pitchers,” because line drives quickly turn into runs. It’s critical to know the style of a pitcher before judging the ERA coterie and adjusting for BABIP. Ground ball pitchers, the type of pitchers the Rockies like to have, tend to have higher BABIPs because ground balls more frequently go for hits. However, they generally allow fewer extra base hits. That’s the bargain ground ball pitchers and their employers make. Fly ball pitchers, on the other hand, allow fewer hits and have lower BABIPs against because fly balls land for outs more often than ground balls. The fly ball pitcher’s Faustian pact is that fly balls also sometimes travel a little too far and land over the fence instead of snugly in the welcoming glove of a waiting outfielder. So batted ball data is useful as long as you know the tendencies of the pitcher. You’ll find the averages in the following table, but remember that ground ball pitchers generally induce ground balls at a rate over 50 percent, and fly ball pitchers usually induce fly balls at a rate of 40 percent or above (courtesy of FanGraphs and Baseball Info Solutions):

	League Average
LD	20%
GB	44%
FB	36%
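The rule of thumb above can be sketched in a few lines: turn raw batted ball counts into rates, then apply the 50 percent ground ball and 40 percent fly ball guidelines. The function names, the “no strong tendency” label, and the counts are hypothetical, and the cutoffs are guidelines rather than official definitions.

```python
def batted_ball_profile(line_drives: int, ground_balls: int,
                        fly_balls: int) -> dict[str, float]:
    """Convert batted ball counts into LD%, GB%, and FB%."""
    total = line_drives + ground_balls + fly_balls
    return {
        "LD%": 100 * line_drives / total,
        "GB%": 100 * ground_balls / total,
        "FB%": 100 * fly_balls / total,
    }


def pitcher_type(profile: dict[str, float]) -> str:
    """Apply the rough guidelines: GB% over 50, or FB% of 40 or above."""
    if profile["GB%"] > 50:
        return "ground ball pitcher"
    if profile["FB%"] >= 40:
        return "fly ball pitcher"
    return "no strong tendency"


# Hypothetical counts for one season.
profile = batted_ball_profile(line_drives=100, ground_balls=270, fly_balls=130)
print(profile)                # {'LD%': 20.0, 'GB%': 54.0, 'FB%': 26.0}
print(pitcher_type(profile))  # ground ball pitcher
```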

That brings us to the next rate: HR/FB. Fly ball pitchers generally have lower HR/FB ratios because all of the balls in the air mitigate the imprint home runs leave. For ground ballers, it tends to be higher. This is not a predictive stat, but it is useful because it tells us that pitchers who occupy either the extremely good or extremely bad ends of the spectrum probably won’t stay there over a long period of time.

Metric	Transcendent	Sublime	Worldly	Common	Numbed	Anaesthetized
HR/FB	under 5%	5-7%	7-8.5%	8.5-10%	10-13%	above 13%
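If you want to slot a pitcher into the table above, a sketch like this does the lookup. The tier names and cutoffs come straight from the table; how to treat a value that lands exactly on a boundary is my assumption (it goes to the worse tier), and the 18 home runs on 150 fly balls are invented.

```python
from bisect import bisect_right

# Upper bounds (in percent) of every tier except the last, from the table.
UPPER_BOUNDS = [5.0, 7.0, 8.5, 10.0, 13.0]
LABELS = ["Transcendent", "Sublime", "Worldly", "Common", "Numbed", "Anaesthetized"]


def hr_per_fb(home_runs: int, fly_balls: int) -> float:
    """Home runs as a percentage of fly balls allowed."""
    return 100 * home_runs / fly_balls


def hr_fb_tier(rate_pct: float) -> str:
    """Look up the table tier; exact boundary values fall to the worse tier."""
    return LABELS[bisect_right(UPPER_BOUNDS, rate_pct)]


print(hr_fb_tier(6.2))                 # Sublime
print(hr_fb_tier(hr_per_fb(18, 150)))  # Numbed (12.0 percent)
```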

In context: now it’s time to pull all of these threads together. HR/FB informs FIP and xFIP; batted ball data and the FIPs inform SIERA; and SIERA is a modified estimate of what a pitcher’s ERA should have been and what it might be in the future based on everything; all of this, in turn, informs baseball fans of a pitcher’s ability. Let’s use a case study as an explanation. Here is Jhoulys Chacin’s pitching line from 2013.

	Chacin
ERA	3.47
FIP	3.47
xFIP	3.97
SIERA	4.27

His ERA and FIP were exactly the same. Remember that the ERA should be read as a frozen statistic without predictive value. Last year, Chacin gave up 3.47 earned runs on average per nine innings, which made for a good year. Because his FIP matches, if we just measure his walks, strikeouts, and home runs, his ERA “should” have been what it actually was, 3.47. If we just used FIP, we’d conclude that his ERA was what it should have been based on his ability. However, Chacin’s xFIP is a bit higher. What this indicates is that he should have given up more home runs than he actually did last year based on league-wide HR/FB. In fact, Chacin’s HR/FB supports xFIP’s conclusion. In 197 innings, Chacin’s mark was a very low 6.2%. It’s not that this is an impossible mark for a quality pitcher, but it’s unexpected for Chacin. Indeed, his 2013 was a tale of two halves when it came to HR/FB, which was an otherworldly three percent in the first half, and a more ordinary ten percent in the second. Not only that, but Chacin’s ground ball tendencies mean that he and pitchers like him tend to have higher BABIPs. But in 2013, it sat at a fortunate .288. SIERA takes these things, as well as Coors Field’s tendency to inflate home run totals, into account in estimating what his ERA should have looked like: 4.27. For what it’s worth, ZiPS projects that Chacin will neither replicate his 2013 ERA nor produce according to his 2013 SIERA in 2014. It places him somewhere in between, with an ERA of 4.02 and a 4.00 FIP.
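One quick way to summarize the case study is to subtract ERA from each estimator in Chacin’s line above. This is just arithmetic on the four numbers in the table, and the function name is hypothetical. A positive gap means the estimator thinks the ERA flattered the pitcher.

```python
# Chacin's 2013 line, copied from the table above.
chacin_2013 = {"ERA": 3.47, "FIP": 3.47, "xFIP": 3.97, "SIERA": 4.27}


def gaps_from_era(line: dict[str, float]) -> dict[str, float]:
    """Estimator minus ERA; positive means the estimator expected a worse ERA."""
    era = line["ERA"]
    return {name: round(value - era, 2)
            for name, value in line.items() if name != "ERA"}


print(gaps_from_era(chacin_2013))  # {'FIP': 0.0, 'xFIP': 0.5, 'SIERA': 0.8}
```

The gaps widen as each estimator takes more of the context into account, which is the same story the paragraph above tells in words.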

Like its predecessors, this post does not claim to be comprehensive. These metrics inhabit the statistical ecosystem from which I usually draw—if you have different preferences, let me know what they are and why in the comments or on Twitter.

Resources:

These three glossaries will lead you to a variety of other sources:

FanGraphs Stat Glossary

Baseball Prospectus Glossary

Baseball Reference Collaborative Encyclopedia
