Expected Batting Average vs. Projected Batting Average …so you’re telling me there’s a chance
Let’s get this part out of the way so we can have some fun (scroll all the way down if you need to skip the preface).
BABIP measures the share of balls in play that fall for hits. On average, 70+% of line drives fall for hits. In general, defense, luck and power/speed are the variables that can affect BABIP.
xBABIP is the expected batting average on balls in play. The newest formula I can readily point to (and have integrated into my projection tool) is:
xBABIP = ((GB - IFH) * (GB-IFH constant) + (FB - HR - IFFB) * (OFFB constant) + LD * (LD constant) + IFH + BUH) / (GB + FB + LD + BU - HR - SH)
It replaced the previous formula, which focused more on the BIP percentages.
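For anyone who wants to plug numbers in, here is a minimal sketch of that xBABIP formula, reading the numerator terms as (GB - IFH) and (FB - HR - IFFB). The class name, the sample hitter and all three constants are placeholders I made up for illustration; the real constants come from league data and are not given in this post.

```python
from dataclasses import dataclass


@dataclass
class BattedBallCounts:
    gb: int    # ground balls
    fb: int    # fly balls (including home runs and infield flies)
    ld: int    # line drives
    bu: int    # bunts
    ifh: int   # infield hits
    iffb: int  # infield fly balls
    buh: int   # bunt hits
    hr: int    # home runs
    sh: int    # sacrifice hits


def xbabip(c: BattedBallCounts, gb_const: float, offb_const: float, ld_const: float) -> float:
    """Expected BABIP from raw batted-ball counts and league hit-rate constants."""
    expected_hits = (
        (c.gb - c.ifh) * gb_const              # grounders that were not infield hits
        + (c.fb - c.hr - c.iffb) * offb_const  # outfield flies that stayed in the park
        + c.ld * ld_const                      # line drives
        + c.ifh + c.buh                        # infield and bunt hits counted as-is
    )
    balls_in_play = c.gb + c.fb + c.ld + c.bu - c.hr - c.sh
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return expected_hits / balls_in_play


# Hypothetical hitter and placeholder constants, for illustration only.
sample = BattedBallCounts(gb=200, fb=180, ld=100, bu=5, ifh=10, iffb=20, buh=2, hr=20, sh=3)
print(f"{xbabip(sample, gb_const=0.20, offb_const=0.15, ld_const=0.70):.3f}")
```

Swap in the actual league constants for the seasons you care about before trusting anything it prints.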
Where am I going with this? The value of xBABIP is that it can be converted into an expected batting average (xAVG), which looks like this: xAVG = (HR + xBABIP * (AB - K - HR + SF)) / AB.
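As a quick sketch of that conversion, reading the formula as (HR + xBABIP * (AB - K - HR + SF)) / AB: the function name and the sample inputs below are mine, not from any particular hitter.

```python
def expected_avg(xbabip: float, ab: int, k: int, hr: int, sf: int) -> float:
    """Turn an expected BABIP into an expected batting average."""
    if ab <= 0:
        raise ValueError("at-bats must be positive")
    balls_in_play = ab - k - hr + sf  # BABIP denominator
    expected_hits = hr + xbabip * balls_in_play
    return expected_hits / ab


# Hypothetical line: 500 AB, 100 K, 20 HR, 5 SF and a .300 xBABIP.
print(f"{expected_avg(0.300, ab=500, k=100, hr=20, sf=5):.3f}")
```

One handy sanity check: feed it a hitter’s actual BABIP instead of xBABIP and it should hand back his actual batting average, since hits are just HR plus BABIP times balls in play.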
Therefore, based on BIP data, I want to look at hitters who a) fall short of their expected BABIPs and/or b) with some luck can approach a batting average closer to their expected rates (vs. my projected rates).
Above, I mentioned that defense, luck and skill type can affect xBABIP and the associated xAVG. To explain this a bit more, I mean variables like speed, speed associated with GB rates and average distance associated with FB rates. Mike Podhorzer found these regression results. In addition to these variables, he pointed out other important variables that need to be considered, like batted ball direction and park factors.
My batting average projection approach is intense, irrationally long and not infinitely reliable: it looks at each hitter’s career BABIP and average on each ball-in-play type (GB%, FB% and LD%), regressed and trended appropriately. An infinitely quicker way is the expected batting average formula above. Again: xAVG = (HR + xBABIP * (AB - K - HR + SF)) / AB. The issue is that, much more often than with my approach, the output won’t wind up anywhere near the hitter’s actual average. Next year, I am going to use the xBABIP-xAVG formula instead to save time, but I will catch the values that are too far off (maybe by .005 points or something of that nature) and will naturally incorporate park factors, BIP-spray information as well as speed scores and power-related peripherals. I’ll run correlations after the fact to see if it was any better than what I have going now.
Need an example? Let’s use Edwin Encarnacion. Albeit an extreme example, in 3 of the past 4 years EE actually had batting averages that surpassed his BABIPs! Check out this matrix (customized in FanGraphs):
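| Year | AB | K% | BB% | IFFB% | IFH% | HR/FB | GB% | FB% | LD% | BABIP | ISO | AVG | Ct% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ’11 | 481 | 14.50% | 8.10% | 17.20% | 6.10% | 9.40% | 36.40% | 44.20% | 19.40% | 0.292 | 0.181 | 0.272 | 84.50% |
| ’12 | 542 | 14.60% | 13.00% | 12.00% | 13.30% | 18.70% | 33.00% | 49.50% | 17.60% | 0.266 | 0.277 | 0.280 | 82.20% |
| ’13 | 530 | 10.00% | 13.20% | 9.30% | 6.00% | 17.60% | 35.10% | 43.30% | 21.60% | 0.247 | 0.262 | 0.272 | 84.60% |
| Tot | 3803 | 15.80% | 10.00% | 14.50% | 7.30% | 13.80% | 35.90% | 45.00% | 19.10% | 0.275 | 0.215 | 0.265 | 81.80% |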
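To see the AVG-over-BABIP effect in those numbers, here is a small sketch that pulls just the BABIP and AVG columns from EE’s table and reports the gap for each row. Only the three seasons and the career totals shown are included; the dictionary layout and function name are my own. The gap can go positive because home runs count as hits in AVG but are excluded from BABIP.

```python
# (BABIP, AVG) pairs copied from the Encarnacion table above.
EE_ROWS = {
    "'11": (0.292, 0.272),
    "'12": (0.266, 0.280),
    "'13": (0.247, 0.272),
    "Tot": (0.275, 0.265),
}


def avg_minus_babip(rows):
    """Return {label: AVG - BABIP}, rounded to three decimals."""
    return {label: round(avg - babip, 3) for label, (babip, avg) in rows.items()}


gaps = avg_minus_babip(EE_ROWS)
for label, gap in gaps.items():
    flag = "AVG above BABIP" if gap > 0 else "AVG below BABIP"
    print(f"{label}: {gap:+.3f} ({flag})")
```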
Here is what I have projected for EE in 2014: .262 BABIP; .271 AVG. But what about his expected BABIP and batting average? Check out the matrix below, sorted by the highest expected average differential, or “xAvgdiff.” Three columns prior, you can sort by expected BABIP differential, or “xBABIPdiff.” I also added the rankings for each of these two differentials in the first two columns. If you want their actual rank, you can sort by “Pos. Adj,” which is their position-adjusted 5×5 fantasy value, or you can simply go to our 2014 Fantasy Baseball Projection Page.
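For those who would rather build the differential columns themselves, here is a rough sketch. It assumes each differential is expected minus projected (so a positive number means room to beat the projection), which is my reading of the column names. The hitters and their numbers are made up, and the `Hitter` type and function name are hypothetical.

```python
from typing import NamedTuple


class Hitter(NamedTuple):
    name: str
    proj_babip: float
    x_babip: float
    proj_avg: float
    x_avg: float


def differential_table(hitters):
    """Compute xBABIPdiff and xAvgdiff, rank both, and sort by xAvgdiff rank."""
    rows = [
        {
            "name": h.name,
            "xBABIPdiff": round(h.x_babip - h.proj_babip, 3),
            "xAvgdiff": round(h.x_avg - h.proj_avg, 3),
        }
        for h in hitters
    ]
    for col in ("xBABIPdiff", "xAvgdiff"):
        ordered = sorted(rows, key=lambda r: r[col], reverse=True)
        for rank, row in enumerate(ordered, start=1):
            row[col + "_rank"] = rank  # rank 1 = largest positive differential
    return sorted(rows, key=lambda r: r["xAvgdiff_rank"])


# Made-up hitters, for illustration only.
sample = [
    Hitter("Hitter A", proj_babip=0.300, x_babip=0.320, proj_avg=0.270, x_avg=0.285),
    Hitter("Hitter B", proj_babip=0.310, x_babip=0.305, proj_avg=0.280, x_avg=0.278),
    Hitter("Hitter C", proj_babip=0.290, x_babip=0.315, proj_avg=0.255, x_avg=0.280),
]
for row in differential_table(sample):
    print(row)
```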
Now guys like EE, Kinsler, Scutaro…we know what they are now. We know their BIP-related data; maybe age will keep their BABIP and AVG away from their xBABIP and xAVG. But the value in this matrix is in the young, high-potential hitters. With some luck, development or a combination of power and speed growth, they can put up a batting average a bit higher than we have projected.
However, if you go to other sites with projections that approach .290 for guys like Kinsler and EE (which I’ve seen), it’s because they’re solely using the xBABIP-xAVG formula, which isn’t appropriate and overestimates their ball-in-play mix.
Some value opportunities toward the top: Andrelton Simmons, Brian Dozier, Matt Carpenter again.
Next Steps?!
I might play around with some z-scores to understand, in context, what we can determine by looking at these differentials in conjunction with speed scores and factors from Baseball Heat Maps (average distance, angle, etc.). We’ll see what we come up with.
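As a starting point for the z-score idea, here is a minimal sketch that standardizes a list of xAVG differentials and a list of speed scores so they can be read on the same scale. The numbers are invented, and using the population standard deviation is just one reasonable choice, not a settled method.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # no spread, so every value is average
    return [(v - mu) / sigma for v in values]


# Invented sample: xAVG differentials and speed scores for four hitters.
x_avg_diff = [0.025, 0.015, -0.002, 0.008]
speed_score = [6.1, 3.2, 4.5, 5.0]

for diff_z, speed_z in zip(z_scores(x_avg_diff), z_scores(speed_score)):
    print(f"xAvgdiff z = {diff_z:+.2f}, speed z = {speed_z:+.2f}")
```

A hitter who is well above zero on both is the type where the expected average looks more believable than the projection.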
If you have ideas, please comment away!!!
LD% and GB% should allow a player to carry a decent BABIP since both go for hits more than FBs. Or do they? Does batted ball distance correlate with, or in E5’s scenario offset, BABIP marks?
LD% for sure; GB% more than FB%, especially w/ speed. Batted ball spray (l/f/r) and batted ball distance certainly have an effect. Distance obviously is directly related to HR/FB ratio …the more HR, the better the average, positively affecting BABIP naturally. Batted ball spray can offset xBABIPs…per this link: http://www.fangraphs.com/fantasy/battedballlocationandbabip/ you can see that they correlate (at least directionally) with xBABIP. Pull guys like EE will get negatively affected, which matches up with his career rates.
Where would one find the constants used in the above listed xBABIP formula for 2013? The Fangraphs article only includes data through 2012.
Good question. I believe I used the average data from ’09 or ’10 to ’12…whatever was the most recent available data. Not sure where to find it, but I can check.