Monday, 2 May 2011

Stats stuff

It's an off day today, so I thought I'd go into a bit of detail about what I want this blog to be.  First, I don't claim to be some baseball savant; I'm simply a big baseball fan and have been my entire life.  I've always had a bit of a love affair with stats and numbers and stuff.  The main thing that I want to keep in mind when writing is that some people don't like baseball or numbers as much as I do, and some people simply don't look at the same stats for their baseball info as I do.  As a result, today's blog will explain some of the advanced stats that I like to use, and express my distaste for some stats that are way overused.  I'll likely bookmark this at the top of the page and keep editing it as new things come to light, or, if I have writer's block some day, I can come back and explain them better.


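  • "Slash-lines": this is probably the one I'll use the most.  It will be in the form of batting average/on-base percentage/slugging percentage, so a made-up line of .300/.400/.500 means a .300 average, a .400 OBP, and a .500 slugging percentage.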
  • Slash lines are closely related to OPS, which is "On-base plus Slugging percentage".  I find faults with it, but don't really know how to explain it all that well without finding extreme examples.  Basically, it overrates slugging percentage.  OPS doesn't really like a guy like Ichiro, but I think we can all agree that he's a pretty good hitter.
    • Update, March 14/'12: Check this out!  Tango and Patriot at the book blog verbalize (in writing) the faults that I couldn't really explain myself, but then go another step beyond that.  Awesome stuff here.  
    • OPS+ sets the league-average OPS to 100.  An OPS+ of 101 or more is above league average, 99 or less is below league average.  Using Bautista's '10 season as an example, his OPS+ was 165 last season, which was right around tops in baseball.  OPS+ is park-adjusted (that is, some parks are easier to hit in, and some are easier to pitch in. The ball flies out of Yankee Stadium, for example, due to typical wind direction, field size, dimensions, and so on.  Basically, not all fields are created equal). 
  • wOBA: stands for "weighted on base average".  This is a FanGraphs tool.  Basically, this was created to make up for the fact that a triple is worth 3x as many bases as a single, but is not 3x as valuable.  Similarly, a double with 0 outs and a triple with 0 outs give a really close expectation with regard to runs scoring.  wOBA also accounts for stolen bases, which OPS does not.  wOBA loves guys like Ichiro, despite the low slugging percentage, because they can steal bases and effectively turn a single into a double.
  • WAR: stands for "Wins above Replacement".  This attempts to measure any single player's contributions to his team.  It calculates both offensive and defensive contributions.  It's a system that is more complicated than I could ever explain, so here is the FanGraphs explanation.  You don't really need to know how it's calculated to understand it though; simply put, Jose Bautista was worth about 7 wins last season (depending on which site's WAR you look at; FanGraphs and Baseball-Reference use different defensive metrics and rank almost everyone differently).  Adam Lind was worth 0.5 wins last year.  That should give you a pretty good idea of what a good and bad WAR season is.  A league average player is right around 2 WAR, while a replacement level player is worth 0 by definition.
    • WAR is used to evaluate contracts and player value.  Let's say that two players are worth 2 WAR each, but one is making $2MM and the other $17MM.  There is one really good contract (i.e. the team is making a surplus over the value that they paid), and one pretty bad contract (i.e. the team is overpaying for a win).
    • A single WAR has a different value each season as contracts grow.  This past offseason, a WAR was valued at approximately $4.7MM.  In other words, a 3 WAR player should get a contract in the $14MM a year range on the free agent market (3 x $4.7MM).  Adam Dunn is a good example of how position plays into this.  He's only been worth about a win and a half in the last two seasons because he's such a liability in the field.  By moving him to DH, Chicago is effectively looking at his batting numbers over the last few years (very good) and completely ignoring the defence (dogshit), since it becomes irrelevant.  Keep in mind, though, that each position has a different value in calculating WAR: a DH needs better offensive numbers than a SS to be a 2 WAR player, since the average SS hits much less effectively than the average DH.
    • Marginal Wins: Not all WAR are created equal.  Teams that have been bad for the last few years (Pittsburgh, Kansas City) aren't going around every offseason signing expensive free agents, because all those free agents will do is help the team go from 72 wins to 75.  Instead, they build value with draft picks and deadline deals, keeping costs low until they are ready to contend.  Boston, on the other hand, went out this offseason and replaced their 1B (3B, technically) and LF with Adrian Gonzalez and Carl Crawford.  They probably overpaid with the two long-term contracts that they handed out, especially in a few years when those two are 35 and making $20MM a season, but the wins that those two players provide now can be the difference between winning 87 games and possibly missing the playoffs, and winning 94 games and taking the division.
    • Players who are still under team control (first 6 years of their career; 3 years of full control, 3 years of salary arbitration) are paid much less than free agents.  Full control years are typically around league minimum ($414K this season), and then arbitration years are performance and service time based, with players earning 40%, 60% and 80% of their market value in each respective year of arbitration.  This explains why Jose Bautista's contract is $8MM this season, and $14MM in the next four, since he was arbitration eligible this year.
  • FIP and xFIP: FIP stands for "Fielding Independent Pitching", and xFIP is the "expected" version of it.  Basically speaking, it's what a pitcher's ERA should be, after adjusting for any amazing or disappointing defense, intentional walks, and so on.  Both adjust for ballparks and league (NL offenses are worse than AL, mostly because of DHs, but also the AL East), but xFIP also adjusts homerun rates to 12.5% of flyballs.  Most pitchers have little to no control over what ratio of flyballs leave the park; about 12.5% leave, the rest stay.  Any pitcher whose homerun rate on flyballs varies drastically from that 12.5% is probably due for a regression.
  • BABIP: "batting average on balls in play": Briefly, about 30% of balls that are hit into fair territory (excluding homeruns) fall in for hits, with the other 70% being outs.  Basically, if a pitcher has a .400 babip, he's either getting unlucky, or is just throwing a bunch of meatballs that are being hit for line drives.  Conversely, if a pitcher has a .200 babip, I'd expect his ERA to balloon sooner or later.  Babip is available for both pitchers and hitters.  For example, Aaron Hill had a famously low babip last year, coming in at something like .196.  As Jon Hale at The Mockingbird shows us, Hill just had a fucking brutal season last year, no matter the babip.  Hill's pop-up rate was so damn high, and a pop-up may as well be a strikeout.  Power hitters typically have lower babips due to the higher flyball rates, which will turn into outs (or HRs, which aren't measured in babip) more often than not.  A guy like Ichiro will have a higher babip because he hits a lot of line drives and grounders, and gets a whole bunch of infield singles.
  • K/9, BB/9, H/9, WHIP, aka "rates": This is how I'll compare pitchers to one another.  The "9" obviously stands for "per 9 innings".  I still like FIP better, since it is better adjusted than the raw data, but rates are still useful.
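
For anyone who'd rather see the arithmetic than read about it, here is a rough Python sketch of a few of the calculations from the list above: OPS, the dollars-per-WAR math, the 40/60/80 arbitration shares, the per-9 rates, and FIP/xFIP.  Fair warning: the function names, the sample numbers, and the 3.10 FIP constant are all my own placeholders.  The FIP formula is the commonly cited one rather than anything spelled out in this post, it treats all walks alike, and it skips the park and league adjustments, so treat this as a sketch and not the official calculation.

```python
# Back-of-the-envelope versions of a few stats from the list above.
# Function names, default values, and sample inputs are illustrative
# assumptions, not anyone's official implementation.

def ops(obp, slg):
    """On-base percentage plus slugging percentage."""
    return obp + slg


def market_value_mm(war, dollars_per_war_mm=4.7):
    """Rough free-agent value in $MM at the post's ~$4.7MM per win."""
    return war * dollars_per_war_mm


def arbitration_salary_mm(war, arb_year, dollars_per_war_mm=4.7):
    """40%, 60%, or 80% of market value for arbitration year 1, 2, or 3."""
    share = {1: 0.40, 2: 0.60, 3: 0.80}[arb_year]
    return share * market_value_mm(war, dollars_per_war_mm)


def per_nine(count, innings):
    """Any 'per 9 innings' rate: K/9, BB/9, H/9."""
    return 9 * count / innings


def whip(walks, hits, innings):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings


def fip(hr, bb, hbp, k, innings, constant=3.10):
    """Commonly cited FIP form. The constant puts FIP on the ERA scale
    and changes every season; 3.10 is only a placeholder."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


def xfip(fly_balls, bb, hbp, k, innings, hr_per_fb=0.125, constant=3.10):
    """FIP with actual homeruns swapped for an expected total.
    hr_per_fb defaults to the 12.5% figure quoted in the post."""
    return fip(fly_balls * hr_per_fb, bb, hbp, k, innings, constant)


if __name__ == "__main__":
    # Every input below is made up for illustration.
    print(f"OPS for a .300/.400/.500 hitter: {ops(0.400, 0.500):.3f}")
    print(f"3 WAR on the open market: ${market_value_mm(3):.1f}MM")
    print(f"Same player, 1st arb year: ${arbitration_salary_mm(3, 1):.1f}MM")
    print(f"K/9 for 180 K in 200 IP: {per_nine(180, 200):.1f}")
    print(f"WHIP for 50 BB and 170 H in 200 IP: {whip(50, 170, 200):.2f}")
    print(f"FIP: {fip(20, 50, 5, 180, 200):.2f}")
    print(f"xFIP: {xfip(200, 50, 5, 180, 200):.2f}")
```

Innings go in as a plain decimal here (200 and a third is 200.333, not the box-score "200.1"), which is the kind of thing that's easy to trip over when copying numbers off a stats page.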
I'd like to point out that I'm not a big fan of babip, since it doesn't really show how solid the contact is, nor does it show (I think anyway) how effective a pitch is.  Mariano Rivera comes to mind, painting corners with unhittable pitches (.261 career babip).  This is actually something I want to research a little bit, so I might have something some day.  But as a short term change from the norm, it does help with analyzing slumps and such.

(Update: I guess I have a bit of a point with my Rivera thing, but he's a freak, so it's not something we should really look at seriously.  Babip is pretty much normalized, to the point that .295-.300 is going to be your standard mark each year.  Rivera's low babip can be explained by what is still a relatively small sample despite 15 seasons (1100ish career innings, which most decent starting pitchers reach in 4-5 seasons), some downright nasty stuff, and a reputation for throwing all strikes and painting every corner all the time, which gets him a lot of called strikes by way of a wider than normal strikezone, and thus more 2 strike counts, more swings at bad pitches, and poor contact, broken bats, etc.)
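
If you want to check a babip yourself, the usual formula is hits minus homeruns, divided by at-bats minus strikeouts minus homeruns plus sacrifice flies.  Here's a minimal sketch of that; the function name and the stat line are made up, and I'm assuming you have the raw counting stats handy.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play, standard formula:
    (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play to average over")
    return (hits - home_runs) / balls_in_play


if __name__ == "__main__":
    # Hypothetical hitter: 150 H, 20 HR, 550 AB, 100 K, 5 SF.
    print(f"{babip(150, 20, 550, 100, 5):.3f}")
```

Homeruns come out of both the top and the bottom because they never give a fielder a chance, and strikeouts come out of the bottom because the ball was never put in play.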

Bad stats:

  • Pitching record (wins and losses): let me use an extreme example.  I pitch for the Harlem Globetaters, who score an average of 105 runs per game.  I start 42 games a season and average a 10 ERA.  By all rights, I should be 42-0 by the end of the season, despite an ERA that might possibly make me the worst pitcher in history.  Yes, I realize that this assumes a lot, but it's a hypothetical.  Wins and losses (from a team perspective) have a lot to do with the entire pitching staff and the lineup, way too much so to give a pitcher a win or a loss, especially when it becomes a relief pitcher earning the decision.
  • RBIs: see: Bautista, Jose.  If people don't get on base in front of you, you cannot knock in runs.  There is no such thing as "an RBI guy" in baseball.  There might be a good hitter, or a high on-base/high slugging guy, but there is no way possible for someone to have a quality of being good at knocking in runs.  Bautista would be a great RBI guy if the bases were loaded every time he came to the plate.  I'm sure every major league batter would be as well, even if they were a below average hitter.
  • Fielding percentage: sucks.  A 1B is going to have a really high fielding percentage due to the number of (easy) chances.  Don't look at anything like putouts or assists (for infielders at least-- outfield assists are not especially useful but guys like Shin-soo Choo, Ichiro, Bautista, etc. who have real good arms will get more outfield assists than floppy-armed dudes playing left field because they need a position) either.
  • Win shares: One of the few advanced stats that are fairly useless.  They're relatively close to WAR, but don't account for someone being worse than replacement level; you can't get negative win shares.  Those were only kind of popular 2-3 years ago, and that popularity lasted briefly, but they're still around.
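
To put a number on the fielding percentage complaint: the stat is just putouts plus assists, divided by putouts plus assists plus errors.  A quick sketch with two completely made-up fielders (the function name and the totals are mine, purely for illustration) shows how a pile of easy chances props up a first baseman:

```python
def fielding_pct(putouts, assists, errors):
    """Fielding percentage: (PO + A) / (PO + A + E)."""
    chances = putouts + assists + errors
    if chances == 0:
        raise ValueError("no chances recorded")
    return (putouts + assists) / chances


if __name__ == "__main__":
    # Hypothetical season totals, not real players.
    first_base = fielding_pct(putouts=1400, assists=100, errors=8)
    shortstop = fielding_pct(putouts=250, assists=450, errors=18)
    print(f"1B: {first_base:.3f}  SS: {shortstop:.3f}")
```

The first baseman comes out well ahead while handling mostly routine throws, and the number says nothing about the balls either guy never got to.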

I'll still use HR's, RBI, and AVG to describe what happened in any given game, but if you ever want to argue with me about player X being good because he knocked in 121 runs last year, don't.  I won't reply.  If you say "so-and-so is underrated because of [...]" and then use a bunch of good, useful stats, I'm all for it.
