
Friday, February 22, 2013

Trouble with the Curve: How Econometrics Works (or Fails)

Trouble with the Curve: Baseball Scouting Is More Than Just Computers

After Michael Lewis wrote "Moneyball", the book about how the legendary general manager Billy Beane applied sabermetrics (econometric techniques for baseball) in Major League Baseball, people started using such techniques to evaluate the quality of players. However, the movie "Trouble with the Curve" portrays an old-school scout who spots better players through experience: a seemingly contradictory example.

Which one is closer to fact?  Do we exaggerate the power of Sabermetrics?

Both movies present valid points: the keys are "random sampling" and "sample size".

Let's look at what data Billy and Gus might have in hand, and how these data sets might differ.


Billy's data is a good collection of records for major and minor league players. These players have been in the league for a while, so as batters or pitchers they have faced various counterparts in all kinds of scenarios. Their numbers therefore reflect a good part of their performance. This is close to "random" sampling.

A player's batting or pitching statistics are more valid when he has faced various kinds of opponents. In other words, the numbers are not "biased" toward certain characteristics or situations.

The second crucial thing is a "large sample size". Pitching statistics only make sense when a pitcher has enough innings pitched (IP). Likewise, a batter's statistics only make sense when he has enough at-bats (AB). So, if Billy's data is valid, he can buy a player, give him some game appearances, and expect him to perform more or less as he did before. This is the Law of Large Numbers: when the number of observations is large enough, the sample average settles close to the underlying true average.
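To make the sample-size point concrete, here is a small simulation sketch. It is only an illustration under my own assumptions: the "true" OBP of 0.350, the numbers of plate appearances, and the function names are all made up for the example, and each plate appearance is treated as an independent coin flip, which real baseball is not.

```python
import random


def observed_obp(true_obp, plate_appearances, rng):
    """Simulate independent plate appearances and return the observed on-base rate."""
    times_on_base = sum(rng.random() < true_obp for _ in range(plate_appearances))
    return times_on_base / plate_appearances


def spread_of_estimates(true_obp, plate_appearances, trials=200, seed=0):
    """Repeat the simulation many times and return the (lowest, highest) observed rate."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = [observed_obp(true_obp, plate_appearances, rng) for _ in range(trials)]
    return min(estimates), max(estimates)


if __name__ == "__main__":
    TRUE_OBP = 0.350  # hypothetical "true" ability, assumed for illustration
    for n in (20, 200, 5000):
        low, high = spread_of_estimates(TRUE_OBP, n)
        print(f"{n:>5} plate appearances: observed OBP between {low:.3f} and {high:.3f}")
```

With only a handful of plate appearances, the same hitter can look like a star or a bust. With thousands, the observed number stays close to his underlying ability, which is why a long professional record is informative.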

So, as presented in "Moneyball", Billy Beane uses the data of "established" major league players. These players have been in the system for a while, they know the game, and their statistics reflect, to a good extent, their ability in certain dimensions. For example, what Billy looks for is OBP (on-base percentage): he wants players with a good eye who get themselves on base. So, voilà, "Moneyball": he can spend every dime on the players he needs.

In "Trouble with the Curve", the fictional scout, Gus, relies more on his field experience. There are few statistical records for these high school players. Or, even if there were, the numbers would not be very representative. Not many high school pitchers throw good breaking balls, so the statistics mainly reflect a batter's ability to deal with fastballs. So, voilà, "Trouble with the Curve". A good scout can only spot the problem on the field, based on a batter's movements and contact during the game.
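The biased-sample problem can be sketched with a few lines of arithmetic. All numbers below are hypothetical and chosen only to illustrate the idea: two made-up hitters, made-up success rates against each pitch type, and made-up shares of fastballs in high school and in the pros.

```python
def expected_rate(fastball_share, rate_vs_fastball, rate_vs_breaking):
    """Overall success rate as a weighted mix of the two pitch types."""
    return fastball_share * rate_vs_fastball + (1 - fastball_share) * rate_vs_breaking


# Hypothetical hitters: (success rate vs fastballs, success rate vs breaking balls)
HITTERS = {
    "fastball_masher": (0.40, 0.10),  # cannot handle the curve
    "all_rounder": (0.36, 0.30),
}

# Assumed share of fastballs a batter sees in each environment
HIGH_SCHOOL_FASTBALL_SHARE = 0.95
PRO_FASTBALL_SHARE = 0.60


def rates_in(fastball_share):
    """Expected success rate of every hitter in a given pitching environment."""
    return {
        name: expected_rate(fastball_share, vs_fast, vs_break)
        for name, (vs_fast, vs_break) in HITTERS.items()
    }


if __name__ == "__main__":
    print("high school:", rates_in(HIGH_SCHOOL_FASTBALL_SHARE))
    print("pro level:  ", rates_in(PRO_FASTBALL_SHARE))
```

Under these assumed numbers, the hitter who cannot handle breaking balls posts the better high school statistics, yet the ranking flips once breaking balls become common. The high school numbers are not wrong; they are just sampled from a situation that does not represent the one we care about.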

In the course of my research studies, I have found these two ideas to be both basic and important. However, people sometimes take them for granted when they approach a sample. Examining whether a sample is close to random and whether it has enough observations is a crucial task before we make any further inferences or estimates.

These ideas also apply to our lives, especially to crucial decisions that we usually make only once or twice in a lifetime. These are the places where mistakes are easily made. For example, Prof. Robert Shiller makes this point in his book The Subprime Solution (I have to check to be sure, but from memory this is the book). He observes that most individuals and households do not make house-purchasing decisions frequently. Therefore, they are prone to misjudging the level of housing prices.

Is econometrics boring? Not at all: it tells you a lot about how life makes sense.

