Saturday, June 30, 2012

The Ale-X-Files: Predictions

Now that we're through more than half the season, let's take a look at just how accurate those preseason predictions were.

Much of this article uses statistics that I calculated in this spreadsheet. The stats are from prior to Friday's sim. You'll see the highlights here, but the full details live in the spreadsheet and are too numerous to post. There are some other fancy stats included, like Pythagorean Run Expectation, but that got to be too much for this article, so I'll just stick with the preseason predictions for now.
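For anyone who wants to poke at that stat in the spreadsheet, here's a minimal sketch of the classic Bill James version of the Pythagorean formula. Fair warning on the assumptions: OOTP's exact implementation (and the exponent it uses) may differ, and the run totals in the example are made up.

```python
# Classic Bill James "Pythagorean" expectation: estimate a team's winning
# percentage from runs scored and runs allowed. Minimal sketch; the
# exponent of 2 is the textbook value, and OOTP's version may differ.

def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

# Hypothetical team: 550 runs scored, 500 allowed -> about .548 ball.
print(round(pythagorean_win_pct(550, 500), 3))  # 0.548
```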

Let's go back to the beginning, though, to when those preseason predictions came out, and see how accurate they have been. Remember how excited, or maybe even demoralized, some of us were? Some of us looked at and discussed them last week, and at a glance they seem relatively close. In fact, I remember Martín's comment (probably because it was about my team) that Minnesota was the one big swing and miss.

But are preseason predictions something we could rely on and maybe even use as guidance for how well our teams will do? Let's take a look at how things have sorted out so far.

Nailed It:
Right now, there are six teams with a winning percentage within two percentage points of their preseason prediction. For those doing the math, this means their record currently sits within about two games of where OOTP predicted it would be.
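Since that percentage-points-to-games conversion comes up throughout the article, here's a quick sketch of the arithmetic. It assumes the roughly 110 games played as of this study (a number I come back to in the caveats) and the standard 162-game schedule:

```python
# Convert a winning-percentage gap (in percentage points) into games.
# Assumes ~110 games played so far and a 162-game schedule.

GAMES_PLAYED = 110
FULL_SEASON = 162

def pct_points_to_games(pct_points: float, games: int) -> float:
    return (pct_points / 100.0) * games

# A 2.0% gap is about two games at this point in the season...
print(round(pct_points_to_games(2.0, GAMES_PLAYED), 1))  # 2.2
# ...and the 4.9% "significant miss" threshold works out to about
# eight games over a full season.
print(round(pct_points_to_games(4.9, FULL_SEASON), 1))   # 7.9
```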

Montgomery and Montreal are at -0.3%.
Los Angeles is at -1.2%.
Ann Arbor, Seattle, and Mile High are all between 1.5% and 2.0% off (Mile High and Ann Arbor to the positive, Seattle to the negative).

Hovering:
There are quite a few more teams (seven) that will need the season to play out before we can judge how accurate their predictions are. These teams sit between 2.0% and 4.8% off their predictions, or between two and four games. In order from underperforming to outperforming:

Maple: -4.2%
Dallas: -4.0%
South Carolina: -3.6%
San Diego: -2.7%

Greenville: +3.0%
Cabo San Lucas: +3.3%
Carolina: +4.4%

Significant Misses:

This category features teams that are five or more games off from their preseason predictions (4.9% or greater). While that may sound small, for most of them it's the difference between leading a division and sitting in second place, or between contending and not contending. In most cases the difference is well beyond the minimum five games.

So Cal: -37.7%
New York: -8.5%
Jacksonville: -6.6%
Boston: -4.9%

Carolina: +4.9%
Las Vegas: +5.0%
Jersey: +6.1%
Eureka: +6.4%
San Fran: +6.8%
Nottinghamshire: +7.8%
Minnesota: +13.4%

Observations:
  • It seems that most teams are at or near the significant-miss range of five games or more, which projects to about eight games by the end of the season. These are significant numbers for the wild card and division races.
  • Personally, I'm most impressed (and surprised) by a team like San Fran, which was picked to win 90 games, a pretty impressive number by itself, yet is currently on pace to win several more than that. When a team sits at either extreme, it's very hard to improve on a high predicted win total, or to lose even more than predicted at the low end. My next point may explain this a bit better.
  • A prediction of high success is very challenging to meet. Take a look at teams like Boston, Maple, New York, Seattle, and Dallas to see what I mean. These teams were all predicted to win 96 or more games, and as of this study, none were on pace to meet their original predictions. The reverse is true as well: if expectations are low, it's very likely a team will outpace them (Nottinghamshire, KC, Eureka). The point is, if you're predicted for around 100 wins or 100 losses, it's just hard to actually reach that mark.

Conclusions:
  • Considering that nearly half the teams are off by 5 or more games, I'd argue that preseason predictions have more value as an interesting talking point than as an accurate predictive tool.
  • They do have some relative predictive value, though, as most teams fall within about 8-10 games of their prediction over the entire season. It's interesting to consider that this may be the range a GM can actually influence once the roster is set, combined with some luck (both good and bad -- i.e., injuries). That range has a massive impact on the standings, but it won't take a team predicted to be elite out of the running or bring a rebuilding team into contention.

Caveats (and there are a couple of big ones):
  • As I mentioned in my email earlier this week, preseason predictions suffer from two major flaws: they are based on whatever roster the GM has set at the time of the predictions, and they lack the ability to react the way a human GM does when things go wrong (like selling off a massive number of veterans or making three or four trades to get better).
  • There are still a lot of games left, and while I don't expect most of these numbers to change, some of them may end up closer to the expectations. I'd argue, though, that the 110-game sample used in this article will prove a better sample than the remaining 50.

4 comments:

  1. Alex doesn't say much about his own team but +13.4% is an amazing differential for the plus side. I guess there is a reason they are called the Berserkers.

  2. I love the analysis. I'll need to analyze the spreadsheet in some detail. Two observations.

    1) As I recall, when we discussed these at the start of the season, it was commented that +/- 9 was the variance. Also, a 5% variance over the season would be +/- 8 games, so if that were the normalized bell curve, there may actually be more in the norm.

    2) There probably needs to be a slight adjustment. As was pointed out this is a zero sum game, and So Cal's huge variance is why there are more positive than negative. So the norm may be closer to a +1 or +2% as opposed to zero.

    Got to like the numbers though.

    1. 1) Yeah, as noted, five games now turns into eight games at the end of the season, and most teams were either at or beyond that mark of accuracy. My point was:
      a) it doesn't work well for teams on the edge. Boston was predicted to win 104 games, so with a number like that, they're much more likely to come up short than to go over... and

      b) that, for most teams, the number is just too big to be of much use to a GM at the start of a season.

      2) There aren't more positive teams than negative teams overall. We oddly have a 12-12 split despite the fact that So Cal is off the charts, so I'm not sure what adjustment you're referring to.

      Overall, I should add that I'm relatively glad the preseason predictions are as inaccurate as they are. I'd hate using them or seeing them if they were closer to final records.

  3. I absolutely agree that the preseason predictions tend toward extremes, with Boston and Nottinghamshire being the prime examples.

    However, the predictions were quite accurate. Look at the Munson League: the predicted order of finish agrees with the current standings for 10 of the 12 teams. (It predicted Maple in first and San Francisco in second.) That's one rather minor miss.

    Things are more complicated in the Clemente League, but not too much. There is a minor miss on predicting New York in second and Ann Arbor in third. There is the big swing and miss on Minnesota: predicted to finish in third and under .500 but instead the best team in the RCL. And there is the exceptional case of So Cal, a team which has completely changed since the beginning of the season: I can hardly blame the A.I. for not predicting that.

    The winning percentages are off at the extremes, but the overall predictions still seem accurate to me.
