![]()

A team in the American League will average 0.4830 runs per inning, but does this mean they will score a run every two innings? This seems intuitive if you apply math from Algebra I. However, if you attend a baseball game, the vast majority of innings you'll watch will be scoreless. This large number of scoreless innings can be described by discrete probability distributions that account for teams scoring none, one, or multiple runs in one inning. Runs in baseball are considered rare events and count data, so they will follow a discrete probability distribution if they are random.

![]()

The overall goal of this post is to describe the random process that arises with scoring runs in baseball. Previously, I've used the Poisson distribution (PD) to describe the probability of getting a certain number of runs within an inning. The Poisson distribution describes count data like car crashes or earthquakes over a given period of time and defined space. This worked reasonably well to get the general shape of the distribution, but it didn't capture all the variance that the real data set contained. It predicted fewer scoreless innings and many more 1-run innings than really occurred. The PD assumes that the mean and variance are equal. In both runs per inning and runs per game, the variance is about twice the mean, so the real data will 'spread out' more than a PD predicts.

![]()

The graph above shows an example of the application of count data distributions. The actual data is in gray and the Poisson distribution is in yellow. It's not a terrible way to approximate the data or to conceptually understand the randomness behind baseball scoring, but the negative binomial distribution (NBD) works much better. The NBD is also a discrete probability distribution, but it finds the probability of a certain number of failures occurring before a certain number of successes. It would answer the question: what's the probability that I get 3 TAILS before I get 5 HEADS if I keep flipping a coin? This doesn't at first intuitively seem like it relates to a baseball game or an inning, but that will be explained later.

![]()

From a conceptual standpoint, the two distributions are closely related. So if you are trying to describe why 73% of all MLB innings are scoreless to a friend over a beer, either will work. I've plotted both distributions for comparison throughout the post. The second section of the post will discuss the specific equations and their application to baseball.

Because of the difference in rules regarding the designated hitter between the two leagues, there will be a different expected value and variance of runs/inning for each league. I separated the two leagues to get a better fit for the data. Using data from 2011-2013, the American League had an expected value of 0.4830 runs/inning with a 1.0136 variance, while the National League had an expected value of 0.4468 runs/inning with a […] variance. Using only the expected value and the variance, the negative binomial distribution approximates the distribution of runs per inning more accurately than the Poisson distribution.

It's clear that there are a lot of scoreless innings, and very few innings with multiple runs scored. The NBD allows someone to calculate the probability of an MLB team scoring more than 7 runs in an inning, or the probability that the home team forces extra innings when down by a run in the bottom of the 9th. Using a pitcher's expected runs/inning, the NBD could be used to approximate the pitcher's chances of throwing a shutout, assuming he will pitch all 9 innings.

The NBD and PD can be used to describe the runs scored in a game by a team as well. Once again, I separated the AL and NL, because the AL had an expected value of 4.4995 runs/game with a 9.9989 variance, and the NL had an expected value of 4.2577 runs/game with a 9.1394 variance.

![]() ![]()
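
To make the Poisson versus negative binomial comparison concrete, here is a minimal sketch in Python that fits both distributions to the American League runs/inning figures quoted in this post (mean 0.4830, variance 1.0136). It assumes a simple method-of-moments fit for the NBD, which may not be the exact fitting procedure used for the plots, and the function names `nbd_params`, `nbd_pmf`, and `poisson_pmf` are hypothetical helpers introduced only for illustration.

```python
import math


def nbd_params(mean, variance):
    """Method-of-moments NBD parameters (assumed fitting approach).

    Uses the parameterization with mean = r(1-p)/p and variance = r(1-p)/p^2,
    which gives p = mean/variance and r = mean^2 / (variance - mean).
    """
    if variance <= mean:
        raise ValueError("NBD needs variance > mean (overdispersed data)")
    p = mean / variance
    r = mean * mean / (variance - mean)
    return r, p


def nbd_pmf(k, r, p):
    """P(X = k) for a negative binomial with real-valued r, via log-gamma."""
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson distribution with the given mean."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


# American League runs/inning, 2011-2013, as quoted in the post.
AL_MEAN, AL_VAR = 0.4830, 1.0136
r, p = nbd_params(AL_MEAN, AL_VAR)

# Print the predicted share of innings with 0, 1, 2, 3, 4 runs.
for k in range(5):
    print(f"{k} runs: Poisson {poisson_pmf(k, AL_MEAN):.4f}  "
          f"NBD {nbd_pmf(k, r, p):.4f}")
```

With these inputs the Poisson fit puts about 62% of innings at zero runs, while the NBD fit puts about 72% there, which is much closer to the roughly 73% of scoreless innings mentioned in the post. The same functions can be pointed at the runs/game figures by swapping in that mean and variance.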