This document represents the work of Christopher Teixeira for the 2017 TruMedia Hackathon.

Understanding pitches

For illustrative purposes, I wanted to understand the new probCalledStrike field and see how well it might be used to determine a pitcher's best pitch. I'm doing this work on a MacBook Air, with limited access to more powerful resources. To that end, I'll read in three years' worth of data and stick to analyzing Jon Lester's pitches in that time frame. The raw data had a lot of fields, but I found two variables of interest that I wanted to add:

  1. battingTeam: the team facing the pitcher
  2. pitchID: an identifier that would allow me to order pitches in a game

A pitcher's best pitch

pitcher     pitchType  numPitches  probCalledStrikeMean  probCalledStrikeSd
Jon Lester  FC               2377             0.4584699           0.4214171
Jon Lester  SI               1268             0.4451964           0.4086514
Jon Lester  CU               1565             0.3200946           0.4040301
Jon Lester  CH                425             0.3144729           0.3961353
Jon Lester  FF               5025                    NA                  NA

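We can see that for Jon Lester, the cutter (FC) gets the highest average probCalledStrike of the pitch types with a computed mean, and it is his most-thrown pitch after the fourseam fastball (FF). The FF row shows NA for both the mean and the standard deviation, most likely because some fourseam pitches have a missing probCalledStrike. This matches up against Brooks Baseball’s Jon Lester card, which provides a nice validation for this metric. However, the boxplot shows that the median is actually higher for the fourseam fastball.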
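
The NA values in the FF row are what mean() and sd() return when any value in a group is missing. Here is a minimal sketch of that kind of summary, assuming a data frame with pitchType and probCalledStrike columns. The values are made up for illustration and summarise_pitch is a hypothetical helper, not the code behind the table above.

```r
# Toy data: made-up values, not the hackathon data set.
pitches <- data.frame(
  pitchType = c("FF", "FF", "FF", "FC", "FC", "CU", "CU"),
  probCalledStrike = c(0.90, NA, 0.10, 0.50, 0.30, 0.20, 0.40)
)

# Summarise one pitch type; na.rm controls how missing values are handled.
# numPitches counts every pitch, including those with a missing probability.
summarise_pitch <- function(x, na.rm = TRUE) {
  c(numPitches = length(x),
    probCalledStrikeMean = mean(x, na.rm = na.rm),
    probCalledStrikeSd = sd(x, na.rm = na.rm))
}

groups <- split(pitches$probCalledStrike, pitches$pitchType)

# One row per pitch type, with missing values dropped ...
by_type <- t(sapply(groups, summarise_pitch))
# ... and with missing values kept, which turns the FF mean and sd into NA.
by_type_strict <- t(sapply(groups, summarise_pitch, na.rm = FALSE))

print(by_type)
print(by_type_strict)
```

With na.rm = TRUE a single missing value no longer blanks out a whole pitch type, which would make the fourseam fastball comparable to the other pitches.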

My next question was, “could you use probCalledStrike as a way to determine whether he will throw that type of pitch again?” This calls for a somewhat more sophisticated approach. Using previous pitch information, we can try to see whether it helps determine if he feels confident in throwing that pitch.

Feature engineering

In starting to look at this, I'm using only a small number of features. Here's a quick description of the features and their calculations.

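# Add a random-noise column to use as a baseline for variable importance
df$noise <- runif(nrow(df), 0, 1)

# Table of the model features and their descriptions
print(data.frame(Features=c("seasonYear", "pitchType", "batterHand", "balls",
                            "strikes", "outs", "manOnFirst", "manOnSecond",
                            "manOnThird", "inning", "timesFaced",
                            "lastPitchType", "lastprobCallStrike",
                            "lastPitchResult", "lastBatterHand", "noise"),
                 Description=c("Year the pitch took place",
                               "The pitch type being predicted",
                               "The batter's hand",
                               "The number of balls for the at bat before the pitch",
                               "The number of strikes for the at bat before the pitch",
                               "The number of outs before the pitch",
                               "Boolean for a runner on first",
                               "Boolean for a runner on second",
                               "Boolean for a runner on third",
                               "Inning",
                               "Number of times batter faced this pitcher within this game",
                               "The last pitch type thrown",
                               "The last pitch's probability of called strike",
                               "The last pitch result",
                               "The batter's hand against the last pitch",
                               "Random noise")))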

Features            Description
seasonYear          Year the pitch took place
pitchType           The pitch type being predicted
batterHand          The batter’s hand
balls               The number of balls for the at bat before the pitch
strikes             The number of strikes for the at bat before the pitch
outs                The number of outs before the pitch
manOnFirst          Boolean for a runner on first
manOnSecond         Boolean for a runner on second
manOnThird          Boolean for a runner on third
inning              Inning
timesFaced          Number of times batter faced this pitcher within this game
lastPitchType       The last pitch type thrown
lastprobCallStrike  The last pitch’s probability of called strike
lastPitchResult     The last pitch result
lastBatterHand      The batter’s hand against the last pitch
noise               Random noise
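
The "last pitch" features are lagged copies of the current-pitch columns. Here is a rough sketch of how they could be built, assuming pitches are ordered by pitchID within a game. The gameId column and the lag1 helper are hypothetical names for illustration, and the rows are made up.

```r
# Toy data: made-up pitches from two games, deliberately out of order.
df <- data.frame(
  gameId = c(1, 1, 1, 2, 2),          # hypothetical game identifier
  pitchID = c(2, 1, 3, 1, 2),
  pitchType = c("FC", "FF", "CU", "FF", "SI"),
  probCalledStrike = c(0.40, 0.85, 0.10, 0.70, 0.55),
  stringsAsFactors = FALSE
)

# Order pitches within each game so "previous row" means "previous pitch".
df <- df[order(df$gameId, df$pitchID), ]

# Shift a vector down by one position, padding the first slot with NA.
lag1 <- function(x) c(NA, head(x, -1))

# ave() applies lag1 separately within each game, so the first pitch of a
# game never inherits the last pitch of the previous game.
df$lastPitchType <- ave(df$pitchType, df$gameId, FUN = lag1)
df$lastprobCallStrike <- ave(df$probCalledStrike, df$gameId, FUN = lag1)

print(df)
```

The first pitch of each game ends up with NA in both lagged columns, so those rows would need to be dropped or filled before modeling.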

Building a model

Now that we have a set of features to work with, let’s get into modeling. For this exercise, I chose a multinomial logistic regression model. I chose to use the caret library for the ability to switch models later on just in case it is needed.

First up, let’s split the data into training and test sets, which lets us assess the model on data it has not seen. I’ll also take the time to create the train control that will be used in the train function.
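
As a sketch of what that setup can look like with caret: the 80/20 split, the 10-fold cross-validation, and the toy data below are all assumptions for illustration, not necessarily the settings used in this analysis.

```r
library(caret)

set.seed(2017)
# Toy stand-in for the pitch data: made-up rows, not the hackathon data.
df <- data.frame(
  pitchType = factor(sample(c("FF", "FC", "CU"), 300, replace = TRUE,
                            prob = c(0.5, 0.3, 0.2))),
  balls = sample(0:3, 300, replace = TRUE),
  strikes = sample(0:2, 300, replace = TRUE)
)

# Stratified split: createDataPartition samples within each pitch type,
# so rarer pitches show up in both the training and the test set.
in_train <- createDataPartition(df$pitchType, p = 0.8, list = FALSE)
df.train <- df[in_train, ]
df.test <- df[-in_train, ]

# Cross-validation settings to pass to train() through trControl.
ctrl <- trainControl(method = "cv", number = 10, classProbs = TRUE)

print(table(df.train$pitchType))
print(table(df.test$pitchType))
```

Stratifying on pitchType matters here because some pitch types are much rarer than the fourseam fastball.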

Now let’s actually train the model and print out some information on its performance. As you can see in the results below, this isn’t exactly a great model. The first print function gives information on the model. It describes the classes we’re predicting and the accuracy for different tuning attempts.

We then print out the coefficients to see how they might be different across the pitch types and the various inputs to describe the pitching situation. In theory, these estimates would indicate whether each variable has a positive or negative influence on the type of pitch to be thrown.
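
Here is a minimal sketch of the training step and the coefficient printout, using caret's "multinom" method on made-up data. The rule that curveballs are more likely with two strikes is an assumption baked into the toy data so that the coefficient signs have something to show.

```r
library(caret)

set.seed(2017)
n <- 400
strikes <- sample(0:2, n, replace = TRUE)
balls <- sample(0:3, n, replace = TRUE)
# Made-up rule: curveballs (CU) are much more likely with two strikes.
pitchType <- ifelse(runif(n) < ifelse(strikes == 2, 0.6, 0.1), "CU",
                    ifelse(runif(n) < 0.6, "FF", "FC"))
df.train <- data.frame(pitchType = factor(pitchType), balls, strikes)

ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

# trace = FALSE is passed through to nnet::multinom to silence its log.
model <- train(pitchType ~ ., data = df.train, method = "multinom",
               trControl = ctrl, trace = FALSE)
print(model)

# One row per non-reference pitch type; the reference level here is CU.
coefs <- coef(model$finalModel)
print(coefs)
```

In this toy setup the strikes coefficient in the FF row comes out negative: relative to the curveball, a fourseam fastball becomes less likely as the strike count rises, which is the kind of reading the coefficients allow.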

The confusion matrix gives us an idea of where the predictions might be failing. Other algorithms reached similar accuracy by predicting only the fourseam fastball. In this particular effort, we can at least see how one pitch might be mistaken for another.

Finally, let’s print out the variable importance to see exactly where probCalledStrike fell in the list of variables. I include a noise variable in order to measure which variables are better than random noise. As it turns out most variables are, but the previous pitch’s probCalledStrike didn’t have a large impact on the final result.
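
To illustrate the noise-variable idea, here is a rough sketch that fits nnet::multinom directly (the model behind caret's "multinom" method) and ranks predictors by total absolute coefficient on standardised inputs. This crude ranking is a stand-in I chose for the example, not necessarily the importance measure reported above, and the data are made up.

```r
library(nnet)

set.seed(2017)
n <- 600
strikes <- sample(0:2, n, replace = TRUE)
# Made-up rule: curveballs (CU) are much more likely with two strikes.
pitchType <- ifelse(runif(n) < ifelse(strikes == 2, 0.7, 0.1), "CU",
                    ifelse(runif(n) < 0.6, "FF", "FC"))
df <- data.frame(pitchType = factor(pitchType), strikes = strikes)
df$noise <- runif(nrow(df), 0, 1)   # random benchmark column

# Standardise the predictors so coefficient sizes are comparable.
df$strikes <- as.numeric(scale(df$strikes))
df$noise <- as.numeric(scale(df$noise))

fit <- multinom(pitchType ~ strikes + noise, data = df, trace = FALSE)

# Crude importance: total absolute coefficient across the non-reference
# pitch types, with the intercept column dropped.
importance <- colSums(abs(coef(fit)))[-1]
print(sort(importance, decreasing = TRUE))
```

Any real feature that ranks at or below noise is a candidate for removal, since it carries no more signal than a random number.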

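Given the original model wasn't great, I was hesitant to take this further. However, let's use the test dataset to see how well this model performed. The confusion matrix indicates performance similar to what we saw in training, which is somewhat good and somewhat bad. Test accuracy (0.4628) is slightly below the no-information rate (0.4675), so the model does no better than always guessing the fourseam fastball. The model performs poorly, but consistently poorly.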

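# Class probabilities for each pitch type on the held-out test set
model.test <- predict(model, newdata = df.test, type = "prob")
# Predicted and observed pitch types alongside the probabilities
model.test$pred <- predict(model, newdata = df.test)
model.test$obs <- df.test$pitchType
confusionMatrix(data = model.test$pred, reference = model.test$obs)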
## Confusion Matrix and Statistics
##           Reference
## Prediction   CH   CU   FC   FF   SI
##         CH    0    1    0    0    1
##         CU   11   86   52   99    9
##         FC    9   72  103  111   34
##         FF  150  466  793 1763  458
##         SI    0    0    1    0    1
## Overall Statistics
##                Accuracy : 0.4628         
##                  95% CI : (0.4477, 0.478)
##     No Information Rate : 0.4675         
##     P-Value [Acc > NIR] : 0.7364         
##                   Kappa : 0.0595         
##  Mcnemar's Test P-Value : <2e-16         
## Statistics by Class:
##                      Class: CH Class: CU Class: FC Class: FF Class: SI
## Sensitivity          0.0000000   0.13760   0.10854    0.8936 0.0019881
## Specificity          0.9995062   0.95243   0.93091    0.1691 0.9997310
## Pos Pred Value       0.0000000   0.33463   0.31307    0.4857 0.5000000
## Neg Pred Value       0.9596965   0.86399   0.78258    0.6441 0.8809862
## Prevalence           0.0402844   0.14810   0.22488    0.4675 0.1191943
## Detection Rate       0.0000000   0.02038   0.02441    0.4178 0.0002370
## Detection Prevalence 0.0004739   0.06090   0.07796    0.8602 0.0004739
## Balanced Accuracy    0.4997531   0.54502   0.51972    0.5313 0.5008595
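
The headline numbers can be recomputed by hand from the counts in the confusion matrix, which is a useful check on how to read them. A small base R sketch, with the counts copied from the output above:

```r
# Counts copied from the test-set confusion matrix: rows are predictions,
# columns are the reference (observed) pitch types.
types <- c("CH", "CU", "FC", "FF", "SI")
cm <- matrix(c(  0,   1,   0,    0,   1,
                11,  86,  52,   99,   9,
                 9,  72, 103,  111,  34,
               150, 466, 793, 1763, 458,
                 0,   0,   1,    0,   1),
             nrow = 5, byrow = TRUE,
             dimnames = list(Prediction = types, Reference = types))

n <- sum(cm)

# Accuracy: share of pitches on the diagonal (predicted == observed).
accuracy <- sum(diag(cm)) / n

# No-information rate: accuracy from always guessing the most common pitch.
nir <- max(colSums(cm)) / n

# Cohen's kappa: agreement beyond what the row and column totals alone imply.
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

# Sensitivity per pitch type: correct predictions / observed pitches.
sensitivity <- diag(cm) / colSums(cm)

print(round(c(accuracy = accuracy, nir = nir, kappa = cohen_kappa), 4))
print(round(sensitivity, 4))
```

The model labels 3630 of the 4220 test pitches as FF, which is why FF sensitivity is high while accuracy stays at the level of simply guessing FF every time.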

Building a better model

My next thought was: why not try other algorithms? We can use the caretEnsemble package to run several models at once. It doesn’t support multiclass predictions, but we can at least get the results out with a minimal amount of coding.

The correlation and dotplot give us information about how each model performed. In general, it wasn’t much better than our original model: mean accuracy is about 0.47 for all three models, and the random forest’s Kappa is essentially zero. Perhaps we can find a way to ensemble them together as future work.


## Call:
## summary.resamples(object = results)
## Models: rf, nnet, multinom 
## Number of resamples: 10 
## Accuracy 
##            Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
## rf       0.4605  0.4659 0.4694 0.4688  0.4722 0.4761    0
## nnet     0.4596  0.4659 0.4717 0.4691  0.4726 0.4764    0
## multinom 0.4485  0.4638 0.4690 0.4667  0.4702 0.4766    0
## Kappa 
##                Min. 1st Qu.    Median      Mean  3rd Qu.     Max. NA's
## rf       -0.0008324 0.00000 0.0002156 0.0005866 0.001363 0.002425    0
## nnet      0.0096460 0.04000 0.0635700 0.0565200 0.071960 0.088470    0
## multinom  0.0432100 0.06106 0.0716700 0.0704100 0.076440 0.092820    0
##                 rf      nnet  multinom
## rf       1.0000000 0.8501164 0.7015046
## nnet     0.8501164 1.0000000 0.6944573
## multinom 0.7015046 0.6944573 1.0000000
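
For reference, a comparison like this can also be sketched with plain caret rather than caretEnsemble: fit each model on the same folds, then collect the results with resamples(). The toy data, the 5 folds, and the two methods below are assumptions chosen to keep the example small.

```r
library(caret)

set.seed(2017)
n <- 300
strikes <- sample(0:2, n, replace = TRUE)
balls <- sample(0:3, n, replace = TRUE)
# Made-up rule: curveballs (CU) are much more likely with two strikes.
pitchType <- ifelse(runif(n) < ifelse(strikes == 2, 0.6, 0.1), "CU",
                    ifelse(runif(n) < 0.6, "FF", "FC"))
df.train <- data.frame(pitchType = factor(pitchType), balls, strikes)

# Fix the folds up front so every model is scored on identical resamples.
folds <- createFolds(df.train$pitchType, k = 5, returnTrain = TRUE)
ctrl <- trainControl(method = "cv", index = folds, classProbs = TRUE)

methods <- c("multinom", "nnet")
models <- lapply(methods, function(m) {
  train(pitchType ~ ., data = df.train, method = m,
        trControl = ctrl, trace = FALSE)
})
names(models) <- methods

# Per-fold Accuracy and Kappa for each model, plus their correlation.
results <- resamples(models)
print(summary(results))
print(modelCor(results))
```

Calling dotplot(results) on the same object draws the comparison plot. Models whose fold-by-fold results are highly correlated tend to make the same mistakes, so they add little to an ensemble.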

Next Steps

Well, as you can tell, this didn’t turn out too great. I’d like to create a custom ensemble model just to see how much it might improve on the individual models. In addition, there are more features that I would like to create:

Please reach out to @ct_analytics on Twitter if you have any questions about this analysis.