Sometimes our models overfit, sometimes they underfit.
A model’s capacity is, informally, its ability to fit a wide variety of functions. As a simple example, a linear regression model with a single parameter has much lower capacity than a linear regression model with many polynomial terms. Different datasets demand models of different capacity, and each time we apply a model to a dataset we run the risk of overfitting or underfitting our data.
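To make the capacity idea concrete, here is a minimal sketch that fits polynomials of different degree to the same noisy sample. The sine target, the noise level, the sample sizes, and the function name `train_and_test_mse` are all assumptions chosen for illustration; they do not come from the post itself.

```python
import numpy as np
from numpy.polynomial import Polynomial


def train_and_test_mse(degree, n_train=15, n_test=200, noise=0.2, seed=0):
    """Fit a polynomial of the given degree and return (train MSE, test MSE)."""
    rng = np.random.default_rng(seed)

    # Hypothetical data: a sine curve plus Gaussian noise.
    x_train = rng.uniform(0.0, 1.0, n_train)
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, noise, n_train)
    x_test = rng.uniform(0.0, 1.0, n_test)
    y_test = np.sin(2 * np.pi * x_test) + rng.normal(0.0, noise, n_test)

    # Least-squares polynomial fit; a higher degree means higher capacity.
    model = Polynomial.fit(x_train, y_train, degree)

    train_mse = float(np.mean((model(x_train) - y_train) ** 2))
    test_mse = float(np.mean((model(x_test) - y_test) ** 2))
    return train_mse, test_mse


if __name__ == "__main__":
    for degree in (1, 3, 9):
        train_mse, test_mse = train_and_test_mse(degree)
        print(f"degree={degree}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

Training error can only go down as the degree grows, because each lower-degree model is a special case of the higher-degree one. A low-degree fit with high error on both sets suggests underfitting, while a high-degree fit whose test error is much worse than its training error suggests overfitting.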
Continue reading “Statistical Learning Theory: VC Dimension, Structural Risk Minimization”
The Box-Cox transformation is a family of power transform functions that are used to stabilize variance and make a dataset look more like a normal distribution. Lots of useful tools require normal-like data in order to be effective, so by using the Box-Cox transformation on your wonky-looking dataset you can then utilize some of these tools.
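Here’s the transformation in its basic form. For value $x$ and parameter $\lambda$:

$$
x^{(\lambda)} =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0, \\
\ln x & \text{if } \lambda = 0.
\end{cases}
$$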
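As a rough sketch, the standard one-parameter Box-Cox transformation can be written in a few lines of NumPy. The function name `box_cox` and the argument names `x` and `lam` are my own labels for illustration, and the sketch assumes strictly positive inputs.

```python
import numpy as np


def box_cox(x, lam):
    """Apply the one-parameter Box-Cox transform to strictly positive data."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The lam -> 0 limit of (x**lam - 1) / lam is the natural log.
        return np.log(x)
    return (x ** lam - 1.0) / lam


if __name__ == "__main__":
    skewed = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
    print(box_cox(skewed, 0))    # log transform
    print(box_cox(skewed, 0.5))  # square-root-like transform
```

In practice the parameter is usually chosen from the data rather than fixed by hand. SciPy's `scipy.stats.boxcox`, for instance, can pick it by maximum likelihood when no value is supplied.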
Continue reading “The Box-Cox Transformation”
Suppose we are given a dataset $\mathcal{D}$ of outcomes from some distribution parameterized by $\theta$. How do we estimate $\theta$?
For example, given a bent coin and a series of heads and tails outcomes from that coin, how can we estimate the probability of the coin landing heads?
Continue reading “MLE, MAP, and Naive Bayes”
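For the bent-coin example, a minimal sketch of the maximum likelihood estimate looks like this. It assumes independent flips recorded as a string of `"H"` and `"T"` characters; the sample sequence and the function names are hypothetical. The closed-form estimate is the fraction of heads, and the grid search is only there as a sanity check that this fraction does maximize the log-likelihood.

```python
import numpy as np


def coin_mle(outcomes):
    """Closed-form MLE of P(heads): the observed fraction of heads."""
    heads = sum(1 for o in outcomes if o == "H")
    return heads / len(outcomes)


def log_likelihood(p, outcomes):
    """Bernoulli log-likelihood of the outcomes for heads probability p."""
    heads = sum(1 for o in outcomes if o == "H")
    tails = len(outcomes) - heads
    return heads * np.log(p) + tails * np.log(1.0 - p)


# Hypothetical sequence of flips from the bent coin.
outcomes = "HHTHTHHH"

# Evaluate the log-likelihood on a grid that avoids p = 0 and p = 1.
grid = np.linspace(0.01, 0.99, 99)
best_on_grid = grid[np.argmax(log_likelihood(grid, outcomes))]

if __name__ == "__main__":
    print("closed-form MLE:", coin_mle(outcomes))
    print("grid maximizer: ", best_on_grid)
```

With very few flips this estimate can be extreme (a single head gives an estimate of 1.0), which is one motivation for adding a prior, as MAP estimation does.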