## Statistical Learning Theory: VC Dimension, Structural Risk Minimization

Sometimes our models overfit, sometimes they underfit.

A model’s capacity is, informally, its ability to fit a wide variety of functions. As a simple example, a linear regression model with a single parameter has much lower capacity than a linear regression model fit on multiple polynomial terms. Different datasets demand models of different capacity, and each time we apply a model to a dataset we run the risk of overfitting or underfitting the data.
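To make the capacity idea concrete, here is a minimal sketch that fits polynomials of increasing degree to the same small dataset and compares training error. The sine target, the noise level, and the helper name `training_mse` are all assumptions chosen for illustration; they do not come from the post.

```python
import numpy as np

# Hypothetical toy data: a noisy sine curve sampled at 20 points.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = np.sin(3.0 * x) + rng.normal(scale=0.1, size=x.shape)


def training_mse(degree):
    """Least-squares polynomial fit of the given degree; returns training MSE."""
    coeffs = np.polyfit(x, y, deg=degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.mean(residuals ** 2))


# Higher degree means more parameters, hence more capacity.
errors = {degree: training_mse(degree) for degree in (1, 3, 9)}
for degree, mse in errors.items():
    print(f"degree {degree}: training MSE = {mse:.4f}")
```

Training error can only go down as the degree grows, because each model class contains the previous one. That says nothing about error on new data: the low-degree fit is likely underfitting the curve, while the high-degree fit may be starting to chase the noise.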

## DropConnect Implementation in Python and TensorFlow

I wouldn’t expect DropConnect to appear in TensorFlow or Theano since, as far as I know, it’s used pretty rarely and doesn’t seem as well-studied or demonstrably more useful than its cousin, Dropout. However, there don’t seem to be any implementations out there, so I’ll provide a few ways of implementing it.
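As a rough illustration of the idea (not the implementations from the full post), here is a minimal NumPy sketch of a dense layer with DropConnect. Where Dropout zeroes activations, DropConnect zeroes individual weights. The function name `dropconnect_dense`, the `1 / keep_prob` rescaling borrowed from inverted dropout, the single mask shared across the batch, the unmasked bias, and the plain unmasked forward pass at inference are all simplifying assumptions.

```python
import numpy as np


def dropconnect_dense(x, weights, bias, keep_prob, rng, training=True):
    """Dense layer where each weight is kept with probability keep_prob.

    x: (batch, n_in), weights: (n_in, n_out), bias: (n_out,)
    """
    if not training:
        # Assumption: use the full weight matrix at inference time.
        return x @ weights + bias
    # Bernoulli mask over weights (not over activations, as in Dropout).
    mask = rng.random(weights.shape) < keep_prob
    # Rescale so the expected pre-activation matches the unmasked layer.
    masked_weights = weights * mask / keep_prob
    return x @ masked_weights + bias


rng = np.random.default_rng(0)
x = np.ones((2, 3))
weights = np.ones((3, 4))
bias = np.zeros(4)

train_out = dropconnect_dense(x, weights, bias, keep_prob=0.5, rng=rng)
eval_out = dropconnect_dense(x, weights, bias, keep_prob=0.5, rng=rng, training=False)
print(train_out.shape, eval_out.shape)
```

Sampling one mask per call keeps the sketch short; drawing a separate mask for each example in the batch would need a mask of shape `(batch, n_in, n_out)` and a batched matrix product.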

## Style Transfer with Tensorflow

“A Neural Algorithm of Artistic Style” is an accessible and intriguing paper about the distinction and separability of image content and image style using convolutional neural networks (CNNs). In this post we’ll explain the paper and then run a few of our own experiments.

To begin, consider van Gogh’s “The Starry Night”:

## The Box-Cox Transformation

The Box-Cox transformation is a family of power transform functions that are used to stabilize variance and make a dataset look more like a normal distribution. Lots of useful tools require normal-like data in order to be effective, so by using the Box-Cox transformation on your wonky-looking dataset you can then utilize some of these tools.

Here’s the transformation in its basic form. For value $x$ and parameter $\lambda$:

$\displaystyle \frac{x^{\lambda}-1}{\lambda} \quad \text{if} \quad \lambda\neq 0$

$\displaystyle \log(x) \quad \text{if} \quad \lambda=0$
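A minimal sketch of the transformation in plain Python, assuming strictly positive inputs (the logarithm and fractional powers need $x > 0$). The function name `box_cox` is just an illustrative choice, and the case split is on $\lambda$, with the log branch being the limit of the power branch as $\lambda \to 0$.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a single positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


data = [0.5, 1.0, 2.0, 4.0, 8.0]
for lam in (1.0, 0.5, 0.0):
    transformed = [round(box_cox(value, lam), 4) for value in data]
    print(f"lambda = {lam}: {transformed}")
```

With $\lambda = 1$ the data is only shifted by one, while $\lambda = 0$ gives a plain log transform. In practice $\lambda$ is usually chosen from the data rather than by hand, for example by maximum likelihood.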

## MLE, MAP, and Naive Bayes

Suppose we are given a dataset $X$ of outcomes from some distribution parameterized by $\Theta$. How do we estimate $\Theta$?

For example, given a bent coin and a series of heads and tails outcomes from that coin, how can we estimate the probability of the coin landing heads?
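For the coin example, both estimates have closed forms, sketched below. The flip sequence, the function names, and the Beta(2, 2) prior are assumptions made for illustration. The MAP formula is the mode of the Beta posterior and assumes `alpha` and `beta` are both greater than 1.

```python
def mle_heads(outcomes):
    """Maximum likelihood estimate of P(heads): the observed fraction of heads."""
    heads = sum(1 for outcome in outcomes if outcome == "H")
    return heads / len(outcomes)


def map_heads(outcomes, alpha=2.0, beta=2.0):
    """MAP estimate of P(heads) under a Beta(alpha, beta) prior.

    The posterior is Beta(alpha + heads, beta + tails); its mode is returned.
    """
    heads = sum(1 for outcome in outcomes if outcome == "H")
    total = len(outcomes)
    return (heads + alpha - 1.0) / (total + alpha + beta - 2.0)


flips = "HHTHHHTH"  # hypothetical outcomes: 6 heads, 2 tails
print("MLE:", mle_heads(flips))
print("MAP:", map_heads(flips))
```

The prior acts like a handful of imaginary flips, so the MAP estimate is pulled toward 0.5 relative to the MLE, and the two agree more closely as the number of real flips grows.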

## Text Classification at Data Science Hackathon with DataKind

Last weekend I attended a DataKind data science hackathon. It was a lot of fun and a great way to meet people in the space and share some ideas. If it sounds the least bit interesting, I encourage you to join a DataKind event. Here’s what my team worked on, which should serve as a good indication of what you might do over the course of the weekend. My code is here: project folder, supervised classification (the most interesting part), and topic modeling.