I wouldn’t expect DropConnect to appear in TensorFlow, Keras, or Theano since, as far as I know, it’s used pretty rarely and doesn’t seem to be as well studied as, or demonstrably more useful than, its cousin, Dropout. However, there don’t seem to be any implementations out there, so I’ll provide a few ways of implementing it. Continue reading “DropConnect Implementation in Python and TensorFlow”
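To make the contrast with Dropout concrete, here is a minimal pure-Python sketch of a DropConnect forward pass for a single dense layer. It is not taken from the full post: the function name `dropconnect_forward`, its parameters, and the inverted scaling by `keep_prob` (borrowed from common Dropout practice) are all assumptions made for illustration.

```python
import random


def dropconnect_forward(x, weights, bias, keep_prob=0.5, rng=None):
    """Dense layer forward pass with DropConnect applied to the weights.

    x: list of n_in floats
    weights: n_in x n_out nested list of floats
    bias: list of n_out floats
    (Hypothetical helper for illustration only.)
    """
    if rng is None:
        rng = random.Random()
    out = list(bias)
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            # DropConnect drops individual weights (connections) independently,
            # whereas Dropout would zero out a whole activation.
            if rng.random() < keep_prob:
                # Assumption: inverted scaling keeps the expected
                # pre-activation equal to the unmasked layer's output.
                out[j] += x[i] * w / keep_prob
    return out


# Training-style pass: roughly half of the connections are dropped.
print(dropconnect_forward([1.0, 2.0], [[1.0, 0.0], [0.5, 2.0]], [0.0, 1.0],
                          keep_prob=0.5, rng=random.Random(0)))
```

With `keep_prob=1.0` no connection is dropped and the function reduces to an ordinary dense layer, which is a convenient sanity check.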
Suppose we are given a dataset of outcomes from some distribution governed by an unknown parameter. How do we estimate that parameter?
For example, given a bent coin and a series of heads and tails outcomes from that coin, how can we estimate the probability of the coin landing heads? Continue reading “MLE, MAP, and Naive Bayes”
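For the bent-coin question, a small sketch may help. The flip sequence below is made up, and the Beta(2, 2) prior used for the MAP estimate is an assumption chosen only to show how a prior pulls the estimate toward 0.5; the function names are hypothetical.

```python
def mle_heads(flips):
    """Maximum likelihood estimate of P(heads): the observed fraction of heads."""
    return flips.count("H") / len(flips)


def map_heads(flips, alpha=2.0, beta=2.0):
    """MAP estimate of P(heads) under a Beta(alpha, beta) prior.

    This is the mode of the Beta posterior, which is valid when both
    posterior parameters (heads + alpha, tails + beta) exceed 1.
    """
    heads = flips.count("H")
    tails = len(flips) - heads
    return (heads + alpha - 1) / (heads + tails + alpha + beta - 2)


flips = list("HHTHHHTH")  # hypothetical outcomes: 6 heads, 2 tails
print(mle_heads(flips))   # 6 / 8
print(map_heads(flips))   # (6 + 1) / (8 + 2)
```

With a uniform prior (`alpha=1`, `beta=1`) the MAP estimate coincides with the MLE.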
Getting Useful Information Out of Unstructured Text
Let’s say that you’re interested in performing a basic analysis of the US M&A market over the last five years. You don’t have access to a database of transactions and don’t have access to tombstones (public advertisements announcing the minimal details of a closed deal, e.g. ABC acquires XYZ for $500mm). What you do have access to is a large corpus of financial news articles that contain within them – somewhere – the basic transactional details of M&A deals.
What you need to do is design a system that takes in this large database and outputs clean fields containing M&A transaction details. In other words, map an excerpt like this: Continue reading “Shallow Parsing for Entity Recognition with NLTK and Machine Learning”
I recently undertook some work that looked at tagging academic papers with one or more labels based on a training set.
A preliminary look through the data revealed about 8000 examples, 2750 features, and…650 labels. For clarification, that’s 2750 sparse binary features (keyword indices for the articles), and 650 labels, not classes. Label cardinality (average number of labels per example) is about 2, with the majority of labels only occurring a few times in the dataset…doesn’t look good, does it? Nevertheless, more data wasn’t available and label reduction wasn’t on the table yet, so I spent a good amount of time in the corners of academia looking at multi-label work.
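Label cardinality and the long tail of rare labels are easy to check before choosing a multi-label method. The sketch below uses a tiny made-up dataset, not the paper-tagging data described above, and the helper names are hypothetical.

```python
from collections import Counter


def label_cardinality(label_sets):
    """Average number of labels per example."""
    return sum(len(labels) for labels in label_sets) / len(label_sets)


def label_counts(label_sets):
    """How many examples each label appears in."""
    return Counter(label for labels in label_sets for label in labels)


# Hypothetical toy dataset: one set of labels per example.
label_sets = [{"a", "b"}, {"a"}, {"c", "d", "e"}, {"a", "b"}]

counts = label_counts(label_sets)
rare = sorted(label for label, n in counts.items() if n == 1)

print(label_cardinality(label_sets))  # 8 labels over 4 examples
print(rare)                           # labels seen only once
```

A cardinality near 2 combined with many labels that occur only once or twice is the situation described above: most labels have too few positive examples to learn from reliably.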
This project stems from two overarching questions:
I recently saw a BuzzFeed presentation on, among other things, the virality of BuzzFeed content. A big part of their business relies on understanding what kind of content goes viral and why, so their data science team understandably spends a lot of time not only looking at how a piece of content becomes widely popular, but also looking at the distribution of content types in their most popular pieces of content. Continue reading “Trump Tweet Analysis”
How does front page news track a single topic over a period of time? What’s the media’s attention span for a given story?
In general, many find it surprising how quickly major media outlets shift their attention from one story to another. This is partly a reflection of our own attention spans and appetites, and is partly due to the fact that media organizations are incentivized to be the first to break news; as a result readers are more likely to be bombarded with what’s novel instead of what’s important. Continue reading “Article Classification and News Headlines Over Time”
What is regularization? Regularization, as it is commonly used in machine learning, is an attempt to correct for model overfitting by introducing additional information to the cost function. In this post we will review the logic and implementation of regression and discuss a few of the most widespread forms: ridge, lasso, and elastic net. For simplicity, we’ll discuss regularization within the context of least squares linear regression, and I assume that you have some familiarity with linear regression. Onward! Continue reading “Introduction to Regularization”
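As a rough illustration of what "introducing additional information to the cost function" does, the sketch below fits a one-feature, no-intercept least squares model with a ridge (squared) penalty and with a lasso (absolute value) penalty. The data, the function names, and the single-feature simplification are assumptions made to keep the closed forms short.

```python
def ridge_slope(xs, ys, lam):
    """Slope w minimizing sum((y - w*x)**2) + lam * w**2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_slope(xs, ys, lam):
    """Slope w minimizing sum((y - w*x)**2) + lam * abs(w) (no intercept).

    For a single feature the solution is a soft-thresholded version of
    the ordinary least squares slope.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx


xs = [1.0, 2.0, 3.0]  # hypothetical data lying exactly on y = 2x
ys = [2.0, 4.0, 6.0]

for lam in (0.0, 14.0, 100.0):
    print(lam, ridge_slope(xs, ys, lam), lasso_slope(xs, ys, lam))
```

With `lam=0` both functions return the ordinary least squares slope. As `lam` grows, ridge shrinks the slope smoothly toward zero without reaching it, while lasso sets it to exactly zero once the penalty is large enough. Elastic net combines both penalties.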