MLE, MAP, and Naive Bayes


Suppose we are given a dataset X of outcomes from some distribution parameterized by \Theta. How do we estimate \Theta?

For example, given a bent coin and a series of heads and tails outcomes from that coin, how can we estimate the probability of the coin landing heads? Continue reading “MLE, MAP, and Naive Bayes”
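As a rough illustration of the bent-coin question, here is a minimal sketch of the two point estimates named in the title. The flip data, the function names, and the Beta(2, 2) prior are all assumptions chosen for the example, not taken from the post.

```python
from collections import Counter


def mle_heads(outcomes):
    """Maximum likelihood estimate of P(heads): the observed fraction of heads.

    Assumes `outcomes` contains only the characters "H" and "T".
    """
    counts = Counter(outcomes)
    return counts["H"] / len(outcomes)


def map_heads(outcomes, alpha=2.0, beta=2.0):
    """MAP estimate of P(heads) under an assumed Beta(alpha, beta) prior.

    The posterior is Beta(heads + alpha, tails + beta), and its mode is the
    MAP estimate (well defined when both posterior parameters exceed 1).
    """
    counts = Counter(outcomes)
    heads, tails = counts["H"], counts["T"]
    return (heads + alpha - 1) / (heads + tails + alpha + beta - 2)


flips = "HHTHHTHH"  # made-up data: 6 heads, 2 tails
print(mle_heads(flips))  # 0.75
print(map_heads(flips))  # 0.7
```

With a uniform Beta(1, 1) prior the MAP estimate reduces to the MLE; a stronger prior pulls the estimate toward the prior's mode, which matters most when there are few flips.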


Text Classification at Data Science Hackathon with DataKind

Last weekend I attended a DataKind data science hackathon. It was a lot of fun and a great way to meet people in the space and share some ideas. If it sounds the least bit interesting, I encourage you to join a DataKind event. Here’s what my team worked on, which should give a good indication of what you might do over the course of the weekend. My code is here: project folder, supervised classification (the most interesting part), and topic modeling. Continue reading “Text Classification at Data Science Hackathon with DataKind”

A Few Nice Coding Challenges

A recent interview process required passing some coding challenges.

When I first started programming I spent a decent amount of time on Project Euler, but since then I have rarely done these crack-the-interview coding challenges. I find project-based work more interesting, I work mostly with data, and, based on what I understand from experienced interviewers, facility with brain teasers and coding challenges is a weaker predictor of good programming than time spent programming. Anyway, I spent a few afternoons working through coding challenges on Codility to get a feel for the types of questions that get asked of software engineering candidates. Continue reading “A Few Nice Coding Challenges”

Understanding Facebook Ads: Pros and Cons

I recently did some A/B testing work through the Facebook advertising platform, and gave a quick presentation on the pros and cons of the platform. Here’s a summary.

Pros:

  • Microtargeting
  • Optimization
  • Inexpensive, low ceilings
  • Demonstrated to work at scale, sophisticated distribution

Cons:

  • Click bots
  • Opaque

To clarify my perspective on the platform, some background on the work we did:

We ran some A/B tests through the platform targeting a specific population, checking whether the differences in resulting engagement were statistically significant. I assure you, nothing fancy. Continue reading “Understanding Facebook Ads: Pros and Cons”
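The post does not say which significance test was used. As one common option for comparing engagement rates between two ad variants, here is a sketch of a pooled two-proportion z-test using only the standard library. The function name and the click counts are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value), using the pooled rate for the standard error.
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: variant A got 120 engagements from 1000 impressions,
# variant B got 80 from 1000.
z, p = two_proportion_z_test(120, 1000, 80, 1000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

The normal approximation behind this test is only reasonable when each variant has a fair number of both engagements and non-engagements; for very small counts an exact test would be a safer choice.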

Decorators and Metaprogramming in Python


Decorators are intuitive and extremely useful. To demonstrate, we’ll look at a simple example. Let’s say we’ve got some function that sums all integers from 0 to n:

def sum_0_to_n(n):
    """Return 0 + 1 + ... + n by counting n down to zero."""
    total = 0
    while n > 0:
        total += n
        n -= 1
    return total

and we’d like to time the performance of this function. Of course we could just modify the function like so:

Continue reading “Decorators and Metaprogramming in Python”
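For readers who want a preview of the decorator approach the title points to, here is a minimal sketch of a timing decorator applied to the function above. It is not necessarily the version the full post arrives at; the names timed and last_elapsed are hypothetical.

```python
import functools
import time


def timed(func):
    """Wrap func so that each call prints how long it took."""

    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        wrapper.last_elapsed = elapsed  # stored so callers can inspect it
        print(f"{func.__name__} took {elapsed:.6f} s")
        return result

    wrapper.last_elapsed = None
    return wrapper


@timed
def sum_0_to_n(n):
    total = 0
    while n > 0:
        total += n
        n -= 1
    return total


print(sum_0_to_n(10))  # prints the timing line, then 55
```

The point of the decorator is that the timing logic lives in one place and sum_0_to_n itself stays unchanged, so the same @timed line can be put on any other function.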

Shallow Parsing for Entity Recognition with NLTK and Machine Learning

Getting Useful Information Out of Unstructured Text

Let’s say that you’re interested in performing a basic analysis of the US M&A market over the last five years. You don’t have access to a database of transactions and don’t have access to tombstones (public advertisements announcing the minimal details of a closed deal, e.g. ABC acquires XYZ for $500mm). What you do have access to is a large corpus of financial news articles that contain within them, somewhere, the basic transactional details of M&A deals.

What you need to do is design a system that takes in this large corpus and outputs clean fields containing M&A transaction details. In other words, map an excerpt like this: Continue reading “Shallow Parsing for Entity Recognition with NLTK and Machine Learning”