A recent interview process required passing some coding challenges.
When I first started programming I spent a decent amount of time on Project Euler, but since then I rarely do these crack-the-interview coding challenges. I find project-based work more interesting, I work mostly with data, and – based on what I understand from experienced interviewers – facility with brain teasers and coding challenges correlates less with good programming than time spent programming does. Anyway, I spent a few afternoons working through coding challenges on Codility to get a feel for the types of questions asked of software engineering candidates. Continue reading “A Few Nice Coding Challenges”
I recently did some A/B testing work through the Facebook advertising platform, and gave a quick presentation on the pros and cons of the platform. Here’s a summary.
- Inexpensive, low ceilings
- Demonstrated to work at scale, sophisticated distribution
- Click bots
To clarify my perspective on the platform, some background on the work we did:
We ran some A/B tests through the platform targeting a specific population, then tested the resulting engagement levels for statistically significant differences. I assure you, nothing fancy. Continue reading “Understanding Facebook Ads: Pros and Cons”
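The post doesn't show the analysis itself; as a minimal sketch of how engagement from two ad variants might be compared, here is a standard two-proportion z-test using only the standard library (the counts are hypothetical, purely for illustration):

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF (via erf).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical engagement counts for two ad variants.
z, p = two_proportion_z(120, 1000, 90, 1000)
```

With these made-up counts the difference comes out significant at the usual 0.05 level; the real analysis would of course depend on the actual engagement data.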
Decorators are intuitive and extremely useful. To demonstrate, we’ll look at a simple example. Let’s say we’ve got some function that sums all numbers 0 to n:
```python
def sum_0_to_n(n):
    count = 0
    while n > 0:
        count += n
        n -= 1
    return count
```
and we’d like to time the performance of this function. Of course we could just modify the function like so:
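The excerpt cuts off here; a minimal sketch of the two approaches being contrasted (the function is repeated so the snippet is self-contained, and the names `sum_0_to_n_timed` and `timed` are illustrative, not from the original post):

```python
import time

def sum_0_to_n(n):
    count = 0
    while n > 0:
        count += n
        n -= 1
    return count

# Option 1: bake the timing directly into a modified function.
def sum_0_to_n_timed(n):
    start = time.time()
    result = sum_0_to_n(n)
    print(f"sum_0_to_n took {time.time() - start:.6f} seconds")
    return result

# Option 2: factor the timing out into a decorator that can wrap any function.
def timed(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.time() - start:.6f} seconds")
        return result
    return wrapper

@timed
def sum_0_to_n_decorated(n):
    return sum_0_to_n(n)
```

The decorator version avoids copy-pasting timing code into every function we want to measure, which is the usual motivation for the pattern.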
Getting Useful Information Out of Unstructured Text
Let’s say that you’re interested in performing a basic analysis of the US M&A market over the last five years. You don’t have access to a database of transactions and don’t have access to tombstones (public advertisements announcing the minimal details of a closed deal, e.g. ABC acquires XYZ for $500mm). What you do have access to is a large corpus of financial news articles that contain within them – somewhere – the basic transactional details of M&A deals.
What you need to do is design a system that takes in this large database and outputs clean fields containing M&A transaction details. In other words, map an excerpt like this: Continue reading “Shallow Parsing for Entity Recognition with NLTK and Machine Learning”
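The full post builds this with NLTK chunking and machine learning; as a toy baseline only, here is a regex over a hypothetical excerpt showing the shape of the input-to-fields mapping (the pattern and example sentence are illustrative assumptions, not the post's method):

```python
import re

# Hypothetical excerpt; the real input is a large corpus of financial news.
excerpt = "In a deal announced Monday, ABC Corp acquired XYZ Inc for $500 million."

# A naive pattern for "<Acquirer> acquired <Target> for $<Amount>".
# Capitalized-word runs stand in for company names.
pattern = re.compile(
    r"(?P<acquirer>[A-Z]\w*(?: [A-Z]\w*)*) acquired "
    r"(?P<target>[A-Z]\w*(?: [A-Z]\w*)*) for "
    r"\$(?P<amount>[\d,.]+ (?:million|billion))"
)

match = pattern.search(excerpt)
fields = match.groupdict() if match else {}
```

A regex like this breaks immediately on real news prose ("XYZ was snapped up by ABC…"), which is exactly why the post reaches for part-of-speech tagging, chunking, and a learned model instead.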
I recently undertook some work that looked at tagging academic papers with one or more labels based on a training set.
A preliminary look through the data revealed about 8000 examples, 2750 features, and…650 labels. For clarification, that’s 2750 sparse binary features (keyword indices for the articles), and 650 labels, not classes. Label cardinality (average number of labels per example) is about 2, with the majority of labels only occurring a few times in the dataset…doesn’t look good, does it? Nevertheless, more data wasn’t available and label reduction wasn’t on the table yet, so I spent a good amount of time in the corners of academia looking at multi-label work.
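Label cardinality and the long tail of rare labels are easy to compute; a small sketch with made-up label sets (the real dataset has ~8000 examples and ~650 labels):

```python
from collections import Counter

# Hypothetical per-example label sets, standing in for the real data.
example_labels = [
    {"neural-networks", "optimization"},
    {"optimization"},
    {"graph-theory", "optimization", "neural-networks"},
    {"topology"},
]

# Label cardinality: average number of labels per example.
cardinality = sum(len(labels) for labels in example_labels) / len(example_labels)

# Label frequencies expose the long tail of labels seen only once,
# which is what makes this multi-label problem so hard.
freq = Counter(label for labels in example_labels for label in labels)
rare = [label for label, count in freq.items() if count == 1]
```

With cardinality around 2 and most labels occurring only a handful of times, standard one-vs-rest approaches struggle, which motivates the dive into the multi-label literature.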
A useful snippet for visualizing decision trees with pydotplus. It took some digging to find the proper output and viz parameters among different documentation releases, so I thought I’d share it here for quick reference.