Max Harlow

I currently work at Ordnance Survey's Geovation lab. I was previously technical lead at the Bureau of Investigative Journalism. Before that I worked as a software developer at the Guardian.

I also co-organise Journocoders, a monthly meetup for people working at the point where journalism and technology meet, focusing on teaching and learning practical skills for working in this space.

Projects

Command-line data analysis tools

CSV Match finds matches in two spreadsheets, optionally using various fuzzy-matching algorithms.

CSV Pivot produces pivot tables, much like those in Excel, but in the terminal.

CSV Bar converts a spreadsheet into a visual bar chart, for quickly understanding numerical data during analysis.
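
To illustrate the kind of fuzzy matching CSV Match describes, here is a minimal Python sketch. It is not CSV Match's own code or algorithm: the similarity measure (the standard library's difflib.SequenceMatcher), the fuzzy_match function, the 0.8 threshold, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from difflib import SequenceMatcher


def fuzzy_match(rows_a, rows_b, field_a, field_b, threshold=0.8):
    """Return (row_a, row_b, score) for every pair of rows whose chosen
    fields are at least `threshold` similar (0.0 = different, 1.0 = identical)."""
    matches = []
    for a in rows_a:
        for b in rows_b:
            # Compare case-insensitively; ratio() returns a float in [0, 1].
            score = SequenceMatcher(None, a[field_a].lower(), b[field_b].lower()).ratio()
            if score >= threshold:
                matches.append((a, b, round(score, 2)))
    return matches


# Two tiny made-up spreadsheets, held in memory rather than on disk.
left = list(csv.DictReader(io.StringIO("name\nJohn Smith\nMary Jones\n")))
right = list(csv.DictReader(io.StringIO("person\nJon Smith\nAlice Brown\n")))

matches = fuzzy_match(left, right, "name", "person")
for a, b, score in matches:
    print(a["name"], "~", b["person"], score)
```

Comparing every row against every other row is fine for small files, but the cost grows with the product of the two row counts, so larger spreadsheets would need some form of pre-filtering.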

Newsagent

A Node-based system, initially developed during my time at the Guardian, to automatically identify potential story leads. It allows the creation and management of autonomous bots that poll the data sources you are interested in. Each bot processes the data and compares it with the results of its previous run, and any record added or removed triggers an email alert. For example, a bot could scrape a list of names from a website and compare them against another list of politically notable people.
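
The compare-with-last-run step can be sketched as a set difference. This is a rough Python illustration, not Newsagent's actual code (which is written in Node): the function names diff_records and build_alert are hypothetical, and the alert is returned as text rather than emailed.

```python
def diff_records(previous, current):
    """Compare two runs and return (added, removed) as sorted lists."""
    previous, current = set(previous), set(current)
    added = sorted(current - previous)      # present now, absent last time
    removed = sorted(previous - current)    # present last time, absent now
    return added, removed


def build_alert(added, removed):
    """Return the alert text, or None when nothing changed between runs."""
    if not added and not removed:
        return None
    lines = [f"Added: {record}" for record in added]
    lines += [f"Removed: {record}" for record in removed]
    return "\n".join(lines)


# Made-up records standing in for the output of two consecutive bot runs.
last_run = ["Record A", "Record B"]
this_run = ["Record B", "Record C"]

added, removed = diff_records(last_run, this_run)
alert = build_alert(added, removed)
if alert is not None:
    print(alert)  # a real bot would send this by email instead
```

Treating each record as a single hashable value keeps the comparison simple; records with several fields would need a stable key, such as a tuple of the fields that identify them.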

View on GitHub.

Graphik

A D3-based webapp to enable non-technical people to create simple static charts quickly. It can be easily customised to the house style of any given news organisation via a simple stylesheet.

Try out the version used by the Bureau of Investigative Journalism.

View on GitHub.

Offshore secrets

A batch of stories resulting from a collaboration with three Guardian journalists. Working from a leaked list naming 53,000 people who held accounts at offshore bank Kleinwort Benson, the aim was to find notable public figures on the list.

To do this, names were scraped from UK Who's Who, the Electoral Commission's record of political donors, and the Sunday Times Rich List. Given the many ways a name can be expressed and the poor quality of the leaked data, I built a small tool to compare different permutations of the scraped names against the names in the leaked list. Running such comparisons on powerful Amazon EC2 machines sped up the process.
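
As a rough illustration of comparing name permutations, here is a short Python sketch. It is not the tool used for the stories: the normalisation rules, the functions name_variants and find_matches, and the sample names are all invented for this example.

```python
from itertools import permutations


def normalise(name):
    """Lower-case a name, treat commas as spaces, and collapse whitespace."""
    return " ".join(name.lower().replace(",", " ").split())


def name_variants(name):
    """Return the set of orderings of a name's parts, plus first/last-only forms."""
    parts = normalise(name).split()
    variants = {" ".join(ordering) for ordering in permutations(parts)}
    if len(parts) > 2:
        # Middle names are often missing, so also try first and last name only.
        variants.add(f"{parts[0]} {parts[-1]}")
        variants.add(f"{parts[-1]} {parts[0]}")
    return variants


def find_matches(notable_names, leaked_names):
    """Return sorted (notable, leaked) pairs where any variant matches exactly."""
    leaked = {normalise(n): n for n in leaked_names}
    hits = set()
    for name in notable_names:
        for variant in name_variants(name):
            if variant in leaked:
                hits.add((name, leaked[variant]))
    return sorted(hits)


# Invented names, for illustration only.
hits = find_matches(["John Alan Smith"], ["SMITH, John", "Jane Doe"])
print(hits)
```

The number of orderings grows factorially with the number of name parts, which is one reason comparisons like this become expensive on large lists.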

Read the first in the series, and then the last.

Reconcile

A tool written using Node for doing batch lookups against various online services. For example, it can be used to quickly convert a list of company names into a list of directors of those companies using the OpenCorporates API. Such data could then potentially be imported into a graph database such as Neo4j for analysis.

View on GitHub.

Twitter search and archiving tools

Who Said That: Get the results of a Twitter search as a spreadsheet.

Who Follows Who: Find who follows who else from a list of Twitter account names.

Who Says What: Get someone's entire Twitter timeline as a spreadsheet.

Scrapers

I've written a number of scrapers for various websites as part of stories or for other projects, including: