Max Harlow

I specialise in applying technology within news organisations – both to find stories and then to tell them.

Specifically, I am interested in using a combination of traditional reporting skills and software development to break stories that cannot be covered otherwise. It's an emerging area, and we are just at the start of understanding what can be achieved, as well as what methods and tools are required to run such projects.

I was previously technical lead at the Bureau of Investigative Journalism. Before that I worked as a software developer at the Guardian.

Projects

Command-line data analysis tools

CSV Match finds matches in two spreadsheets, optionally using various fuzzy-matching algorithms.

CSV Pivot produces pivot tables, much like those in Excel, but in the terminal.

CSV Bar converts a spreadsheet into a visual bar chart, for quickly understanding numerical data during analysis.
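To give a sense of what fuzzy matching between two spreadsheets involves, here is a minimal sketch in Python. It is not the code CSV Match itself uses: the similarity measure (the standard library's `difflib.SequenceMatcher`), the `fuzzy_match` function, the 0.85 threshold and the sample names are all assumptions chosen for illustration.

```python
import csv
import io
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_match(rows_a, rows_b, field, threshold=0.85):
    """Yield (row_a, row_b, score) for each pair whose `field` values are similar enough."""
    for row_a in rows_a:
        for row_b in rows_b:
            score = similarity(row_a[field], row_b[field])
            if score >= threshold:
                yield row_a, row_b, score


# Two tiny in-memory "spreadsheets" containing made-up names.
sheet_a = list(csv.DictReader(io.StringIO("name\nJohn Smith\nAlice Jones\n")))
sheet_b = list(csv.DictReader(io.StringIO("name\nJon Smith\nBob Brown\n")))

matches = list(fuzzy_match(sheet_a, sheet_b, "name"))
for row_a, row_b, score in matches:
    print(f"{row_a['name']} ~ {row_b['name']} ({score:.2f})")
```

Comparing every row against every other row is fine for small files but grows quickly with size, which is one reason a dedicated tool is worth having.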

Newsagent

A tool for automatically identifying potential story leads. It lets you easily create bots to periodically gather data that you are interested in, and then alert you to anything that changes. For example, this could be scraping a list of names from a website and comparing them against another list of politically notable people.
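The core idea, comparing a fresh snapshot against the previous one and flagging anything on a watchlist, can be sketched in a few lines of Python. This is an illustration only, not Newsagent's own code: the function names `diff_snapshots` and `watchlist_hits` and the sample names are hypothetical, and the scraping step is assumed to have already produced plain lists of names.

```python
def diff_snapshots(previous, current):
    """Return the names added and removed between two scrapes."""
    previous, current = set(previous), set(current)
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


def watchlist_hits(names, watchlist):
    """Return the names that also appear on the watchlist, ignoring case."""
    normalised = {name.casefold().strip() for name in watchlist}
    return [name for name in names if name.casefold().strip() in normalised]


# Made-up data standing in for yesterday's and today's scrape.
yesterday = ["Alice Jones", "Bob Brown"]
today = ["Alice Jones", "Carol White", "Dan Green"]
notable_people = ["carol white", "Erin Black"]

changes = diff_snapshots(yesterday, today)
alerts = watchlist_hits(changes["added"], notable_people)
print(changes)
print(alerts)
```

Running something like this on a schedule, and sending the contents of `alerts` by email or chat message, is the general shape of such a bot.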

View on GitHub.

Graphik

An easy-to-use tool for creating static charts quickly. It had two aims: to be so easy to use that creating a bad or incorrect chart would be next to impossible, and to be easy to customise to the house style of any given news organisation.

Try out the version used by the Bureau of Investigative Journalism.

View on GitHub.

Offshore secrets

A batch of stories, the result of collaboration with three Guardian journalists. With a leaked list naming 53,000 people holding accounts in offshore bank Kleinwort Benson, the aim was to find noted public figures on the list.

To do this, names were scraped from UK Who's Who, the Electoral Commission's record of political donors, and the Sunday Times rich list. Given the many ways a name can be expressed and the poor quality of the leaked data, I built a small tool to compare different permutations of the scraped names against the names in the leaked list. Running such comparisons on powerful Amazon EC2 machines sped up the process.
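A rough sketch of the permutation approach, in Python, might look like the following. It is a reconstruction for illustration rather than the original tool: the particular variants generated, the function names `name_variants` and `find_matches`, and the sample names are all assumptions.

```python
from itertools import permutations


def normalise(name):
    """Lower-case a name, treat commas as spaces, and collapse whitespace."""
    return " ".join(name.lower().replace(",", " ").split())


def name_variants(full_name):
    """Return a set of plausible ways the same name might be written."""
    parts = normalise(full_name).split()
    # Every ordering of the name parts, e.g. "smith john andrew".
    variants = {" ".join(order) for order in permutations(parts)}
    if len(parts) >= 2:
        first, last = parts[0], parts[-1]
        # Drop middle names, and try a bare first initial, in both orders.
        variants.update({
            f"{first} {last}", f"{last} {first}",
            f"{first[0]} {last}", f"{last} {first[0]}",
        })
    return variants


def find_matches(public_names, leaked_names):
    """Return sorted (public name, leaked name) pairs that share a variant."""
    leaked_index = {}
    for leaked in leaked_names:
        leaked_index.setdefault(normalise(leaked), []).append(leaked)
    hits = set()
    for name in public_names:
        for variant in name_variants(name):
            for leaked in leaked_index.get(variant, []):
                hits.add((name, leaked))
    return sorted(hits)


# Made-up names standing in for the scraped and leaked lists.
hits = find_matches(["John Andrew Smith"], ["SMITH, JOHN", "Smith J", "Jane Doe"])
print(hits)
```

Indexing the leaked names once and looking up each variant keeps the work manageable; generating every ordering is only practical for names with a handful of parts. Loose variants such as a bare initial produce many false positives, so each hit is a lead to verify rather than a confirmed match.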

Read the first in the series, and then the last.

Reconcile

A tool to add new columns to your spreadsheet based on lookups to online services such as OpenCorporates. For example, it lets you take a spreadsheet of company names, automatically look up their registered company numbers, then, using those numbers, produce a new list of directors at those companies. Such data could then potentially be imported into a graph database such as Neo4j for analysis.
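The column-adding pattern can be sketched in Python as below. This is not Reconcile's code, and it makes no real network requests: the `add_column` function is hypothetical, and `fake_company_lookup` with its invented company numbers stands in for whatever call an online service would actually require.

```python
import csv
import io


def add_column(rows, source_field, new_field, lookup):
    """Yield each row with `new_field` filled in from `lookup(row[source_field])`."""
    cache = {}  # avoid repeating a lookup for values that occur more than once
    for row in rows:
        key = row[source_field]
        if key not in cache:
            cache[key] = lookup(key)
        yield {**row, new_field: cache[key] or ""}


# Stand-in for an online service; the numbers are invented for this example.
FAKE_REGISTER = {"Example Widgets Ltd": "00000001", "Sample Holdings Ltd": "00000002"}


def fake_company_lookup(name):
    """Return a made-up company number, or None if the name is unknown."""
    return FAKE_REGISTER.get(name)


sheet = csv.DictReader(io.StringIO(
    "company\nExample Widgets Ltd\nUnknown Co\nExample Widgets Ltd\n"
))
enriched = list(add_column(sheet, "company", "company_number", fake_company_lookup))
for row in enriched:
    print(row)
```

Because each step takes rows in and gives rows out, a second call to `add_column` could use the new `company_number` column as its source, chaining lookups in the way described above.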

View on GitHub.

Twitter search and archiving tools

Who Said That: Get the results of a Twitter search as a spreadsheet.

Who Follows Who: Find who follows whom among a list of Twitter account names.

Who Says What: Get someone's entire Twitter timeline as a spreadsheet.

Scrapers

I've written a number of scrapers for various websites as part of stories or for other projects, including: