Max Harlow

I work on the visual and data journalism team at the Financial Times in London.

I also organise Journocoders, a community of journalists and others working in the media who are interested in developing technical skills for use in their reporting.


Clippings
  1. Labour diverts activists away from Lib Dem target seats

    Financial Times

    Revealed Labour’s electoral strategy in the run-up to the 2024 general election, which would likely maximise Conservative losses in the south of England. I scraped and analysed data from Labour’s volunteering website, which directs activists to where they should campaign.

  2. China’s machine tool exports to Russia soar after Ukraine invasion

    Financial Times

    Revealed that Chinese suppliers now dominate the trade in ‘computer numerical control’ devices vital to Moscow’s military industries. I matched up customs data on the imports of CNC machinery with data on companies sanctioned by the US Treasury.

More clippings


Projects
  1. CSV Match

    Finds fuzzy matches between CSV files. Based on Textmatch, a Python library I also maintain. It has been used by news organisations including the Wall Street Journal, which used it to match up officials’ shareholding declarations with the names of companies their agency had oversight of, and the New Humanitarian, which used it to identify a company the United Nations had a contract with that was also on its own sanctions list.

  2. Ship Overviewer

    Processes ship tracking data and generates a summary of where a vessel has been, identifying any gaps. It can also highlight where the data has changed, which can be used to spot where transponder data has been spoofed.
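
The gap-finding idea behind Ship Overviewer can be sketched in a few lines. This is not the tool’s actual implementation, just an illustration with made-up position reports and an assumed six-hour threshold:

```python
from datetime import datetime, timedelta

def find_gaps(pings, max_gap=timedelta(hours=6)):
    """Return (start, end) pairs where consecutive position reports are
    further apart than max_gap -- candidate transponder dark periods."""
    ordered = sorted(pings, key=lambda p: p["time"])
    return [
        (prev["time"], curr["time"])
        for prev, curr in zip(ordered, ordered[1:])
        if curr["time"] - prev["time"] > max_gap
    ]

# Hypothetical pings: the 21-hour silence between the second and third
# reports is the kind of gap worth investigating
pings = [
    {"time": datetime(2024, 5, 1, 0, 0)},
    {"time": datetime(2024, 5, 1, 3, 0)},
    {"time": datetime(2024, 5, 2, 0, 0)},
]
gaps = find_gaps(pings)
```

Real AIS data would also carry position, identity, and draught fields, which is where the change-highlighting comes in.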
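
The kind of comparison CSV Match performs can be sketched using only the standard library’s difflib rather than Textmatch itself; the data here is hypothetical, in the spirit of the Wall Street Journal use case:

```python
from difflib import SequenceMatcher

def fuzzy_matches(rows_a, rows_b, key_a, key_b, threshold=0.8):
    """Yield (row_a, row_b, score) for each pair of rows whose chosen
    fields are similar above the threshold."""
    for a in rows_a:
        for b in rows_b:
            score = SequenceMatcher(None, a[key_a].lower(), b[key_b].lower()).ratio()
            if score >= threshold:
                yield a, b, score

# Hypothetical records: a shareholding declaration and two overseen companies
declarations = [{"holding": "Acme Holdings Ltd"}]
overseen = [{"company": "ACME Holdings Limited"}, {"company": "Widget Corp"}]
matches = list(fuzzy_matches(declarations, overseen, "holding", "company"))
```

The real tool supports several similarity algorithms and scales better than this all-pairs loop, but the principle is the same: near-identical names score highly, unrelated ones do not.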

More projects


Talks
  1. Scraping the unscrapable: advanced approaches to deal with complex sites and evade anti-scraping systems

    Dataharvest 2024, Mechelen, Belgium

    Scraped data is often the backbone of an investigation, but some websites are more difficult to scrape than others. This session covers best practices for dealing with tricky sites, including coping with captchas, using proxy and other scraping services, plus the tradeoffs and costs of these approaches.

  2. How to be a (better) data editor

    Dataharvest 2022, Mechelen, Belgium

    As data journalism has become mainstream, more data editor positions have been created. But what makes a good data editor? In this panel we discussed what it takes to do the job effectively, the different things it can involve, and the different routes to getting there. With Marie-Louise Timcke, Jan Strozyk, Helena Bengtsson, Eva Belmonte, and Dominik Balmer, moderated by me.
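
One of the techniques the scraping session touches on, rotating requests through a proxy pool, can be sketched as below. The endpoints are hypothetical; in practice they would come from a commercial proxy service:

```python
from itertools import cycle

# Hypothetical proxy endpoints
PROXIES = [
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
]

def proxy_pool(proxies):
    """Yield proxy configuration dicts in round-robin order, so
    successive requests leave from different addresses."""
    for proxy in cycle(proxies):
        yield {"http": proxy, "https": proxy}

pool = proxy_pool(PROXIES)
first, second, third = next(pool), next(pool), next(pool)
```

Each dict is in the shape an HTTP client such as requests accepts via its `proxies` argument; the tradeoff, as the session discusses, is that good proxy pools cost money.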

More talks


Teaching
  1. Tracking changes with GitHub Actions

    Dataharvest 2024, Mechelen, Belgium

    Sometimes how data changes can be more interesting than the data itself. For example, Wikipedia lets you see how a page has been edited - adding or cutting out certain bits of information. Using GitHub Actions, we can do something similar for any webpage. This session covers using Actions to regularly run a scraper, analysing the output, and identifying changes over time.

  2. Finding needles in haystacks with fuzzy matching

    Nicar 2024, Baltimore, USA

    Fuzzy matching is a process for linking up names that are similar but not quite the same. It can be an important part of data-led investigations, identifying connections between key people and companies that are relevant to a story. This class covers how it fits into the investigative process, and includes a practical introduction to using the CSV Match tool I developed.
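
The scrape-and-commit pattern from the GitHub Actions session can be sketched as a workflow like the one below. This is an illustrative config, not course material: `scrape.py` and the commit details are placeholders, and the cron schedule is an assumption.

```yaml
# Hypothetical workflow: run a scraper every hour and commit any changes,
# so the repository's history becomes a changelog for the page
name: track-changes
on:
  schedule:
    - cron: '0 * * * *'
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python scrape.py > data.json
      - run: |
          git config user.name 'github-actions'
          git config user.email 'actions@github.com'
          git add data.json
          git diff --cached --quiet || git commit -m 'Update data'
          git push
```

With each run committed, `git diff` between any two commits shows exactly what was added or cut, much like a Wikipedia page history.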

More teaching