I’m a data reporter at Bloomberg in London. Previously I was at the Financial Times.
I also organise Journocoders, a community of journalists and others working in the media who are interested in developing technical skills for use in their reporting.
Revealed Labour’s electoral strategy in the run-up to the 2024 general election, which was likely to maximise Conservative losses in the south of England. I scraped and analysed data from Labour’s volunteering website, which directs activists to where they should campaign.
Revealed that Chinese suppliers now dominate the trade in ‘computer numerical control’ devices vital to Moscow’s military industries. I matched up customs data on the imports of CNC machinery with data on companies sanctioned by the US Treasury.
Finds fuzzy matches between CSV files. Based on Textmatch, a Python library I also maintain. It has been used by news organisations including the Wall Street Journal, which used it to match up officials’ shareholding declarations with the names of companies their agency had oversight of, and the New Humanitarian, which used it to identify a company the United Nations had a contract with that was also on its own sanctions list.
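As a rough illustration of the idea (not Textmatch’s actual API), Python’s standard-library difflib can score how similar two names are; the company names below are invented:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Normalise case and whitespace before comparing
    a, b = a.lower().strip(), b.lower().strip()
    return SequenceMatcher(None, a, b).ratio()

# Invented example data: one list from declarations, one from a sanctions list
declarations = ['Acme Holdings Ltd', 'Blue River Mining']
sanctioned = ['ACME Holdings Limited', 'Redstone Logistics']

# Pair up names that score above a similarity threshold
for d in declarations:
    for s in sanctioned:
        score = similarity(d, s)
        if score > 0.8:
            print(d, '->', s, round(score, 2))
```

A real tool adds a lot on top of this sketch, such as smarter normalisation of company suffixes and faster algorithms for large files, but the core is the same: score every candidate pair and keep those above a threshold.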
Processes ship-tracking data, generates a summary of where a vessel has been, and identifies any gaps in its record. It can also highlight where data has changed, which can be used to spot where transponder data has been spoofed.
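A minimal sketch of the gap-detection step, assuming the tracking data has been reduced to timestamped transponder pings (the positions and the 12-hour threshold below are invented for illustration):

```python
from datetime import datetime, timedelta

# Invented sample of transponder pings: (timestamp, latitude, longitude)
positions = [
    (datetime(2024, 5, 1, 0, 0), 51.9, 4.1),
    (datetime(2024, 5, 1, 6, 0), 51.5, 3.2),
    (datetime(2024, 5, 2, 18, 0), 50.1, 0.9),  # long silence before this ping
]

GAP_THRESHOLD = timedelta(hours=12)

# Compare each ping with the one before it and flag long silences
gaps = []
for previous, current in zip(positions, positions[1:]):
    silence = current[0] - previous[0]
    if silence > GAP_THRESHOLD:
        gaps.append((previous[0], current[0], silence))

for start, end, silence in gaps:
    print(f'Gap of {silence} between {start} and {end}')
```

In practice the threshold depends on how often a vessel’s transponder is expected to report, and a flagged gap is a starting point for reporting, not proof of anything by itself.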
Scraped data is often the backbone of an investigation, but some websites are more difficult to scrape than others. This session covers best practices for dealing with tricky sites, including coping with captchas and using proxies and other scraping services, plus the tradeoffs and costs of these approaches.
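One of those best practices, backing off and retrying when a site throttles you, can be sketched in Python. The fetch function here is a stand-in for a real HTTP call made through whatever proxy or scraping service you settle on:

```python
import random
import time

def fetch_with_backoff(fetch, url, retries=4, base_delay=1.0):
    """Call fetch(url), retrying with exponential backoff on failure."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise
            # Wait 1s, 2s, 4s... plus random jitter, so retries don't hammer the site
            time.sleep(base_delay * 2 ** attempt + random.random())

# Invented stand-in for a real request: fails twice, then succeeds
attempts = []
def flaky_fetch(url):
    attempts.append(url)
    if len(attempts) < 3:
        raise RuntimeError('blocked by the site')
    return '<html>page content</html>'

print(fetch_with_backoff(flaky_fetch, 'https://example.com', base_delay=0.1))
```

The same wrapper works unchanged whether fetch goes direct, through a rotating proxy, or via a paid scraping API, which is part of the tradeoff discussion: the retry logic is cheap, the services behind it are not.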
As data journalism has become mainstream, more data editor positions have been created. But what makes a good data editor? In this panel we discussed what it takes to do the job effectively, the different things it can involve, and the different routes to getting there. With Marie-Louise Timcke, Jan Strozyk, Helena Bengtsson, Eva Belmonte, and Dominik Balmer, moderated by me.
Sometimes how data changes can be more interesting than the data itself. Wikipedia, for example, lets you see how a page has been edited over time - which bits of information were added or cut. Using GitHub Actions, we can do something similar for any webpage. This session covers using Actions to regularly run a scraper, analysing the output, and identifying changes over time.
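The change-detection step can be sketched with Python’s standard library: once the Action has stored a previous scrape, compare it against the latest one and report the differences (the snapshots below are invented):

```python
import difflib

def changed_lines(old_text, new_text):
    """Return a unified diff between two scraped snapshots of a page."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile='previous', tofile='latest', lineterm='',
    )
    return list(diff)

# Invented snapshots of a scraped page, before and after an edit
old = 'Minister: A. Smith\nBudget: £1m'
new = 'Minister: B. Jones\nBudget: £1m'

for line in changed_lines(old, new):
    print(line)
```

If the Action commits each scrape to the repository, Git’s own history gives you this diff for free; the explicit version is useful when you want to filter the changes or turn them into alerts.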
Fuzzy matching is a process for linking up names that are similar but not quite the same. It can be an important part of data-led investigations, identifying connections between key people and companies that are relevant to a story. This class covers how it fits into the investigative process, and includes a practical introduction to using the CSV Match tool I developed.