Automatically identify potential story leads. Lets you create autonomous bots that poll data sources and run predefined data analysis. The results are compared with those from the bot's previous run, and any additions or deletions trigger an email alert.
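The "compare with the last run" step boils down to a set difference in each direction. The sketch below assumes each run produces a list of hashable result rows (for example company names). The function name `diff_results` is hypothetical, and persisting the previous run and sending the email are left out.

```python
def diff_results(previous, current):
    """Return (added, removed) results between two bot runs.

    Hypothetical helper: assumes results are hashable values such as
    strings or tuples, and that the previous run was stored somewhere.
    """
    prev_set, curr_set = set(previous), set(current)
    added = sorted(curr_set - prev_set)    # present now, absent last time
    removed = sorted(prev_set - curr_set)  # present last time, absent now
    return added, removed


if __name__ == "__main__":
    last_run = ["Acme Ltd", "Globex plc"]
    this_run = ["Acme Ltd", "Initech Ltd"]
    added, removed = diff_results(last_run, this_run)
    if added or removed:
        # A real bot would email this; the sketch just prints the alert body.
        print("Added:", added)
        print("Removed:", removed)
```

Sorting the output keeps alerts stable from run to run, which makes them easier to scan.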
Finds matches between two spreadsheets, optionally using various fuzzy-matching algorithms. Used by organisations including the Guardian, the Times and the news agency Irin, which used it to identify a company that held a United Nations contract while also appearing on the UN's own sanctions list.
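To illustrate the general idea of fuzzy matching (not necessarily the algorithms this tool uses), here is a minimal sketch built on the standard library's `difflib.SequenceMatcher`. The light normalisation and the 0.85 threshold are assumptions chosen for illustration, and `fuzzy_match` is a hypothetical name.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, drop full stops and commas, collapse whitespace."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())


def fuzzy_match(left, right, threshold=0.85):
    """For each name in `left`, find its closest name in `right`.

    Returns (left_name, right_name, score) tuples whose similarity
    ratio is at least `threshold`. This is a brute-force comparison,
    so it is only suitable for modest list sizes.
    """
    matches = []
    for a in left:
        best, best_score = None, 0.0
        for b in right:
            score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
            if score > best_score:
                best, best_score = b, score
        if best is not None and best_score >= threshold:
            matches.append((a, best, round(best_score, 3)))
    return matches


if __name__ == "__main__":
    print(fuzzy_match(["Acme Ltd."], ["ACME LTD", "Globex"]))
```

Any hits from a sketch like this still need checking by a person, since similar names do not prove the same entity.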
Enrich data by doing batch lookups against various online services. For example, quickly convert a list of company names into a list of directors of those companies.
Create simple static charts quickly, with a tool aimed at non-technical users. Charts can easily be customised to your organisation's house style using a simple stylesheet.
Produce pivot tables, much like those in Excel, but in the terminal.
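As a rough illustration of what a pivot table computes (not this tool's actual code or interface), the sketch below sums one column grouped by a row field and a column field, then prints a tab-separated grid. The function names and the sample field names are hypothetical.

```python
from collections import defaultdict


def pivot(rows, row_key, col_key, value_key):
    """Sum `value_key` for every (row_key, col_key) pair.

    `rows` is any iterable of dicts, e.g. from csv.DictReader.
    Returns the nested totals plus the sorted list of column labels.
    """
    table = defaultdict(lambda: defaultdict(float))
    cols = set()
    for r in rows:
        table[r[row_key]][r[col_key]] += float(r[value_key])
        cols.add(r[col_key])
    return {k: dict(v) for k, v in table.items()}, sorted(cols)


def format_pivot(table, cols):
    """Render the pivot as tab-separated text; empty cells show 0."""
    lines = ["\t".join([""] + cols)]
    for row in sorted(table):
        cells = [f"{table[row].get(c, 0):g}" for c in cols]
        lines.append("\t".join([row] + cells))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"region": "North", "year": "2016", "amount": "10"},
        {"region": "North", "year": "2016", "amount": "5"},
        {"region": "South", "year": "2017", "amount": "3"},
    ]
    print(format_pivot(*pivot(sample, "region", "year", "amount")))
```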
Convert NDJSON data (such as the Companies House PSC data) into CSV. The data is streamed, so files much larger than the available memory can be converted. Nested JSON objects are handled.
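Here is one way to stream NDJSON into CSV while flattening nested objects, sketched under some assumptions of my own. Nested keys are joined with dots (`address.postcode`). Lists are kept as JSON strings. The input is read twice: the first pass only collects column names and the second writes rows, so memory use depends on the number of columns rather than the file size. This is not necessarily how the tool itself works, and the function names are hypothetical.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys; lists become JSON strings."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def ndjson_to_csv(src, dst):
    """Convert a seekable NDJSON text stream `src` into CSV on `dst`."""
    # Pass 1: collect every column name, preserving first-seen order.
    columns = {}
    for line in src:
        if line.strip():
            for key in flatten(json.loads(line)):
                columns.setdefault(key, None)
    # Pass 2: rewind and write one CSV row per JSON line.
    src.seek(0)
    writer = csv.DictWriter(dst, fieldnames=list(columns))  # missing keys -> ""
    writer.writeheader()
    for line in src:
        if line.strip():
            writer.writerow(flatten(json.loads(line)))


if __name__ == "__main__":
    sample = io.StringIO('{"name": "X", "address": {"postcode": "AB1"}}\n'
                         '{"name": "Y", "kinds": ["a", "b"]}\n')
    out = io.StringIO()
    ndjson_to_csv(sample, out)
    print(out.getvalue())
```

For real files, open the input with `open(path, encoding="utf-8")` and the output with `newline=""` as the `csv` module recommends.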
A batch of stories resulting from a collaboration with three Guardian journalists. Using a leaked list naming 53,000 people who held accounts at the offshore bank Kleinwort Benson, we found a number of public figures on the list through an automated fuzzy-matching process I developed. Read the first in the series, and then the last.
Finds which Twitter accounts in a predefined list follow each other.
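Once each account's list of followed accounts has been fetched (how that is done via Twitter is not shown here and is left as an assumption), finding mutual follows within the predefined list is a simple pairwise check. The sketch below uses a hypothetical `following` mapping from each account to the set of accounts it follows.

```python
from itertools import combinations


def mutual_follows(following):
    """Return pairs of accounts in `following` that follow each other.

    `following` maps each account in the predefined list to the set of
    accounts it follows (hypothetical input, fetched elsewhere).
    """
    pairs = []
    for a, b in combinations(sorted(following), 2):
        if b in following[a] and a in following[b]:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    following = {
        "alice": {"bob", "carol"},
        "bob": {"alice"},
        "carol": {"dave"},
    }
    print(mutual_follows(following))
```

Sorting the accounts before pairing means each pair is reported once and always in the same order.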
I've written a number of scrapers for various websites as part of stories or for other projects, including: