Finds matches between two spreadsheets, optionally using various fuzzy-matching algorithms. Used by organisations including the Guardian, the Times, and the news agency Irin, which used it to identify a company that held a contract with the United Nations while also appearing on the UN's own sanctions list.
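To give a rough idea of what fuzzy matching between two columns involves, here is a minimal Python sketch. It is not the tool's own code: it assumes a simple similarity ratio from the standard library's `difflib`, and the function names, the 0.85 threshold, and the sample company names are all made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_matches(left, right, threshold=0.85):
    """Yield (left_value, right_value, score) for pairs scoring at or above threshold."""
    # Compares every value with every other value, so this is only
    # practical for small inputs.
    for a in left:
        for b in right:
            score = similarity(a, b)
            if score >= threshold:
                yield a, b, round(score, 2)


# Hypothetical sample data standing in for one column from each spreadsheet.
contractors = ["Acme Trading Ltd", "Blue Nile Logistics"]
sanctioned = ["ACME Trading Ltd.", "Red Sea Shipping"]

matches = list(fuzzy_matches(contractors, sanctioned))
for left_value, right_value, score in matches:
    print(left_value, "~", right_value, score)
```

An exact comparison would miss the pair above because of the differing case and the trailing full stop, which is the kind of near-miss fuzzy matching is meant to catch.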
Enrich data, adding new columns based on lookups to online services. For example, take a spreadsheet of company numbers and turn it into a list of directors of those companies.
Automatically identify potential story leads. Lets you create autonomous bots which poll data sources and run predefined data analysis. Each set of results is then compared with the results from the bot's previous run, and any additions or deletions trigger an email alert.
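The compare-with-the-last-run step can be sketched in a few lines of Python. This is an illustration of the idea only, assuming each run produces a list of hashable records; `diff_runs`, `should_alert`, and the sample records are hypothetical, and sending the email is left out.

```python
def diff_runs(previous, current):
    """Return the records added and removed since the previous run."""
    prev, curr = set(previous), set(current)
    return {"added": sorted(curr - prev), "removed": sorted(prev - curr)}


def should_alert(changes):
    """An alert is warranted if anything was added or removed."""
    return bool(changes["added"] or changes["removed"])


# Hypothetical results from two consecutive runs of a bot.
last_run = ["Company A", "Company B", "Company C"]
this_run = ["Company B", "Company C", "Company D"]

changes = diff_runs(last_run, this_run)
if should_alert(changes):
    print("Added:", changes["added"])
    print("Removed:", changes["removed"])
```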
Create simple static charts quickly – a tool for the non-technical. Can be easily customised to your organisation's house style using a simple stylesheet.
Produce pivot tables, much like those in Excel, but in the terminal.
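For readers unfamiliar with pivot tables, the following Python sketch shows the basic operation: grouping rows by one field, spreading a second field across columns, and summing a third. It is an assumption-laden illustration rather than the tool's implementation; the function names, the choice of sum as the aggregate, and the sample rows are invented.

```python
from collections import defaultdict


def pivot(rows, index, column, value):
    """Sum `value` for each combination of `index` (rows) and `column` (columns)."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[index]][row[column]] += float(row[value])
    return {key: dict(cells) for key, cells in table.items()}


def render(table):
    """Format the pivot table as tab-separated text for printing in a terminal."""
    columns = sorted({name for cells in table.values() for name in cells})
    lines = ["\t".join([""] + columns)]
    for key in sorted(table):
        cells = [str(table[key].get(name, 0.0)) for name in columns]
        lines.append("\t".join([key] + cells))
    return "\n".join(lines)


# Hypothetical rows, as they might come from csv.DictReader.
rows = [
    {"region": "North", "year": "2019", "amount": "10"},
    {"region": "North", "year": "2019", "amount": "5"},
    {"region": "North", "year": "2020", "amount": "5"},
    {"region": "South", "year": "2019", "amount": "7"},
]

table = pivot(rows, index="region", column="year", value="amount")
print(render(table))
```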
Convert NDJSON format data (such as the Companies House PSC data) into CSV. Data is streamed, so files much bigger than the available memory can be converted. Handles nested JSON objects.
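A streaming conversion of this kind can be sketched in Python as follows. This is one possible approach under stated assumptions, not the tool's actual code: nested objects are flattened into dot-separated column names, the caller supplies the column list up front (so the header can be written before any data is read), and the sample records are made up rather than taken from the real PSC schema.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def ndjson_to_csv(lines, out, columns):
    """Convert one JSON object per input line into one CSV row.

    Only one line is held in memory at a time. Fields missing from a
    record are left blank; fields not listed in `columns` are dropped.
    """
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore", restval="")
    writer.writeheader()
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(flatten(json.loads(line)))


# Hypothetical input; in practice `source` would be an open file.
source = io.StringIO(
    '{"company_number": "01234567", "data": {"name": "A Person", "kind": "individual"}}\n'
    '{"company_number": "07654321", "data": {"name": "B Person"}}\n'
)
output = io.StringIO()
ndjson_to_csv(source, output, ["company_number", "data.name", "data.kind"])
print(output.getvalue())
```

Because a file object yields one line at a time when iterated, memory use stays flat however large the input is. The trade-off in this sketch is that the columns must be known in advance; discovering them automatically would need a first pass over the file.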
Finds which Twitter accounts follow each other from a predefined list.
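As an illustration of the matching step only, the Python sketch below assumes the list of accounts each user follows has already been fetched into a dictionary, and reads "follow each other" as meaning a two-way follow. Fetching the data from Twitter is omitted, and the function name and account names are hypothetical.

```python
from itertools import combinations


def mutual_follows(following):
    """Return pairs of listed accounts where each follows the other.

    `following` maps an account name to the set of accounts it follows.
    """
    pairs = []
    for a, b in combinations(sorted(following), 2):
        if b in following[a] and a in following[b]:
            pairs.append((a, b))
    return pairs


# Hypothetical data for a predefined list of three accounts.
following = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"dave"},
}

pairs = mutual_follows(following)
print(pairs)
```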
I've written a number of scrapers for various websites as part of stories or for other projects, including: