Finds matches between two spreadsheets, optionally using a range of fuzzy-matching algorithms. It has been used by organisations including the Guardian, the Times, and the news agency Irin, which used it to identify a company that held a United Nations contract while also appearing on the UN's own sanctions list.
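To show roughly what fuzzy matching between two spreadsheet columns involves, here is a minimal sketch using Python's standard-library `difflib`. It is not how CSV Match itself works. The suffix list, the `normalise` and `fuzzy_matches` helpers, the 0.85 threshold and the sample company names are all hypothetical and chosen only for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical set of legal-form suffixes to ignore when comparing names.
SUFFIXES = {"ltd", "limited", "plc", "inc", "llc", "co", "company"}

def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces and drop common company suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)

def fuzzy_matches(left, right, threshold=0.85):
    """Compare every pair of names; keep (left, right, score) where score >= threshold.

    The threshold is an arbitrary assumption and would need tuning on real data.
    """
    results = []
    for a in left:
        for b in right:
            score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
            if score >= threshold:
                results.append((a, b, round(score, 2)))
    return results

# Hypothetical columns standing in for the two spreadsheets.
contractors = ["Acme Trading Ltd.", "Blue River Logistics", "Northwind Supplies"]
sanctioned = ["ACME TRADING LIMITED", "Red Harbour Shipping"]

for a, b, score in fuzzy_matches(contractors, sanctioned):
    print(f"{a!r} ~ {b!r} ({score})")
```

Normalising names before comparing them means that differences in case, punctuation and legal suffixes ("Ltd." versus "LIMITED") do not count against a match. Comparing every row against every other row is simple, but it grows quickly with spreadsheet size.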
Enriches data by adding new columns based on lookups to online services. For example, it can take a spreadsheet of company numbers and turn it into a list of the directors of those companies.
Guest lecture on the data-processing pipelines that powered the Financial Times' poll tracker and live results page for the 2020 US election.
Some of the most interesting datasets start life 'unstructured': as documents, emails, web pages, images, videos and other formats that look nothing like a spreadsheet. This session covered the challenges of extracting data from these formats, the tools available, and approaches to verifying the results. Slides here.
Fuzzy matching has become an increasingly important part of data-led investigations as a way to identify connections between public figures, key people and companies relevant to a story. This class shows attendees how it typically fits into the investigative process and includes a practical introduction to CSV Match, the tool I developed.
Want to take your first steps with code but not sure how to begin? Or want to learn how code is being used in the newsroom and whether it can help you and your team? This weekend workshop is an introductory primer on coding: it shows recent story examples, explains the fundamental concepts of programming, and demystifies the jargon.