Finds matches in two spreadsheets, optionally using various fuzzy-matching algorithms. Used by organisations including the Guardian, the Times, and news agency Irin who used it to identify a company the United Nations had a contract with who was also on its own sanctions list.
Enriches data, adding new columns based on lookups to online services. For example, taking a spreadsheet of company numbers and turning it into a list of directors of those companies.
Some of the most interesting datasets started life ‘unstructured’ – as documents, emails, web pages, images, videos, and other formats that look nothing like a spreadsheet. This session covers the challenges in extracting data from these formats, what tools are available, and approaches for verifying the results.
For those taking their first steps with data and code, the command line is essential. There are also many useful command line based applications – understanding it opens the door to these power tools. This session covers how it works, the basic commands and concepts, and some of the tools which can be useful in data investigations, including story examples.
Fuzzy matching has become an increasingly important part of data-led investigations as a way to identify connections between public figures, key people and companies that are relevant to a story. This class shows attendees how it typically fits into the investigative process, and includes a practical introduction to using the CSV Match tool I developed.
Want to take your first steps with code but not sure how to begin? Or want to learn how code is being used in the newsroom and if it can help you and your team? This weekend workshop is an introductory primer to learning to code, showing recent story examples, explaining the fundamental concepts in programming, and demystifying the jargon.