Sometimes how data changes can be more interesting than the data itself. For example, Wikipedia lets you see how a page has been edited, showing which pieces of information were added or cut. Using GitHub Actions, we can do something similar for any webpage. This session covers using Actions to regularly run a scraper, analysing the output, and identifying changes over time.
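As a rough illustration of the "identifying changes over time" step, the sketch below compares two saved snapshots of scraped text and lists the lines that were removed or added. It uses only Python's standard `difflib` module; the function name `describe_changes` and the sample snapshots are made up for this example and are not taken from the session materials.

```python
import difflib


def describe_changes(old_text: str, new_text: str) -> list[str]:
    """Return removed ('-') and added ('+') lines between two snapshots."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="previous",
        tofile="latest",
        lineterm="",
        n=0,  # no surrounding context lines, only the changes themselves
    )
    # The first two lines of a unified diff are the file headers; skip them,
    # then drop the '@@' hunk markers so only changed lines remain.
    changed = list(diff)[2:]
    return [line for line in changed if not line.startswith("@@")]


if __name__ == "__main__":
    # Hypothetical snapshots from two runs of a scraper.
    previous = "Minister: A. Example\nBudget: 10m"
    latest = "Minister: A. Example\nBudget: 12m"
    for line in describe_changes(previous, latest):
        print(line)
```

In a scheduled workflow, the two snapshots would typically be the previous and current versions of a file the scraper writes on each run.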
Fuzzy matching is a process for linking up names that are similar but not quite the same. It can be an important part of data-led investigations, identifying connections between key people and companies that are relevant to a story. This class covers how it fits into the investigative process, and includes a practical introduction to using the CSV Match tool I developed.
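To make the idea concrete, here is a minimal sketch of fuzzy matching between two lists of names, using Python's standard `difflib` module. It is not how CSV Match works; the helper names (`normalise`, `similarity`, `fuzzy_matches`), the 0.85 threshold, and the sample names are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, strip basic punctuation, and collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Score two names from 0.0 (nothing in common) to 1.0 (identical)."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def fuzzy_matches(names_a, names_b, threshold=0.85):
    """Return (name_a, name_b, score) for every pair scoring at or above threshold."""
    matches = []
    for a in names_a:
        for b in names_b:
            score = similarity(a, b)
            if score >= threshold:
                matches.append((a, b, round(score, 2)))
    return matches


if __name__ == "__main__":
    # Hypothetical lists: people named in a story, and company directors.
    people = ["Jon Smith", "Jane Doe"]
    directors = ["John Smith", "J. Doe", "Acme Holdings Ltd."]
    for match in fuzzy_matches(people, directors):
        print(match)
```

The threshold is a judgment call: a lower value catches more variants but produces more false matches, so any candidate link still needs checking by hand.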
Ever relied on an online source, only to find it later deleted or changed? This class covers how to get the most out of resources like the Wayback Machine – what they’re good for, and what they’re not. We also cover when and how to build your own private archives of web content.
Have you ever wondered how exactly your stories reach your readers? Ever wanted to know how to build a simple webpage? Or how to scrape information from the web? This session covers the principles of how web pages get onto your screen, and working with the two key web technologies of HTML and CSS. Dataharvest sessions taught with Rui Barros.
Where do you start using data in investigations? This training morning covers what data really is, developing a ‘data state of mind’ to spot opportunities, and sourcing data, including unstructured data. It also includes hands-on website scraping, interviewing datasets to get answers to your questions, and developing rigorous working practices that help you avoid mistakes.
You may have come across acronyms like HTTP and HTML, but what do they mean, and why do they matter? This class explains the concepts that underpin how the web works – which are simpler than you might think – as well as how you can use this knowledge to extract the information you need, and understand how exactly your stories reach your readers.
Want to take your first steps with code but not sure how to begin? Or want to learn how code is being used in the newsroom and whether it can help you and your team? This weekend workshop is an introductory primer on learning to code, showing recent story examples, explaining the fundamental concepts in programming, and demystifying the jargon.
Graph databases are incredibly useful for finding connections or patterns within our data. This is a hands-on introduction to the graph database Neo4j, showing examples of its use for investigative stories including the Panama and Paradise Papers, and teaching attendees how to build a graph of noteworthy individuals and match them with corporate data to see the networks involved.
Whether you find yourself collaborating on code, data, or prose, GitHub can work for journalists. This class covered what GitHub is, the benefits of using it, and how it is typically used both by people doing data analysis and by developers. Attendees were shown how to create a first repository and make pull requests.
How can code help you or your team with investigations? This two-hour session was a guided, hands-on introduction to programming, showing recent examples of published stories and demystifying the jargon. Attendees were guided through the tools needed, including text editors and an introduction to the command line. This later evolved into a weekend course.
Good communication between management and techies can make the difference between a website or app that makes money and one that loses customers, but the culture divide can be vast. This evening course covered working methods, jargon, and how to brief to avoid tension between the business parts of an organisation and mysterious, headphone-wearing coders.