Talks & teaching

  1. Scraping the unscrapable: advanced approaches to deal with complex sites and evade anti-scraping systems

    Scraped data is often the backbone of an investigation, but some websites are more difficult to scrape than others. This session covers best practices for dealing with tricky sites, including coping with captchas, using proxy and other scraping services, plus the tradeoffs and costs of these approaches.

    1. Dataharvest 2026, Mechelen, Belgium
    2. Dataharvest 2025, Mechelen, Belgium
    3. Dataharvest 2024, Mechelen, Belgium
  2. Finding needles in haystacks with fuzzy matching

    Fuzzy matching is a process for linking up names that are similar but not quite the same. It can be an important part of data-led investigations, identifying connections between key people and companies that are relevant to a story. This class covers how it fits into the investigative process, and includes a practical introduction to using the CSV Match tool I developed.

    1. Nicar 2025, Minneapolis, USA
    2. Nicar 2024, Baltimore, USA
    3. Nicar 2023, Nashville, USA
    4. Nicar 2022, Atlanta, USA
    5. Nicar 2022, Atlanta, USA
    6. Nicar 2021, Online
    7. Nicar 2020, New Orleans, USA
    8. Nicar 2019, Newport Beach, USA
    9. Nicar 2019, Newport Beach, USA
    10. CIJ Summer Conference 2018, London, UK
    11. Dataharvest 2018, Mechelen, Belgium
    12. Nicar 2018, Chicago, USA
  3. Tracking changes with GitHub Actions

    Sometimes how data changes can be more interesting than the data itself. For example, Wikipedia lets you see how a page has been edited - adding or cutting out certain bits of information. Using GitHub Actions, we can do something similar for any webpage. This session covers using Actions to regularly run a scraper, analysing the output, and identifying changes over time.

    1. Dataharvest 2024, Mechelen, Belgium
    2. Dataharvest 2023, Mechelen, Belgium
  4. Time travel for beginners: how to create and use web archives

    Ever relied upon an online source, only later to find it deleted or changed? This class covers how to get the most out of resources like the Wayback Machine – what they’re good for, and what they’re not. We also cover when and how to build your own private archives of web content.

    1. Global Investigative Journalism Conference 2023, Gothenburg, Sweden
    2. CIJ Summer Conference 2023, London, UK
    3. Dataharvest 2022, Mechelen, Belgium
  5. Web basics: how the web works, and how to scrape it

    Have you ever wondered how exactly your stories reach your readers? Ever wanted to know how to build a simple webpage? Or how to scrape information from the web? This session covers the principles of how web pages get onto your screen, and working with the two key web technologies of HTML and CSS. Dataharvest sessions taught with Rui Barros.

    1. CIJ Summer Conference 2023, London, UK
    2. Dataharvest 2023, Mechelen, Belgium
    3. Dataharvest 2022, Mechelen, Belgium
  6. How to be a (better) data editor

    As data journalism has become mainstream, more data editor positions have been created. But what makes a good data editor? In this panel we discussed what it takes to do the job effectively, the different things it can involve, and the different routes to getting there. With Marie-Louise Timcke, Jan Strozyk, Helena Bengtsson, Eva Belmonte, and Dominik Balmer, moderated by me.

    1. Dataharvest 2022, Mechelen, Belgium
  7. Investigative data journalism

    Guest lecture covering the origins of investigative data journalism, the nature of data in investigations, where it comes from, plus what code is and how it is used in the newsroom to do this kind of work.

    1. Journalism by Numbers, Birkbeck University, London, UK
  8. An introduction to data for investigations

    Where do you start using data in investigations? This training morning covers what data really is, developing a ‘data state of mind’ to spot opportunities, data sourcing including using unstructured data, hands-on scraping websites and interviewing datasets to get answers to your questions, as well as developing rigorous working practices that help you avoid mistakes.

    1. Birn Summer School 2021, Mlini, Croatia
  9. How to think about data

    This session explained key concepts and covered tips, tricks, and traps to avoid when working with data. Together they can help you get more organised, better understand your data, ease the friction of collaborating with others, see new opportunities, and develop working practices that make it harder to be wrong. Slides here.

    1. Dataharvest Digital 2021, Online
  10. Scraping from scratch

    You may have come across acronyms like HTTP and HTML, but what do they mean, and why do they matter? This class explains the concepts that underpin how the web works – which are simpler than you might think – as well as how you can use this knowledge to extract the information you need, and understand how exactly your stories reach your readers.

    1. Dataharvest Digital 2021, Online
  11. Election data: from source to screen

    Guest lecture on the data processing pipelines that powered the Financial Times’ 2020 US election poll tracker and live results page.

    1. Visual Tools to Empower Citizens, University of Girona, Online
  12. Lessons learned extracting data from documents

    Some of the most interesting datasets started life ‘unstructured’ – as documents, emails, web pages, images, videos, and other formats that look nothing like a spreadsheet. This session covered the challenges in extracting data from these formats, what tools are available, and approaches for verifying the results. Slides here.

    1. Dataharvest Digital 2020, Online
  13. Introduction to code for journalists

    Want to take your first steps with code but not sure how to begin? Or want to learn how code is being used in the newsroom and if it can help you and your team? This weekend workshop is an introductory primer to learning to code, showing recent story examples, explaining the fundamental concepts in programming, and demystifying the jargon.

    1. CIJ Courses, London, UK
    2. CIJ Courses, London, UK
    3. CIJ Courses, London, UK
    4. CIJ Courses, London, UK
    5. CIJ Courses, London, UK
  14. Exploring networks with graph databases

    Graph databases are very useful for finding connections or patterns within our data. This is a hands-on introduction to the graph database Neo4j, showing examples of its use for investigative stories including the Panama and Paradise Papers, and teaching attendees how to build a graph of noteworthy individuals and match them with corporate data to see the networks involved.

    1. Global Investigative Journalism Conference 2019, Hamburg, Germany
    2. CIJ Summer Conference 2019, London, UK
    3. Dataharvest 2019, Mechelen, Belgium
    4. CIJ Summer Conference 2018, London, UK
    5. Dataharvest 2018, Mechelen, Belgium
    6. CIJ Summer Conference 2017, London, UK
    7. Dataharvest 2017, Mechelen, Belgium
  15. Stories from the command line

    For those taking their first steps with data and code, the command line is essential. There are also many useful command line based applications – understanding it opens the door to these power tools. This session covered how it works, the basic commands and concepts, and some of the tools which can be useful in data investigations, including story examples. Slides here.

    1. Global Investigative Journalism Conference 2019, Hamburg, Germany
  16. Why code?

    How is code being used in newsrooms to find stories? If you’re just starting out, where should you start, and how should you approach learning such skills? Panel with Helena Bengtsson and Niamh McIntyre, moderated by Leila Haddou.

    1. CIJ Summer Conference 2019, London, UK
  17. How can code help your journalism?

    An introduction to how code is used in the newsroom, with recent story examples, explaining the fundamental concepts and demystifying the jargon. We also guided attendees through the most common programming languages, and gave a roadmap to deciding which to pursue. Slides here.

    1. CIJ Summer Conference 2018, London, UK
  18. Don't fear the robots: five reasons investigative journalists should welcome automation

    This talk explained the ways automation is already being used in newsrooms, why the coming wave of automation is not a threat, and how we can embrace this new technology to improve the quality of investigative reporting at a time of shrinking newsroom resources. Slides here.

    1. Dataharvest 2018, Mechelen, Belgium
  19. GitHub for journalists

    Whether you find yourself collaborating on code, data, or prose, GitHub can work for journalists. This class covered what GitHub is, the benefits of using it, and how it is typically used both by people doing data analysis and by developers. Attendees were shown how to create a first repository and make pull requests.

    1. Nicar 2018, Chicago, USA
  20. Code for journalists

    How can code help you or your team with investigations? This two-hour session was a guided, hands-on introduction to programming, showing recent examples of published stories and demystifying the jargon. Attendees were guided through the tools needed, including text editors and an introduction to the command line. This later evolved into a weekend course.

    1. CIJ Summer Conference 2017, London, UK
  21. Building a culture of knowledge sharing in journalism

    As data becomes increasingly important to journalism, reporters need to keep their skills up to date. However, newsrooms have less budget for training and conferences than ever before. This was a lightning talk on how Journocoders tries to solve these problems. Video here. Slides here.

    1. Dataharvest 2017, Mechelen, Belgium
  22. The power of fuzzy: finding connections in a messy world

    Like our reality, our data is often messy. Finding meaningful connections between such datasets often means using fuzzy matching algorithms. This was a high-level look at some of the most commonly used algorithms, their pros and cons, and how they are used in practice. Slides here.

    1. CSV Conf 2017, Portland, USA
  23. Data for investigations: beyond Excel

    Though much data-led reporting is done in Excel, some can only be reported using other tools. This talk ran through a few stories which took different approaches. Write-up here. Slides here.

    1. Food Research Council Workshop 2015, London, UK
  24. Muckrakers and tech makers

    Communication difficulties are common between journalists and technologists. This was a talk with an investigative reporter on our experiences working together at the Guardian.

    1. Dataharvest 2015, Brussels, Belgium
  25. Digital journalism: the news and tech

    Panel discussing how news organisations have been challenged and transformed by the web, and how this has changed the way they interact with readers. Video here.

    1. Web We Want Festival 2014, London, UK
  26. Using the Guardian's Content API

    An introduction to the Guardian's Content API, which lets developers build their own applications using Guardian content. I also judged a prize for the best use of the API. Write-up here.

    1. MediaHackDays 2014, Aarhus, Denmark
  27. How to work with web developers (and what we wish you knew)

    Good communication between management and techies can make the difference between a website or app that makes money and one that loses customers, but the culture divide can be vast. This evening course covered working methods, jargon, and how to brief to avoid tension between the business parts of an organisation and mysterious, headphone-wearing coders.

    1. Guardian Masterclasses, London, UK
    2. Guardian Masterclasses, London, UK