Talks

  1. Scraping the unscrapable: advanced approaches to deal with complex sites and evade anti-scraping systems

    Dataharvest 2024, Mechelen, Belgium

    Scraped data is often the backbone of an investigation, but some websites are more difficult to scrape than others. This session covers best practices for dealing with tricky sites, including coping with captchas, using proxy and other scraping services, plus the tradeoffs and costs of these approaches.

  2. How to be a (better) data editor

    Dataharvest 2022, Mechelen, Belgium

    As data journalism has become mainstream, more data editor positions have been created. But what makes a good data editor? In this panel we discussed what it takes to do the job effectively, the different things it can involve, and the different routes to getting there. With Marie-Louise Timcke, Jan Strozyk, Helena Bengtsson, Eva Belmonte, and Dominik Balmer, moderated by me.

  3. Investigative data journalism

    Journalism by Numbers, Birkbeck University, London, UK

    Guest lecture covering the origins of investigative data journalism, the nature of data in investigations, where it comes from, plus what code is and how it is used in the newsroom to do this kind of work.

  4. How to think about data

    Dataharvest Digital 2021, Online

    This session explained concepts as well as covered tips, tricks, and traps to avoid when working with data. Together they can help you get more organised, better understand your data, ease the friction of collaborating with others, see new opportunities, and develop working practices that make it harder to be wrong. Slides here.

  5. Election data: from source to screen

    Visual Tools to Empower Citizens, University of Girona, Online

    Guest lecture on the data processing pipelines that powered the Financial Times’ poll tracker and live results page for the 2020 US election.

  6. Lessons learned extracting data from documents

    Dataharvest Digital 2020, Online

    Some of the most interesting datasets started life ‘unstructured’ – as documents, emails, web pages, images, videos, and other formats that look nothing like a spreadsheet. This session covered the challenges in extracting data from these formats, what tools are available, and approaches for verifying the results. Slides here.

  7. Stories from the command line

    Global Investigative Journalism Conference 2019, Hamburg, Germany

    For those taking their first steps with data and code, the command line is essential. There are also many useful command line based applications – understanding it opens the door to these power tools. This session covered how it works, the basic commands and concepts, and some of the tools which can be useful in data investigations, including story examples. Slides here.

  8. Why code?

    CIJ Summer Conference 2019, London, UK

    How is code being used in newsrooms to find stories? If you’re just starting out, where should you start, and how should you approach learning such skills? Panel with Helena Bengtsson and Niamh McIntyre, moderated by Leila Haddou.

  9. How can code help your journalism?

    CIJ Summer Conference 2018, London, UK

    An introduction to how code is used in the newsroom, with recent story examples, explaining the fundamental concepts and demystifying the jargon. We also guided attendees through the most common programming languages, and gave a roadmap to deciding which to pursue. Slides here.

  10. Don't fear the robots: five reasons investigative journalists should welcome automation

    Dataharvest 2018, Mechelen, Belgium

    This talk explained the ways automation is already being used in newsrooms, why the coming wave of automation is not a threat, and how we can embrace this new technology to improve the quality of investigative reporting at a time of shrinking newsroom resources. Slides here.

  11. Building a culture of knowledge sharing in journalism

    Dataharvest 2017, Mechelen, Belgium

    As data becomes increasingly important to journalism, reporters need to keep their skills up to date. However, newsrooms have less budget for training and conferences than ever before. This was a lightning talk on how Journocoders tries to solve these problems. Video here. Slides here.

  12. The power of fuzzy: finding connections in a messy world

    CSV Conf 2017, Portland, USA

    Like our reality, our data is often messy. Finding meaningful connections between such datasets often means using fuzzy matching algorithms. This was a high-level look at some of the most commonly used algorithms, their pros and cons, and how they are used in practice. Slides here.

  13. Data for investigations: beyond Excel

    Food Research Council Workshop 2015, London, UK

    Though much data-led reporting is done in Excel, some can only be reported using other tools. This talk ran through a few stories which took different approaches. Write-up here. Slides here.

  14. Muckrakers and tech makers

    Dataharvest 2015, Brussels, Belgium

    Communication difficulties are common between journalists and technologists. This was a talk with an investigative reporter on our experiences working together at the Guardian.

  15. Digital journalism: the news and tech

    Web We Want Festival 2014, London, UK

    Panel discussing how news organisations have been challenged and transformed by the web, and how this has changed the way they interact with readers. Video here.

  16. Using the Guardian's Content API

    MediaHackDays 2014, Aarhus, Denmark

    An introduction to the Guardian's Content API, which lets developers build their own applications using Guardian content. I was also a judge for the prize for the best use of the API. Write-up here.