Converting PDFs to data

  • Event: 2019 IRE Conference
  • Speaker: Disha Raychaudhuri of NJ Advance Media
  • Date/Time: Friday, Jun. 14 at 10:15am
  • Location: River Oaks A
  • Audio file: No audio file available.

This class will cover basic approaches for getting text out of PDF documents using powerful, freely available tools. We'll introduce core concepts and walk through common challenges posed by tricky PDF documents.

This session is good for: People who are unfamiliar with PDF-to-text tools or who would like to learn how these tools can be used to extract difficult text from images embedded in a PDF document.

Speaker Bios

  • Disha Raychaudhuri is a reporter on the data and investigations team at NJ Advance Media. Prior to joining NJ Advance Media, she was a reporter in India and Bangladesh, where she covered education and produced oral history projects. Follow her on Twitter @Disha_RC.

Related Tipsheets

No tipsheets have been uploaded for this event yet.