The IRE Resource Center is a major research library containing more than 23,250 investigative stories — both print and broadcast. Add to that more than 3,000 tipsheets from our national conferences on how to cover specific beats or do specific stories and you have a resource that no reporter or editor should be without. These stories and tipsheets are searchable online or by contacting the Resource Center directly (573-882-3364 or rescntr@ire.org) where a researcher can help you pinpoint what you need. Browse or search the tipsheet section of our library below. Logged-in members can view the tipsheets free online:
Search results for "data extraction" ...
-
Human-assisted reporting: How to create robot reporters in your own image
Welch explains how to use programming to extract information from a continuous data stream and describes the potential stories that may come out of it.
Tags: programming; computer-assisted reporting; CAR; data analysis
-
Environmental analyses for any newsroom
Lucas and Golden walk through the steps to analyze and extract information from environmental data.
Tags: environment; environmental analysis; environmental data; pollution; data; epa
-
Creative Uses of Webscraping
Todd offers insight into data scraping (the programmatic extraction of data from human-readable output) and focuses specifically on the possibilities of webscraping.
Tags: data; collection; scraping; webscraping
-
PDF utilities
This document provides information on a number of PDF utilities that enable one to extract data from PDFs.
Tags: pdftotext; pdftk; ImageMagick; tesseract; qpdf
-
Lightning Talk: Point, Click EXTRACT!
Gebeloff lists the options available for PDF table conversion.
Tags: PDF; data; table conversion; PDF2XL; ABLE2EXTRACT; XPDF
-
Quick shortcut for using PDFTOTEXT
Milholland discusses shortcuts for converting a PDF to text using the PDFTOTEXT application.
Tags: data extraction; data; PDF to text; converting PDFs
-
Web Scraping
Heath details web scraping. "'Scraping' means extracting data from websites. Think about any site that lets you search a data set. How much better would it be if you could get the whole thing?" Heath explores the benefits of and reasons for scraping, and the tools you'll need to accomplish the task.
Tags: web scraping; data set; database; excel; scripting; Perl; PHP; Ruby; VB; C#; Open Kapow; Yahoo Pipes
-
Mining the Web for Data, Part 2
This tipsheet is a detailed, in-depth guide to scraping the web for data. The authors walk you through various scenarios involving different web situations and explain the best way to extract data from websites.
Tags: data; web scraping; data analysis; programming; data mining
-
Programming skills: An introduction to Perl and regular expressions
Perry's PowerPoint covers Perl (Practical Extraction and Report Language). With a disclaimer that you can't learn Perl in an hour, Perry's presentation aims to provide: a basic understanding of the philosophy behind Perl; an introduction to Perl syntax; an introduction to basic regular expression syntax; common Perl idioms; common errors; and a list of where to find help.
Tags: Perl; syntax; Practical Extraction and Report Language; UNIX; regular expressions; parsing data
-
(IPUMS) Integrated Public Use Microdata Series
This PowerPoint presentation details the uses of IPUMS, the Integrated Public Use Microdata Series. This microdata can be used when examining census statistics to "build your own tables to answer questions the summary file tables don't answer." PUMS was the Public Use Microdata Sample, which had issues including a "tricky hierarchal format." IPUMS, from the University of Minnesota, is an online data extraction engine that is available free on the Web.
Tags: PUMS; IPUMS; Integrated Public Use Microdata Series; Public Use Microdata Sample