2012 CAR conference blog

From where? Validating data in the real world

By Anna Boiko-Weyrauch
@AnnaBoikoW

To understand your data, let’s go back to grade-school science class. Remember when you learned about the forest, and all the animals that call it home? The forest is a dynamic ecosystem. Your data is like a chimpanzee; it plays a role in the forest ecosystem. Over time, changes in the environment will affect your data, just as they would the chimp.

In the session, “OK, but where did that data come from? Data validation in the digital age,” J.T. Johnson, managing director at the Institute for Analytic Journalism, said journalists need to remember that their data had a life before it was requested, downloaded, or scraped.

In the agencies where your data used to live, it was constantly evolving: people added to it, changed the names of fields, and even renamed the whole dang thing. Johnson said journalists should do a “biography” on their data and find out just how truthful it might be.

He said you should ask, “Have definitions changed over time?” “Did the agency change their collection methods?” “How is the data entered in the first place: at a desk in City Hall, or by a worker who is paid per keystroke in Haiti?” Johnson said to remember that just because your data came from an official agency doesn’t mean it is right.

Cheryl Phillips, The Seattle Times Data Enterprise Editor, gave some examples of inaccurate data and how to deal with it. While building a map of bike accidents in Seattle, her team found that most accidents were recorded as taking place at noon. Well, the accidents didn’t really happen at that time, so they threw out that column when compiling the map. When Washington State spending figures showed an unusual dip, instead of just publishing the figures, The Seattle Times asked around and found out that the agency hadn’t included federal funds in its calculation.
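
The noon problem is the kind of thing a quick frequency count can surface before you start mapping. Here is a minimal sketch in Python, not the method The Seattle Times used: the function name, the sample times, and the 50 percent threshold are all assumptions made up for illustration.

```python
from collections import Counter


def dominant_value_share(values):
    """Return the most common value in a column and the share of rows that carry it."""
    if not values:
        raise ValueError("column is empty")
    counts = Counter(values)
    value, count = counts.most_common(1)[0]
    return value, count / len(values)


# Hypothetical accident times, invented for this example (not real Seattle data).
crash_times = ["12:00", "12:00", "08:15", "12:00", "17:40", "12:00"]

value, share = dominant_value_share(crash_times)

# The 0.5 cutoff is an arbitrary assumption; pick one that suits your data.
if share > 0.5:
    print(f"Suspicious: {share:.0%} of rows report {value}")
```

A single value that dominates a column doesn’t prove the data is wrong, but it is a good reason to call the agency and ask whether it is a default entry.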

Phillips and Johnson recommended keeping a baby book of your data and bringing your analyses in to the doctor for a check-up. Make a log of your steps during analysis (you might use Notesync), track the record numbers at every step and make a comment in Excel when you do calculations. Do a literature review on your data, and consult with stats experts to confirm your findings.
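
If you do your analysis in code rather than in a spreadsheet, the same baby-book habit can be sketched in a few lines of Python. This is only one way to do it under stated assumptions: the AnalysisLog class, the step descriptions, and the sample rows are hypothetical and were not part of the session.

```python
class AnalysisLog:
    """A running log of analysis steps and how many records survive each one."""

    def __init__(self):
        self.entries = []

    def record(self, step, rows):
        # Note the step and the record count, then hand the rows back unchanged.
        self.entries.append((step, len(rows)))
        return rows


# Hypothetical records, invented for this example.
raw_rows = [
    {"id": 1, "location": "Pine St"},
    {"id": 2, "location": ""},
    {"id": 3, "location": "Dexter Ave"},
    {"id": 4, "location": "Eastlake Ave"},
]

log = AnalysisLog()
rows = log.record("loaded raw file", raw_rows)
rows = log.record(
    "dropped rows with no location",
    [r for r in rows if r["location"]],
)

for step, count in log.entries:
    print(f"{count:>6} rows after: {step}")
```

When the count changes between steps, the log tells you exactly where records went missing, which is the question an editor or a stats expert is likely to ask first.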

When it comes time to publish the story, Johnson and Phillips said you can be transparent with your methods by using Google Fusion Tables or publishing an article in the newspaper on your methodology. Your data has a long history; let your readers get to know that side of the story, too.

Anna Boiko-Weyrauch is a graduate student at the University of Missouri School of Journalism.
