Using data to cover the racial inequality beat

By Alexis Allison

“The only thing that white people have worse than black people is osteoporosis,” Nikole Hannah-Jones, a staff writer at The New York Times Magazine, said during the “Investigating racial inequality” panel.

“That’s the amazing thing about America,” Hannah-Jones said. “Anything you want to measure, somebody’s tracking it based on race.”

Susan Smith Richardson of the Solutions Journalism Network began the session by acknowledging that most journalists don’t disagree that racial inequality exists. The real question for her, for Hannah-Jones, for Teresa Córdova, director of the Great Cities Institute at the University of Illinois at Chicago, and for Ron Nixon, homeland security correspondent at The New York Times, was “how do we document that, how do we investigate that with authority, accuracy, context and impact?”

For Hannah-Jones, stories that merely describe the existence of racial inequality aren’t good enough.

“Anything you can measure, just about, black Americans are on the bottom … we need to ratchet up what we’re doing and show how these things are happening,” Hannah-Jones said.

She recommended journalists cover racial inequality like any other beat.

“Racial disparities come from real people who make real decisions and take real actions that harm real people,” Hannah-Jones said.

Understanding those disparities requires a deep understanding of the law, whether it be the Civil Rights Act, Fair Housing Act, Home Mortgage Disclosure Act, Equal Credit Opportunity Act, Housing and Community Development Act or Equal Employment Opportunity Act. In education, the key cases are Brown v. Board of Education, Green, Milliken and Keyes.

Quite a few federal and state agencies house civil rights divisions tasked with enforcing those provisions. Find those divisions, Hannah-Jones said, ask who is enforcing what and how, and reporters might just find a story.

Next, she recommended reporters seek out data and reports, like those aggregated in Brown University’s Diversity & Disparities project — analyze them, interrogate them and become familiar with measurements such as the dissimilarity index, which quantifies segregation.
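The dissimilarity index compares each census tract’s share of one group with its share of another. Here is a minimal sketch in base R; the data frame and its columns are made-up placeholders for whatever tract-level counts you pull.

```r
# Dissimilarity index: half the sum of the absolute differences between each
# tract's share of group A and its share of group B. It ranges from 0 (even
# distribution) to 1 (complete segregation).
dissimilarity_index <- function(group_a, group_b) {
  0.5 * sum(abs(group_a / sum(group_a) - group_b / sum(group_b)))
}

# Hypothetical tract-level counts, for illustration only
tracts <- data.frame(
  black_pop = c(1200, 300, 50, 2000),
  white_pop = c(100, 900, 1500, 200)
)

dissimilarity_index(tracts$black_pop, tracts$white_pop)
# Roughly the share of either group that would have to move to a different
# tract for the two groups to be evenly distributed.
```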

Amid these searches, Hannah-Jones said, reporters can get caught up in intent, which is difficult to prove. She said that a person doesn’t have to show intent to violate someone’s civil rights. Intent doesn’t matter — but consequences do.

“If Exxon Mobil has a spill in the gulf, you don’t actually care whether (the director) likes ducks or not,” Hannah-Jones said.

Finally, reporters can’t forget that they’re covering actual people with real problems.

“Remember that you’re not writing about data points,” Hannah-Jones said. “Every data point is a human being, often children ... make sure that you’re writing about humans and that you actually find some humans for your story.”

Córdova outlined the resources and research available through the Great Cities Institute; for example, when the institute publishes a report, it follows up with a blog post that inventories how media outlets used that report.

“You can access all of the reports at once on the website, as well as the journalism that’s come out of it,” Córdova said.

Nixon outlined four programs that have a significant impact on communities of color and poor communities: subprime loans, payday lending, community development block grants and the EB-5 visa program.

Ultimately, Nixon said the racial inequality beat touches all people.

“Writing about race is not just writing about black and brown people,” Nixon said. “White people have race, too … everything’s about race, particularly in this country.”

Alexis Allison is a journalism student at the University of Missouri.

By Virginia Ward

A series of small compromises between players and coaches often leads to high-risk operations within sports organizations. Syracuse University professor Jodi Upton, USA Today database editor Christopher Schnaars and Raycom Media investigative producer Jill Riepenhoff shared their experience investigating college and youth sports.

From major infractions to Title IX investigations, journalists are uncovering scandal after scandal within college sports. When Upton reported with USA Today, she investigated college football coaches’ salaries.

Schnaars and his investigative team at USA Today spent the past 10 years creating a salary survey for coaches within power conferences. His tips for reading financial reports include:

Schnaars said when reporters request contracts, they commonly forget to ask for amendments and records of bonuses paid. More often than not, contracts contain perks packages filled with automobiles, country club memberships, vacations and private use of aircraft.

Upton said the FBI found 29 schools that were involved in bribery scandals, with one-third of those schools in the Power Five conferences.

“Don’t think for a minute that there is no story locally,” Upton said. “This is going on in, I guarantee you, every major university in the country.”

Riepenhoff said there are many untold stories within youth and high school sports. In her work, she’s found that media rarely investigate high school athletics. Because these organizations wield immense power over teenagers and collect large sums of cash from state tournaments, Riepenhoff said journalists should start exploring public records from these nonprofits.

Here are Riepenhoff’s tips for investigating sports organizations:

Virginia Ward is a journalism student at the University of Missouri.

By Meredith McGrath

Want to make sure your data is bulletproof and fact-checked so there aren’t any holes? Arm yourself with these tips from Tisha Thompson, investigative reporter for ESPN, and Sandhya Kambhampati, data reporter for ProPublica Illinois.

Get organized

When starting out, create a text file or a Word document and record basic information on the project. Make a file folder and name it with the story’s slug. Keep all your work related to the project in this folder, including PDFs of any emails you received from a FOIA officer. Save the raw data file here and make a copy of it. Don’t touch the original, so you’ll always have the pure data on hand.
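That setup can even be scripted so every project starts the same way. A minimal sketch in R, assuming a hypothetical story slug and FOIA file name:

```r
# Create a project folder named after the story slug, with an untouched
# raw/ copy and a working/ copy for analysis. All names here are hypothetical.
slug <- "jail-deaths"
dir.create(file.path(slug, "raw"), recursive = TRUE)
dir.create(file.path(slug, "working"))

file.copy("foia_response.csv", file.path(slug, "raw", "foia_response.csv"))
file.copy("foia_response.csv", file.path(slug, "working", "foia_response.csv"))

# Optionally make the raw copy read-only so it can't be edited by accident
Sys.chmod(file.path(slug, "raw", "foia_response.csv"), mode = "0444")
```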

Keep a data diary

While it may sound labor-intensive, keep a data log or journal and track the changes you make to your data set. This will help you reproduce your work, and if your data analysis is ever challenged, you’ll have a specific log of exactly what you did. Document how you clean your data before you start cleaning it.
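One way to keep the diary honest is to make the cleaning script itself the log, so every change can be re-run from the raw copy. A minimal sketch, with hypothetical file and column names:

```r
library(dplyr)

# Data diary as a script: each step notes what changed and why.
raw <- read.csv("jail-deaths/raw/foia_response.csv", stringsAsFactors = FALSE)

cleaned <- raw %>%
  # Dropped the agency's "TOTAL" row hiding at the bottom of the export
  filter(facility != "TOTAL") %>%
  # Standardized facility names so they group cleanly
  mutate(facility = trimws(toupper(facility))) %>%
  # Converted the date field from text to a real date
  mutate(incident_date = as.Date(incident_date, format = "%m/%d/%Y"))

write.csv(cleaned, "jail-deaths/working/foia_response_clean.csv", row.names = FALSE)
```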

Check for smelly data

When you first get a data file, check to see what’s wrong with it. There’s no such thing as a perfectly clean data set. Always look for holes. Check for totals hidden in the bottom of your Excel file and extra characters hidden in cells. Look for nulls and missing values. Are there any spelling mistakes?
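A first pass at those checks can be a handful of one-liners. A sketch in R, again with hypothetical file and column names:

```r
library(dplyr)

salaries <- read.csv("working/salaries.csv", stringsAsFactors = FALSE)

# Missing values and blank cells, column by column
colSums(is.na(salaries) | salaries == "")

# Stray total rows often hide at the bottom of exported spreadsheets
tail(salaries)

# Extra whitespace or stray characters in a text field
salaries %>% filter(name != trimws(name))

# Spelling variants that should be a single value
sort(unique(salaries$department))

# Outliers that may be typos rather than news
summary(salaries$amount)
```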

Obtain a data dictionary

Ask the agency that gave you the data for a data dictionary, which defines each field. Don’t assume you know what each field is. If the agency won’t give you one, call them out on it and let them know you need it for accuracy.

Slow down, and don’t take shortcuts

After you’re done interviewing a person, you know them inside and out. Know your data the same way. If something stands out as an outlier, be aware that it could be a hidden mistake. Check on this. Be meticulous.

Talk through your work

Find someone who doesn’t use data (maybe your mom or grandma) and show your work to them. Explain what you did and why. Talking out loud helps you identify mistakes. Ask them what they want to know about the data.

Check in with academics

Consult with experts and researchers about your data analysis and any gaps you find. Have them poke holes in your analysis. Keep going back to them, especially if your work is complex.

Go easy on the numbers

Don’t overload a reader with a lot of numbers in your story. There’s a point where the reader glazes over and doesn’t think them through. Pick the most important information to include.

Replicate your work

Overestimate the amount of time it will take you to check your work. Re-import your data from scratch. Mistakes can happen while importing data. Have your colleagues replicate your queries. The more eyes that can look over your work, the better.

Stand by what you publish

Be brave enough to stand up for what you put out for public consumption. Add a “nerd box” to your story on your site that explains how you obtained your data and summarizes your analysis.

Meredith McGrath is a journalism student at the University of Missouri.

By Tyler Wornell

The College Scorecard is a database with a treasure trove of data about higher education institutions, providing information about graduation rates, debt repayment rates and median income for career fields. There’s a wealth of story ideas sitting in the database, and knowing what data is there and how to use it can help you get started.

In Thursday’s panel “2,000+ data points on 7,000+ colleges: How to make stories & sense of the College Scorecard,” Sarah Butrymowicz from the Hechinger Report, Andrew Kreighbaum from Inside Higher Ed and Kim Clark from the Education Writers Association discussed the information available in the College Scorecard database and some key data errors to watch out for.

So, what exactly is the College Scorecard? It was created by the Department of Education during President Barack Obama’s administration to provide information about college outcomes and create a sort of college ranking system. It includes data on 7,000 colleges and is the largest-ever release of higher education data.

It shows earnings data for a typical graduate of each university, median debt held by graduates and repayment rates on graduates’ student loan debt, as well as other metrics. The data only represents students who received some sort of federal financial aid, which limits the sample size.

The data is great for comparing outcomes of schools across states and regions. It can answer questions such as, which colleges produce the biggest earners?

Additionally, the data can help raise further questions about institutions’ actions. Are they aware of their graduates’ struggles repaying loan debt compared to similar schools? What are they doing about it? Do they have information on whether graduates with certain majors struggle to repay loans? These are just some of the things to think about when analyzing the data.

There are flaws in the data to be aware of. Some fields are empty or suppressed based on enrollment at the college. If you’re looking at universities that are part of larger systems, the data gets clunky because of the codes by which schools are identified. For the debt data, systems that have multiple campuses are only tagged under one identifying code, making it impossible to disaggregate the campus-level data. Be aware of that if you see duplicates in the data.
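With those caveats in mind, a first pass might look like the sketch below. The file and column names (INSTNM, OPEID6, MD_EARN_WNE_P10, GRAD_DEBT_MDN) follow the Scorecard’s published data dictionary, but verify them against the release you download; suppressed cells arrive as the text “PrivacySuppressed.”

```r
library(dplyr)

scorecard <- read.csv("Most-Recent-Cohorts-All-Data-Elements.csv",
                      stringsAsFactors = FALSE,
                      na.strings = c("NULL", "PrivacySuppressed"))

# Multi-campus systems can share one six-digit OPEID, so debt figures repeat
scorecard %>%
  count(OPEID6, sort = TRUE) %>%
  filter(n > 1)

# Which institutions report the highest typical earnings for aided students?
scorecard %>%
  select(INSTNM, MD_EARN_WNE_P10, GRAD_DEBT_MDN) %>%
  mutate(MD_EARN_WNE_P10 = as.numeric(MD_EARN_WNE_P10)) %>%
  arrange(desc(MD_EARN_WNE_P10)) %>%
  head(10)
```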

With a little bit of cleaning, the College Scorecard is an easily accessible database that has stories ready to be written.

Tyler Wornell is a journalism student at the University of Missouri.

By Virginia Ward

In his CAR Conference session on demystifying data, Hadley Wickham said his job is to push R as far as it can possibly go.

The chief scientist at RStudio develops free tools for R, an open-source statistical programming language. He is also an adjunct professor of statistics at the University of Auckland in New Zealand, Stanford University and Rice University.

Wickham said it’s common for the blinking cursor to intimidate users when they first open a database. His first tip for demystifying data science is to understand that programming languages are just that: languages, written in text. That text can not only be copied and pasted, but also reproduced and shared.

Wickham said the best way to learn a programming language is by joining an online community. New and advanced users can troubleshoot together through communities like https://rweekly.org or https://rladies.org.

Code-sharing platforms like GitHub can also enhance communication between developers and journalists, making the experience of learning R feel less daunting.

He said while learning a coding language for the first time can be difficult, it will pay off in the long run. Wickham encouraged data journalists to continue updating their knowledge because things keep changing.

“Embrace the change,” Wickham said. “You don’t want to end up stuck using technology from 30 years ago.”

It’s not just the individual pieces of R that give users power, Wickham said. It’s the glue. The language can solve complex problems by combining simple pieces. Those pieces can be learned along the way and help journalists solve problems in future projects.
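A tiny illustration of that glue: a few small verbs piped together to go from raw rows to a summary and a chart. This is only a sketch; the salaries data frame and its columns are made up.

```r
library(dplyr)
library(ggplot2)

# Hypothetical city payroll data
salaries <- data.frame(
  department = c("Police", "Police", "Fire", "Fire", "Streets"),
  amount = c(72000, 65000, 81000, 78000, 54000)
)

# Group, summarize, sort and plot in one pipeline of simple pieces
salaries %>%
  group_by(department) %>%
  summarise(median_pay = median(amount)) %>%
  ggplot(aes(x = reorder(department, median_pay), y = median_pay)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Median pay", title = "Median pay by department")
```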

While Wickham said R can be used to do just about anything, journalists can specifically use the language to tell great visual stories. Wickham hopes journalists can start asking what they want to do with their data rather than how to use their computers to work with it.

“The goal of RStudio is to have a positive impact on the world by creating open-source content,” Wickham said. “Giving people the tools to understand data is really important.”

Virginia Ward is a journalism student at the University of Missouri.

 

By Yue Yu

Kevin Collier from BuzzFeed News, Neena Kapur from the New York Times and Margot Williams from The Intercept shared experiences and tips at the CAR Conference on constructing a secure workstation while pursuing sensitive leads.

Collier talked briefly about the history of hackers who worked with journalists to produce big stories and ended up in jail. Although prosecution doesn’t completely wipe out hackers, Collier said, it does create a chilling effect on them.

Skepticism is necessary when working with hackers, Collier said. He shared his experience reaching out to a hacker named “Guccifer 2.0” who claimed he had sensitive information from the Democratic National Committee. Guccifer 2.0 was later identified as a persona created by Russian intelligence.

Treat hackers the same way you treat every source, Collier said, and ask them for technical details about how exactly they got the information they are sharing with you. Even if you don’t speak the language of data, he said, getting help from a media-friendly expert is worthwhile.

Kapur, an information security analyst for the New York Times, offered tips for reporters to set up secure workstations and protect personal data.

A security breach is serious because it could compromise devices, personal data and source identities, and could create misunderstandings with government agencies, Kapur said. One way to reduce the risk is to use a separate device for research, she said. Using a MiFi hotspot at a coffee shop, paying with gift cards and using encrypted USB drives are all ways to minimize the risk of being hacked.

Setting up separate, inconspicuous accounts with complex passwords can also help, Kapur said. Burner phones, VPNs and browsers that mask IP addresses and allow anonymous use also increase the security of a work environment.

Williams elaborated on safe searching and using secure browsers. The tools she has used on her research computer include:

Yue Yu is a journalism student at the University of Missouri.

In response to high demand, we’re adding six hands-on classes today from 4:45-5:45 p.m. Originally, we had no labs scheduled during this time slot due to Lightning Talks. However, we want to be as responsive as possible to the number of attendees seeking hands-on training.

Here are additional hands-on sessions offered today, beginning at 4:45 p.m.:


Python 1: The fundamentals

Location: Lincolnshire

Trainer: Cody Winchester, IRE & NICAR

Description: An introduction to the Python programming language for absolute beginners. This class will cover basic fundamentals and syntax to prepare you for more advanced classes with a focus on data processing and analysis. This session is good for: People who are comfortable working with data in spreadsheets or database managers and want to make the leap to programming.


CARwash: Data cleaning in Excel

Location: Purdue

Trainer: Megan Luther, Raycom Media

Description: Dirty data lurk everywhere: in text files, spreadsheets, databases and PDFs. We'll walk you through some examples of the most common types of dirty data, point out telltale signs of data illness and explain how you can whip data into shape using some simple tools and methods. This session is good for: People with some experience working with data in columns and rows, in spreadsheets or database managers.


R 1: Intro to R and RStudio

Location: Northwestern / Ohio State

Trainer: Charles Minshew, IRE & NICAR

Description: Get a hands-on look at R, the powerful open-source programming language specifically designed to analyze data. In this class we'll explore Chicago municipal salary data, plus get a sneak peek at a couple of other cool things R can do in just a few lines of code. This session is good for: People who are comfortable working with data in a spreadsheet or database, and won't be afraid of typing into a command line.


Excel 1: Getting started with spreadsheets

Location: Great America

Trainer: Sarah Hutchins

Description: In this introduction to spreadsheets, you'll begin analyzing data with Excel, a simple but powerful tool. You'll learn how to enter data, sort it, filter it and conduct simple calculations like sum, average and median. This session is good for: Data beginners.


Excel 3: Pivot tables

Location: Michigan / Michigan State

Trainer: Mark Walker, IRE & NICAR

Description: A look at the awesome power of pivot — and how to use it to analyze your dataset in minutes rather than hours. This session is good for: Anyone familiar with formulas, sorting and filtering in Excel or another spreadsheet program.


SQL 1: Exploring data

Location: Indiana

Trainer: Denise Malan, IRE & NICAR

Description: Learning to manipulate data is a bit like learning a new language. Actually, it is a language, called structured query language (SQL). This session is an introduction to using SQL to zero in on your data by viewing slices and chunks of it and putting it into a useful order so you can spot the stuff you need to get started toward a story. We'll use SQLite and DB Browser, a free database manager. This session is good for: People with some experience working with data in columns and rows, in spreadsheets or database managers.

By John Sadler

Keeping a focus on your local coverage area can be difficult in the current information climate — idea generation, watchdogging and source cultivation all need to be juggled.

In Thursday’s panel “Putting your town under a microscope — and keeping it there,” John Diedrich of the Milwaukee Journal Sentinel, Matt Kiefer of The Chicago Reporter and Kate Howard of the Kentucky Center for Investigative Reporting shared tips for comprehensively covering your community.

First, double-check your archives. There’s nothing worse than diving into a topic to find it was already done a year before you got the job.

“You’re probably super smart — I’m sure your ideas are amazing — but there are also other people who may have had it, so make sure you’re looking first to see who else has done your beat well,” Howard said.

Flood the zone

Howard said an invaluable way to ensure widespread coverage is to spread your reporting efforts across your community. Ask for employee lists and salary data so you can add context if a crisis hits a certain agency or organization. Ask for emails, even if you don’t necessarily need them.

Record retention documents are also worth requesting because they may explain why a records request could be denied (and, therefore, how to argue your case). Requesting lists of audits, both internal and external, completed and planned, will give you a sense of problems within the agency.

Reading meeting minutes and agendas is one way to stay up to date on things you may have missed, Howard said. She also stressed reading your competitors not with an eye for what you’ve been scooped on, but with an eye on what you can add to the story that’s now been brought to the public’s attention.

Fight for records

Kiefer, the data editor for the Chicago Reporter, said to make a courtesy phone call before the records request to try to clear up the format of the request, and reverse-engineer public forms to figure out what the records might look like.

If that fails, Kiefer gets creative. “If they give you pushback about exporting the data, like electronically, what I’ll often do is ask them what kind of software they use and then look up the user manual for that software and send it to them.”

Don’t neglect the human element

Follow up on your project — treat your stories like data sets that need updating and keep them relevant. And don’t hide from criticism.

“I think (engaging with critics) makes an impression on people that you own your stories and you’re willing to step up for them,” Howard said.

Diedrich, who worked on the Journal Sentinel’s “Burned” investigation into drum-reconditioning facilities, an effort that relied heavily on whistleblowers, said scheduling time for source cultivation works wonders.

“I really try to carve out time to go out with sources and do source lunches on a regular basis,” he said. “The best place, for not just cops but really any beat, is to hang around a courthouse because it is just like a feeding frenzy of sources and news tips.”

John Sadler is a journalism student at the University of Missouri.

By Kelsie Schrader

For many, data journalism is a complex and daunting task. It requires time, skill and access to data and sources. Data stories on hard-to-access, marginalized communities, then, can often seem unapproachable.

The perceived difficulties of reporting on marginalized communities have resulted in a lack of data stories about and for non-white, non-elite communities. Three journalists discussed this issue at the CAR Conference and offered tips for how to develop a data story on a marginalized community.

Adriana Gallardo, an engagement reporter for ProPublica, and Anjeanette Damon, a government watchdog reporter for the Reno Gazette-Journal, have both reported on sensitive, undercovered populations. Gallardo covered maternal mortality in the U.S., which has the highest rate of maternal deaths of any developed country. Damon reported on prison deaths in Reno that increased significantly after a new sheriff took over. Both stories came with challenges, such as difficulty accessing data and sources, but both positively affected marginalized communities and inspired conversation and change on local and national scales.

Eva Constantaras, a data journalist at Internews, used these stories as examples of quality data stories on marginalized communities. She offered three main steps to help journalists begin developing data stories on similarly marginalized populations.

Start with the background

Before jumping into a story, see what’s already been reported in other outlets. Stories on marginalized communities often come from breaking news stories. Use headlines in other papers as an opportunity to reveal the systemic issues underlying the breaking news of the day. Analyze what is and isn’t being covered. See what data is available on the subject. Discover what you can add to the conversation.

Form a hypothesis

After you’ve analyzed what already exists on the topic, identified what’s missing and examined available data, write your hypothesis. What do you think your story is about? What could be the causes of inequality and discrimination? Be careful to avoid common mistakes with hypotheses, such as statements that are too simple, too broad, too narrow or unable to be proved with data. 

Develop questions that will prove whether your hypothesis is true

Questions can fall into four categories: problem questions, impact questions, cause questions and solution questions.

Ultimately, Constantaras advocated for stories that have an impact. Stories should give marginalized communities information they can act on and use to engage policymakers. “These problems are everywhere,” she said. “Governments will brag about policies they say are helping people, and it’s our job to check it.”

Kelsie Schrader is a journalism student at the University of Missouri.

By Dariya Tsyrenzhapova

Location is a common thread that can drive a story and reveal meaningful findings that better serve a community. According to Victor Hernandez of Banjo, geodata can also serve as a catalyst for “a technological and a reporting breakthrough,” helping journalists tell hidden or overlooked stories in underserved communities.

Joe Yerardi, a data reporter at the Center for Public Integrity, said GIS tracking and mapping can provide a leg up in reporting stories about the environment, natural disasters, health care and education. “Journalists can really better understand and master the early indicators around the location of a story,” Yerardi said.

Stephen Jefferson, a co-founder of Bloom, a geotagging platform for local news, sees geodata as more than just a hashtag. It’s real data that provides “a bridge from digital to reality,” he said. While it offers important insights into the lives of local communities at large, newsrooms can also use geodata to better understand what their audiences need and want at a given time.

This so-called “editorial intelligence” can equip news organizations with ideas for relevant news coverage of communities they serve, Jefferson said. Metadata can yield compelling insights into user behavior: “Is the local community actually aware of the story, or are people viewing this outside of the community? For readers who are engaging the most with the story, where are they?”

The term “location intelligence” initially emerged from the business side as a strategy to target consumers. But that same idea could be extended to journalistic storytelling, said Amy Schmitz Weiss, an associate professor at San Diego State University. In the meantime, though, Weiss warned that geodata should not be taken at face value – trust, but verify. “If in doubt, don’t go with it,” she said. “There are still opportunities for manipulation. Be skeptical. Interviewing geodata is like interviewing any source.”

To bulletproof your results, Yerardi suggested posing these questions:

A tipsheet put together by the panelists provides a list of mapping tools that will help perform geospatial analysis and enhance story presentation, and also offers story ideas for using geodata to elevate the quality of enterprise reporting and storytelling.
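Much of that geospatial work comes down to joining points to the places they fall inside. Here is a minimal sketch using the sf package in R; the file names and the neighborhood column are hypothetical.

```r
library(sf)
library(dplyr)

# Hypothetical complaint records with longitude/latitude columns
complaints <- read.csv("complaints.csv") %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326)

# Hypothetical neighborhood boundaries
neighborhoods <- st_read("neighborhoods.geojson")

# Attach each complaint to the neighborhood it falls inside, then count
st_join(complaints, neighborhoods, join = st_within) %>%
  st_drop_geometry() %>%
  count(neighborhood, sort = TRUE)
```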

Dariya Tsyrenzhapova is a journalism student at the University of Missouri.
