By David Rodriguez
Data journalism is still new to me. Despite that, I decided to dive in head first and attend the recent NICAR conference in Chicago.
It was one of my best experiences as a journalist so far.
I was reluctant to attend after feeling ignored for being "just an intern" at another event last year. But my editor at The Investigative Reporting Workshop, where I now intern, pushed me to go, promising that NICAR would be different. I’m glad that I did.
I arrived late, hungry and eager and dashed into a session on digging deep for radio and podcasts. It was a preview of what was to come over the next few days.
That hour blew my mind.
The conference led me to think about alternative sources for data — even for hidden communities. Bernice Yeung, a reporter for Reveal from the Center for Investigative Reporting, where I interned last year, described how she had to build her own database to report on undocumented workers who experience rape or sexual assault on the job. She recommended that journalists be upfront about the limitations of their data. Mary Hudetz, an Associated Press reporter who covers Indian Country and law enforcement in Albuquerque, New Mexico, said that local nonprofits are often good sources of data not kept by government agencies.
I raced through the crowded conference floors to make it to as many sessions as I possibly could. I heard sessions on investigating hate when the data isn’t there, investigating racial inequalities, investigating immigration and 30 neat tools available for all journalists.
But it wasn’t just the sessions that impressed me. Seasoned data journalists took the time to talk with me about their experience starting in journalism. Despite how large the conference has grown over the years, what hasn’t changed is the willingness of attendees to help young journalists like myself.
Since the conference, I’ve been asked what my favorite sessions were. I’m still processing all the tips offered by some of the world’s best journalists. I’m eager for the next NICAR conference. I can’t wait to see what methods journalists use to produce amazing stories in a time when we all need investigative journalism.
David Rodriguez is an intern at The Investigative Reporting Workshop. Reach him at davidrodriguezreporting@gmail.com or on Twitter at @DaveeJonesLock.
IRE is making it easier than ever to continue learning after the CAR Conference in Chicago.
In addition to our tipsheets & links page, we're providing all the data from hands-on classes as well as install guides to help you set up your computer with the software used in the hands-on labs.
Here’s a list of all the resources available on our website:
Speakers can send their materials to tipsheets@ire.org to add to the collection.
By John Sadler
Hate crimes are on the rise. According to the FBI’s annual report, which is released near the end of the following year, 2016 was the second year in a row in which reported hate crimes rose.
There is reason to believe this data is not comprehensive, though. It relies on local police reports, many of which are incorrectly classified. And it doesn’t account for unreported crimes.
Data on hate crimes was one of the topics covered in Thursday’s “Investigating hate when the data isn’t there” CAR Conference panel. The panel featured Duaa Eldeib, an investigative reporter for ProPublica Illinois, Melissa Lewis, the data editor and a developer at The Oregonian, Ken Schwencke, a journalist and developer on ProPublica’s news apps team, and Nadine Sebai, an independent radio reporter in the Bay Area.
The panel focused on navigating the sensitive world of reporting on hate crimes. Two of the reporters, Sebai and Schwencke, have worked on ProPublica’s “Documenting Hate” project, a nationwide collaboration to report on crimes that would otherwise be ignored.
Dealing with law enforcement classifications
One of the problems you may face in dealing with reporting on hate crimes is that local police departments differ on what they classify as a hate crime, Schwencke said. Don’t take their word for it — ask more specific questions about how they classify these crimes.
“If they don’t have records, or they have few reports, that’s my favorite because I say, ‘does this mean you’ve had no hate crimes, or that you don’t check hate crimes reported to you?’ And that question tends to shake loose some reports.”
Schwencke said another method of finding hidden reports is to ask for records of hate crimes reported to your local law enforcement agency, compare them to what has been reported to the FBI and see if they match up.
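The comparison Schwencke described can be sketched in a few lines of Python. This is only an illustration: the yearly counts are invented, and the names `local_records`, `fbi_reported` and `find_mismatches` are hypothetical, not part of any ProPublica or FBI tool.

```python
# Hypothetical counts of hate crimes per year: what a local records request
# returned versus what the same agency reported to the FBI. All numbers are invented.
local_records = {2014: 12, 2015: 15, 2016: 21}
fbi_reported = {2014: 12, 2015: 9, 2016: 0}


def find_mismatches(local, federal):
    """Return {year: (local_count, federal_count)} for years where the two differ."""
    mismatches = {}
    for year in sorted(set(local) | set(federal)):
        local_count = local.get(year)      # None means no record for that year
        federal_count = federal.get(year)
        if local_count != federal_count:
            mismatches[year] = (local_count, federal_count)
    return mismatches


for year, (local_count, federal_count) in find_mismatches(local_records, fbi_reported).items():
    print(f"{year}: local={local_count}, FBI={federal_count}")
```

Any year that shows up in the output is a question to take back to the agency, not a finding in itself.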
He also recommended filing FOIA requests for documents with any form of discriminatory epithets. “It’ll possibly ruin a FOIA officer’s day, but those things get written into a report,” he said.
Discussing hate crimes with affected minorities
Sitting down with victims of hate crimes can be a difficult experience, and there are techniques to be sure your journalistic due diligence doesn’t give the impression to your source that you are doubtful of their story.
Separately interviewing others that may have witnessed the incident and comparing their accounts is one way to verify the details. Sebai used this solution with a story she was working on.
“(A man I interviewed) had a hate incident walking his dog with his wife across the street, and I interviewed his wife who supposedly was there during the incident … I looked at my transcript with him, and then I interviewed her and I said, ‘OK, does this story match up?,’” she said. The man’s wife verified his story.
Another way is to ask for a timeline of the incident at the beginning of the reporting process, and throughout the remainder of the process, double-check the timeline by asking specific questions.
Eldeib said one of the most useful excuses for double-checking and verifying stories without insulting the interviewee was blaming the process on your editor. “That kind of lessens any potential hostility,” she said.
John Sadler is a journalism student at the University of Missouri.
By Alexis Allison
If you Google the ingredients in sausage, you’ll quickly notice that little standardization exists between recipes. The same is true for the care and keeping of government data — whether at the city, county, or federal level.
Partway through the panel dubbed, “Inside the sausage factory: An inside look at government data making,” Hunter Owens looked at his fellow panelists and asked, “Should we go into why everything is broken?”
Owens is the senior data scientist for the City of Los Angeles. His fellow panelists, Josh Kalov and Rebecca Williams, each represent a slice of data expertise at another level of government: county and federal, respectively. Kalov previously worked as an open data consultant within Cook County in Chicago, and Williams is a digital services expert at the White House Office of Management and Budget.
Behind-the-curtain challenges in government data collection and management were the bedrock of the panel.
For one, the policymaking process does not typically include a conversation about how data that actually measures the policy will be collected, Owens said. Data can be cleft into departmental silos, Kalov added, and tracking outcomes with it is “in its infancy at the county level.”
In the federal space, data collection and management is expensive and difficult to standardize and maintain in a user-friendly format. Three types of federal data exist: statistical, which is heavily monitored; management support data, such as finance data on the newly minted USAspending.gov; and programmatic (such as geospatial data), which Williams called the “wild west” of federal data.
“Most government data sets, if not all, have some sort of quality issue,” Williams said.
More complications arise in the data procurement and oversight realms. Changes in data management can be traced to procurement contracts, which exist on the federal level at sites like IT Dashboard.
“When you’re following the money for data changes, looking at the IT contracts is a fast way to get there,” Williams said.
When it comes to oversight, audit reports (sometimes 60-page PDFs) abound, and they’re a good place to start digging for what’s available, Owens said. Counties may even maintain an inventory of audit reports, Kalov said. Otherwise, there’s often not an easy, centralized inventory of data.
“There’s no master list of where data lives,” Owens said. That’s why reporters should always send data and records requests to multiple departments, rather than relying on one records clerk.
As with sausage-making, data management is messy. But it’s not all bad. Across the board, there’s a renewed effort toward enterprise data management, or the transparent creation and consistent maintenance of data that’s accurate and up-to-date — both for internal and external use.
The federal government recently updated its FOIA website to better streamline the records request process; the more reporters use it and provide constructive feedback, the better it will be, Williams said. Oversight.gov can help reporters track down what data exists as well as relevant contact information and quality issues.
And the more records requests are filed at any level, the more reporters help make the case for the release of open data, Kalov said.
Owens ended with a request of his own.
“I cannot stress how much we are in need of folks who are willing to put in the effort,” Owens said. “That’s the pitch — you should all work for government one day.”
Click here to view the tipsheet for this session.
Alexis Allison is a journalism student at the University of Missouri.
By Yue Yu
What data sets can reporters get ahead of natural disasters? How can reporters cover disasters as they happen? What kind of follow-up leads should they chase?
Matt Dempsey from the Houston Chronicle, Omaya Sosa from Puerto Rico’s Center for Investigative Journalism and Lee Zurik from WVUE-TV in New Orleans broke down the chain of reporting at their CAR Conference panel.
Preparing for disasters
Working in Houston, Dempsey knows his hurricanes. Disasters like hurricanes are a test of the knowledge a newspaper has of the community it serves, and it’s best to know the community’s vulnerable points, Dempsey said.
He offered a list to tell reporters where to find information ahead of time to prepare for all kinds of disasters:
Flooding
Wildfires
Earthquakes
Hurricanes
Tornadoes
Chemical release or explosion
Blizzards
General disaster preparedness
During disasters — starting from scratch
Sosa had only a handful of volunteers to cover the local news when hurricanes Irma and Maria hit Puerto Rico last year. She didn’t even know whether her editor was alive. They’d lost WiFi, water, electricity and food.
They had no fancy databases, either. Sosa decided she would build her own.
It all started with making sense of the situation Puerto Rico was in. In the first 72 hours, official data suggested that the death toll was 16. Sosa interviewed two doctors, and between them they had already seen nine deaths. That was when she noticed the official statistics didn’t match reality.
She put her sneakers on, went out in the field and started interviewing sources on the ground, including doctors, police agents, rescue workers, funeral home directors, city officials and neighbors. Meanwhile, her team started questioning the official death toll data and pressed the government for information.
The team also collected missing persons reports and fliers, lost-and-found posts on social media, and more information from community leaders and radio. They even sent out Google forms for people to fill out when their family members died during the hurricanes.
These sources helped the team build a database that started out as an incomplete spreadsheet in Excel. After months of data collection, it became a list of all the uncounted deaths the government failed to include in the official data.
After sharing her own experience, Sosa offered a few more final tips:
Recovery spending — follow the money
Like Sosa, Zurik’s data story started out simple. He built a huge story about heavy government spending on parish school constructions after Hurricane Katrina, based on spreadsheets, pivot tables and invoices.
The bigger the disaster, the longer the money spending stretches, Zurik said, and the best way is to “stay on it.”
Following the money in Hurricane Katrina recovery, Zurik used a pivot table and found that the largest share of the money was paid to contractors. In one school district, the contractor was paid a sky-high rate of $185 per hour instead of the average of $55 to $60.
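A pivot table of this kind can also be reproduced in code. The sketch below is a minimal Python version, assuming invoice rows with a payee type, hours billed and an hourly rate. The rows, the column layout and the function names are all hypothetical; only the $185 rate and the $55 to $60 average come from the session.

```python
from collections import defaultdict

# Hypothetical invoice rows: (school district, payee type, hours billed, hourly rate).
invoices = [
    ("District A", "contractor", 100, 185.0),
    ("District A", "supplier", 40, 50.0),
    ("District B", "contractor", 80, 55.0),
    ("District B", "contractor", 60, 60.0),
]

TYPICAL_MAX_RATE = 60.0  # upper end of the $55 to $60 average cited in the session


def pivot_total_by(rows, key_index):
    """Sum hours * rate for each distinct value in the chosen column."""
    totals = defaultdict(float)
    for row in rows:
        hours, rate = row[2], row[3]
        totals[row[key_index]] += hours * rate
    return dict(totals)


def rates_above(rows, threshold):
    """Return the rows whose hourly rate exceeds the threshold."""
    return [row for row in rows if row[3] > threshold]


print(pivot_total_by(invoices, 1))   # total paid, grouped by payee type
print(rates_above(invoices, TYPICAL_MAX_RATE))
```

Rows flagged by `rates_above` are the ones worth tracing back to the underlying invoices.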
Following the money, Zurik eventually was able to reveal the discrepancy between earnings of HOV Services and of other construction companies. For tracing the money flow after a disaster, he suggested that the reporters should always keep some datasets in their laptops:
Data only tells part of the story, Zurik said, and letting data lead you to other documents, such as invoices in the case above, is also helpful.
Yue Yu is a journalism student at the University of Missouri.
By Jing Ren
Nick Penzenstadler from USA TODAY, Matt Drange from Forbes and Kim Smith at the University of Chicago Crime Lab discussed statistics and documents reporters covering guns should routinely gather at their CAR Conference panel.
Because of the nature of her work, Smith’s team has access to many administrative statistics on firearms. She said she and her team treat data very carefully, and she recommended journalists have the same attitude.
“We can’t just use the data for whatever purpose we want,” Smith said. She said her team always makes sure the analysis they are doing is within the scope of the agreements they have with the administration.
For illegal firearms, Smith pointed out that police data and statistics from the Bureau of Alcohol, Tobacco, Firearms and Explosives cannot capture the full supply chain of illegal gun markets. After getting ATF trace data and figuring out information on first retail sales, she and her team usually conduct ethnographic interviews and jail surveys to help them analyze secondary transfers of illegal firearms.
“For the most part, people were being truthful about the circumstances of their arrests,” Smith said. Therefore, her agency normally tries to link the results from jail surveys to administrative data from the police department. Paying attention to filling the missing statistics is equally important to reporters, she emphasized.
Most gun reporting is reactive, Drange said: He thinks news organizations sometimes lack interest in covering guns on a more regular basis and simply react to a shooting incident after it happens.
He mentioned ShotSpotter, a real-time gunshot detection and alert system that picks up gunshots through a series of microphones placed throughout high-crime neighborhoods in different cities. To get the data set, reporters need to file an open records request and ask for gunshots recorded by the sensors.
Penzenstadler said he thinks firearms inspection reports from the ATF are underreported. Categories of statistics in the inspection records that are available via FOIA requests include inspection history, violations, narrative and corrective action.
Because of data restrictions, all the speakers agreed that sometimes journalists need to build up their own data sets. The statistics reporters can gather from such searches aren’t always complete, but are still valuable.
Drange recommended tracing gun sales through social media groups and online sellers like eBay. Before visiting these websites, he said it’s important for reporters to understand firearm jargon and how the online market communicates.
The three speakers also included possible records reporters should always try to get when covering firearms:
Jing Ren is a journalism student at the University of Missouri.
By Tyler Wornell
Tracking the flow of money in an election can be a crucial reporting tool for knowing who’s influencing elections and how. Tracking some of that money could prove difficult, though.
In Friday’s CAR Conference panel, “Wagging the Dog: Using campaign finance data to cover the midterm election,” Denise Roth Barber from the National Institute on Money in State Politics, Ken Schwencke from ProPublica and Christopher Schnaars from USA Today discussed how to access Federal Election Commission filings and what data is available from them. These reports can be useful in discovering the key players who may be trying to influence an election, and can give you an idea of what type of support candidates have.
When looking at campaign finance data, there are a few things to look for, including:
The campaign finance reports can be downloaded as CSV files directly from the FEC’s website, making it easy to obtain and analyze the data. You’ll need to watch out for a few things in the data, though. The forms and where they show information can vary depending on the candidate you’re looking at. Donations to a House candidate may appear on a different line of the form than they do on a form for a presidential candidate. Always double check the numbers that you’re pulling from the form.
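One way to double-check numbers pulled from a downloaded CSV is to total the itemized rows yourself and compare the result with the summary figure on the filing. The Python sketch below assumes a simplified file with `committee`, `form_line` and `amount` columns. Those column names, the sample rows and both function names are hypothetical; real FEC downloads use different and far more numerous fields.

```python
import csv
import io

# Hypothetical CSV contents standing in for a downloaded file.
SAMPLE_CSV = """committee,form_line,amount
Committee A,line_x,500.00
Committee A,line_x,250.00
Committee A,line_y,1000.00
Committee B,line_x,75.50
"""


def totals_by_committee_and_line(csv_text):
    """Sum amounts for each (committee, form line) pair."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["committee"], row["form_line"])
        totals[key] = totals.get(key, 0.0) + float(row["amount"])
    return totals


def check_against_summary(totals, committee, reported_total, tolerance=0.01):
    """Return True if the itemized rows add up to the reported summary total."""
    itemized = sum(value for (name, _), value in totals.items() if name == committee)
    return abs(itemized - reported_total) <= tolerance


totals = totals_by_committee_and_line(SAMPLE_CSV)
print(totals)
print(check_against_summary(totals, "Committee A", 1750.00))
```

Keeping the form line in the grouping key makes it easier to spot cases where the same kind of donation lands on different lines for different types of candidates.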
ProPublica has a great tool called the ElectionBot, which provides real-time updates on campaign finance data. It also provides other information such as when candidates are trending on Google, when candidates delete tweets and vote activity from members of Congress. It takes all of the information from those FEC filings and compiles them into a real-time stream, making it easy to search and sort through information.
If you’re looking for campaign finance data that’s already been sifted and sorted, the National Institute on Money in State Politics is a great place to go. The group collects comprehensive campaign finance data for both federal and state elections and provides analysis based on that data.
The NIMSP collects data for direct contributions to federal, state and select local campaigns, independent spending for federal campaigns and selected states, and state lobbying expenditures in selected states. The group also provides contributions disclosure scorecards, independent spending disclosure scorecards and a competitiveness index, among other tools.
When looking at campaign finance data, it’s important to look out for “dark money.” This is money used to support a candidate that is funneled through nonprofit groups. It’s dubbed “dark money” because certain nonprofits are not required to disclose their list of donors. Dark money is becoming a large issue in state elections, and the NIMSP also has a scorecard for states that allow dark money in campaigns.
Campaign finance data is one of many tools available to find stories about politics and elections. You just have to follow the money.
Tyler Wornell is a journalism student at the University of Missouri.
By Dariya Tsyrenzhapova
Only one-third of victims of sexual harassment ever report those incidents to the authorities, Bernice Yeung said. Yeung, a journalist with Reveal from The Center for Investigative Reporting and a member of award-winning teams that produced documentaries "Rape in the Fields" and "Rape on the Night Shift,” spoke as part of a CAR Conference panel on responsibly covering sexual misconduct.
Jason Hancock, the Capitol correspondent for the Kansas City Star, has broken several stories of sexual harassment allegations involving high-profile Missouri politicians and lobbyists. Two state employees resigned following his coverage. Hancock said he collected personal accounts of more than 40 women, and five of them went on the record, emboldened by each other’s bravery to speak up publicly.
It takes both time and trust before a victim agrees to open up. “This is not a situation when you want to go into an ultra-aggressive reporter mode,” said Ellen Gabler, an investigative reporter at The New York Times. Yeung said the approach is similar to “a very long, slow dance” that involves going through a third-party intermediary and requesting an off-the-record meeting to build trust.
“It’s not about convincing anyone, but it’s a matter of presenting them with an opportunity to share the story,” Yeung said. “It’s not about trying to sell them on that idea, but providing them with enough information about what and why are you doing it so that they could make an informed choice.”
Collecting information on this highly sensitive and personal topic is difficult and onerous, often taking months of reporting. Individual accounts drive the narrative thread, but to scope out larger trends and patterns, the panelists suggested studying respondent data from census polls. The National Survey of Family Growth includes a question about forced sex as a measure to track incidents of sexual assault in a family setting, Yeung said.
The Equal Employment Opportunity Commission, a federal agency that enforces laws against workplace discrimination, collects sexual harassment complaints and annually publishes summary statistics by gender, resolution result and size of monetary settlements. In 2017, the EEOC received more than 6,500 complaints, a consistent number for the past four years.
Giving the victims and the accused an equal chance to speak up is important in getting the facts right. While reaching out to the accused, Yeung said, “We should put in as much time as we do trying to seek out the victim’s perspective.”
Citing the example of Rolling Stone’s retracted article about a rape case at the University of Virginia, BuzzFeed News senior reporter Lam Thuy Vo said, “Sometimes, getting it wrong can do much more harm than it can actually help the cause.”
Dariya Tsyrenzhapova is a journalism student at the University of Missouri.
By Meredith McGrath
In order to hold officials accountable and shine light on injustices, journalists are digging deep into the intricate data surrounding the drug world and court systems.
Ed Silverman from STAT, Teri Sforza from the Orange County Register and Michael Braga from the Sarasota Herald-Tribune shared their stories of investigations, shed light on the hidden dangers in data and gave tips for finding stories of your own during their CAR Conference session.
Globally, getting access to medicines at affordable prices is an issue for patients. Drug companies are charging outrageous prices for life-saving drugs, and governments are wrestling with how to respond. Ed Silverman provided several websites to help journalists in their reporting on Big Pharma:
In Southern California, Teri Sforza and her colleagues found that broke and homeless addicts are worth hundreds of thousands of dollars to addiction treatment centers. Treatment facilities are recruiting addicts from across the country, bilking their insurance companies and sending them back into the streets without curing them. When investigating this in your own state, ask for data on citations issued to licensed substance abuse treatment facilities. Ask for data on complaints received and get data on deaths in licensed treatment facilities.
In California, many deaths happen in “non-medical detox.” Other states don’t allow this. Figure out what your state’s approach is.
Across the country in Florida, Michael Braga and his team found through data analysis that black defendants get harsher punishment for drug crimes than their white counterparts and far less access to treatment for their addictions.
The findings are being published in a series of installments and derived from exploration of two databases: the Offender Based Transaction System, which contains more than 80 million records, and a database from the Department of Corrections. OBTS showed that bias was rampant in Florida’s criminal justice system, but there were flaws in this analysis because the team wasn’t comparing apples to apples.
While it clearly showed that blacks are arrested and convicted more often than whites and spend longer in lockup, there was no way to make sure that black and white defendants were being treated equally. There was no perfect way to ensure that disparities weren’t caused by blacks having longer rap sheets or by having committed more serious crimes. So the team turned to the Department of Corrections data for help. That data turned out to be riddled with errors, which the team didn’t realize until after publication.
Ultimately, they learned from this project that data is seductive and dangerous. The bigger and more complicated it is, the more you want to use it. There’s no assurance that data provided by county clerks and government organizations is error-free. Take your time with data and don’t rush. Carefully examine what might be causing disparities. Through your reporting, you can get the criminal justice system in your state to be more fair and blunt.
Meredith McGrath is a journalism student at the University of Missouri.
By Jing Ren
Steven Rich from The Washington Post, Sarah Ryley from The Trace and Annie Waldman from ProPublica shared their insights on how reporters should request open records at the state and national level at their CAR Conference panel.
Waldman focused her presentation on clarifying the roles and functions of the Health Insurance Portability and Accountability Act (HIPAA) and the Family Educational Rights and Privacy Act (FERPA). She said public officials who are covered by these two laws often misuse them. Therefore, journalists need to fully understand what the legislation specifically applies to when they try to request data sets.
Any of the following personally identifiable information is protected under HIPAA when combined with health information, Waldman said:
On the other hand, HIPAA does not protect:
Waldman noted that HIPAA also allows for de-identified information to be released, so reporters can ask for such data sets as well. She also mentioned that reporters should check out state legislation and know the privacy restrictions before filing the data request.
It’s important for reporters to check that the data they’re requesting isn’t already online before they reach out to experts and negotiate, Ryley said. She also noted that it’s crucial for reporters to treat everything they write as legal documents, and all communications should be clear, concise, professional and formatted in the right way.
Journalists need to keep in mind the different groups of people they encounter, she said, because among these people are not only public record officers, but also attorneys and IT experts who code the data the media request.
Rich is adept at filing mass FOIA requests. Compared with filing separate, one-off requests, he thinks filing many FOIA requests with different agencies and asking each for the same information makes a difference in data journalism.
However, reporters should keep in mind that even though different departments keep the same information, they may code it differently. Reporters need to make sure the records are standardized and that there is no issue with definitions in different systems. Journalists should keep an eye on missing fields as well, he said.
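The standardization step can be sketched as a crosswalk that maps each agency’s coding onto one shared label, while flagging rows with missing or unrecognized values. The Python below is a minimal illustration: the records, the codes, the `CROSSWALK` table and the `standardize` function are all hypothetical, and a real crosswalk would be built by hand from each agency’s data dictionary.

```python
# Hypothetical records from two agencies that code the same field differently.
agency_a = [{"id": "1", "offense": "AGG ASSAULT"}, {"id": "2", "offense": ""}]
agency_b = [{"id": "7", "offense": "A1"}, {"id": "8", "offense": "aggravated assault"}]

# Hypothetical crosswalk from each agency's coding to one shared label.
CROSSWALK = {
    "agg assault": "aggravated_assault",
    "aggravated assault": "aggravated_assault",
    "a1": "aggravated_assault",
}


def standardize(records, source):
    """Map each record's offense to the shared label and note its status."""
    cleaned = []
    for record in records:
        raw = record.get("offense", "").strip().lower()
        if not raw:
            status = "missing"       # field was blank in the agency's data
        elif raw in CROSSWALK:
            status = "ok"
        else:
            status = "unmapped"      # a code the crosswalk does not cover yet
        cleaned.append({
            "source": source,
            "id": record["id"],
            "offense": CROSSWALK.get(raw),
            "status": status,
        })
    return cleaned


combined = standardize(agency_a, "A") + standardize(agency_b, "B")
needs_review = [row for row in combined if row["status"] != "ok"]
print(needs_review)
```

Anything in `needs_review` is a row to resolve with the records officer before the combined data is analyzed.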
Jing Ren is a journalism student at the University of Missouri.