Tags : data cleaning

Records shed light on lax landlords, broken housing code system

The deaths of a young couple and a 4-year-old child in a Christmas Eve fire exposed significant problems with landlords renting dilapidated and dangerous properties in Columbus, Ohio.

In the immediate aftermath, city officials – acting on public outcry – promised to fix the city’s broken housing code system. But when the outcry died, so did those promises, prompting The Columbus Dispatch’s “Legacy of Neglect” series.

The four-day series produced such overwhelming results that the mayor, other city officials and their housing code enforcement unit immediately declared war on slumlords who, our reporting found, regularly rented houses with unsafe electrical systems ...

Read more ...

Dealing with inaccessible data and finding a needle in a million haystacks

By Jordan Gass-Poore’

Amanda Zamora of ProPublica answers questions during a panel on how to build a thorough data-based investigation with inaccessible, incomprehensible, and indeterminate data. Photo: Travis Hartman.

Leading journalism professionals spoke about finding meaning in messy data during Thursday morning’s session, “Finding the needles in a million haystacks: How to build a thorough data-based investigation with inaccessible, incomprehensible and indeterminate data.”

Amanda Zamora, ProPublica engagement editor, used the publication’s “Free the Files” initiative -- last fall’s attempt to make sense of thousands of political campaign ad spending documents from various U.S. markets ...

Read more ...

Integrity checks and simple data cleaning – the art of doubt

There is a saying about software engineering that could easily be applied to formatting data. The truism goes something like, “It’s like looking for black cats in a dark room that has no cats in it. And then someone yells, ‘I got one!’”

Well, Joe Kokenge of ProPublica is practicing animal control.

His presentation on integrity checks and simple data cleaning was peppered with useful bits of knowledge from his experience. “I wanted to put together a list of things that everyone can do, but there is no 10 bullet proof things to do to make sure your data ...

Read more ...

Centers for Medicare and Medicaid Services data reveals fraudulent offices

Our newspaper’s analysis of Centers for Medicare and Medicaid Services (CMS) data revealed that 131 providers in the Atlanta metropolitan area claimed a UPS Store mailbox as their medical office.

It turns out Atlanta medical providers were not conducting medical procedures in mailboxes. Most of these providers filled out the federal paperwork incorrectly. But dozens of others committed fraud by using the UPS Store mailboxes as purported real offices. With a sham provider number and a UPS Store address, they could also provide what looked like a real physician’s approval for unnecessary or non-existent medical services and equipment ...

Read more ...

SBA disaster loan data updated in NICAR Database Library

In the wake of a disaster, individuals and business owners are often left with severely damaged property. Many turn for help to the Small Business Administration, which approves low-interest loans to help rebuild. For declared disasters in 2011 alone, the Small Business Administration approved over $1 billion in loans.

NICAR has updated the SBA database of these loans, which is now current through Sept. 2012. 

WHAT'S IN IT?
Disaster loans through the SBA are one of the primary forms of federal assistance for individuals and non-farm, private-sector businesses who have suffered losses. The data have information on the borrower ...

Read more ...

HMDA data updated in the Database Library

The Home Mortgage Disclosure Act (HMDA) data have just been updated in the NICAR Database Library -- and we'll help you turn them into a story.


WHAT'S IN IT?

This Act requires all banks, savings and loans, savings banks and credit unions with assets of more than $33 million and offices in metropolitan areas to report mortgage applications. Each loan record contains demographic information about loan applicants, including race, gender and income; the purpose of the loan (i.e. home purchase or improvement); whether the buyer intends to live in the home; the type of loan (i.e. conventional ...

Read more ...

Behind The Story: Analyzing and mapping salary data for small-town mayors

In August, reporter Kate Martin of the Skagit Valley Herald analyzed salary data for mayors across Washington state and ended up with a story about mayors from small towns in her coverage area -- Mount Vernon and Anacortes -- who had salaries on par with mayors from cities several times larger. In reporting the story, Martin first had to gather the data and then reconcile it with the realities of small-town civic duties.

The idea for the story arose through her typical reporting practices: each year, she requests salary data for all of the agencies that the Skagit Valley Herald covers.

“I ...

Read more ...

OSHA Workplace Safety data updated at NICAR Data Library

The Workplace Safety database from the Occupational Safety and Health Administration (OSHA) has just been updated in the NICAR Database Library.

WHAT’S IN IT?

This ten-table database holds information on workplace inspections performed by both federal and state OSHA offices in all states and U.S. territories, from 1972 to Oct. 2011 – just under 4 million records.

OSHA classifies businesses by their location, name and North American Industry Classification System (NAICS), making it possible to analyze inspections, violations and accidents involving a certain occupation or those in a given region or city. The data also include details on the ...

Read more ...

From where? Validating data in the real world

By Anna Boiko-Weyrauch
@AnnaBoikoW

To understand your data, let’s go back to grade-school science class. Remember when you learned about the forest, and all the animals that call it home? The forest is a dynamic ecosystem. Your data is like a chimpanzee; it plays a role in the forest ecosystem. Over time, the changes in the environment will affect your data/chimp.

In the session, “OK, but where did that data come from? Data validation in the digital age,” Managing Director at the Institute for Analytic Journalism J.T. Johnson said journalists need to remember that their data had ...

Read more ...

Fighting for open records in Spain

By Hilary Niles
@nilesmedia

Spain is an “information black hole,” journalist Mar Cabra said during the “Against All -Spanish- Odds” session. She and software developer David Cabo are taking suggestions on how to fix that.

Among the European countries with a population of more than 1 million, Cabra said, Spain is the only one without freedom of information laws. On the technical side, David Cabo described what this looks like for people working with data (if they can get it):

  1. Administrations love PDF files and generally refuse to hand over raw data, text or Excel files
  2. There is little consistency ...

Read more ...