The New Data Journalism

Below is a presentation I gave at “Data Driven Connecticut 2014: Progress and Possibilities | Moving from Data to Action: A Connecticut Data Collaborative Conference” on Friday, November 24, 2014 at Yale School of Management, Evans Hall, New Haven, Connecticut. I joined a panel of professional journalists to explore new developments and trends in data-driven reporting and engage the audience in a discussion of where the field is going. My prepared script is below the slides.

Data has always played a role in journalism. When I was a college student 20+ years ago, journalists called it COMPUTER ASSISTED REPORTING. Now all reporting involves computers.

Data used to be hard to get. News organizations rarely collected their own data sets, so journalists were dependent upon public agencies, academics and think tanks. It was the journalists’ job to find out what data public agencies had in their possession and use FREEDOM OF INFORMATION requests to get access to that information.

Getting the data often was an arduous process. Cleaning and parsing the data was equally difficult and time-consuming.

But now, thanks to the internet and the push for OPEN DATA, that scenario has changed for the most part. Data is more abundant than ever, and much of it is publicly available online.

The software used to format, analyze and visualize data quickly is more abundant, too.

We still have to file FOI requests. We could argue that government agencies now post data online so they DON’T have to talk to journalists.

But journalists still have to ask questions and do lots of reporting to understand the data.

Why should the public care whether journalists are using data?

Public opinion is often driven by news coverage. Storytelling is a central part of news.

Journalists play an important role in shaping how the public perceives issues, both because of their large audiences and because they choose what information to frame and amplify.

Data journalism begins in one of two ways: either there is a question that needs data to answer, or a dataset in need of questioning.

Trusting the quality of the data means cleaning it. Cleaning typically takes two forms: removing human error, and converting the data into a format that is consistent with other data you are using.

For example, datasets will often include some or all of the following: duplicate entries, empty entries, bad formatting, and multiple names for the same thing (UConn vs. University of Connecticut).
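A minimal sketch of that kind of cleaning, in plain Python. The rows, the numbers, and the `ALIASES` table are all hypothetical and made up for illustration; a real dataset would need its own alias list and its own rules for what counts as an empty entry.

```python
# Hypothetical rows of (institution name, count). Values are invented.
raw_rows = [
    ("UConn", "100"),
    ("University of Connecticut ", "100"),  # same record, different name
    ("uconn", "250"),
    ("", "75"),                             # empty name
    ("Yale University", " 300"),            # stray whitespace
    ("Yale University", "300"),             # duplicate once cleaned
]

# Assumed alias table: variant spellings mapped to one canonical name.
ALIASES = {"uconn": "University of Connecticut"}


def clean(rows):
    """Trim whitespace, drop empty entries, unify names, remove duplicates."""
    seen = set()
    cleaned = []
    for name, value in rows:
        name = name.strip()
        value = value.strip()
        if not name or not value:
            continue  # skip empty entries
        name = ALIASES.get(name.lower(), name)  # one name per institution
        record = (name, int(value))             # consistent numeric format
        if record in seen:
            continue  # skip exact duplicates
        seen.add(record)
        cleaned.append(record)
    return cleaned


cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

Note that the sketch only removes exact duplicates. Two rows with the same name but different numbers are both kept, because deciding which one is right is a reporting question, not a formatting one.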

Context: Who gathered it, when, and for what purpose? How was it gathered (the methodology)? What exactly do they mean by that phrase, or combination of numbers and letters?

Combine: Put together two or more datasets with a common data point. That might be a politician’s name, for example, or a school, or a location.
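Combining can be sketched as a simple join on the shared key. The two datasets below, the school names, and the figures are hypothetical; the point is only that matched records can be compared, and unmatched ones are flagged for follow-up rather than silently dropped.

```python
# Hypothetical datasets that share one data point: the school name.
enrollment = {"School A": 500, "School B": 320}
budgets = {"School A": 1_000_000, "School C": 750_000}


def combine(left, right):
    """Join two dicts on their common keys and list the keys that did not match."""
    combined = {
        key: {"enrollment": left[key], "budget": right[key]}
        for key in left.keys() & right.keys()
    }
    # Keys present in only one dataset deserve a phone call, not a guess.
    unmatched = sorted(left.keys() ^ right.keys())
    return combined, unmatched


combined, unmatched = combine(enrollment, budgets)

# A new figure that neither dataset contained on its own.
spending_per_student = {
    school: record["budget"] / record["enrollment"]
    for school, record in combined.items()
}
print(spending_per_student)
print("No match found for:", unmatched)
```

A join like this only works if the names were cleaned first, which is why the cleaning step comes before the combining step.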

Data provides the foundation for a story.

Journalists think about data as clues to a story.

In the past month, my journalism students at UConn have analyzed specific data sets from the University’s Office of Institutional Research and created data visualizations from those sets that reveal trends and suggest these story ideas:

  • Why is biological sciences now the largest and most popular liberal arts degree at UConn?
  • Why has the number of international students at UConn risen 278 percent in the last 10 years? Last year, half of UConn’s international students came from China.
  • Why is enrollment among African American women at UConn outpacing African American men?
    Journalists can start with that trend or outlier and then ask more questions.
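One of the first questions is often what a percentage really means. A rise of 278 percent, like the one in the international student example, is easy to misread as "2.78 times as many" when it actually means 3.78 times as many. A small sketch of the arithmetic, using a hypothetical baseline of 1,000 students (not the actual UConn figure):

```python
def percent_change(old, new):
    """Percent change from old to new; positive means an increase."""
    if old == 0:
        raise ValueError("percent change is undefined for a baseline of zero")
    return (new - old) * 100 / old


def apply_percent_increase(old, percent):
    """Value reached after increasing old by the given percent."""
    return old * (100 + percent) / 100


# Hypothetical baseline, for illustration only.
baseline = 1000
after_rise = apply_percent_increase(baseline, 278)

print(after_rise)                            # the new total
print(after_rise / baseline)                 # how many times the baseline
print(percent_change(baseline, after_rise))  # back to the percent change
```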

    Journalists armed with data have the ability to highlight issues that may have slipped under the radar of policymakers.

    Anecdotal evidence backed by a data set is more powerful than either one alone.

    Data serves as the foundation for new news organizations.

  • In 2008, Nate Silver, a relatively unknown baseball statistician, correctly predicted every Senate race and all but one state in the presidential election. His blog, FiveThirtyEight, became very popular and was hosted by the New York Times. Last year, he shopped it around and it was picked up by ESPN.
  • Vox – data & explanatory journalism
  • ProPublica – A nonprofit investigative journalism organization that gives away the data sets it obtains through FOI, and sells other data sets it worked hard to crunch.

    Crowdsourced Data & Sensor Journalism

    That’s when a journalism organization asks the public to help collect data. Crowdsourcing has inherent issues of accuracy, but it can still be helpful, especially during natural disasters. For example, during the Connecticut Snow-pocalypse of October 2011, the Courant asked the public to report the locations of local gas stations that were open. Another example was the Google Person Finder database during the devastating Super Typhoon Haiyan that hit the Philippines.

    If data journalism means the analysis of and reporting on data sets that already exist, sensor journalism goes a step further. News organizations and individual journalists are using sensor technology to create their own real-time data and then report on it.

    What sensors do best is detect characteristics of the physical world — properties such as light, heat, sound, pressure, vibration, air quality and moisture.

    Journalists brainstorm stories they could tell if only they had some data. Now they can collect their own data with relatively little money and effort, and experiment outside the traditional tools journalists work with, by building their own small sensor devices.

    Journalists acting like scientific researchers can take human observations and impressions and make them specific, so that they might be used for comparisons.

    Example: By using data from toll transponders – a public sensor – a news organization in Florida was able to prove police officers were speeding. They could determine the officer’s speed based on how quickly he/she reached the next toll.
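The arithmetic behind that example is just distance divided by time. A minimal sketch, assuming the distance between two toll points is known and each transponder record carries a timestamp; the 15-mile distance and the timestamps below are hypothetical, not figures from the Florida investigation.

```python
from datetime import datetime


def average_speed_mph(distance_miles, passed_first, passed_second):
    """Average speed between two toll points, from the two timestamps."""
    elapsed_seconds = (passed_second - passed_first).total_seconds()
    if elapsed_seconds <= 0:
        raise ValueError("second timestamp must be later than the first")
    # miles per second times 3600 seconds per hour
    return distance_miles * 3600 / elapsed_seconds


# Hypothetical record: 15 miles between toll points, covered in 10 minutes.
first_toll = datetime(2014, 1, 1, 12, 0, 0)
second_toll = datetime(2014, 1, 1, 12, 10, 0)

speed = average_speed_mph(15.0, first_toll, second_toll)
print(f"Average speed: {speed:.0f} mph")
```

This gives only an average. The driver may have gone faster or slower at any given moment, so the result is a floor on the top speed reached, which is one of the assumptions worth stating when the story is published.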

    Ethical Considerations

    Pursuit of the news is not a license for arrogance or undue intrusiveness.

    [PICTURE: LO HUD GUN MAP – Journal News Westchester, NY]

    Realize that private people have a greater right to control information about themselves than public figures and others who seek power, influence or attention. Journalists must weigh the consequences of publishing or broadcasting personal information.

    Journalists can overstate to the public what the data actually shows. Most journalists aren’t statisticians, so we need experts to help us.

    Even worse, other sites and analysts may intentionally misrepresent data in order to confuse the public, using the news outlet as their megaphone.

    It is possible to find data saying almost anything. Data analysis must be performed to determine what the “truth” is or to make predictions, but this analysis has assumptions built into the model.

    News organizations should make their assumptions publicly available and easily understandable.

    “We’re very close to being able to gather data on what people do most of the time,” so choice becomes paramount. “When you can survey everything, what do you report?”

    Journalists must be aware that who collects the data affects the data collection methods, the analysis of the data, and the accessibility of the data.