The Yale Law Journal

VOLUME
126
2016-2017
Forum

Is Open Data the Death of FOIA?

21 Nov 2016
Beth Simone Noveck

For fifty years, the Freedom of Information Act (FOIA)1 has been the platinum standard for open government in the United States. The statute is considered the legal bedrock of the public’s right to know about the workings of our government. More than one hundred countries and all fifty states have enacted their own freedom of information laws.2 At the same time, FOIA’s many limitations have also become evident: a cumbersome process, delays in responses, and redactions that frustrate journalists and other information seekers.3 Politically motivated nuisance requests bedevil government agencies.4 With over 700,000 FOIA requests filed every year, the federal government faces the costs of a mounting backlog.5

In recent years, however, an entirely different approach to government transparency in line with the era of big data has emerged: open government data. Open government data, usually shortened to open data, has many definitions but is generally considered to be publicly available information that can be universally and readily accessed, used, and redistributed free of charge in digital form.6 Open data is not limited to statistics, but also includes text such as the United States Federal Register, the daily newspaper of government, which was released as open data in bulk form in 2010.7

To understand how significant the open data movement is for FOIA, this Essay discusses the impact of open data on the institutions and functions of government and the ways open data contrasts markedly with FOIA. Open data emphasizes the proactive publication of whole classes of information. Open data includes data about the workings of government but also data collected by the government about the economy and society posted online in a centralized repository for use by the wider public, including academic users seeking information as the basis for original research and commercial users looking to create new products and services. For example, Pixar used open data from the United States Geological Survey to create more realistic detail in scenes from its movie The Good Dinosaur.8

By contrast, FOIA promotes ex post publication of information created by the government especially about its own workings in response to specific demands by individual requestors. I argue that open data’s more systematic and collaborative approach represents a radical and welcome departure from FOIA because open data concentrates on information as a means to solve problems to the end of improving government effectiveness. Open data is legitimated by the improved outcomes it yields and grounded in a theory of government effectiveness and, as a result, eschews the adversarial and ad hoc FOIA approach. Ultimately, however, each tactic offers important complementary benefits. The proactive information disclosure regime of open data is strengthened by FOIA’s rights of legal enforcement. Together, they stand to become the hallmark of government transparency in the fifty years ahead.

The Impact of Open Government Data

An open data movement has taken hold in many countries, and notably in the United States. On his first day in office in 2009, President Obama called for a shift to open data when he signed the Memorandum on Transparency and Open Government. The Memorandum declared that “[i]nformation maintained by the Federal Government is a national asset,” called for the use of “new technologies to put information about [agency] operations and decisions online and [make it] readily available to the public,” and encouraged “executive departments and agencies [to] solicit public feedback to identify information of greatest use to the public.”9 Later the same year, the Office of Management and Budget (OMB) published a directive explaining to federal agencies that transparency was not limited to information about the workings of government, but also included “high value” information, which they defined as information that “can be used to increase agency accountability and responsiveness; improve public knowledge of the agency and its operations; further the core mission of the agency; create economic opportunity; or respond to need and demand as identified through public consultation.”10 In other words, the directive suggested that the public benefits of certain kinds of data, such as locations of reported crimes or weather information, or information that could foster new businesses, justify the effort to publish the data openly.

High value data, therefore, goes beyond the data that the government produces about its own operations (e.g., budgets, salaries of public officials, or names of those who attend meetings at the White House) and includes data and facts that public institutions collect in their role as regulators (e.g., workplace safety and injury records, airplane flight on-time logs or doctors’ prescribing records) or information gathered in their capacities as scientific research organizations (e.g., weather data or information about the human genome). This open data policy—part of a broader set of open government mandates—set in motion a process, enabled by new technology, for agencies to inventory and disclose the information they collect and to move toward the proactive publication of classes of information in their entirety, such as spending records, visitor logs, air and water quality indicators, and measures of hospital infection rates.

Since 2011, seventy countries, including the United States, have signed onto the Open Government Partnership Declaration. The Declaration, which copies the U.S. framework, calls for governments to commit to “pro-actively provide high-value information, including raw data, in a timely manner, in formats that the public can easily locate, understand and use, and in formats that facilitate reuse.”11 Fifteen countries have adopted the International Open Data Charter, which goes further by calling for making government data open in digital formats by default and for investing in the creation of a culture of openness.12

Since the organization that collects and maintains information is not always in the exclusive position to use the information well, the ability of third parties to access and use open data is what makes the concept truly transformative. Open data facilitates reuse by public institutions as well as entrepreneurs, activists, researchers, and students who can analyze, visualize, and combine open government data—both data the government collects about its own workings and data it collects in its role as regulator or researcher—to achieve impressive results, including the creation of useful tools in the public interest.

For example, the Fire Department of San Ramon, California created the PulsePoint app to enable the public to save lives in medical emergencies using a real-time feed of open data from emergency 911 calls. Effective CPR administered immediately after a cardiac arrest can potentially double or triple a victim’s chance of survival, but fewer than half of victims receive that immediate help because they have to rely upon the arrival of a small number of official first responders, such as paramedics or police, to administer aid. PulsePoint takes this function typically performed exclusively by government officials and decentralizes it by enabling trained citizens to serve as first responders.13

In communities where emergency calls are published as open data, PulsePoint is able to notify registered and trained CPR users—off-duty doctors, nurses, police, and trained amateurs, for example—near the victim to come to the aid of stricken neighbors. PulsePoint sends them a text message in capital letters: “CPR Needed.” The app has been used in eleven thousand cases to activate citizen responders in response to cardiac emergencies.14

One consequence of opening government data, when there is a desire to do so, is the ability to improve transparency and accountability in government. In the United States, at the federal level, open data facilitated the creation of USASpending.gov, a set of online tools for exploring the federal budget.15 At the local level, open data drives various “open checkbook” websites;16 in Austria, for example, over eight hundred municipalities have made their spending data more transparent and easy to visualize online.17

But open data is not limited to information about spending. Opening local government data about public works in Zanesville, Ohio revealed a fifty-year pattern of discriminatory water service provision and led to a successful civil rights lawsuit against Zanesville in 2008.18 While access to clean water from the City of Zanesville water line spread throughout the rest of Muskingum County, residents of the predominantly African-American area of Zanesville, Ohio were able only to use contaminated rainwater or to drive to the nearest water tower in order to truck water back to their homes.19

Although shining a light on government is important, the potential impact of open data is far broader than FOIA’s paradigmatic goals of improving government transparency.20 Open data can improve the accountability of private organizations and institutions. For example, several states are moving to release data collected on doctors about their opioid pain medication prescribing patterns.21 By showing doctors their own practices in comparison to those of others to create an incentive for behavior change, states like Arizona are already showing a ten percent reduction in opiate prescriptions and a four percent reduction in overdose deaths in those counties that used open data in this fashion over those counties that did not.22

Open data, however, does more than promote accountability. It also enables the creation of tools to improve consumer choice and citizen decision-making. The data collected by the government from universities, for example, has been transformed into a calculator, the College Scorecard, to help parents and students make more informed financial decisions about their college education.23

Sometimes the benefits of open data ripple out beyond the immediate motivations for its publication. For instance, while publishing government contracts can boost public integrity, it can also catalyze greater business competition and entrepreneurship.24 A study by McKinsey estimates that open data will end up creating $3 trillion in economic value.25 The Open Data 500 tracks thousands of companies worldwide that use open government data as a core business asset. For example, the New York-based start-up Aidin shows patients released from the hospital the best available post-acute care providers, using information published by the Department of Health and Human Services.26

We are undoubtedly living in the middle of an open data revolution, which is only in its infancy. One thousand people attended the 2015 International Open Data Conference, an event dedicated to showcasing the impact of opening government data, and one thousand five hundred in 2016.27 In the United States, an Open Data Executive Order (2013) augments the earlier Open Government Memorandum and Directive by effectively declaring that open data will become the new normal for the United States government.28 Recent domain-specific legislation and regulation include promises of more real-time open data, as in the DATA Act, which calls for publishing federal government spending data as open data in standardized formats.29 Legislation passed by Congress for an evidence-based policy commission in 2016 shows a growing interest in basing policies on empirical research and data, and is fueling demand for more access to administrative information of all kinds, including the data that agencies collect about companies, workplaces, the environment, and the world beyond government.30 In criminal justice, there is a national policy initiative to open up access to administrative data from criminal justice and related agencies, not only about those agencies’ practices but also about crime more broadly, to enable empirically informed innovation and reform.

Open Data and FOIA: Competitors or Partners?

The explosion of newly available data coupled with the availability of technologies to make use of these large quantities of data—so-called Big Data—gives rise to the question of whether open data, as a policy framework, should supplant the FOIA system. In practice, open data: a) promotes broad scale transparency; b) simplifies the disclosure process; c) requires publication in reusable electronic formats; and, above all, d) focuses on disclosure of information collected by government as regulator and researcher, not exclusively on data created by government about its own workings. As a result, open data promotes a collaborative rather than an adversarial process and catalyzes co-creation of solutions by government working with the public. At the same time, FOIA indispensably complements open data by a) providing a legal right of action to compel disclosure; b) suggesting the kinds of data to prioritize releasing; and c) disclosing who is using what data how.

Open data encourages proactive disclosure and publication in open formats on a centralized data portal such as Data.gov. FOIA, by contrast, imposes limited affirmative obligations on agencies,31 emphasizes ex post disclosure, and, in most cases, releases information on paper only to the individual requestor32 rather than to the general public. “The process of drafting and submitting FOIA requests and then waiting for the agency’s response,” writes David C. Vladeck, “is a breeding ground for delay and cynicism over the Act’s efficacy.”33 Under FOIA, there is no centralized “reading room”: each agency must create its own website and post only those documents that are the result of three or more FOIA requests.34 To be sure, there is a certain incentive for some requestors, such as the journalist looking for a scoop, to be the only one to see the information that results from a FOIA request. But that benefit is far outweighed by the gains to be had from broad information disclosure to a larger audience who can make use of the information.

In the era of big data technologies, when information storage is cheap and plentiful, if the goal is to promote greater transparency, it is hard to imagine why we should continue to invest in the legal framework and its attendant practices for demanding data after the fact when, instead, we can build the platforms and policies to ensure proactive and prospective publication of government information in reusable formats online. Notwithstanding frequent calls by Attorneys General35 for more rapid disclosure under FOIA, the largely paper-based process, whereby the seeker has to file a written request specifying the desired information, is inherently fraught with delays and backlogs.

By contrast, with open data, technology helps to transform information transparency from a legal principle into a practical reality. In essence, whereas FOIA is a legal regime, open data is a set of technology standards and practices. The need to populate Data.gov drives agencies to automate the process of inventorying all of the data sets stored on their servers without needing to know who created or currently maintains the data, thereby accelerating the process of discovering the data sets to which Data.gov can then point. When Data.gov launched in May 2009, it made forty-seven datasets searchable; that number skyrocketed to 186,000 in 2016.36

Even more important than the volume of raw information is the positioning of open data as a collaborative rather than an adversarial process. FOIA is contentious, giving rise to litigation when the government refuses a request and perversely reinforcing a culture of closed-door governing. Lawyers and specially trained FOIA officials acculturated in this cat-and-mouse process dominate the FOIA process. Chief Innovation, Chief Data, or Chief Information Officers, in contrast, are tasked with identifying and posting open data, which is structured for computability and reusability and which by its nature is designed for collaboration. The Commerce Department’s Data Service website not only makes the federal department’s information more searchable; it provides training, tutorials, and guidance precisely in order to help the public use its data to generate useful insights, products, and tools.37

As a practical matter, the open data disclosure process affords various new efficiencies and improvements. For example, why take time to black out sensitive passages and redact personally identifiable information, when government can invest in technologies for automatically de-identifying Social Security numbers when storing information so it can be automatically released? In an open data process, managed by technologists, there is greater attention to the application of new technology, which can enhance the publication of appropriately public information while protecting the privacy of personally identifiable information. In place of lawyers who have to eyeball documents manually, open data creates the impetus to hire computer scientists to write software that strips Social Security numbers and other sensitive personal data automatically, as well as social and data scientists who can extract value from large-scale administrative data sets and translate those findings into improvements for government and society.

Similarly, by requiring publication of digital records, open data enables publication of information that was “born digital,” in contrast to FOIA, which has traditionally focused on distribution of paper copies of records. Why be content with receiving paper when information, especially information that is created digitally in the first place, could and should be disseminated digitally to enable more meaningful and productive reuse? In June 2016, President Obama signed into law amendments to the Freedom of Information Act striking the phrase “for public inspection and copying” and inserting “for public inspection in an electronic format.”38

In light of FOIA’s contentiousness and its narrow focus on information about government’s workings, open data would seem to be a natural evolution from, and improvement over, FOIA.39 However, there are strong arguments in favor of blending the two approaches and drawing on the best of FOIA to improve how open data works, treating one regime as a supplement rather than a substitute.

First, where gaps exist in the open data regime, FOIA provides the legal right of action to fight for the data that government refuses to disclose when it should. The data to be found on Data.gov is not always the contentious nuclear secrets, budget models, or national security information most in demand by journalists, activists, and researchers. In 2013, transparency activist Carl Malamud had to use FOIA to request nine nonprofit tax returns from the IRS. Although nonprofit returns are required to be disclosed by law and the filers submitted those returns electronically, the IRS wanted to send Malamud image files of the returns. He sued for the digital originals and won.40 The significance of the decision is that the IRS will now make all electronically filed nonprofit tax returns (about 60% of those filed since 2011) digitally downloadable as open data.41 The IRS did not release the nonprofit tax returns as downloadable data until Malamud filed suit and won.

Despite seven years of open data policy in the United States, FOIA requests have risen from under 600,000 to well over 700,000 during the same period.42 It is hard to know whether that increase is due to greater secrecy on the part of the Obama administration, decreased efficiency on the part of its FOIA officers or, in fact, a growing culture of openness simply spurring more requests as a result of the coexistence of the two regimes. But, even discounting for spurious requests (the JFK assassination and Area 51 conspiracy theories remain popular subjects for requests), open data has not yet prompted the online release of everything that needs to be published either to improve government transparency or effectiveness. As a last resort, the legal right of action to sue for information under FOIA is essential.

Second, FOIA doctrine articulates clear priorities. FOIA emphasizes the release of information about the workings of government, such as the activities of government officials, and focuses on one-off demands by requestors who have to guess what information the agency has and then contend with denials and delays in the processing of those requests. Open data, meanwhile, requires inventorying all data and creates an opportunity for reasoned debate between the public and the agency about what to publish, with what frequency, and in what formats. Open data focuses on publishing whole classes of information in bulk. Given that data often has to be cleaned and formatted to make it suitable for downloading as open data, creating open data requires an investment of both time and money to release. Hence, prioritization is important. Drawing upon the lessons of FOIA, notably the emphasis on disclosure of data by government about government, open data policy, too, should evolve to articulate normative guidance to agencies about what should be published online and when.

However, those priorities should take account of the unique properties of open data. Because open data lends itself to analysis, visualization, and the creation of algorithms to drive decision-making, efforts to publicize the calendars of cabinet secretaries or the salaries of government officials online, very much FOIA priorities, are misplaced for open data because, absent gross malfeasance, such disclosures will not drive immediate changes in how government operates. Naked disclosures alone are not enough; rather, change depends upon the actions that are taken following the publication of data.

Open data can be such a powerful tool because the proactive disclosure is often—although not always—accompanied by a plan for how to use the data. For example, Brazil has long published government spending data—an initiative that has led to a decrease in certain public expenditures because of the ability to analyze the entire corpus of credit card records. At the same time, the publication of spending data has not prevented the country’s endemic corruption.43 Clearly, the Brazilian transparency website is better than nothing, but the example highlights that it is as important to act upon the data as to publish it. Whether disclosed as a result of a FOIA petition or an open data policy push, steps must be taken to make use of the data to enable more evidence-based practices.

In some areas of government operations, open data can immediately improve how services are delivered both because the data is available and because incentives are aligned between those inside and outside of government to act upon the data to change how government works. For example, opening up the entire class of data about food-borne illnesses in Chicago prompted the creation of an algorithm to re-organize how the city allocates scarce resources to restaurant inspections.44 For open data, as for FOIA, priorities should be articulated, and should emphasize improvement of policies and services.

Third, FOIA requests are counted and logged through a process of agencies reporting to the Department of Justice, which then reports FOIA statistics on the FOIA.gov website. By contrast, open data turns on a data fire hose. One of the unintended consequences of making data freely available for reuse without restriction or registration is that it is very difficult to trace the downstream uses of such data. Although there is mounting evidence of the positive impact of open data, it is hard to know with certainty how valuable any given dataset is. We can be fairly certain that agencies post a lot of data on Data.gov that no one wants, knows exists, or uses. Thus, ideally, the high-tech world of open data should be married to the accounting practices of FOIA. For example, the Learning Registry is a collaborative effort between the Departments of Education and Defense and the White House to create “data about data” and make it possible to learn who is using open learning content on the web.45

Conclusion: The Way Forward

Of course, FOIA and open data are similar. They both emphasize disclosure to the public of information created or collected by the government. Because open data has its roots in the data processing technologies of the big data era, however, most open data projects focus on the analysis of facts collected by the government in its role as regulator and researcher. As a result, open data substitutes a utilitarian rationale for transparency in place of a justification based on moral obligation.

In other words, open data is rooted in a theory about government effectiveness whereas FOIA is grounded in a theory of governmental legitimacy. The narrative underlying FOIA—though hardly borne out in practice—suggests that shining the light of transparency on the workings of government—knowing the calendar of a cabinet secretary or what the postman earns—will de facto lead to a decrease in corruption, a resulting improvement in government accountability, and therefore greater legitimacy on the part of public institutions. In the FOIA narrative, this is an almost automatic process; hence, the law enshrines the right to ask for information for any reason or no reason at all but focuses on individual requests for specific facts and data. As a result, writes David Pozen, FOIA creates an “entitlement program with no eligibility criteria . . . [that] establishes nondisclosure as the default norm.”46

Controversially and by contrast, the open data narrative is largely unrelated to accountability. Instead, open data is a tool for use by diverse stakeholders to make government more effective at solving problems but also to solve social problems, create jobs, and generate entrepreneurship.47 Open data emphasizes the instrumental value of information as an asset for evidence-based decision-making, service delivery, and economic growth. The Department of Education posts data about the costs of college tuition so that the agency can create a financial aid calculator. Municipalities release a feed of 911 call data so that PulsePoint can operate a citizen first responder system.48 Hence certain classes of information, what Cass Sunstein calls output transparency, should be disclosed a priori whether or not people request it to improve government effectiveness.49

Some view this utilitarian calculus of the usefulness of open data for solving problems as a dilution of FOIA’s sui generis goals of transparency for its own sake. But the open data movement has progressed far enough worldwide to demonstrate that it is not going away. Arguably, the greatest advances in transparency in recent years have come either as a result of extra-legal disclosures such as WikiLeaks or the Snowden revelations, or as a result of the publication of open data.

As open data’s scope continues to expand and its culture becomes a comfortable part of government agencies’ behavior, policymakers have to re-evaluate the role of FOIA. FOIA officials currently convene as a government-wide community, but they need to talk and work more closely with Chief Information Officers, Chief Data Officers, and Chief Innovation Officers to ask and answer how the FOIA process should take advantage of the technological affordances of open data and, above all, to embrace more of the collaborative nature of open data practices, which emphasize problem solving. At the same time, those responsible for the implementation of open data stand to learn from fifty years of FOIA practice about the urgency of responsive information disclosures for achieving good government and greater value to society. Evolving our legal and policy framework for public information collection and publication will engage the legal profession, technologists, and policymakers—as well as the general public that has benefitted from the open data movement and from FOIA.

Beth Simone Noveck is the Florence Rogatz Visiting Clinical Professor of Law, Yale Law School, Jerry M. Hultin Global Network Professor of Engineering, New York University and director of The Governance Lab. Her most recent book is Smart Citizens, Smarter State: The Technologies of Expertise and the Future of Governing (Harvard University Press, 2015). She tweets @bethnoveck. The author is grateful to David Pozen and Michael Schudson of Columbia University for organizing an excellent conference on FOIA’s fiftieth anniversary in June 2016 at which many of the ideas in this Essay were presented and to David Schulz of the Media Freedom and Information Access Clinic at Yale Law School for help refining them.

Preferred Citation: Beth Simone Noveck, Is Open Data the Death of FOIA?, 126 Yale L.J. F. 273 (2016), http://www.yalelawjournal.org/forum/is-open-data-the-death-of-foia