The Yale Law Journal

November 2021

A Relational Theory of Data Governance


abstract. This Feature advances a theoretical account of data as social relations, constituted by both legal and technical systems. It shows how data relations result in supraindividual legal interests. Properly representing and adjudicating among those interests necessitates far more public and collective (i.e., democratic) forms of governing data production. Individualist data-subject rights cannot represent, let alone address, these population-level effects.

This account offers two insights for data-governance law. First, it better reflects how and why data collection and use produce economic value as well as social harm in the digital economy. This brings the law governing data flows into line with the economic realities of how data production operates as a key input to the information economy. Second, this account offers an alternative normative argument for what makes datafication—the transformation of information about people into a commodity—wrongful. What makes datafication wrong is not (only) that it erodes the capacity for subject self-formation, but instead that it materializes unjust social relations: data relations that enact or amplify social inequality. This account indexes many of the most pressing forms of social informational harm that animate criticism of data extraction but fall outside typical accounts of informational harm. This account also offers a positive theory for socially beneficial data production. Addressing the inegalitarian harms of datafication—and developing socially beneficial alternatives—will require democratizing data social relations: moving from individual data-subject rights to more democratic institutions of data governance.

author. Academic Fellow, Columbia Law School. Many thanks to the members of the 2020 Privacy Law Scholars Workshop, the Information Law Institute Fellows Workshop at NYU Law, and the Digital Life Initiative Fellows Group at Cornell Tech for their careful and generous comments. Additional thanks to Ashraf Ahmed, José Argueta Funes, Chinmayi Arun, Yochai Benkler, Elettra Bietti, Julie Cohen, Angelina Fisher, Jake Goldenfein, Ben Green, Lily Hu, Woodrow Hartzog, Aziz Huq, Amy Kapczynski, Duncan Kennedy, Issa Kohler-Hausmann, Michael Madison, Lee McGuigan, Lev Menand, Christopher Morten, Helen Nissenbaum, Amanda Parsons, Angie Raymond, Neil Richards, Thomas Schmidt, Katherine Strandburg, Thomas Streinz, Mark Verstraete, Ari Ezra Waldman, and Richard Wagner. An early version of this work was presented in 2018 at Indiana University’s Ostrom Workshop.


In recent years, the technology industry has been the focus of increased public distrust, civil and worker activism, and regulatory scrutiny.1 Concerns over datafication—the transformation of information about people into a commodity—play a central role in this widespread front of curdled goodwill, popularly referred to as the “techlash.”2

As technology firms mediate more of our daily lives and grow more economically dominant, the centrality they place on data collection raises the stakes of data-governance law—the legal regime that governs how data about people is collected, processed, and used. As data becomes an essential component of informational capital, the law regulating data production becomes central to debates regarding how—and why—to regulate informational capitalism. There is broad consensus that current data-governance law has failed to protect technology users from the harms of data extraction, in part because it cannot account for this large and growing gap between data’s de jure status as the subject of consumer rights and its de facto status as quasi capital.3

Data-governance reform is the subject of much debate and lively theorizing, with many proposals emerging to address the status quo’s inadequacy.4 This Feature evaluates the legal conceptualizations behind these proposals—in other words, how proposed reforms conceive of what makes datafication worth regulating and whose interests in information ought to gain legal recognition. How datafication is conceptualized shapes and constrains how the law responds to datafication’s effects. If data-governance law is inattentive to how data production creates social benefits and harms, it will be poorly equipped to mitigate those harms and foster data production’s benefits.

This Feature’s core argument is that the data-collection practices of the most powerful technology companies are aimed primarily at deriving (and producing) population-level insights regarding how data subjects relate to others, not individual insights specific to the data subject. These insights can then be applied to all individuals (not just the data subject) who share these population features.

This population-level economic motivation matters conceptually for the legal regimes that regulate the activity of data collection and use; it requires revisiting long-held notions of why individuals have a legal interest in information about them and where such interests obtain.

The status quo of data-governance law, as well as prominent proposals for its reform, approach these population-level relational effects as incidental or a byproduct of eroded individual data rights, to the extent that they recognize these effects at all. As a result, both the status quo and reform proposals suffer from a common conceptual flaw: they attempt to reduce legal interests in information to individualist claims subject to individualist remedies, which are structurally incapable of representing the interests and effects of data production’s population-level aims. This in turn allows significant forms of social informational harm to go unrepresented and unaddressed in how the law governs data collection, processing, and use.

Properly representing the population-level interests that result from data production in the digital economy will require far more collective modes of ordering this productive activity.5 The relevant task of data governance is not to reassert individual control over the terms of one’s own datafication (even if this were possible) or to maximize personal gain, as leading legal approaches to data governance seek to do. Instead, the task is to develop the institutional responses necessary to represent (and adjudicate among) the relevant population-level interests at stake in data production. In other words, responding adequately to the economic imperatives and social effects of data production will require moving past proposals for individualist data-subject rights and toward theorizing the collective institutional forms required for responsible data governance.

This Feature builds on prior digital-privacy and data-governance scholarship that points out the importance of social causes and social effects of privacy erosion.6 It takes up these insights to offer an account of why the social effects of privacy erosion should be considered of greater relevance—indeed, central relevance—for data-governance law. By placing data relations and their population-level effects at the center of discussions regarding why data about people is (and ought to be) legally regulated, this Feature offers two contributions to the literature on data-governance law.

First, it aligns the legal debates regarding how to govern data production with the economic transformation of data into a key input of the information economy. This in turn illuminates the growing role (and heightened stakes) of data-governance law as a primary legal regime regulating informational capitalism.

The descriptive contribution of this Feature details how data production in the digital economy is fundamentally relational: a basic purpose of data production as a commercial enterprise is to relate people to one another based on relevant shared population features. This produces both considerable social value and many of the pressing forms of social risk that plague the digital economy. As this Feature explores further below, data’s relationality results in widespread population-level interests in data collection and use that are irreducible to individual legal interests within a given data exchange. Contending with the economic realities of data production thus expands the task of data-governance law: from disciplining against forms of interpersonal violation to also structuring the rules of economic production (and social reproduction) in the information economy.

Second, this Feature departs from prior work to offer an alternative normative account for what makes datafication wrongful. Privacy and data-governance law have traditionally governed forms of private interpersonal exchange in order to secure the benefits of data-subject dignity or autonomy. Yet as data collection and use become key productive activities (i.e., economic activities that define the contemporary economy as an information economy), new kinds of information-based harm arise. There is growing evidence of the role that digital technology plays in facilitating social and economic inequality.7 Digital-surveillance technologies used to enhance user experience for the rich simultaneously provide methods of discipline and punishment for the poor. Algorithmic systems may reproduce or amplify sex and race discrimination.8 Even seemingly innocuous data collection may be used in service of domination and oppression.9 The pursuit of user attention and uninterrupted access to data flows amplifies forms of identitarian polarization, aggression, and even violence.10 Such evidence suggests that social processes of datafication not only produce violations of personal dignity or autonomy, but also enact or amplify social inequality.

Prior accounts rightly identify the deep entanglement between the challenges of protecting autonomy in the digital economy and the realities of how data production operates as a social process: without securing better social conditions for data production for everyone, the personal benefits of robust privacy protection cannot be realized.11 On this view, the supraindividual nature of digital-privacy erosion matters because it raises additional complications for securing the benefits of robust digital-privacy protection for individuals.

This Feature departs from such accounts in that it places the inegalitarian effects of data extraction on equal footing with its autonomy-eroding effects. Privacy erosion’s social effects do implicate the personal (and social) value of individual autonomy. But the inequality that results from data production should be considered relevant to the task of data governance for its own sake, and not only for the effects inequality has on data subjects’ individual capacities for self-formation and self-enactment. This Feature thus argues that, alongside traditional concerns over individual autonomy, the social inequalities that result from data production are also forms of informational harm.

Both current and proposed data-governance law fail to adequately grasp the socioeconomic and normative centrality of data relations. This poses two problems. The first problem is conceptual: a central economic imperative that drives data production goes unrepresented in both existing and proposed laws governing datafication. As a practical matter, this leaves the law out of step with many of the ways that information creates social value and allows material forms of social informational harm to persist unaddressed. This presents U.S. data-governance law with a sociality problem: how can data-governance law account for data production’s social effects?

The second problem is a matter of institutional design. Individualist theories of informational interests result in legal proposals that advance a range of new rights and duties with respect to information but practically fall back on individuals to adjudicate between legitimate and illegitimate information production. This not only leaves certain social informational harms unrepresented (let alone addressed), but also risks foreclosing socially beneficial information production. This presents U.S. data-governance law with a legitimacy problem: how can the legal regimes governing data production distinguish legitimate from illegitimate data use without relying on individual notice and choice?

The sociality problem demonstrates the need in data-governance law for an expanded account of the interests at stake in information production, while the legitimacy problem points to the need for data-governance law to expand its remit by considering whose interests are relevant for deciding whether a particular instance of data production is legitimate, and on what grounds.

This Feature offers a response to these conceptual and institutional design problems. Conceptually, it offers an account of the sociality problem that recognizes the ubiquity and the relevance of the population-level interests that result from data production. From such recognition follows this Feature’s response to the legitimacy problem, which argues for governing many types of data as a collective resource that necessitates far more democratic, as opposed to personal, forms of institutional governance.

This in turn leads to a different line of inquiry regarding the legal challenges facing data-governance law. Current debates center on how to secure greater data-subject control, more robust protections for data-subject dignity, or better legal expressions of data-subject autonomy. An account of data social relations focuses future inquiry on how to balance the overlapping and at times competing interests that comprise the population-level effects of data production. This line of inquiry raises core questions of democratic governance: how to grant people a say in the social processes of their mutual formation; how to balance fair recognition with special concern for certain minority interests; what level of civic life achieves the appropriate level of pooled interest; and how to recognize that data production produces winners and losers and, in turn, develop fair institutional responses to these effects.

This Feature proceeds in four Parts. Part I describes the stakes and the status quo of data governance. It begins by documenting the significance of data processing for the digital economy. It then evaluates how the predominant legal regimes that govern data collection and use—contract and privacy law—code data as an individual medium. This conceptualization is referred to throughout the Feature as “data as individual medium” (DIM). DIM regimes apprehend data’s capacity to cause individual harm as the legally relevant feature of datafication; from this theory of harm follows the tendency of DIM regimes to subject data to private individual ordering.

Part II presents the Feature’s core argument regarding the incentives and implications of data social relations within the data political economy. Data’s capacity to transmit social and relational meaning renders data production especially capable of benefitting and harming others beyond the data subject from whom the data is collected. It also results in population-level interests in data production that are not reducible to the individual interests that generally feature in data governance. Thus, data’s relationality presents a conceptual challenge for data governance reform.

Part III evaluates two prominent sets of legal reform proposals that have emerged in response to concerns over datafication. Data has been extensively analogized, and proposals for reform locate data at different points on the continuum from “object-like” to “person-like.”12 On one end of this spectrum, propertarian proposals respond to growing wealth inequality in the data economy by formalizing individual propertarian rights over data. These reforms call for formalizing an alienable right to data as labor or property, to be bought and sold in a market for goods or labor. On the other end, dignitarian reforms conceive of data as an extension of data-subject selfhood. Dignitarian reforms respond to how excessive data extraction can erode individual autonomy by strengthening the fundamental rights data subjects enjoy over their data as an extension of their personal selfhood. While propertarian and dignitarian proposals differ on the theories of injustice underlying datafication and accordingly provide different solutions, both resolve to individualist claims and remedies that do not represent, let alone address, the relational nature of data collection and use.

Finally, Part IV proposes an alternative approach: data as a democratic medium (DDM). This alternative conceptual approach recognizes data’s capacity to cause social harm as a fundamentally relevant feature of datafication. This leads to a commitment to collective institutional forms of ordering. Conceiving of data as a collective resource subject to democratic ordering accounts for the importance of population-based relationality in the digital economy. This recognizes a greater number of relevant interests in data production. DDM responds not only to salient forms of injustice identified by other data-governance reforms, but also to significant forms of injustice missed by individualist accounts. In doing so, DDM also provides a theory of data governance from which to defend forms of socially beneficial data production that individualist accounts may foreclose. Part IV concludes by outlining some examples of what regimes that conceive of data as democratic could look like in practice.

Before continuing, three definitional and stylistic notes regarding this Feature’s use of key terms are in order:

  • Data. For the sake of brevity, “data” refers to data about people unless otherwise noted. Data about people is the data collected as people “invest, work, operate businesses, socialize,” and otherwise go about their lives.13 This data is of greatest interest to competing digital-technology companies and to observers of the business models built from data collection. It is also deliberately more expansive than U.S. definitions of “personal data” or the closely related term “personally identifiable information.”14 Furthermore, this Feature will refer to “data” as a singular rather than a plural noun. This stylistic choice is in line with common rather than strictly correct usage.

  • Data subject and data collector. This Feature will use the term “data subject” to refer to the individual from whom data is being collected—often also referred to in technology communities as the “user.” “Data processor” is used synonymously with “data collector” to refer to the entity or set of entities that collect, analyze, process, and use data. The definitions of “data subject” and “data processor” are loosely derived from the European Union’s General Data Protection Regulation (GDPR).15 While the GDPR’s definition of personal data offers some capacity for nonindividualistic interpretation, any reference to “data subject” in this Feature will refer to the individual from whom or about whom data is being collected.

  • Informational harm. Individual informational harm refers to harm that a data subject may incur from how information about them is collected, processed, or used. In contrast, social informational harm refers to harms that third-party individuals may incur when information about a data subject is collected, processed, or used.