abstract. Statistical evidence is crucial throughout disparate impact’s three-stage analysis: during (1) the plaintiff’s prima facie demonstration of a policy’s disparate impact; (2) the defendant’s job-related business necessity defense of the discriminatory policy; and (3) the plaintiff’s demonstration of an alternative policy without the same discriminatory impact. The circuit courts are split on a vital question about the “practical significance” of statistics at Stage 1: Are “small” impacts legally insignificant? For example, is an employment policy that causes a one percent disparate impact an appropriate policy for redress through disparate impact litigation? This circuit split calls for a comprehensive analysis of practical significance testing across disparate impact’s stages. Importantly, courts and commentators use “practical significance” ambiguously between two aspects of practical significance: the magnitude of an effect and confidence in statistical evidence. For example, at Stage 1 courts might ask whether statistical evidence supports a disparate impact (a confidence inquiry) and whether such an impact is large enough to be legally relevant (a magnitude inquiry). Disparate impact’s texts, purposes, and controlling interpretations are consistent with confidence inquiries at all three stages, but not magnitude inquiries. Specifically, magnitude inquiries are inappropriate at Stages 1 and 3: there is no discriminatory impact or reduction too small or subtle for the purposes of the disparate impact analysis. Magnitude inquiries are appropriate at Stage 2, when an employer defends a discriminatory policy on the basis of its job-related business necessity.
author. Yale Law School, J.D. expected; Yale Philosophy, Ph.D. expected; Rutgers University, B.A. 2012. I thank the Yale Law Journal staff, especially Notes Editors Greg Cui, Joe Falvey, and Urja Mittal. This argument’s examples involve impacts on communities of which I am not a member. Such advocacy is “a touchy sort of subject,” in the words of SJA Germanotta: “Can you stand up for people [when] you are not necessarily fully part of that community in a way that [members] can understand?” Most special thanks to Owen Fiss and the 2016 Community of Equals seminar participants who taught me a tremendous amount, including how to approach this question.
Statistical evidence is crucial in each stage of disparate impact’s three-stage analysis: (1) the plaintiff’s prima facie demonstration of a policy’s disparate impact; (2) the defendant’s job-related business necessity defense of the discriminatory policy; and (3) the plaintiff’s demonstration of an alternative policy without the same discriminatory impact. There is a circuit split on the role of “practical significance” inquiries at the prima facie stage,1 raising a fundamental question about disparate impact theory: Are “small” effects, about whose existence we are confident, legally insignificant? For example, is an employment policy that causes a one percent disparate impact an appropriate object of disparate impact litigation?
This question calls for a broader analysis of “practical significance” at each of disparate impact’s three stages. Importantly, courts use “practical significance” in multiple ways. The present argument’s primary focus is practical significance referring to the magnitude of an effect supported by statistical evidence. I call courts’ evaluation of the size of an effect a “magnitude inquiry.” Another sense of practical significance involves the strength of the inference from an empirical-statistical finding to the real world. I refer to a court’s evaluation of this aspect of practical significance as a “confidence inquiry.” This is an important distinction, and courts and commentators often use “practical significance” in ways that are ambiguous between these two aspects.2 The second aspect, practical significance as the strength of the inference supported by statistical evidence, is obviously relevant to disparate impact analysis, in the same way that assessing the strength of the inference supported by evidence is always relevant. A debate remains regarding “magnitude inquiries,” evaluations of whether some effect is sufficiently large, at each stage of analysis.
I argue that such magnitude inquiries are inappropriately used to evaluate whether a “large enough” prima facie disparate impact exists or whether an alternative policy with less discriminatory impact promises a “large enough” decrease in discriminatory impact, at the first and third stages of disparate impact litigation. However, magnitude inquiries are more appropriate when an employer defends a discriminatory policy on the basis of its job-related business necessity, at the second stage of disparate impact litigation. Thus, this argument’s primary contribution is an analysis of “magnitude inquiries,” one aspect of practical significance, across all three stages of disparate impact.
The Note proceeds in three parts. Part I describes disparate impact theory, highlighting the logic of the shifting burden of proof,3 and relevant statistical concepts. Part II analyzes statistics’ role at three stages of disparate impact analysis: the plaintiff’s establishment of prima facie disparate impact, the defendant’s rebuttal of establishing a test’s job-relatedness and business necessity, and the plaintiff’s proposal of a less discriminatory alternative policy. I argue that disparate impact law supports the rejection of magnitude inquiries for a plaintiff’s prima facie case of disparate impact and proposal of a less discriminatory alternative, but it supports a more robust magnitude inquiry during an employer’s establishment of a disparity-causing test’s job-relatedness and business necessity. Part III provides recommendations for improving the use of statistics in disparate impact analysis.
This Note contributes a defense of the First Circuit’s decision, which has previously been subjected to critical commentary.4 Importantly, it highlights the distinction between two aspects of “practical significance” sometimes obscured in disparate impact discussions: magnitude and confidence. The Note also contributes a comprehensive analysis of practical significance, providing recommendations for the use of statistics at all three stages of disparate impact litigation. In doing so, it calls for courts to reflect broadly about whether their use of statistics at each stage is consistent with their uses at the two other stages, their underlying theory of statistics and evidence, and their disparate impact theory.
Given the amount5 and importance6 of disparate impact litigation, addressing key questions that can determine the outcome of these actions, such as courts’ use of magnitude inquiries, can be of great consequence. Indeed, these issues have provoked controversy. Today, the role of “practical significance” in the prima facie stage of disparate impact analysis is at the heart of a circuit split. The First, Third, and Tenth Circuits oppose practical significance inquiries; the Second, Fourth, Fifth, Sixth, Ninth, and Eleventh Circuits endorse them; and the D.C., Seventh, and Eighth Circuits have no clear precedent.7
Before turning to the analysis, it is worth noting that these legal questions arise against a particular scientific and cultural backdrop: the danger of relying on mere statistical significance in interpreting empirical studies is the subject of scientific and increasingly popular concern, and looking to “practical significance” is a popular remedy.8 Calls to move science beyond simple statistical significance testing are not exclusive to the current moment,9 nor are calls to move toward some form of practical significance testing.10 Unreflective reliance on scientific trends might suggest that practical significance inquiries of all forms—including magnitude inquiries—are necessary parts of sound methodology, including throughout disparate impact analysis.
This Note cautions otherwise.11
This Part provides an overview of disparate impact litigation and its three-stage burden-shifting framework: the plaintiff’s prima facie demonstration of a disparate impact, the defendant’s job-related business necessity defense, and the plaintiff’s demonstration of a suitable alternative policy with less discriminatory impact. Then, I describe disparate impact theory’s fundamental aims, the purpose of each stage, and the two key statistical concepts: statistical significance and practical significance. The discussion of practical significance outlines the fundamentally different aspects of practical significance testing that courts use: “magnitude inquiries” evaluate whether an effect is sufficiently large to be legally relevant, while “confidence inquiries” evaluate whether statistical evidence sufficiently supports a claim. For instance, in evaluating whether a prima facie showing of disparate impact has been made, a court might examine whether the impact is sufficiently large (for instance, is a one percent disparity legally relevant?) or whether the evidence supports the claim that the policy caused a disparity.
Title VII of the Civil Rights Act of 1964 prohibits workplace discrimination on the basis of protected characteristics: race, color, religion, sex, and national origin.12 In early opinions, courts read the Act to protect individuals against intentional discrimination.13 In 1971, the Supreme Court articulated a broader understanding of Title VII in Griggs v. Duke Power Co., the landmark decision that introduced disparate impact theory.14 Griggs held that Title VII prohibits “not only overt discrimination but also practices that are fair in form, but discriminatory in operation.”15 This theory of disparate impact allows a plaintiff to recover when an employer implements a test or policy that adversely affects a protected group. Unlike disparate treatment, disparate impact does not require employer animus or particular intentions.16 The “touchstone” of disparate impact theory, according to the Griggs Court, is business necessity.17 In order to justify a practice that has a discriminatory impact, an employer must show that the disparity-causing practice is a business necessity.
Post-Griggs decisions refined disparate impact theory. Notably, in 1975, the Court in Albemarle Paper Co. v. Moody outlined a three-part burden-shifting framework for disparate impact litigation.18 In 1989, the Supreme Court stepped back from this approach in Wards Cove Packing Co. v. Atonio,19 limiting Griggs by modifying the standard of business necessity to require merely a “legitimate business justification” for a discriminatory practice.20 But two years later, the Civil Rights Act of 1991 superseded Wards Cove, restoring the disparate impact framework that preceded it.21
The Civil Rights Act of 1991 codified disparate impact theory developed in case law, including the three-part burden-shifting framework from Albemarle Paper Co.22 Under this framework, the plaintiff (the employee) must first make a prima facie demonstration that a policy or practice has a disparate impact on the plaintiff’s protected class.23 Next, the defendant (the employer) must demonstrate that its policy or practice is “job related” and “consistent with business necessity.”24 If the defendant meets this burden, the plaintiff has the burden of demonstrating that there is a suitable alternative employment practice with less discriminatory impact.25 The plaintiff can recover if the employer fails to meet its burden at the second stage or if the plaintiff meets his or her burden at the third stage.
This language in the Civil Rights Act of 1991 indicates an intention to codify the principles of Griggs and its legacy,26 including the three-part burden-shifting test articulated in Albemarle Paper.27 It also echoes the language of job-relatedness and business necessity: after the prima facie demonstration of adverse impact, a defendant must “demonstrate that the challenged practice is job related for the position in question and consistent with business necessity.”28
Griggs articulates a simple but powerful antisubordination principle:29 employment practices must be revised such that protected classifications become irrelevant.30 Any unnecessary and discriminatory employment practice, whether intentional or unintentional and whether of large or small magnitude, must be removed. Antisubordination is thus the primary purpose of disparate impact. As the Griggs Court put it, the fundamental aim of Title VII is “the removal of artificial, arbitrary, and unnecessary barriers to employment when the barriers operate invidiously to discriminate on the basis of racial or other impermissible classification.”31
Of course, the goal of antisubordination has an unavoidable limit. It does not entirely “preclude the use of [employment] testing or measuring procedures.”32 In the absence of a less discriminatory alternative, policies that have a disparate impact may be permitted if “they are demonstrably a reasonable measure of job performance.”33 Therefore, when the goal of antisubordination and a legitimate business interest clash, disparate impact is tolerated—to an extent—for the sake of business interests that are sufficiently substantial and in the absence of an alternative policy of less discriminatory impact.
The overarching antisubordination aim and the business necessity limit inform the structure of the three-part burden-shifting framework.34 First, the plaintiff must demonstrate a prima facie case of disparate impact: “that a [defendant] uses a particular employment practice that causes a disparate impact on the basis of race, color, religion, sex, or national origin.”35 Then, the plaintiff must identify a discriminatory employment practice, one that functions to make a protected status like race relevant. The employer can also demonstrate that the practice does not cause the disparate impact: “If the [defendant] demonstrates that a specific employment practice does not cause the disparate impact, the [defendant] shall not be required to demonstrate that such practice is required by business necessity.”36 In rebuttal, the defendant must demonstrate that “the challenged practice is job related for the position in question and consistent with business necessity.”37 Note that the practice must not only be related to the job, but must also be a reasonable measure of job performance, one that justifies a departure from disparate impact’s primary aim to make factors like race and religion irrelevant. Finally, even if the discriminatory practice is job related and consistent with business necessity, the plaintiff may succeed by presenting an alternative employment practice38 that also serves the employer’s legitimate interests “without a similarly undesirable [discriminatory] effect” but that the respondent refuses to adopt.39 The fundamental purpose of this three-part framework is to eliminate unnecessary and discriminatory employment barriers. Some discriminatory barriers might be business necessities—barriers that have been permitted despite the motivation to make factors like race irrelevant. Yet if there is an alternative policy that serves the same purpose without equal discriminatory impact, the employer must adopt that policy instead.
The fundamental aim of antisubordination might be achieved in court or out of court. Although it is easy to focus primarily on disparate impact litigation, successful lawsuits are only one way through which disparate impact law might dismantle unnecessary and discriminatory barriers to employment. Another, less costly way that disparate impact law serves its function is by creating incentives for employers to remove problematic and unlawful barriers to employment before litigation commences.
Several statistical concepts are relevant to disparate impact analysis. Here, I detail the most important concepts for the purposes of this Note: statistical significance and practical significance.
Statistical significance is a concept that is frequently applied to empirical results. It is most commonly expressed through a p-value (e.g., “p < .05”). A p-value is the probability of obtaining results at least as extreme as those actually observed, assuming that the null hypothesis is true. Smaller p-values indicate data that are less consistent with the null hypothesis.
In the context of Title VII employment discrimination litigation, a null hypothesis might assume equal selection rates by an employer among different racial applicant groups. For instance, suppose the evidence shows that a policy differentially rejects blacks and that this difference is statistically significant with a p-value of five percent. This means that, assuming equal selection rates for each group, there is a five percent chance of arriving at a difference in selection rates of equal or greater magnitude.
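The calculation behind such a p-value can be sketched in a few lines of Python. The sketch below assumes one common method, the pooled two-proportion z-test, which is only a large-sample approximation; the function name and the applicant counts are hypothetical and are not drawn from any case discussed in this Note.

```python
import math


def two_proportion_p_value(selected_a, total_a, selected_b, total_b):
    """Two-sided p-value under the null hypothesis of equal selection rates.

    Uses the pooled two-proportion z-test, a large-sample approximation.
    """
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    # Under the null hypothesis both groups share one rate, estimated by pooling.
    pooled = (selected_a + selected_b) / (total_a + total_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_error
    # Probability that a standard normal value is at least as extreme as z,
    # in either direction.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: 120 of 400 white applicants and 90 of 400 black
# applicants are selected.
z_stat, p_value = two_proportion_p_value(120, 400, 90, 400)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```

The returned p-value answers only the conditional question posed above: how often a gap at least this large would arise if the two selection rates were in fact equal.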
In disparate impact analysis (and elsewhere), p-values should be interpreted cautiously; statistical significance testing should not be relied upon in isolation.40 In a recent volume, the American Statistical Association summarized some of the key principles and flaws in how p-values have been used in empirical analysis:41
1. p-values can indicate how incompatible the data are with the model being tested.
2. p-values do not tell you the probability the model is true or the probability the data are random.
3. No decision—scientific, business, legal or otherwise—should be based solely on p-values passing a cutoff value (i.e., a “bright line,” such as p < .01 or .05).
4. The proper understanding of statistical tests requires full reporting and transparency (i.e., report all statistical analyses and p-values; do not cherry-pick results to be reported).
5. A p-value does not indicate the size or importance of an effect that is obtained, no matter how small the p-value is (and large p-values do not tell you that an effect does not exist, only that it is not supported by the data).
6. The p-value does not tell you how good your model or hypothesis is (i.e., a high p-value may support the null hypothesis, yet many other models might also be supported by the data).42
These lessons highlight the dangers of relying solely on p-values or interpreting them inappropriately.43 For instance, “p = .05” does not mean that the null hypothesis has only a five percent chance of being true, nor does it mean that the observed data would occur only five percent of the time under the null hypothesis.44 A p-value is simply the probability of the observed result or a more extreme result occurring, given that the null hypothesis is true. It is important to remember that a p-value is calculated on the assumption that the null hypothesis is true. Therefore, the p-value is not the probability that the null hypothesis is false.
Consider what p-values can tell us in disparate impact analysis. Suppose our null hypothesis is that there is no racial effect of a business’s hiring policy. That is, the null hypothesis is that any difference in hiring rates between two racial groups is simply due to chance. If the real-world data indicate a statistically significant difference in the employer’s hiring rates between black and white groups with a p-value of less than five percent, we have learned that, assuming no racial effect, we would find a difference in white and black hiring rates at least this extreme less than five percent of the time. The data do not tell us that there is less than a five percent chance that the racial disparity is due to chance.
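One way to see what “assuming no racial effect” means is to simulate it. The following Python sketch uses a permutation test, which is a swapped-in illustration rather than a method attributed to any court: it reshuffles hiring outcomes at random across the two groups and counts how often chance alone produces a gap at least as large as the observed one. The counts, the number of trials, and the random seed are all hypothetical choices.

```python
import random


def permutation_p_value(hired_a, total_a, hired_b, total_b, trials=5000, seed=0):
    """Estimate a two-sided p-value by reshuffling outcomes across groups."""
    rng = random.Random(seed)
    total_hired = hired_a + hired_b
    # 1 marks a hired applicant, 0 a rejected one.
    outcomes = [1] * total_hired + [0] * (total_a + total_b - total_hired)
    observed_gap = abs(hired_a / total_a - hired_b / total_b)
    at_least_as_extreme = 0
    for _ in range(trials):
        # Under the null hypothesis, group labels are unrelated to outcomes,
        # so any reshuffling of outcomes is equally likely.
        rng.shuffle(outcomes)
        sim_a = sum(outcomes[:total_a])
        sim_b = total_hired - sim_a
        sim_gap = abs(sim_a / total_a - sim_b / total_b)
        if sim_gap >= observed_gap - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials


# Hypothetical data: 60 of 200 white applicants and 40 of 200 black
# applicants are hired.
p_estimate = permutation_p_value(60, 200, 40, 200)
print(f"estimated p-value: {p_estimate:.3f}")
```

The estimate is the share of simulated no-effect worlds that show a gap this large. It is not the probability that the real disparity is due to chance, which is the misreading the paragraph above warns against.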
Practical significance refers to the real-world import of a statistical finding. In disparate impact cases, the term is used in two notably different ways. One is to refer to a “magnitude inquiry,” an analysis of the magnitude of a result supported by statistical evidence—for instance, the size of the effect indicated by a statistically significant finding. The other is a “confidence inquiry,” an analysis of the strength of the inference drawn between statistical evidence and the conclusion one draws from it about the real world.
A magnitude inquiry is an assessment of the size of an effect.45 For instance, a statistically significant effect can be small in size. Suppose there is evidence that an employer had a hiring pool of ten thousand applicants. A five percent racial disparate impact might be statistically significant given the large sample size, but nevertheless deemed to have a small effect size, since some may think that a five percent difference is “small” in size.46 Of course, whether an effect size is “large” or “small” is fundamentally a conventional or normative judgment and not derived purely from statistical analysis.
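The dependence of statistical significance on sample size can be illustrated with a short Python sketch. It assumes equal-sized groups, a pooled two-proportion z-test, and hypothetical selection rates of fifty and forty-five percent; none of these figures comes from the record of any case.

```python
import math


def p_value_for_rates(rate_a, rate_b, group_size):
    """Two-sided p-value for two equal-sized groups with the given rates."""
    pooled = (rate_a + rate_b) / 2  # valid because the groups are equal in size
    std_error = math.sqrt(pooled * (1 - pooled) * 2 / group_size)
    z = abs(rate_a - rate_b) / std_error
    return math.erfc(z / math.sqrt(2))


# The same five-point gap, observed in pools of very different sizes.
p_large_pool = p_value_for_rates(0.50, 0.45, group_size=5000)  # 10,000 applicants
p_small_pool = p_value_for_rates(0.50, 0.45, group_size=100)   # 200 applicants
print(f"10,000 applicants: p = {p_large_pool:.2g}")
print(f"200 applicants:    p = {p_small_pool:.2g}")
```

The magnitude of the disparity is identical in both calls. Only the confidence one can place in it changes, which is why the two inquiries should be kept apart.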
In contrast, a confidence inquiry is an assessment of the strength of the evidence, which asks how strong the inference is between the evidence and the claim it supports about the world. For instance, we might evaluate the statistical evidence of an observed disparity by asking whether it really supports the existence of a real-world disparity caused by the hiring policy in question. Imagine that statistical evidence suggests a three percent disparity in the hiring rates of black and white applicants. Courts might ask whether this result is practically significant in the sense of whether this evinces any real-world disparity. This aspect of practical significance is important, but it is also a standard inquiry: we can, should, and do regularly ask whether any piece of evidence is practically significant in this second sense.
Even an effect with a size that is considered “medium” or “large” in the first sense might be deemed as having little practical significance in the second sense, especially when the evidence is based on a small sample size. For instance, suppose an employer has a hiring pool of ten applicants, half from one group and half from another, and a hiring test excludes all but three.47 Even if the difference in hiring rates suggested by this evidence is of large magnitude, we might doubt the real-world inference of a disparate impact supported by these results.
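The ten-applicant scenario can be worked through with Fisher’s exact test, a standard choice for very small samples. The Python sketch below assumes, purely for illustration, that all three successful applicants come from the same group; the text does not specify how the three are distributed, so that split is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(hired_a, total_a, hired_b, total_b):
    """Two-sided Fisher exact p-value for a 2x2 hiring table."""
    total = total_a + total_b
    hired = hired_a + hired_b

    def prob(k):
        # Probability that exactly k of the hires come from group A when
        # hires are drawn at random from the whole pool (hypergeometric).
        return comb(total_a, k) * comb(total_b, hired - k) / comb(total, hired)

    observed = prob(hired_a)
    lowest = max(0, hired - total_b)
    highest = min(total_a, hired)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(
        prob(k) for k in range(lowest, highest + 1)
        if prob(k) <= observed * (1 + 1e-9)
    )


# Hypothetical split: 3 of 5 applicants pass in one group, 0 of 5 in the other.
rate_gap = 3 / 5 - 0 / 5
p_small_sample = fisher_exact_two_sided(3, 5, 0, 5)
print(f"gap in pass rates: {rate_gap:.0%}, p = {p_small_sample:.3f}")
```

On these assumed numbers the gap in pass rates is sixty percentage points, yet the test does not come close to conventional significance. A large magnitude is therefore compatible with weak confidence.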
This distinction—practical significance as a measure of a disparity’s magnitude vs. practical significance as a measure of confidence in the strength of evidence—is crucial. This Note focuses on magnitude inquiries. This is not to say, however, that evaluation of the inference between statistical evidence and the real world is irrelevant. To the contrary, such evaluations should remain fundamental at each stage.
Consider another example. Suppose a company has reviewed applications from one hundred candidates, forty-five of whom are white and fifty-five of whom are black. The application requires a hair follicle drug test, which more white applicants pass. By conventional standards (significance determined by p < .05), the effect of race is on the border of statistical significance. Depending on the assumptions, different statistical tests lead to different results.48 This demonstrates an important but overlooked feature of statistical significance testing: Despite its allure of objectivity, its results vary based on its assumptions.
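The sensitivity of significance testing to its assumptions can be shown with a minimal Python sketch. The pass counts below are hypothetical (the text gives only the group sizes of forty-five and fifty-five), and the two tests compared, a chi-square test with and without the Yates continuity correction, are chosen for illustration and are not necessarily the tests the example has in mind.

```python
import math


def chi_square_p_value(a, b, c, d, yates=False):
    """p-value of the chi-square test for a 2x2 table.

    Rows are groups; columns are pass and fail: [[a, b], [c, d]].
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates continuity correction shrinks the difference by n / 2.
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: 40 of 45 white applicants and 40 of 55 black
# applicants pass the drug test.
p_uncorrected = chi_square_p_value(40, 5, 40, 15)
p_corrected = chi_square_p_value(40, 5, 40, 15, yates=True)
print(f"without correction: p = {p_uncorrected:.3f}")
print(f"with correction:    p = {p_corrected:.3f}")
```

On these assumed counts the two versions of the test fall on opposite sides of the conventional .05 line, even though the underlying data are identical.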
Regardless of statistical significance, the effect’s practical significance remains. First, consider the “magnitude” aspect of practical significance: what do these statistical analyses imply about the magnitude of the disparity? A conventional measure of effect size suggests that this is a “small” or “weak” effect size.49 But we can still ask about the “confidence” aspect of practical significance: how strongly do these statistical facts (including our analysis of effect size) support the existence of any real-world disparity? In other words, how strong is the evidence of a disparity?
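A magnitude measure for a two-by-two table can be computed separately from any significance test. The Python sketch below uses the phi coefficient with Cohen’s conventional benchmarks (0.1, 0.3, and 0.5); both the choice of measure and the pass counts are assumptions made for illustration, since the text does not say which measure or which counts it relies on.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi effect size for a 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def cohen_label(effect_size):
    """Conventional (not statistical) label for an effect size."""
    size = abs(effect_size)
    if size < 0.1:
        return "negligible"
    if size < 0.3:
        return "small"
    if size < 0.5:
        return "medium"
    return "large"


# Hypothetical counts: 40 of 45 white applicants and 40 of 55 black
# applicants pass the hair follicle test.
phi = phi_coefficient(40, 5, 40, 15)
label = cohen_label(phi)
print(f"phi = {phi:.3f} ({label})")
```

The label comes from convention alone, which underscores that calling an effect “small” is a normative judgment rather than a statistical result.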
Although there is an important distinction between two aspects of practical significance—magnitude and confidence—authorities sometimes emphasize only one aspect. Consider the Federal Judicial Center’s definition, which understands practical significance only in terms of magnitude: practical significance means that “the magnitude of the effect being studied is not de minimis—it is sufficiently important substantively for the court to be concerned.”50 Some courts have adopted a similar understanding of practical significance. In Frazier v. Garrison I.S.D., the Fifth Circuit held that a 4.5% difference in selection rates did not have sufficient practical significance when 95% of applicants were selected.51 The Frazier Court justified its decision by citing a case in which it had previously held
that employment examinations having a 7.1 percentage point differential between black and white test takers do not, as a matter of law, state a prima facie case of disparate impact. Therefore [in this case in which the difference is 4.5 percentage points], there is no significant statistical discrepancy between minority and non-minority pass rates.52
Thus, the court applied a practical significance requirement in the sense of a magnitude inquiry. This was not an inquiry into how strongly the evidence supported the possibility of a real-world disparity. The Frazier Court was essentially performing a logical deduction: since a 7.1% difference was not big enough to constitute prima facie disparate impact, a 4.5% difference was also insufficiently large.
Statistics play a crucial role at each of the three stages of disparate impact litigation: the plaintiff’s prima facie case of disparate impact, the defendant’s rebuttal relating to job-relatedness and business necessity, and the plaintiff’s demonstration of a suitable alternative practice. This Part outlines the role of statistics at each stage and presents arguments for the appropriate use of statistics and “practical significance” inquiries at each stage.
Section II.A argues that many courts inappropriately conduct magnitude inquiries at the prima facie stage of disparate impact analysis. Scrutinizing a disparity’s “practical significance” through a magnitude inquiry at the prima facie stage is to ask whether the disparity is big enough to warrant the court’s attention. This question is antithetical to the statutory text, purpose, and precedents of disparate impact law.
Section II.B argues that a robust magnitude inquiry is more appropriate at the second stage of disparate impact analysis. Although such a requirement is incongruous at the prima facie stage, it is apt when assessing the merit of an employer’s rebuttal that some disparity-causing policy is a job-related business necessity—employers must demonstrate that a disparity-causing test has large enough relevance to justify permitting discriminatory impact on the basis of certain legitimate business interests. A magnitude inquiry at this rebuttal stage is more consistent with disparate impact law.
Comparatively fewer cases proceed to the third stage of disparate impact analysis: the plaintiff’s proposal of a less discriminatory alternative policy. Section II.C argues that the logic underlying the elimination of magnitude inquiries during the prima facie stage applies to the third stage as well. Just as the aim of the first stage is to identify a policy that causes any disparity, the aim of the third stage is to identify an alternative policy that provides any decrease in discriminatory effect. Plaintiffs should not be required to show that their proposal reduces discrimination by a particular magnitude. As long as the proposal satisfies the employer’s legitimate interest without a similarly undesirable effect on potential or current employees, the plaintiffs should be found to have met their burden.
The plaintiff typically provides evidence of statistically significant disparities to help support the prima facie demonstration of a disparate impact.53 Courts adopt a variety of approaches in assessing these disparities. One common approach is to adopt thresholds based on standard deviations.54 Some courts hold that disparities not rising to a certain level of statistical significance are insufficient proof of disparate impact.55
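A standard deviation analysis of this kind can be sketched in Python as follows. The sketch assumes a simple binomial model in which each selection is an independent draw from the applicant pool, and it treats the cutoff of two standard deviations as an adjustable assumption rather than a settled rule; the counts are hypothetical.

```python
import math


def standard_deviations_from_expected(observed, n_selected, pool_share):
    """Distance between observed and expected selections, in standard deviations.

    pool_share is the protected group's share of the applicant pool.
    """
    expected = n_selected * pool_share
    std_dev = math.sqrt(n_selected * pool_share * (1 - pool_share))
    return (observed - expected) / std_dev


# Hypothetical data: the protected group is 40% of applicants, but accounts
# for only 60 of the 200 people selected.
deviation = standard_deviations_from_expected(60, 200, 0.40)
threshold = 2.0  # assumed cutoff; courts have not applied a single fixed figure
exceeds_threshold = abs(deviation) > threshold
print(f"{deviation:.2f} standard deviations; exceeds threshold: {exceeds_threshold}")
```

A result like this speaks to confidence that a shortfall is not a chance artifact. It says nothing by itself about whether the shortfall is large.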
Other courts adopt the EEOC’s “four-fifths” or “eighty percent” rule as a standard for measuring prima facie disparate impact.56 The four-fifths rule takes the ratio of the protected class’s selection rate to the greatest selection rate for any group and asks whether this ratio is less than four-fifths. The Supreme Court famously branded the four-fifths rule as one that “has not provided more than a rule of thumb.”57 Moreover, the EEOC guidance itself acknowledges that “[s]maller differences in selection rate may nevertheless constitute adverse impact, where they are significant in both statistical and practical terms or where a user’s actions have discouraged applicants disproportionately on grounds of race, sex, or ethnic group.”58 In other words, the four-fifths rule yields at most a first cut of easily decided cases of prima facie disparate impact: a prima facie case is demonstrated by group selection rates with a ratio below four-fifths, but smaller differences (i.e., larger ratios) require further scrutiny.
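The four-fifths comparison reduces to a simple ratio, sketched below in Python. The group names and counts are hypothetical, and the function names are illustrative rather than taken from any official tool.

```python
def impact_ratios(selection_counts):
    """Map each group to its selection rate divided by the highest group rate.

    selection_counts maps a group name to (number selected, number of applicants).
    """
    rates = {
        group: selected / applicants
        for group, (selected, applicants) in selection_counts.items()
    }
    highest_rate = max(rates.values())
    return {group: rate / highest_rate for group, rate in rates.items()}


def below_four_fifths(selection_counts, threshold=0.8):
    """Flag each group whose impact ratio falls below the threshold."""
    ratios = impact_ratios(selection_counts)
    return {group: ratio < threshold for group, ratio in ratios.items()}


# Hypothetical data: group_a is selected at 60%, group_b at 30%.
hypothetical_counts = {"group_a": (48, 80), "group_b": (30, 100)}
ratios = impact_ratios(hypothetical_counts)
flags = below_four_fifths(hypothetical_counts)
print(ratios)
print(flags)
```

A flag here marks only the easily decided cases. A ratio above four-fifths does not end the inquiry, since smaller differences may still call for further scrutiny.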
These guidelines—statistical significance (and other measures like standard deviation analysis) and the four-fifths rule—can be combined with each other. For instance, a court might adopt an analysis that looks first to the four-fifths rule and then to statistical significance for data failing the four-fifths rule. The four-fifths rule is essentially a guideline that takes practical significance into account, allowing prima facie impact to be established when the effect size (disparity) is large enough. The guideline might also be supplemented by an interpretation that holds practical significance is not established where the disparity is insufficiently large. This is the magnitude inquiry debate at the heart of the circuit split.
The Supreme Court has consistently stated that the essence of demonstrating a prima facie disparate impact is showing statistically significant evidence of a disparity.59 This view was recently reaffirmed in Ricci v. DeStefano: “[A] prima facie case of disparate-impact liability [is] essentially, a threshold showing of a significant statistical disparity . . . and nothing more . . . .”60 The Court characterized the prima facie demonstration as one not requiring a disparity of any particular magnitude. This reaffirms the core commitment of disparate impact theory: “Title VII tolerates no racial discrimination, subtle or otherwise.”61
There are various considerations weighing against magnitude inquiries at the prima facie stage of disparate impact analysis, such as definitions of “disparate impact”62 and the legislative history of the relevant statutes.63
Statutory text and Supreme Court precedent demonstrate that practical significance is irrelevant at the prima facie stage. According to Title VII, a complaining party must demonstrate “that a respondent uses a particular employment practice that causes a disparate impact on the basis of race, color, religion, sex, or national origin . . . .”64 The text indicates that a plaintiff must show a disparate impact, not a substantial, notable, large, or even significant disparate impact. A confidence inquiry is relevant in determining whether the evidence presented supports causation, but there is no basis in the text for a magnitude inquiry, which asks whether the evidence supports a disparate impact that is big enough to be worth proceeding.
Supreme Court precedent supports the same interpretation. Griggs interprets Title VII as aimed at the removal of unnecessary barriers that have a discriminatory impact in employment:
What is required by Congress is the removal of artificial, arbitrary, and unnecessary barriers to employment when the barriers operate invidiously to discriminate on the basis of racial or other impermissible classification. Congress has now provided that tests or criteria for employment or promotion may not provide equality of opportunity merely in the sense of the fabled offer of milk to the stork and the fox. On the contrary, Congress has now required that the posture and condition of the job seeker be taken into account. It has—to resort again to the fable—provided that the vessel in which the milk is proffered be one all seekers can use. The Act proscribes not only overt discrimination, but also practices that are fair in form, but discriminatory in operation. The touchstone is business necessity. If an employment practice which operates to exclude Negroes cannot be shown to be related to job performance, the practice is prohibited.65
This well-known passage is worth careful attention. As Griggs interprets Title VII, Congress is not concerned only about the barriers that cause the largest disparities; rather, if an unnecessary barrier causes any disparity, the barrier must be removed. Albemarle Paper reinforces this early understanding.66
Considering practical significance at the prima facie stage is equally inconsistent with recent Supreme Court opinions on disparate impact. Recall Ricci’s straightforward avowal: “[A] prima facie case of disparate-impact liability [is] essentially, a threshold showing of a significant statistical disparity . . . and nothing more . . . .”67 Inquiry into a disparity’s size is unambiguously something more. It is also worth noting the unanimity in understanding given the ideological diversity represented among the authors and signers of just these two opinions. Chief Justice Burger wrote the opinion in Griggs on behalf of a unanimous Court; four decades later, Justice Kennedy wrote the Ricci opinion on behalf of the Court’s conservatives. Requiring a demonstration of this sufficient magnitude aspect of practical significance entails a subjective verdict on the importance of some (“small”) disparity. This is at odds with the textual basis, aims, and precedent (from Griggs to Ricci68) of prima facie disparate impact demonstration.
This argument raises two important questions: (1) how does the argument square with the four-fifths rule, a commonly accepted mode of inquiring into practical significance; and (2) if magnitude inquiries are so clearly inappropriate at the prima facie stage, why is there a controversial circuit split on the issue?69
Although rejecting a prima facie case on the basis of practical significance is inappropriate, many courts look to practical significance as a shorthand to demonstrate a prima facie disparate impact through the four-fifths rule.70 The four-fifths rule has an air of objectivity: if the hiring rate for the impacted group is lower than this sharp cut-off—eighty percent of the rate for the favored group—then there is a prima facie disparate impact. But this rule has different effects depending on selection rates. For instance, if a favored group is hired at a rate of twenty percent, then any impacted-group hiring rate less than sixteen percent would establish prima facie disparate impact. But if a favored group is hired at a ninety-five percent rate, then any impacted-group hiring rate less than seventy-six percent would establish the prima facie case. In other words, depending on the base hiring rate, the gap between group selection rates that the four-fifths rule tolerates ranges from zero to twenty percentage points.
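The arithmetic behind these thresholds can be sketched in a few lines of Python. This is an illustrative calculation only: the function names are hypothetical, and the two rates are the ones used in the example above.

```python
def four_fifths_threshold(favored_rate: float) -> float:
    """Impacted-group selection rate below which the four-fifths rule flags adverse impact."""
    return 0.8 * favored_rate


def tolerated_gap(favored_rate: float) -> float:
    """Largest gap between group selection rates (as a fraction) that the rule does not flag."""
    return favored_rate - four_fifths_threshold(favored_rate)


# Rates taken from the example in the text: a 20% and a 95% favored-group hiring rate.
for rate in (0.20, 0.95):
    threshold = four_fifths_threshold(rate)
    gap_points = tolerated_gap(rate) * 100
    print(f"favored rate {rate:.0%}: threshold {threshold:.0%}, "
          f"tolerated gap about {gap_points:.0f} percentage points")
```

At a twenty percent favored-group rate the rule tolerates a gap of about four percentage points; at ninety-five percent it tolerates about nineteen. The same nominal rule therefore draws very different lines depending on the base rate.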
Crucially, the EEOC’s characterization of the four-fifths rule advises that any rate less than four-fifths of the higher selection rate establishes the prima facie case without showing further practical significance, but smaller differences may “nevertheless constitute adverse impact” if those differences are statistically and practically significant.71 In other words, the four-fifths rule advises granting the demonstration of prima facie disparate impact under certain conditions, but it never advises denying it on such a basis. Smaller differences should be considered in further detail to determine whether they evince prima facie disparate impact.
The four-fifths rule is essentially a practical significance guideline that functions as a ceiling, not a floor. If the effect size is large enough, there is a prima facie disparate impact. A number of other courts have suggested that something beyond mere statistical significance should be required in demonstrating the prima facie case of disparate impact.72 This requirement is a demonstration of a certain form of “practical significance”: the statistically significant result must evince a substantial disparity.
Now consider the second question. If practical significance inquiries are so clearly inappropriate at the prima facie stage, why is there a circuit split? Recall that the First Circuit rejected a practical significance requirement in Jones v. City of Boston, but the Fifth Circuit held that a disparate job selection rate was too small to establish a prima facie case in Frazier v. Garrison I.S.D.73 Part of the answer, I suspect, is that some courts prefer a thoughtful, contextual analysis of the evidence that supports the prima facie disparate impact. A contextualized inquiry—for instance, examining sample size, statistical significance, and effect size—is appropriate in a confidence inquiry. It is inappropriate, however, for courts to smuggle a magnitude inquiry floor into a confidence inquiry. At the prima facie stage, courts should ask whether the evidence supports a finding of disparate impact, not what amount of disparate impact merits attention.
Magnitude inquiries are a necessarily subjective practice. Frazier held that a 4.5% difference was trivial, when ninety-five percent of applicants were selected.74 The justification for such reasoning is unclear: Would a 4.5% difference be more relevant if only eighty percent of applicants were selected? What if only thirty percent of applicants were selected?
Confidence inquiries are appropriately contextual. There are many factors to consider in a confidence inquiry. When evaluating how strongly the evidence supports the existence of a disparate impact, courts might look to the statistical evidence’s sample size, the size of the respective group categories, and even the effect size.
But the subjective contextualism of a magnitude inquiry is more dangerous. Determining what magnitude of disparate impact is sufficient to demonstrate a prima facie disparate impact allows—and invites—judgment about the importance of some disparate impact on a protected class. This allows lines to be drawn differently in different contexts. For instance, some jurisdictions might consider a five percent hiring difference significant, while others might consider the difference trivial. This injects subjectivity into the core of disparate impact analysis. Moreover, it contradicts the text of Title VII and Supreme Court precedent, which require plaintiffs to identify a disparate impact—and nothing more.
To be more precise, one reason that practical significance testing at the prima facie stage is ever invoked is that courts consider an impact’s magnitude in the name of practical significance, when they really are invoking the confidence inquiry aspect of practical significance. This is a statistical fallacy. While confidence inquiries are an appropriate consideration at the stage of prima facie disparate impact, and effect size can serve as relevant evidence for a confidence inquiry, a magnitude inquiry is not in itself necessary to satisfy a confidence inquiry. It may be that courts commit the fallacy of requiring consideration of what is merely one source of possible evidence. The relevant, crucial question at the prima facie demonstration stage is this: is there good evidence that the policy caused some disparity? Evidence of a large disparity helps build confidence in the proof of some (perhaps even smaller) disparity. But evidence of a large disparity is not required. In some cases, we expect it to be absent—namely, when there is a small real-world disparity.
A similar confusion underlies appeals to the four-fifths rule. The theory of disparate impact does not privilege “large” disparities over “smaller, insubstantial” disparities. The appropriate justification for recommending acceptance of “big” disparities as clear evidence of prima facie disparate impact is not that they reflect big real-world disparities. Rather, such evidence typically inspires more confidence than evidence of smaller disparities that some real disparity exists. Smaller differences are no less important, but smaller differences generally provide less confidence that any difference exists (sample size and all else equal). The exception that proves this rule is a case like Jones,75 where there is a small effect size but a very large sample size, supporting the court’s confidence that the disparity is not the product of chance.
Thus, the four-fifths rule really ought to be a “rule of thumb.”76 As the EEOC guidance recommends, smaller differences than advised by the rule should not be rejected as insufficient proof of prima facie disparate impact; instead, they should be scrutinized more closely.77
These considerations also indicate an important way in which the standard for prima facie disparate impact demonstration should be strong. It is possible that some statistical evidence for “large” differences over the four-fifths rule’s cutoff is actually unconvincing evidence. The most intuitive example is evidence involving a small sample. Imagine five people, two white and three black, apply for a job. The two white applicants and one black applicant are not excluded by the company’s policy. This involves an enormous disparity between white-applicant and black-applicant hiring rates. Yet, this does not give us confidence that the defendant’s policy caused a disparate impact. Accordingly, courts have recognized the limited value of small sample sizes in disparate impact cases.78 This exemplifies the appropriateness of a confidence inquiry.
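The five-applicant hypothetical can be checked numerically. The sketch below uses a one-sided Fisher's exact test, which is one common choice for very small samples; it is an assumption of this illustration, not a method the cases or the EEOC guidance prescribe, and the function names are hypothetical.

```python
from math import comb


def selection_ratio(fav_sel: int, fav_total: int, imp_sel: int, imp_total: int) -> float:
    """Impacted-group selection rate divided by favored-group selection rate."""
    return (imp_sel / imp_total) / (fav_sel / fav_total)


def fisher_one_sided_p(fav_sel: int, fav_total: int, imp_sel: int, imp_total: int) -> float:
    """Probability that the favored group receives at least fav_sel of the
    selections if selections were made without regard to group membership."""
    total = fav_total + imp_total
    selected = fav_sel + imp_sel
    upper = min(fav_total, selected)
    tail = sum(
        comb(fav_total, x) * comb(imp_total, selected - x)
        for x in range(fav_sel, upper + 1)
    )
    return tail / comb(total, selected)


# Hypothetical from the text: 2 of 2 white applicants and 1 of 3 black applicants pass.
ratio = selection_ratio(2, 2, 1, 3)
p_value = fisher_one_sided_p(2, 2, 1, 3)
print(f"selection ratio: {ratio:.2f}; one-sided p-value: {p_value:.2f}")
```

The selection ratio is roughly one-third, far below four-fifths, yet an outcome at least this lopsided would arise by chance about thirty percent of the time. The disparity is large on paper, but the evidence supports little confidence that the policy caused it.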
This is made all the more complicated by the multiple meanings of “practical significance.” Some courts use it to analyze the magnitude of a disparity,79 which I argue is inappropriate at the prima facie stage. Yet other courts refer to practical significance when pointing to a worry about the confidence in a statistically significant difference.80 Unlike the former, the latter is a legitimate inquiry at the prima facie stage of disparate impact.
This issue is not merely terminological. Judges writing in support of a “practical significance” requirement or inquiry should investigate which meaning of practical significance they intend to employ. For instance, when discussing whether a disparity is “substantial” (read in the magnitude sense),81 the First Circuit was concerned with whether the disparity was “due to chance” (closer to the sense of a confidence inquiry), not whether the disparity was of a certain magnitude.82 It is misleading to interpret these decisions as support for a practical significance requirement in the sense of an inquiry into the sufficiency of a disparity’s size. Practical significance, in the sense of a disparity of requisite size, is distinct from confidence in a statistically significant result.
Commentators also commit this error:
On the one hand, statistical significance allows plaintiffs to demonstrate that a particular practice causes some disparity between classes (the “disparate” prong of the inquiry); on the other, practical significance determines if that disparity is large enough to have real-world implications (the “impact” prong of the inquiry). Practices that do in fact create a noticeable disparate impact would implicate both of these considerations.83
A prima facie case of disparate impact does not depend on whether we care sufficiently about the size of the impact; evidence of adverse impact establishes the prima facie case, even if that adverse impact is small.
A tempting policy counterargument is that prohibiting magnitude inquiries at the prima facie stage would incentivize frivolous disparate impact litigation.84 First, this claim underestimates the strength of the statistical significance requirement. As Jones explained,85 requirements to show statistical significance will frequently eliminate frivolous lawsuits, since small-sized impacts will require large sample sizes to demonstrate statistical significance.86 Second, if the defendant shows job-related business necessity, the plaintiff will still have to prove an alternative practice with less impact. This will be relatively easier when the magnitude of the disparity is large, providing a balanced corrective.87 In cases in which the prima facie impact is small, the plaintiff will still have a larger burden in the demonstration of an alternative practice since the alternative policy has less room to reduce the disparity than if the disparity were large.
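The relationship between effect size and required sample size can be illustrated with a textbook normal-approximation formula for comparing two proportions. This is a rough sketch under stated assumptions: the function name, the conventional five percent significance level, the eighty percent power target, and the selection rates are all hypothetical choices for illustration, not figures from any case.

```python
from math import ceil
from statistics import NormalDist


def approx_n_per_group(p_favored: float, p_impacted: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate applicants needed per group for a two-sided two-proportion
    z-test to detect the given difference (normal approximation, equal groups)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    variance = p_favored * (1 - p_favored) + p_impacted * (1 - p_impacted)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_favored - p_impacted) ** 2)


# Hypothetical rates: a gap at the four-fifths line versus a much smaller gap.
large_gap_n = approx_n_per_group(0.95, 0.76)
small_gap_n = approx_n_per_group(0.95, 0.905)
print(f"large gap: about {large_gap_n} per group; small gap: about {small_gap_n} per group")
```

Under these assumptions the smaller gap requires many times more applicants per group than the larger one before it can reach statistical significance, which is the sense in which the significance requirement itself filters out weakly supported claims.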
A further reason not to fear a rise in trivial claims is that complainants have little personal incentive to bring disparate impact claims. Disparate impact relief is limited to equitable relief and back pay. Compensatory and punitive damages are not available, as they are for disparate treatment claims.88
A final, but important, response to this counterargument concerns the logic of burden shifting in disparate impact litigation. The previously articulated responses address worries about an increase in frivolous employee complaints by justifying the unlikelihood of an effect on the complainant. But disallowing consideration of practical significance at the prima facie stage might instead have an effect on employers. Knowing that any robustly proven disparity can shift the burden to the defendants can have an important effect outside of litigation, encouraging employers to reflect on whether their policies and procedures that have such an impact are actually job-related business necessities, or whether less discriminatory alternatives exist. One way of responding to potential litigation is to reinforce incentives for employers to eliminate the very practices and procedures that unnecessarily impact protected classes.89 Thus, concern about the litigation effects of changing the statistical burden in fact provides an additional reason for condemning the use of magnitude inquiries at the prima facie stage.
Statistics are also relevant to the second stage of disparate impact litigation, in which a defendant must prove the job-relatedness and business necessity of a policy that has been shown to have a prima facie disparate impact. Compared to the plaintiff’s prima facie standard, the defendant’s proof of job-related business necessity is typically described as a more stringent standard. Prima facie disparate impact requires a plaintiff to “only show” that a policy causes a “discriminatory pattern,” while job-related business necessity requires proof that the policy has “a manifest relationship to the employment in question.”90
This distinction supports the aims of disparate impact. The prima facie stage only identifies a disparity-causing policy. The second stage offers the employer the opportunity to prove that the discriminatory policy falls within the subset of policies that Title VII is willing to tolerate on the basis of business necessity. As such, the second stage requires a more robust consideration of the policy’s significance to business interests; a mere relation is not necessarily sufficient to permit discrimination.
At the second stage, the defendant must show that the contested policy is “job related” and “consistent with business necessity.”91 The EEOC’s Uniform Guidelines indicate three measures of validation in assessing this demonstration of job-related business necessity: criterion-related, content, and construct validation. Criterion-related validation requires empirical data showing that the selection procedure “is predictive of or significantly correlated with important elements of job performance.”92 Content validation requires “data showing that the content of the selection procedure is representative of important aspects of performance on the job.”93 Construct validation requires data showing that the selection “procedure measures the degree to which candidates have identifiable characteristics which have been determined to be important in successful performance in the job.”94
Many courts require showing both statistical and practical significance in defending a discriminatory test.95 To assess these showings, courts often look to the correlation coefficient between the test and job performance, a numerical measure from -1 to 1 of the relation between two values.96 But many do not look specifically at the practical significance of the (statistically significant) correlations presented as evidence of validation. Although defendants often have to show a moderate correlation between a policy (e.g., a test) and the outcome (e.g., job performance), this is often not interpreted through the lens of a magnitude inquiry, asking how big the actual relationship is.
The use of correlation coefficients ought to be accompanied by a practical significance analysis of the policy or procedure’s job-relatedness and business necessity. Specifically, it ought to be accompanied by consideration of both the relevance of the evidence to a real-world job-related business necessity and the magnitude of this relation. A test that is merely correlated with job performance might not actually be related to the job or a “business necessity.” For instance, achieving a certain score on a general standardized achievement test might be correlated with some aspect of job performance, even though that achievement is not actually a strong predictor of job success.
Taking practical significance into account means rejecting implausible claims of job-relatedness in which there is no strong relation between the policy and outcome. For instance, in Dickerson v. U.S. Steel Corp., a statistically significant correlation between a policy and job performance of 0.3 was rejected since it was found to have little practical significance—indicating only nine percent of job success attributable to the disparity-causing policy.97
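The nine percent figure follows from squaring the correlation coefficient, which gives the share of variance in one measure that is statistically accounted for by the other. A minimal sketch, with a hypothetical function name and with the 0.5 and 0.7 values added only for comparison:

```python
def variance_explained(r: float) -> float:
    """Coefficient of determination for a simple correlation: r squared."""
    return r ** 2


# 0.3 is the correlation discussed in the text; 0.5 and 0.7 are comparison values.
for r in (0.3, 0.5, 0.7):
    print(f"r = {r}: about {variance_explained(r):.0%} of variance explained")
```

Because the relationship is quadratic, a correlation that sounds moderate can account for a small share of the variation in job performance, which is why a magnitude inquiry at this stage looks past statistical significance alone.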
In other cases, paltry consideration of practical significance permits evidence of job-relatedness that has little practical significance. Consider United States v. City of Garland.98 There, the court determined that police and firefighter job examinations were job related on the basis of a significant correlation between those exams and performance on academy exams and state certification exams. Yet an important practical significance question was obscured: what magnitude of significance do these exams have to the job? The practical significance of an exam’s results for job-relatedness and consistency with business necessity cannot be inferred simply from a correlation between the exam and another exam. Moreover, there is little rigorous consideration of the magnitude of this effect; even if the evidence supports a good inference for the job-relation, does it support evidence of a sufficiently large effect consistent with business necessity?
This particular decision is even more problematic. In the same decision, the court determined that there was insufficient practical significance to establish a prima facie case of disparate impact,99 and that there would be sufficient proof of job-relatedness consistent with business necessity, without serious consideration of the practical significance of this evidence.100 This provides an example of a bizarre practice: a relatively high practical significance requirement in the prima facie stage of disparate impact, but a paltry one in the job-relatedness business necessity stage.
At the third stage, courts use statistics to evaluate the plaintiff’s demonstration of a nondiscriminatory alternative policy.101 This stage provides a final opportunity for the plaintiff to rebut the defendant’s job-related business necessity defense by offering an alternative policy that could serve the business’s legitimate interests without the same discriminatory impact. Comparatively few disparate impact cases proceed to this third stage, but the cases that do consider alternative proposals may look to statistics to evaluate the merits of the alternative proposal. Courts may conduct practical significance inquiries following the logic of the Stage 1 inquiry, asking whether the evidence indicates that the alternative proposal will have large practical significance, greatly or sufficiently reducing discriminatory impact. Such an inquiry is a magnitude inquiry, assessing the sufficiency of the effect size. As in the other stages, there is also room at the third stage for a confidence inquiry, asking whether the court is confident that the alternative proposal will reduce discrimination (at any rate).
The argument here follows directly from the logic of Section II.A. Given disparate impact’s foundational texts, purposes, and interpretations, practical significance testing—in the sense of measuring the magnitude of some disparity reduction—should not be relevant to assessing an alternative policy.102 A suitable alternative policy should be accepted regardless of whether it decreases discriminatory impact by a small or large amount. What matters is that the policy can be expected to actually reduce discriminatory impact.103 Disparate impact aims to remove all unnecessary discriminatory barriers, not just the largest ones. The third stage inquiry asks whether the employer “refuses to adopt an available alternative employment practice that has less disparate impact and serves the employer’s legitimate needs.”104 The employment practice need not have dramatically, substantially, or even significantly less disparate impact—just “less.”
Because comparatively fewer cases proceed to this stage of disparate impact analysis, this Section first considers a stylized example, an imagined case in which this issue of practical significance is clearly implicated at the third stage of disparate impact analysis. Imagine, as an example, that a disparate impact case has proceeded to the third stage. The contested policy results in a twenty-percentage-point racial difference between black and white test takers, but the alternative policy would still result in a fifteen-percentage-point differential. Given some courts’ treatment of practical significance at the prima facie stage, one could imagine courts deciding that a five-percentage-point reduction does not support a suitable alternative policy because it is not “practically significant.”
The fundamental aim of disparate impact is the removal of unnecessary and discriminatory barriers so as to make factors like race irrelevant in employment practices. This means that assessment of an alternative practice’s magnitude of disparity reduction is dubious. Does the fact that some alternative practice would be only modestly less discriminatory justify rejecting it? The broader disparate impact framework clearly recommends any policy that offers even modest discrimination reduction. While confidence inquiries are relevant—at this stage and all the others—magnitude inquiries are not appropriate in weighing an alternative policy.
How should courts resolve these questions of disparate statistics? This Part proposes three solutions. The most theoretically justifiable solution is to strike magnitude inquiries from the first stage’s prima facie demonstration and the third stage’s evaluation of a suitable alternative, but to adopt a more robust analysis at the second stage when the defendant presents a job-relatedness and business-necessity rebuttal.
Before turning to these recommendations, it is worth summarizing the framework that has been established and the broad conclusions. First, recall the distinctions between magnitude inquiries—those asking whether some disparity, business interest, or impact reduction is big enough—and confidence inquiries—those asking how strong an inference can be drawn from statistical evidence to the real world.
This figure highlights that courts that conduct magnitude inquiries at Stage 1 but not Stage 2 are not only conducting investigations at odds with the text and interpretation of disparate impact, as I have argued, but are also conducting unjustifiably asymmetric inquiries.
The primary recommendation is that courts should reject a magnitude inquiry for the demonstrations of a prima facie disparate impact and a suitable alternative stages of analysis, but insist on a more robust inquiry during the defendant’s job-related business necessity rebuttal. This solution is the most faithful to the text, purpose, and precedent of disparate impact law. Requiring a showing of prima facie disparate impact of a particular size is out of line with the original interpretations of Title VII,110 as well as with more recent Supreme Court precedents.111 Equally inappropriate is a thin or absent investigation into the practical significance of a defendant’s defense that a disparity-causing impact is a job-related business necessity,112 requiring proof of a manifest113 relationship between the contested policy and a job-related function that is not simply justified by a business interest,114 but is a “business necessity.”115
Even if this primary recommendation is not adopted on the basis of Part II’s arguments, there are two more modest possibilities that deal with the asymmetry of conducting magnitude inquiries at some but not all stages of disparate impact analysis. The previous Part justifies a disparity in the use of statistics in one direction—magnitude inquiries during the second stage, but not during the first or third stages—given the aims of disparate impact theory. However, an asymmetry in the other direction is entirely lacking in justification.
One option to deal with the asymmetry is to “level down” the magnitude inquiry requirement across the stages of disparate impact analysis, for example by removing it from the prima facie disparate impact standard. If a court engages in no serious magnitude analysis at the business-necessity-rebuttal stage, it should not engage in a rigorous one at the prima facie stage (or third stage). The other option is to “level up” the practical significance requirement across stages of disparate impact litigation, most crucially by raising the level of magnitude inquiry at the business-necessity rebuttal. If a court engages in a robust magnitude analysis at the prima facie stage or evaluation of a less-discriminatory alternative stage, it ought to do the same at the job-related and business-necessity-rebuttal stage.
These two options represent each half of the primary recommendation. They are less justifiable than the primary recommendation, but they do correct the unjustified asymmetry and they may be more immediately attainable. And they would be preferable to current practices that require a magnitude demonstration for the plaintiff’s prima facie demonstration, but not for the defendant’s business-necessity demonstration.
If adopted, any of these recommendations will likely affect both disparate impact litigation and business practices out-of-court. First, the recommendations may result in different decisions in disparate impact cases. The Jones116 decision is recent, but already there are suggestions that removal of a prima facie stage magnitude inquiry will result in different outcomes for disparate impact cases. For instance, Smith v. City of Boston held that a police department’s lieutenant-selection process had a racially disparate impact and was not a job-related business necessity that survived disparate impact analysis.117 Like many analyses of disparate impact, the discussion of the prima facie stage involved much consideration of statistics. The court frequently cited to Jones in requiring that the impact not be the result of chance,118 declining to require a strict statistical significance cutoff,119 and rejecting the necessity of the four-fifths rule, interpreted as a practical significance requirement.120 Removing such magnitude requirements from the prima facie stage will allow a broader range of legitimate disparate impact claims to go forward. Just because a policy has a “small” discriminatory impact does not mean the interests of those people are less deserving of protection.
Requiring a more robust analysis of the magnitude and confidence aspects of practical significance at the business necessity stage would also affect disparate impact litigation. Cases like United States v. City of Garland121 would require a more rigorous analysis. Discriminatory policies justified on the basis of business necessity would be held to a more robust inquiry into the degree to which the business necessity affects the legitimate interest.
Similarly, prohibiting magnitude inquiries at the third stage would allow disparate impact to better realize its aims. The third stage targets any real reduction in discrimination. Together these recommendations remedy inappropriate uses of statistics in disparate impact litigation and provide opportunity for greater plaintiff success in legitimate employment discrimination suits.
Second, the recommendations would also have out-of-court effects. The recommendations might seem likely to increase disparate impact litigation. Regardless of whether that would actually occur,122 removing barriers to successful disparate impact claims should incentivize employers to reflect more thoughtfully on their employment practices. Disparate impact litigation is just one mechanism to achieve the aims of greater employment equality. Another mechanism—in many ways a preferable one—is for employers to simply remove disparity-causing policies and procedures that are not business necessities or replace necessary policies with less discriminatory ones. Whether or not disparate impact litigation increases, it is plausible that there is power in the mere threat of litigation; any real disparity—no matter how small—caused by an employment practice could be the target of disparate impact litigation. It may be difficult to quantify the number of discriminatory employment practices that the employer might remove to comply with disparate impact law or to minimize the threat of litigation. But that effect is equally important.
These arguments are not partisan; they do not unfairly help plaintiffs bring disparate impact claims, whether meritorious or not. In fact, the recommendations might result in reduced litigation—for example, if employers respond to the threat of litigation by removing unnecessary and discriminatory employment barriers. Rather, these arguments seek to correct the use of statistics to achieve the proper understanding of both disparate impact’s antisubordination aim and its business necessity limit. Whether the acceptance of such recommendations would help plaintiffs, defendants, employees, or employers depends on empirical questions about the parties’ responsive behaviors. The removal of discriminatory employment practices, subject to the limit of business necessity, whether through litigation or other incentives, is at the heart of disparate impact law. Remedying disparate and inappropriate uses of statistics moves the law closer to this fundamental aim.
Consider again the very project of paying attention to such “small” disparities. It might seem unimportant to focus on ameliorating employment practices where the local discriminatory impact reduction is relatively small, such as reducing a discriminatory impact by just five percent. But such reasoning belies the theory of disparate impact: no discriminatory effect is too small to matter. But even one unconvinced by disparate impact theory and compelled by a singular focus on big effects must address a second practical consideration. Small discriminatory effects at multiple points—in an individual’s life or across a group—result in large cumulative disadvantages.123 This Note’s recommendations suggest broad changes in the way courts treat numerous small disparities. It may be the case that the current social landscape does not consist primarily of policies that cause infrequent and large disparities but rather, an enormous web of smaller-disparity-causing policies, the combination of which results in large disparities overall. If so, the total effect of attending more judiciously to “small” disparities may not be so small.
Statistics, particularly “practical significance,” play a crucial role in disparate impact analysis. This Note distinguishes between two types of practical significance inquiries: magnitude inquiries—questions about the magnitude of a finding supported by statistical evidence—and confidence inquiries—questions about the strength of statistical evidence. Looking across the three stages of disparate impact analysis, I argue for the inappropriateness of magnitude inquiries at the first prima facie stage of demonstrating disparate impact and at the third stage of providing a less discriminatory alternative, but that such a robust inquiry should come at the second stage of a defendant’s job-related business necessity rebuttal.
This argument buttresses recent court decisions not to require demonstration of a particular magnitude of disparity at the prima facie stage. It also outlines a holistic conception of practical significance testing across every stage of disparate impact analysis, a project bearing on both the current circuit split and the doctrine’s future challenges.
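The distinction between magnitude and confidence can be made concrete with a rough numerical sketch. The example below uses invented selection counts and a standard pooled two-proportion z-test; the p-value serves only as a crude stand-in for the strength of the evidence, and the confidence inquiry discussed in this Note is not limited to that measure. The function name and all figures are hypothetical.

```python
import math


def selection_gap(sel_a: int, n_a: int, sel_b: int, n_b: int) -> tuple[float, float]:
    """Return (magnitude, p_value) for two groups' selection rates.

    magnitude: difference in selection rates (group A minus group B).
    p_value:   two-sided p-value from a pooled two-proportion z-test,
               used here as a rough proxy for confidence in the evidence.
    """
    p_a, p_b = sel_a / n_a, sel_b / n_b
    pooled = (sel_a + sel_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, p_value


if __name__ == "__main__":
    # Hypothetical: a small disparity observed in a very large applicant pool.
    small_gap = selection_gap(5000, 10000, 4800, 10000)
    # Hypothetical: a large disparity observed in a very small applicant pool.
    large_gap = selection_gap(6, 10, 4, 10)
    print("small disparity, large sample:", small_gap)
    print("large disparity, small sample:", large_gap)
```

In the first hypothetical the disparity is only two percentage points, yet the evidence for it is strong; in the second the disparity is twenty percentage points, yet the evidence is weak. The two inquiries come apart, which is why asking whether a disparity is real differs from asking whether it is large.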
The consequences of these conclusions should not be underestimated. A universal rejection of magnitude inquiry at the prima facie stage of disparate impact would have a large effect. Cases like Moore v. Southwestern Bell Telephone Co.124 and Frazier v. Garrison125 would require different justifications. Requiring a more robust analysis at the defendant’s rebuttal stage would be equally impactful, requiring more thoroughgoing analysis in cases like United States v. City of Garland.126
These changes are justified. Requiring that a prima facie disparate impact be of a certain magnitude invites inappropriate subjective weighing, asking judges to assess whether a disparity is big enough. Failing to inquire robustly about the practical significance of a defendant’s rebuttal is equally problematic—resulting in the justification of policies that have a discriminatory impact on the basis of slack correlations. So too is requiring that an alternative proposal be sufficiently less discriminatory, rather than simply less discriminatory.
All of these practices are at odds with the motivation and aims of disparate impact. A prima facie showing must simply demonstrate that the contested policy causes a disparity for a protected class. A job-related business necessity defense is meant to show the weighty significance of the contested policy, which must bear a manifest relationship to the employment in order to justify permitting a discriminatory policy. And an alternative proposal is meant to provide a policy that serves the employer’s legitimate business interests with a lesser discriminatory impact, whether the reduction is large or small.
This Note recommends analyzing and correcting these uses of practical significance testing across the three stages of disparate impact analysis. The recommendations advance disparate impact’s fundamental aim: removing artificial and arbitrary barriers that operate to discriminate on the basis of a protected classification.