It’s rare for a federal court of appeals to toss a defense jury verdict in an employment-discrimination case, and rarer still for the panel to order entry of a judgment in favor of a plaintiff. Yet both things happened in yesterday’s Seventh Circuit decision, which held that a group of female paramedic applicants proved they were unlawfully screened out of employment due to an unreliable physical-skills entrance examination.
Ernst v. City of Chicago, No. 14-3783 (7th Cir. Sept. 19, 2016): From the 1970s through the year 2000, the Chicago Fire Department (CFD) hired paramedics without any physical-skills test. But in 2000, the CFD imposed such a test and hired Deborah Gebhardt, of Human Performance Systems, Inc., to develop and administer it. “Between 2000 and 2009, nearly 1,100 applicants took Gebhardt’s entrance examination. Among these, 800 were men, and 98% of the male applicants passed. Another 300 were women; 60% of female applicants passed.”
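To put the quoted pass rates in perspective, here is a minimal sketch of the "four-fifths" rule of thumb from the EEOC's Uniform Guidelines, 29 C.F.R. § 1607.4(D), under which a selection rate for one group below 80% of the highest group's rate is generally treated as evidence of adverse impact. The function and variable names are illustrative, the rates are the approximate figures quoted above, and nothing here suggests the panel itself ran this computation.

```python
# Pass rates as quoted from the opinion (approximate figures).
MALE_PASS_RATE = 0.98
FEMALE_PASS_RATE = 0.60

# Rule-of-thumb threshold from 29 C.F.R. § 1607.4(D) ("four-fifths rule").
FOUR_FIFTHS = 0.80


def impact_ratio(group_rate: float, reference_rate: float) -> float:
    """Ratio of one group's selection rate to the reference group's rate."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return group_rate / reference_rate


ratio = impact_ratio(FEMALE_PASS_RATE, MALE_PASS_RATE)
below_threshold = ratio < FOUR_FIFTHS

print(f"impact ratio: {ratio:.2f}; below four-fifths threshold: {below_threshold}")
```

On these figures the ratio comes to roughly 0.61, well under the 0.80 benchmark. That is only a screening heuristic: it bears on whether a disparity exists, not on whether the test is job-related.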
The five plaintiffs, all with relevant paramedic experience with other units, failed the CFD physical-skills test and were thus denied employment. They brought this Title VII lawsuit alleging two parallel theories. First, they argued that the CFD adopted the test specifically to bar women from paramedic jobs (“disparate treatment”). Second, even if the test were not deliberately created to discriminate, the test had a disproportionate effect on female applicants and did not accurately measure job-related skills (“disparate impact”).
The disparate treatment claim was tried to a jury. A key part of the jury charge, though, caused consternation for the parties and the jury. Rather than plainly instructing the jury that it could return a verdict for plaintiffs if they proved that the CFD adopted the test deliberately to bar women, the court instead charged the jury that it must find that the five women would have been hired if they were men. Because the plaintiffs had failing scores under the physical-skills test, the instruction was practically a direction that plaintiffs must lose. And, indeed, the jury – after twice asking for clarification of the instruction – promptly entered a defense verdict.
In the second trial, held before the bench on the disparate impact claim, the judge found that the city met its burden of proving that Gebhardt’s test was job-related and consistent with “business” (operational) necessity, and that the plaintiffs failed to prove a lesser-discriminatory alternative.
The Seventh Circuit reverses the judgment, ordering a new jury trial on the disparate treatment claim and entry of a judgment in favor of the plaintiffs on the disparate impact claim. (The reason that the second jury trial may be significant, besides exposing possible sex bias in the CFD, is that a verdict in plaintiffs’ favor may entitle them to emotional distress and other compensatory damages, beyond the make-whole relief that a court may order for a disparate-impact violation.)
On the disparate treatment claim, the panel holds that the district court misinstructed the jury. The judge erred by misapprehending the claim, believing that evidence of hiring statistics must be harnessed to a Title VII “pattern-or-practice” case. But resort to “pattern-or-practice” inferences was unnecessary here:
“[T]he plaintiffs in this case argue that Chicago created a new standard operating procedure, with the specific intention of reducing or removing women from among its new paramedic hires. They do not rely on generalized claims of statistical bias against women; instead, they argue that there was no legitimate professional or safety need for Chicago to implement this particular skills test.”
Thus, the instruction misfocused the jury on sex as a factor in the specific decisions not to hire the five plaintiffs, rather than the central issue of whether the CFD “had an anti-female motivation for creating its skills test.” Not only was the charge legally erroneous, but there was evidence that the error affected the outcome, i.e., the jury’s two notes asking for clarification of the instruction. “Only four minutes after the district judge instructed them to take Instruction 24 at face value, they returned a defense verdict.”
The panel accordingly orders a new jury trial on disparate treatment (which will be assigned to a new judge under 7th Circuit Rule 36). The panel (in the concluding pages of the opinion) also considered certain evidence rulings by the district court – admission of handwritten meeting minutes as FRE 803(6) business records, impeachment of Gebhardt with the “evidence that [she] had previously engaged in conduct [with the CFD] that reduced the number of jobs for which women qualified,” and the use of differential testing standards for paramedic and fire-fighter applicants. These rulings will presumably guide any retrial before a jury.
On the disparate impact case, the panel determines that Gebhardt’s validity test for the physical-skills exam was so methodologically infirm that the CFD failed to prove that the exam was job-related.
Under Title VII, “[v]alidity is the extent to which a study accurately measures what it sets out to measure.” In relevant part, “[a] criterion-related validity study” as deployed here “measures a study’s validity by comparing the assessment-tool results with the criteria.” The accepted technical standards for validity are committed to EEOC regulation, 29 C.F.R. § 1607.14(B)(4). The CFD offered Gebhardt’s test as a “concurrent validity study,” where “the researcher takes two measures at the same time,” with “one measure (which is known to be valid) [used] to validate the other measure (which needs to be validated).”
The panel ticks through a long list of short-cuts and errors made in designing and administering the test.
1. The subjects used to design the test were self-selected instead of randomly selected. “This self-selection presents an obvious concern: when an employer asks its employees to volunteer for testing, the strongest employees are most likely to volunteer.”
2. Gebhardt admitted that the volunteer subjects “did not represent the skill-set in the general population of Chicago paramedics.”
3. Because Gebhardt had only a small sample of Chicago applicants, she rounded the data out with testing results of New York paramedics, without correcting for the finding that Chicago applicants tended, on average, to score above-average. Notes the panel, “[f]or the combined Chicago and New York City scores to result in a truly normal or average score, however, the New York City paramedics’ scores would have to be significantly lower than normal,” which was not established by any data.
4. The correlation of the test to the lift-and-carry work sample had a “reliability score” of only 0.503. “That is a 50/50 chance of reliability.” Further, despite this doubtful correlation, “There was no apparent effort to separate the lift and carry from the rest of the study.”
5. “Even if reliability was fully established in this case, validity would be a problem. The plaintiffs legitimately question whether the work samples themselves are a valid measure of job skills. The problem here is that Chicago used the work-sample tests to validate the skills tests – without ever validating the work samples.”
6. The CFD failed to establish a need for timed exams:
“[F]aster performance is not always the most careful performance. We can see where some speed, though not excessive speed, may be job-related for paramedics who answer time-sensitive emergency calls. But in staying faithful to the record, we do not have the information necessary to analyze and reach a conclusion on the appropriateness of this timed test. Regardless of how appropriate it was to time the lift and carry, this work sample did not prove reliable. And an unreliable assessment tool cannot be validated.”
7. One work sample, the stair-chair push, was not shown to be related to any important function. To the contrary, “[o]n the record, the use of stair chairs in Chicago appears limited. According to Gebhardt’s findings, stair chairs are used for transporting patients into ambulances, usually through the side door. She did not indicate that they were ever used past this point.”
8. Another work sample, the stretcher lift, likewise “does not resemble skills learned on the job.” As the panel notes:
“Real paramedics raise a stretcher and then move. We question why paramedics in the work sample have their arms ‘locked.’ Regardless, when paramedics transport a real patient, they do not cycle the patient-laden stretcher up and down, and the record shows that they typically travel less than 100 feet. It is hard to imagine paramedics requiring nearly four-and-a-half minutes to cross 100 feet.”
In sum, “at least two out of three work samples are not valid. The validity of the three skills that are tested in Chicago’s entrance examination, however, depends on all three work samples being valid. This undermines the entire physical-skills entrance test that Chicago administers.”
Ultimately, the panel holds, the “lack of connection between real job skills and tested job skills is, in the end, fatal to Chicago’s case,” and “the plaintiffs should have prevailed on their Title VII disparate-impact claims.” Thus, on remand, the new judge will have to award instatement, back- and/or front-pay relief, and interest to the plaintiffs.