Estimating the seroprevalence of SARS-CoV-2 infections: systematic review

Abstract
Background: Accurate seroprevalence estimates of SARS-CoV-2 in different populations could help gauge the true magnitude and spread of the infection. Reported estimates have varied greatly, but many have been derived from biased samples and inadequate testing methods.
Objective: To estimate the range of valid seroprevalence rates of SARS-CoV-2 in different populations, and compare these seroprevalence estimates with the cumulative cases seen in the same population.
Methods: We searched PubMed, Embase, the Cochrane COVID-19 trials, and Europe-PMC for published studies and pre-prints from January 2020 to 25 May 2020 that reported anti-SARS-CoV-2 IgG, IgM and/or IgA antibodies for serosurveys of either the general community or of defined sub-populations, such as healthcare workers and other organizations.
Results: Of the 837 studies identified, 49 were assessed in full text and 14 were included. Included studies represented 10 countries and 100,557 subjects: 9 from randomly selected populations, 2 from healthcare workers, 2 from industry populations, and 1 from parturient women. The seroprevalence proportions in 10 studies ranged between 1% and 10%; 2 studies had estimates under 1%, and 2 had estimates over 10%, from the notably hard-hit regions of Gangelt in Germany and Northwest Iran. The two studies in healthcare workers, in Italy and Spain, had seroprevalence rates at the higher end of the range of estimates, with the Barcelona hospitals having a higher rate than the Spanish national survey. For all but one study, the seroprevalence estimate was higher than the cumulative incidence, though these were close for several studies. In five studies, the seroprevalence was similar to the cumulative case numbers in the same population. For seropositive cases not previously detected as COVID-19 cases, the majority had prior COVID-like symptoms.
Conclusion: The seroprevalence of SARS-CoV-2 was mostly less than 10%, with the level of infection lower in the general community, suggesting levels well below those required for herd immunity. The similarity of seroprevalence and reported cases in several studies, and the high symptom rates in seropositive cases, suggest that gaps between seroprevalence rates and reported cases are likely due to undertesting of symptomatic people.


Introduction
Accurate seroprevalence estimates of SARS-CoV-2 in the general population may help assess the true magnitude and spread of the infection. Seroprevalence tests measure a person's immune response (serum antibodies) to SARS-CoV-2 and thus allow determination of past virus exposure, independently of whether a person had COVID-19 symptoms or not. In contrast, reverse transcriptase polymerase chain reaction (RT-PCR) tests, which detect the virus's genetic material (ribonucleic acid, RNA) on nasal or throat swabs or saliva, are typically only used in people with presumed or possible acute infection based on their symptoms. 1 The number of COVID-19 cases based on RT-PCR testing in a population is likely to underestimate the true extent of infection, as many infected people may have only minor symptoms and will therefore not have been tested. 2 Barriers to accessing testing (fear of stigma, cost associated with testing, younger age etc.) may additionally contribute to under-testing among symptomatic people.
Seroprevalence estimates, if done correctly, will therefore provide a more accurate reflection of the true extent of SARS-CoV-2 infection among a population and the impact of public health interventions. However, reliable seroprevalence estimates for a population depend on two major factors: a representative population sample and accuracy of the antibody testing. For example, antibody testing should not be biased towards including predominantly symptomatic people or people who know that they have been exposed to a person with COVID-19. Incorrect sampling for serosurveillance data will lead to errors in the estimated population proportion with presumed immunity to the virus, the infection fatality rate, and the effective reproductive number (Rt). 4 The accuracy of most antibody tests, which measure immunoglobulin (Ig) M, IgG, and occasionally IgA antibodies against SARS-CoV-2, has been highly variable and tends to improve up to three weeks after symptom onset, which is in line with antibody development against viral infections, but data beyond three weeks are scarce. 5-7 Some evidence suggests that in infected asymptomatic people, a reduction of serum antibodies is already observed during the early convalescent phase. 8
We aimed to identify and summarise all studies that reported seroprevalence estimates for SARS-CoV-2 infection using a representative target population sampling framework. The results elucidate differences between the cumulative incidence of confirmed COVID-19 cases (using RT-PCR testing) and the likely true extent of the infection among a population based on seroprevalence estimates using antibody testing.

Methods
We conducted a systematic review using enhanced processes and automation tools. 9 We searched the PROSPERO database to ensure that no similar review had already been performed or planned; we then searched PubMed, Embase, and Cochrane COVID-19 trials for published studies, and Europe PMC for pre-prints, from January 2020 to 25 May 2020. A search string composed of Medical Subject Headings (MeSH) terms and words was developed in PubMed and translated for the other databases using the Polyglot Search Translator. 10 The search strategies for all databases are presented in Supplement 1. We also conducted forward and backward citation searches of the included studies in the Scopus citation database.
medRxiv preprint doi: https://doi.org/10.1101/2020.07.13.20153163; this version posted July 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
We included published and pre-print reports of primary data that contained sufficient details for risk of bias assessment. We anticipated cross-sectional prevalence surveys or repeated surveys would make up the majority of eligible reports. No restrictions on language were imposed. We excluded studies for the following reasons: high risk of bias in sampling, i.e. the study sample was likely not representative of the target population; a response rate below 25%; government reports without sufficient details to evaluate risk of bias; modelling or simulation studies even if they used real data (but sources of real data were checked for possible inclusion); lack of information about the antibody test(s) used to determine seroprevalence; and editorial or historical accounts without sufficient data to calculate the primary outcome (e.g. lack of data on cumulative cases in the population detected using RT-PCR). A list of excluded studies can be found in Supplement 2 with reasons for exclusion.

Participants
We included seroprevalence studies of representative random samples of the population to assess overall seroprevalence in the general community, and studies of representative special population samples (e.g. health care workers (HCWs) who cared for patients with COVID-19) to assess seroprevalence among these special populations. We included studies that tested for anti-SARS-CoV-2 IgG, IgM, and IgA antibodies in combination or separately.

Outcomes
Our primary outcome was the comparison of the estimated proportion of the population with antibodies against SARS-CoV-2 with the cumulative case incidence in the same target population. Secondary outcomes were: (1) the comparison of the seroprevalence based on antibody testing in the study sample with the cumulative confirmed incidence of people who tested positive for SARS-CoV-2 by RT-PCR in the study sample or in the target population; and (2) a cumulative incidence estimated from the cumulative COVID-19-specific mortality 2 weeks after the seroprevalence survey, assuming a case-fatality rate of 1%. 11

Study selection and screening
Two authors (OB and CCD) independently screened titles, abstracts, and full texts according to eligibility criteria. All discrepancies were resolved via group discussion with the other authors. Reasons for exclusion were documented for all full text articles deemed ineligible (Supplement 2; see PRISMA diagram, Figure 1).

Data extraction
Five authors (OB, CCD, KB, PG, DPR) extracted the following information from each study and from outside sources:
1. Methods: study authors, year of publication, country or region, publication type.
2. Participants: sampling frame, sample size, age, sex, setting, previous exposure or testing for COVID-19.
3. Outcomes: types of tests (specific name of test, and whether rapid or lab-based), specificity and sensitivity of the tests used (study-specific or from a referenced diagnostic test accuracy report), study seroprevalence (point estimate and confidence interval), regional seroprevalence (point estimate adjusted for study design and test accuracy when this was reported), and cumulative COVID-19 cases in the study sample. Where the regional seroprevalence was not reported, a calculated regional seroprevalence estimate was derived by multiplying the regional case rate by the ratio (study seroprevalence/study cumulative cases).
4. Date of seroprevalence sampling (to enable identification of a separately reported cumulative incidence rate in the sampling frame at around the same time as the seroprevalence study).
5. Other information: cumulative incidence of COVID-19 and cumulative COVID-19-specific mortality in the larger region at around the time that the study was done.
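The calculated regional estimate described under item 3 can be illustrated with a minimal Python sketch. The function name and the example numbers are hypothetical and serve only to show the arithmetic; they are not data from any included study.

```python
def calculated_regional_seroprevalence(
    regional_case_rate: float,
    study_seroprevalence: float,
    study_case_rate: float,
) -> float:
    """Scale the regional case rate by the study's seroprevalence-to-case ratio.

    All arguments are proportions between 0 and 1.
    """
    if study_case_rate <= 0:
        raise ValueError("study_case_rate must be positive to form the ratio")
    ratio = study_seroprevalence / study_case_rate
    return regional_case_rate * ratio


# Hypothetical inputs: 0.5% regional case rate, 10% study seroprevalence,
# 5% cumulative cases in the study sample.
estimate = calculated_regional_seroprevalence(0.005, 0.10, 0.05)
print(f"Calculated regional seroprevalence: {estimate:.2%}")
```

The sketch makes the underlying assumption visible: the ratio of seropositive to detected cases in the study sample is taken to hold for the whole region.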

Risk of bias assessment
We used a combination of risk of bias tools for prevalence studies 12 and diagnostic accuracy 13 and adapted the key signaling questions on sampling frame, ascertainment of immune status, acceptability of methods and tests, and appropriateness of testing and sample collection timeframe, as shown in Supplement 3 in full.

Data synthesis
We used absolute numbers and proportions for the primary outcome. As only studies deemed to be of sufficient quality after critical appraisal were included in the analysis, no sensitivity analysis of high versus low quality studies was undertaken. We did not pool the estimates due to heterogeneity of populations.

Results
We screened the titles and abstracts of 786 articles and the full text of 49 articles for potential inclusion (Figure 1). The major reason for exclusion was high risk of bias in the selection of participants. (The full list of excluded studies with reasons is in Supplement 2.) Fourteen articles (9 preprints, 3 published studies, and 2 government reports) from 10 countries (Brazil (2), Spain (2), United States of America (USA) (2), Germany (2), and one each from Italy, Croatia, Iran, Luxembourg, Switzerland and the Channel Islands) that tested a combined total of 100,557 participants met eligibility criteria for the estimation of the primary outcome. 14-27

Figure 1. Screening and selection of articles for the review
The study populations included randomly sampled general populations 17,20-27 ; most health care workers (HCWs) of hospitals 14,16 ; most industry workers 18,19 ; and all parturient women 15 (Table 1). Eight studies tested adults only (>16 years) 14-16,18,19,23,24,27 and six tested populations of all ages 17,20-22,25,26 ; in the latter, the proportion of children and young people (0-19 years) ranged from 9% to 34% and the proportion of participants aged over 70 years ranged from 0% to 14%. Ten studies tested for both anti-SARS-CoV-2 IgG and IgM; the rest tested for IgG only or IgG and IgA. Half of the studies also collected nasopharyngeal swabs for RT-PCR testing at the same time as serologic testing. 14-16,19,20,23,25

The estimated infection rates at the study level varied considerably (Figure 2A): ten studies reported seroprevalence between 1% and 10%; two studies had estimates under 1%, 19,22 and two studies had estimates over 10%. 21,25 The (unadjusted) seroprevalence estimates in the included studies (blue filled squares with confidence intervals) ranged from 0.1% (general community, Rio Grande do Sul, Brazil 22 ) to 22.2% (general community, Guilan, Iran 21 ). The cumulative case prevalence in the study population (based on RT-PCR testing) was available in nine studies (red filled circles with confidence intervals), and ranged from 0.3% (industry workers, Frankfurt, Germany 19 ) to 9.1% (healthcare workers, Barcelona, Spain 16 ). For some studies the two types of estimate were similar (e.g. healthcare workers in Barcelona, Spain 16 ; industry workers in two counties, Croatia 18 ), but for others the seroprevalence estimate was substantially higher than the cumulative case estimate (e.g. healthcare workers in Trieste, Italy; general community in Guilan, Iran).
The estimated cumulative incidence rates at the regional level are shown in Figure 2B. For two studies in healthcare workers (Barcelona, Spain 16 and Trieste, Italy 14 ), regional seroprevalence estimates were not reported in the primary study, and calculated estimates were used instead (multiplying the regional cumulative case incidence by the ratio (study seroprevalence/study case incidence)). The adjusted seroprevalence estimates for regions (blue unfilled squares) ranged from 0.1% (general community, Rio Grande do Sul, Brazil 22 ) to 33.0% (general community, Guilan, Iran 21 ). The corresponding cumulative reported case incidence for regions (red unfilled circles) ranged from 0.05% (general community, Guilan, Iran 21 ) to 3.1% (general community, Gangelt, Germany 25 ). The calculated cumulative case incidence for regions imputed from reported COVID-19 deaths (assuming a true CFR of 1%, red crosses) ranged from 0.05% (general community, Rio Grande do Sul, Brazil 22 ) to 8.4% (general community, Gangelt, Germany 25 ). For regions where the estimated seroprevalence was low (<4%), all three estimates were similar (e.g. general community, Brazil 22 ) or the seroprevalence estimate was similar to one of the two other estimates (e.g. seroprevalence and reported cases: Barcelona, Spain 16 ; seroprevalence and calculated cases based on reported deaths: Luxembourg 23 ). For regions where the estimated seroprevalence was higher (≥4%), the seroprevalence estimate was generally higher than the other estimates, and sometimes substantially so (e.g. general community, Guilan, Iran 21 ).
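The death-based imputation (red crosses) amounts to dividing reported deaths by the assumed case-fatality rate and then by the population size. The following is a minimal sketch under that 1% assumption only; the function name and the example counts are hypothetical and are not taken from Supplement 4.

```python
def imputed_cumulative_incidence(
    deaths: int,
    population: int,
    cfr: float = 0.01,  # assumed "true" case-fatality rate of 1%, as in the Methods
) -> float:
    """Impute cumulative incidence (a proportion) from cumulative deaths."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < cfr <= 1:
        raise ValueError("cfr must be in (0, 1]")
    implied_cases = deaths / cfr  # each death stands for 1/cfr infections
    return implied_cases / population


# Hypothetical region: 50 COVID-19 deaths among 100,000 residents.
incidence = imputed_cumulative_incidence(deaths=50, population=100_000)
print(f"Imputed cumulative incidence: {incidence:.1%}")
```

Because the result scales inversely with the assumed rate, halving the case-fatality rate doubles the imputed incidence, which is one reason these imputed values are best read as rough comparators.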
The relationships between all the outcome estimates for each study/region on the log scale are shown in Figure 3. The horizontal axis shows (unadjusted) study seroprevalence, ranging from 0.1% in Rio Grande do Sul, Brazil 22 to 22.2% in Guilan, Iran 21 , with most estimates between 1.0% and 10.0% (shaded blue column). The matching estimates are plotted on the vertical axis for each study: adjusted regional seroprevalence in open blue squares, imputed regional cumulative cases based on deaths in red crosses, cumulative study cases in filled blue circles and regional reported cumulative cases in open red circles. The upper diagonal (identity) line indicates estimates that are equal to the study seroprevalence estimate, and the lower diagonal line indicates estimates that are 1/10 that of the study seroprevalence estimate. The regional seroprevalence estimates for each study are closest to the study seroprevalence estimates, as shown by their location closest to the upper diagonal identity line. Variations away from the identity line reflect adjustments made for study sampling frame and test accuracy. In general, cases imputed from reported deaths are next closest to the seroprevalence estimates, although there is considerable variation in how close: imputed cases for Philadelphia, USA 15 matched the seroprevalence almost exactly, while those for Spain 20 and Guilan, Iran 21 were around 1/10 of the seroprevalence. Next closest were the study cumulative case estimates, where differences in test accuracy of antibody vs RT-PCR tests may explain most of the within-study differences (for most studies all individuals had both types of test).
The estimates that differed the most from those of the study seroprevalence (furthest away from the identity line) were the reported regional case estimates, with many falling below the 1/10 seroprevalence line, some notably so (Guilan, Iran; 21 Philadelphia, USA; 15 and Geneva, Switzerland 26 ). All raw data used for Figures 2 and 3 are provided in Supplement 4.
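A point's position relative to the two diagonal lines is determined by the ratio of an estimate to the study seroprevalence: a ratio of 1 lies on the identity line and a ratio below 0.1 lies under the lower line. The sketch below applies this to the Guilan figures quoted in the text (study seroprevalence 22.2%, regional reported cases 0.05%); the helper names are illustrative only.

```python
def ratio_to_seroprevalence(estimate: float, study_seroprevalence: float) -> float:
    """Ratio of a case-based estimate to the study seroprevalence (both proportions)."""
    if study_seroprevalence <= 0:
        raise ValueError("study_seroprevalence must be positive")
    return estimate / study_seroprevalence


def below_one_tenth_line(estimate: float, study_seroprevalence: float) -> bool:
    """True when the estimate falls under the 1/10 seroprevalence line."""
    return ratio_to_seroprevalence(estimate, study_seroprevalence) < 0.1


# Guilan, Iran: 22.2% study seroprevalence vs 0.05% reported regional cases.
guilan_ratio = ratio_to_seroprevalence(0.0005, 0.222)
print(f"Reported cases are {guilan_ratio:.2%} of study seroprevalence")
print("Below 1/10 line:", below_one_tenth_line(0.0005, 0.222))
```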

Figure 3. Log-log plot of study seroprevalence (x-axis) vs three cumulative case estimators for each study. Diagonal lines indicate rates equal to seroprevalence (solid) or 1/10 seroprevalence (dashed).
Typical COVID-like symptoms prior to serologic testing, which would help assess possible untested or undetected cases, were more common in the seropositives. Only 8 of the 14 studies provided data on this, and the symptoms, time frames, and measures varied (Table 2). Self-reported symptoms included the frequency of any acute respiratory infection, fever, cough and loss of smell and taste among the participants. About a third to a half of the participants in six studies reported having typical COVID-like symptoms in the 2 weeks to 3 months prior to the serologic testing. Symptoms were more common in people who had positive compared to negative serology but varied with the type of symptom. Having any acute respiratory infection (ARI) symptoms increased the odds of positive serologic testing by 1.6 to 8.8-fold, whereas for the specific individual symptoms this ranged from 2.8-fold (fever) to 18-fold (loss of smell and taste). Snoeck et al 23 also reported other non-specific symptoms such as headache, chest pain, skin rash, nausea, and fatigue for 7/35 (20%) participants and no symptoms for 10/35 (29%).
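For readers less familiar with the "fold" increases in odds quoted above, an odds ratio can be computed from a 2x2 table of symptom status by serology result. The sketch below uses hypothetical counts, not values from Table 2, and the function name is illustrative.

```python
def odds_ratio(
    seropositive_with_symptoms: int,
    seropositive_without_symptoms: int,
    seronegative_with_symptoms: int,
    seronegative_without_symptoms: int,
) -> float:
    """Odds of prior symptoms in seropositives divided by the odds in seronegatives.

    This equals the odds ratio for positive serology given symptoms.
    """
    cells = (
        seropositive_with_symptoms,
        seropositive_without_symptoms,
        seronegative_with_symptoms,
        seronegative_without_symptoms,
    )
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for a finite odds ratio")
    return (seropositive_with_symptoms * seronegative_without_symptoms) / (
        seropositive_without_symptoms * seronegative_with_symptoms
    )


# Hypothetical survey: 30 of 50 seropositives and 200 of 950 seronegatives
# reported prior symptoms.
print(f"Odds ratio: {odds_ratio(30, 20, 200, 750):.2f}")
```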

Risk of bias of included studies
Table 3 summarizes the overall risk of bias assessment of the 14 included studies (the full list of risk of bias questions is in Supplement 3). Most of the studies were evaluated as low risk of bias for the sampling frame because they recruited participants randomly from the general population. Four studies had selected populations of HCWs and industry workers that were not fully representative of the general population (Domain 1). Three studies had a response rate between 31% and 44%, and three between 55% and 68%. The rest of the studies had response rates between 73% and 100% (Domain 2). Domain 3 assessed the potential to over- or underestimate the seroprevalence based on the diagnostic accuracy of the individual antibody tests used in each study. Although each study provided specificity and sensitivity for the tests based on internal and external (manufacturer) validation, it was difficult to confidently evaluate the impact on the study results without a single-source validation that would enable unbiased comparison. Every study except for the Spanish national sero-survey used the same test and test specimens in all study participants (Domain 4). The Spanish survey did not collect a serum sample that required venipuncture from children, but only tested them using the rapid test (finger prick blood sample). All studies but one reported the dates of sample collection and testing, which were all within 3 weeks. The Iranian study reported the study timeline as April only, without specific details (Domain 5).

Table 3. Risk of bias in 14 included studies. Green smiley face denotes low risk of bias; yellow straight face, moderate or unclear risk; and red sad face, high risk of bias.

Discussion
The seroprevalence proportions in the 14 valid studies identified varied considerably. Two studies had estimates under 1%, and two studies had estimates over 10%, from the two notably hard-hit regions of Gangelt in Germany and Northwest Iran. The two studies in healthcare workers, in Italy and Spain, had seroprevalence estimates at the higher end, with the Barcelona hospitals having a higher rate than the Spanish national survey. For all but one study, the seroprevalence estimate was higher than the cumulative incidence, as expected, although these were close for several studies. However, low testing and detection rates are likely to have occurred in several of the regions. This under-detection is suggested both by the differences between the estimates based on cumulative cases and those inferred from the COVID-19 death rates, and by the high rates of COVID-19-like symptoms in the serologically positive patients who reported no past history of infection.
Strengths of this review lie in its methodological rigour, namely the thorough searching of published and unpublished literature without language restrictions and the appraisal of potential bias in the included studies. However, there are several limitations. First, we excluded several studies because of their high risk of volunteer and/or responder bias in the sampling process, but the remaining studies still had significant degrees of non-response. Second, the accuracy of the serological tests used at these early stages of the pandemic was often unclear. A particular concern was the specificity and the likelihood of false positives in these low prevalence settings leading to potential overestimation. 6 For example, a specificity of 98% implies a 2% false positive rate even in settings with no past infections, which may be a problem for Frankfurt and Rio Grande do Sul, where the estimated seroprevalence was under 1%. Third, the comparison between seroprevalence rates and reported cumulative incidence is limited by the adequacy of testing and detection processes in different regions and countries; we could only partially address this problem by using an imputed incidence from the COVID-19 death rates. A notable example of this problem is the study in North-West Iran, where the apparent case fatality rate is amongst the highest in the world, and there is also some evidence of under-reporting of COVID-19 deaths based on the comparison of excess deaths. Fourth, we assumed a "true" case fatality rate of 1% for all populations 11 and did not allow for any lag-time in using the mortality data to impute cumulative case incidence. Finally, the inadequate reporting of many studies, particularly the preprints, made the task of data extraction difficult. Many authors did not respond to data-related questions sent via email to the corresponding author.
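The specificity concern can be made concrete with a short sketch. The expected apparent seroprevalence is the sum of true and false positives, and the Rogan-Gladen estimator is one standard way to correct for imperfect test accuracy. This is an illustration only and not necessarily the adjustment applied by the included studies; the 90% sensitivity is an assumed value, while the 98% specificity is the example given above.

```python
def apparent_prevalence(
    true_prevalence: float, sensitivity: float, specificity: float
) -> float:
    """Expected fraction testing positive: true positives plus false positives."""
    true_positives = true_prevalence * sensitivity
    false_positives = (1.0 - true_prevalence) * (1.0 - specificity)
    return true_positives + false_positives


def rogan_gladen(apparent: float, sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen corrected prevalence, clipped to the range [0, 1]."""
    youden = sensitivity + specificity - 1.0
    if youden <= 0:
        raise ValueError("test must perform better than chance")
    corrected = (apparent + specificity - 1.0) / youden
    return min(1.0, max(0.0, corrected))


# Assumed sensitivity of 90%; specificity of 98% as in the example in the text.
no_infection = apparent_prevalence(0.0, sensitivity=0.90, specificity=0.98)
print(f"Apparent seroprevalence with no true infections: {no_infection:.1%}")
print(f"Corrected estimate: {rogan_gladen(no_infection, 0.90, 0.98):.1%}")
```

With these inputs, a population with no past infections would still show an apparent seroprevalence of about 2%, which is larger than the estimates reported for the lowest-prevalence settings.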
There have been two previous reviews of seroprevalence studies, but these focused on using the studies to infer the infection fatality rate. 28,29 We excluded some of the primary studies they included because of poor sampling methods, with high risk of bias from the involvement of volunteers or high rates of nonresponse. Both reviews also illustrate the substantial variation in seroprevalence rates, but with an even greater range because they included this wider set of studies at risk of bias.
The results of this review have several implications for policy and practice. First, in all of the studies the estimated seroprevalence falls well short of the rates required for herd immunity, suggesting that herd immunity is unlikely to be achieved without mass vaccination. Most infection rates are an order of magnitude lower than would be required for herd immunity, and even the seroprevalence found in North-West Iran is around one third of what would be required. Second, studies in regions with relatively widespread testing and detection show only a modest gap between the seroprevalence and the cumulative case incidence, suggesting that much of the gap between reported cases and seroprevalence elsewhere is likely to be due to undetected cases. Third, the variation and incompleteness of the methods used by the studies point to the need for better standardisation, design, and reporting of seroprevalence studies, including better questioning about, and reporting of, subjects' prior history of RT-PCR testing and history of symptoms.
Testing for an immune response to COVID-19 in recovered patients allows evaluations of the transmissibility of infection in general and specific populations, and provides improved estimations of attack rates and infection fatality rates as well as estimates of possible immunity. 30 The detection of antibodies in the 14 studies we analysed does not imply immunity in their populations. SARS-CoV-2 shares 79.6% sequence identity with SARS-CoV 31 , and the peak level of IgG/neutralising antibodies in recovered SARS-CoV patients occurred at 4-6 months before declining. 32 However, 88%-89% of recovered SARS-CoV patients retained detectable IgG for 1-2 years 1,32 and by the fourth year 50% to 74% had detectable IgG antibodies. 32,33 Knowing the duration of immunity could inform strategic public health approaches until a vaccine is available. Accurate estimates of immunity will require not only repeat antibody testing among the population, but also establishing the association between a positive antibody response and protective immunity against the disease. The currently unknown duration of the IgG response and its association with disease immunity also raises questions about the validity of an "immunity passport", especially past a probable peak at 4-6 months post infection. 32,33
Findings of this article should help inform policy globally, but also trigger improved research methods and better reporting of any future studies on seroprevalence of SARS-CoV-2 infections. Comparing seroprevalence estimates with reported incidence helps in evaluating the performance of surveillance systems and testing practices. Where the gap between seroprevalence estimates and regional incidence rates is larger, testing needs to be enhanced. Evidence-based and targeted public health measures informed by accurate real-world data will help us successfully navigate the uncertain dynamics of this new pandemic.