Evolution and epidemic spread of SARS-CoV-2 in Brazil

Brazil currently has one of the fastest-growing SARS-CoV-2 epidemics in the world. Owing to limited available data, assessments of the impact of non-pharmaceutical interventions (NPIs) on virus spread remain challenging. Using a mobility-driven transmission model, we show that NPIs reduced the reproduction number from >3 to 1–1.6 in São Paulo and Rio de Janeiro. Sequencing of 427 new genomes and analysis of a geographically representative genomic dataset identified >100 international virus introductions in Brazil. We estimate that most (76%) of the Brazilian strains fell in three clades that were introduced from Europe between 22 February and 11 March 2020. During the early epidemic phase, we found that SARS-CoV-2 spread mostly locally and within state borders. After this period, despite sharp decreases in air travel, we estimated multiple exportations from large urban centers that coincided with a 25% increase in average distances traveled on national flights. This study sheds new light on the epidemic transmission and evolutionary trajectories of SARS-CoV-2 lineages in Brazil and provides evidence that current interventions remain insufficient to keep virus transmission under control in the country.

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a novel betacoronavirus with a 30-kb genome that was first reported in December 2019 in Wuhan, China (1,2). SARS-CoV-2 was declared a public health emergency of international concern on 30 January 2020. As of 12 July 2020, there were >12.5 million cases of coronavirus disease 2019 (COVID-19) and 561,000 deaths globally (3). The virus can be classified into two main phylogenetic lineages, A and B, which spread from Wuhan before strict travel restrictions were enacted (4,5) and now cocirculate around the world (6). The case fatality ratio of SARS-CoV-2 infection has been estimated at between 1.2 and 1.6% (7)(8)(9), with substantially higher ratios in those >60 years of age (8). Some estimates suggest that 18 to 56% of SARS-CoV-2 transmission is from asymptomatic or presymptomatic individuals (10)(11)(12)(13), complicating epidemiological assessments and public health efforts to curb the pandemic.

Challenges of real-time assessment of transmission
Although the SARS-CoV-2 epidemics in several countries, including China, Italy, and Spain, have been brought under control through nonpharmaceutical interventions (NPIs) (3), the number of SARS-CoV-2 cases and deaths in Brazil continues to increase (14) (Fig. 1A). As of 12 July 2020, Brazil had reported 1,800,827 SARS-CoV-2 cases, the second-largest number in the world, and 70,398 deaths. More than one-third of the cases (34%) in Brazil are concentrated in the southeast region, which includes São Paulo city (Fig. 1B), the world's fourth-largest conurbation, where the first case in Latin America was reported on 25 February 2020 (15). Diagnostic assays for SARS-CoV-2 molecular detection were widely distributed across the regional reference centers of the national public health laboratory network from 21 February 2020 onward (16,17). However, several factors, including delays in reporting, changes in notification, and heterogeneous access to testing across populations, obscure the real-time assessment of virus transmission using SARS-CoV-2 case counts (15). Consequently, a more accurate measure of SARS-CoV-2 transmission in Brazil is the number of reported deaths caused by severe acute respiratory infections (SARIs), which is provided by the Sistema Único de Saúde (SUS) (18). Changes in the opportunity for SARS-CoV-2 transmission are strongly associated with changes in average mobility (18)(19)(20) and can typically be measured by calculating the effective reproduction number, R, defined as the average number of secondary infections caused by an infected person. R > 1 indicates a growing epidemic, whereas R < 1 is required for transmission to decline.
We used a Bayesian semimechanistic model (21,22) to analyze SARI mortality statistics and human mobility data to estimate daily changes in R in São Paulo city (12.2 million inhabitants) and Rio de Janeiro city (6.7 million inhabitants), the largest urban metropolises in Brazil (Fig. 1, C and D). NPIs in Brazil consisted of school closures implemented between 12 and 23 March 2020 across the country's 27 federal units/states and store closures implemented between 13 and 23 March 2020. In São Paulo city, schools started closing on 16 March 2020 and stores closed 4 days later. At the start of the epidemics, we found R > 3 in both São Paulo and Rio de Janeiro and, concurrent with the timing of state-mandated NPIs, R values fell close to 1.
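Semimechanistic models of the kind cited above (21,22) are built around a renewal equation, in which new infections on a given day are R on that day multiplied by a generation-interval-weighted sum of earlier infections, with R linked to mobility. The sketch below simulates only that infection layer under stated assumptions: a single mobility index, a hypothetical log-linear link `r_from_mobility`, and illustrative values for `r0`, `beta`, and the gamma generation interval. Models of this type also connect infections to observed deaths through an infection fatality ratio and an infection-to-death delay, and fit everything in a Bayesian framework; none of that is shown here.

```python
import math


def gamma_weights(n_days, mean=6.5, sd=4.0):
    """Discretise a gamma-shaped generation interval into daily weights.

    mean and sd (in days) are illustrative assumptions, not the study's values.
    """
    shape = (mean / sd) ** 2
    rate = shape / mean
    # Evaluate the gamma density at the midpoint of each day, then normalise.
    dens = [
        rate ** shape * (t - 0.5) ** (shape - 1) * math.exp(-rate * (t - 0.5))
        / math.gamma(shape)
        for t in range(1, n_days + 1)
    ]
    total = sum(dens)
    return [d / total for d in dens]


def r_from_mobility(r0, beta, mobility_change):
    """Hypothetical log-linear link between R_t and one mobility index.

    mobility_change is the fractional change from baseline (e.g. -0.6 for a
    60% drop), so with beta > 0, R_t falls as mobility falls.
    """
    return r0 * math.exp(beta * mobility_change)


def simulate(seed_infections, r0, beta, mobility_series, weights):
    """Renewal equation: infections on day t = R_t * sum_s infections[t-s] * w[s]."""
    infections = list(seed_infections)
    r_series = []
    for m in mobility_series:
        r_t = r_from_mobility(r0, beta, m)
        n = len(infections)
        # Generation-interval-weighted sum of past infections (lag 1 = most recent day).
        pressure = sum(
            infections[n - s] * weights[s - 1]
            for s in range(1, min(n, len(weights)) + 1)
        )
        infections.append(r_t * pressure)
        r_series.append(r_t)
    return infections, r_series


w = gamma_weights(20)
infections, r = simulate([10.0] * 7, r0=3.2, beta=2.0,
                         mobility_series=[0.0] * 10 + [-0.6] * 30, weights=w)
print(round(r[0], 2), round(r[-1], 2))  # 3.2 0.96
```

With these made-up parameters, a 60% drop in mobility moves R from above 3 to just below 1, which mirrors the qualitative pattern described in the text without reproducing the study's estimates.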

Mobility-driven changes in R
Analysis of R values after NPI implementation highlights several notable mobility-driven features. There was a period immediately after NPIs, between 21 and 31 March 2020, when R was consistently <1 in São Paulo city (Fig. 1C). However, after this initial decrease, the R value for São Paulo rose above 1 and increased through time, a trend associated with increased population mobility. This is reflected in the Google transit stations index, which rose from -60% to -52%, and in a decrease in the social isolation index from 54% to 47%. By 4 May 2020, we estimate R = 1.3 [95% Bayesian credible interval (BCI): 1.0 to 1.6] in both São Paulo and Rio de Janeiro cities (table S1). However, we note that in the preceding 7 days there were instances when the 95% credible intervals for R included values <1, drawing attention to the fluctuations and uncertainty in the estimated R for both cities.
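A 95% BCI such as the one quoted for 4 May 2020 is typically read off as the 2.5th and 97.5th percentiles of the posterior draws of R for that day, and asking whether the interval "includes values <1" amounts to checking its lower bound. The sketch below assumes equal-tailed intervals and uses hypothetical log-normal draws in place of a real posterior; `central_interval` is an illustrative helper, not part of the study's code.

```python
import math
import random
import statistics


def central_interval(samples, level=0.95):
    """Equal-tailed credible interval from posterior draws.

    statistics.quantiles(n=1000) returns 999 cut points at 0.1%, 0.2%, ..., 99.9%;
    levels such as 0.90, 0.95 or 0.99 map exactly onto those cut points.
    """
    tail = (1 - level) / 2
    cuts = statistics.quantiles(samples, n=1000, method="inclusive")
    lo = cuts[round(tail * 1000) - 1]
    hi = cuts[round((1 - tail) * 1000) - 1]
    return lo, hi


# Hypothetical posterior draws of R for one day (not the study's posterior).
random.seed(1)
draws = [random.lognormvariate(math.log(1.3), 0.15) for _ in range(4000)]
lo, hi = central_interval(draws)
print(f"R = {statistics.median(draws):.2f} (95% BCI: {lo:.2f} to {hi:.2f})")
print("interval includes values < 1:", lo < 1)
```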
Early sharing of genomic sequences, including the first SARS-CoV-2 genome, Wuhan-Hu-1, released on 10 January 2020 (23), has enabled unprecedented global levels of molecular testing for an emerging virus (24,25). However, despite the thousands of virus genomes deposited in public databases, sampling lacks a consistent structure and data from Brazil are limited (26-28), which hampers accurate reconstructions of virus movement and transmission using phylogenetic analyses. To investigate how SARS-CoV-2 became established in the country, and to quantify the impact of NPIs on virus spatiotemporal spread, we tested a total of 26,732 samples from public and private laboratories using real-time quantitative polymerase chain reaction (RT-qPCR) assays and found 7944 (29%) to be positive for SARS-CoV-2. We then focused our sequencing efforts on generating a large and spatially representative genomic dataset with curated metadata to maximize the association between the number of sequences and the number of SARS-CoV-2 confirmed cases per state.

Spatially representative sequencing efforts
We generated 427 new SARS-CoV-2 genomes with >75% genome coverage from Brazilian samples collected between 5 March and 30 April 2020 (figs. S1 to S3 and data S1). For each state, the time between the date of the first reported case and the collection date of the first sequence analyzed in that state was only 4.5 days on average (Fig. 2A). For eight federal states, genomes were obtained from samples collected up to 6 days before the first case notifications. The genomes generated here were collected in 85 municipalities across 18 of 27 federal units spanning all regions in Brazil (Fig. 2A and fig. S2). Sequenced genomes were obtained from samples collected a median of 4 days (range: 0 to 29 days) after the onset of symptoms and were generated in three laboratories using harmonized sequencing and bioinformatic protocols (table S2). When we included 63 additional sequences from Brazil available in GISAID (29) (see data S1 and S2), we found the dataset to be representative of the spatial heterogeneity of the Brazilian epidemic. Specifically, the number of genomes per state strongly correlated with the number of SARI SARS-CoV-2 confirmed cases and SARI cases with unknown etiology per state (n = 490 sequences from 21 states; Spearman's correlation, r = 0.83; Fig. 2A). This correlation varied from 0.70 to 0.83 when considering SARI cases and deaths caused by SARS-CoV-2 and SARI cases and deaths of unknown etiology (fig. S4). Most (n = 485/490) Brazilian sequences belong to SARS-CoV-2 lineage B, with only five strains belonging to lineage A (two from Amazonas, one from Rio Grande do Sul, one from Minas Gerais, and one from Rio de Janeiro; data S1 and fig. S5 show detailed lineage information for each sequence). Moreover, we used an in silico assessment of diagnostic assay specificity for the Brazilian strains (n = 490) to identify potential mismatches between these strains and the primers and probes of several diagnostic assays.
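Spearman's correlation, as used above to compare genomes per state with SARI cases per state, is the Pearson correlation computed on ranks, so it rewards a consistent ordering of states rather than a linear relationship. The standard-library sketch below assumes average ranks for ties; the per-state counts are hypothetical and are not the study's data.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-state counts, for illustration only.
genomes_per_state = [120, 45, 30, 12, 8, 5]
sari_cases_per_state = [9800, 5100, 2100, 1500, 400, 650]
print(round(spearman(genomes_per_state, sari_cases_per_state), 2))  # 0.94
```

Only the two smallest states swap order here, so the coefficient stays high even though the counts are far from proportional.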
We found that the forward primers of the Chinese CDC and Hong Kong University nucleoprotein-targeting RT-qPCR assays may be less appropriate for use in Brazil than other diagnostic assays, for which few or no mismatches were identified (fig. S6 and table S3). The impact of these mismatches on assay sensitivity should be confirmed experimentally. If sensitivity is affected, then duplex RT-qPCR assays that concurrently target different genomic regions may help in the detection of viruses with variants in primer- or probe-binding regions.
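An in silico specificity check of this kind boils down to comparing each primer or probe with the aligned binding region of every genome and recording positions where the genome base is not covered by the primer base, allowing for IUPAC degenerate codes. The sketch below assumes the binding regions have already been extracted from an alignment; the primer and target sequences are hypothetical and do not correspond to any published assay, and skipping `N` calls is a design assumption for low-coverage positions.

```python
# IUPAC nucleotide codes -> the bases each code matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement, so reverse primers can be compared to the plus strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatch_positions(primer, target):
    """Positions where the target base is not matched by the (possibly degenerate)
    primer base. Ambiguous target calls ('N') are skipped rather than counted."""
    primer, target = primer.upper(), target.upper()
    if len(primer) != len(target):
        raise ValueError("primer and target region must be aligned and equal length")
    return [
        i for i, (p, t) in enumerate(zip(primer, target))
        if t != "N" and t not in IUPAC[p]
    ]


def fraction_with_mismatches(primer, target_regions):
    """Share of genomes with at least one mismatch in the primer-binding region."""
    hits = sum(1 for region in target_regions if mismatch_positions(primer, region))
    return hits / len(target_regions)


forward_primer = "ACCGTTGAC"                          # hypothetical primer
regions = ["ACCGTTGAC", "ACCGTTAAC", "ACCGNTGAC"]     # hypothetical aligned regions
print([mismatch_positions(forward_primer, r) for r in regions])  # [[], [6], []]
```

Returning positions rather than a single count makes it easy to flag mismatches close to the primer's 3′ end, which are generally considered more disruptive for amplification.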

Phylogenetic analyses and international introductions
We estimated maximum likelihood and molecular clock phylogenies for a global dataset with a total of 1182 genomes sampled from 24 December 2019 to 30 April 2020 (root-to-tip genetic distance correlation with sampling dates, r² = 0.53; Fig. 3A and fig. S7). We inferred a median evolutionary rate of 1.13 × 10⁻³ (95% BCI: 1.03 × 10⁻³ to 1.23 × 10⁻³) substitutions per site per year using an exponential growth coalescent model, equating to 33 changes per year on average across the virus genome. This is within the range of evolutionary rates estimated for other human coronaviruses (30)(31)(32)(33). We estimate the time of the most recent common ancestor (TMRCA) of the SARS-CoV-2 pandemic to be mid-November 2019 (median = 19 November 2019, 95% BCI: 26 October 2019 to 6 December 2019), which is consistent with recent findings (34,35).
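The r² quoted above comes from a root-to-tip regression, a common check for temporal signal: each tip's genetic distance from the root of a maximum likelihood tree is regressed on its sampling date, the slope approximates the substitution rate, and the x-intercept approximates the root date. The sketch assumes those distances have already been extracted from a rooted tree and uses synthetic data; the rate and TMRCA reported in the text come from the Bayesian molecular clock analysis, not from a regression like this one.

```python
from datetime import date


def decimal_year(d):
    """Convert a calendar date to a decimal year (e.g. 2020-01-01 -> 2020.0)."""
    start, end = date(d.year, 1, 1), date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


def root_to_tip_regression(dates, distances):
    """Ordinary least squares of root-to-tip distance on sampling date.

    Returns (slope, x_intercept, r_squared): the slope approximates the
    substitution rate and the x-intercept approximates the root date.
    """
    n = len(dates)
    mx, my = sum(dates) / n, sum(distances) / n
    sxx = sum((x - mx) ** 2 for x in dates)
    syy = sum((y - my) ** 2 for y in distances)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, -intercept / slope, sxy ** 2 / (sxx * syy)


# Hypothetical tips: distances generated at 1e-3 subs/site/year from a root at 2019.9.
x = [decimal_year(date(2020, m, 1)) for m in (1, 2, 3, 4)]
y = [1e-3 * (xi - 2019.9) for xi in x]
rate, root, r2 = root_to_tip_regression(x, y)
print(f"rate={rate:.2e} root={root:.2f} r2={r2:.2f}")  # rate=1.00e-03 root=2019.90 r2=1.00
```

Real data scatter around the line, so r² well below 1, as in the 0.53 reported here, is expected even when a clock model is appropriate.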
Phylogenetic analysis revealed that the majority of the Brazilian genomes (76%, n = 370/490) fell into three clades, hereafter referred to as Clade 1 (n = 186/490, 38% of Brazilian strains), Clade 2 (n = 166/490, 34%), and Clade 3 (n = 18/490, 4%) (Fig. 3A and figs. S8 and S9), which were largely in agreement with those identified in a phylogenetic analysis using 13,833 global genomes. The most recent common ancestors of the three main Brazilian clades were dated to 28 February 2020 (Clade 1), 22 February (17 to 24 February 2020) (Clade 2), and 11 March (9 to 12 March 2020) (Clade 3) (Fig. 3A and fig. S10). We identified >100 international virus introductions into Brazil (Fig. 3A and figs. S8 and S12). This represents an underestimate of the real number of introductions because we sequenced, on average, only one out of every 200 confirmed cases. Most of these estimated introductions were directed to internationally well-connected states (36) such as São Paulo (36% of all imports), Minas Gerais (24%), Ceará (10%), and Rio de Janeiro (8%) (fig. S12). We further assessed the contribution of international versus national virus lineage movement events through time (Fig. 3B). In the first phase of the epidemic, we found an increasing number of international introductions until 10 March 2020 (Fig. 2B). Limited available travel history data (15) suggested that these early cases were predominantly acquired in Italy (26%, n = 70 of 266 cases with an unambiguously identified country of infection) and the United States (28%, n = 76 of 266). After this initial phase, the estimated number of international imports decreased concomitantly with the decline in the number of international passengers traveling to Brazil (Fig. 3, B and C, and fig. S13). By contrast, despite the decline in the number of passengers traveling on national flights (Fig. 3C), we detected an increase in virus lineage movement events between Brazilian regions at least until early April 2020.

Modeling spatiotemporal spread within Brazil
To better understand virus spread across spatiotemporal scales within Brazil, we used a continuous phylogeographic model that maps phylogenetic nodes to their inferred origin locations (37) (Fig. 4). We distinguished branches that remain within a state from those that cross a state border to infer the proportion of within-state versus between-state observed virus movement.
We estimate that during the first epidemic phase, SARS-CoV-2 spread mostly locally and within state borders. By contrast, the second phase was characterized by long-distance movement events and the ignition of the epidemic outside the southeast region of Brazil (Fig. 4A). Throughout the epidemic, we found that within-state virus lineage movement was, on average, 5.1-fold more frequent than between-state movement. Moreover, our data suggest that within-state virus spread and, to a lesser extent, between-state virus spread decreased after the implementation of NPIs (Fig. 4B). However, sparser sampling after 6 April 2020 (see fig. S2) reduces the number of virus lineage movement events that can be inferred from that date to the present, so the apparent decline over this period should be interpreted with caution (Figs. 3B and 4B).
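Once each node of the continuous phylogeography has a location, the within-state versus between-state comparison reduces to classifying branches by whether their parent and child nodes fall in the same state, then counting per time window. The sketch assumes the state lookup (for example, point-in-polygon against state boundaries) has already been done upstream; the `Branch` fields, the dating of each branch by its child node, and the example branches are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Branch:
    parent_state: str   # state containing the inferred parent-node location
    child_state: str    # state containing the inferred child-node location
    child_date: date    # inferred date of the child node


def movement_counts(branches, start, end):
    """Count within- vs between-state branches whose child node falls in [start, end)."""
    within = between = 0
    for b in branches:
        if start <= b.child_date < end:
            if b.parent_state == b.child_state:
                within += 1
            else:
                between += 1
    return within, between


branches = [
    Branch("SP", "SP", date(2020, 3, 5)),
    Branch("SP", "SP", date(2020, 3, 9)),
    Branch("SP", "RJ", date(2020, 3, 12)),
    Branch("RJ", "RJ", date(2020, 3, 20)),
    Branch("SP", "CE", date(2020, 3, 28)),
    Branch("CE", "CE", date(2020, 4, 2)),
]
w, b = movement_counts(branches, date(2020, 3, 1), date(2020, 5, 1))
print(w, b, w / b)  # 4 2 2.0
```

Counting per window is what allows comparisons before and after NPIs, and it also shows why sparse late sampling matters: fewer sampled tips mean fewer inferred branches in recent windows.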
We found that the average route length traveled per passenger increased by 25% during the second phase of the epidemic (Fig. 4C), despite a concomitant reduction in the number of passengers flying within Brazil (Fig. 3C). The increase in the average route length after NPI implementation resulted from a larger reduction in the number of air passengers on shorter journeys than on longer ones. For example, we found an 8.8-fold reduction in the number of passengers on flight legs <1000 km, compared with a 4.4-fold reduction on legs >2000 km (fig. S15). These findings emphasize the role of within- and between-state mobility as a key driver of both local and interregional virus spread, with highly populated and well-connected urban conurbations in the southeast region acting as the main sources of virus exports within the country (fig. S12).
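The rise in average route length while total passengers fall follows directly from a passenger-weighted mean: if short legs lose passengers faster than long legs, the long legs carry more of the weight. The sketch applies the 8.8-fold and 4.4-fold reductions from the text to two hypothetical legs (500 km and 2500 km) with made-up baseline passenger counts; it illustrates the mechanism and is not intended to reproduce the 25% figure.

```python
def mean_route_length(legs):
    """Passenger-weighted mean distance; legs is a list of (distance_km, passengers)."""
    total_passengers = sum(p for _, p in legs)
    return sum(d * p for d, p in legs) / total_passengers


# Hypothetical passenger counts on a short (500 km) and a long (2500 km) leg.
before = [(500, 8800), (2500, 4400)]
# Apply an 8.8-fold drop on the short leg and a 4.4-fold drop on the long leg.
after = [(500, 8800 / 8.8), (2500, 4400 / 4.4)]
m0, m1 = mean_route_length(before), mean_route_length(after)
print(round(m0), round(m1), f"{100 * (m1 / m0 - 1):.0f}% longer")  # 1167 1500 29% longer
```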

Discussion
We provide a comprehensive analysis of SARS-CoV-2 spread in Brazil showing the importance of community- and nation-wide measures to control the COVID-19 epidemic in Brazil. Although NPIs initially reduced virus transmission and spread, the continued increase in the number of cases and deaths in Brazil highlights the urgent need to prevent future virus transmission by implementing rapid and accessible diagnostic screening, contact tracing, quarantining of new cases, and coordinated social and physical distancing measures across the country (38). With the recent relaxation of NPIs in Brazil and elsewhere, continued molecular, immunological, and genomic surveillance is required for real-time data-driven decisions. Our analysis shows how changes in mobility may affect global and local transmission of SARS-CoV-2 and demonstrates how combining genomic and mobility data can complement traditional surveillance approaches.