Thirteen genetic sequences — isolated from people with COVID-19 infections in the early days of the pandemic in China — were mysteriously deleted from an online database last year, but have now been recovered.
Jesse Bloom, a computational biologist and specialist in viral evolution at the Fred Hutchinson Cancer Research Center in Seattle, found that the sequences had been removed from an online database at the request of scientists in Wuhan, China. But with some internet research, he was able to recover copies of the data stored on Google Cloud.
The sequences don’t fundamentally change scientists’ understanding of the origins of COVID-19 — including the fraught question of whether the coronavirus spread naturally from animals to humans or escaped in a lab accident. But their removal adds to concerns that Chinese government secrecy has hampered international efforts to understand how COVID-19 came to be.
Bloom’s results were published on Tuesday in a preprint paper, which has not yet been reviewed by other scientists. “I think it’s definitely consistent with an attempt to hide the sequences,” he told BuzzFeed News.
Bloom learned about the deleted data after reading a paper, by a team led by Carlos Farkas of the University of Manitoba in Canada, on some of the earliest genetic sequences of SARS-CoV-2. Farkas’ paper described sequences sampled from hospital outpatients in a project by researchers in Wuhan who were developing diagnostic tests for the virus. But when Bloom tried to download the sequences from the Sequence Read Archive (SRA), an online database maintained by the US National Institutes of Health, he received error messages indicating that they had been deleted.
Bloom realized that copies of SRA data are also maintained on servers operated by Google, and was able to puzzle out the URLs where the missing sequences could be found in the cloud. In this way, he recovered 13 genetic sequences that could help answer questions about how the coronavirus evolved and where it came from.
Bloom found that the deleted sequences, like others collected outside the city at later dates, more closely resembled bat coronaviruses, believed to be the ultimate ancestors of the virus that causes COVID-19, than did sequences associated with the Huanan Seafood Market in Wuhan. This adds to previous suggestions that the seafood market may have been an early victim of COVID-19, rather than the place where the coronavirus first passed from animals to humans.
“This is a very interesting study conducted by Dr. Bloom, and in my opinion the analysis is absolutely correct,” Farkas told BuzzFeed News by email. Scott Gottlieb, former head of the Food and Drug Administration, also praised the findings on Twitter.
But some scientists were less impressed. “It really doesn’t add anything to the origins debate,” Robert Garry of Tulane University in New Orleans told BuzzFeed News by email. Garry argued that the Huanan market or other markets in Wuhan could still be the source of COVID-19.
Bloom is one of 18 scientists who published a letter in May criticizing the joint WHO-China research into the origin of SARS-CoV-2. The scientists argued that the WHO-China report had failed to give “balanced consideration” to the competing ideas that the coronavirus spread naturally from animals to humans or escaped from a laboratory, a theory the report considered “extremely unlikely.” After the WHO-China report was published, the US and 13 other governments complained that the investigation “did not have access to complete, original data and samples.”
The deleted virus sequences were first uploaded to the SRA in early March 2020, around the time when researchers led by Yan Li and Tiangang Liu of Wuhan University published a preprint detailing their work using genetic sequencing to diagnose COVID-19. A few days earlier, China’s State Council had ordered all documents related to COVID-19 to be centrally approved.
The sequences were then withdrawn from the SRA in June, around the time the final version of the article appeared in a scientific journal. According to the NIH, the authors requested that the sequences be removed. “The applicant indicated that the sequence information had been updated, was submitted to another database, and wanted the data removed from SRA to avoid version control issues,” NIH spokesperson Amanda Fine told BuzzFeed News by email.
However, it’s unclear if the sequences have since been posted online in another database.
“There is no plausible scientific reason for the deletion,” Bloom wrote in his preprint, arguing that the sequences were likely “deleted to obscure their existence.” That suggested, he wrote, “a less than sincere attempt to track down the early spread of the epidemic.”
Although the sequences had been deleted, Garry pointed out that the key genetic mutations they contained were still tabulated in the Wuhan team’s final paper. “Jesse Bloom has not found exactly anything new that is not yet in the scientific literature,” Garry told BuzzFeed News, accusing Bloom of writing his preprint in an “incendiary manner that is unscientific and unnecessary.”
Bloom wrote to the Wuhan researchers asking why the sequences had been removed, but received no response. Li and Liu also did not immediately respond to a query from BuzzFeed News.
This isn’t the first time scientists have raised concerns about the removal of data that could help answer questions about the origins of COVID-19. The main database of coronavirus sequence information maintained by the Wuhan Institute of Virology, the subject of speculation about a possible “lab leak” of the virus, was taken offline in September 2019. When members of the WHO-China team studying the origins of the pandemic visited the institute in February, they were told that the database, which reportedly contained data on 22,000 coronavirus samples and sequence records, had been deleted after repeated hacking attempts.