This is a guest post by Sean Michael Kerner.
Andrew McArthur and his graduate students were in the middle of designing a new type of lab test that could detect any known respiratory virus on the planet - and then SARS-CoV-2, the virus that causes COVID-19, hit.
McArthur is an associate professor of biochemistry and biomedical science at McMaster University in Hamilton, ON, and was working alongside researchers including teams from Ontario’s Vector Institute and Sunnybrook Health Sciences in Toronto. As the pandemic hit, McArthur and his team of computational biology researchers rapidly shifted their focus to COVID-19. They joined a team led by Bo Wang at the Vector Institute in building a new app as the front end of a sophisticated, compute-intensive system to help stop the spread of COVID-19. That system now benefits from a Cisco donation of UCS systems.
There are multiple ways to track the spread of a virus, including contact tracing, which identifies everyone an infected person has been in contact with.
"Traditional contact tracing is extremely useful as it gives a sense of what's going on," said McArthur, a member of McMaster’s infectious disease research institute. "What you don't know is the exact transmission dynamics."
The other way to track virus spread is with molecular epidemiology, where the genome sequence of a pathogen is used to track how the virus moves around, from one location to another. The molecular epidemiology approach is also important as it tracks how the virus is evolving, which is critical to researchers developing a vaccine for COVID-19.
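To make the idea concrete, here is a toy sketch of the simplest version of that comparison: a newly sequenced genome is matched against reference genomes from known locations, and the closest one is reported. This is only an illustration, not the method McArthur's team uses. The sequences, the location labels, and the function names `hamming_distance` and `closest_reference` are invented for the example, and it assumes the sequences have already been aligned to the same length.

```python
from typing import Dict, Tuple


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def closest_reference(query: str, references: Dict[str, str]) -> Tuple[str, int]:
    """Return the label and distance of the reference most similar to the query."""
    if not references:
        raise ValueError("at least one reference is required")
    label = min(references, key=lambda name: hamming_distance(query, references[name]))
    return label, hamming_distance(query, references[label])


# Hypothetical, heavily shortened sequences; a real SARS-CoV-2 genome
# is roughly 30,000 bases long.
REFERENCES = {
    "location_A": "ACGTACGTAC",
    "location_B": "ACGTTCGTAC",
    "location_C": "TCGTTCGAAC",
}

if __name__ == "__main__":
    uploaded = "ACGTTCGTAA"
    print(closest_reference(uploaded, REFERENCES))
```

Real molecular epidemiology works from full alignments and phylogenetic trees rather than a single nearest match, but the underlying intuition is the same: genomes that differ by fewer mutations are more likely to be closely linked in the chain of transmission.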
Tracking molecular epidemiology is what McArthur does as co-lead of the Province of Ontario's genomic surveillance of COVID-19 (https://www.oncov.ca). His team is also working with the Public Health Agency of Canada to set the standards on how viruses are analyzed and how accurate data flows from local health authorities up to the national level.
Tracking the movement of COVID-19
There is a lot of data being collected around the world on COVID-19 infections. McArthur said his team knew that hospitals and local public health agencies would want to do their own genomic analysis quickly, on the fly, rather than depend on an international database that moves at its own pace.
That's where the COVID-19 Genotyping Tool (CGT), developed by Vector Institute PhD student Hassaan Maan, fits in.
"The idea is we take the data that comes from the global repositories and we repackage it using a lot of machine learning," McArthur explained. "So you can quickly upload your own data and ask the immediate questions - Where did my virus come from? How is it transmitted in my community?"
Cisco UCS and the Big Data Problem
A key challenge for McArthur was dealing with scale in a rapid way. While the cloud is often seen as a way for different types of organizations to quickly scale computing power, it wasn't a long-term option for this effort.
With the coronavirus data, personally identifiable information could be part of the mix. McArthur emphasized that hospitals are generally not comfortable with the raw data they generate being processed in a cloud environment. Instead, they require physical servers onsite.
"Cisco with the UCS donation allows us to put a lot of hardware behind this effort so we can scale up and support large scale analysis," McArthur said.
Cisco UCS also helps with a critical requirement of the genomic analysis: lots of system memory. A good part of the analysis is conducted in memory with large data sets.
"In analysis, memory is a big deal," McArthur commented. "Analyzing one patient infection is easy, but analyzing in the context of hundreds of thousands of cases globally is all about memory."
Even if the cloud could be used for this effort, he noted that getting the required amount of memory there would be prohibitively expensive. The large memory capacity of the Cisco UCS also helps to enable an integrated data pipeline that runs quality control, file validation and integrity checks at each stage.
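The article does not describe how that pipeline is built, so the following is only a minimal sketch of the general pattern, assuming the data is a list of sequence strings held in memory. Every name here (`checksum`, `validate`, `uppercase`, `drop_low_quality`, `run_pipeline`) and the 10 percent threshold for ambiguous bases are hypothetical choices made for illustration.

```python
import hashlib
from typing import Callable, List, Tuple

VALID_BASES = set("ACGTN")
Stage = Callable[[List[str]], List[str]]


def checksum(records: List[str]) -> str:
    """Integrity check: SHA-256 digest of the records, in order."""
    digest = hashlib.sha256()
    for record in records:
        digest.update(record.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def validate(records: List[str]) -> None:
    """File validation: each record must be a non-empty string of known bases."""
    for i, record in enumerate(records):
        if not record or not set(record) <= VALID_BASES:
            raise ValueError(f"record {i} is not a valid sequence")


def uppercase(records: List[str]) -> List[str]:
    """Normalize case so later stages see a consistent alphabet."""
    return [r.upper() for r in records]


def drop_low_quality(records: List[str], max_n_fraction: float = 0.1) -> List[str]:
    """Quality control: discard sequences with too many ambiguous 'N' bases."""
    return [r for r in records if r and r.count("N") / len(r) <= max_n_fraction]


def run_pipeline(
    records: List[str], stages: List[Stage]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Run each stage, then validate and fingerprint that stage's output."""
    log = []
    for stage in stages:
        records = stage(records)
        validate(records)  # fail fast if a stage produced bad data
        log.append((stage.__name__, checksum(records)))
    return records, log


if __name__ == "__main__":
    raw = ["acgtacgtac", "ACGNNNNNAC", "ACGTTCGTAC"]
    cleaned, log = run_pipeline(raw, [uppercase, drop_low_quality])
    print(cleaned)
    for name, digest in log:
        print(name, digest[:12])
```

Recording a digest after every stage means a later run, or a downstream consumer, can confirm it is looking at exactly the data a given stage produced. Because each stage works on the full list at once, the whole data set has to fit in memory, which is the trade-off the article describes.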
Going a step further, McArthur said that on the computation side, up until the time of the Cisco UCS donation, memory was a limiting factor for some types of analysis.
"We actually would not even do some of our sub-analyses, because they just sucked up so much memory and the process became so slow," McArthur commented. "Now we're doing them all, including scanning for notable mutations, and we can do it much more reliably."