Cattivelli directs the Genomics Research Center in Fiorenzuola, which is part of the Italian government’s Council for Agricultural and Economic Research (CREA). Cattivelli and his colleagues, along with teams of crop geneticists from other parts of the world, are using high-performance computing in the Microsoft Azure cloud to try to unlock the genetic secrets of durum and other varieties of wheat. In the Pangenome Project, they are sifting through the genomes of about 40 varieties of wheat and its ancient ancestors for traits that would help the crop thrive in extreme conditions, use natural resources more efficiently and resist disease and pests, reducing the need for fertilizers and pesticides.
It’s not just a question of pasta for Italians; it’s an urgent quest because growing enough staples like wheat, rice and corn is essential to human survival.
Wheat makes up about 20% of the calories consumed by humans globally. And climate change directly threatens crop production worldwide, through drought and heat as well as torrential rains and other extreme weather events, such as the recent floods in eastern Spain.
Working together with Microsoft, CREA built a framework in the Azure cloud that eventually could house and analyze multiple petabytes of genetic data from the genomes of many varieties of wheat from multiple sources. (To get an idea of what that means, one petabyte could hold up to 2,000 years’ worth of digital music, if played continuously.)
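That comparison can be checked with some rough arithmetic, sketched here in Python. The bitrate of 128 kilobits per second (a common setting for compressed music) and the decimal definition of a petabyte are assumptions made for illustration, not figures supplied by CREA or Microsoft.

```python
# Rough check: how long would one petabyte of compressed music play?
# Assumptions (not from the article): a decimal petabyte (10**15 bytes)
# and audio encoded at 128 kilobits per second.
PETABYTE_BYTES = 10**15
BITRATE_BITS_PER_SECOND = 128_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def playback_years(size_bytes, bitrate_bps=BITRATE_BITS_PER_SECOND):
    """Return years of continuous playback for audio of the given size."""
    seconds = size_bytes * 8 / bitrate_bps  # bytes -> bits -> seconds
    return seconds / SECONDS_PER_YEAR


years = playback_years(PETABYTE_BYTES)
print(f"About {years:,.0f} years of continuous playback")
```

Under those assumptions the result comes out a little under 2,000 years, in line with the figure quoted above. A higher bitrate would shorten the playing time proportionally.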
Curtis Pozniak, a geneticist who directs the Crop Development Center at the University of Saskatchewan, Canada, is among the founders of the Pangenome Project.
“We’re generating petabytes of information that we need to filter down into something meaningful,” he says. “The only efficient way to do that is through cloud-based platforms where the same data can be shared with a whole range of experts at the same time.”
That data, which is stored in Microsoft’s Northern Italy Data Center Region, is then processed and analyzed in what is known as a “pipeline,” also housed in Azure. A pipeline is a series of data processing stages, each feeding its output to the next, in this case built with open-source code. This particular genomic pipeline is designed to deal with billions of small sequences that have to be put in order to make up the 14 chromosomes of the durum wheat genome. The pipeline is a tool that helps the scientists piece together that elaborate jigsaw puzzle.
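To make the idea of a pipeline concrete, here is a toy sketch in Python in which each stage consumes the output of the one before it. The stage names (`drop_short_reads`, `deduplicate`, `sort_by_position`), the data layout and the sample reads are all hypothetical and invented for illustration. The project's actual pipeline is not described here, and real genome assembly is far more involved than sorting fragments by a known position.

```python
from functools import reduce

# Each read is (chromosome, start_position, sequence).
# This sample data is invented for illustration only.
reads = [
    ("1A", 40, "GGCATT"),
    ("1A", 10, "ACGTAC"),
    ("1A", 25, "TT"),       # too short, removed by the first stage
    ("1B", 5, "CCGTAA"),
    ("1A", 10, "ACGTAC"),   # exact duplicate, removed by the second stage
]


def drop_short_reads(reads, min_length=4):
    """Stage 1: discard fragments shorter than min_length bases."""
    return [r for r in reads if len(r[2]) >= min_length]


def deduplicate(reads):
    """Stage 2: remove exact duplicates, keeping first occurrences."""
    return list(dict.fromkeys(reads))


def sort_by_position(reads):
    """Stage 3: order fragments by chromosome, then start position."""
    return sorted(reads, key=lambda r: (r[0], r[1]))


# The pipeline is simply an ordered list of stages.
PIPELINE = [drop_short_reads, deduplicate, sort_by_position]


def run_pipeline(data, stages=PIPELINE):
    """Feed the data through every stage in order."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


ordered = run_pipeline(reads)
for chromosome, position, sequence in ordered:
    print(chromosome, position, sequence)
```

Keeping the stages as separate functions in a list means a step can be added, swapped or rerun without touching the others, which is one reason the pipeline pattern suits data that many teams work on at once.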
This genomic puzzle can be seen and worked on by teams of scientists wherever they are in the world. Knowledge and information extracted from the genomic puzzle will be embedded in new varieties that will be made available to farmers in the coming years.