The Calgary Bloodstream Infection Cohort

A Comprehensive Multi-omics Dataset of More than 38,000 Bloodstream Infections

The Calgary BSI Cohort

There are more than 50,000 bloodstream infections (BSIs) in North America each year, leading to 9,000 deaths. Despite the seriousness of these infections, few comprehensive datasets capture the molecular profiles of these pathogens in large cohorts. To address this shortcoming, our group conducted a systematic multi-omics survey of the 12 most common BSI-causing pathogens in the Calgary area, collected between 2006 and 2022 (The Calgary BSI Cohort).

We have compiled extensive medical information and clinical laboratory records linked to these infections. We also isolated the pathogen responsible for each infection and generated detailed whole-genome sequencing, metabolomics, and quantitative proteomics data. This unique combination of microbial datasets and human health records enables a wide range of scientific investigations that were not previously possible and lays the foundation for significant new commercial ventures and clinical applications.

The Calgary BSI Cohort in Numbers

Genomics

Illumina Whole-Genome Sequencing

  • Total Isolates Processed: 8,548

    Av. Number of Genes / Isolate: 2,413

  • Total Isolates Processed: 111

    Av. Number of Genes / Isolate: -

  • Total Isolates Processed: 12,180

    Av. Number of Genes / Isolate: 4,477

  • Total Isolates Processed: 2,515

    Av. Number of Genes / Isolate: 4,932

  • Total Isolates Processed: 677

    Av. Number of Genes / Isolate: 5,476

  • Total Isolates Processed: 803

    Av. Number of Genes / Isolate: 5,933

  • Total Isolates Processed: -

    Av. Number of Genes / Isolate: -

  • Total Isolates Processed: 1,706

    Av. Number of Genes / Isolate: 1,713

  • Total Isolates Processed: 1,720

    Av. Number of Genes / Isolate: 2,766

  • Total Isolates Processed: 905

    Av. Number of Genes / Isolate: 2,667

  • Total Isolates Processed: -

    Av. Number of Genes / Isolate: -

  • Total Isolates Processed: -

    Av. Number of Genes / Isolate: -

Proteomics

Quantitative TMT LC-MS/MS proteomics

  • Total Isolates Processed: 9,360

    Av. Number of Proteins / Isolate: 1,379

  • Total Isolates Processed: -

    Av. Number of Proteins / Isolate: -

  • Total Isolates Processed: 12,156

    Av. Number of Proteins / Isolate: 1,836

  • Total Isolates Processed: 2,730

    Av. Number of Proteins / Isolate: 1,875

  • Total Isolates Processed: 734

    Av. Number of Proteins / Isolate: 1,621

  • Total Isolates Processed: -

    Av. Number of Proteins / Isolate: -

  • Total Isolates Processed: -

    Av. Number of Proteins / Isolate: -

  • Total Isolates Processed: 1,229

    Av. Number of Proteins / Isolate: 1,031

  • Total Isolates Processed: 1,881

    Av. Number of Proteins / Isolate: 1,454

  • Total Isolates Processed: 961

    Av. Number of Proteins / Isolate: 1,364

  • Total Isolates Processed: -

    Av. Number of Proteins / Isolate: -

  • Total Isolates Processed: -

    Av. Number of Proteins / Isolate: -

Metabolomics

Semi-quantitative LC-MS metabolomics

  • Total Isolates Processed: 9,480

    Number of Measured Metabolites: 68/79

  • Total Isolates Processed: -

    Number of Measured Metabolites: -

  • Total Isolates Processed: 11,860

    Number of Measured Metabolites: 68/79

  • Total Isolates Processed: 2,605

    Number of Measured Metabolites: 65/79

  • Total Isolates Processed: 730

    Number of Measured Metabolites: 67/79

  • Total Isolates Processed: 964

    Number of Measured Metabolites: 64/79

  • Total Isolates Processed: -

    Number of Measured Metabolites: -

  • Total Isolates Processed: 2,620

    Number of Measured Metabolites: 70/79

  • Total Isolates Processed: 1,806

    Number of Measured Metabolites: 67/79

  • Total Isolates Processed: 906

    Number of Measured Metabolites: 68/79

  • Total Isolates Processed: 451

    Number of Measured Metabolites: 73/79

  • Total Isolates Processed: -

    Number of Measured Metabolites: -

Clinical Data

Extensive clinical data and detailed patient characteristics

  • Total Isolates with Clinical Data Combined: 8,160

    Total Isolates with Complete Data: 6,968

    Number of Infection Periods: 5,660

  • Total Isolates with Clinical Data Combined: 4,006

    Total Isolates with Complete Data: 921

    Number of Infection Periods: 910

  • Total Isolates with Clinical Data Combined: 11,160

    Total Isolates with Complete Data: 9,136

    Number of Infection Periods: 8,664

  • Total Isolates with Clinical Data Combined: 2,447

    Total Isolates with Complete Data: 1,895

    Number of Infection Periods: 1,781

  • Total Isolates with Clinical Data Combined: 671

    Total Isolates with Complete Data: 525

    Number of Infection Periods: 499

  • Total Isolates with Clinical Data Combined: 980

    Total Isolates with Complete Data: 305

    Number of Infection Periods: 288

  • Total Isolates with Clinical Data Combined: 1,464

    Total Isolates with Complete Data: -

    Number of Infection Periods: -

  • Total Isolates with Clinical Data Combined: 2,771

    Total Isolates with Complete Data: 921

    Number of Infection Periods: 910

  • Total Isolates with Clinical Data Combined: 1,677

    Total Isolates with Complete Data: 1,424

    Number of Infection Periods: 1,263

  • Total Isolates with Clinical Data Combined: 877

    Total Isolates with Complete Data: 673

    Number of Infection Periods: 578

  • Total Isolates with Clinical Data Combined: 410

    Total Isolates with Complete Data: -

    Number of Infection Periods: -

  • Total Isolates with Clinical Data Combined: 512

    Total Isolates with Complete Data: -

    Number of Infection Periods: -

Scientific Research and Commercial Ventures

Antimicrobial Susceptibility Prediction

Many bacteria are becoming increasingly resistant to antibiotics, and diagnostic testing for antibiotic susceptibility is an $8B global market. The Calgary BSI Cohort could enable a new wave of diagnostic tools based on next-generation sequencing, and two companies have approached us seeking to license the data for this purpose.


Infection Outbreak Monitoring

Our dataset includes geospatial information on patients, along with the hospital and ward locations of infections. Combined with our genomics records, these data allow us to build tools that automatically flag outbreaks, identify the hospital wards where outbreaks originate, and precisely guide infection control procedures. In collaboration with our provincial infection prevention and control experts and a company seeking to develop this market, we have received funding to research and implement this technology.
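One simple way to combine genomic and ward-level data for outbreak flagging is sketched here, purely as an illustration and not as the cohort's actual tooling. It assumes a precomputed table of pairwise SNP distances between isolates and uses hypothetical cutoffs (10 SNPs, 30 days) chosen only for the example. Two isolates are linked when they are genomically close, share a ward, and were collected near each other in time; linked isolates are then merged into clusters by single linkage. The `Isolate` record, the `flag_clusters` function, and all example data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Isolate:
    """Hypothetical record for one sequenced BSI isolate."""
    isolate_id: str
    ward: str
    collected: date


def flag_clusters(isolates, snp_distance, max_snps=10, max_days=30):
    """Return putative outbreak clusters (sorted lists of isolate IDs, size >= 2).

    Two isolates are linked when their pairwise SNP distance is <= max_snps,
    they were collected on the same ward, and their collection dates are at
    most max_days apart. Linked isolates are merged by single linkage using a
    union-find structure. Pairs missing from snp_distance are never linked.
    """
    parent = {iso.isolate_id: iso.isolate_id for iso in isolates}

    def find(x):
        # Follow parent pointers to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        distance = snp_distance.get(frozenset((a.isolate_id, b.isolate_id)), float("inf"))
        days_apart = abs((a.collected - b.collected).days)
        if distance <= max_snps and a.ward == b.ward and days_apart <= max_days:
            parent[find(a.isolate_id)] = find(b.isolate_id)  # merge the two clusters

    clusters = {}
    for iso in isolates:
        clusters.setdefault(find(iso.isolate_id), []).append(iso.isolate_id)
    return sorted(sorted(ids) for ids in clusters.values() if len(ids) > 1)


# Hypothetical example: A and B are close in genome, ward, and time; C is on the
# same ward but months later; D is genomically close but on a different ward.
isolates = [
    Isolate("A", "Ward 3", date(2021, 1, 4)),
    Isolate("B", "Ward 3", date(2021, 1, 20)),
    Isolate("C", "Ward 3", date(2021, 6, 1)),
    Isolate("D", "Ward 7", date(2021, 1, 5)),
]
snps = {
    frozenset(("A", "B")): 2,
    frozenset(("A", "C")): 3,
    frozenset(("B", "C")): 4,
    frozenset(("A", "D")): 1,
    frozenset(("B", "D")): 2,
    frozenset(("C", "D")): 5,
}
print(flag_clusters(isolates, snps))  # [['A', 'B']]
```

Single linkage lets a chain of pairwise links (A to B, B to C) form one cluster even when the two ends are not directly linked, which matches how transmission chains spread but can over-merge; the distance and time cutoffs would likely need tuning for each pathogen.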


Virulence-Based Diagnostics

Virulence factors produced by BSI pathogens directly affect the clinical trajectory of infections. Despite this, there are no routine diagnostic tests for microbial virulence, and clinical decision-making is not guided by the unique risk profiles of individual microbes. Using machine learning, we are investigating how microbial factors contribute to virulence, with the goal of developing new products to detect them.


Mapping of Microbial Gene Expression Regulons

Microbes such as Escherichia coli are routinely used to produce proteins and other biomolecules for foods, medicines, and a wide variety of other industrial products. Our dataset includes protein expression and metabolite production levels for each profiled isolate, thereby capturing a broad cross-section of biological variability. This depth of information provides unprecedented insight into the connections between genome, proteome, and metabolome. We have leveraged it to identify previously unknown regulators of protein and metabolite production and have entered early discussions about licensing these data for industrial applications.