The Calgary Blood Stream Infection Cohort
A Comprehensive Multi-omics Dataset of More Than 38,000 Bloodstream Infections
The Calgary BSI Cohort
More than 50,000 bloodstream infections (BSIs) occur in North America each year, leading to 9,000 deaths. Despite the seriousness of these infections, few comprehensive datasets capture the molecular profiles of the responsible pathogens in large cohorts. To address this shortcoming, our group conducted a systematic multi-omics survey of the 12 most common BSI-causing pathogens in the Calgary area, collected between 2006 and 2022 (The Calgary BSI Cohort).
We have compiled extensive medical information and clinical laboratory records linked to these infections. Moreover, we isolated the pathogens responsible for each infection and collected detailed whole-genome sequencing, metabolomics, and quantitative proteomics data. This unique constellation of microbial datasets and human health records enables a wide range of scientific investigations that have not been possible previously and lays the foundation for significant new commercial ventures and clinical applications.
The Calgary BSI Cohort in Numbers
Genomics
Illumina Whole-Genome Sequencing
-
Total Isolates Processed: 8,548
Av. Number of Genes / Isolate: 2,413
-
Total Isolates Processed: 111
Av. Number of Genes / Isolate: -
-
Total Isolates Processed: 12,180
Av. Number of Genes / Isolate: 4,477
-
Total Isolates Processed: 2,515
Av. Number of Genes / Isolate: 4,932
-
Total Isolates Processed: 677
Av. Number of Genes / Isolate: 5,476
-
Total Isolates Processed: 803
Av. Number of Genes / Isolate: 5,933
-
Total Isolates Processed: -
Av. Number of Genes / Isolate: -
-
Total Isolates Processed: 1,706
Av. Number of Genes / Isolate: 1,713
-
Total Isolates Processed: 1,720
Av. Number of Genes / Isolate: 2,766
-
Total Isolates Processed: 905
Av. Number of Genes / Isolate: 2,667
-
Total Isolates Processed: -
Av. Number of Genes / Isolate: -
-
Total Isolates Processed: -
Av. Number of Genes / Isolate: -
Proteomics
Quantitative TMT LC-MS/MS proteomics
-
Total Isolates Processed: 9,360
Av. Number of Proteins / Isolate: 1,379
-
Total Isolates Processed: -
Av. Number of Proteins / Isolate: -
-
Total Isolates Processed: 12,156
Av. Number of Proteins / Isolate: 1,836
-
Total Isolates Processed: 2,730
Av. Number of Proteins / Isolate: 1,875
-
Total Isolates Processed: 734
Av. Number of Proteins / Isolate: 1,621
-
Total Isolates Processed: -
Av. Number of Proteins / Isolate: -
-
Total Isolates Processed: -
Av. Number of Proteins / Isolate: -
-
Total Isolates Processed: 1,229
Av. Number of Proteins / Isolate: 1,031
-
Total Isolates Processed: 1,881
Av. Number of Proteins / Isolate: 1,454
-
Total Isolates Processed: 961
Av. Number of Proteins / Isolate: 1,364
-
Total Isolates Processed: -
Av. Number of Proteins / Isolate: -
-
Total Isolates Processed: -
Av. Number of Proteins / Isolate: -
Metabolomics
Semi-quantitative LC-MS metabolomics
-
Total Isolates Processed: 9,480
Number of Measured metabolites: 68/79
-
Total Isolates Processed: -
Number of Measured metabolites: -
-
Total Isolates Processed: 11,860
Number of Measured metabolites: 68/79
-
Total Isolates Processed: 2,605
Number of Measured metabolites: 65/79
-
Total Isolates Processed: 730
Number of Measured metabolites: 67/79
-
Total Isolates Processed: 964
Number of Measured metabolites: 64/79
-
Total Isolates Processed: -
Number of Measured metabolites: -
-
Total Isolates Processed: 2,620
Number of Measured metabolites: 70/79
-
Total Isolates Processed: 1,806
Number of Measured metabolites: 67/79
-
Total Isolates Processed: 906
Number of Measured metabolites: 68/79
-
Total Isolates Processed: 451
Number of Measured metabolites: 73/79
-
Total Isolates Processed: -
Number of Measured metabolites: -
Clinical Data
Extensive clinical data and detailed patient characteristics
-
Total Isolates with Clinical Data Combined: 8,160
Total Isolates with Complete Data: 6,968
Number of Infection Periods: 5,660
-
Total Isolates with Clinical Data Combined: 4,006
Total Isolates with Complete Data: 921
Number of Infection Periods: 910
-
Total Isolates with Clinical Data Combined: 11,160
Total Isolates with Complete Data: 9,136
Number of Infection Periods: 8,664
-
Total Isolates with Clinical Data Combined: 2,447
Total Isolates with Complete Data: 1,895
Number of Infection Periods: 1,781
-
Total Isolates with Clinical Data Combined: 671
Total Isolates with Complete Data: 525
Number of Infection Periods: 499
-
Total Isolates with Clinical Data Combined: 980
Total Isolates with Complete Data: 305
Number of Infection Periods: 288
-
Total Isolates with Clinical Data Combined: 1,464
Total Isolates with Complete Data: -
Number of Infection Periods: -
-
Total Isolates with Clinical Data Combined: 2,771
Total Isolates with Complete Data: 921
Number of Infection Periods: 910
-
Total Isolates with Clinical Data Combined: 1,677
Total Isolates with Complete Data: 1,424
Number of Infection Periods: 1,263
-
Total Isolates with Clinical Data Combined: 877
Total Isolates with Complete Data: 673
Number of Infection Periods: 578
-
Total Isolates with Clinical Data Combined: 410
Total Isolates with Complete Data: -
Number of Infection Periods: -
-
Total Isolates with Clinical Data Combined: 512
Total Isolates with Complete Data: -
Number of Infection Periods: -
Scientific Research and Commercial Ventures
Antimicrobial Susceptibility Prediction
Many bacteria are becoming increasingly resistant to antibiotics, and diagnostic testing for antibiotic susceptibility is an $8B global market. The Calgary BSI Cohort could enable a new wave of next-generation sequencing-based diagnostic tools, and we have been approached by two companies seeking to license the data for this purpose.
Infection Outbreak Monitoring
Our dataset includes geospatial information on patients, along with the hospital and ward locations of infections. Combined with our genomics records, these data allow us to build tools that automatically flag outbreaks, identify the hospital wards where outbreaks originate, and precisely guide infection control procedures. In collaboration with our provincial infection prevention and control experts and a company that is seeking to develop this market, we have received funding to research and implement this technology.
Virulence-Based Diagnostics
Virulence factors produced by BSI pathogens have a direct impact on the clinical trajectory of infections. Despite this, there are no routine diagnostic tests for microbial virulence, and clinical decision-making is not guided by the unique risk profiles of microbes. Using machine learning, we are investigating how microbial factors contribute to virulence, with the aim of developing new products to detect them.
Mapping of Microbial Gene Expression Regulons
Microbes (such as Escherichia coli) are routinely used to produce proteins and other biomolecules that are then used in foods, medicines, and a wide variety of other industrial products. Our dataset includes protein expression levels and metabolite production levels from every isolate observed and thereby captures a major transect of biological variability. This depth of information allows unprecedented insights into the connections between genome, proteome, and metabolome. We have leveraged it to find previously unknown controllers of protein and metabolite production and have entered into early discussions about licensing these data for industrial applications.