Article, open access, peer reviewed

Predicting antimicrobial resistance in Pseudomonas aeruginosa with machine learning‐enabled molecular diagnostics

2020; Springer Nature; Volume: 12; Issue: 3; Language: English

10.15252/emmm.201910264

ISSN

1757-4684

Authors

Ariane Khaledi, Aaron Weimann, Monika Schniederjans, Ehsaneddin Asgari, Tzu‐Hao Kuo, Antonio Oliver, Gabriel Cabot, Axel Kola, Petra Gastmeier, Michael Hogardt, Daniel E Jonas, Mohammad RK Mofrad, Andreas Bremges, Alice C. McHardy, Susanne Häußler

Topic(s)

Computational Drug Discovery Methods

Summary

Article, 12 February 2020, Open Access, Transparent process

Author Information

Ariane Khaledi (1,2,†), Aaron Weimann (2,3,4,†), Monika Schniederjans (1,2,‡), Ehsaneddin Asgari (3,5,‡), Tzu-Hao Kuo (3), Antonio Oliver (6), Gabriel Cabot (6), Axel Kola (7), Petra Gastmeier (7), Michael Hogardt (8), Daniel Jonas (9), Mohammad RK Mofrad (5,10), Andreas Bremges (3,4), Alice C McHardy (*,3,4,§) and Susanne Häussler (*,1,2,§)

1 Department of Molecular Bacteriology, Helmholtz Centre for Infection Research, Braunschweig, Germany
2 Molecular Bacteriology Group, TWINCORE-Centre for Experimental and Clinical Infection Research, Hannover, Germany
3 Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
4 German Center for Infection Research (DZIF), Braunschweig, Germany
5 Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, USA
6 Servicio de Microbiología y Unidad de Investigación Hospital Universitario Son Espases, Instituto de Investigación Sanitaria Illes Balears (IdISPa), Palma de Mallorca, Spain
7 Institute of Hygiene and Environmental Medicine, Charité – Universitätsmedizin Berlin, Berlin, Germany
8 Institute of Medical Microbiology and Infection Control, University Hospital Frankfurt, Frankfurt/Main, Germany
9 Faculty of Medicine, Institute for Infection Prevention and Hospital Epidemiology, Medical Center-University of Freiburg, Freiburg, Germany
10 Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Lab, Berkeley, CA, USA

† These authors contributed equally to this work as the first authors
‡ These authors contributed equally to this work as the second authors
§ Shared last authors
* Corresponding author. Tel: +49 531 391 55271; E-mail: [email protected]
* Corresponding author. Tel: +49 531 6181 3000; E-mail: [email protected]

ORCID: Aaron Weimann, orcid.org/0000-0003-4597-2471; Andreas Bremges, orcid.org/0000-0001-6739-7899; Alice C McHardy, orcid.org/0000-0003-2370-3430; Susanne Häussler, orcid.org/0000-0001-6141-9102

EMBO Mol Med (2020) 12: e10264
https://doi.org/10.15252/emmm.201910264
Abstract

Limited therapy options due to antibiotic resistance underscore the need for optimization of current diagnostics. In some bacterial species, antimicrobial resistance can be unambiguously predicted based on their genome sequence. In this study, we sequenced the genomes and transcriptomes of 414 drug-resistant clinical Pseudomonas aeruginosa isolates. By training machine learning classifiers on information about the presence or absence of genes, their sequence variation, and expression profiles, we generated predictive models and identified biomarkers of resistance to four commonly administered antimicrobial drugs. Using these data types alone or in combination resulted in high (0.8–0.9) or very high (> 0.9) sensitivity and predictive values. For all drugs except for ciprofloxacin, gene expression information improved diagnostic performance. Our results pave the way for the development of a molecular resistance profiling tool that reliably predicts antimicrobial susceptibility based on genomic and transcriptomic markers. The implementation of a molecular susceptibility test system in routine microbiology diagnostics holds promise to provide earlier and more detailed information on antibiotic resistance profiles of bacterial pathogens and thus could change how physicians treat bacterial infections.

Synopsis

The spread of antibiotic resistance complicates infection treatment, requiring an optimization of current diagnostics. In this study, a machine learning approach identified a set of biomarkers suitable for the development of a molecular test system to determine antibiotic resistance profiles.
Genome and transcriptome data of 414 clinical isolates were combined for biomarker identification using information on gene expression, gene presence or absence, and single nucleotide variations. For some antibiotics, transcriptome information greatly improves resistance prediction. Depending on the antibiotic, 37–93 biomarkers are sufficient to obtain high (0.8–0.9) or very high (> 0.9) sensitivity and predictive values. Biomarkers include known resistance-conferring genes (e.g., gyrA, oprD, ampC, efflux pumps) as well as unexpected and potentially novel candidates.

The paper explained

Problem

Limited therapy options due to the emergence and spread of multidrug resistance leave clinicians with uncertainty about which drug to prescribe. Inadequate initial therapy, however, may cause suffering or death of infected patients, promotes further resistance development, and imposes an enormous financial burden on healthcare systems and on society in general.

Results

We integrated genomic, transcriptomic, and phenotypic data on antibiotic resistance profiles of 414 clinical Pseudomonas aeruginosa isolates and used a machine learning-based approach to identify sets of molecular markers that allowed a reliable prediction of antibiotic resistance against four antibiotic classes. Using information on (i) the presence or absence of genes, (ii) sequence variations within genes, and (iii) gene expression profiles alone or in combinations resulted in high (0.8–0.9) or very high (> 0.9) sensitivity and predictive values. Importantly, transcriptome data significantly improved the prediction outcome as compared to using genome information alone. Identified biomarkers included known antibiotic resistance determinants (e.g., gyrA, ampC, oprD, efflux pumps) as well as markers previously not associated with antibiotic resistance.
Impact

Our findings demonstrate that the identification of molecular markers for the prediction of antibiotic resistance holds promise to change current resistance diagnostics. However, gene expression information may be required for highly sensitive and specific resistance prediction in the problematic opportunistic pathogen P. aeruginosa.

Introduction

The rise of antibiotic resistance is a public health issue of the greatest importance (Cassini et al, 2019). Growing resistance hampers the use of conventional antibiotics and leads to increased rates of ineffective empiric antimicrobial therapy. If not adequately treated, infections cause suffering, incapacity, and death, and impose an enormous financial burden on healthcare systems and on society in general (Alanis, 2005; Gootz, 2010; Fair & Tor, 2014). Despite growing medical need, FDA approvals of new antibacterial agents have substantially decreased over the last 20 years (Kinch et al, 2014). Alarmingly, there are only a few agents in clinical development for the treatment of infections caused by multidrug-resistant Gram-negative pathogens (Bush & Page, 2017).

Pseudomonas aeruginosa, the causative agent of severe acute as well as chronic persistent infections, is particularly problematic. The opportunistic pathogen exhibits high intrinsic antibiotic resistance and frequently acquires resistance-conferring genes via horizontal gene transfer (Lister et al, 2009; Partridge et al, 2018). Furthermore, the accelerating development of drug resistance due to the acquisition of drug resistance-associated mutations poses a serious threat.

The lack of new antibiotic options underscores the need for optimization of current diagnostics. Diagnostic tests are a core component in modern healthcare practice. Especially in light of rising multidrug resistance, high-quality diagnostics become increasingly important. However, providing information as the basis for infectious disease management is a difficult task.
Antimicrobial susceptibility testing (AST) has experienced little change over the years. It still relies on culture-dependent methods, and as a consequence, clinical microbiology diagnostics is labor-intensive and slow. Culture-based AST requires 48 h (or longer) for definitive results, which leaves physicians with uncertainty about the best drugs to prescribe to individual patients. This delay also contributes to the spread of drug resistance (Oliver et al, 2015; López-Causapé et al, 2018).

The introduction of molecular diagnostics could become an alternative to culture-based methods and could be critical in paving the way to fight antimicrobial resistance. Identification of genetic elements of antimicrobial resistance promises a deeper understanding of the epidemiology and mechanisms of resistance and could lead to a timelier reporting of the resistance profiles as compared to conventional culture-based testing. It has been demonstrated that for a number of bacterial species, antimicrobial resistance can be highly accurately predicted based on information derived from the genome sequence (Gordon et al, 2014; Bradley et al, 2015; Moradigaravand et al, 2018). However, in the opportunistic pathogen P. aeruginosa even full genomic sequence information is insufficient to predict antimicrobial resistance in all clinical isolates (Kos et al, 2015).

Pseudomonas aeruginosa exhibits a profound phenotypic plasticity mediated by environment-driven flexible changes in the transcriptional profile (Dötsch et al, 2015). For example, P. aeruginosa adapts to the presence of antibiotics with the overexpression of the mex genes, encoding the antibiotic extrusion machineries MexAB-OprM, MexCD-OprJ, MexEF-OprN, and MexXY-OprM. Similarly, high expression of the ampC-encoded intrinsic beta-lactamase confers antimicrobial resistance (Haenni et al, 2017; Juan et al, 2017; Goli et al, 2018; Martin et al, 2018). Those transcriptional responses are frequently fixed in clinical P. aeruginosa strains, e.g., due to mutations in negative regulators of gene expression (Frimodt-Møller et al, 2018; Juarez et al, 2018). Thus, the isolates develop an environment-independent resistance phenotype. Up-regulation of intrinsic beta-lactamases as well as overexpression of efflux pumps that contribute to the resistance phenotype make gene-based testing a challenge, because it is difficult to predict from the genomic sequence which (combinations of) mutations would lead to an up-regulation of resistance-conferring genes (Llanes et al, 2004; Fernández & Hancock, 2012; Schniederjans et al, 2017).

In this study, we investigated whether we can reliably predict antimicrobial resistance in P. aeruginosa using not only genomic but also quantitative gene expression information. For this purpose, we sequenced the genomes of 414 drug-resistant clinical P. aeruginosa isolates and recorded their transcriptional profiles. We built predictive models of antimicrobial susceptibility/resistance to four commonly administered antibiotics by training machine learning classifiers. From these classifiers, we inferred candidate marker panels for a diagnostic assay by selecting resistance- and susceptibility-informative markers via feature selection. We found that the combined use of information on the presence/absence of genes, their sequence variation, and gene expression profiles can predict resistance and susceptibility in clinical P. aeruginosa isolates with high or very high sensitivity and predictive value.

Results

Taxonomy and antimicrobial resistance distribution of 414 DNA- and mRNA-sequenced clinical Pseudomonas aeruginosa isolates

A total of 414 P. aeruginosa isolates were collected from clinical microbiology laboratories of hospitals across Germany and at sites in Spain, Hungary, and Romania (Fig 1A). For all isolates, the genomic DNA was sequenced and transcriptional profiles were recorded.
This enabled us to use not only the full genomic information but also information on the gene expression profiles as an input to machine learning approaches.

Figure 1. Geographic and phylogenetic distribution of 414 clinical Pseudomonas aeruginosa isolates used in this study
A. Geographic sampling site distribution, where circle size is proportional to the number of isolates from a particular location.
B. Phylogenetic tree of the clinical isolates and seven reference strains (blue dots). A PA7-like outgroup clade including two clinical isolates is not shown. Abundant high-risk clones are indicated by green bars. Scale bar: 0.04.
C. Antimicrobial susceptibility profiles against the four commonly administered antibiotics tobramycin (TOB), ceftazidime (CAZ), ciprofloxacin (CIP), and meropenem (MEM) determined by agar dilution according to Clinical & Laboratory Standards Institute Guidelines (CLSI, 2018).

We inferred a maximum likelihood phylogenetic tree based on variant nucleotide sites (Fig 1B). The tree was constructed by mapping the sequencing reads of each isolate to the genome of the P. aeruginosa PA14 reference strain and then aligning the consensus sequences for each gene. The isolates exhibited a broad taxonomic distribution and separated into two major phylogenetic groups. One included PAO1, PACS2, LESB58, and a cluster of high-risk clone ST175 isolates; the other included PA14, as well as one large cluster of high-risk clone ST235 isolates. Both groups comprised several further clades with closely related isolates of the same sequence type as determined by multilocus sequence typing (MLST).

Next, we recorded antibiotic resistance profiles for all isolates regarding the four common anti-pseudomonas antimicrobials, tobramycin (TOB), ceftazidime (CAZ), ciprofloxacin (CIP), and meropenem (MEM) (Bassetti et al, 2018; Cardozo et al, 2019; Tümmler, 2019), using the agar dilution method.
Most isolates of our clinical isolate collection exhibit antibiotic resistance against these four antibiotics (Fig 1C, Dataset EV1). One-third had a multidrug-resistant (MDR) phenotype, defined as non-susceptible to at least three different classes of antibiotics (Magiorakos et al, 2012).

Machine learning for predicting antimicrobial resistance

We used the genomic and transcriptomic data of the clinical P. aeruginosa isolates to infer resistance and susceptibility phenotypes to ceftazidime, meropenem, ciprofloxacin, and tobramycin with machine learning classifiers. For each antibiotic, we included all respective isolates categorized as either "resistant" or "susceptible". For the genomic data, we included sequence variations (single nucleotide polymorphisms; SNPs, including small indels) and gene presence or absence (GPA) as features. In total, we analyzed 255,868 SNPs, represented by 65,817 groups of SNPs with identical distributions across isolates, and 76,493 gene families with presence or absence information, corresponding to 14,700 groups of identically distributed gene families. Of these gene families, 1,306 had an indel in some isolate genomes, which we included as an additional feature. We evaluated SNP and GPA groups in combination with gene expression information for 6,026 genes (Fig 2).

Figure 2. Training and validating a diagnostic classifier for antimicrobial susceptibility prediction for four different drugs based on genomic (GPA/SNPs) and transcriptomic profiles (EXPR)
The best data type combination was determined using 80% of the data in standard and phylogenetically informed cross-validation (cv) and further validated on the remaining 20% of the data.

For each drug, we randomly assigned isolates to a training set that comprised 80% of the resistant and susceptible isolates, respectively, and the remaining 20% to a test set.
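The per-class 80/20 assignment described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `stratified_split`, the fixed random seed, the rounding rule, and the toy labels are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split isolate indices into train/test sets separately per class.

    labels: list of class labels, e.g. "R" (resistant) or "S" (susceptible).
    Returns (train_indices, test_indices), each sorted.
    Hypothetical helper; the rounding rule is an assumption.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random assignment within each class
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Toy labels (illustrative only): 10 resistant and 5 susceptible isolates.
labels = ["R"] * 10 + ["S"] * 5
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each class keeps the resistant-to-susceptible ratio similar in the training and test sets, which matters when one class is much rarer than the other.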
Parameters of machine learning models were optimized on the training set and their value assessed in cross-validation, while the test set was used to obtain another independent performance estimate. Bacterial population structure can influence machine learning outcomes; for example, it has been shown in Escherichia coli that phylogroup-specific markers alone could be used to predict antibiotic resistance phenotypes with accuracies of 0.65–0.91, depending on the antibiotic (Moradigaravand et al, 2018). We therefore also assessed performance while accounting for population structure based on sequence types through a block cross-validation approach.

We trained several machine learning classification methods on SNPs, GPA, and expression features individually and in combination for predicting antibiotic susceptibility or resistance of isolates and evaluated the classifier performances. We determined MIC (minimal inhibitory concentration) values of all clinical isolates with agar dilution according to CLSI guidelines (CLSI, 2018) to use as the gold standard for evaluation purposes. We calculated the sensitivity and predictive value of resistance (R) and susceptibility (S) assignment, as well as the macro F1-score, as an overall performance measure based on a classifier trained on a specific data type combination. The sensitivity reflects how good that classifier is in recovering the assignments of the underlying gold standard, representing the fraction of truly susceptible, or resistant, samples that are assigned correctly. The predictive value reflects how trustworthy the assignments of this particular classifier are, representing the fraction of correct assignments among all susceptible or resistant assignments, respectively. The F1-score is the harmonic mean of the sensitivity and predictive value for a particular class, i.e., susceptible or resistant. The macro F1-score is the average over the two F1-scores.
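To make the four measures defined above concrete, the following Python sketch computes them from a list of gold-standard labels and a list of predictions. The function names `class_metrics` and `macro_f1` and the toy data are hypothetical and not taken from the study.

```python
def class_metrics(truth, predicted, positive):
    """Return (sensitivity, predictive value, F1) for one class.

    truth: gold-standard labels (here: from agar dilution), "R" or "S".
    predicted: classifier assignments in the same order.
    positive: the class being evaluated, "R" or "S".
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    # Fraction of true members of the class that were recovered.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    # Fraction of assignments to the class that were correct.
    predictive_value = tp / (tp + fp) if tp + fp else 0.0
    total = sensitivity + predictive_value
    # Harmonic mean of sensitivity and predictive value.
    f1 = 2 * sensitivity * predictive_value / total if total else 0.0
    return sensitivity, predictive_value, f1


def macro_f1(truth, predicted, classes=("R", "S")):
    """Average of the per-class F1-scores."""
    scores = [class_metrics(truth, predicted, c)[2] for c in classes]
    return sum(scores) / len(scores)


# Toy example (illustrative only): one resistant isolate is called susceptible.
truth = ["R", "R", "R", "R", "S", "S"]
predicted = ["R", "R", "R", "S", "S", "S"]
print(class_metrics(truth, predicted, "R"))
print(class_metrics(truth, predicted, "S"))
print(macro_f1(truth, predicted))
```

In this toy case the resistant class has sensitivity 0.75 and predictive value 1.0, while the susceptible class has sensitivity 1.0 and predictive value 2/3, which shows why both measures are reported per class.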
We used the support vector machine (SVM) classifier with a linear kernel, as in Weimann et al (2016), to predict susceptibility or resistance to four different antibiotics. Parameters were optimized in nested cross-validation, and performance estimates averaged over five repeats of this setup. The combined use of (i) GPA, (ii) SNPs, and (iii) information on gene expression resulted in high (0.8–0.9) or very high (> 0.9) sensitivity and predictive values (Fig 3). Notably, the relative contribution of the different information sources to the susceptibility and resistance sensitivity strongly depended on the antibiotic.

To assess the effect of the classification technique, we compared the performance of an SVM classifier with a linear kernel to that of random forests and logistic regression, which we and others have successfully used for related phenotype prediction problems (Asgari et al, 2018; Her & Wu, 2018; Wheeler et al, 2018). For this purpose, we used the data type combination with the best macro F1-score in resistance prediction with the SVM. We evaluated the classification performance in nested cross-validation and on a held-out test dataset. In addition, we performed a phylogeny-aware partitioning of our dataset to assess the phylogenetic generalization ability of our technique.

Figure 3. Evaluation of AMR classification with a support vector machine (R: resistant; S: susceptible) using different performance metrics and data types (EXPR: gene expression; GPA: gene presence or absence; and SNPs: single nucleotide polymorphisms) or combinations thereof
Each individual panel depicts the results for one of four different anti-pseudomonal antibiotics (CAZ, CIP, MEM, and TOB). The solid vertical line in the box plots represents the median, the box limits depict the 25th and 75th percentile, and the lower and upper hinges include values within ± 1.5 times the interquartile range. Values outside that range were plotted as solid dots.
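The phylogeny-aware partitioning mentioned in the text (block cross-validation based on sequence types) can be sketched in Python as follows. The text only states that sequence types define the blocks; the greedy fold-balancing rule, the function name `block_folds`, and the toy sequence-type labels below are assumptions for illustration, not the authors' procedure.

```python
from collections import defaultdict


def block_folds(sequence_types, n_folds=3):
    """Assign isolate indices to folds so that all isolates of one
    sequence type end up in the same fold (hypothetical helper).

    Balancing rule (an assumption): largest sequence types are placed
    first, each into the currently smallest fold.
    """
    groups = defaultdict(list)
    for index, st in enumerate(sequence_types):
        groups[st].append(index)

    folds = [[] for _ in range(n_folds)]
    for st in sorted(groups, key=lambda s: (-len(groups[s]), s)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(groups[st])
    return folds


# Toy labels (illustrative only), one sequence type per isolate.
sequence_types = ["ST235", "ST235", "ST235", "ST175", "ST175",
                  "ST111", "ST111", "ST253"]
folds = block_folds(sequence_types, n_folds=3)
for fold_number, fold in enumerate(folds):
    print(fold_number, [sequence_types[i] for i in fold])
```

Because closely related isolates never appear on both sides of a train/validation boundary, a classifier cannot score well merely by recognizing a clone it has already seen, which is the generalization question that the phylogeny-aware evaluation addresses.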
The performance of the SVM in random cross-validation was comparable to logistic regression (macro F1-score for the SVM: 0.83 ± 0.06 vs. logistic regression: 0.84 ± 0.06), but considerably better than the random forest classifiers (0.67 ± 0.14; Appendix Figs S1 and S2, Dataset EV2). The performance on the held-out dataset was in a comparable range (SVM: 0.87 ± 0.07; logistic regression: 0.90 ± 0.04; random forest: 0.71 ± 0.16). We furthermore observed similar macro F1-scores inferred in the phylogenetically selected cross-validation (SVM: 0.87 ± 0.07; logistic regression: 0.86 ± 0.07; random forest: 0.72 ± 0.13), which suggests only a minor influence of the bacterial phylogeny on the classification performance. The performance on the phylogenetically selected held-out dataset was again comparable, though performance for the random forest deteriorated in comparison with the cross-validation results (SVM: 0.86 ± 0.06; logistic regression: 0.83 ± 0.06; random forest: 0.56 ± 0.03).

Ciprofloxacin resistance and susceptibility based on SVMs could be correctly predicted with a sensitivity of 0.92 ± 0.01 and 0.87 ± 0.01, and with simultaneously high predictive values of 0.91 ± 0.01 and 0.90 ± 0.01, respectively, using solely SNP information. The sensitivity of 0.80 ± 0.04 and 0.79 ± 0.02 and predictive value of 0.73 ± 0.01 and 0.76 ± 0.02 to predict ciprofloxacin susceptibility and resistance based exclusively on gene expression data were also high. However, there was no added value of using information on gene expression in addition to SNP information for the prediction of susceptibility/resistance toward ciprofloxacin.

For the prediction of tobramycin susceptibility and resistance, the machine learning classifiers performed almost equally well when the three input data types (SNPs, GPA, and gene expression) were used individually (values > 0.8).
SNP information was predictive of tobramycin resistance; however, it did not further improve the classification performance when combined with the other data types. GPA information alone was the most important data type for classifying tobramycin resistance and susceptibility providing sensitivity values of 0.84 ± 0.01 and 0.95 ± 0.01 and predictive values of 0.88 ± 0.01 and 0.93 ± 0.01, respectively. The performance of GPA-based prediction increased further when gene expression values were included (P-value of a

Reference(s)