Article Open access Peer reviewed

Predicting antigen specificity of single T cells based on TCR CDR3 regions

2020; Springer Nature; Volume: 16; Issue: 8; Language: English

10.15252/msb.20199416

ISSN

1744-4292

Authors

David S. Fischer, Yihan Wu, Benjamin Schubert, Fabian J. Theis

Topic(s)

Immune Cell Function and Interaction

Summary

Article, 11 August 2020. Open Access. Transparent process.

David S Fischer (orcid.org/0000-0002-1293-7656) 1,2, Yihan Wu (orcid.org/0000-0003-2718-8704) 1, Benjamin Schubert (orcid.org/0000-0003-3412-1102) 1,3 and Fabian J Theis (orcid.org/0000-0002-2419-1943) *,1,2,3

1 Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
2 TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
3 Department of Mathematics, Technical University of Munich, Garching bei München, Germany
* Corresponding author. Tel: +49 89 3187 43260; E-mail: [email protected]

Molecular Systems Biology (2020) 16: e9416. https://doi.org/10.15252/msb.20199416

Abstract

It has recently become possible to simultaneously assay T-cell specificity with respect to large sets of antigens and the T-cell receptor sequence in high-throughput single-cell experiments. Leveraging this new type of data, we propose and benchmark a collection of deep learning architectures to model T-cell specificity in single cells. In agreement with previous results, we found that models that treat antigens as categorical outcome variables outperform those that model the TCR and antigen sequence jointly. Moreover, we show that variability in single-cell immune repertoire screens can be mitigated by modeling cell-specific covariates.
Lastly, we demonstrate that the number of bound pMHC complexes can be predicted in a continuous fashion, providing a gateway to disentangle cell-to-dextramer binding strength and receptor-to-pMHC affinity. We provide these models in the Python package TcellMatch to allow imputation of antigen specificities in single-cell RNA-seq studies on T cells without the need for MHC staining.

Synopsis

TcellMatch is a deep-learning based algorithm that predicts the antigen specificity of single T cells based on multimodal single-cell experiments that measure pMHC binding and T-cell receptor sequences among other properties.

pMHC measurements are predicted in a large single-cell data set with > 100,000 cells, additionally using TCR-antigen pairs from IEDB and VDJdb.
Benchmarking categorical models of antigens with antigen-embedding models indicates that categorical models are often preferable.
The study highlights the need to measure TCR specificity for a larger repertoire of antigens to generalize models to unseen antigens.

Introduction

Antigen recognition is one of the key factors of T cell-mediated immunity. T cells interact via a dimeric surface protein, the T-cell receptor (TCR), with an antigen presented on a major histocompatibility complex (MHC) located on the surface of antigen-presenting cells. This presenting cell can be experimentally modeled via an MHC multimer with an immobilized antigen (pMHC). The T cells of an individual organism cover a wide range of antigen specificities. This variability in specificity stems mostly from plasticity of three complementarity-determining region (CDR) loops (CDR1-3) of both TCR ɑ- and β-chains. The hypervariable loops CDR3ɑ and CDR3β are most commonly aligned with the presented epitope (Singh et al, 2017) and are hypothesized to be the main driver of T-cell specificity (Glanville et al, 2017).
However, specificity-determining influences of the other CDR loops (Cole et al, 2009; Madura et al, 2013; Stadinski et al, 2014) and distal regions (Harris et al, 2016a,b) have also been demonstrated. The ability to accurately predict T-cell activation upon antigen recognition based on antigen and TCR sequences would have transformative effects on many research fields from infectious disease, autoimmunity, and vaccine design to cancer immunology, but has been thwarted by a lack of training data and adequate models. In the absence of sufficiently large experimental data, most studies focused on molecular analysis of individual co-crystallized TCR–pMHC complexes and molecular dynamics simulations with limited success (Flower et al, 2010). Only recently, through concerted data collection efforts (Borrman et al, 2017; Shugay et al, 2018; Vita et al, 2019) and newly emerging high-throughput technologies that allow the sequencing of the TCR while probing the T-cell specificity (Klinger et al, 2015; Bentzen et al, 2016), have large enough data sets become available to begin modeling the TCR–pMHC interaction through machine-learning methods (Zvyagin et al, 2020). Current methods to predict the likelihood of binding of TCRs to specific antigens use linear position-specific scoring matrices (Glanville et al, 2017), Gaussian processes (preprint: Jokinen et al, 2019), or random forests (Gielis et al, 2018). A second set of methods attempts to directly model the TCR–pMHC interaction with neural networks in order to generalize across unseen TCR–antigen pairs (preprint: Jurtz et al, 2018). We expand on these efforts but also consider the current limitation in the number of available antigens in training data sets. Secondly, we consider the inclusion of complex sets of cell-specific covariates into the prediction problem. 
The inclusion of cell-specific covariates has previously been shown to work in the example of transcriptome-derived clusters as covariates (preprint: Jokinen et al, 2019). Here, we leverage the data modalities in the new droplet-based single-cell experiments.

In this study, we exploit a newly developed single-cell technology that enables the simultaneous sequencing of the paired TCR ɑ- and β-chains and the determination of T-cell specificity via bound peptide-loaded MHC (pMHC) complexes. This technology allows the routine collection of binding TCR and antigen complexes of the size of entire curated databases in a single study (Bagaev et al, 2019; 10x Genomics, 2019) and accordingly holds great potential to transform the field of T-cell receptor specificity prediction. We propose and train multiple deep learning architectures that model the TCR–pMHC interaction. The models account for the variability found in single-cell data through cell-specific covariates. We show that models that include both ɑ- and β-chain have a predictive advantage over models that only include the β-chain, while models fit on only a single chain still perform well. We further find that T-cell specificity imputation in a single-cell sample from a known donor is possible, enabling assessment of the presence of disease-specific T cells, while generalization across unknown TCR–pMHC pairs is still not possible. Lastly, we anticipate a large number of single-cell studies involving T cells to exploit TCR specificity as an additional phenotypic readout. To facilitate the usage of our predictive algorithms, we built the Python package TcellMatch, which hosts a pre-trained model zoo for analysts to impute pMHC-derived antigen specificities and allows the transfer and re-training of models on new data sets.
Results

A joint deep learning model for alpha- and beta-chains, antigens, and covariates for single-cell TCR profiling experiments

We set out to predict the antigen specificity of single T cells based on TCR ɑ- and β-chain sequences and other cellular covariates, such as donor identity and cell surface protein counts. We used a publicly available single-cell data set (10x Genomics, 2019) based on a technology in which cells are captured in droplets in a microfluidics system so that antigen specificity, the CDR3 TCR sequences, surface protein abundance, and mRNA abundance can be assayed for each captured cell (Fig 1A, Methods and Protocols). Antigen specificity was quantified via the count of unique molecular identifiers associated with antigen-specific dextramer (pMHC complex) barcode sequences (10x Genomics, 2019). Additionally, to validate our results, we used the databases IEDB (Vita et al, 2019) and VDJdb (Shugay et al, 2018), which harbor additional pairs of binding TCR and antigen sequences from traditional low-throughput screenings and crystal structures. The prediction of antigen specificity was previously attempted on smaller data sets, but the new single-cell technology enables the collection of data sets that are orders of magnitude larger than what was previously available from curation efforts that integrated studies from the entire field of TCR specificity (Shugay et al, 2018; Vita et al, 2019). These large single-cell data sets may, however, be susceptible to greater noise than results derived from studies that are either conducted in bulk or validated separately. We chose deep learning models for the prediction task as these are well suited to cope with large noisy data sets. We included interpretable linear models and a previously proposed non-linear reference model (NetTCR; preprint: Jurtz et al, 2018) as baseline methods.
The convolutional and linear models used here are structurally similar to models that relate antigen specificity to clusters of TCR sequences but are continuously differentiable and therefore easier to extend to new specificity groups.

Figure 1. Deep learning models predict binding of T-cell receptors (TCR) to peptide MHC complexes (pMHC) from defined antigen panels

Distributions shown as boxplots are across threefold cross-validation. AUC ROC test: area under the receiver operating characteristic curve on the test set for the binary binding event prediction task. The top panel in (C), (F), (G) is a zoom into an informative region of the y-axis. counts: total mRNA counts, nc: negative-control pMHC counts, surface: surface protein counts.

A. Concept of multimodal single-cell immune profiling experiment with RNA-seq, surface protein quantification, bound pMHC quantification, and TCR reconstruction.
B. Categorical TcellMatch model: A feed-forward neural network to predict a vector of antigen specificities of a T cell based on the CDR3 sequences of the TCR ɑ- and β-chains. Gray boxes: layers of the neural network.
C. Covariates improve sequence-based binding accuracy prediction. Shown are bidirectional GRU models fit on both ɑ- and β-chains (CONCAT). none: no cell-specific covariates; donor: one-hot encoded donor identity; donor + counts: one-hot encoded donor identity and total mRNA counts per cell; nc: negative-control pMHC count vector; nc + donor + counts: negative-control pMHC count vector, one-hot encoded donor identity and total mRNA counts per cell; nc + donor + counts + surface: negative-control pMHC count vector, one-hot encoded donor identity, total mRNA counts per cell and surface protein count vector (n = 4 cross-validations for models none and nc, “leave-one donor out”, and n = 3 cross-validations for all other models).
D. Overlap of correctly and incorrectly classified test set observations from best-performing model to models with reduced covariate sets. Models without donor covariates were not included. full: nc + donor + counts + surface model from (C), red: model shown on x-axis tick (n = 3 cross-validations for all models).
E. Antigen-wise prediction performance by covariates setting. In contrast to panel (C), the prediction performance is not aggregated across the entire test set but evaluated separately on the observations belonging to each antigen. Shown are bidirectional GRU models fit on both ɑ- and β-chains (CONCAT) (n = 4 cross-validations for models without donor covariate, “leave-one donor out”, and n = 3 cross-validations for all other models).
F. Antigen-binding prediction is improved by the inclusion of TCR CDR3 sequences. BIGRU: bidirectional GRU model, NOSEQ: model without TCR sequence embedding. Models without donor covariates were not included (n = 4 cross-validations for models none and nc, “leave-one donor out”, and n = 3 cross-validations for all other models).
G. Antigen-binding prediction based on TCR CDR3 sequences is improved by modeling ɑ- and β-chains. BIGRU: bidirectional GRU model, SA: self-attention model, CONV: convolution model, LINEAR: linear model, CONCAT: models fit on the CDR3 sequences of both TCR ɑ- and β-chains, TRA, TRB: models fit on the CDR3 sequence of either the TCR ɑ- or the β-chain (n = 3 cross-validations for all models).

Data information: All boxplots: the center of each boxplot is the sample median; the whiskers extend from the upper (lower) hinge to the largest (smallest) data point no further than 1.5 times the interquartile range from the upper (lower) hinge. In (C, F, G), the underlying data points are shown as swarm plots color-coded in the same way as the boxplot.

The prediction of antigen specificity from TCR sequences and numeric cellular covariates is a mixed input data-type problem.
The deep characterization of the single cells via modalities such as mRNA or surface protein abundance in the context of specificity assessment makes such mixed input data-type models much more relevant to single-cell data than they were previously to less well-characterized pairs of binding TCRs and antigens that were curated from literature. We approached this problem by combining a network tailored to numerical data with a network tailored to sequence-structured data to yield a single prediction (Fig 1B). Machine learning on sequence data is a field of ongoing research and different layer types have been shown to be effective for different tasks. Accordingly, we implemented all major sequence data-specific layer types to be able to perform a comprehensive comparison of deep learning architectures for the task of predicting TCR specificity. This comprehensive comparison is to the best of our knowledge the first of its kind. Specifically, we implemented recurrent layers (bidirectional GRUs; Schuster & Paliwal, 1997; Cho et al, 2014) and bidirectional LSTMs (Hochreiter & Schmidhuber, 1997; Schuster & Paliwal, 1997), convolutional layers (Szegedy et al, 2015), self-attention layers (Vaswani et al, 2017), and densely connected networks, which include linear models that relate to previous work (Glanville et al, 2017). All of these sequence data embedding layer types require an initial representation of the elements of the sequence: an initial encoding of the amino acids. We compared categorical, substitution frequency derived (BLOSUM), and learned embeddings and found that the initial amino acid embedding does not have a strong effect on the results (Appendix Fig S1). The novel learned embedding that we propose here is more parameter efficient as it can expose a lower-dimensional amino acid space to the sequence-embedding layers than the standard embedding layers do (Methods and Protocols). 
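To make the idea of an initial amino acid encoding followed by a position-wise learned embedding concrete, the following is a minimal NumPy sketch and not the TcellMatch implementation. It assumes a one-hot initial encoding (rows of a BLOSUM matrix could be substituted), a hypothetical padded length of 40, a hypothetical embedding dimension of 5, and random weights in place of trained ones. The CDR3 string is an illustrative placeholder. A 1×1 convolution applies the same linear map at every sequence position, which is why it reduces to a single matrix product here.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_cdr3(seq, max_len):
    """Encode a CDR3 amino acid sequence as a (max_len, 20) one-hot matrix.

    Positions beyond the end of the sequence stay all-zero (padding).
    """
    if len(seq) > max_len:
        raise ValueError("sequence longer than max_len")
    x = np.zeros((max_len, len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x


def pointwise_embedding(x, weights):
    """Apply a 1x1 convolution: the same linear map at every position.

    x: (max_len, in_dim) initial encoding; weights: (in_dim, out_dim).
    """
    return x @ weights


rng = np.random.default_rng(0)
# Assumption: project the 20-dimensional codes into a 5-dimensional space.
# In a trained model these weights would be learned, not drawn at random.
weights = rng.normal(size=(20, 5)).astype(np.float32)

encoded = one_hot_cdr3("CASSLGQAYEQYF", max_len=40)  # placeholder sequence
embedded = pointwise_embedding(encoded, weights)
print(encoded.shape, embedded.shape)
```

Because the map is shared across positions, the number of embedding parameters depends only on the input and output dimensions and not on the sequence length, which is one way to read the parameter-efficiency argument made above.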
In the following, we only show model fits based on these learned 1×1 convolutional embeddings based on BLOSUM50 (Methods and Protocols).

We considered the binding event prediction task within a panel of antigens as a single- or multi-task prediction problem with antigen species as categorical output variables (“categorical antigen model”, Figs 1 and 2, Methods and Protocols). Secondly, we considered binding event prediction on arbitrary antigens as a distinct scenario that requires the model to embed the input antigen sequence (“antigen-embedding model”, Fig 3, Methods and Protocols). The categorical antigen model predicts a probability distribution across possible binding events, including a negative (no binding) event. The antigen-embedding model is based on the concept of positive and negative sets. In the single-cell data, a negative set naturally arises from cells that did not bind to any or a given pMHC species. The positive set is naturally defined as the observed binding pairs. We generated the negative set for TCR–antigen-binding pairs from IEDB or VDJdb by shuffling TCR and antigen assignments in silico (preprint: Jurtz et al, 2018).

Figure 2. The binding strength of T cells to pMHC complexes can be modeled based on single-cell data

A. Sequence-encoding layer types outperform linear models on pMHC count prediction if donor and size factors are given as covariates. BIGRU: bidirectional GRU model, SA: self-attention model, CONV: convolution model, LINEAR: linear model, CONCAT: models fit on the CDR3 sequences of both the TCR ɑ- and β-chains, TRA, TRB: models fit on the CDR3 sequence of either the TCR ɑ- or the β-chain (n = 3 cross-validations for all models).
B. Performance of bidirectional GRU models that predict pMHC counts directly is best if covariates and both TCR chains are modeled. test R² (log): test R² on log-transformed test data. none: no cell-specific covariates; donor: one-hot encoded donor identity; donor + counts: one-hot encoded donor identity and total mRNA counts per cell; nc: negative-control pMHC count vector; nc + donor + counts: negative-control pMHC count vector, one-hot encoded donor identity and total mRNA counts per cell; nc + donor + counts + surface: negative-control pMHC count vector, one-hot encoded donor identity, total mRNA counts per cell and surface protein count vector (n = 4 cross-validations for models without donor covariate, “leave-one donor out”, and n = 3 cross-validations for all other models).
C. Multi-task models outperform separate single-task models on pMHC count prediction by antigen. multi: multi-task model, single: single-task model (n = 3 cross-validations for all models).

Data information: All boxplots: the center of each boxplot is the sample median; the whiskers extend from the upper (lower) hinge to the largest (smallest) data point no further than 1.5 times the interquartile range from the upper (lower) hinge. The underlying data points are shown as swarm plots color-coded in the same way as the boxplot.

Figure 3. Models tailored to generalize to unseen antigens are outperformed by categorical antigen models on seen antigens

Distributions shown as boxplots are across threefold cross-validation.

A. The databases IEDB and VDJdb contain pairs of TCRs and antigens that were found to be specific to each other and are curated from many different studies. A supervised model that predicts binding events can be trained on such data but also requires the assembly of a set of negative observations (Methods and Protocols).
B. Antigen-embedding TcellMatch model: A feed-forward neural network to predict a binding event based on TCR CDR3 sequences and antigen peptide sequence. Gray boxes: layers of the neural network.
C. Different sequence-encoding layer types perform similarly well on binding prediction based on TRB-CDR3 and antigen sequence. CONCAT: models in which TRB CDR3 sequence and antigen sequence are concatenated, SEPARATE: models in which TRB CDR3 sequence and antigen sequence are embedded by separate sequence-encoding layer stacks. BILSTM: bidirectional LSTM model, BIGRU: bidirectional GRU model, SA: self-attention model, CONV: convolution model, INCEPTION: inception-type model, NETTCR: NetTCR model (preprint: Jurtz et al, 2018), LINEAR: linear model (n = 3 cross-validations for all models).
D, E. Antigen-wise categorical models outperform models that are built to generalize across antigens on high-frequency antigens in IEDB (D) and on overlapping antigens between IEDB and single-cell data (E). In both cases, the models were trained on IEDB and tested on held-out observations from IEDB (D) or on the single-cell data (E). embedding: models that embed the antigen sequence and can be run on any antigen (Fig 3B), categorical: antigen-wise categorical models that do not have the antigen sequence as a feature (Fig 1B) (n = 3 cross-validations for all models).

Data information: All boxplots: the center of each boxplot is the sample median; the whiskers extend from the upper (lower) hinge to the largest (smallest) data point no further than 1.5 times the interquartile range from the upper (lower) hinge. The underlying data points are shown as swarm plots color-coded in the same way as the boxplot.

Assembling meaningful training and test sets across databases

We subset the data sets to allow a meaningful model comparison and predictivity evaluation: The single-cell data set contained more than 150,000 cells from four donors with successfully reconstructed TCR sequences and with measured binding specificity to 44 distinct pMHC complexes.
The authors of this data set defined binding events by comparing the target pMHC counts to the counts of negative-control pMHCs. pMHCs were defined as negative-control pMHCs if they were not expected to specifically bind any TCR in the screen (10x Genomics, 2019). We assembled antigen specificity labels based on the same binding classification scheme. We removed putative cellular doublets from the data set (Methods and Protocols, Appendix Fig S2): A doublet of two cells of distinct specificities in a microfluidics setup may result in the TCR sequence of the first cell and the pMHC binding read-outs from the second cell being misreported as a third, non-existent, specificity pair. To avoid such non-existent specificity pairs, we chose a conservative doublet exclusion threshold (Methods and Protocols). We only considered the eight antigens in the pMHC CD8+ T-cell data set that had at least 100 unique, non-doublet clonotype observations to remove effects from strong class imbalance (Appendix Fig S3A and B). The total data set size was 91,495 unique, non-doublet observations (cells) across the four donors.

We only assembled pairs of binding TCR CDR3 β-chain and antigen sequences from IEDB and VDJdb as these databases contain far fewer ɑ-chain than β-chain sequences and do not contain an equivalent of the cellular covariates found in the single-cell data. We only considered observations from the most commonly assayed HLA type HLA-A*02:01. We assembled a data set of 12,414 observations from 10,726 clonotypes and 71 antigens from IEDB and 3,964 observations from 2,812 clonotypes and 40 antigens from VDJdb, which contained at most 10 TCR sequences per clonotype. The number of TCR clonotypes per antigen was very heterogeneous, with the most frequently encountered antigen covering 4,812 clonotypes in IEDB, and 1,461 in VDJdb. We provide a detailed descriptive analysis of all data sets in Dataset EV3.
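Database-derived pairs only supply positive observations, so the negative set for the antigen-embedding models is obtained by shuffling TCR and antigen assignments in silico. A minimal sketch of one way to do this is given below. The function name, the identifiers, and the number of negatives drawn are illustrative assumptions, and the sketch treats every unobserved pairing as non-binding, which need not hold for real TCRs.

```python
import random


def shuffled_negatives(pairs, n_negatives, seed=0):
    """Draw (tcr, antigen) pairings that do not occur among the positives."""
    rng = random.Random(seed)
    positives = set(pairs)
    tcrs = sorted({tcr for tcr, _ in pairs})
    antigens = sorted({antigen for _, antigen in pairs})

    # Every positive lies inside the TCR x antigen grid, so the number of
    # unobserved pairings is the grid size minus the number of positives.
    max_possible = len(tcrs) * len(antigens) - len(positives)
    if n_negatives > max_possible:
        raise ValueError("not enough unobserved pairings")

    negatives = set()
    while len(negatives) < n_negatives:
        candidate = (rng.choice(tcrs), rng.choice(antigens))
        if candidate not in positives:
            negatives.add(candidate)
    return sorted(negatives)


# Placeholder identifiers standing in for CDR3 and peptide sequences.
pairs = [("tcr1", "agA"), ("tcr2", "agA"), ("tcr3", "agB"), ("tcr4", "agC")]
negatives = shuffled_negatives(pairs, n_negatives=4)
print(negatives)
```

Sampling from the full grid rather than permuting one column keeps the draw simple, but it also means that frequent antigens are not over-represented among the negatives the way they are among the positives; whether that is desirable depends on the evaluation setting.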
TCR and specificity variation of the single-cell data are also described in detail elsewhere (10x Genomics, 2019). To avoid an over-optimistic estimation of model performance, we clustered the T cells into clonotypes and separated the single-cell data into train, test, and validation sets with regard to their assigned clonotypes so that each clonotype only existed in one of the splits (Methods and Protocols). We down-sampled clonotypes to a maximum of 10 observations.

Cell-specific covariates improve binding event prediction

Single-cell T-cell specificity screens feature multiple effects that confound the binding event and its observation. Here, we compared the performance of categorical antigen models with various sets of covariates to quantify the relevance of covariates for predictive models. Firstly, one would expect the donor identity to affect the TCR sequence if donors vary in their HLA genotype. We compared models with and without a one-hot encoded donor identity covariate to establish the impact of these donor-to-donor differences. We found that the performance of models without donor information varies strongly and is much worse than the performance of models with donor covariates. The mean area under the receiver operating characteristic curve (AUC ROC, Methods and Protocols) was 0.33 for bidirectional GRU models (the best-performing sequence-based models) without covariates and 0.81 for those with donor covariates (Fig 1C). The identification of binding events based on single-cell RNA-seq libraries is liable to false negatives due to a low capture rate of RNAs. In the single-cell screen, negative-control pMHCs were included to provide a background distribution of non-specific binding events and were part of the definition of discrete binding events (Methods and Protocols). The discrete labels are therefore already corrected for false-positive binding events.
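The clonotype-wise separation into train, validation, and test sets described above, together with the cap of 10 observations per clonotype, can be sketched as follows. This is an illustrative sketch rather than the code used in the study: the split fractions, the function name, and the toy cell and clonotype identifiers are assumptions.

```python
import random
from collections import defaultdict


def split_by_clonotype(cells, fractions=(0.8, 0.1, 0.1),
                       max_per_clonotype=10, seed=0):
    """Assign whole clonotypes to splits, then cap observations per clonotype.

    cells: list of (cell_id, clonotype_id) tuples.
    Returns a dict mapping split name to a list of cell ids.
    """
    rng = random.Random(seed)
    by_clonotype = defaultdict(list)
    for cell_id, clonotype in cells:
        by_clonotype[clonotype].append(cell_id)

    # Shuffle clonotypes (not cells) so that no clonotype spans two splits.
    clonotypes = sorted(by_clonotype)
    rng.shuffle(clonotypes)
    n_train = int(fractions[0] * len(clonotypes))
    n_val = int(fractions[1] * len(clonotypes))
    assignment = {
        "train": clonotypes[:n_train],
        "validation": clonotypes[n_train:n_train + n_val],
        "test": clonotypes[n_train + n_val:],
    }

    splits = {}
    for name, members in assignment.items():
        selected = []
        for clonotype in members:
            observations = by_clonotype[clonotype]
            if len(observations) > max_per_clonotype:
                # Down-sample large clonotypes to the cap.
                observations = rng.sample(observations, max_per_clonotype)
            selected.extend(observations)
        splits[name] = selected
    return splits


# Toy data: 20 clonotypes with 15 cells each (hypothetical identifiers).
cells = [(f"cell{i}", f"clone{i % 20}") for i in range(300)]
splits = split_by_clonotype(cells)
print({name: len(ids) for name, ids in splits.items()})
```

Splitting at the level of clonotypes rather than cells prevents near-identical TCR sequences from appearing in both training and test data, which is the source of the over-optimistic performance estimates the text warns about.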
We investigated whether normalization factors and negative-control pMHC counts are useful predictors of a false-negative binding event that cannot be rescued by background signal correction: A donor covariate-only model (“donor”) was not outperformed either by a model that also included a scaled total mRNA count covariate (“donor + counts”) or by one that additionally contained negative-control count covariates (“nc + donor + counts”) (Materials and Methods, Fig 1C). We conclude that such false-negative observations are either rare or cannot be captured by the correction proposed here. We also identified a predictive advantage of models that account for the cell state encoded by surface protein counts: bidirectional GRUs that accounted for donor, negative-control pMHC counts, and total counts improved from 0.83 AUC ROC to 0.86 if cell surface protein counts were added as a covariate (Fig 1C, Welch's t-test, P < 0.01). The surface protein counts can be used to embed cells based on their membrane surface structure in a latent space which can be used by the model to account for the abundance of TCRs and other binding-relevant proteins on the cell surface. The overall top-performing model accounted for donor, total counts, negative-control counts, and surface protein counts with an AUC ROC of 0.87 (Fig 1C). We validated that growing the set of covariates modeled led to models that had additional (rather than different) correct predictions. The best-performing model with the highest number of covariates predicted almost all observations correctly that were also predicted correctly by models with fewer covariates (Fig 1D). The test sets are not balanced across the different classes: We found similar trends across covariate settings on each individual class as we found globally (Fig 1E).
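The covariate sets compared above can be viewed as blocks of one numeric input vector per cell. The sketch below shows one way to assemble the largest set (donor, total counts, negative-control counts, surface protein counts). It is an illustration under stated assumptions: the log1p transform stands in for the unspecified scaling of count covariates, and the donor names, vector lengths, and count values are invented for the example.

```python
import numpy as np


def build_covariates(donor, donors, total_counts, nc_counts, surface_counts):
    """Concatenate cell-specific covariates into one numeric vector.

    Layout: one-hot donor | total mRNA counts | negative-control pMHC
    counts | surface protein counts. Counts are log1p-transformed here,
    which is an assumed scaling, not one taken from the study.
    """
    donor_one_hot = np.zeros(len(donors), dtype=np.float32)
    donor_one_hot[donors.index(donor)] = 1.0
    return np.concatenate([
        donor_one_hot,
        np.array([np.log1p(total_counts)], dtype=np.float32),
        np.log1p(np.asarray(nc_counts, dtype=np.float32)),
        np.log1p(np.asarray(surface_counts, dtype=np.float32)),
    ])


# Four donors as in the single-cell data set; all other values are toy values.
donors = ["donor1", "donor2", "donor3", "donor4"]
covariates = build_covariates(
    donor="donor2",
    donors=donors,
    total_counts=4200,
    nc_counts=[0, 1, 0],
    surface_counts=[35, 120, 8, 0, 64],
)
print(covariates.shape)
```

Reduced settings such as "donor" or "donor + counts" correspond to keeping only the leading blocks of this vector, so the same sequence-encoding network can be paired with any of the covariate sets.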
We validated that sequence information is indeed a relevant predictor in each of these covariate scenarios, indicating that the combination of sequence and non-sequence covariates is desirable (Fig 1F). Lastly, we investigated whether models that were fit with cell-specific covariates generalize to observations that do not contain these covariates. For this purpose, we applied models presented in this section to TCR sequences from matched and unmatched antigens from IEDB (Vita et al, 2019) and VDJdb (Shugay et al, 2018), setting the covariate input vector to zero. The best-performing linear predictors had true-positive rates above 0.55 while maintaining false-positive rates below 0.1 (Appendix Fig S4), suggesting that these models can generalize to settings in which not all covariates are observed.

Co-modeling alpha- and beta-chains improves binding event prediction

We compared the predictivity of models fit using one TCR CDR3 chain (“TRA only”, or “TRB only”) with models fit on both TRB and TRA chains (“TRA + TRB”, Materials and Methods) to evaluate the additional information inherent in the use of both chains. We found that TRA + TRB models were slightly better than TRA-only and TRB-only models across most layer types if basic single-cell covariates were included in the prediction. The top-performing TRA + TRB was 0.01 AUC ROC bet

Reference(s)