Quantifying the effect of experimental perturbations at single-cell resolution

Abstract

Current methods for comparing single-cell RNA sequencing datasets collected in multiple conditions focus on discrete regions of the transcriptional state space, such as clusters of cells. Here we quantify the effects of perturbations at the single-cell level using a continuous measure of the effect of a perturbation across the transcriptomic space. We describe this space as a manifold and use graph signal processing to develop a relative likelihood estimate of observing each cell in each of the experimental conditions. This likelihood estimate can be used to identify cell populations specifically affected by a perturbation. We also develop vertex frequency clustering to extract populations of affected cells at the level of granularity that matches the perturbation response. The accuracy of our algorithm at identifying clusters of cells that are enriched or depleted in each condition is, on average, 57% higher than that of the next-best-performing algorithm tested. Gene signatures derived from these clusters are more accurate than those of six alternative algorithms in ground truth comparisons.

Data availability

Gene expression counts matrices prepared in ref. 13 were accessed from NCBI GEO accession GSE92872. Gene expression counts matrices prepared in ref. 15 were downloaded from NCBI GEO accession GSE112294. The pancreatic islet datasets are available on NCBI GEO at accession GSE161465.

Code availability

Code for the MELD and vertex frequency clustering (VFC) algorithms, implemented in Python, is available as part of the MELD package on GitHub (https://github.com/KrishnaswamyLab/MELD) and on the Python Package Index. The GitHub repository also contains tutorials, code to reproduce the analysis of the zebrafish dataset and code associated with several of the quantitative comparisons.
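As a rough illustration of the relative likelihood idea described in the abstract, the sketch below smooths one indicator signal per condition over a cell similarity graph and then normalizes across conditions for each cell. It is a simplified stand-in written for this example and is not the MELD package's implementation: the function names (knn_graph, laplacian_smooth, relative_likelihood), the parameters k and beta, the use of plain Laplacian regularization solved by Jacobi iteration, and the toy data are all assumptions made for illustration.

```python
import math


def knn_graph(points, k):
    """Build a symmetric, unweighted k-nearest-neighbor graph as adjacency sets."""
    n = len(points)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            neighbors[i].add(j)
            neighbors[j].add(i)  # symmetrize the graph
    return neighbors


def laplacian_smooth(signal, neighbors, beta=1.0, n_iter=200):
    """Approximately solve (I + beta * L) x = signal with Jacobi iteration.

    L = D - A is the combinatorial graph Laplacian. The system is strictly
    diagonally dominant, so the iteration converges.
    """
    x = list(signal)
    for _ in range(n_iter):
        x = [
            (signal[i] + beta * sum(x[j] for j in neighbors[i]))
            / (1.0 + beta * len(neighbors[i]))
            for i in range(len(x))
        ]
    return x


def relative_likelihood(points, labels, k=3, beta=1.0):
    """Return, for each cell, a dict mapping condition -> relative likelihood."""
    neighbors = knn_graph(points, k)
    conditions = sorted(set(labels))
    densities = {}
    for cond in conditions:
        n_cond = labels.count(cond)
        # Indicator signal, scaled so each condition carries total mass 1.
        indicator = [1.0 / n_cond if lab == cond else 0.0 for lab in labels]
        densities[cond] = laplacian_smooth(indicator, neighbors, beta)
    result = []
    for i in range(len(points)):
        total = sum(densities[cond][i] for cond in conditions)
        result.append({cond: densities[cond][i] / total for cond in conditions})
    return result


# Hypothetical toy data: a control-only group, a mixed group, a treated-only group.
points = [(0.0,), (0.1,), (0.2,), (0.3,),
          (5.0,), (5.1,), (5.2,), (5.3,),
          (10.0,), (10.1,), (10.2,), (10.3,)]
labels = ["control"] * 4 + ["control", "treated", "control", "treated"] + ["treated"] * 4

likelihoods = relative_likelihood(points, labels, k=3, beta=1.0)
for point, lik in zip(points, likelihoods):
    print(point, round(lik["treated"], 3))
```

In this toy setting, cells in the control-only and treated-only groups receive relative likelihoods at the extremes, while cells in the mixed group fall in between, which is the kind of continuous, per-cell readout the abstract contrasts with cluster-level comparisons. Real analyses would use the MELD package itself rather than this sketch.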

References

  1. Luecken, M. D. & Theis, F. J. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol. Syst. Biol. 15, e8746 (2019).

  2. Weinreb, C., Wolock, S., Klein, A. M. & Berger, B. SPRING: a kinetic interface for visualizing high dimensional single-cell expression data. Bioinformatics 34, 1246–1248 (2018).

  3. Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat. Biotechnol. 37, 1482–1492 (2019).

  4. van Dijk, D. et al. Recovering gene interactions from single-cell data using data diffusion. Cell 174, 716–729 (2018).

  5. Shekhar, K. et al. Comprehensive classification of retinal bipolar neurons by single-cell transcriptomics. Cell 166, 1308–1323 (2016).

  6. Levine, J. H. et al. Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell 162, 184–197 (2015).

  7. Xu, C. & Su, Z. Identification of cell types from single-cell transcriptomes using a novel clustering method. Bioinformatics 31, 1974–1980 (2015).

  8. Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2019).

  9. Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014).

  10. Finak, G. et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278 (2015).

  11. Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196 (2016).

  12. Jaitin, D. A. et al. Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-seq. Cell 167, 1883–1896 (2016).

  13. Datlinger, P. et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat. Methods 14, 297–301 (2017).

  14. Gao, X., Hu, D., Gogol, M. & Li, H. ClusterMap: comparing analyses across multiple single cell RNA-seq profiles. Bioinformatics 35, 3038–3045 (2018).

  15. Wagner, D. E. et al. Single-cell mapping of gene expression landscapes and lineage in the zebrafish embryo. Science 360, 981–987 (2018).

  16. Farrell, J. A. et al. Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis. Science 360, eaar3131 (2018).

  17. Dann, E., Henderson, N. C., Teichmann, S. A., Morgan, M. D. & Marioni, J. C. Milo: differential abundance testing on single-cell data using k-NN graphs. Preprint at bioRxiv https://doi.org/10.1101/2020.11.23.393769 (2020).

  18. Büttner, M., Ostner, J., Müller, C., Theis, F. & Schubert, B. scCODA: a Bayesian model for compositional single-cell data analysis. Preprint at bioRxiv https://doi.org/10.1101/2020.12.14.422688 (2020).

  19. Moon, K. R. et al. Manifold learning-based methods for analyzing single-cell RNA-sequencing data. Curr. Opin. Syst. Biol. 7, 36–46 (2018).

  20. Shuman, D. I., Narang, S. K., Frossard, P., Ortega, A. & Vandergheynst, P. The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 30, 83–98 (2013).

  21. Botev, Z. I., Grotowski, J. F. & Kroese, D. P. Kernel density estimation via diffusion. Ann. Stat. 38, 2916–2957 (2010).

  22. Shuman, D. I., Vandergheynst, P. & Frossard, P. Chebyshev polynomial approximation for distributed signal processing. In: Distributed Computing in Sensor Systems and Workshops (DCOSS). 2011 International Conference on Distributed Computing in Sensor Systems, 1–8 (IEEE, 2011).

  23. Shuman, D. I., Ricaud, B. & Vandergheynst, P. Vertex-frequency analysis on graphs. Applied Comput. Harmon. Anal. 40, 260–291 (2016).

  24. Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).

  25. Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).

  26. DePasquale, E. A. K. et al. CellHarmony: cell-level matching and holistic comparison of single-cell transcriptomes. Nucleic Acids Res. 47, e138 (2019).

  27. Fischer, D. Theislab/diffxpy. Theis Lab https://github.com/theislab/diffxpy (2020).

  28. Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).

  29. Yen, S.-T. et al. Somatic mosaicism and allele complexity induced by CRISPR/Cas9 RNA injections in mouse zygotes. Dev. Biol. 393, 3–9 (2014).

  30. Hammerschmidt, M. et al. Dino and mercedes, two genes regulating dorsal development in the zebrafish embryo. Development 123, 95–102 (1996).

  31. Schulte-Merker, S., Lee, K. J., McMahon, A. P. & Hammerschmidt, M. The zebrafish organizer requires chordino. Nature 387, 862–863 (1997).

  32. Fisher, S. & Halpern, M. E. Patterning the zebrafish axial skeleton requires early chordin function. Nat. Genet. 23, 442–446 (1999).

  33. Ablamunits, V., Elias, D., Reshef, T. & Cohen, I. R. Islet T cells secreting IFN-γ in NOD mouse diabetes: arrest by p277 peptide treatment. J. Autoimmun. 11, 73–81 (1998).

  34. Lopes, M. et al. Temporal profiling of cytokine-induced genes in pancreatic β-cells by meta-analysis and network inference. Genomics 103, 264–275 (2014).

  35. Muraro, M. J. et al. A single-cell transcriptome atlas of the human pancreas. Cell Syst. 3, 385–394 (2016).

  36. Xin, Y. et al. Pseudotime ordering of single human β-cells reveals states of insulin production and unfolded protein response. Diabetes 67, 1783–1794 (2018).

  37. Farack, L. et al. Transcriptional heterogeneity of beta cells in the intact pancreas. Dev. Cell 48, 115–125 (2019).

  38. Ramana, C. V., Gil, M. P., Schreiber, R. D. & Stark, G. R. Stat1-dependent and -independent pathways in IFN-γ-dependent signaling. Trends Immunol. 23, 96–101 (2002).

  39. Sadler, A. J. & Williams, B. R. G. Interferon-inducible antiviral effectors. Nat. Rev. Immunol. 8, 559–568 (2008).

  40. Fitzgerald, K. A. The interferon inducible gene: viperin. J. Interferon Cytokine Res. 31, 131–135 (2011).

  41. Zheng, Z., Wang, L. & Pan, J. Interferon-stimulated gene 20-kDa protein (ISG20) in infection and disease: review and outlook. Intractable Rare Dis. Res. 6, 35–40 (2017).

  42. Hultcrantz, M. et al. Interferons induce an antiviral state in human pancreatic islet cells. Virology 367, 92–101 (2007).

  43. Stewart, A. F. et al. Human β-cell proliferation and intracellular signaling: part 3. Diabetes 64, 1872–1885 (2015).

  44. Chen, X. et al. MLL-AF9 initiates transformation from fast-proliferating myeloid progenitors. Nat. Commun. 10, 5767 (2019).

  45. Dutrow, E. V. et al. The human accelerated region HACNS1 modifies developmental gene expression in humanized mice. Preprint at https://www.biorxiv.org/content/10.1101/2019.12.11.873075v1 (2019).

  46. Savell, K. E. et al. A dopamine-induced gene expression signature regulates neuronal function and cocaine response. Sci. Adv. 6, eaba4221 (2020).

  47. Chung, K. M. et al. Endocrine–exocrine signaling drives obesity-associated pancreatic ductal adenocarcinoma. Cell 181, 832–847 (2020).

  48. Ravindra, N. G. et al. Single-cell longitudinal analysis of SARS-CoV-2 infection in human airway epithelium. Preprint at https://www.biorxiv.org/content/10.1101/2020.05.06.081695v2 (2020).

  49. Haghverdi, L., Lun, A. T. L., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).

  50. Coifman, R. R. & Lafon, S. Diffusion maps. Applied Comput. Harmon. Anal. 21, 5–30 (2006).

  51. Mack, Y. P. & Rosenblatt, M. Multivariate k-nearest neighbor density estimates. J. Multivar. Anal. 9, 1–15 (1979).

  52. Biau, G., Chazal, F., Cohen-Steiner, D., Devroye, L. & Rodríguez, C. A weighted k-nearest neighbor density estimate for geometric inference. Electron. J. Stat. 5, 204–237 (2011).

  53. Kung, Y.-H., Lin, P.-S. & Kao, C.-H. An optimal k-nearest neighbor for density estimation. Stat. Probabil. Lett. 82, 1786–1791 (2012).

  54. Von Luxburg, U. & Alamgir, M. Density estimation from unweighted k-nearest neighbor graphs: a roadmap. In: Burges, C. J. C., Bottou, L., Welling, M., Ghahramani, Z. & Weinberger, K. Q. (eds.) Advances in Neural Information Processing Systems 26, 225–233 (Curran Associates, 2013).

  55. Silverman, B. W. Density Estimation for Statistics and Data Analysis (Routledge, 2018).

  56. Hammond, D. K., Vandergheynst, P. & Gribonval, R. Wavelets on graphs via spectral graph theory. Applied Comput. Harmon. Anal. 30, 129–150 (2011).

  57. Perraudin, N., Ricaud, B., Shuman, D. & Vandergheynst, P. Global and local uncertainty principles for signals on graphs. APSIPA Trans. Signal Inform. Process. 7, E3 (2018); https://doi.org/10.1017/ATSIP.2018.2

  58. Mallat, S. A Wavelet Tour of Signal Processing: The Sparse Way (Academic Press, 2008).

  59. Zhou, D. & Schölkopf, B. A regularization framework for learning from graph data. In: ICML Workshop on Statistical Relational Learning and Its Connections to Other Fields 15, 67–68 (2004).

  60. Ham, J., Lee, D. D. & Saul, L. K. Semisupervised alignment of manifolds. Proc. Annu. Conf. Uncertainty in Artificial Intelligence (eds Ghahramani, Z. & Cowell, R.) (AUAI Press, 2005).

  61. Belkin, M., Matveeva, I. & Niyogi, P. Regularization and semi-supervised learning on large graphs. In: International Conference on Computational Learning Theory, 624–638 (Springer, 2004).

  62. Ando, R. K. & Zhang, T. Learning on graph with Laplacian regularization. In: Schölkopf, B., Platt, J. C. & Hoffman, T. (eds.) Advances in Neural Information Processing Systems 19, 25–32 (MIT Press, 2007).

  63. Weinberger, K. Q., Sha, F., Zhu, Q. & Saul, L. K. Graph Laplacian regularization for large-scale semidefinite programming. In: Schölkopf, B., Platt, J. C. & Hoffman, T. (eds.) Advances in Neural Information Processing Systems 19, 1489–1496 (MIT Press, 2007).

  64. He, X., Ji, M., Zhang, C. & Bao, H. A variance minimization criterion to feature selection using Laplacian regularization. IEEE Trans. Pattern Anal. Mach. Intell. 33, 2013–2025 (2011).

  65. Liu, X., Zhai, D., Zhao, D., Zhai, G. & Gao, W. Progressive image denoising through hybrid graph Laplacian regularization: a unified framework. IEEE Trans. Image Process. 23, 1491–1503 (2014).

  66. Pang, J., Cheung, G., Ortega, A. & Au, O. C. Optimal graph Laplacian regularization for natural image denoising. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2294–2298 (IEEE, 2015).

  67. Pang, J. & Cheung, G. Graph Laplacian regularization for image denoising: analysis in the continuous domain. IEEE Trans. Image Process. 26, 1770–1785 (2017).

  68. Perraudin, N. et al. GSPBOX: a toolbox for signal processing on graphs. Preprint at https://arxiv.org/abs/1408.5781 (2016).

  69. Barron, M. & Li, J. Identifying and removing the cell-cycle effect from single-cell RNA-sequencing data. Sci. Rep. 6, 33892 (2016).

  70. Belkin, M. & Niyogi, P. Convergence of Laplacian eigenmaps. In: Schölkopf, B., Platt, J. C. & Hoffman, T. (eds.) Advances in Neural Information Processing Systems 19, 129–136 (MIT Press, 2006).

  71. Coifman, R. R. & Maggioni, M. Diffusion wavelets. Applied Comput. Harmon. Anal. 21, 53–94 (2006).

  72. Chaudhuri, P. & Marron, J. S. Scale space view of curve estimation. Ann. Stat. 28, 408–428 (2000).

  73. Perraudin, N., Holighaus, N., Søndergaard, P. L. & Balazs, P. Designing Gabor windows using convex optimization. Appl. Math. Comput. 330, 266–287 (2018).

  74. Ng, A. Y., Jordan, M. I. & Weiss, Y. On spectral clustering: analysis and an algorithm. In: Advances in Neural Information Processing Systems 849–856 (NIPS, 2001).


Acknowledgements

The authors thank C. Vejnar, R. Coifman, J. Noonan, V. Tornini and C. Kontur for fruitful discussions. We also thank G. Wang of the Yale Center for Genome Analysis for help in preparing the pancreatic islet data. This research was supported, in part, by the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health (NIH) (award no. F31HD097958) (to D.B.); the Gruber Foundation (to S.G.); IVADO Professor startup and operational funds, IVADO Fundamental Research Project grant PRF-2019-3583139727 (to G.W.); NIH grants R01GM135929 and R01GM130847 (to G.W. and S.K.); and Chan-Zuckerberg Initiative grants 182702 and CZF2019-002440 (to S.K.). The content provided here is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.

Author information

Author notes

  1. These authors contributed equally: Daniel B. Burkhardt, Jay S. Stanley III.

  2. These authors jointly supervised this work: Guy Wolf, Antonio J. Giraldez, David van Dijk, Smita Krishnaswamy.

Affiliations

  1. Department of Genetics, Yale University, New Haven, CT, USA

    Daniel B. Burkhardt, Antonio J. Giraldez & Smita Krishnaswamy

  2. Computational Biology & Bioinformatics Program, Yale University, New Haven, CT, USA

    Jay S. Stanley III & Scott A. Gigante

  3. Department of Computer Science, Yale University, New Haven, CT, USA

    Alexander Tong & Smita Krishnaswamy

  4. Department of Immunobiology, Yale University, New Haven, CT, USA

    Ana Luisa Perdigoto & Kevan C. Herold

  5. Department of Internal Medicine (Cardiology), Yale University, New Haven, CT, USA

    David van Dijk

  6. Department of Mathematics and Statistics, Université de Montréal, Montreal, QC, Canada

    Guy Wolf

  7. Mila – Quebec AI Institute, Montreal, QC, Canada

    Guy Wolf

Contributions

D.B.B., S.K., G.W., D.v.D. and A.J.G. envisioned the project. D.B.B., J.S., A.T., S.K. and G.W. developed the mathematical formulation of the problem and related numerical analysis. D.B.B., J.S. and S.G. implemented the code. D.B.B. and S.K. performed the analysis of biological and simulated data. A.L.P. and K.C.H. generated and assisted with the analysis of the pancreatic islet dataset. A.J.G. assisted with the analysis of the zebrafish data and related writing. D.B.B., J.S., A.T., S.K. and G.W. wrote the paper. S.G. assisted with the writing.

Corresponding authors

Correspondence to David van Dijk or Smita Krishnaswamy.

Ethics declarations

Competing interests

The authors declare the following competing interest: S.K. is a paid scientific advisor to AI Therapeutics.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Burkhardt, D.B., Stanley, J.S., Tong, A. et al. Quantifying the effect of experimental perturbations at single-cell resolution. Nat Biotechnol (2021). https://doi.org/10.1038/s41587-020-00803-5
