We tested the null hypothesis that a randomly expressed gene with the same distribution of expression values has a higher gene connectivity score.
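A test of this kind can be illustrated with a permutation scheme. The sketch below is only a minimal illustration under stated assumptions, not the method used in this work. The definition of the gene connectivity score is not given in this section, so `connectivity_score` is a hypothetical stand-in (the sum, over edges of a cell-similarity graph, of the product of the expression values at the two endpoints). `permutation_p_value` and all parameter names are likewise illustrative. Shuffling the expression values across nodes keeps their distribution fixed, which matches the null hypothesis of a randomly expressed gene with the same distribution of values.

```python
import random


def connectivity_score(values, edges):
    """Hypothetical connectivity score (an assumption, not the paper's formula).

    values: expression value of one gene at each node of the graph.
    edges:  list of (i, j) index pairs joining similar nodes.
    The score is large when high expression sits on adjacent nodes.
    """
    return float(sum(values[i] * values[j] for i, j in edges))


def permutation_p_value(values, edges, n_perm=999, seed=0):
    """Estimate P(score of a randomly placed gene >= observed score).

    Each permutation reassigns the same expression values to random nodes,
    so the distribution of values is preserved under the null.
    """
    rng = random.Random(seed)
    observed = connectivity_score(values, edges)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if connectivity_score(shuffled, edges) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy graph: 20 nodes joined in a chain.
    edges = [(i, i + 1) for i in range(19)]
    # Gene expressed in 5 adjacent nodes (localised on the graph).
    clustered = [1.0] * 5 + [0.0] * 15
    # Gene expressed in 5 nodes that are never adjacent.
    scattered = [1.0 if i % 4 == 0 else 0.0 for i in range(20)]
    print(permutation_p_value(clustered, edges))
    print(permutation_p_value(scattered, edges))
```

In this toy setting the localised gene receives a small p-value, whereas the scattered gene, whose observed score is zero, cannot be distinguished from a random placement.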

[...] cord is a well-characterized example of cellular lineage commitment and terminal cellular differentiation1. In response to spatiotemporally regulated morphogen gradients generated in the neural tube, neural precursor cells differentiate by activating a cascade of specific transcriptional programs1. A detailed understanding of this process has been hindered by the inability to isolate and purify sufficient quantities of synchronized cellular subpopulations from the developing murine spinal cord. Although existing approaches have been used to study both the mechanisms of motor neuron differentiation2 and motor neuron disease3, 4, a limitation of these approaches is the differential exposure of embryoid bodies (EBs) to inductive ligands and uncharacterized paracrine signaling within EBs, which lead to the generation of heterogeneous populations of differentiated cell types5. Motor neuron disease mechanisms are currently studied in a heterogeneous background of cell types whose contributions to pathogenesis are unknown. Methods to analyse the transcriptome of individual differentiating motor neurons could provide fundamental insights into the molecular basis of neurogenesis and motor neuron disease mechanisms. Single-cell RNA-sequencing carried out over time enables the dissection of transcriptional programs during cellular differentiation of individual cells, thereby capturing heterogeneous cellular responses to developmental induction. Several algorithms for the analysis of single-cell RNA-sequencing data from developmental processes have been published, including Diffusion Pseudotime6, Wishbone7, SLICER8, Destiny9, Monocle10, and SCUBA11 (Supplementary Table 1).
All of these methods can be used to order cells according to their expression profiles, and they enable the identification of lineage branching events. However, Destiny9 lacks an unsupervised framework for determining the transcriptional events that are statistically associated with each stage of the differentiation process, and the statistical framework of Diffusion Pseudotime, Wishbone, Monocle, and SCUBA is biased, for example by assuming a differentiation process with exactly one branch event6, 7 or a tree-like structure10, 11. Although these methods can reveal the lineage structure when the biological process fits their assumptions, an unsupervised method would be expected to have the advantage of extracting more complex relationships, for example the presence of multiple independent lineages, convergent lineages, or the coupling of cell cycle to lineage commitment. Moreover, apart from SCUBA, these methods do not exploit the temporal information available in longitudinal single-cell RNA-sequencing experiments, and they require the user to explicitly specify the least differentiated state6-10. We present an unbiased, unsupervised, statistically robust mathematical approach to single-cell RNA-sequencing data analysis that addresses these limitations. Topological data analysis (TDA) is a mathematical approach used to study the continuous structure of high-dimensional data sets. TDA has been used to study viral re-assortment12, human recombination13, 14, cancer15, and other complex genetic diseases16. scTDA is applied to study time-dependent gene expression using longitudinal single-cell RNA-seq data. Our scTDA method is a statistical framework for the detection of transient cellular populations and their transcriptional repertoires, and does not assume a tree-like structure for the expression space or a specific number of branching points.
scTDA can be used to assess the significance of topological features of the expression space, such as loops or holes. In addition, it exploits temporal experimental information when available, inferring the least differentiated state from the data. Here we apply scTDA to analyse the transcriptional programs that regulate developmental decisions as mESCs transition from pluripotency to fully differentiated motor neurons and concomitant cell types.

Results

Overview of scTDA

Single-cell gene expression can be represented as a sparse high-dimensional point cloud, with the number of dimensions equivalent to the number of expressed genes (10,000). Extracting biological information from such data requires a reduction in the dimensionality of the space. Widely used algorithms, such as multidimensional scaling (MDS), independent component analysis (ICA), and t-distributed stochastic neighbor embedding (t-SNE), have been adapted to flow cytometry, mass spectrometry11, 17, and single-cell RNA-seq measurements10, 18. These strategies, however, are not designed to preserve continuous associations in high dimensions. Figure 1a represents a simple example of a one-dimensional continuous manifold (a circle) twisted in three dimensions. Reduction to two dimensions using these algorithms introduces artifacts in the low-dimensional representation, including artifactual intersections (in MDS and ICA) and tearing apart of the original continuous structure (in t-SNE). As cell differentiation is a continuous asynchronous process, we reasoned.