Computational pipeline to characterize time-dependent heterogeneity in single-cell data
Single-cell RNA sequencing (scRNA-seq) measures gene expression at the level of individual cells and makes it possible to study cellular variation within complex tissues. However, analyzing these large datasets to identify and interpret distinct cell subtypes remains challenging. The goal of this project is to design and implement a computational prototype that identifies transcriptionally distinct cell populations from scRNA-seq data, using rod photoreceptors as the test case. Rod photoreceptors are the primary cells responsible for low-light vision in the retina. Rods are typically treated as a single population, and transcriptionally defined rod subtypes have not been well established. Identifying potential rod subtypes therefore requires unsupervised clustering methods that group cells based on gene expression patterns without predefined labels, followed by additional analysis to interpret whether the resulting clusters represent meaningful cellular populations. To address this, we developed an analytical workflow that processes scRNA-seq data through several stages. The pipeline begins with a quality control step that removes low-quality cells to ensure reliable input data. Rod photoreceptor cells are then computationally isolated from the dataset and analyzed using clustering algorithms to identify potential rod subtypes. In addition, the dataset is examined across different time points to visualize how cells from each time point relate to the identified clusters. This analysis helps assess whether subtype-related gene expression patterns are consistent or vary across time. Cluster quality is evaluated using metrics such as silhouette scores and ROGUE scores to assess how well the clusters are separated and whether each cluster forms a homogeneous cell group. To capture complex transcriptional structure in the data, dimensionality reduction and autoencoding techniques are applied to identify latent relationships in gene expression between cells.
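The quality control, clustering, and silhouette evaluation stages described above can be illustrated with a minimal sketch. It is not the project's implementation: the synthetic count matrix, the `min_genes` threshold, the library-size normalization, and the use of PCA with k-means from scikit-learn are all assumptions chosen for illustration, and the function names are hypothetical. ROGUE scoring and autoencoder embeddings are not shown.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score


def simulate_counts(n_cells_per_group=150, n_genes=200, seed=0):
    """Synthetic counts (assumption): two groups that differ in 40 genes,
    plus 30 low-quality cells with very few detected genes."""
    rng = np.random.default_rng(seed)
    base = rng.gamma(shape=5.0, scale=2.0, size=n_genes)
    shifted = base.copy()
    shifted[:40] *= 5.0
    group_a = rng.poisson(base, size=(n_cells_per_group, n_genes))
    group_b = rng.poisson(shifted, size=(n_cells_per_group, n_genes))
    low_quality = rng.poisson(base * 0.005, size=(30, n_genes))
    return np.vstack([group_a, group_b, low_quality])


def quality_filter(counts, min_genes=50):
    """Keep cells with at least `min_genes` detected genes (illustrative threshold)."""
    genes_detected = (counts > 0).sum(axis=1)
    return counts[genes_detected >= min_genes]


def normalize(counts, target_sum=1e4):
    """Scale each cell to the same total count, then log-transform."""
    totals = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / totals * target_sum)


def cluster_and_score(expr, k_values=(2, 3, 4), n_components=10, seed=0):
    """Reduce with PCA, run k-means for each k, and return silhouette scores."""
    embedding = PCA(n_components=n_components, random_state=seed).fit_transform(expr)
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embedding)
        scores[k] = silhouette_score(embedding, labels)
    return scores


if __name__ == "__main__":
    counts = quality_filter(simulate_counts())
    scores = cluster_and_score(normalize(counts))
    for k, score in scores.items():
        print(f"k={k}: silhouette={score:.3f}")
```

Comparing silhouette scores across several values of k is one simple way to choose a cluster count. On real data the scores are usually much closer together than on this synthetic example, which is one reason a complementary purity metric is useful.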
Differential gene expression analysis is then performed to identify differentially expressed genes (DEGs) that distinguish clusters. These DEGs are passed to Ingenuity Pathway Analysis (IPA) for downstream pathway analysis, which helps interpret the biological meaning of the unsupervised clusters. The implemented prototype integrates quality control, rod cell isolation, clustering, temporal analysis, and pathway interpretation into a modular computational pipeline. Although this work focuses on rod photoreceptors, the resulting framework for identifying cell subtypes from scRNA-seq data can be applied to other cell types and biological systems where transcriptionally distinct cell populations may exist. One possible application is the study of inherited retinal diseases, where identifying rod subtypes may help researchers investigate early molecular changes that occur before widespread retinal degeneration and may support the development of targeted therapeutic approaches.
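The differential expression step described above can be sketched as a one-versus-rest comparison per gene. The abstract does not state which statistical test the pipeline uses, so the Wilcoxon rank-sum test (via `scipy.stats.mannwhitneyu`) and Benjamini-Hochberg correction below are assumptions, and `rank_degs` and `benjamini_hochberg` are hypothetical helper names. Pathway analysis in IPA is a separate step and is not shown.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    order = np.argsort(pvals)
    ranked = pvals[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


def rank_degs(expr, labels, cluster, alpha=0.05):
    """Compare one cluster against all other cells, gene by gene.

    expr   : cells x genes expression matrix
    labels : cluster label per cell
    Returns (significant gene indices sorted by adjusted p-value,
             adjusted p-values for all genes,
             mean expression difference for all genes).
    """
    expr = np.asarray(expr, dtype=float)
    in_cluster = np.asarray(labels) == cluster
    inside, outside = expr[in_cluster], expr[~in_cluster]
    pvals = np.array([
        mannwhitneyu(inside[:, g], outside[:, g], alternative="two-sided").pvalue
        for g in range(expr.shape[1])
    ])
    padj = benjamini_hochberg(pvals)
    mean_diff = inside.mean(axis=0) - outside.mean(axis=0)
    order = np.argsort(padj, kind="stable")
    significant = order[padj[order] < alpha]
    return significant, padj, mean_diff


if __name__ == "__main__":
    # Synthetic example (assumption): genes 0-4 are higher in cluster 0.
    rng = np.random.default_rng(1)
    means = np.full(50, 5.0)
    shifted = means.copy()
    shifted[:5] = 20.0
    expr = np.vstack([
        rng.poisson(shifted, size=(100, 50)),
        rng.poisson(means, size=(100, 50)),
    ])
    labels = np.array([0] * 100 + [1] * 100)
    genes, padj, mean_diff = rank_degs(expr, labels, cluster=0)
    print("Top genes:", genes[:5])
```

A gene list of this kind, together with an effect size and adjusted p-value per gene, is the typical input to a pathway analysis tool.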