MicroRNAs (miRNAs) are small non-coding RNAs that play a crucial role in the post-transcriptional regulation of messenger RNA (mRNA) expression. They influence a wide range of biological, physiological and pathological processes. MiRNAs are attracting growing interest for their therapeutic potential, particularly in oncology. The main challenges for their clinical application are to better understand their modes of action, to identify their targets accurately and to assess their phenotypic impact. Although numerous experimental and bioinformatics methods have been developed to identify or predict these targets, the poor agreement between their predictions shows that target identification still requires significant advances.
Since 2018, five research groups have developed experimental techniques for co-sequencing miRNAs and mRNAs at the single-cell level. The data generated are a valuable resource for the study of miRNAs. More accurate than conventional bulk sequencing, these single-cell approaches enable a more precise analysis of the effects of miRNAs on their predicted targets.
The first part of this thesis is devoted to analysing the correlations between the expression of miRNAs and that of their predicted targets in these co-sequencing datasets. As a starting point, we took the brief bioinformatics analysis presented in the article by Wang et al. (Nat. Comm., 2019), which is associated with a co-sequencing dataset of 19 cells from the K562 cell line. Our in-depth analysis of this dataset corrected several statistical biases and showed that only a small proportion of miRNAs are significantly anti-correlated with their targets. In addition, our study highlighted a trade-off between analysing a small number of high-confidence predicted targets and including a larger number of targets. These findings were validated using two other co-sequencing datasets (Xiao et al., Genome Biol., 2018 and Li et al., Sci. Rep., 2025).
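As an illustration of the kind of correlation analysis described above, the following minimal sketch computes a Spearman correlation between one miRNA and each of its predicted targets across cells, with a one-sided permutation p-value for anti-correlation. It is not the procedure used in the thesis: the expression values, the target names (GENE_A, GENE_B) and the choice of a permutation test are assumptions made for the example, and multiple-testing correction is omitted.

```python
import random
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def anticorrelation_test(mirna, target, n_perm=1000, seed=0):
    """Return (rho, p): p is a one-sided permutation p-value for rho <= observed."""
    rng = random.Random(seed)
    observed = spearman(mirna, target)
    shuffled = list(target)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if spearman(mirna, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy expression values across 8 cells (hypothetical, for illustration only).
mirna_expr = [1, 2, 3, 4, 5, 6, 7, 8]
predicted_targets = {
    "GENE_A": [8, 7, 6, 5, 4, 3, 2, 1],  # decreases as the miRNA increases
    "GENE_B": [1, 2, 3, 4, 5, 6, 7, 8],  # increases with the miRNA
}

results = {
    gene: anticorrelation_test(mirna_expr, expr)
    for gene, expr in predicted_targets.items()
}
for gene, (rho, p) in results.items():
    print(gene, round(rho, 3), round(p, 4))
```

With real data, each miRNA would be tested against all of its predicted targets, and the resulting p-values would need to be corrected for multiple testing before miRNAs are called significantly anti-correlated.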
The second part of this thesis applies correlation analyses to co-sequencing data in order to compare the performance of different miRNA target prediction algorithms and databases of experimentally validated targets. These recent datasets are an original resource for such comparisons, since they have never been used to develop a target prediction algorithm. The tools were evaluated for their ability to predict targets that are significantly anti-correlated with miRNAs, in human cell lines and in primary mouse lung cells. Seven of the most popular algorithms were compared: DIANA-microT, miRDB, mirDIP, miRmap, miRWalk, RNA22 and TargetScan, together with the miRTarBase and TarBase databases. The analyses revealed differences in the performance of certain tools between human and murine miRNAs. In addition, this study highlights the value of restricting predicted targets to those that have been validated experimentally, and the benefits of using a consensus built from different algorithms.
This work highlights the value of generating miRNA-mRNA co-sequencing data at the single-cell level to better understand the impact of miRNAs on the expression of their targets. However, the analysis of these complex data must be carried out in a systematic and rigorous manner. In addition, these pioneering datasets can be used to compare target prediction tools and thus provide clear guidance for their use.
Thesis supervisors:
Laurent GUYON
Nadia CHERRADI