
Hélène Borges

Statistical methods for the treatment of longitudinal data in quantitative proteomics

Published on 19 October 2021
Thesis presented October 19, 2021

The goal of proteomics is to identify and quantify the proteins present in biological samples. One of its applications is the search for new biomarkers, i.e., measurable entities that precisely describe a specific biological state. These biomarkers can then be used in a clinical context for the diagnosis or medical monitoring of patients suffering from pathologies, in particular chronic ones, thereby assisting clinicians in their care and treatment. Biomarker discovery involves the differential analysis of proteins, in other words, demonstrating through statistical analysis that protein expression differs between samples. However, the analysis of large clinical cohorts requires numerous specialized instruments that produce complex data. These data are difficult to process because of technical biases and inter-patient variability, and inadequate processing can lead to erroneous results. To tackle this challenge while maintaining a high level of automation (essential for the daily work of an analytical platform managing multiple concurrent projects), methodological developments and their software implementation are necessary. This work seeks to meet this need through three main contributions. The first is the development of the Well Plate Maker software, which assists in the design of more robust experimental protocols. The software automatically generates a well plate filling strategy that minimizes potential biases in the experiment and consequently allows more reproducible downstream statistical results. The second is a reliable and reproducible adaptation of Analysis of Variance (a classic statistical approach) to account for the specificities of proteomics data. This adaptation is combined with methods for representing and visualizing protein expression profiles, while preserving ease of use for proteomics practitioners working on an analysis platform.
The third contribution is the concrete application of the above methodology to a clinical cohort of patients with non-alcoholic fatty liver disease. We have identified proteins whose expression profiles describe the progression of the disease and which may be worth exploring in further clinical studies. Beyond the case of non-alcoholic fatty liver disease, this work illustrates the value of proteomics as a reliable complementary tool in the clinical context of patient monitoring and care.
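
As an illustration of the classic Analysis of Variance mentioned above, the following minimal sketch computes the one-way ANOVA F statistic for a single protein measured across several groups (for example, disease stages). It shows only the textbook method, not the adaptation developed in the thesis. The function name `one_way_anova_f` and the intensity values are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the textbook one-way ANOVA F statistic.

    `groups` is a list of lists; each inner list holds the measurements
    (e.g. log-intensities of one protein) for one experimental group.
    """
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need at least 2 non-empty groups and more observations than groups")

    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variability; F is undefined")

    ms_between = ss_between / (k - 1)    # mean square, k - 1 degrees of freedom
    ms_within = ss_within / (n - k)      # mean square, n - k degrees of freedom
    return ms_between / ms_within


# Hypothetical log-intensities of one protein in three groups of patients.
example_groups = [[20.1, 20.4, 19.8], [21.0, 21.3, 20.9], [22.5, 22.1, 22.8]]
print(one_way_anova_f(example_groups))
```

A large F value indicates that the differences between group means are large relative to the variability within groups. In practice this statistic would be computed for every quantified protein and the resulting p-values corrected for multiple testing.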

Data analysis, quantitative proteomics, bioinformatics

On-line thesis.