A Statistician’s journey in ‘omics wonderland*

Dr Kim-Anh Lê Cao from The University of Queensland Diamantina Institute, with her team and many other collaborators, presents this exit seminar at TRI.


The advent of high-throughput technologies has led to a wealth of biological data coming from different sources, the so-called ‘omics data. To understand biological mechanisms and uncover important biological insights, we need to adopt a holistic, systems biology approach to analysing these complex data.

Univariate statistical approaches consider each biological variable independently to explain or model biological conditions or phenotypes. To shift the univariate analysis paradigm, we have developed several multivariate methods to identify a subset of variables - a ‘molecular or microbial signature’. 

I will first summarize several multivariate methods we have developed at UQDI to statistically integrate several ‘omics data sets at once. Here I refer to data integration in a broad sense: either the same individuals are profiled using different ‘omics platforms, or independent studies of different individuals are generated under similar biological conditions using the same ‘omics platform. Both types of methods attempt to address data heterogeneity caused by inherent platform-specific artefacts or by systematic differences that arise when samples are assayed at different geographical sites or at different times.

In the second part of my talk, I will illustrate how those methods can be extended to microbiome data. From a statistical point of view, microbiome data add an extra layer of complexity in the analyses due to their inherent properties including sparse counts, compositional data and the need to model microbial communities as a whole. I will present some preliminary methods and results we obtained in several in-house and publicly available studies.

All our methods are available in our R package mixOmics (www.mixOmics.org), which has been keeping us busy since I set foot in Australia in 2009.

* no allusions to Alice’s adventures will be given during this talk

Speaker Profile - Dr. Kim-Anh Lê Cao

Group Leader, Computational Biostatistics, NHMRC Career Development Fellow

Dr Kim-Anh Lê Cao graduated with a PhD in Applied Statistics in 2008 from the Université de Toulouse, France. After completing her PhD, for which she was awarded the prestigious triennial Marie-Jeanne Laurent-Duhamel prize from the French Statistical Society, she started her postdoc at the Institute for Molecular Bioscience in Prof Geoff McLachlan's group. From 2009 to 2013 she worked as a Research Biostatistician at QFAB Bioinformatics, where she developed a multidisciplinary approach to her research. In 2014 Kim-Anh was recruited to the UQDI, and in 2015 she was awarded an NHMRC Career Development Fellowship. In 2016 she established a small Biostatistics Facility to provide statistical support to researchers at the UQDI.

Kim-Anh's research interests lie in multivariate statistical analysis, with a strong focus on the statistical analysis of high-throughput biological data, including longitudinal data, ‘omics data integration and identification of biomarkers in large-scale biological data. Her contribution to the field of Applied Statistics is to bridge the gap between molecular biologists, bioinformaticians, and biostatisticians through methodological developments and open access software. Kim-Anh has been nationally and internationally recognized as an expert in integrative multivariate methods for data integration and biomarker discovery and has made important contributions to research on cancer and immune diseases.