Exploring the metabolic effects of different genotoxic compounds from transcriptomics data: using gene selection to unravel significant effects
Abstract
Toxic agents are a major risk to human and animal health and to biosystems, particularly those that damage the genome, known as genotoxic substances. Understanding the modes of action of these genotoxic compounds, particularly at the level of metabolic pathways, is needed in order to characterize them. As part of the ANR GENOSHIFT project, a multi-omics study is being carried out to identify transcriptomic and metabolic biomarkers of exposure following chronic, low-dose exposure of human liver cells to 12 reference genotoxic compounds.
Human liver cells (HepaRP line [1]) were treated with each genotoxic compound for five days. From the same cells, RNA was extracted for transcriptomic analysis (Agilent SurePrint G3 Human GE V3 8x60K microarray), and cell pellets were extracted for metabolomic analysis by nuclear magnetic resonance (NMR, 600 MHz). These analyses generated transcriptomic data (64 samples, 31,972 variables) and metabolomic data (64 samples, 875 variables). To identify biomarkers of the genotoxic effects of these compounds, multivariate statistical analyses were performed independently on each omics dataset.
To identify genes and metabolites that are correlated and modulated by chronic exposure of cells to these genotoxic compounds, statistical integration of the two data blocks is necessary. Omics data contain a large number of variables, some of which may be noisy, irrelevant or redundant; such variables can reduce the overall accuracy of a prediction model or even make it non-significant. Compared with the metabolomic block, the transcriptomic block contains a large number of genes, not all of which are informative. Moreover, this large number of genes can give the transcriptomic block more weight than the metabolomic block during integration, even though it does not necessarily carry more information. To overcome these problems, a variable selection step was applied to the transcriptomic block before statistical integration.
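The weighting problem can be made concrete with a small sketch. It assumes, which the abstract does not state, that every variable is autoscaled (centred and scaled to unit variance) before integration. Under that assumption each variable contributes a variance of 1, so a block's total variance (its inertia) equals its number of variables. The helper names `autoscale`, `block_inertia`, `simulated_inertia` and `weight_ratio` are hypothetical; the block sizes are those reported above.

```python
import random
import statistics


def autoscale(column):
    """Centre a variable and scale it to unit sample variance."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(v - mean) / sd for v in column]


def block_inertia(block):
    """Total variance of a block: the sum of its variables' sample variances."""
    return sum(statistics.variance(col) for col in block)


def simulated_inertia(n_samples, n_vars, seed=0):
    """Inertia of a random autoscaled block whose raw variables have very different spreads."""
    rng = random.Random(seed)
    block = []
    for _ in range(n_vars):
        sigma = rng.uniform(0.1, 10.0)          # arbitrary raw spread for this variable
        raw = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
        block.append(autoscale(raw))
    return block_inertia(block)


def weight_ratio(n_vars_a, n_vars_b):
    """Relative weight of block A versus block B when both are autoscaled."""
    return n_vars_a / n_vars_b


if __name__ == "__main__":
    # After autoscaling, inertia equals the number of variables, whatever the raw scales.
    print(round(simulated_inertia(64, 50), 6))
    # Full transcriptomic block versus metabolomic block.
    print(round(weight_ratio(31_972, 875), 2))
    # After keeping only the 57 genes retained for AFB1.
    print(round(weight_ratio(57, 875), 3))
```

With all 31,972 genes the transcriptomic block outweighs the 875 metabolomic variables by a factor of about 36.5, whereas a selected subset no longer dominates. A frequently used alternative, not the approach taken here, is to divide each block by the square root of its number of variables so that all blocks have equal inertia without discarding any variable.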
Two selection methods were tested to reduce the dimension of the transcriptomic data. The first, Affinity Propagation clustering (AP clustering) [2], is an unsupervised clustering method that removes redundancy between variables. The second, RELIEFF [3], is a supervised method that was combined with AP clustering to select variables that are both discriminant and non-redundant.
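The abstract does not specify how RELIEFF and AP clustering were chained, so the NumPy sketch below assumes one plausible order. Genes are ranked by RELIEFF weight, and the `n_top` best are kept. Those genes are then clustered by affinity propagation on their Pearson correlations, with the preference set to the median similarity (the default suggested in [2]). Finally, the highest-weighted gene of each cluster is kept if its weight is positive. The functions `relieff`, `affinity_propagation` and `select_genes`, the parameter `n_top` and the pruning rule are all hypothetical. Filtering before clustering also has a practical motivation: AP on all 31,972 genes would need a 31,972 × 31,972 similarity matrix.

```python
import numpy as np


def relieff(X, y, k=10):
    """RELIEFF feature weights, using every sample once as a probe.
    A high weight means the gene differs between classes more than within them."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0                       # constant genes: avoid 0/0
    Xn = (X - lo) / span                        # diff = |x1 - x2| / (max - min)
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    W = np.zeros(p)
    for i in range(n):
        D = np.abs(Xn - Xn[i])                  # per-gene diffs to every sample
        dist = D.sum(axis=1)                    # Manhattan distance to sample i
        for c in classes:
            members = np.flatnonzero(y == c)
            members = members[members != i]     # never use the probe itself
            if members.size == 0:
                continue
            nearest = members[np.argsort(dist[members])[:k]]
            contrib = D[nearest].mean(axis=0)   # mean diff over the k nearest neighbours
            if c == y[i]:
                W -= contrib / n                # near hits lower the weight
            else:                               # near misses raise it, prior-weighted
                W += prior[c] / (1.0 - prior[y[i]]) * contrib / n
    return W


def affinity_propagation(S, damping=0.7, max_iter=500, conv_iter=20, seed=0):
    """Affinity propagation on a precomputed similarity matrix whose diagonal
    holds the preferences. Returns (exemplar indices, cluster labels)."""
    S = np.array(S, dtype=float)
    n = S.shape[0]
    S += 1e-12 * np.random.default_rng(seed).standard_normal((n, n))  # break exact ties
    idx = np.arange(n)
    R, A = np.zeros((n, n)), np.zeros((n, n))
    last, stable = None, 0
    for _ in range(max_iter):
        # Responsibility r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        max1 = AS[idx, best]
        AS[idx, best] = -np.inf
        max2 = AS.max(axis=1)
        R_new = S - max1[:, None]
        R_new[idx, best] = S[idx, best] - max2
        R = damping * R + (1 - damping) * R_new
        # Availability a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)));
        # self-availability a(k,k) = sum_{i' != k} max(0, r(i',k)), not clipped
        Rp = np.maximum(R, 0)
        Rp[idx, idx] = R[idx, idx]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_av = A_new[idx, idx].copy()
        A_new = np.minimum(A_new, 0)
        A_new[idx, idx] = self_av
        A = damping * A + (1 - damping) * A_new
        # Stop once the exemplar set has been unchanged for conv_iter iterations
        is_ex = np.diag(A + R) > 0
        stable = stable + 1 if last is not None and np.array_equal(is_ex, last) else 0
        last = is_ex
        if stable >= conv_iter and is_ex.any():
            break
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:                     # degenerate case: keep the best candidate
        exemplars = np.array([np.argmax(np.diag(A + R))])
    labels = np.argmax(S[:, exemplars], axis=1)  # attach each point to its most similar exemplar
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


def select_genes(X, y, n_top=500, k=10):
    """Hypothetical RELIEFF -> AP pipeline returning one gene index per cluster."""
    X = np.asarray(X, dtype=float)
    w = relieff(X, y, k=k)
    top = np.argsort(w)[::-1][:n_top]            # most discriminant genes first
    S = np.corrcoef(X[:, top], rowvar=False)     # gene-gene Pearson similarity
    np.fill_diagonal(S, np.median(S[~np.eye(top.size, dtype=bool)]))  # median preference
    _, labels = affinity_propagation(S)
    keep = []
    for c in np.unique(labels):
        members = top[labels == c]
        rep = members[np.argmax(w[members])]     # best-ranked gene of the cluster
        if w[rep] > 0:                           # drop clusters without class signal
            keep.append(int(rep))
    return sorted(keep, key=lambda g: -w[g])
```

In this design, redundant genes (highly correlated with one another) fall into the same AP cluster and only their most discriminant member survives, so the output is both discriminant and non-redundant. On real data, scikit-learn's `AffinityPropagation` with `affinity="precomputed"` could replace the hand-written message-passing loop.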
These methods were applied to the transcriptomic data from cells exposed to aflatoxin B1 (AFB1), for which 5,473 genes were differentially expressed, and to cisplatin (CIS), for which no genes were differentially expressed. RELIEFF combined with AP clustering selected discriminant, non-redundant genes for both compounds: with 57 genes for AFB1 and 34 genes for CIS, the control and treated groups were separated. AP clustering alone also selected non-redundant genes (at least 16 genes for AFB1 and 34 genes for CIS) and thus reduced the data dimension.
These results show that RELIEFF combined with AP clustering can select discriminant and non-redundant genes, particularly for compounds with a weak genotoxic effect (e.g., CIS). The unsupervised method (AP clustering) reduced the data dimension, retaining anywhere from a few thousand variables down to around twenty.
References
1. Brun C., Allain C., Ferron PJ., Younoussa H., Colicchio B., Jeandidier E., M’Kacher R., Guguen-Guillouzo C., Bertile F. Extended lifespan and improved genome stability in HepaRG-derived cell lines through reprogramming by high-density stress. Proc Natl Acad Sci. 2023 Sep 5;120(36):e2219298120.
2. Frey B.J., Dueck D. Clustering by passing messages between data points. Science. 2007 Feb 16;315(5814):972–6. doi:10.1126/science.1136800.
3. Kononenko I., Šimec E., Robnik-Šikonja M. Overcoming the myopia of inductive learning algorithms with RELIEFF. Applied Intelligence. 1997;7(1):39–55.
Main file: Poster_Congres_JOBIM_Patricia_Kembia_Kalombo.pdf (836.56 KB)
Origin: Files produced by the author(s)