This project aims to design a methodology for augmenting small datasets by exploiting the available domain knowledge. Specifically, we consider the analysis of transcriptomics data, which plays a crucial role in the development of personalized medicine. Applying machine learning methods to such data is hindered by the small size of the available datasets. However, a rich domain knowledge base is available: the Gene Ontology (GO). We propose to learn an admissibility score for an expression profile, using the existing dataset and a graph neural network whose architecture reproduces the directed acyclic graph (DAG) structure of GO. A variational auto-encoder will then be trained and biased to generate samples with a high admissibility score. The initial dataset, augmented with the generated “relevant enough” samples, will support the learning of classifiers in a semi-supervised setting, which is expected to significantly improve the robustness and stability of the learned classifiers.
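To make the idea of a GO-structured network more concrete, here is a minimal, untrained sketch in Python (NumPy plus the standard library's `graphlib`). It is not the project's model. The ontology, the term names, the gene-to-term annotations, the tanh/sigmoid activations, the averaging over root terms and the 0.5 threshold are all hypothetical choices made for illustration. Each hidden unit corresponds to one GO term and receives input only from the genes annotated to it and from its child terms, so information flows bottom-up through the DAG to a scalar score in (0, 1). The hypothetical helper `keep_admissible` then retains the candidate profiles whose score reaches a threshold, standing in for the selection of “relevant enough” generated samples.

```python
from graphlib import TopologicalSorter

import numpy as np

# Hypothetical toy ontology: each term maps to its parent (more general) terms.
GO_PARENTS = {
    "T_apoptosis": ["T_cell_death"],
    "T_necrosis": ["T_cell_death"],
    "T_cell_death": ["T_root"],
    "T_signaling": ["T_root"],
    "T_root": [],
}
# Hypothetical annotations: gene index -> GO terms the gene is annotated to.
GENE_TO_TERMS = {
    0: ["T_apoptosis"],
    1: ["T_apoptosis", "T_signaling"],
    2: ["T_necrosis"],
    3: ["T_signaling"],
}


def children_of(parents):
    """Invert the parent map into a term -> child terms map."""
    kids = {t: [] for t in parents}
    for child, ps in parents.items():
        for p in ps:
            kids[p].append(child)
    return kids


class DagScorer:
    """Untrained sketch of a network with one hidden unit per GO term."""

    def __init__(self, parents, gene_to_terms, seed=0):
        rng = np.random.default_rng(seed)
        self.kids = children_of(parents)
        # Children are given as predecessors, so each term comes after all its children.
        self.order = list(TopologicalSorter(self.kids).static_order())
        self.roots = [t for t, ps in parents.items() if not ps]
        # Sparse connectivity: a term only sees its annotated genes and its child terms.
        self.genes = {t: [g for g, ts in gene_to_terms.items() if t in ts] for t in parents}
        # Random (untrained) weights; in practice these would be learned from data.
        self.w_gene = {t: rng.normal(size=len(self.genes[t])) for t in parents}
        self.w_child = {t: rng.normal(size=len(self.kids[t])) for t in parents}

    def score(self, x):
        """Propagate an expression profile x bottom-up and return a score in (0, 1)."""
        h = {}
        for t in self.order:
            z = 0.0
            if self.genes[t]:
                z += self.w_gene[t] @ x[self.genes[t]]
            if self.kids[t]:
                z += self.w_child[t] @ np.array([h[c] for c in self.kids[t]])
            h[t] = np.tanh(z)
        root = np.mean([h[r] for r in self.roots])
        return float(1.0 / (1.0 + np.exp(-root)))


def keep_admissible(scorer, candidates, threshold=0.5):
    """Keep the rows of `candidates` whose admissibility score reaches `threshold`."""
    mask = np.array([scorer.score(x) >= threshold for x in candidates], dtype=bool)
    return candidates[mask]


scorer = DagScorer(GO_PARENTS, GENE_TO_TERMS)
rng = np.random.default_rng(1)
candidates = rng.normal(size=(10, 4))  # stand-in for generated expression profiles
kept = keep_admissible(scorer, candidates)
print(kept.shape)
```

In the proposal, the network would be trained on the existing dataset and the variational auto-encoder itself biased toward high-scoring samples. Here, random weights and random candidates only illustrate the bottom-up data flow through the DAG and the filtering step.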
- Principal investigator: Blaise HANCZAR (Professor, Univ. Évry, IBISC, AROB@S team)
- Funding: Labex DIGICOSME Paris-Saclay
- Duration: 36 months