Towards Computing Attributions for Dimensionality Reduction Techniques

Abstract

We describe the problem of computing local feature attributions for dimensionality reduction methods. We take an attribution method that is well established for supervised classification, namely the gradient of the target outputs with respect to the inputs, and apply it to t-SNE, a popular dimensionality reduction technique widely used in the analysis of biological data. We provide an efficient implementation of the gradient computation for this technique. Using a novel validation methodology on synthetic datasets and on the popular MNIST benchmark dataset, we show that our explanations identify significant features. We then demonstrate the practical utility of our algorithm by showing that it produces explanations that agree with domain knowledge on a SARS-CoV-2 sequence dataset. Throughout, we provide a road map for applying similar explanation methods to other dimensionality reduction techniques in order to rigorously analyze biological datasets.
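The central idea, differentiating a t-SNE embedding with respect to its input features, can be illustrated with a small NumPy sketch. It is not the authors' implementation, and every name in it (`affinities`, `kl_grad`, `embed`, `embedding_jacobian`, `attributions`) is hypothetical. It rests on several simplifying assumptions: the input affinities use one fixed Gaussian bandwidth instead of perplexity calibration; the optimizer omits early exaggeration and adaptive gains; and, because a t-SNE embedding is the output of an optimization rather than a closed-form function, its gradient is obtained through the implicit function theorem (the t-SNE gradient vanishes at the optimum), with partial derivatives estimated by finite differences.

```python
import numpy as np

# Hypothetical helper names; a sketch under stated assumptions, not the authors' code.


def affinities(X, sigma=1.0):
    """Symmetric t-SNE input affinities P(X).

    Assumption: one fixed Gaussian bandwidth `sigma` for every point;
    real t-SNE tunes a per-point bandwidth to a target perplexity.
    """
    n = X.shape[0]
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    P_cond = W / W.sum(axis=1, keepdims=True)  # row i holds p_{j|i}
    return (P_cond + P_cond.T) / (2.0 * n)


def kl_grad(Y, P):
    """Standard t-SNE gradient of KL(P || Q) with respect to the embedding Y."""
    diff = Y[:, None, :] - Y[None, :, :]  # y_i - y_j
    w = 1.0 / (1.0 + np.sum(diff**2, axis=-1))  # Student-t kernel
    np.fill_diagonal(w, 0.0)
    Q = w / w.sum()
    return 4.0 * np.einsum("ij,ijk->ik", (P - Q) * w, diff)


def embed(X, n_iter=3000, lr=1.0, momentum=0.5, seed=0):
    """Momentum gradient descent on KL(P || Q).

    Assumption: no early exaggeration and no adaptive gains.
    """
    P = affinities(X)
    Y = 1e-2 * np.random.default_rng(seed).standard_normal((X.shape[0], 2))
    V = np.zeros_like(Y)
    for _ in range(n_iter):
        V = momentum * V - lr * kl_grad(Y, P)
        Y = Y + V
    return Y


def embedding_jacobian(X, Y, h=1e-5):
    """dY*/dX via the implicit function theorem on kl_grad(Y*, P(X)) = 0:
    dY*/dX = -pinv(dG/dY) @ dG/dX, both partials by central differences."""
    n, d = X.shape
    P = affinities(X)
    H = np.empty((2 * n, 2 * n))  # dG/dY, the Hessian of KL
    for a in range(2 * n):
        E = np.zeros(2 * n)
        E[a] = h
        Yp = (Y.ravel() + E).reshape(n, 2)
        Ym = (Y.ravel() - E).reshape(n, 2)
        H[:, a] = (kl_grad(Yp, P) - kl_grad(Ym, P)).ravel() / (2 * h)
    H = 0.5 * (H + H.T)  # remove finite-difference asymmetry
    B = np.empty((2 * n, n * d))  # dG/dX at fixed Y
    for b in range(n * d):
        E = np.zeros(n * d)
        E[b] = h
        Gp = kl_grad(Y, affinities((X.ravel() + E).reshape(n, d)))
        Gm = kl_grad(Y, affinities((X.ravel() - E).reshape(n, d)))
        B[:, b] = (Gp - Gm).ravel() / (2 * h)
    # pinv discards the flat directions of KL (translations and rotation of Y).
    J = -np.linalg.pinv(H, rcond=1e-6) @ B
    return J.reshape(n, 2, n, d)  # J[i, c, j, k] = d y_ic / d x_jk


def attributions(X, Y):
    """A[i, k] = norm of d y_i / d x_ik: how far point i moves in the embedding
    per unit change of its own feature k (one possible local attribution)."""
    J = embedding_jacobian(X, Y)
    return np.stack([np.linalg.norm(J[i, :, i, :], axis=0) for i in range(X.shape[0])])


rng = np.random.default_rng(0)
n = 30
X = np.zeros((n, 3))
X[:, 0] = np.repeat([-1.5, 1.5], n // 2) + 0.3 * rng.standard_normal(n)  # two groups
X[:, 1] = 0.3 * rng.standard_normal(n)  # within-group noise
# X[:, 2] stays constant (all zeros), so it cannot influence the embedding
Y = embed(X)
A = attributions(X, Y)
print(A.shape)  # (30, 3)
print(A[:, 2].max())  # 0.0
```

A feature that is constant across all points leaves the affinities unchanged, so it receives exactly zero attribution; this makes a cheap sanity check for any such implementation. The pseudo-inverse is used because a t-SNE objective is unchanged by translating or rotating the embedding, so the Hessian is singular along those directions and only the minimum-norm response is meaningful.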

Matthew Scicluna
PhD Student in Bioinformatics

Sébastien Lemieux
Principal Investigator, Functional and Structural Bioinformatics Research Unit, IRIC | Scientific Director, Bioinformatics Platform | Associate Professor, Department of Biochemistry and Molecular Medicine, Université de Montréal