Ahcène Boubekki

Pioneer Centre for AI


Under the Hood of Black-Box Models with NAVE

Opaque, hidden, black-box, deep... Adjectives describing the complexity of neural networks are not in short supply. Yet do these terms describe an inherent structural limitation of these models, or merely the depth of our complacency with the status quo? Emblematic of the rapid growth of the field of explainable AI (XAI) are the numerous saliency and feature-attribution methods and their equally diverse explanations: black-box methods for explaining black-box models.

Nevertheless, we present a novel XAI method: Neuro-Activated Vision Explanations (NAVE). The difference is that we are primarily interested in a faithful representation of a CNN's internal representations, one that is simple, architecture-agnostic, and from which explanations follow naturally. Building on the fact that similar parts of an input produce similar activations, we use k-means to automatically extract the concepts a model has learned and how they are used in its predictions. Using both image and ECG data, we show how NAVE can be used to inspect models, explain predictions, evaluate training quality, and detect or deactivate shortcuts.
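To make the clustering idea concrete, here is a minimal sketch of how k-means over a CNN's spatial activations can produce a concept map. The backbone and layer choice (layer3 of a torchvision ResNet-50), the number of clusters, and the input file are illustrative assumptions, not the published NAVE implementation.

```python
# Sketch: cluster the spatial activations of one CNN layer with k-means
# to obtain a concept/segmentation map. Layer, k, and preprocessing are
# illustrative assumptions, not the official NAVE code.
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.cluster import KMeans
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

# Hook an intermediate layer to capture its feature maps.
feats = {}
def hook(_module, _inputs, output):
    feats["maps"] = output.detach()
model.layer3.register_forward_hook(hook)

preprocess = T.Compose([
    T.Resize(224), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("example.jpg").convert("RGB")  # hypothetical input image
with torch.no_grad():
    model(preprocess(img).unsqueeze(0))

# Reshape the (1, C, H, W) activations into H*W vectors of dimension C.
a = feats["maps"][0]                      # (C, H, W)
C, H, W = a.shape
vectors = a.reshape(C, H * W).T.numpy()   # (H*W, C)

# k-means groups spatial positions whose activation vectors are similar;
# each cluster is read as one "concept" region of the input.
labels = KMeans(n_clusters=5, n_init=10).fit_predict(vectors)
concept_map = labels.reshape(H, W)
print(concept_map)
```

Each cluster collects spatial positions with similar activation vectors; upsampling concept_map back to the input resolution gives a segmentation-like view of the regions the network treats as alike.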

We argue that such simple, lightweight inspection techniques pave the way for transparent and verifiable AI systems.

Bio: Ahcène Boubekki is a Postdoctoral Researcher at the University of Copenhagen and the Pioneer Centre for AI, researching explainable representation learning and unsupervised methods. His work focuses on developing practical tools to inspect, debug, decipher, and ultimately control the internal mechanisms of neural networks, which he sees as intermediate steps toward transparent and trustworthy AI systems.