Exposing and explaining misbehaviours of deep learning systems
PhD: Università della Svizzera italiana
Language: English
Assessing the quality of Deep Learning (DL) systems is crucial, as they are increasingly adopted in safety-critical domains. Researchers have proposed several input generation techniques for DL systems. While such techniques can expose failures, they do not explain which features of the test inputs influenced the system's misbehaviour. This research investigates methodologies that address key challenges in testing DL systems, with a particular focus on generating targeted test cases and interpreting system behaviours. To this end, we propose three novel testing approaches for DL systems: DEEPHYPERION-CS, DEEPATASH, and DEEPTHEIA. DEEPHYPERION-CS explores the feature space at large using Illumination Search. It characterises a DL system's quality through an interpretable map, which represents the highest-performing inputs (i.e., those that cause a misbehaviour or come closest to causing one) in the space of relevant, domain-specific features. We also introduce a novel methodology that guides users in manually defining and quantifying feature dimensions. Our empirical study shows that DEEPHYPERION-CS is more effective than state-of-the-art DL testing tools at generating failure-inducing inputs associated with highly diverse features. DEEPATASH is a focused test generator, i.e., it generates failure-inducing inputs with specific, targeted features. It can address the development-to-operation (dev2op) data shift phenomenon by focusing on interesting feature values observed in operational environments. To further improve test generation efficiency, DEEPATASH-LR integrates a surrogate model into the generation process. Experimental results show that both DEEPATASH and DEEPATASH-LR are effective in generating focused test inputs, and that fine-tuning the original DL systems on data with the targeted features improves their quality without regression.
DEEPTHEIA is a fully automated, illumination-based test generator that autonomously extracts features and explores the feature space using diffusion models. It overcomes two limitations of illumination-based approaches such as DEEPHYPERION: the need for human experts to define the features, and the need for generative input models that can be mutated during the search process. Finally, we provide a thorough comparison of the explanatory techniques used to understand DL system misbehaviours, including our newly proposed feature maps, shedding light on both their comprehensibility and their limitations. Our findings contribute to advancing DL testing methodologies and to improving the interpretability of the causes of DL misbehaviours.
Classification: Computer science and technology
License: License undefined
Open access status: green
Persistent URL: https://n2t.net/ark:/12658/srd1328318