*Neural networks as explanatory models of language processing*

Artificial neural networks have become remarkably successful in many subfields of AI. This is exciting for many reasons, one of which is that their improved performance makes them potentially useful as *explanatory models* that may help us better understand the tasks they are trained to do. In this presentation, I consider whether artificial neural networks can be used as explanatory models of natural language processing, with a particular focus on structure, hierarchy and compositionality. I will first explain how using neural networks as explanatory models involves two different "strands" of research -- behavioural research, in which a model's abilities are assessed, and interpretability research, which investigates *how* models implement solutions to the tasks they are trained on -- followed by a brief overview of the work I have done on both strands.

I will then discuss two of my previous studies. First, I consider how an LSTM-based language model processes long-distance subject-verb agreement relationships. I describe a study in which we used diagnostic classifiers and diagnostic interventions to understand when and where the information needed to perform well on this task is encoded.

Second, I consider on a more general level what kinds of generalisation we can (or want to) expect from neural networks. Using examples from a study with artificial data that isolates different types of generalisation -- grounded in linguistics and the philosophy of language -- I compare three different architectures: an LSTM-based architecture, a convolution-based architecture and a transformer model. In particular, I show results on how *locally* the models compute complex expressions, and I present an experiment that considers the extent to which models overgeneralise when faced with exceptions to rules. I will finish with a brief outlook on future work.
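
As an illustration of the diagnostic-classifier idea mentioned above, the sketch below trains a linear probe on the hidden states of a toy, untrained LSTM to decode a synthetic "grammatical number" feature at each time step. The model, data, and hyperparameters are all illustrative stand-ins, not those used in the study itself; the point is only to show the general recipe of probing hidden states per time step.

```python
# Minimal sketch of a diagnostic classifier ("probe").
# Assumptions: a toy, untrained LSTM and fully synthetic data; all names
# and sizes are illustrative, not taken from the study described above.
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

torch.manual_seed(0)
VOCAB_SIZE, EMB_DIM, HIDDEN_DIM, SEQ_LEN, N_SENT = 50, 16, 32, 8, 400

# Stand-in "language model": an untrained embedding layer + LSTM.
embed = nn.Embedding(VOCAB_SIZE, EMB_DIM)
lstm = nn.LSTM(EMB_DIM, HIDDEN_DIM, batch_first=True)

# Synthetic corpus: random token ids, with a binary label per sentence
# standing in for the subject's grammatical number. The label is planted
# in the first token, so later hidden states may (or may not) retain it.
tokens = torch.randint(0, VOCAB_SIZE, (N_SENT, SEQ_LEN))
number = torch.randint(0, 2, (N_SENT,))
tokens[:, 0] = number

with torch.no_grad():
    hidden_states, _ = lstm(embed(tokens))  # (N_SENT, SEQ_LEN, HIDDEN_DIM)

# Diagnostic classifier: a linear probe trained to decode the "number"
# label from the hidden state at each time step. High held-out accuracy
# at step t suggests the feature is (linearly) encoded there.
split = N_SENT // 2
y = number.numpy()
for t in range(SEQ_LEN):
    X = hidden_states[:, t, :].numpy()
    probe = LogisticRegression(max_iter=1000).fit(X[:split], y[:split])
    acc = accuracy_score(y[split:], probe.predict(X[split:]))
    print(f"time step {t}: probe accuracy = {acc:.2f}")
```

Tracking probe accuracy across time steps is what makes this diagnostic: it indicates *where* in the sequence the information remains decodable, which is the kind of question the agreement study asks about a real trained language model.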