I am an assistant professor of linguistics at Cornell University. I am interested in the incremental representations that humans use to process language, and in differences between how language is used and how it is processed. To explore these topics, I study the relationships between computational language models and psycholinguistic data (e.g., reading times). I also study neural network representations of language to understand which aspects of language can be learned directly from language statistics, without experience of the real world (i.e., through ungrounded learning).
If you’re interested in incremental processing models, you may find these helpful:
- LSTM toolkit that can estimate incremental processing difficulty
- 125 pre-trained English LSTMs
- Left-corner parsing toolkit that can estimate incremental processing difficulty
I manage the Computational Psycholinguistics Discussions research group (C.Psyd) and am part of the Cornell Computational Linguistics Lab (CLab) and the Cornell Natural Language Processing Group (Cornell NLP).
My surname is easy to pronounce (in words, not IPA): /van ‘shine-dull/
Feb 1: Gave an invited talk at UC Irvine: Neural Language Priming
Jan 25: Gave an invited talk at Dongguk University: NLP: Neural Language Priming
Dec 3: Gave an invited talk at University of Chicago: Language Statistics Won’t Solve Language Processing
Oct 15: Gave an invited talk at Georgia Tech NLP Seminar: Language Statistics Won’t Solve Language Processing
Aug 26: Timkey and van Schijndel (2021) accepted at EMNLP 2021: We show that Transformer models consistently develop rogue dimensions that operate at bizarrely inflated scales and track relatively uninteresting phenomena (e.g., time since last punctuation mark). The inflated scale distorts similarity estimates and makes cosine a poor measure of similarity. We introduce a very simple method to correct for the issue that retains all information in the model and requires no retraining.
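To make the cosine problem in the Aug 26 item concrete, here is a minimal sketch on synthetic vectors. It assumes a single "rogue" dimension with a very large mean and uses per-dimension standardization (z-scoring) as the correction. That choice is an illustrative assumption and is not claimed to be the exact method from the paper. All function names are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def make_toy_vectors(n=200, dims=16, rogue_dim=0, seed=0):
    """Random vectors whose `rogue_dim` sits at a hugely inflated scale.

    Synthetic stand-in for model representations (an assumption for
    illustration only; no real model is involved).
    """
    rng = random.Random(seed)
    vectors = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dims)]
        v[rogue_dim] = 100.0 + rng.gauss(0.0, 1.0)
        vectors.append(v)
    return vectors


def standardize(vectors):
    """Z-score every dimension across the set of vectors.

    The map is affine and invertible given the stored means and standard
    deviations, so no information is discarded and nothing is retrained.
    """
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / n)
        for d in range(dims)
    ]
    return [
        [(v[d] - means[d]) / stds[d] if stds[d] > 0 else 0.0 for d in range(dims)]
        for v in vectors
    ]


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs."""
    total, count = 0.0, 0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            total += cosine(vectors[i], vectors[j])
            count += 1
    return total / count


if __name__ == "__main__":
    raw = make_toy_vectors()
    print("raw mean cosine:         ", mean_pairwise_cosine(raw))
    print("standardized mean cosine:", mean_pairwise_cosine(standardize(raw)))
```

In this toy setting the raw vectors all look nearly identical under cosine because one dimension dominates every dot product. After standardization, unrelated random vectors look unrelated again.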
July 26: Ryb and van Schijndel (2021): We show that, although BERT relies extensively on shallow heuristics during NLI, certain kinds of symbolic reasoning also arise in BERT. Some types of reasoning, such as spatial reasoning, remain beyond it.
June 25: Cognitive Science paper finally published!
We show that surprisal (or more generally, single-stage prediction models) can only explain the existence of garden path effects in reading times, not the magnitude of the effects themselves. This suggests that explicit repair mechanisms are involved in garden path processing.
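For readers unfamiliar with surprisal, it is the negative log probability of a word given its preceding context. The sketch below computes per-word surprisal in bits. It assumes a toy add-one-smoothed bigram model in place of the neural language models used in this line of work, and all names in it are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams over whitespace-tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def surprisal(prev, word, model):
    """Surprisal in bits: -log2 P(word | prev), with add-one smoothing."""
    context_counts, bigram_counts, vocab = model
    prob = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))
    return -math.log2(prob)


def sentence_surprisals(sentence, model):
    """Word-by-word surprisal for one sentence, processed left to right."""
    tokens = ["<s>"] + sentence.split()
    return [(word, surprisal(prev, word, model)) for prev, word in zip(tokens, tokens[1:])]


if __name__ == "__main__":
    corpus = ["the horse raced fast", "the horse raced home", "the horse fell"]
    model = train_bigram(corpus)
    for word, bits in sentence_surprisals("the horse fell", model):
        print(f"{word}\t{bits:.2f} bits")
```

Per-word values like these are what get compared against reading times. A less expected word receives higher surprisal, which is the single-stage prediction account of processing difficulty.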
May 10: 2 papers accepted at ACL and ACL Findings.
1) Davis and van Schijndel (2021): We show that linguistic knowledge in language models can be modeled as constraints. Thus, some linguistic representations can prevent other learned linguistic knowledge from surfacing. We show how to fix this, but more generally we outline a framework for thinking about language representations in neural networks.
2) Wilber et al. (2021): We explore the abstractive capabilities of automatic summarization models. We show that abstractive summarization is extremely shallow at present, often simply emulating extractive summarization.
Sept 18: 2 papers accepted at CoNLL.
1) Bhattacharya and van Schijndel (2020): Neural networks encode abstract filler-gap existence but do not learn more abstract clusterings over kinds of filler-gaps. Raises questions about the depth of abstraction needed to process text sequences.
2) Davis and van Schijndel (2020): Transformers encode implicit causality verb biases but fail to use that knowledge to make correct predictions. Validates Hartshorne’s theory that IC is learnable from language sequences, but suggests that the language modeling objective prevents models from using this knowledge.