Selected publications
-
Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling. 07 Apr, 2025. [preprint]
bib
@online{lipkin.b:2025arxiv,
  title       = {Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling},
  author      = {Lipkin, Benjamin and LeBrun, Benjamin and Vigly, Jacob Hoover and Loula, João and MacIver, David R. and Du, Li and Eisner, Jason and Cotterell, Ryan and Mansinghka, Vikash and O'Donnell, Timothy J. and Lew, Alexander K. and Vieira, Tim},
  year        = {2025},
  month       = apr,
  day         = {07},
  eprint      = {2504.05410},
  eprinttype  = {arXiv},
  eprintclass = {cs}
}
abstract
The dominant approach to generating from language models subject to some constraint is locally constrained decoding (LCD), incrementally sampling tokens at each time step such that the constraint is never violated. Typically, this is achieved through token masking: looping over the vocabulary and excluding non-conforming tokens. There are two important problems with this approach. (i) Evaluating the constraint on every token can be prohibitively expensive – LM vocabularies often exceed 100,000 tokens. (ii) LCD can distort the global distribution over strings, sampling tokens based only on local information, even if they lead down dead-end paths. This work introduces a new algorithm that addresses both these problems. First, to avoid evaluating a constraint on the full vocabulary at each step of generation, we propose an adaptive rejection sampling algorithm that typically requires orders of magnitude fewer constraint evaluations. Second, we show how this algorithm can be extended to produce low-variance, unbiased estimates of importance weights at a very small additional cost – estimates that can be soundly used within previously proposed sequential Monte Carlo algorithms to correct for the myopic behavior of local constraint enforcement. Through extensive empirical evaluation in text-to-SQL, molecular synthesis, goal inference, pattern matching, and JSON domains, we show that our approach is superior to state-of-the-art baselines, supporting a broader class of constraints and improving both runtime and performance. Additional theoretical and empirical analyses show that our method’s runtime efficiency is driven by its dynamic use of computation, scaling with the divergence between the unconstrained and constrained LM, and as a consequence, runtime improvements are greater for better models.
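The core adaptive-rejection idea can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function and variable names are hypothetical, and the unbiased importance-weight estimator and sequential Monte Carlo machinery described above are not shown. Rather than checking the constraint on every vocabulary item, the sampler draws tokens from the LM's next-token distribution and evaluates the constraint only on the tokens it actually draws, removing rejected tokens and redrawing:

    import numpy as np

    def adaptive_rejection_sample(probs, satisfies, rng=None):
        """Draw a next token that satisfies a constraint, evaluating the
        constraint only on tokens that are actually sampled.
        probs: the LM's next-token distribution (length-V array);
        satisfies(t): boolean constraint check for token id t."""
        rng = rng or np.random.default_rng()
        remaining = probs.astype(float)
        n_evals = 0
        while remaining.sum() > 0:
            t = rng.choice(len(remaining), p=remaining / remaining.sum())
            n_evals += 1
            if satisfies(t):
                return t, n_evals          # accept the first conforming draw
            remaining[t] = 0.0             # reject: drop the token and redraw
        raise ValueError("no token satisfies the constraint")

When most of the LM's probability mass already conforms to the constraint, a loop of this kind terminates after a handful of constraint evaluations rather than a full pass over a 100,000-token vocabulary, which is where the runtime savings claimed above come from.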
-
The Cost of Information: Looking beyond Predictability in Language Processing. PhD Thesis, McGill University, Linguistics Department. Aug, 2024. [link] [pdf] [☛ précis]
bib
@thesis{hoover.j:2024phd,
  title  = {The Cost of Information: Looking beyond Predictability in Language Processing},
  author = {Hoover, Jacob Louis},
  type   = {PhD},
  year   = {2024},
  month  = aug,
  note   = {PhD Thesis, McGill University, Linguistics Department},
  school = {McGill University},
  langid = {en-CA},
  url    = {https://escholarship.mcgill.ca/concern/theses/r494vr42w}
}
-
The Plausibility of Sampling as an Algorithmic Theory of Sentence Processing. Open Mind: Discoveries in Cognitive Science 7. 350–391. Jul, 2023. [DOI] [preprint] [code repository] [☛ surprisal explorer]
bib
@article{hoover-etal-2023-plausibility,
  title     = {The Plausibility of Sampling as an Algorithmic Theory of Sentence Processing},
  author    = {Hoover, Jacob Louis and Sonderegger, Morgan and Piantadosi, Steven T. and O'Donnell, Timothy J.},
  year      = {2023},
  month     = jul,
  journal   = {Open Mind: Discoveries in Cognitive Science},
  volume    = {7},
  pages     = {350--391},
  issn      = {2470-2986},
  doi       = {10.1162/opmi_a_00086},
  urldate   = {2023-07-21},
  copyright = {All rights reserved},
  pmcid     = {PMC10449406}
}
abstract
Words that are more surprising given context take longer to process. However, no incremental parsing algorithm has been shown to directly predict this phenomenon. In this work, we focus on a class of algorithms whose runtime does naturally scale in surprisal—those that involve repeatedly sampling from the prior. Our first contribution is to show that simple examples of such algorithms predict runtime to increase superlinearly with surprisal, and also predict variance in runtime to increase. These two predictions stand in contrast with literature on surprisal theory (Hale, 2001; Levy, 2008) which assumes that the expected processing cost increases linearly with surprisal, and makes no prediction about variance. In the second part of this paper, we conduct an empirical study of the relationship between surprisal and reading time, using a collection of modern language models to estimate surprisal. We find that with better language models, reading time increases superlinearly in surprisal, and also that variance increases. These results are consistent with the predictions of sampling-based algorithms.
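One way to see the superlinearity prediction, for the simplest member of this class (a guess-and-check sampler that redraws from the prior until it produces the observed word; an illustration, not the paper's full analysis): the number of draws \(N\) required is geometrically distributed with success probability \(p(w \mid \text{context})\), so

\[
\mathbb{E}[N] \;=\; \frac{1}{p(w \mid \text{context})} \;=\; e^{\,s(w)},
\qquad
\mathrm{Var}[N] \;=\; \frac{1 - p(w \mid \text{context})}{p(w \mid \text{context})^{2}},
\]

where \(s(w) = -\ln p(w \mid \text{context})\) is the surprisal in nats. Both the expected number of draws and its variance therefore grow superlinearly (indeed exponentially) in surprisal, in contrast with the linear expected cost, and silence about variance, assumed in the surprisal-theory literature cited above.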
-
Measuring Morphological Fusion Using Partial Information Decomposition. In Proceedings of the 29th International Conference on Computational Linguistics (COLING). 44–54. International Committee on Computational Linguistics. 12 Oct, 2022. [link]
bib
@inproceedings{socolof.m:2022coling,
  title      = {Measuring Morphological Fusion Using Partial Information Decomposition},
  booktitle  = {Proceedings of the 29th {{International Conference}} on {{Computational Linguistics}} ({{COLING}})},
  author     = {Socolof, Michaela and Hoover, Jacob Louis and Futrell, Richard and Sordoni, Alessandro and O'Donnell, Timothy J.},
  year       = {2022},
  month      = oct,
  day        = {12},
  pages      = {44--54},
  publisher  = {International Committee on Computational Linguistics},
  location   = {Gyeongju, Republic of Korea},
  url        = {https://aclanthology.org/2022.coling-1.5},
  eventtitle = {{{COLING}} 2022}
}
abstract
Morphological systems across languages vary when it comes to the relation between form and meaning. In some languages, a single meaning feature corresponds to a single morpheme, whereas in other languages, multiple meaning features are bundled together into one morpheme. The two types of languages have been called agglutinative and fusional, respectively, but this distinction does not capture the graded nature of the phenomenon. We provide a mathematically precise way of characterizing morphological systems using partial information decomposition, a framework for decomposing mutual information into three components: unique, redundant, and synergistic information. We show that highly fusional languages are characterized by high levels of synergy.
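Schematically, for a surface form \(F\) expressing two meaning features \(M_1\) and \(M_2\), the decomposition referred to above splits the mutual information as (this is the generic statement of partial information decomposition; the notation is mine, not necessarily the paper's)

\[
I(F; M_1, M_2) \;=\; \mathrm{Red}(F; M_1, M_2)
\;+\; \mathrm{Unq}(F; M_1 \setminus M_2)
\;+\; \mathrm{Unq}(F; M_2 \setminus M_1)
\;+\; \mathrm{Syn}(F; M_1, M_2),
\]

and the finding reported above is that highly fusional systems are the ones with a large synergistic term.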
-
With Better Language Models, Processing Time Is Superlinear in Surprisal. Poster at Architectures and Mechanisms for Language Processing (AMLaP 28). York, England. 6 Sep, 2022. [link] [poster]
bib
@misc{hoover-etal-2022-amlap,
  type         = {Poster},
  title        = {With Better Language Models, Processing Time Is Superlinear in Surprisal},
  author       = {Hoover, Jacob Louis and Sonderegger, Morgan and Piantadosi, Steven T. and O'Donnell, Timothy J.},
  year         = {2022},
  howpublished = {Poster at Architectures and Mechanisms for Language Processing (AMLaP 28)},
  day          = {6},
  month        = sep,
  address      = {{York, England}},
  url          = {https://virtual.oxfordabstracts.com/#/event/3067/submission/297}
}
-
Linguistic Dependencies and Statistical Dependence. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2941–2963. Association for Computational Linguistics. Online and Punta Cana, Dominican Republic. Nov, 2021. [link] [poster] [slides] [code repository]
bib
@inproceedings{hoover-etal-2021-emnlp,
  address   = {Online and Punta Cana, Dominican Republic},
  author    = {Hoover, Jacob Louis and Du, Wenyu and Sordoni, Alessandro and O{'}Donnell, Timothy J.},
  booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
  month     = nov,
  pages     = {2941--2963},
  publisher = {Association for Computational Linguistics},
  title     = {Linguistic Dependencies and Statistical Dependence},
  url       = {https://aclanthology.org/2021.emnlp-main.234},
  year      = {2021}
}
abstract
Are pairs of words that tend to occur together also likely to stand in a linguistic dependency? This empirical question is motivated by a long history of literature in cognitive science, psycholinguistics, and NLP. In this work we contribute an extensive analysis of the relationship between linguistic dependencies and statistical dependence between words. Improving on previous work, we introduce the use of large pretrained language models to compute contextualized estimates of the pointwise mutual information between words (CPMI). For multiple models and languages, we extract dependency trees which maximize CPMI, and compare to gold standard linguistic dependencies. Overall, we find that CPMI dependencies achieve an unlabelled undirected attachment score of at most ≈0.5. While far above chance, and consistently above a non-contextualized PMI baseline, this score is generally comparable to a simple baseline formed by connecting adjacent words. We analyze which kinds of linguistic dependencies are best captured in CPMI dependencies, and also find marked differences between the estimates of the large pretrained language models, illustrating how their different training schemes affect the type of dependencies they capture.
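A sketch of the tree-extraction step, assuming the pairwise CPMI scores have already been estimated (roughly, the change in a word's log-probability when the other word is visible versus masked in the context); the function and its names are illustrative, not the released code:

    import numpy as np

    def max_cpmi_tree(cpmi):
        """Prim's algorithm for the maximum spanning tree over pairwise
        CPMI scores. cpmi: an n x n matrix of scores between the n words
        of a sentence. Returns the undirected edges of the tree."""
        cpmi = np.asarray(cpmi, dtype=float)
        n = len(cpmi)
        in_tree, edges = {0}, []
        while len(in_tree) < n:
            # add the highest-scoring edge linking the tree to a new word
            i, j = max(((i, j) for i in in_tree
                        for j in range(n) if j not in in_tree),
                       key=lambda e: cpmi[e])
            edges.append((i, j))
            in_tree.add(j)
        return edges

The resulting edges are then scored against gold-standard dependencies with unlabelled undirected attachment score, as reported above.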
-
Accounting for Variation in Number Agreement in Icelandic Dative-Nominative Constructions. In Proceedings of the 38th West Coast Conference on Formal Linguistics. Ed. Rachel Soo, Una Y. Chow, Sander Nederveen. 231–241. Cascadilla Proceedings Project. Somerville, Mass., USA. Oct, 2021. [link] [handout] [pdf]
bib
@inproceedings{hoover-2020-icelandic,
  address   = {Somerville, Mass., USA},
  author    = {Hoover, Jacob Louis},
  booktitle = {Proceedings of the 38th West Coast Conference on Formal Linguistics},
  editor    = {Soo, Rachel and Chow, Una Y. and Nederveen, Sander},
  month     = oct,
  pages     = {231--241},
  publisher = {Cascadilla Proceedings Project},
  title     = {Accounting for Variation in Number Agreement in Icelandic Dative-Nominative Constructions},
  url       = {http://www.lingref.com/cpp/wccfl/38/abstract3568.html},
  year      = {2021}
}
abstract
Icelandic dative-nominative constructions exhibit a syntactic hierarchy effect known as the Person Restriction: only third person nominatives may control agreement. In these constructions, there is variation between speakers in the extent to which the verb agrees with the nominative for number. Sigurðsson & Holmberg (2008) explain this variation as arising due to differences between varieties in the timing of subject raising, using a split phi-probe. This paper revises their approach, using the feature gluttony mechanism for Agree developed in Coon & Keine (2020), and a split phi-probe in which person probing precedes number probing. Within this framework, the observed variation can be captured by allowing variability in two independent parameters: the timing of EPP subject raising, and the visibility of a number feature on dative DPs. The proposed mechanism describes the variation, including predicting the observed optional agreement in certain cases that previous literature had struggled to account for, and makes additional predictions about the differences between varieties in cases of syncretism within the verbal paradigm. An investigation into these predictions should allow this already well-studied area of Icelandic grammar to continue to be a useful test-case for crosslinguistic assumptions about the mechanism of Agree, and the status of dative arguments.