8  Computational approaches

8.1 Introduction

One of the more recent perspectives on language has viewed it as information. This treatment arose initially from the field of information theory (Shannon, 1948), which used a mathematical lens to view communication as a means of sending information from a sender to a receiver, subject to constraints on the communication system (e.g., its channel capacity or noise). More broadly, the application of computational approaches to linguistics has seen exponential growth in interest over the last half-century, from early efforts in machine translation for military intelligence applications (Hutchins, 1999) to recent sophisticated chatbots (OpenAI, 2024). Much has been written elsewhere on the history of computational linguistics and natural language processing (e.g., Johri et al., 2021; K. S. Jones, 1994; Schubert, 2020); here we focus on surveying some theoretical and philosophical issues regarding such approaches.

8.2 Why computational modelling?

Computational approaches afford a few particular advantages given the methods used to construct, fit, and employ computational models. The first is formalisation: since computational models require the operationalisation of constructs related to language, they demand an explicit, quantitative account of how language is processed and/or acquired, rather than relying on verbal theory. Such formalisation is useful because it allows for the instantiation and evaluation of proposed mechanisms of language processing and acquisition, demonstrating how these mechanisms can (or cannot) explain the observed variation in actual human language. For example, computational models have been used to explain how humans handle communication in settings with noise or errors (Gibson et al., 2013; Levy, 2008), how children acquire regular and irregular past tense forms in English (Plunkett & Juola, 1999; Rumelhart & McClelland, 1987), and how unexpected words slow down reading speed (Oh & Schuler, 2023; Wilcox et al., 2023). By investigating input–output correspondences in these computational models, linguisticians can not only validate theories of language use, but also conduct experiments that may not be possible on humans (e.g., controlled rearing studies, Christiansen & Chater, 1999), or search through a larger parameter space for optimal experiment design (Huan et al., 2024).
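The link between word predictability and reading speed is typically formalised via surprisal: the negative log-probability of a word given its context. As a minimal sketch of this quantity (the toy corpus and bigram estimator below are invented for illustration, not drawn from the cited studies):

```python
import math
from collections import Counter

# Toy corpus; real surprisal estimates come from far larger models and data.
corpus = "the dog chased the cat the cat chased the mouse".split()

# Maximum-likelihood bigram model: p(word | context) = count(context, word) / count(context)
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def surprisal(context, word):
    """Surprisal of `word` after `context`, in bits: -log2 p(word | context)."""
    p = bigrams[(context, word)] / contexts[context]
    return -math.log2(p)

# "the" is followed by dog, cat, cat, mouse, so p(cat | the) = 2/4,
# giving 1 bit of surprisal; the rarer continuation "dog" yields 2 bits.
print(surprisal("the", "cat"))  # → 1.0
print(surprisal("the", "dog"))  # → 2.0
```

Under surprisal theory (Levy, 2008), higher values predict longer reading times on the corresponding word.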

Another benefit of computational approaches is the ability to handle large volumes of data. Continued advancements in corpus collection have vastly increased the amount of available language data (e.g., Common Crawl, 2025), which would be intractable to manually annotate. The use of computational models allows for the automatic processing and annotation of such data (e.g., Qi et al., 2020; Straka et al., 2016), permitting much larger-scale analyses and possibly the detection of lower-frequency constructions or phenomena with smaller effect sizes, which may not have otherwise appeared in smaller datasets (e.g., Roland et al., 2007).

A third contribution of computational methods is that they can represent the rich, high-dimensional nature of language. One significant advance is the shift towards sub-symbolic representations of language, especially distributional semantics, which suggests that word meanings can be elucidated from the contexts in which that word appears (Firth, 1957). Hence, word meanings can be represented as vectors or embeddings, which capture statistical patterns of the contexts in which the word occurs (e.g., Mikolov et al., 2013); this approach stands in stark contrast with formal symbolic theories of semantics, in which it is difficult to express a comprehensive description of meaning that can account for the entire lexicon. These distributed representations of meaning allow word meanings to be composed mathematically in arbitrary ways, and can also serve as numerical representations for other kinds of operations (including those in modern neural network models). Furthermore, embeddings appear to have properties which align with humans’ linguistic representations (Grand et al., 2022), suggesting that they do in fact capture relevant dimensions of variance in semantics. We can also probe the internal representations of language models to determine how much semantic information is accessible from purely linguistic information—for example, it is possible to read out human colour perceptions (Marjieh et al., 2024) and cyclic representations of time (Engels et al., 2024) as emergent properties of language model representations.
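To make the vector view concrete, here is a minimal sketch of similarity and composition in an embedding space. The vectors are hand-assigned toy values (real systems such as word2vec learn them from co-occurrence statistics over large corpora), so only the qualitative behaviour matters:

```python
import math

# Hypothetical three-dimensional "embeddings"; real embeddings have
# hundreds of learned dimensions rather than hand-picked values.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity, the standard measure of closeness in embedding space."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

# Composition by vector arithmetic: king - man + woman should land near queen.
target = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]
best = max(vectors, key=lambda word: cosine(vectors[word], target))
print(best)  # → queen
```

This kind of analogy-by-arithmetic is the behaviour reported for learned embeddings by Mikolov et al. (2013); the toy vectors above are merely constructed to reproduce it in miniature.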

Broadly, the quantitative nature of computational methods has enabled mechanistic, large-scale, robust, and sophisticated analyses of language that would be difficult to conduct otherwise. It is important to note that these characteristics may not apply to every computational approach—for example, modern language models are often difficult to interpret mechanistically (but see Rai et al., 2024). Nonetheless, these tools have provided us with new insights into the structure and usage of language.

8.3 The push towards language modelling

We can also approach the question of computational linguistics from the opposite angle: What makes language a good target for computational approaches? Some possible responses are clear, including the fact that language is essential for human communication, and that it is ubiquitous and thus has a large quantity of potentially available data. There are several other features that make language learning an interesting problem for computational approaches. First, it appears to be effectively universal across humans (barring developmental difficulties), and learnt early and without much explicit instruction—recall that these are the same arguments initially used to support Universal Grammar. That language is so pervasive is a good indicator that progress in machine use of language would be very useful for many applications. On the other hand, language appears to be difficult to learn and represent from a formal perspective. For example, early research into machine translation quickly revealed that it is not as straightforward as had been assumed, particularly because translation is not a simple linear, word-for-word mapping (e.g., due to hierarchical grammatical structure, differing categorisations of semantic space, and information structure); thus, early symbolic approaches were relatively limited in what they could accomplish (e.g., Weizenbaum, 1966). Hence, natural language processing has emerged as an important challenge task for computational approaches.

Progress in language modelling has often been driven by difficult aspects of language representation and usage. For example, the streamed, linear format of language contrasts with the static, single-snapshot format of vision or other modalities of data; as such, handling complex time series information is necessary for language modelling, and drove early neural network approaches for handling dynamic data, including recursive neural networks (e.g., Costa et al., 2003). Language also exhibits long-distance dependencies (whether the narrowly-defined grammatical phenomenon, or more general informational dependencies), which was one of the impetuses for the development of attentional mechanisms, such that computations involving later words can “attend” more or less to earlier words depending on relevance (Vaswani et al., 2023). More recent approaches have also emphasised the importance of multimodal grounding in semantics and natural language understanding (Radford et al., 2021), as well as the distinction between truthfulness and usefulness in language use (Ouyang et al., 2022).
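The core of such attentional mechanisms can be sketched as scaled dot-product attention: a word's representation becomes a relevance-weighted mixture of other words' representations, with weights derived from vector similarity. The following is a minimal single-query sketch with toy vectors, not the full multi-head architecture described by Vaswani et al.:

```python
import math

def softmax(xs):
    """Normalise scores into weights that sum to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query.

    query: vector for the current word; keys/values: one vector per
    earlier word. Returns a relevance-weighted average of `values`.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]

# The query closely matches the first key, so the output is dominated
# by the first value vector: the model "attends" to that earlier word.
out = attend([10.0, 0.0], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Because the weighting depends only on content similarity, not on distance in the string, attention handles long-distance dependencies that are awkward for strictly sequential architectures.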

Furthermore, the modelling of “language” in fact encompasses a very large range of phenomena and capacities. These phenomena include traditional topics in linguistic analyses, including grammatical parsing (e.g., Bai et al., 2023; Vinyals et al., 2015), reference resolution (e.g., Moniz et al., 2024), natural language inference (e.g., Gubelmann et al., 2024), language acquisition (e.g., Elman, 1993; Wang et al., 2023), and the distinction between formal and functional competence (e.g., Mahowald et al., 2024). Computational approaches to language have also addressed issues related to different modalities of language data, including speech recognition (e.g., Dahl et al., 2012; Radford et al., 2022) and optical character recognition (e.g., Poznanski et al., 2025), or even further afield to decoding neural representations of language (e.g., Défossez et al., 2023; Hong et al., 2024). The diversity of potential target phenomena has driven a corresponding expansion in the methods and techniques employed under the broad umbrellas of computational linguistics and natural language processing, and continues to encourage innovation in contemporary computational approaches.

8.4 Philosophical issues in computational linguistics

The computational modelling of language has always been associated with corresponding philosophical issues related to these models. Turing famously introduced the idea of the Turing test, which suggests that a machine can be considered intelligent if a human interrogator is unable to distinguish between it and another human (Turing, 1950). This test is also related to Searle’s Chinese room thought experiment (Searle, 1980), which (contra Turing) suggests that it is possible for a person in a room to follow a set of instructions for constructing appropriate responses to inputs given in Chinese, even if they do not understand Chinese themselves; on this view, the Turing test is too crude a criterion for genuine understanding. These arguments have been naturally extended to modern large language models (LLMs), which do exhibit language performance sophisticated enough to ostensibly pass some Turing tests (C. R. Jones & Bergen, 2024).

Linguisticians have taken up a very broad range of perspectives on the modern version of this debate—that is, whether LLMs can tell us anything about linguistics. Some researchers believe that they cannot, largely because the context in which LLMs learn and use language is qualitatively different from that of humans, who use different mechanisms for learning, have much less input data, and are embodied in a multisensory, social environment that drives true meaning-making (e.g., Bender et al., 2021; Bender & Koller, 2020; Bolhuis et al., 2024; Gomes, 2024; Kodner et al., 2023). Under this view, the inherent differences between human and machine learning imply that language models cannot truly serve as effective models of language learning and use. However, a key under-addressed issue is the validity of the assumptions made—for example, do models in fact require human-like learning mechanisms in order to be effective models of language? Given that modern LLMs do show relatively sophisticated language behaviour, it seems plausible to posit that even “unnatural” learning mechanisms can extract meaningful structural features of language, such that these models remain interesting artefacts for investigation, especially since they permit analyses that would not be possible with humans.

A much more bullish perspective on LLMs is that they can themselves serve as theories of language, which may even surpass traditional linguistic theories, since they provide more accurate predictions about language behaviour in humans (e.g., Baroni, 2022; Piantadosi, 2024). While LLMs do indeed have increasingly strong predictive power, they lack explanatory power, since they only provide descriptions either at a very high, abstract level (e.g., regarding phenomena), or at a very low, implementational level (e.g., regarding statistical learning), neither of which is useful in providing interpretable, analytical explanations of linguistic phenomena (see Opitz et al., 2025).

In contrast with both of these more extreme perspectives, a growing group of researchers have laid out something of a via media: language models can serve as interesting ways to probe and evaluate linguistic theories, even if they do not serve as complete theories themselves (e.g., Binz et al., 2025; Frank & Goodman, 2025; Futrell & Mahowald, 2025; Mansfield & Wilcox, 2025; Millière, 2024; Pater, 2019; Portelance & Jasbi, 2024). Two ideas are key in this regard. The first is representations: probing the internal representations of LLMs allows us to understand what kinds of representations are able to support complex language behaviour (see Tosato et al., 2024). For example, language models appear to encode hierarchical syntactic information (Rogers et al., 2020) as well as syntactic relations (Diego-Simón et al., 2024), suggesting that such representations are important for appropriate language production, as opposed to merely operating over linear positional features. Another key idea is learnability: understanding what can be acquired by language models reflects the inductive biases that may or may not be necessary for language learning in humans. A recent line of work has demonstrated that actual human languages are easier for LLMs to learn than implausible languages (e.g., with inconsistent word order; Kallini et al., 2024; Xu et al., 2025; Yang et al., 2025), refuting the supposition that language models are able to learn any arbitrary language (Moro et al., 2023), and conversely suggesting that structural regularities in the input are crucial for a language to be learnable—even for learning algorithms like statistical learning. This moderate perspective draws connections among symbolic linguistic theory, information theory, and language modelling, allowing for more multifaceted approaches toward understanding language.
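The learnability contrast can be illustrated even with a deliberately simple statistical learner. The simulation below is our own toy construction (not the LLM experiments of the cited papers): a smoothed bigram model finds an artificial language with consistent word order more predictable, in bits per word, than the same sentences with order scrambled sentence-by-sentence:

```python
import math
import random
from collections import Counter

random.seed(0)

# Invented mini-lexicon; the "consistent" language is strictly
# subject-verb-object, mimicking a stable word-order rule.
subjects = ["dogs", "cats", "birds"]
verbs = ["chase", "see", "like"]
objects = ["mice", "fish", "worms"]

consistent = [[random.choice(subjects), random.choice(verbs), random.choice(objects)]
              for _ in range(500)]
# "Impossible" counterpart: the same sentences with word order
# scrambled independently per sentence, i.e., no stable ordering rule.
inconsistent = [random.sample(s, len(s)) for s in consistent]

def avg_bits_per_word(sentences):
    """Average bigram surprisal (bits/word) with add-one smoothing."""
    bigrams, contexts = Counter(), Counter()
    vocab = {w for s in sentences for w in s} | {"<s>"}
    for s in sentences:
        prev = "<s>"
        for w in s:
            bigrams[(prev, w)] += 1
            contexts[prev] += 1
            prev = w
    total, n = 0.0, 0
    for s in sentences:
        prev = "<s>"
        for w in s:
            p = (bigrams[(prev, w)] + 1) / (contexts[prev] + len(vocab))
            total += -math.log2(p)
            n += 1
            prev = w
    return total / n

# The consistently-ordered language is cheaper to predict per word.
print(avg_bits_per_word(consistent) < avg_bits_per_word(inconsistent))  # → True
```

The effect here is of course far cruder than the results with LLMs, but it captures the underlying logic: stable structural regularities in the input are what make a language compressible, and hence learnable, for a statistical learner.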

8.5 Conclusion

Computational approaches have attracted considerable attention in recent years, and much debate continues over their application and implications for linguistics. Nonetheless, it is exciting that these approaches have permitted many analyses which were hitherto impossible, and it will be interesting to observe how this young field continues to develop and mature over time, through both technical and methodological improvements, as well as continued theoretical and philosophical discussion.