9 Linguistics in the 21st century

9.1 Introduction

As linguistics has developed in sophistication, it has also grown in breadth, and a more expansive set of methodologies has arisen as a result of investigation into increasingly diverse facets of the phenomenon of language. This trend has continued through the end of the twentieth century and into the beginning of the twenty-first. Many directions in linguistics research have emerged and grown as a result of broadening sources of data, an increasing emphasis on empiricism, and further applications of linguistics.

9.2 Broader sources of language data

It is interesting to note that linguistics has yet to exhaust all potential sources of data. Some such data come from languages or dialects that have only been recently documented, or from new archaeological evidence regarding historical language forms. The advent of technologies such as voice and video recording and online information sharing has also enabled linguisticians to work on high-quality raw data even if they are not personally collecting such data in the field.

A particularly noteworthy source of language data is sign languages. Historically, these were often considered to be inferior to spoken or written language, resulting in little work in data collection and analysis. Gradually, serious investigation into sign languages resulted in the observation that they are as robust, complex, and productive as spoken languages (Klima & Bellugi, 1979). Subsequent sign language linguistics has demonstrated that sign languages share many characteristics of spoken language, including multiple levels of organization (Stokoe, 1960), arbitrariness of signs (Johnston, 1989), prosodic information (Wilbur, 2000), and a critical age for learning (Mayberry, 1998). However, there are also ways in which sign languages raise unique and important questions in linguistics. For example, sign languages seem to exhibit grammatical similarities that are as yet unexplained (since many of them have developed independently) and simultaneously not shared by spoken languages (Sandler & Lillo-Martin, 2017); this finding may provide insight into the role of iconicity in language, as well as the nature of Universal Grammar. The notion of sign space and how it is used is also particularly interesting, since the manual modality permits simultaneity and motion dynamics to a much greater degree than spoken languages (e.g., Perniss, 2020; Wilcox & Martínez, 2020). Furthermore, new sign languages continue to appear in isolated village populations (Meir et al., 2010), providing case studies for investigations into language origins and development. Particularly interesting is Nicaraguan Sign Language, which emerged largely spontaneously among Nicaraguan deaf students in the 1980s (Senghas et al., 2005). More generally, the study of sign language provides linguisticians with an opportunity to study universal properties of language that are not dependent on the modality of transmission.

Another emerging source of novel language data is the Internet, and this medium has brought a variety of new perspectives in linguistics (Crystal, 2005). In particular, the Internet has introduced large-scale near-instantaneous connectivity among many spatially disparate groups of people, resulting in a host of phenomena specific to the Internet, including medium-specific codeswitching and multilingualism, rapid language change, Internet lingo and subculture jargon, stylistic diffusion, and metalinguistics around online language use (e.g., Androutsopoulos, 2011; Gawne & Vaughan, 2012; Thurlow, 2001). The variety of communication platforms (text messaging, online chat, social media, emails, blogs) has also resulted in varied and nuanced stylistics in different contexts (Crystal, 2011). Additionally, the ability to transmit more than pure text has resulted in new modes of communication such as typography, emoji, and memes, each with their own conventions and grammars (McCulloch, 2019). The accessibility of the Internet also means that such data are usable for large-scale corpus analyses, and can be studied in real time. These characteristics make Internet linguistics a promising direction for language research.

These novel types of language data thus provide an interesting contrast to current sources, and can thereby bring a new perspective to critical issues about the nature and use of language. In conjunction with greater quantities of language data (in well-documented and parsed corpora) and better qualities of language data (through improved recording techniques), such data permit linguisticians to conduct more thorough and comprehensive analyses to understand general language processes and phenomena.

9.3 Increasing empiricism

One significant contribution of the cognitive turn is the expansion of descriptive linguistics, since putative features of Universal Grammar have to (by definition) account for features of all extant human languages (Baker, 2021). This push toward data collection and analysis has continued and expanded beyond the generative tradition, resulting in an emerging set of methods that aim to ground linguistics in more quantitative metrics, as opposed to the qualitative descriptions that have typically undergirded formal linguistics. Some such methods include formal acceptability judgements, corpus analysis, and psycho- and neurolinguistics.

Acceptability judgements have formed an important bedrock for theoretical linguistics. However, these judgements have typically been informal in nature, originating either in the linguistician’s own introspection or through informal elicitation from a small number of informants; as such, they are susceptible to various biases and measurement errors (Juzek, 2015). Instead, such acceptability judgements can be formalised through the collection of explicit ratings from a larger group of informants that can be aggregated to reduce noise in the measurement (Myers, 2017; Sprouse, 2018). Studies using formal acceptability judgements have allowed for the discovery of subtler effects that may not have been found with more informal judgements (Myers, 2017), and have even challenged previous grammaticality claims (Linzen & Oseki, 2018). Crucially, formal acceptability judgements have highlighted the gradient nature of acceptability, rather than the binary notion of grammaticality typically assumed in theoretical linguistics (Juzek, 2015; Sprouse, 2007). This gradience is challenging for many theories of phonology, morphology, and syntax, and has encouraged the investigation of more probabilistic approaches toward grammar (e.g., Boersma & Hayes, 2001; Lau et al., 2017). More recently, data from formal acceptability judgements have also permitted finer-grained evaluations of computational models of language (e.g., Juzek, 2024; Tjuatja et al., 2024). This approach is limited by the fact that researchers have to pre-select items to test, and have to gather more informants; as such, it is more difficult to conduct formal judgements in field contexts where prior linguistic information and access to informants are less available. Nonetheless, formal acceptability judgements have been increasingly viewed as a key source of quantitative, psychometrically valid data, and their application across a greater range of languages will help linguisticians better understand nuances of grammaticality and acceptability.
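
As a rough illustration of how ratings from several informants might be aggregated, the following sketch standardises each informant's ratings (z-scores) and then averages them per sentence. The data, the 1-7 scale, and the function names are invented for illustration, and z-scoring is only one common normalisation choice; it is not claimed to be the procedure used in the studies cited above.

```python
from statistics import mean, stdev

# Hypothetical 1-7 acceptability ratings: ratings[informant][sentence_id].
ratings = {
    "p1": {"s1": 7, "s2": 4, "s3": 1},
    "p2": {"s1": 6, "s2": 5, "s3": 4},
    "p3": {"s1": 5, "s2": 3, "s3": 1},
}

def z_scores(by_item):
    """Standardise one informant's ratings to offset individual scale use.

    Assumes the informant did not give every item the same rating
    (otherwise the standard deviation would be zero).
    """
    values = list(by_item.values())
    mu, sigma = mean(values), stdev(values)
    return {item: (rating - mu) / sigma for item, rating in by_item.items()}

def aggregate(all_ratings):
    """Return the mean z-scored rating per sentence across informants."""
    normalised = [z_scores(by_item) for by_item in all_ratings.values()]
    items = normalised[0].keys()
    return {item: mean(z[item] for z in normalised) for item in items}

scores = aggregate(ratings)
for item, score in sorted(scores.items()):
    print(item, round(score, 2))
```

The aggregated scores form a gradient rather than a binary split: informants who use different parts of the scale (such as p2 above) can still agree on the relative ordering of the sentences.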

Another way to investigate language users’ intuitions about language is to directly analyse their productions. This approach avoids issues with language users having to use metalinguistic judgement (e.g., that acceptability judgements reflect norms; Haspelmath, 2020). Corpus analysis adopts this perspective to study distributional statistics across large bodies of collected text that reflect actual language use (Biber, 2006), which may include language of various origins (e.g., books, transcribed speech, web text; O’Keeffe & McCarthy, 2010). Corpora allow linguisticians to estimate the frequencies of occurrences or co-occurrences of different linguistic elements (Gries, 2009); the particular advantage of large corpora is that they allow for the detection of low-frequency constructions, as well as the comparison of the relative frequencies of different constructions (e.g., alternative forms). Corpus analysis has also supported functionalist linguistics, particularly in investigations of possible functional explanations (e.g., semantics) for the observed variation in forms (McEnery & Hardie, 2013). New and developing methods for data collection (e.g., web scraping; M. Davies, 2016–) and data annotation (e.g., Qi et al., 2020; Straka et al., 2016) will enable more sophisticated analyses over larger and more representative amounts of data, although work is still needed to diversify the languages and populations captured by such corpora (see Dunn & Adams, 2019).
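
The kind of frequency and co-occurrence estimate described above can be sketched in a few lines. The three-sentence corpus, the simple regular-expression tokeniser, and the helper names below are assumptions made purely for illustration; real corpus work relies on far larger collections and proper tokenisation and annotation tools.

```python
import re
from collections import Counter

# Toy corpus, invented for illustration only.
corpus = [
    "The dog chased the cat.",
    "The cat chased the mouse.",
    "A dog barked.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

unigrams = Counter()  # occurrences of single words
bigrams = Counter()   # co-occurrences of adjacent word pairs
for sentence in corpus:
    tokens = tokenize(sentence)
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

total_tokens = sum(unigrams.values())

def relative_frequency(word):
    """Share of all corpus tokens accounted for by the given word."""
    return unigrams[word] / total_tokens

print(unigrams.most_common(3))
print(bigrams[("chased", "the")])
print(round(relative_frequency("the"), 3))
```

Comparing relative frequencies rather than raw counts is what makes it possible to contrast alternative forms across corpora of different sizes.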

A third source of quantitative data about language use comes from the field of psycholinguistics and its sister field of neurolinguistics. These fields focus on the actual processing of language, and thus need to account for performance factors. Hence, psycholinguistics is concerned with issues such as language acquisition, language production, and language comprehension. This has resulted in a variety of methodological innovations, such as artificial language paradigms to investigate statistical learning (Saffran et al., 1996), priming and prediction to investigate top-down processing (Altmann & Kamide, 1999), lexical decision tasks to investigate lexical access (Fischler, 1977), and structural ambiguity to investigate online syntactic parsing (Frazier, 1978). Developmental linguistics has also drawn from experimental paradigms from developmental psychology, including paradigms that use looking time as a measure of attention (and thus novelty or familiarity) (Kuhl, 2004). Such research has also been supplemented with neurophysiological techniques, such as deficit–lesion correlation studies in language disorders (Geschwind, 1965; Luria, 1970), and functional neuroimaging to study the spatial and temporal organisation of language processing in the brain (Fedorenko et al., 2024; Friederici, 2002; Gwilliams et al., 2024; Hickok & Poeppel, 2007; Kutas & Hillyard, 1984; Shain, 2021). These techniques have illuminated the cognitive and neural mechanisms supporting language functions in language users, which also in turn constrain plausible theories of language.

Data from judgements, productions, and experiments have thus enriched the body of evidence that can be used to assess linguistic theory. The increasing emphasis on grounded, data-driven linguistics also represents a shift away from the more rationalist approach of early generativism (McEnery & Hardie, 2013)—a greater concern for language as it is actually used, as opposed to an idealised, abstracted notion of language. The expansion of techniques also allows for more multifaceted methodologies that aim toward convergent evidence, ensuring that linguistic theories are more robust and sophisticated.

9.4 Further applications of linguistics

Applied linguistics is an interdisciplinary domain aiming to mediate between theory and practice to address problems related to language (Buckingham & Eskey, 1980). Thus, applied linguisticians consider not only language and its use, but also contexts, social institutions, cultures, and worldviews regarding language (Rees-Miller, 2001). Clearly, there are many potential subfields that may fall within this domain, and it will only be possible to present a cursory overview of some of these research directions.

The earliest subfield characterised as applied linguistics is that of language teaching, typically as related to second language acquisition (since first languages are acquired naturally with little explicit instruction). A particularly important model (which remains dominant today) is communicative language teaching, which conceptualises language not just as an individual’s cognitive skill, but as a communicative system that relies on social use (recalling functionalism and Hymes’ notion of communicative competence; Savignon, 2001). This suggests that students learn language best through interaction, which can be implemented in the form of experiential activities and tasks requiring language use for a communicative purpose (Norris, 2011); as such, classes make use of authentic texts (i.e., not designed specifically for non-native speakers), and the target language is used almost exclusively. An important theoretical idea in this regard is that of comprehensible input, or input that is slightly above learners’ abilities but is nonetheless understandable (Krashen & Terrell, 1988). These concepts have resulted in more robust, integrative, and relevant language pedagogies, although there remain important questions about the need for direct grammar instruction and the efficacy of specific curricula.

Another application of linguistics relates to language planning and policy, which is concerned with efforts to modify language practices and beliefs within a community (Spolsky, 2012). Within this field, there are three key areas, namely status planning (about uses of language, e.g., national and official languages), corpus planning (about language itself, e.g., orthographic standardisation), and acquisition planning (about users of language, e.g., education policies; Cooper, 1989). While earlier research in this field focused on issues around language policy in newly independent post-colonial countries, later work began focusing on broader and more general issues, particularly drawing from critical theory (Tollefson, 2016). This latter approach considers the social, economic, and political effects of language planning, such as the linguistic imperialism of using English as a lingua franca (Linguistic Society of America, 2025; Phillipson, 1992) and the inequalities related to indigenous language revitalisation (Coronel-Molina & McCarty, 2016; Duchêne & Heller, 2007). This field has helped to shape and inform language planning both from the top down and from the bottom up, especially in areas with language minorities, and has also pushed for greater scrutiny of how the various levels of implementation of language policy affect the broader social environment.

Other applied fields include clinical linguistics (related to speech–language pathology and therapy; Crystal, 1981), legal linguistics (related to language use in written law, legal processes, and forensic evidence; Durant & Leung, 2015), and translation (Baker, 1997). Given that language is employed in communications across effectively all domains, it is unsurprising that such a range of applications is possible, and indeed will continue to expand as the relationships between language and other areas of life are examined. Applied linguistics has also raised new questions which have driven subsequent theoretical research (e.g., the concept of “nativeness” in language proficiency; A. Davies, 2007), and such bidirectional information flow enables applied and theoretical linguistics to benefit from each other’s research directions.

9.5 The future of linguistics

Given these exciting developments in linguistics, how will linguistics continue to evolve in the coming years? There are three important motivations in contemporary society that I believe will shape the linguistics of our time (see also Mansfield & Wilcox, 2025). The first is technology: Advancements in computing, artificial intelligence, and telecommunications continue to bring new challenges to the use of language across various media and platforms, and the rapid pace of research in fields like quantum computing and machine learning will inevitably affect the sphere of linguistics. The second is inequality: Greater recognition of social, political, and linguistic inequality continues to drive discourse about language use, multilingualism, language pedagogy, language attitudes, language policy, and language documentation. The third is methodology: The increasing emphasis on rigour in the social sciences will encourage linguistics to move towards greater empiricism, relying on large-scale statistical corpus analyses, experimental approaches, and better open science practices to improve the reliability of the data and theories within linguistics.

9.6 Conclusion

As linguistics moves further into the twenty-first century, it continues to grow in scope and breadth, as new evidence, ideas, methods, and research directions are incorporated into the field. This interdisciplinarity is a particular strength of linguistics, which recognises the nuanced and multifaceted nature of the phenomenon known as language. Such diversity also means that no individual dimension of language can be considered purely in isolation; rather, a holistic view requires multiple perspectives and approaches. Cross-domain work will also help to clarify the relationships between different subfields of linguistics, thereby improving our understanding of linguistics itself. There remains much more work to be done on language and on linguistics, and it will be exciting to see what the remainder of the twenty-first century has in store for the study of language.