research
Broadly, my research aims to understand variation in children’s early language environments and its effects on their language abilities. Below are a few key research areas and some representative projects.
characterising early language experience
What kinds of language inputs do children receive? How do they vary within and across children? I am interested in collecting naturalistic data from young children and analysing the distributions of such data.
- What are the actual audiovisual experiences of young children? How can we quantify and qualify their input “in the wild”?
- Long*, Xiang*, Stojanov*, Sparks, Yin, Keene, Tan, Feng, Zhuang, Marchman, Yamins, & Frank (2024). “The BabyView dataset: High-resolution egocentric videos of infants’ and young children’s everyday experiences.” arXiv preprint. [paper]
- Sparks, Long, Keene, Perez, Tan, Marchman, & Frank (2024). “Characterizing contextual variation in children’s preschool language environment using naturalistic egocentric videos.” CogSci Proceedings. [paper]
- Shared book reading is a particularly rich source of children’s language input. How do children’s books differ from child-directed speech?
- Dawson, Hsiao, Tan, Banerji, & Nation (2021). “Features of lexical richness in children’s books: Comparisons with child-directed speech.” Language Development Research. [paper]
- Tan, Read, Gamboa, Bang, & Marchman (in prep.). “The power of the page: Comparing richness in text and talk during book sharing with two-year-old children.”
early word learning across linguistic environments
What are the cross-linguistic patterns in word learning? How are early vocabularies shaped by experiences of multilingualism? I use statistical modelling to understand word- and child-level predictors of word learning.
- How do the different languages heard by a bilingual child affect their language learning?
- Tan, Marchman, & Frank (2024). “The role of translation equivalents in bilingual word learning.” Developmental Science. [paper]
- Tan & Frank (2024). “Syntactic category bias in early bilingual vocabularies.” Bay Area Developmental Symposium. [slides]
- Tan, Kachergis, Marchman, Frank, Mayor, et al. (in progress). “Exploring the relationship between language exposure and vocabulary in bilingual children.”
- What does word learning look like cross-linguistically? What are the consistencies and variations in early vocabulary across languages?
- Tan*, Loukatou*, Braginsky, Mankewitz, & Frank (2024). “Predicting ages of acquisition for children’s early vocabulary across 27 languages and dialects.” CogSci Proceedings. [paper]
- Tan*, Kachergis*, Marchman, Dale, & Frank (2023). “Measuring children’s early vocabulary in low-resource languages using a Swadesh-style word list.” CogSci Proceedings. [abs]
machine learning models as cognitive models
The process of language learning is difficult to model, but recent advances in machine learning have given rise to a potential approach requiring few inductive biases. Can we use machine learning models as plausible models of language learning in young children?
- How do we evaluate how closely a vision–language model’s behaviour matches human language development?
- Tan, Yu, Long, Ma, Murray, Silverman, Yeatman, & Frank (2024). “DevBench: A multimodal developmental benchmark for language learning.” NeurIPS Proceedings. [paper]
- Can we train vision–language models on naturalistic developmental training data?
- Tan, Hu, Long, & Frank (in progress). “Training vision–language models from the child’s perspective.”
open science, meta-science, and big team science
I believe that science is best advanced through information sharing and collaboration, and have worked on several open data repositories and large-scale collaborative endeavours.
- What does it look like to aggregate data from various contributors into a centralised open data repository?
- How do we leverage big team science to work on large-scale distributed projects?
- ManyBabies, a consortium of multi-lab replication efforts for key developmental science findings.
- The Replication Database, a community crowdsourced database for replication studies.