research

Broadly, my research aims to understand how children’s early language environments vary, and how that variation affects their language abilities. Below are a few key research areas and some representative projects.

characterising early language experience

What kinds of language input do children receive? How does it vary within and across children? I am interested in collecting naturalistic data from young children and analysing how these inputs are distributed.

  • What are the actual audiovisual experiences of young children? How can we quantify and characterise their input “in the wild”?
    • Long*, Xiang*, Stojanov*, Sparks, Yin, Keene, Tan, Feng, Zhuang, Marchman, Yamins, & Frank (2024). “The BabyView dataset: High-resolution egocentric videos of infants’ and young children’s everyday experiences.” arXiv preprint. [paper]
    • Sparks, Long, Keene, Perez, Tan, Marchman, & Frank (2024). “Characterizing contextual variation in children’s preschool language environment using naturalistic egocentric videos.” CogSci Proceedings. [paper]
  • Shared book reading is a particularly rich source of children’s language input. How do children’s books differ from child-directed speech?
    • Dawson, Hsiao, Tan, Banerji, & Nation (2021). “Features of lexical richness in children’s books: Comparisons with child-directed speech.” Language Development Research. [paper]
    • Tan, Read, Gamboa, Bang, & Marchman (in prep.). “The power of the page: Comparing richness in text and talk during book sharing with two-year-old children.”

early word learning across linguistic environments

What are the cross-linguistic patterns in word learning? How are early vocabularies shaped by experiences of multilingualism? I use statistical modelling to understand word- and child-level predictors of word learning.

  • How do the different languages heard by a bilingual child affect their language learning?
    • Tan, Marchman, & Frank (2024). “The role of translation equivalents in bilingual word learning.” Developmental Science. [paper]
    • Tan & Frank (2024). “Syntactic category bias in early bilingual vocabularies.” Bay Area Developmental Symposium. [slides]
    • Tan, Kachergis, Marchman, Frank, Mayor, et al. (in progress). “Exploring the relationship between language exposure and vocabulary in bilingual children.”
  • What does word learning look like cross-linguistically? What are the consistencies and variations in early vocabulary across languages?
    • Tan*, Loukatou*, Braginsky, Mankewitz, & Frank (2024). “Predicting ages of acquisition for children’s early vocabulary across 27 languages and dialects.” CogSci Proceedings. [paper]
    • Tan*, Kachergis*, Marchman, Dale, & Frank (2023). “Measuring children’s early vocabulary in low-resource languages using a Swadesh-style word list.” CogSci Proceedings. [abs]

machine learning models as cognitive models

The process of language learning is difficult to model, but recent advances in machine learning offer a potential approach that requires few built-in inductive biases. Can machine learning models serve as plausible models of language learning in young children?

  • How can we evaluate how closely a vision–language model’s learning resembles human language development?
    • Tan, Yu, Long, Ma, Murray, Silverman, Yeatman, & Frank (2024). “DevBench: A multimodal developmental benchmark for language learning.” NeurIPS Proceedings. [paper]
  • Can we train vision–language models on naturalistic developmental training data?
    • Tan, Hu, Long, & Frank (in progress). “Training vision–language models from the child’s perspective.”

open science, meta-science, and big team science

I believe that science is best advanced through information sharing and collaboration, and I have worked on several open data repositories and large-scale collaborative efforts.

  • What does it take to aggregate data from many contributors into a centralised open data repository?
    • Wordbank, a repository of child vocabulary data from Communicative Development Inventories.
    • Peekbank, a repository of child language processing data from looking-while-listening studies.
  • How do we leverage big team science to work on large-scale distributed projects?
    • ManyBabies, a consortium of multi-lab replication efforts for key developmental science findings.
    • The Replication Database, a crowdsourced community database of replication studies.