Bilingual Code-Switching Corpus

Corpus, Code-Switching, Data

Overview

Collected and annotated a large corpus of code-switched speech across multiple language pairs (Spanish-English, Mandarin-English, French-English). The corpus includes detailed linguistic annotations and demographic information.

Timeline: 2020-2023
Outcome: Publicly available corpus of 500+ hours of natural bilingual conversation

Access

Corpus available here