Bilingual Code-Switching Corpus
Corpus
Code-Switching
Data
Overview
Collected and annotated a large corpus of code-switched speech across three language pairs (Spanish-English, Mandarin-English, French-English). The corpus includes detailed linguistic annotations and demographic information.
Timeline: 2020-2023
Outcome: Publicly available corpus of 500+ hours of natural bilingual conversation