This visualization represents the CommonSenseQA dataset.
To clarify the performance disparity, we have grouped the questions based on model outcomes. The Blue Blocks represent the questions LLaMA-3 answers correctly in English.
This establishes a strong baseline: LLaMA-3 solves 78.0% of the questions.
When prompted in Hindi, the model struggles: it fails on a significant portion of the questions it previously solved in English.
The Red Band represents this "Performance Gap." These are not necessarily "harder" questions; they are simply questions where the model lacks the multilingual representation to map the concept from English to Hindi.
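The grouping above reduces to simple set arithmetic. The sketch below illustrates it with placeholder question IDs and outcomes; these are not the actual CommonSenseQA annotations.

```python
# Illustrative sketch: partition questions by per-language outcome.
# The two sets below are hypothetical placeholders.
correct_en = {"q1", "q2", "q3", "q4"}   # answered correctly in English (Blue)
correct_hi = {"q1", "q3"}               # answered correctly in Hindi

# The "Performance Gap" (Red Band): solved in English but failed in Hindi.
gap = correct_en - correct_hi

print(sorted(gap))  # → ['q2', 'q4']
```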
We generate synthetic "Hinglish" data using the CoCoa Model (Mondal et al., 2022).
Unlike standard translation, CoCoa allows us to enforce a specific Code-Mixing Index (CMI), ensuring a precise ratio of Hindi vocabulary within an English grammatical frame. This acts as a semantic bridge during fine-tuning.
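For concreteness, a commonly used per-sentence CMI definition (following Das and Gambäck, 2014) can be sketched as below; the tag scheme (`en`, `hi`, `other`) is an illustrative choice, not the paper's exact pipeline.

```python
from collections import Counter

def cmi(tags):
    """Per-sentence Code-Mixing Index.

    tags: one language tag per token, e.g. 'en', 'hi', or 'other'
    (language-independent). CMI = 100 * (1 - max_lang / (n - u)),
    where max_lang counts the dominant language's tokens, n is the
    total token count, and u counts language-independent tokens.
    """
    n = len(tags)
    lang_counts = Counter(t for t in tags if t != "other")
    u = n - sum(lang_counts.values())
    if n == u:  # no language-tagged tokens at all
        return 0.0
    return 100 * (1 - max(lang_counts.values()) / (n - u))

print(cmi(["en", "hi", "en", "hi"]))  # → 50.0 (maximally mixed)
print(cmi(["en", "en", "en"]))        # → 0.0  (monolingual)
```

A monolingual sentence scores 0, and an evenly mixed sentence scores the maximum of 50.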
We found CMI 2 optimal: it maintains the English sentence structure (Blue) while injecting dense Hindi tokens (Red).
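To make the "English frame, Hindi tokens" idea concrete, here is a deliberately naive code-mixing baseline (this is NOT the CoCoa model): keep the English sentence structure and swap a target fraction of words for Hindi translations from a toy dictionary. All dictionary entries and function names are illustrative.

```python
import random

# Toy English→Hindi lexicon for illustration only.
EN_TO_HI = {"water": "paani", "food": "khaana", "book": "kitaab"}

def mix(tokens, target_ratio, seed=0):
    """Replace up to target_ratio of the tokens with Hindi equivalents,
    leaving the English word order (the grammatical frame) intact."""
    rng = random.Random(seed)
    swappable = [i for i, t in enumerate(tokens) if t in EN_TO_HI]
    k = round(target_ratio * len(tokens))
    out = list(tokens)
    for i in rng.sample(swappable, min(k, len(swappable))):
        out[i] = EN_TO_HI[out[i]]
    return out

print(mix(["the", "book", "needs", "water"], target_ratio=0.5))
# → ['the', 'kitaab', 'needs', 'paani']
```

A real code-mixing model additionally handles morphology and word order; this sketch only shows what "a precise ratio of Hindi tokens in an English frame" means mechanically.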
Fine-tuning on this synthetic data forces the model to align its internal manifolds.
The results are visualized in Green. We recover the vast majority of the performance gap: Hindi accuracy jumps significantly. This indicates that we don't need massive native datasets to achieve equity; we need better geometric alignment.
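The gap-recovery claim is simple arithmetic. The sketch below uses the 78.0% English baseline from the text; the two Hindi accuracies are HYPOTHETICAL placeholders, since the text reports the recovery only qualitatively.

```python
# Gap-recovery arithmetic. Only acc_en comes from the text;
# the Hindi numbers are hypothetical, for illustration.
acc_en = 78.0            # English baseline (from the text)
acc_hi_before = 55.0     # hypothetical Hindi accuracy before fine-tuning
acc_hi_after = 73.0      # hypothetical Hindi accuracy after fine-tuning

gap = acc_en - acc_hi_before                      # the Red Band
recovered = (acc_hi_after - acc_hi_before) / gap  # the Green share
print(f"{recovered:.0%} of the gap recovered")
```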