MRL Benchmarks

Global PIQA Benchmark

Many languages lack culturally specific evaluation datasets created by members of the language community themselves. This year's shared task for the Multilingual Representation Learning (MRL) workshop invited contributors, for example researchers who are native speakers of a language other than English, to create a manually annotated physical commonsense reasoning evaluation dataset for their language(s). The format is similar to PIQA, a physical commonsense reasoning benchmark where each example consists of a prompt with two candidate completions ("solutions"). The result is Global PIQA, a collaboratively constructed multilingual physical reasoning benchmark with broad language coverage and culturally specific examples for different languages.
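As a rough illustration of the PIQA-style format described above, the Python sketch below models one example as a prompt with two candidate solutions and scores a chooser by accuracy. The class name `PiqaStyleExample`, its field names, the `accuracy` helper, and the two sample items are all hypothetical and invented for illustration; they are not taken from the released dataset, whose actual schema may differ.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class PiqaStyleExample:
    """One PIQA-style item (hypothetical field names, not the dataset schema)."""
    prompt: str                  # the situation or goal to be completed
    solutions: Tuple[str, str]   # exactly two candidate completions
    label: int                   # index (0 or 1) of the correct completion


def accuracy(
    examples: Sequence[PiqaStyleExample],
    choose: Callable[[str, Tuple[str, str]], int],
) -> float:
    """Fraction of examples for which `choose` returns the labeled index."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(1 for ex in examples if choose(ex.prompt, ex.solutions) == ex.label)
    return correct / len(examples)


# Invented sample items, for illustration only.
examples = [
    PiqaStyleExample(
        prompt="To keep soup warm on the table,",
        solutions=("cover the pot with a lid.", "leave the pot open by a window."),
        label=0,
    ),
    PiqaStyleExample(
        prompt="To dry wet shoes overnight,",
        solutions=("seal them in a plastic bag.", "stuff them with dry newspaper."),
        label=1,
    ),
]


def always_first(prompt: str, solutions: Tuple[str, str]) -> int:
    """Trivial baseline that always picks the first solution."""
    return 0


print(accuracy(examples, always_first))
```

With two candidates per example, a chooser that ignores the prompt is expected to land near 50% on a balanced set, which is the natural chance baseline for this format.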

All authors of accepted submissions had the option to be included on the resulting benchmark paper.

The shared task has concluded, but there is still an opportunity to contribute to Global PIQA! We are accepting submissions for any language or variety that is not currently represented in Global PIQA. We especially invite submissions for low-resource languages and non-prestige varieties. Fill out this form to register your interest in contributing.

News

October 29, 2025: Global PIQA v0.1 is out. Check out the dataset on Hugging Face or the preprint.

Important Dates

~~September 15, 2025: Submit data~~
~~October 1, 2025: Decision notification~~
November 9, 2025: MRL workshop at EMNLP 2025.
November 2025 through early 2026: Organizers will work with the authors to prepare the compiled dataset and benchmark paper for publication.

Global PIQA Benchmark

Global PIQA covers over 100 languages and cultures, shown on this map. These languages span five continents, 14 language families, and 23 writing systems.
