CulturaX: A High-Quality, Multilingual Dataset for LLMs – Data Analysis and Experiments
:::info Authors: (1) Thuat Nguyen, Dept. of Computer Science, University of Oregon, OR, USA; (2) Chien Van Nguyen, Dept. of Computer Science, University of Oregon, OR, USA; (3) Viet Dac Lai, Dept. of Computer Science, University of Oregon, OR, USA; (4) Hieu Man, Dept. of Computer Science, University of Oregon, OR, USA; (5) Nghia Trung …