2026-04-18 · data

RoIt-XMASA: Multi-Domain Multilingual Sentiment Analysis Dataset for Romanian and Italian

Andrei-Marius Avram, Aureliu Valentin Antonie, Cosmin-Mircea Croitoru, Vlad Andrei Muntean, Dumitru-Clementin Cercel

Key claim

XLM-R trained with the proposed adversarial framework reaches a 66.23% F1-score, outperforming the baseline.

RoIt-XMASA is a new multilingual sentiment analysis dataset of 36,000 labeled reviews in Italian and Romanian. The proposed adversarial training framework improves sentiment discrimination while keeping the learned representations language- and domain-invariant, reaching an F1-score of 66.23% with XLM-R.
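
For readers less familiar with the metric behind the headline number, here is a minimal sketch of an F1-score computed over sentiment labels. The summary does not say which averaging the paper uses, so macro averaging is an assumption here, and the function name `macro_f1` and the toy labels are purely illustrative.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            per_class.append(2 * precision * recall / (precision + recall))
        else:
            per_class.append(0.0)
    return sum(per_class) / len(per_class)


# Toy, made-up labels; not taken from RoIt-XMASA.
y_true = ["pos", "neg", "neu", "pos", "neg", "neu"]
y_pred = ["pos", "neg", "pos", "pos", "neu", "neu"]
score = macro_f1(y_true, y_pred)
```

Macro averaging weights every class equally, which matters when ratings cluster at the extremes; a weighted or micro average would give a different number on the same predictions.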

Novelty
7.5/10

The dataset and multi-target adversarial training framework represent a meaningful extension to existing sentiment analysis methods.
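
To make the idea of multi-target adversarial training concrete, here is a rough sketch of how such an objective is often combined. It assumes a gradient-reversal-style formulation: the shared encoder minimizes the sentiment loss while maximizing the losses of a language discriminator and a domain discriminator. The summary does not give the paper's exact loss, so the function names, the weight `lam`, the number of classes, and the probabilities below are all hypothetical.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])


def encoder_objective(sent_loss, lang_loss, dom_loss, lam=0.1):
    """Loss seen by the shared encoder (assumed formulation).

    Low sentiment loss is rewarded, and so are HIGH language and domain
    discriminator losses, which pushes the encoder toward features that
    carry sentiment but not language or domain identity.
    """
    return sent_loss - lam * (lang_loss + dom_loss)


def discriminator_objective(lang_loss, dom_loss):
    """Loss seen by the two adversarial heads, which try to classify correctly."""
    return lang_loss + dom_loss


# Made-up head outputs for a single review (class counts are assumptions).
sent_probs = [0.1, 0.2, 0.7]           # negative / neutral / positive
lang_probs = [0.5, 0.5]                # Romanian / Italian, discriminator at chance
dom_probs = [0.25, 0.25, 0.25, 0.25]   # four hypothetical domains

sent_loss = cross_entropy(sent_probs, 2)
lang_loss = cross_entropy(lang_probs, 0)
dom_loss = cross_entropy(dom_probs, 1)

total = encoder_objective(sent_loss, lang_loss, dom_loss, lam=0.1)
adv = discriminator_objective(lang_loss, dom_loss)
```

When both discriminators sit at chance, as in this toy case, the encoder's features reveal nothing about language or domain, which is the invariance the framework is described as aiming for.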

Reliability
7.0/10

The methodology is solid with clear evaluation metrics, though further validation may be needed.

Deep reliability assessment

The methodology supports the claim that the multi-target adversarial training framework improves sentiment classification performance across multilingual models. Generalization to languages or domains not covered in the dataset is not demonstrated, so any such claim may be overstated.

Reproducibility

Yes. The dataset is available on Hugging Face, and the paper provides detailed hyperparameters and methodological descriptions, although it does not explicitly state that the code is open-sourced.

Discussion questions

  1. How does the dataset's focus on only two languages (Italian and Romanian) affect the generalizability of the proposed method to other low-resource languages?
  2. What are the practical implications for deploying this sentiment analysis model in real-world applications, especially in terms of computational cost and resource requirements?
  3. What specific conditions or experiments could demonstrate that the multi-target adversarial training framework does not provide significant improvements over baseline models?

Key figure

Figure 1 illustrates the rating distribution across the RoIt-XMASA dataset, showing peaks at extreme ratings, which reflects user tendencies to review products they strongly like or dislike.

Read on arXiv →