French Hate Speech Annotations from Multiple Datasets

Name: French Hate Speech Annotations from Multiple Datasets
Creator: manueltonneau
Published: 2024-04-10T12:31:09
Keywords: Size category 10K<n<100K; Libraries: polars, mlcroissant, datasets, pandas; Modalities: text, tabular; Format: CSV; Task category: text classification; Language: fr (French); Region: us; Topics: social media, hate speech, text, audio

Description

The French Hate Speech Superset, published by manueltonneau, contains 18,071 posts, each annotated as hateful or not hateful. It merges the publicly available French hate speech datasets identified in a systematic survey conducted in early 2024. The dataset was last updated on October 29, 2024.

Use Cases

Train binary text classifiers to predict the hateful/not-hateful label from post content.
Benchmark model performance on a consolidated set of 18,071 French hate speech examples.
Analyze linguistic patterns and vocabulary associated with hateful annotations across merged datasets.
Fine-tune pre-trained language models such as CamemBERT for French hate speech detection.
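
As a rough illustration of the benchmarking use case, the sketch below computes precision, recall, and F1 for the hateful class from gold and predicted binary labels, using only the Python standard library. The function name `binary_prf` and the encoding 1 = hateful, 0 = not hateful are assumptions made for this example; the actual label encoding is not documented on this page.

```python
from typing import Sequence, Tuple


def binary_prf(
    gold: Sequence[int], pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the `positive` class.

    Assumes labels are encoded as integers (hypothetically 1 = hateful).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    # Fall back to 0.0 when a denominator is zero (no predictions or no positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1
```

Reporting metrics for the hateful class, rather than accuracy alone, is a common choice when the class balance is unknown, as is the case here.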

Strengths

Consolidates 18,071 annotated posts from multiple source datasets into a single corpus.
Built from a systematic survey of the French hate speech datasets that were publicly available in early 2024.

Limitations

Column names, data distributions, and class balance are not documented here.
Merging multiple datasets may introduce inconsistencies, because the source datasets may follow different annotation guidelines.
The temporal coverage and the geographic origin of the posts are unknown.
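
Because column names and class balance are not listed, a first step after obtaining the CSV might be to inspect both. The sketch below is one way to do that with the standard library. The helper `summarize_labels` and the column names `text` and `labels` are hypothetical, and the sample rows are toy placeholders rather than records from the dataset.

```python
import csv
import io
from collections import Counter
from typing import IO, List, Tuple


def summarize_labels(handle: IO[str], label_column: str) -> Tuple[List[str], Counter]:
    """Return the CSV column names and a count of each value in `label_column`."""
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])
    if label_column not in columns:
        raise KeyError(f"{label_column!r} not found; available columns: {columns}")
    # Label values are counted as strings, exactly as they appear in the file.
    counts = Counter(row[label_column] for row in reader)
    return columns, counts


# Toy placeholder rows (not taken from the dataset); column names are assumed.
sample = io.StringIO("text,labels\nexemple un,0\nexemple deux,1\nexemple trois,0\n")
columns, counts = summarize_labels(sample, "labels")
print(columns)
print(dict(counts))
```

With a real file, the `io.StringIO` object would be replaced by an open file handle, for example `open(path, newline="", encoding="utf-8")`, and the label column name would be taken from the printed header.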

Provenance

Source: Merge of multiple publicly available French hate speech datasets.
Collection Method: Preprocessing and merge of datasets identified via a systematic survey in early 2024.
Time Range: Not specified
Freshness: Last updated October 2024.
Geography: Not specified

Quality Score

Overall: D (38)
Description: 42
Source: 41
Reputation: 32
Access: 22

Community

Downloads: 41
Likes: 8
Views: 0

Dataset Info

Author: manueltonneau
Created: Apr 10, 2024
Updated: Oct 29, 2024
Last synced: Apr 8, 2026
