Multipurpose World News Headlines from 18 Major Media Sources
arff
Available on 1 platform
Description
News headlines are collected from 18 major media sources including Fox Business, Reuters, BBC, and The New York Times. A script checks for new headlines every 20 minutes, with data acquisition starting on March 21, 2020. The dataset creator intends to update it daily, subject to system availability.
Use Cases
Train headline classification models that draw on the variety of the 18 news sources.
Analyze media coverage trends over time based on the 20-minute collection intervals.
Perform sentiment analysis on news content from the listed publishers.
Study the framing of events across different media outlets.
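For the coverage-trend use case, one way to work with the 20-minute collection interval is to round each timestamp down to the start of its window and count headlines per source. This is a minimal sketch, not the creator's method. The dataset's columns are undocumented, so the (timestamp, source) row shape and the sample rows are assumptions made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 20 * 60  # matches the 20-minute polling interval


def window_start(ts: datetime) -> datetime:
    """Round a timezone-aware timestamp down to the start of its 20-minute window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def count_by_window(rows):
    """Count headlines per (window start, source) pair.

    `rows` is an iterable of (timestamp, source) tuples. Both fields are
    hypothetical: the real column names must be checked after download.
    """
    return Counter((window_start(ts), source) for ts, source in rows)


# Made-up sample rows, for illustration only.
sample = [
    (datetime(2020, 3, 21, 10, 5, tzinfo=timezone.utc), "Reuters"),
    (datetime(2020, 3, 21, 10, 19, tzinfo=timezone.utc), "Reuters"),
    (datetime(2020, 3, 21, 10, 21, tzinfo=timezone.utc), "BBC"),
]
counts = count_by_window(sample)
```

Because the script only checks every 20 minutes, a stored timestamp would most likely reflect when a headline was first seen rather than when it was published, so finer-grained windows are unlikely to add information.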
Strengths
Data is collected from 18 distinct and prominent media sources.
The collection start date is documented (March 21, 2020), which gives the time range a known lower bound.
The script is designed to check for updates at a high frequency (every 20 minutes).
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which makes it hard to assess suitability before download.
Last update date is unknown, so freshness is unverified.
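Since the file is distributed as ARFF, the missing column documentation and row count can be partly recovered from the file itself. The sketch below reads the `@attribute` declarations and counts the lines after `@data`. It is a simplified parser that assumes single-token attribute names and one row per line. The helper name `summarize_arff` and the sample content are hypothetical and do not reflect the real schema.

```python
def summarize_arff(lines):
    """Return (attributes, row_count) for ARFF text given as an iterable of lines.

    Handles the common case only: single-token attribute names,
    one data row per line, and '%' comment lines.
    """
    attributes = []
    row_count = 0
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            row_count += 1
        elif line.lower().startswith("@attribute"):
            parts = line.split(None, 2)  # keyword, name, type
            if len(parts) == 3:
                attributes.append((parts[1].strip("'\""), parts[2]))
        elif line.lower().startswith("@data"):
            in_data = True
    return attributes, row_count


# Made-up sample, not the real schema.
sample = """\
% illustrative header only
@relation headlines
@attribute source string
@attribute headline string
@data
'Reuters','Example headline one'
'BBC','Example headline two'
""".splitlines()

attrs, n_rows = summarize_arff(sample)
```

With a downloaded file, the same function can be given an open file handle, since a text file iterates line by line. A dedicated ARFF library would be more robust for quoted attribute names or sparse data.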
Provenance
Source
Headlines scraped from 18 listed websites including Foxbusiness.com, Reuters, and Bbc.com.
Collection Method
A script checks for new headlines every 20 minutes and adds them to a database.
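The creator's script is not shown here, so the following is only a sketch of how a "check, then add new headlines" step could work. It assumes an SQLite table with a uniqueness constraint on (source, headline), so that repeated checks do not store the same headline twice. The table layout and the function names `open_store` and `add_new_headlines` are hypothetical, and fetching headlines from the websites is left out.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a headline store. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS headlines ("
        " source TEXT NOT NULL,"
        " headline TEXT NOT NULL,"
        " first_seen TEXT NOT NULL,"
        " UNIQUE (source, headline))"
    )
    return conn


def add_new_headlines(conn, source, headlines, seen_at):
    """Insert headlines that are not stored yet and return how many were new."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO headlines (source, headline, first_seen)"
        " VALUES (?, ?, ?)",
        [(source, h, seen_at) for h in headlines],
    )
    conn.commit()
    # Rows skipped by OR IGNORE do not count as changes.
    return conn.total_changes - before


# Two simulated checks 20 minutes apart, with made-up headlines.
conn = open_store()
first = add_new_headlines(conn, "Reuters", ["A", "B"], "2020-03-21T10:00:00Z")
second = add_new_headlines(conn, "Reuters", ["B", "C"], "2020-03-21T10:20:00Z")
total = conn.execute("SELECT COUNT(*) FROM headlines").fetchone()[0]
```

In a setup like this, a scheduler such as cron, or a loop that sleeps for 20 minutes between runs, would call `add_new_headlines` once per source on each check.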
Time Range
Collection started on March 21, 2020 and is ongoing; no end date is given.
Freshness
The creator intends daily updates, but this is dependent on system availability.
License
The dataset is licensed under GPL 2, which may impose specific redistribution requirements.