News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
11,003 datasets
A collection of four annotated datasets submitted to ICWSM'25, created by Marc Riven Herrera and hosted on Harvard Dataverse. It covers political and non-political content from Philippine Facebook pages, providing insights into user engagement, sentiment, civility, and misinformation. The datasets support applications in social media analysis, political communication, and machine learning.
Bollywood movies dataset published on Kaggle. The dataset likely contains information about films from the Hindi-language film industry. Metadata is minimal; actual content requires verification after download.
A dataset titled 'movie_recommender.csv' sourced from Kaggle. The title suggests it contains data for building or testing movie recommendation systems. No further metadata is available to confirm its specific contents, size, or origin.
A dataset named 'wine_reviews' sourced from the OpenML platform. No information is available regarding its contents, size, or structure.
A dataset of customer reviews from Tokopedia, a major Indonesian e-commerce platform. The raw description indicates the data is organized into 7 categories, suggesting a multi-class structure. It was published on Kaggle, but details on volume, authorship, and update recency are unavailable.
review-chekpoints--2026-05-24--13263-13263 is a dataset published on Kaggle. The title suggests it likely contains data related to checkpoints or evaluation points for reviews, possibly for model training or assessment. The dataset's specific content, size, and origin require verification after download.
Experimental data evaluating the therapeutic potential of Bombax ceiba flower aqueous extract for alleviating cyclophosphamide-induced immunosuppression. The dataset was contributed by Wang, Liuping via Harvard Dataverse and was last updated in April 2026.
180,000 user reviews for health and fitness applications, including Strava, Calm, Nike, Adidas, Lose It!, Garmin, and Google Fit. The dataset is sourced from Kaggle and is intended for natural language processing tasks. The author, organization, and specific collection date are unknown.
A dataset concerning cardiovascular disease and blood pressure, published on Kaggle. The title suggests it may contain health metrics related to CVD. Specifics regarding its size, origin, and creation date are unknown.
Nguyetnga_podcast is a dataset published on Kaggle. The dataset likely contains audio files and associated metadata for a podcast series. The specific content, size, and collection details are not provided in the available metadata.
Hydrographic cast data from the 1990-1991 Columbia River Plume Study collected by NOAA NCEI. The dataset includes CTD measurements of temperature, salinity, pressure, and conductivity, along with dynamic height, to map the plume's extent and thickness off Oregon and Washington. Observations were taken from the R/V Wecoma research vessel during the fall season.
Meteorological observations of air temperature and pressure were recorded at multiple cane sites during the 1976-1977 Mirny-Dome C traverse in Antarctica. The data was collected by personnel involved in the traverse to aid in precise location determination. Records are archived by the Australian Antarctic Division and represent a snapshot from the late 1970s.
German IGY Tropical Sea Level Pressure data provides daily sea-level pressure measurements on a 5-degree latitude/longitude grid. It covers the global tropics from 25°S to 25°N. The dataset was created by SCIOPS and records conditions during the International Geophysical Year from June 1957 to December 1958.
Hourly barometric measurements were recorded at the Dutch Royal Magnetic and Meteorological Observatory in Batavia (Djakarta). The dataset covers a 79-year period from 1866 to 1944. Kevin Hamilton and Rolando Garcia keypunched the data in 1986 from original Observatory Yearbooks.
NMC forecast grids provide daily meteorological predictions on a 47x51 Northern Hemisphere polar-stereographic grid centered on the North Pole. The grids contain parameters like geopotential height, temperature, wind, and pressure across various tropospheric levels. Data from the SCIOPS organization is available for sporadic periods between November 1967 and December 1971.
Daily stock metrics for Facebook Inc. in seven columns, including opening price, closing price, and daily high. The data covers a period from 2012 to August 2020. It is shared under a CC0 1.0 license on the OpenML platform.
User reviews for popular Halloween costumes sold on Amazon as of November 2020. The dataset includes review text, titles, scores, publishing dates, and reviewer locations. It is provided under a CC0-1.0 license and is intended as an exercise for text preprocessing and feature extraction.
A dataset listing movies that were trending over time, sourced from Kaggle. The specific temporal coverage and data collection method are not detailed in the available metadata. The dataset likely contains information about movie popularity across different years.
A dataset of top-rated entertainment titles from IMDb, containing over 3 million data points. The dataset includes nested genre information and release years. It was sourced from Kaggle, but the original author, specific license, and last update date are unknown.
Movies data is a dataset hosted on Kaggle. The dataset's specific contents, size, and origin are not detailed in the available metadata. Further inspection after download is required to confirm its scope and structure.