News corpora, social media analysis, movie/music metadata, sports data, cultural datasets, misinformation
11,013 datasets
Slide decks and analysis produced for the 2025–26 refresh of London's Local Skills Improvement Plan (LSIP). These resources summarise sector-level evidence shared at stakeholder events and reflect data available at the time. The Greater London Authority created these materials, with additional slide decks added in December 2025.
Over 2.76 million Nepali news articles scraped from Baahrakhari and other sources, with cleaned category labels. The dataset was created by spandyie and was last updated in February 2026. It is provided in Parquet format compressed with Snappy.
Dresden's Infoportal Accessibility provides detailed information on public facilities regarding barrier-free access, toilets, technical aids, and special services for people with disabilities. The data is available in three languages (German, Czech, and English) and is published via a WFS service. Most records were collected through an INTERREG-funded project focused on the Bohemian-Saxon border area.
Manju Bhai e-Gadgets Sales likely contains product-level sales performance metrics for electronic gadgets, potentially from an e-commerce platform. Its author, organization, and temporal coverage are unknown.
Trending movies according to votes, sourced from Kaggle. The dataset likely contains movie titles and associated vote counts or popularity metrics. Metadata is minimal; the specific columns, time range, and data collection method are unknown.
Authoritative National Park Service data defining the location of physical and cultural features within Great Smoky Mountains National Park. The database holds federally recognized names, geographic coordinates, and attributes such as feature classification and historical information. The data is published by the Department of the Interior and was last updated in March 2026.
Paraphrase pairs of tweets for text-to-text semantic similarity classification. The dataset is part of the Massive Text Embedding Benchmark (MTEB) and is intended for evaluating embedding models. The row count and column details are not provided in the available metadata.
World Bank data measures travel services as a percentage of total service exports for national economies. The indicator quantifies the contribution of nonresident travelers' expenditures in the reporting economy within the Balance of Payments framework. It is compiled by the World Bank's World Development Indicators team.
Insurance and financial services data measures the share of these services in total service exports for national economies. The dataset is part of the World Bank's World Development Indicators, a collection of global development data. It covers transactions between residents and non-residents for insurance, financial intermediary, and auxiliary services.
Travel services as a percentage of total service imports in national balance of payments accounts. This World Development Indicators metric quantifies the weight of residents' travel expenditures abroad within a country's service imports. The World Bank compiles this data for global economic monitoring.
World Development Indicators data measures insurance and financial services imports as a percentage of total service imports for countries. The dataset quantifies the share of cross-border financial and insurance transactions within a nation's broader service import economy. It is compiled by the World Bank's World Development Indicators team.
A collection of Khmer-language news articles scraped from multiple online news websites for academic and research purposes related to Khmer OCR and natural language processing. It was created by Thareah and last updated in February 2026. The dataset consists of extracted textual content without images or structured metadata.
A free sample of restaurant market data from BeamStation, focusing on a subset of highly active users. The dataset likely contains information related to restaurant reviews and market performance. Its specific size, features, and collection date are not detailed in the provided metadata.
TMDB Top-Rated Movies Dataset features movie ratings. The dataset is sourced from The Movie Database (TMDB) platform and aggregated on Kaggle. Its last update date and specific size are unknown.
Movie Dataset is a collection of film-related data published on the Kaggle platform. The dataset's specific contents, such as titles, genres, ratings, or cast information, are not detailed in the available metadata. Its size, structure, and creation details are unknown and require verification after download.
Kaggle hosts a dataset titled 'fake-review-dataset-clean-v2'. The dataset likely contains text data related to online product reviews, potentially with labels indicating authenticity. The author, organization, and specific collection details are not provided in the available metadata.
Kaggle hosts a dataset of over 1,900 movies and TV shows sourced from IMDb. The specific collection date, author, and detailed column information are not provided in the available metadata. Its content likely includes titles and associated metadata typical of IMDb listings.
TikTok post data is a dataset sourced from the social media platform TikTok. It was published on Kaggle, but the author, organization, and specific collection details are unknown. The dataset's size, row count, and specific column structure are not provided in the available metadata.
Ulasan Film Indonesia ("Indonesian Film Reviews") is a dataset of Indonesian-language film reviews published on Hugging Face by Faisaljabir. The dataset likely contains user-generated text about movies. It was last updated on 2026-04-05.
Synthetic binary blobs for measuring sequential write throughput latency, released by micmicmicmicmicchan in March 2026. The data is formatted in Parquet without Snappy compression to facilitate infrastructure verification and performance benchmarking.