Image classification, object detection, segmentation, face recognition, OCR, image generation, video understanding
15,409 datasets
Anadolu OCR Corpus is an OpenCR export of OCR text and metadata for 52 historical PDF sources in Ottoman Turkish, Turkish, and Arabic. The dataset is provided in two Hugging Face configs: 'pages' and 'documents'. It was authored by fatihburakkaragoz and last updated on 2026-05-12.
2011 figures on the business operations of the Dutch central government (Rijk), published by the Ministry of the Interior and Kingdom Relations. The report covers topics such as organization, staff, housing, ICT, facilities, procurement, and information management. It is an annual publication released under a CC0-1.0 license.
The Government and Municipalities of Québec provide a list of categories of recyclable, organic, and garbage materials that are accepted or prohibited for collection. The dataset is available in XML, CSV, and JSON formats and was last updated on April 17, 2026. It is licensed under CC-BY-4.0.
Québec's distributed water analyses for organic and inorganic substances are provided by the Government and Municipalities of Québec. The dataset is available in XML, CSV, and JSON formats under a CC-BY-4.0 license. It was last updated on April 17, 2026.
A directory of organizations that serve the population of Saguenay. The dataset is provided by the Government and Municipalities of Québec and is available in CSV and PDF formats. It was last updated on April 17, 2026.
The Government of Ontario provides a list of recipients of its multi-year funding program for recognized sports organizations. The dataset is available in CSV and HTML formats and was last updated on April 17, 2026. It is licensed under the Open Government Licence - Canada (OGL-CA-2.0).
The Government of Nova Scotia provides per-service-unit tonnage data for waste, recycling, and organics collected through curbside services. The dataset is available in multiple formats, including CSV, XML, and HTML. It was last updated on April 17, 2026.
This dataset documents human factors at 170 Ramsar wetland sites in the UK and its territories. It categorizes factors such as conservation measures, land use, and ecosystem services, noting their impact and scope. The data originates from Ramsar Information Sheets compiled by the Joint Nature Conservation Committee, with most records dating from 1998.
BC Public Service Workforce Profiles provides perspective on representation of equity groups within the B.C. Public Service and all of its ministries and organizations. The dataset is published by the Government of British Columbia and was last updated on April 17, 2026. It is available in CSV and HTML formats under the OGL-CA-2.0 license.
Archived 2026-04-07, this dataset contains the text of the Geneva Act of the Hague Agreement, a multilateral intellectual property treaty. It was published by Global Affairs Canada for research and recordkeeping purposes. The content is an archived, unaltered publication concerning the international registration system for industrial designs.
SenseNova-SI-8M contains approximately 8.16 million training samples spanning about 2.72 million unique images. It is the official full-scale training dataset for the SenseNova-SI series, used to train the SenseNova-SI-1.1-InternVL3-8B model. The dataset was created by sensenova and last updated on May 13, 2026.
Laura Nieto Torrejón's research links team-level tactical playing styles to injury rates in elite European football. The dataset likely contains match performance indicators from Wyscout and injury records, analyzed via Principal Component Analysis and mixed-effects modeling. It was last updated on 2026-05-18.
Gautam Basnet provides HEC-RAS hydraulic model files, geometry and terrain-related files, and selected model output files for reconstructing historical ice-jam volumes. The dataset focuses on the Athabasca River near Fort McMurray, Canada. A summary document describes the contents and organization of the dataset.
BWP is a multiplexed, wireless, and non-invasive platform for monitoring contraction forces of 3D cardiac organoids and engineered heart tissues. The dataset is 6.9 MB in size, authored by Chi Cong Nguyen, and was last updated on 2026-04-26. It is shared under a CC-BY-4.0 license on the figshare platform.
A paired-image dataset where each source photograph is re-rendered into four distinct artistic styles by the same model. The dataset, created by yufan, was last updated on Hugging Face in May 2026. It serves as a one-to-many style transfer benchmark with consistent composition across styles.
29.6 KB of survey data on digital technology adoption and organizational readiness. The dataset, authored by Rini Kurnia Sari and shared under a CC-BY-4.0 license, was last updated on May 10, 2026. Its small size suggests a focused survey or pilot study.
A research dataset file on MSME performance. The dataset was authored by Rini Kurnia Sari and last updated on May 10, 2026. It is a 29.6 KB XLSX file shared under a CC-BY-4.0 license on figshare.
Synthetic chat conversations between a Socratic tutor and a learner. The dataset was created by 'breitburg' and was last updated on the Hugging Face platform on 2026-05-19. The data is stored in JSONL format, with one conversation per line.
Shipborne data from the R/V Melville cruise MV1102 investigates the impact of whitecaps on satellite-derived ocean color and aerosol measurements. The dataset was collected during a transect from Cape Town, South Africa, to Valparaiso, Chile, from February 2 to March 14, 2011. It includes measurements of whitecap coverage, surface reflectance, aerosol optical thickness, and in-situ profiles of marine optical properties to develop a bulk whitecap reflectance model.
Arabidopsis seedlings grown from seed for 12 days on the International Space Station provide a unique proteomic record of plant adaptation to microgravity. The National Aeronautics and Space Administration preserved leaf and root samples in RNAlater on orbit and analyzed them using iTRAQ broad-scale proteomics procedures. This dataset captures organ-specific protein expression changes resulting from spaceflight environmental stressors.