Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
42,058 datasets
Maria Atif's mixed-methods study provides evidence on patterns and drivers of Cesarean sections in Pakistan. The dataset includes quantitative data from 605 women who underwent C-sections in public, private, or semi-private facilities and qualitative insights from stakeholder perceptions. The data was last updated on 2026-05-14 and is shared under a CC-BY-4.0 license.
A study by Bohye Jeong investigates the role of Musashi protein paralogs MSI1 and MSI2 in photoreceptor-specific alternative splicing. The dataset includes raw ERG data and R code supporting the analysis of splicing in the Cc2d2a, Cep290, Prom1, and Ttc8 genes across combined Msi1 and Msi2 knockout models. The data was last updated on May 14, 2026, and is shared under a CC-BY-4.0 license.
393.8 KB of immunofluorescence data from Bohye Jeong, last updated May 14, 2026. The dataset supports a study on the role of Musashi1 and Musashi2 proteins in regulating photoreceptor-specific splicing. It contains data from combined Msi1 and Msi2 knockout models used to analyze exon inclusion in genes Cc2d2a, Cep290, Prom1, and Ttc8.
5.5 KB of statistical test results from a study applying persistent homology to analyze abstract paintings. The dataset, authored by Emil Dmitruk and shared on figshare under CC-BY-4.0, compares two sets of images based on viewer eye tracking, brain activity, and subjective experience. It was last updated on May 14, 2026.
Monthly averages of mean temperature, temperature range, precipitation, rain days, and sunshine hours are provided for the terrestrial surface of the globe. The data is gridded at a 0.5-degree longitude/latitude resolution and represents a 30-year climatology from 1930 to 1960. It was generated from a large database using a partial thin-plate splining algorithm.
Evaluation metrics for the SleepDepNet model, a transformer-based multi-task learning framework for analyzing sleep quality and depressive sentiment from Reddit text. The dataset, authored by Akshi Kumar and last updated on 2026-05-07, contains performance scores including F1-scores of 0.89 for sleep quality classification and 0.86 for depressive sentiment analysis. It is stored in an XLS file with a size of 5.5 KB.
A 5.5 KB dataset on figshare, authored by Akshi Kumar and last updated May 7, 2026, under a CC-BY-4.0 license. It contains performance comparison results for the SleepDepNet multi-task learning framework, which models sleep quality and depressive sentiment from user-generated Reddit text. The dataset likely includes experimental results such as F1-scores of 0.89 and 0.86 for the model's classification tasks.
5.5 KB of performance metrics for the SleepDepNet ablation study. The dataset, authored by Akshi Kumar and last updated on 2026-05-07, contains experimental results from a transformer-based multi-task learning framework analyzing Reddit text for sleep quality and depressive sentiment. It includes F1-scores of 0.89 for sleep quality classification and 0.86 for depressive sentiment analysis.
Akshi Kumar's 2026 dataset contains evaluation metrics for the SleepDepNet model, a multi-task learning framework analyzing user-generated text. The dataset, stored in an XLS file of 5.5 KB, includes performance scores for classifying sleep quality and depressive sentiment from Reddit posts. Experimental results reported include an F1-score of 0.89 for sleep quality and 0.86 for depressive sentiment analysis.
A dataset supporting the SleepDepNet multi-task learning framework, introduced by Akshi Kumar and last updated on 2026-05-07. The data consists of user-generated text collected from Reddit communities related to sleep and mental health. It is used to model the relationship between sleep quality and depressive sentiment.
A total of 1,965 Chinese college students participated in a cross-sectional study during COVID-19 campus lockdowns. The dataset contains survey results exploring associations between psychological distress, lifestyle, career planning, and health-related quality of life. Data was collected via an online questionnaire platform using snowball sampling and analyzed by Baochen Su.
76 participants from Malawi's Salima and Chiradzulu districts were interviewed between October and December 2023. This qualitative dataset contains summarized themes and representative quotes from 16 in-depth interviews and six focus group discussions with traditional healers, religious leaders, caregivers, and persons with lived experience. The data explores community perspectives, treatment-seeking practices, and pathways for psychosis management.
Hua Song compiled over 39,000 Web of Science publications and nearly 10,000 patent records from 2016 to 2025. The data covers four Chinese cities—Wuhan, Chengdu, Hangzhou, and Tianjin—and four high-tech domains: AI, fiber-optic communication, intelligent connected vehicles, and storage chips. The dataset was last updated on 2026-05-14.
China's high-tech innovation landscape is analyzed through over 39,000 Web of Science publications and nearly 10,000 patent records from 2016 to 2025. The data covers Wuhan, Chengdu, Hangzhou, and Tianjin across AI, fiber-optic communication, intelligent vehicles, and storage chips. Author Hua Song compiled this dataset, last updated in May 2026, using bibliometric analysis and LLM-assisted semantic interpretation.
World Bank data on energy production, use, dependency, and efficiency for Japan, compiled from the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset addresses the sustainability of global energy trends amidst economic growth and industrialization. It was last updated on 2026-04-28.
An anonymized survey dataset from the 'Our Big Conversation' consultation run in 2020. It contains raw responses from residents on how life changed due to the COVID-19 pandemic and what is needed for recovery. The data was collected via an online survey and a paper survey in the June 2020 edition of 'Our City' and is published by the Government Digital Service under the OGL-UK-3.0 license.
World Bank data compiled from the International Energy Agency and the Carbon Dioxide Information Analysis Center. It contains indicators on energy production, use, dependency, and efficiency for India, reflecting trends in the world economy and industrialization. The dataset was last updated on 2026-04-28.
The Australian Ocean Data Network provides an inventory of descriptive attributes for the Eromanga Basin groundwater system. The dataset covers over 1,250,000 square kilometres in central and eastern Australia and includes themes such as location, demographics, geology, hydrogeology, and land use. It was last updated on 2026-05-05.
A 2026 dataset from figshare by Rui Shi details the development of a selective RARα antagonist for male contraception. It includes information on the discovery of compound 23, a highly potent and selective inhibitor with an IC50 of 0.051 nM and >1650-fold selectivity over RARβ. The data covers the compound's ADMET properties, oral bioavailability, and contraceptive efficacy in mice.
300.6 KB of data on benzopyran-, benzofuran-, and benzothiophene-derived RARα inhibitors for male contraception. The dataset includes SAR studies leading to compound 23, a highly potent and selective antagonist with an IC50 of 0.051 nM, published by Rui Shi on figshare in May 2026. Compound 23 is described as orally bioavailable and effective in reducing sperm counts in mice.