Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,721 datasets
An inventory of information generated and managed by Colombia's National Natural Parks for public knowledge. The dataset includes columns for document series, titles, descriptions, formats, and languages. It is published via the www.datos.gov.co platform and was last updated on 2026-05-18.
Government records list tourist residence establishments holding valid registration certificates under the Tourist Accommodation Act. Data originates from the Quebec Tourism Information System (SIT Quebec) and is published in CSV, XML, HTML, and JSON formats. The dataset provides a snapshot of certified operators at the time of file publication.
100 episodes of robot manipulation data, totaling 32,208 frames, collected in the Isaac Sim 5.1 simulation environment. The dataset was generated by the CoRL2026-CSI team using a Code-as-Policies replay pipeline for the task of pulling a cube to a target marker. Each frame includes RGB camera views and per-frame natural-language skill labels generated by the Gemini model.
A research document analyzing TP53 gene mutations at the His179 locus in Non-Small Cell Lung Cancer (NSCLC) subtypes. The study evaluated mutational profiles from 616 lung adenocarcinoma (LUAD) and 544 lung squamous cell carcinoma (LUSC) individuals from TCGA, using molecular dynamics simulations to assess structural and functional ramifications. The document was authored by Ankur Datta and last updated on 2026-04-10.
A 2026 study by Ankur Datta analyzes TP53 gene mutations at the His179 zinc-binding site in 1,160 Non-Small Cell Lung Cancer (NSCLC) patients from The Cancer Genome Atlas (TCGA). The research, published on figshare, uses molecular dynamics simulations to characterize the structural and functional impact of five specific amino acid substitutions (Y/R/N/L/D). Findings indicate these mutations compromise protein stability and alter the energy landscape, highlighting their pathogenic potential.
TP53 gene mutations were present in 50% of lung adenocarcinoma and 81% of lung squamous cell carcinoma cases analyzed. This dataset contains a research document analyzing the structural and functional impact of specific mutations at the His179 residue in the p53 protein's zinc-binding motif, using data from The Cancer Genome Atlas and molecular dynamics simulations. The document was authored by Ankur Datta and last updated in April 2026.
TP53 gene mutations were present in 50% of lung adenocarcinoma and 81% of lung squamous cell carcinoma cases analyzed. This dataset contains a research document analyzing the structural and functional impact of specific mutations at the His179 site in the p53 zinc-binding motif across 1,160 individuals from TCGA. The study by Ankur Datta, last updated in April 2026, used molecular dynamics simulations to assess protein stability and conformational changes.
616 lung adenocarcinoma (LUAD) and 544 lung squamous cell carcinoma (LUSC) patient mutational profiles from TCGA were analyzed by Ankur Datta, with results last updated in April 2026. The study focuses on specific mutations at the His179 residue in the TP53 zinc-binding motif, evaluating their structural impact via molecular dynamics simulations. It reports TP53 mutations in 50% of LUAD and 81% of LUSC cases, with C>A as the predominant substitution.
A 2026 study by Ankur Datta analyzes mutational profiles from 1,160 individuals (616 LUAD and 544 LUSC) sourced from TCGA. The research focuses on specific TP53 H179 zinc-binding motif variants in Non-Small Cell Lung Cancer, using molecular dynamics simulations to assess structural and functional impacts. The document details mutation frequencies, conformational signatures, and binding affinity changes for five identified amino acid substitutions.
Ankur Datta's research data sheet analyzes TP53 gene mutations at the His179 zinc-binding site in Non-Small Cell Lung Cancer. The study evaluates mutational profiles from 616 lung adenocarcinoma and 544 lung squamous cell carcinoma individuals in TCGA. The document, last updated in April 2026, details structural and functional ramifications of specific H179 substitutions using molecular dynamics simulations.
A 2026 study by Ankur Datta analyzes TP53 mutations in 1,160 individuals from TCGA, comprising 616 lung adenocarcinoma and 544 lung squamous cell carcinoma cases. The research focuses on structural and functional ramifications of specific mutations at the His179 zinc-binding site using molecular dynamics simulations. The document, licensed CC-BY-4.0, is a 1.1 MB DOCX file.
TP53 gene mutations were present in 50% of lung adenocarcinoma and 81% of lung squamous cell carcinoma cases from The Cancer Genome Atlas. This research document details the structural and functional ramifications of specific His179 zinc-binding motif mutations in the p53 protein, analyzed via molecular dynamics simulations. The dataset, authored by Ankur Datta and last updated in April 2026, is an 822.2 KB document shared under a CC-BY-4.0 license.
Ankur Datta's research data sheet analyzes TP53 mutations in Non-Small Cell Lung Cancer (NSCLC). The study evaluates mutational profiles from 616 lung adenocarcinoma and 544 lung squamous cell carcinoma individuals from TCGA, focusing on structural perturbations at the His179 zinc-binding site. The document, last updated in April 2026, details results from static structural analysis and molecular dynamics simulations.
A 2026 study by Ankur Datta analyzes TP53 gene mutations at the His179 zinc-binding site in 1,160 Non-Small Cell Lung Cancer (NSCLC) patients from TCGA. The research, published on figshare, combines genomic profiling with molecular dynamics simulations to assess structural and functional impacts of specific amino acid substitutions. The dataset includes results on mutation prevalence, conformational signatures, and binding affinity changes for five H179 variants.
TP53 gene mutations were present in 50% of lung adenocarcinoma and 81% of lung squamous cell carcinoma cases from The Cancer Genome Atlas. This 2.0 MB document by Ankur Datta, last updated April 2026, details a study using molecular dynamics simulations to analyze structural and functional impacts of specific mutations at the His179 zinc-binding site of the p53 protein.
OpenThoughts-Agent-RL-5K is a set of 5,000 reinforcement-learning tasks used to RL-finetune an initial SFT model into a final agentic checkpoint. The dataset, released by the open-thoughts organization, holds executable agentic tasks, unlike SFT datasets, which contain full task-trajectory pairs. It was last updated on June 9, 2026.
A registry of public information assets generated, obtained, or controlled by obligated entities, specifically an insurance company. The dataset includes columns for language, location, storage medium, category name, content description, format, and publication status. It was last updated on 2026-05-18 and is provided by datos.gov.co.
The Australian Ocean Data Network provides a geomorphic classification of shelf features surrounding Lord Howe Island and Balls Pyramid. This dataset includes information on the size, extent, and type of features such as submerged fossil reefs, ridges, and sandy basins. The classification was visually interpreted and digitized in ArcGIS v10.1, extending upon prior work by Linklater et al. (2015).
35,996 clinical text samples in Spanish, averaging about 700 tokens each, form the largest publicly available corpus for clinical NLP research. The dataset aggregates texts from diverse open sources including medical journals, annotated corpora from shared tasks, and supplementary materials. It was created by IIC and last updated on the Hugging Face platform in May 2026.
Generated artifacts for the VANTAGE research project on speculative decoding for code editing. The dataset stores repository-relative paths and artifacts used by the paper and summarization scripts. It was created by faizancodes and last updated on June 2, 2026.