Legislative text, court decisions, regulatory filings, patents, government contracts, election data
9,684 datasets
A dataset created by Anthropic, last updated on June 6, 2024, for evaluating language models. It contains questions designed to assess models' ability to handle election-related information accurately and harmlessly. The dataset is structured across three CSV files, each focusing on a specific aspect of election-related evaluations.
A 2024 dataset from RegLab contains queries, raw LLM outputs, and correct responses analyzed in Dahl et al., 'Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models'. The dataset likely supports research into the accuracy and reliability of large language models in legal contexts. It is a public subset, with additional queries held in a separate reserve file.
A processed version of the millawell/wikipedia_field_of_science dataset, prepared for retrieval-augmented generation systems with limited context windows. The author Laz4rz split longer Wikipedia pages into smaller entries, with each chunk targeting approximately 512 tokens and the page title added as a prefix. The dataset was last updated on June 12, 2024.
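The chunking step described for this dataset (and for the ~256-token variant by the same author) can be sketched as follows. This is a minimal illustration, not the author's script. The catalog does not say which tokenizer was used or how the title prefix is formatted, so this sketch makes two assumptions: it counts whitespace-separated words as tokens, and it uses a hypothetical `"Title\n"` prefix.

```python
def chunk_page(title: str, text: str, max_tokens: int = 512) -> list[str]:
    """Split one page into chunks of at most max_tokens whitespace tokens,
    each prefixed with the page title (hypothetical "Title\\n" format).
    Whitespace splitting only approximates a real model tokenizer."""
    tokens = text.split()
    prefix_len = len(title.split())
    # Reserve room for the title prefix so every chunk stays within max_tokens.
    budget = max(1, max_tokens - prefix_len)
    chunks = []
    for start in range(0, len(tokens), budget):
        body = " ".join(tokens[start:start + budget])
        chunks.append(f"{title}\n{body}")
    return chunks


if __name__ == "__main__":
    # A 1,200-word page with a one-word title yields three chunks (511 + 511 + 178 words).
    print(len(chunk_page("Entropy", "word " * 1200, max_tokens=512)))
```

Prefixing every chunk with the title keeps each retrieved passage self-describing, which matters when a small context window holds only one or two chunks.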
The National Consumer Complaint Database contains reports on household goods, safety violations, hazardous materials, cargo tanks, and passenger complaints submitted to the Department of Transportation. It is a web-based system used by the public and FMCSA staff to file complaints, supporting the development of cargo security regulations. The dataset was last updated on June 26, 2024.
A processed version of the millawell/wikipedia_field_of_science dataset, prepared for retrieval-augmented generation systems with limited context windows. The dataset was created by user Laz4rz and last updated on Hugging Face on June 12, 2024. Longer Wikipedia science articles have been split into smaller entries, with each chunk designed to be around 256 tokens.
Vietnamese Enterprise Law Qa is a dataset uploaded to Hugging Face by AlexNgV. The dataset was last updated on July 18, 2024. Its specific content and structure require verification after download.
22,600 statutory articles from Belgian law and 1,100 legal questions posed by citizens in French. Each question is manually labeled by experienced jurists to link it with relevant articles from the corpus for legal information retrieval tasks.
Public procurement plans, estimates, and financial statements from the Prosecutor's Office of the Sumy region in Ukraine. The dataset was published on the States site of Ukraine and was last updated on June 11, 2024. Available file formats include Excel, PDF, and Word documents.
Over 800,000 French law articles, including codes, laws, decrees, and orders, are compiled from France's LEGI dataset. Harvard Law School's Library Innovation Lab curated this collection, focusing on currently applicable legislation. A significant portion includes English translations generated by GPT-4, provided by Casetext, a Thomson Reuters company.
Venezuelan national-level electoral processes from 1998 to 2012 are analyzed using statistical forensic tools. The dataset, created by author Raúl Jiménez and last updated in May 2024, applies methods like second-digit Benford's law and vote distribution models to assess electoral integrity. The analysis focuses on detecting anomalies and irregular variations in the electoral roll, particularly from 2004 onward.
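Second-digit Benford analysis compares the observed frequency of the second digit of vote counts with the expected frequency P(d) = Σ_{k=1..9} log10(1 + 1/(10k + d)) for d = 0..9. The sketch below computes that expected distribution and a Pearson chi-square statistic against it. The catalog does not say which test statistic the author used, so the chi-square test here is an illustrative choice. The function names are hypothetical.

```python
import math
from collections import Counter


def benford_second_digit_probs() -> list[float]:
    """Expected frequency of second digit d (0-9) under Benford's law:
    P(d) = sum over k=1..9 of log10(1 + 1/(10k + d))."""
    return [
        sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    ]


def second_digit(n: int) -> int:
    """Second decimal digit of a non-negative integer with at least two digits."""
    return int(str(n)[1])


def chi_square_second_digit(vote_counts: list[int]) -> float:
    """Pearson chi-square of observed second digits against Benford's law.
    Counts below 10 have no second digit and are skipped."""
    digits = [second_digit(n) for n in vote_counts if n >= 10]
    if not digits:
        raise ValueError("need at least one vote count >= 10")
    observed = Counter(digits)
    total = len(digits)
    expected = benford_second_digit_probs()
    return sum(
        (observed.get(d, 0) - total * p) ** 2 / (total * p)
        for d, p in enumerate(expected)
    )
```

With 10 digit categories the statistic has 9 degrees of freedom, so values above roughly 16.9 would be significant at the 5% level. A flagged deviation is a signal for closer inspection, not proof of irregularity on its own.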
XPlanung supplementary statutes for Laupheimer Straße, published by the German Federal Agency for Cartography and Geodesy. The dataset is served via a Web Feature Service (WFS) and was last updated on June 7, 2024. Its specific content and scale require verification after download.
List of current regulatory acts published on the eu_open_data platform. The data originates from the States site of Ukraine and was last updated on June 11, 2024. Available file formats include Excel (XLS and XLSX).
1,000 annotated German consumer contract clauses categorized by their legal validity under the German Civil Code (§§ 305-310 BGB). The dataset provides text segments from standard terms and conditions (AGB) paired with expert legal assessments regarding their compliance with consumer protection standards.
A representative sample of Indian court judgments from 1950 to 2017. It was created by the organization opennyaiorg, which selected the most cited judgments from IndianKanoon while controlling for court and case type. The dataset includes the full text of each judgment and its IndianKanoon URL.
CUAD contains 510 commercial legal contracts featuring over 13,000 expert-annotated labels across 41 specific clause categories. Developed by The Atticus Project and released in 2021, the dataset is designed to train and evaluate extractive question-answering models on complex legal prose.
A Web Map Service (WMS) based on XPlanung 5.0 provides the development plan 'Webersbühl West - demarcation statute' for the municipality of Schwenningen. The dataset was last updated on May 27, 2024, and is provided by the Bundesamt für Kartographie und Geodäsie. It describes the demarcation statute for the Webersbühl West area.
A tokenized dataset derived from the Caselaw Access Project, a collection of U.S. court opinions. The dataset was uploaded to Hugging Face by user 'orionweller' and was last updated on July 4, 2024. Its specific size, format, and license are not detailed in the provided metadata.
Ukrainian enterprise financial data from the States site of Ukraine, last updated on May 3, 2024. The dataset contains planned financial indicators, including income and expense formation, and information on the implementation of these plans during the year. Reporting periods include quarters and the full year.
LegalPT aggregates publicly available legal data in Portuguese from sources including legislation, jurisprudence, legal articles, and government documents. This version is deduplicated using the MinHash algorithm and Locality Sensitive Hashing.
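MinHash with Locality Sensitive Hashing finds near-duplicate documents without comparing every pair. Each document is reduced to a short signature of minimum hash values over its shingles. Signatures are split into bands, and only documents that collide in at least one band are compared. The sketch below is a from-scratch illustration, not the LegalPT pipeline. The catalog does not specify the shingle size, number of hash functions, band count, or similarity threshold, so the values here are assumptions, and all function names are hypothetical.

```python
import hashlib

NUM_PERM = 64    # hash functions per signature (illustrative)
BANDS = 16       # LSH bands; rows per band = NUM_PERM // BANDS
SHINGLE = 3      # word n-gram size (illustrative)


def shingles(text: str, n: int = SHINGLE) -> set[str]:
    """Lowercased word n-grams; short texts yield a single shingle."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def minhash(items: set[str], num_perm: int = NUM_PERM) -> list[int]:
    """One seeded 64-bit hash per 'permutation'; keep the minimum over the set."""
    return [
        min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in items
        )
        for seed in range(num_perm)
    ]


def estimated_jaccard(a: list[int], b: list[int]) -> float:
    """Fraction of matching signature positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def deduplicate(docs: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the first document of each near-duplicate group, in input order."""
    rows = NUM_PERM // BANDS
    buckets: dict[tuple, list[int]] = {}  # (band index, band values) -> kept doc indices
    sigs, kept = [], []
    for i, doc in enumerate(docs):
        sig = minhash(shingles(doc))
        sigs.append(sig)
        keys = [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(BANDS)]
        candidates = {j for key in keys for j in buckets.get(key, [])}
        if any(estimated_jaccard(sig, sigs[j]) >= threshold for j in candidates):
            continue  # near-duplicate of an already-kept document
        kept.append(i)
        for key in keys:
            buckets.setdefault(key, []).append(i)
    return [docs[i] for i in kept]
```

More bands with fewer rows each catch lower-similarity pairs as candidates, at the cost of more pairwise checks. Fewer, wider bands do the opposite.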
The dataset contains the boundaries of the areas covered by the legally binding development plans for the city of Gudensberg. It is provided via the eu_open_data platform and was last updated on May 13, 2024. The data originates from the Bundesamt für Kartographie und Geodäsie.