Legislative text, court decisions, regulatory filings, patents, government contracts, election data
9,690 datasets
New York City caps pedicab registration plates at a statutory limit of 840, and an annual lottery is triggered when the number falls below this threshold. The dataset contains applications for this lottery, managed by the NYC Department of Consumer and Worker Protection (DCWP), with records last updated on August 14, 2023. The columns suggest it tracks applicant details, lottery priority numbers, selection status, and geographic information such as borough and council district.
Approximately 38,500 poems were scraped from the Public-Domain-Poetry.com website. DanFosing compiled this English-language collection, which was last updated on Hugging Face in September 2023. All content is in the public domain.
Multi Legal Pile is a dataset of legal documents in the 24 official languages of the European Union. It was created by author joelniklaus and last updated on Hugging Face in October 2023.
Monthly statistics detail utilization of and applications for multi-agency emergency housing assistance, as mandated by Local Law 37 of 2011. The dataset is provided by data.cityofnewyork.us and was last updated in August 2023, though it is now a historical record. It covers categories like Families with Children, Single Men, Single Women, and Adult Families.
August 2023 dataset derived from The Pile, with copyrighted content removed to respect author rights. Created by monology, this collection is intended for training large language models without copyright infringement. The methodology involved removing content from specific copyrighted sources like Books3 and BookCorpus2.
Urban development plan for the 'Altstadt' (Old Town) area of Waldenbuch, Germany, provided as a WMS service. The plan, developed from the land use plan, contains legally binding stipulations for orderly urban development in an area of predominantly commercial and residential use. The dataset is maintained by the Bundesamt für Kartographie und Geodäsie and was last updated on September 4, 2023.
53,000+ multiple choice questions focused on identifying the relevant legal holding of a cited case. The dataset targets legal reasoning and natural language understanding within the domain of judicial decisions.
Law Instruct 4K is a text dataset containing 4,000 examples designed for instruction tuning of language models on legal tasks. It was created by author clam004 and uploaded to Hugging Face in September 2023.
King County's data.kingcounty.gov provides precinct-level results from election night for primary elections. The dataset includes vote counts by race, party, and legislative district for the August 2023 primary, with similar data available for the August 2019 primary. Columns suggest this is a detailed record of votes cast, categorized by counter type and geographic precinct.
OneNYC 2050 (Historical) from the City of New York contains indicators for assessing progress toward eight strategic goals. The dataset includes goal, sub-initiative, indicator, baseline year, and baseline value fields, collected by the Mayor's Office of Operations.
Plan C2193 is the Texas U.S. Congressional Districts map enacted by the 87th Legislature, 3rd Called Special Session. It applies to elections beginning with the primary and general elections in 2022. The dataset was last updated on 2023-07-20 and is hosted by data.texas.gov.
This German-language dataset contains real questions from individuals and the corresponding answers provided by qualified lawyers. It includes at least 5,000 conversations, with a larger subset of 30,000 also available. It was created by author hjf-utc and last updated on Hugging Face in August 2023.
Budget program passports from local Ukrainian governments, detailing planned expenditures for landscaping and housing policy. The dataset includes information such as program codes, names, goals, and purposes. It was published on the States site of Ukraine and last updated on August 21, 2023.
The High Qualification Commission of Judges of Ukraine maintains lists of candidates enrolled in the reserve for vacant judge positions. This dataset, published on the eu_open_data platform, likely contains candidate information related to the selection process. It was last updated on August 25, 2023.
The U.S. Environmental Protection Agency's database tracks grant recipient compliance activities, including training, cost rate negotiations, and on-site reviews. It provides information for EPA staff on planned and performed monitoring activities per calendar year. The system records advanced monitoring activities with attached reports as part of the Grantee Compliance Assistance Initiative.
A curated repository organizes open data into two categories: Italian Public Administration records and Civic Data use cases. The list facilitates the discovery of public sector information sources across various Italian administrative domains.
Law Stackexchange Prompts contains text prompts derived from the Law Stack Exchange community. The dataset was created by author 'dim' and uploaded to Hugging Face in September 2023. The exact size and structure of the data are not detailed in the provided card.
228 billion BPE tokens aggregated from permissively-licensed sources including Case Law and the public domain subset of Pile of Law. The corpus is tokenized using the GPT-NeoX tokenizer and categorized into domains such as Legal to support the development of open-source language models.
233 countries and territories represented through structured data on national legislatures and their members. The dataset includes biographical information, party affiliations, and legislative term dates for over 70,000 politicians globally.
Decisions and orders from the Kamyanka City Council and its executive committee, as recorded in the city's territorial community register. The data originates from the States site of Ukraine and was last updated on August 24, 2023. It likely contains official municipal resolutions, executive committee decisions, and mayoral orders.