OpenAI HumanEval: 164 Handwritten Python Programming Problems | DataSalon

Software Engineering & Security

Name: OpenAI HumanEval: 164 Handwritten Python Programming Problems
Creator: openai
Published: 2022-03-02T23:29:22
Keywords: Source Datasets: original, Language Creators: expert-generated, Library: polars, Language: en, Size Categories: n<1K, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Parquet, Code Generation, Region: US, Multilinguality: monolingual, License: MIT, Annotations Creators: expert-generated

by openai · Updated 2y ago

Available on 1 platform

Description

OpenAI released HumanEval in 2021: a set of 164 handwritten Python programming problems for evaluating the functional correctness of code generation models. Each entry provides a function signature, a docstring, a reference implementation, and unit tests for automated validation.

Use Cases

Evaluating code generation models by prompting them with the function signature and docstring (stored together in the 'prompt' field)
Benchmarking functional correctness by executing generated code against the provided unit tests (the 'test' field)
Fine-tuning small language models on the high-quality Python reference implementations in the 'canonical_solution' field
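
The first two use cases can be sketched as follows. This is a minimal illustration, assuming the field names of the published HumanEval schema ('prompt', 'canonical_solution', 'test', 'entry_point') and the usual convention that the test code defines a check(candidate) function. The record below is a hypothetical toy example written for this sketch, not a row from the dataset, and build_check_program is an illustrative helper name.

```python
# Toy record mirroring the HumanEval schema. It is NOT taken from the dataset.
toy_problem = {
    "task_id": "Toy/0",
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "canonical_solution": "    return a + b\n",
    "test": (
        "def check(candidate):\n"
        "    assert candidate(1, 2) == 3\n"
        "    assert candidate(-1, 1) == 0\n"
    ),
    "entry_point": "add",
}


def build_check_program(problem: dict, completion: str) -> str:
    """Join the prompt, a completion, and the unit tests into one script.

    `completion` is the function body a model would generate after seeing
    `problem["prompt"]`. The final line calls the test harness on the
    function named by `entry_point`.
    """
    return (
        problem["prompt"]
        + completion
        + "\n"
        + problem["test"]
        + "\n"
        + f"check({problem['entry_point']})\n"
    )


# Here the reference solution stands in for a model completion.
program = build_check_program(toy_problem, toy_problem["canonical_solution"])
print(program)
```

The resulting script exits with an error if any assertion fails, so a zero exit status can be read as "passed". Because it contains untrusted generated code, it should be run only inside an isolated environment.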

Strengths

164 expert-handwritten problems
Includes unit tests for automated verification
MIT licensed

Limitations

Small sample size of 164 records
Restricted to Python
Susceptible to benchmark contamination in models trained on post-2021 web data

Provenance

Source: OpenAI (arXiv:2107.03374)
Collection Method: expert-generated
Time Range: 2021
Freshness: Static benchmark released in 2021; last metadata update in 2024.

Executing generated code against the included unit tests requires a secure sandbox environment to guard against arbitrary code execution.
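
As a rough illustration of the isolation step, the sketch below runs a program string in a separate Python interpreter with a time limit and a throwaway working directory. The helper name run_isolated and the 5-second default are assumptions made for this example. A child process with a timeout only limits hangs and keeps stray files out of the current directory; it is not a secure sandbox on its own, and real evaluation of untrusted code would typically add a container or virtual machine without network access.

```python
import subprocess
import sys
import tempfile


def run_isolated(program: str, timeout_s: float = 5.0) -> str:
    """Run `program` in a child interpreter.

    Returns "passed" if the child exits with status 0, "failed" for any
    other exit status, and "timeout" if it exceeds `timeout_s` seconds.
    """
    # A temporary directory keeps files written by the program out of the
    # caller's working directory; it is deleted when the block exits.
    with tempfile.TemporaryDirectory() as workdir:
        try:
            result = subprocess.run(
                # -I starts Python in isolated mode (ignores environment
                # variables and the user site directory).
                [sys.executable, "-I", "-c", program],
                cwd=workdir,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return "timeout"
    return "passed" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    print(run_isolated("assert 1 + 1 == 2"))
    print(run_isolated("assert 1 + 1 == 3"))
```

A script assembled from a problem's prompt, a model completion, and its unit tests can be passed to this function as the program argument.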

Quality Score

D36

Community

232.2K downloads

376 likes

0 views

Dataset Info

Author: openai
Created: Mar 2, 2022
Updated: Jan 4, 2024
Last synced: Jun 8, 2026
