Description
Metadata and content classification for 4,489,228 YouTube videos identified as potentially educational, totaling 3,975,157 hours of content. The dataset serves as a discovery and processing queue for a large transcription project. It was created by author 'thepowerfuldeez' and last updated on Hugging Face in February 2026.
Use Cases
Filtering educational video candidates based on metadata and content categorization.
Assessing license risk for videos before transcription and redistribution.
Analyzing the scale and composition of educational content on YouTube.
Building search indices or recommendation systems for educational video discovery.
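The first two use cases can be sketched as a simple filter over metadata records. This is a minimal illustration, not the dataset's actual schema: the field names (video_id, category, license_risk, duration_hours), the category labels, and the low/medium/high risk ordering are all hypothetical, since the real columns are undocumented.

```python
from dataclasses import dataclass


@dataclass
class VideoRecord:
    # Hypothetical fields; the dataset's real column names are not documented.
    video_id: str
    category: str
    license_risk: str
    duration_hours: float


def select_candidates(records, allowed_categories, max_risk="low"):
    """Keep records in an allowed category whose risk is at or below max_risk."""
    risk_order = {"low": 0, "medium": 1, "high": 2}  # assumed ordering
    limit = risk_order[max_risk]
    return [
        r for r in records
        if r.category in allowed_categories
        # Unrecognized risk labels rank above "high", so they are excluded.
        and risk_order.get(r.license_risk, len(risk_order)) <= limit
    ]


# Made-up sample rows, for illustration only.
sample = [
    VideoRecord("a1", "lecture", "low", 1.5),
    VideoRecord("b2", "vlog", "low", 0.2),
    VideoRecord("c3", "lecture", "high", 0.9),
    VideoRecord("d4", "tutorial", "unknown", 0.4),
]
selected = select_candidates(sample, {"lecture", "tutorial"})
print([r.video_id for r in selected])
```

Treating unrecognized risk labels as the highest risk is a conservative choice for a transcription queue: a video is skipped unless its license status is explicitly acceptable.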
Strengths
Large scale with metadata for 4,489,228 videos.
Includes an estimated total duration of 3,975,157 hours of video.
Provides content categorization and license risk assessment.
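The two totals above imply an average video length, which is a quick sanity check on the scale figures. The calculation below uses only the numbers stated in this card; it says nothing about how durations are distributed, which the card does not describe.

```python
# Totals as stated in the dataset description.
total_videos = 4_489_228
total_hours = 3_975_157

avg_hours = total_hours / total_videos
avg_minutes = avg_hours * 60

print(f"{avg_hours:.3f} hours = {avg_minutes:.1f} minutes per video")
```

The result is roughly 53 minutes per video on average, which is consistent with a collection weighted toward lecture-length content rather than short clips.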
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
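The row count of the published files is not independently confirmed; the 4,489,228 figure comes from the dataset description and should be checked after download.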
Data may reflect bias inherent to YouTube's platform and the selection criteria for 'educational' videos.
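Because column documentation is absent, a first pass over a handful of downloaded rows can help infer field semantics. The sketch below assumes the rows have already been loaded as Python dictionaries; the sample rows and their field names are invented for illustration and are not the dataset's real columns.

```python
from collections import defaultdict


def summarize_fields(rows):
    """Map each field name to the value types seen and a count of nulls."""
    summary = defaultdict(lambda: {"types": set(), "nulls": 0})
    for row in rows:
        for name, value in row.items():
            if value is None:
                summary[name]["nulls"] += 1
            else:
                summary[name]["types"].add(type(value).__name__)
    return dict(summary)


# Hypothetical sample rows standing in for real downloaded records.
rows = [
    {"video_id": "a1", "duration_s": 5400, "category": "lecture"},
    {"video_id": "b2", "duration_s": None, "category": "tutorial"},
]
report = summarize_fields(rows)
for name, info in report.items():
    print(name, sorted(info["types"]), info["nulls"])
```

A summary like this shows which fields are sparsely populated or mixed-type before any filtering logic is built on them.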
Provenance
Source
YouTube
Collection Method
Likely gathered via YouTube's API or web scraping, filtered for educational content.
Freshness
Last updated 2026-02-19 18:44:39 (time zone not stated); check the Hugging Face repository for later revisions before relying on this date.
The license of the dataset itself is unknown; the license risk assessment it provides covers the underlying videos, not the dataset.