AWS Open Data Registry in Machine-Readable NDJSON Format
Description
A machine-readable NDJSON version of the Registry of Open Data on AWS, which lists publicly available datasets hosted on AWS resources. The datasets listed in the registry are owned and maintained by various government organizations, researchers, businesses, and individuals, not by AWS. This derived dataset is provided by Amazon Web Services under an Apache-2.0 license.
Use Cases
Build a dataset search engine based on the structured metadata entries.
Automate the discovery of new datasets for AI/ML projects based on registry updates.
Create monitoring tools for dataset availability and changes on AWS resources.
Integrate dataset metadata into data catalog platforms for centralized management.
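The search and change-monitoring use cases above can be sketched in Python. Column-level documentation is absent, so this minimal sketch assumes no field names. For keyword search, it indexes every string value found anywhere in each record. For change detection, it compares two downloaded snapshots by canonical JSON and reports added or removed records. All function names are hypothetical. No stable identifier field is documented, so an edited entry appears as one removal plus one addition rather than as a modification.

```python
import json
import re
from collections import defaultdict
from typing import Any, Iterable, Iterator

TOKEN = re.compile(r"[a-z0-9]+")

def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank NDJSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

def iter_strings(value: Any) -> Iterator[str]:
    """Recursively yield every string nested inside a JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from iter_strings(v)
    elif isinstance(value, list):
        for v in value:
            yield from iter_strings(v)

def build_index(records: list[dict]) -> dict[str, set[int]]:
    """Inverted index: lowercase token -> positions of records containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for i, record in enumerate(records):
        for text in iter_strings(record):
            for token in TOKEN.findall(text.lower()):
                index[token].add(i)
    return index

def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Return positions of records that contain every query token (AND search)."""
    tokens = TOKEN.findall(query.lower())
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)

def fingerprint(record: dict) -> str:
    """Canonical JSON text, so key order does not affect comparison."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"))

def diff_snapshots(old: list[dict], new: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (added, removed) records between two registry snapshots."""
    old_fp = {fingerprint(r) for r in old}
    new_fp = {fingerprint(r) for r in new}
    added = [r for r in new if fingerprint(r) not in old_fp]
    removed = [r for r in old if fingerprint(r) not in new_fp]
    return added, removed

if __name__ == "__main__":
    # Hypothetical sample lines; the real field names are undocumented.
    sample = [
        '{"Name": "Example Satellite Imagery", "Tags": ["earth observation"]}',
        "",
        '{"Name": "Example Genomics Data", "Tags": ["genomic", "life sciences"]}',
    ]
    records = list(iter_ndjson(sample))
    idx = build_index(records)
    print(search(idx, "genomic data"))  # [1]
```

An in-memory inverted index is enough for a file of registry size. A production search engine would add ranking and stemming.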
Strengths
Data is provided in a machine-readable NDJSON format for ease of programmatic use.
License is explicitly stated as Apache-2.0, which is permissive for reuse.
The metadata source is clearly identified as the awslabs/open-data-registry GitHub repository.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The last update date is unknown, so freshness cannot be verified.
The row count is unknown, which makes it harder to judge suitability for specific applications.
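The missing row count and field documentation can be partly recovered once the file is downloaded. The following sketch assumes each NDJSON line holds one JSON object. It counts the records and tallies which Python types appear under each top-level key. The function name is hypothetical, and the result is an empirical profile of one copy of the file, not authoritative documentation.

```python
import json
from collections import Counter, defaultdict
from typing import Iterable

def profile_ndjson(lines: Iterable[str]) -> tuple[int, dict[str, Counter]]:
    """Count records and tally the Python type names seen for each top-level key."""
    row_count = 0
    key_types: dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)  # assumed to be a JSON object (dict)
        row_count += 1
        for key, value in record.items():
            # Type names are Python's: str, list, dict, int, float, bool, NoneType.
            key_types[key][type(value).__name__] += 1
    return row_count, dict(key_types)
```

For example, `profile_ndjson(open(path, encoding="utf-8"))` on a local copy returns the row count and a per-key type tally. If the counts for a key sum to less than the row count, that key is absent from some entries.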
Provenance
Source
Amazon Web Services (derived from the GitHub repository awslabs/open-data-registry)
Collection Method
Transformed from the original registry data for ease of use with machine interfaces.
Time Range
Not specified
Freshness
Not specified
Geography
Not specified
Datasets listed in this registry are available via AWS but are owned by third parties; users must check individual dataset licenses and terms.
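A first pass over license terms can be automated. This sketch tallies each record's license value and flags records that lack one. The default key name `License` is an assumption because it is not documented for this NDJSON export, so confirm it against the downloaded data. The tally only helps with triage and does not replace reading each dataset's actual terms.

```python
import json
from collections import Counter
from typing import Iterable

def license_summary(lines: Iterable[str], key: str = "License") -> Counter:
    """Tally license values per record; records without the key count as 'MISSING'.

    The default key "License" is an assumed field name, not a documented one.
    """
    tally: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value = json.loads(line).get(key)
        tally[str(value) if value is not None else "MISSING"] += 1
    return tally
```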