Name: Onprem LLM Benchmark: Performance Metrics Across 9 GPUs for On-Premise Deployment
Creator: Scherm-AI
Published: 2026-05-29T03:04:09
Keywords: LLM Benchmark, Benchmark, Inference Metrics, Tabular, GPU Performance, On-Premise Deployment

Description

Scherm On-Premise LLM Inference Benchmark v0.5.1 provides inference performance data for large language models measured on 9 real GPUs, ranging from the NVIDIA B200 to older consumer cards such as the GTX 1080 Ti. Reported metrics include throughput (tok/s), VRAM usage, and tensor-parallel scaling. The methodology uses a fixed seed of 1234, 10 repetitions per data point, and input and output lengths of 512 and 256 tokens, respectively. The dataset was created by Scherm-AI and last updated on 2026-06-16.

Use Cases

Compare LLM inference throughput across different GPU models based on the benchmarked tok/s metrics.
Estimate VRAM requirements for specific model deployments based on the reported memory usage data.
Evaluate the scaling efficiency of tensor-parallel configurations for on-premise clusters.
Size hardware for on-premise LLM deployments using the provided performance and resource metrics.
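
To make the tensor-parallel use case concrete, the following is a minimal sketch of one common way to express scaling efficiency: measured multi-GPU throughput divided by ideal linear scaling from a single GPU. The function name, parameter names, and example numbers are assumptions for illustration; the dataset may define or report scaling differently.

```python
def scaling_efficiency(single_gpu_tok_s: float, multi_gpu_tok_s: float, num_gpus: int) -> float:
    """Return measured throughput as a fraction of ideal linear scaling.

    A value of 1.0 means throughput scaled perfectly with GPU count.
    Parameter names are hypothetical, not column names from the dataset.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if single_gpu_tok_s <= 0:
        raise ValueError("single_gpu_tok_s must be positive")
    return multi_gpu_tok_s / (num_gpus * single_gpu_tok_s)


# Illustrative numbers only, not taken from the dataset.
efficiency = scaling_efficiency(single_gpu_tok_s=100.0, multi_gpu_tok_s=170.0, num_gpus=2)
print(f"{efficiency:.2f}")
```

An efficiency well below 1.0 suggests that adding GPUs yields diminishing returns for that model and configuration, which is worth weighing when sizing a cluster.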

Strengths

Benchmarks 9 distinct GPUs spanning 7 years of hardware evolution.
Reports median and interquartile range (IQR) from 10 repetitions per data point.
Covers multiple inference engines (vLLM/AWQ and Ollama/GGUF).
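
As a rough illustration of how a median and IQR can be derived from 10 repetitions, the sketch below uses Python's standard `statistics` module. The quartile method used by the benchmark is not stated in the listing, so the "inclusive" method here is an assumption, and the sample values are made up.

```python
import statistics


def median_and_iqr(samples):
    """Return (median, interquartile range) for a list of repeated measurements.

    Uses the "inclusive" quartile method; the benchmark's actual method is
    not documented, so this choice is an assumption.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute quartiles")
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return statistics.median(samples), q3 - q1


# Ten made-up throughput readings (tok/s) standing in for one data point.
readings = [100, 102, 98, 101, 99, 103, 97, 100, 104, 96]
median, iqr = median_and_iqr(readings)
print(median, iqr)
```

A narrow IQR relative to the median indicates that the repetitions agreed closely, which makes small throughput differences between GPUs easier to trust.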

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes it harder to assess suitability before download.
Description metadata is limited; actual data quality requires manual inspection after download.
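
Because the row count and column semantics are undocumented, a first inspection pass after download might look like the sketch below. It assumes the data can be read as CSV text, which the listing does not confirm, and the column names in the sample (`gpu`, `engine`, `tok_s`) are hypothetical placeholders.

```python
import csv
import io


def summarize_table(csv_text: str) -> dict:
    """Return the row count, column names, and non-empty cell count per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    non_empty = {
        col: sum(1 for row in rows if (row.get(col) or "").strip())
        for col in columns
    }
    return {"row_count": len(rows), "columns": columns, "non_empty": non_empty}


# Made-up sample with hypothetical column names, not real benchmark rows.
sample = "gpu,engine,tok_s\nGPU-A,vLLM/AWQ,120.5\nGPU-B,Ollama/GGUF,\n"
summary = summarize_table(sample)
print(summary)
```

Columns with many empty cells are a quick signal of which fields need closer manual inspection before the data is used for sizing decisions.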

Provenance

Source: Scherm-AI
Collection Method: Benchmarking methodology uses a seed of 1234, 10 repetitions per point, and standardized input/output token lengths.
Freshness: Last updated 2026-06-16 03:50:03; freshness should be verified.

The description notes that two different inference engines (vLLM/AWQ and Ollama/GGUF) are used, so throughput figures may not be directly comparable across engines.
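
One way to respect the engine caveat is to rank GPUs separately within each engine rather than across all rows. The sketch below assumes each row is a mapping with hypothetical keys `gpu`, `engine`, and `tok_s`; the actual field names must be checked after download, and the example values are invented.

```python
from collections import defaultdict


def rank_gpus_per_engine(rows):
    """Group rows by engine and sort each group by throughput, highest first.

    The keys "gpu", "engine", and "tok_s" are hypothetical field names.
    """
    by_engine = defaultdict(list)
    for row in rows:
        by_engine[row["engine"]].append((row["gpu"], float(row["tok_s"])))
    return {
        engine: sorted(entries, key=lambda entry: entry[1], reverse=True)
        for engine, entries in by_engine.items()
    }


# Invented rows for illustration only.
rows = [
    {"gpu": "GPU-A", "engine": "vLLM/AWQ", "tok_s": "120.5"},
    {"gpu": "GPU-B", "engine": "vLLM/AWQ", "tok_s": "340.0"},
    {"gpu": "GPU-A", "engine": "Ollama/GGUF", "tok_s": "80.0"},
]
for engine, ranking in rank_gpus_per_engine(rows).items():
    print(engine, ranking)
```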