Available on 1 platform
Scherm On-Premise LLM Inference Benchmark v0.5.1 provides performance data for large language models on 9 real GPUs, ranging from the NVIDIA B200 to older consumer cards such as the GTX 1080 Ti. Metrics include throughput (tok/s), VRAM usage, and tensor-parallel scaling. Each data point was measured with a fixed seed of 1234 and 10 repetitions, using input and output lengths of 512 and 256 tokens, respectively. The dataset was created by Scherm-AI and last updated on 2026-06-16.
The description notes that two different inference engines are used (vLLM/AWQ and Ollama/GGUF), so results may not be directly comparable across engines.
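As a rough illustration of how the metrics above could be derived from raw timings, here is a minimal Python sketch. It assumes that throughput counts only the 256 generated tokens and that each repetition yields one wall-clock duration; the dataset description does not confirm either detail. The function names `throughput_tok_s` and `scaling_efficiency` are hypothetical and are not part of the benchmark.

```python
from statistics import mean, stdev

OUTPUT_TOKENS = 256  # output length stated in the methodology
REPETITIONS = 10     # repetitions per point stated in the methodology


def throughput_tok_s(elapsed_seconds, output_tokens=OUTPUT_TOKENS):
    """Return (mean, sample standard deviation) of tokens per second.

    Assumption: only generated tokens count toward throughput.
    """
    rates = [output_tokens / t for t in elapsed_seconds]
    return mean(rates), stdev(rates)


def scaling_efficiency(single_gpu_tok_s, multi_gpu_tok_s, num_gpus):
    """Fraction of ideal linear speedup reached with tensor parallelism."""
    return multi_gpu_tok_s / (single_gpu_tok_s * num_gpus)


# Made-up timings in seconds, for illustration only; not benchmark data.
timings = [2.0] * REPETITIONS
avg, spread = throughput_tok_s(timings)
```

Because the two engines use different quantization formats, it is probably safest to compute these figures per engine and to avoid ranking GPUs across the vLLM/AWQ and Ollama/GGUF groups.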