Llama 3.1 8B Instruct

Llama / Meta

47 scores
41 benchmarks
Cost (in/out): $0.02 / $0.05 per 1M tokens

Metadata

Llama Open source

Aliases: llama-3.1-8b-instruct, meta-llama-llama-3.1-8b-instruct, meta-llama/llama-3.1-8b-instruct

Benchmark Results

| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| Berkeley Function-Calling Leaderboard | Agentic | 85 | 25.83% | 2026-05-27 |
| Clembench Text v3.0 | Agentic | 27 | 25.28 | 2026-05-06 |
| UAVBench | Agentic | 25 | 65.30 | 2026-05-06 |
| OpenUGI | Alignment | 1208 | 6.46 | 2026-05-06 |
| Stick To Your Role! | Alignment | 17 | 0.62 | 2026-05-06 |
| BigCodeBench | Coding | 96 | 32.80 | 2026-05-06 |
| RedSage-Bench | Cybersecurity | 13 | 77.05% | 2026-05-28 |
| MathTutorBench | Education | 4 | 0.4700 | 2026-05-27 |
| BizFinBench | Finance | 23 | 48.95 | 2026-05-27 |
| INVESTORBENCH | Finance | 10 | 25.463% | 2026-05-27 |
| Open FinLLM Leaderboard | Finance | 13 | 22.720855% | 2026-05-27 |
| MMLU (CoT) | General Knowledge | 3 | 0.73 | 2026-05-06 |
| Open LLM Leaderboard v2 | General Knowledge | 1905 | 23.76 | 2026-05-06 |
| HealthBench Hard | Healthcare | 26 | 0.35 | 2026-05-27 |
| FACTS Grounding | Intelligence | 29 | 0.17 | 2026-05-06 |
| MuSR | Intelligence | 2662 | 8.61 | 2026-05-06 |
| AraGen v3 | Language | 48 | 24.94 | 2026-05-06 |
| Open Arabic LLM Leaderboard | Language | 95 | 55.41 | 2026-05-06 |
| Open Japanese LLM Leaderboard | Language | 386 | 53.19 | 2026-05-06 |
| Open Japanese LLM Leaderboard | Language | 749 | 23.03 | 2026-05-06 |
| Open Portuguese LLM Leaderboard | Language | 156 | 82.95 | 2026-05-06 |
| Ukrainian LLM Leaderboard | Language | 11 | 9.12 | 2026-05-06 |
| MATH Level 5 | Math | 1680 | 15.56 | 2026-05-06 |
| GSM-8K (CoT) | Mathematics | 2 | 0.84 | 2026-05-06 |
| GSM8K | Mathematics | 20 | 84.50 | 2026-05-06 |
| MATH (CoT) | Mathematics | 6 | 0.52 | 2026-05-06 |
| Multilingual MGSM (CoT) | Mathematics | 3 | 0.69 | 2026-05-06 |
| BRIDGE Medical Leaderboard | Medical | 62 | 43.54 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 233 | 29.4 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 237 | 28.98 | 2026-05-27 |
| MEDIC Benchmark | Medical | 43 | 63.15 average normalized public table score | 2026-05-27 |
| Medmarks | Medical | 61 | 0.39674525595082544 | 2026-05-27 |
| BenchBench | Meta | 48 | 0.61 | 2026-05-06 |
| INCLUDE-base-44 European Languages | Multilingual | 11 | 0.55 | 2026-05-06 |
| LatamBoard | Multilingual | 21 | 62.77 | 2026-05-06 |
| DROP | Reasoning | 23 | 0.59 | 2026-05-06 |
| ThaiSafetyBench | Safety | 20 | 28.24% overall ASR | 2026-05-28 |
| ChemBench | Science | 33 | 0.47 | 2026-05-06 |
| ChemBench | Science | 36 | 0.46 | 2026-05-06 |
| JSONSchemaBench | Structured Output | 8 | 91.1% schema compliance | 2026-05-28 |
| JSONSchemaBench | Structured Output | 20 | 76% schema compliance | 2026-05-28 |
| JSONSchemaBench | Structured Output | 35 | 42.2% schema compliance | 2026-05-28 |
| StructEval | Structured Output | 8 | 61.77% | 2026-05-28 |
| Generate README Eval | Summarization | 14 | 24.43 | 2026-05-06 |
| API-Bank | Tool Use | 3 | 0.83 | 2026-05-06 |
| Gorilla Benchmark API Bench | Tool Use | 3 | 0.08 | 2026-05-06 |
| K-MetBench | Weather | 50 | 41.8% accuracy | 2026-05-28 |