Llama 3.1 70B Instruct

Llama / Meta

36 scores
32 benchmarks
Cost (in/out): $0.4 / $0.4 per 1M tokens
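
As a rough illustration of what the listed price means in practice, the sketch below estimates the cost of a single request. It assumes the $0.4 per 1M tokens rate applies linearly to both input and output tokens, with no minimum charge or rounding; the function name and parameters are hypothetical and not part of any provider API.

```python
# Listed rates from this page, in USD per 1M tokens (input / output).
INPUT_PRICE_PER_M = 0.4
OUTPUT_PRICE_PER_M = 0.4


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      in_price: float = INPUT_PRICE_PER_M,
                      out_price: float = OUTPUT_PRICE_PER_M) -> float:
    """Estimate request cost in USD, assuming simple linear per-token pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Scale each token count by its per-million rate, then sum both sides.
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 250k prompt tokens, 50k completion tokens.
    print(f"${estimate_cost_usd(250_000, 50_000):.4f}")
```

Actual billing may differ by provider, so treat the result as an estimate only.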

Metadata

Llama Open source

Aliases: llama-3.1-70b-instruct, meta-llama-llama-3.1-70b-instruct, meta-llama/llama-3.1-70b-instruct

Benchmark Results

| Benchmark | Category | Rank | Score | Sampled |
| --- | --- | --- | --- | --- |
| Clembench Text v3.0 | Agentic | 20 | 46.80 | 2026-05-06 |
| PinchBench | Agentic | 65 | 0.32 | 2026-05-06 |
| OpenUGI | Alignment | 1060 | 20.83 | 2026-05-06 |
| Stick To Your Role! | Alignment | 7 | 0.77 | 2026-05-06 |
| BigCodeBench | Coding | 21 | 46.10 | 2026-05-06 |
| MathTutorBench | Education | 3 | 0.5522 | 2026-05-27 |
| BizFinBench | Finance | 20 | 55.09 | 2026-05-27 |
| INVESTORBENCH | Finance | 4 | 38.946% | 2026-05-27 |
| MMLU (CoT) | General Knowledge | 2 | 0.86 | 2026-05-06 |
| Open LLM Leaderboard v2 | General Knowledge | 62 | 43.41 | 2026-05-06 |
| HealthBench Hard | Healthcare | 36 | 0.29 | 2026-05-27 |
| HREF | Instruction Following | 1 | 48.98 | 2026-05-06 |
| FACTS Grounding | Intelligence | 15 | 0.33 | 2026-05-06 |
| MuSR | Intelligence | 491 | 17.69 | 2026-05-06 |
| AraGen v3 | Language | 25 | 50 | 2026-05-06 |
| HindiGen v1 | Language | 13 | 70.45 | 2026-05-06 |
| Open Japanese LLM Leaderboard | Language | 33 | 66.38 | 2026-05-06 |
| Open Japanese LLM Leaderboard | Language | 461 | 49.97 | 2026-05-06 |
| MATH Level 5 | Math | 524 | 38.07 | 2026-05-06 |
| GSM-8K (CoT) | Mathematics | 1 | 0.95 | 2026-05-06 |
| MATH (CoT) | Mathematics | 1 | 0.68 | 2026-05-06 |
| Multilingual MGSM (CoT) | Mathematics | 2 | 0.87 | 2026-05-06 |
| BRIDGE Medical Leaderboard | Medical | 15 | 50.52 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 119 | 39.09 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 175 | 35.1 | 2026-05-27 |
| MEDIC Benchmark | Medical | 44 | 63.03 (average normalized public table score) | 2026-05-27 |
| BenchBench | Meta | 6 | 0.93 | 2026-05-06 |
| LanguageBench | Multilingual | 19 | 0.51 | 2026-05-06 |
| DROP | Reasoning | 13 | 0.80 | 2026-05-06 |
| ThaiSafetyBench | Safety | 17 | 24.49% (overall ASR) | 2026-05-28 |
| ChemBench | Science | 22 | 0.53 | 2026-05-06 |
| ChemBench | Science | 26 | 0.51 | 2026-05-06 |
| API-Bank | Tool Use | 2 | 0.90 | 2026-05-06 |
| Gorilla Benchmark API Bench | Tool Use | 2 | 0.30 | 2026-05-06 |
| VNTL Leaderboard | Translation | 12 | 69.79 | 2026-05-06 |
| K-MetBench | Weather | 33 | 59.9% (accuracy) | 2026-05-28 |