Llama 3.1 8B Instruct
Llama / Meta
47 scores
41 benchmarks
Cost (in/out): $0.02 / $0.05 per 1M tokens
Metadata
Llama, open source
Aliases: llama-3.1-8b-instruct, meta-llama-llama-3.1-8b-instruct, meta-llama/llama-3.1-8b-instruct
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| Berkeley Function-Calling Leaderboard | Agentic | 85 | 25.83% | 2026-05-27 |
| Clembench Text v3.0 | Agentic | 27 | 25.28 | 2026-05-06 |
| UAVBench | Agentic | 25 | 65.30 | 2026-05-06 |
| OpenUGI | Alignment | 1208 | 6.46 | 2026-05-06 |
| Stick To Your Role! | Alignment | 17 | 0.62 | 2026-05-06 |
| BigCodeBench | Coding | 96 | 32.80 | 2026-05-06 |
| RedSage-Bench | Cybersecurity | 13 | 77.05% | 2026-05-28 |
| MathTutorBench | Education | 4 | 0.4700 | 2026-05-27 |
| BizFinBench | Finance | 23 | 48.95 | 2026-05-27 |
| INVESTORBENCH | Finance | 10 | 25.463% | 2026-05-27 |
| Open FinLLM Leaderboard | Finance | 13 | 22.720855% | 2026-05-27 |
| MMLU (CoT) | General Knowledge | 3 | 0.73 | 2026-05-06 |
| Open LLM Leaderboard v2 | General Knowledge | 1905 | 23.76 | 2026-05-06 |
| HealthBench Hard | Healthcare | 26 | 0.35 | 2026-05-27 |
| FACTS Grounding | Intelligence | 29 | 0.17 | 2026-05-06 |
| MuSR | Intelligence | 2662 | 8.61 | 2026-05-06 |
| AraGen v3 | Language | 48 | 24.94 | 2026-05-06 |
| Open Arabic LLM Leaderboard | Language | 95 | 55.41 | 2026-05-06 |
| Open Japanese LLM Leaderboard | Language | 386 | 53.19 | 2026-05-06 |
| Open Japanese LLM Leaderboard | Language | 749 | 23.03 | 2026-05-06 |
| Open Portuguese LLM Leaderboard | Language | 156 | 82.95 | 2026-05-06 |
| Ukrainian LLM Leaderboard | Language | 11 | 9.12 | 2026-05-06 |
| MATH Level 5 | Math | 1680 | 15.56 | 2026-05-06 |
| GSM-8K (CoT) | Mathematics | 2 | 0.84 | 2026-05-06 |
| GSM8K | Mathematics | 20 | 84.50 | 2026-05-06 |
| MATH (CoT) | Mathematics | 6 | 0.52 | 2026-05-06 |
| Multilingual MGSM (CoT) | Mathematics | 3 | 0.69 | 2026-05-06 |
| BRIDGE Medical Leaderboard | Medical | 62 | 43.54 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 233 | 29.4 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 237 | 28.98 | 2026-05-27 |
| MEDIC Benchmark | Medical | 43 | 63.15 average normalized public table score | 2026-05-27 |
| Medmarks | Medical | 61 | 0.3967 | 2026-05-27 |
| BenchBench | Meta | 48 | 0.61 | 2026-05-06 |
| INCLUDE-base-44 European Languages | Multilingual | 11 | 0.55 | 2026-05-06 |
| LatamBoard | Multilingual | 21 | 62.77 | 2026-05-06 |
| DROP | Reasoning | 23 | 0.59 | 2026-05-06 |
| ThaiSafetyBench | Safety | 20 | 28.24% overall ASR | 2026-05-28 |
| ChemBench | Science | 33 | 0.47 | 2026-05-06 |
| ChemBench | Science | 36 | 0.46 | 2026-05-06 |
| JSONSchemaBench | Structured Output | 8 | 91.1% schema compliance | 2026-05-28 |
| JSONSchemaBench | Structured Output | 20 | 76% schema compliance | 2026-05-28 |
| JSONSchemaBench | Structured Output | 35 | 42.2% schema compliance | 2026-05-28 |
| StructEval | Structured Output | 8 | 61.77% | 2026-05-28 |
| Generate README Eval | Summarization | 14 | 24.43 | 2026-05-06 |
| API-Bank | Tool Use | 3 | 0.83 | 2026-05-06 |
| Gorilla Benchmark API Bench | Tool Use | 3 | 0.08 | 2026-05-06 |
| K-MetBench | Weather | 50 | 41.8% accuracy | 2026-05-28 |