Llama 3.3 70B Instruct
Llama / Meta
54 scores
44 benchmarks
$0 / $0 per 1M tokens (cost in/out)
Metadata
Llama Open source
Aliases: llama-3.3-70b-instruct, llama-3.3-70b-instruct:free, meta-llama-llama-3.3-70b-instruct, meta-llama/llama-3.3-70b-instruct, meta-llama/llama-3.3-70b-instruct:free
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| Berkeley Function-Calling Leaderboard | Agentic | 62 | 31.9% | 2026-05-27 |
| Clembench Text v3.0 | Agentic | 18 | 50 | 2026-05-06 |
| Galileo Agent Leaderboard | Agentic | 19 | 0.20 | 2026-05-06 |
| OpenUGI | Alignment | 459 | 37.89 | 2026-05-06 |
| Stick To Your Role! | Alignment | 4 | 0.78 | 2026-05-06 |
| BigCodeBench | Coding | 16 | 46.90 | 2026-05-06 |
| BigCodeBench-Hard | Coding | 19 | 28.40 | 2026-05-05 |
| Natural Language to Mongosh | Coding | 45 | 0.82 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 46 | 0.82 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 67 | 0.80 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 76 | 0.79 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 80 | 0.78 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 92 | 0.76 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 99 | 0.73 | 2026-05-06 |
| NeoEvalPlusN | Creative | 45 | 16.75 | 2026-05-06 |
| OrgForge-IT | Cybersecurity | 10 | 0.800 | 2026-05-28 |
| MMTU | Data | 18 | 0.45 | 2026-05-06 |
| Fin-RATE | Finance | 9 | 16.76% | 2026-05-28 |
| SECQUE | Finance | 2 | 0.65 | 2026-05-28 |
| Open LLM Leaderboard v2 | General Knowledge | 40 | 44.85 | 2026-05-06 |
| WeirdML | Generalization | 28 | 14.44 | 2026-05-06 |
| HealthBench Hard | Healthcare | 21 | 0.4 | 2026-05-27 |
| HUMAINE | Human Preference | 37 | 3.37 | 2026-05-06 |
| FACTS Grounding | Intelligence | 3 | 0.43 | 2026-05-06 |
| MuSR | Intelligence | 754 | 15.57 | 2026-05-06 |
| AraGen v3 | Language | 19 | 52.12 | 2026-05-06 |
| Open Arabic LLM Leaderboard | Language | 10 | 74.47 | 2026-05-06 |
| Open Arabic LLM Leaderboard | Language | 36 | 69.31 | 2026-05-06 |
| Open Japanese LLM Leaderboard | Language | 184 | 61.45 | 2026-05-06 |
| Open Japanese LLM Leaderboard | Language | 619 | 37.49 | 2026-05-06 |
| Open Portuguese LLM Leaderboard | Language | 158 | 82.92 | 2026-05-06 |
| J1-ENVS | Legal | 5 | 54.78 | 2026-05-26 |
| MATH Level 5 | Math | 191 | 48.34 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 34 | 5.14 | 2026-05-06 |
| BRIDGE Medical Leaderboard | Medical | 19 | 49.49 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 104 | 39.86 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 155 | 36.83 | 2026-05-27 |
| MEDIC Benchmark | Medical | 29 | 68.4 (average normalized public table score) | 2026-05-27 |
| Medmarks | Medical | 26 | 0.5363568636656147 | 2026-05-27 |
| LanguageBench | Multilingual | 14 | 0.53 | 2026-05-06 |
| Balrog | Reasoning | 9 | 23 | 2026-05-06 |
| LingOly-TOO | Reasoning | 15 | 0.08 | 2026-05-06 |
| SimpleBench | Reasoning | 24 | 19.90 | 2026-05-06 |
| AgentLeak | Safety | 4 | 89.90 | 2026-05-06 |
| LiveSecBench | Safety | 28 | 38.89 | 2026-05-27 |
| ThaiSafetyBench | Safety | 12 | 16.87% overall ASR | 2026-05-28 |
| X-Risks Leaderboard | Safety | 5 | 17.83 | 2026-05-06 |
| SciPredict | Science | 7 | 18.19 | 2026-05-06 |
| Defects4J | Software Engineering | 28 | 0.234 | 2026-05-27 |
| RepairBench | Software Engineering | 28 | 0.224 | 2026-05-27 |
| SWE-PRBench | Software Engineering | 8 | 0.079 | 2026-05-27 |
| LiveSQLBench | Text to SQL | 29 | 15.86 | 2026-05-06 |
| BFCL v2 | Tool Use | 1 | 0.77 | 2026-05-06 |
| VNTL Leaderboard | Translation | 22 | 68.81 | 2026-05-06 |