Llama 3.3 70B Instruct

Llama / Meta

54 scores
44 benchmarks
$0 / $0 per 1M tokens (cost in/out)

Metadata

Llama Open source

Aliases: llama-3.3-70b-instruct, llama-3.3-70b-instruct:free, meta-llama-llama-3.3-70b-instruct, meta-llama/llama-3.3-70b-instruct, meta-llama/llama-3.3-70b-instruct:free

Benchmark Results

Benchmark Category Rank Score Sampled
Berkeley Function-Calling Leaderboard Agentic 62 31.9% 2026-05-27
Clembench Text v3.0 Agentic 18 50 2026-05-06
Galileo Agent Leaderboard Agentic 19 0.20 2026-05-06
OpenUGI Alignment 459 37.89 2026-05-06
Stick To Your Role! Alignment 4 0.78 2026-05-06
BigCodeBench Coding 16 46.90 2026-05-06
BigCodeBench-Hard Coding 19 28.40 2026-05-05
Natural Language to Mongosh Coding 45 0.82 2026-05-06
Natural Language to Mongosh Coding 46 0.82 2026-05-06
Natural Language to Mongosh Coding 67 0.80 2026-05-06
Natural Language to Mongosh Coding 76 0.79 2026-05-06
Natural Language to Mongosh Coding 80 0.78 2026-05-06
Natural Language to Mongosh Coding 92 0.76 2026-05-06
Natural Language to Mongosh Coding 99 0.73 2026-05-06
NeoEvalPlusN Creative 45 16.75 2026-05-06
OrgForge-IT Cybersecurity 10 0.800 2026-05-28
MMTU Data 18 0.45 2026-05-06
Fin-RATE Finance 9 16.76% 2026-05-28
SECQUE Finance 2 0.65 2026-05-28
Open LLM Leaderboard v2 General Knowledge 40 44.85 2026-05-06
WeirdML Generalization 28 14.44 2026-05-06
HealthBench Hard Healthcare 21 0.4 2026-05-27
HUMAINE Human Preference 37 3.37 2026-05-06
FACTS Grounding Intelligence 3 0.43 2026-05-06
MuSR Intelligence 754 15.57 2026-05-06
AraGen v3 Language 19 52.12 2026-05-06
Open Arabic LLM Leaderboard Language 10 74.47 2026-05-06
Open Arabic LLM Leaderboard Language 36 69.31 2026-05-06
Open Japanese LLM Leaderboard Language 184 61.45 2026-05-06
Open Japanese LLM Leaderboard Language 619 37.49 2026-05-06
Open Portuguese LLM Leaderboard Language 158 82.92 2026-05-06
J1-ENVS Legal 5 54.78 2026-05-26
MATH Level 5 Math 191 48.34 2026-05-06
OTIS Mock AIME 2024-2025 Mathematics 34 5.14 2026-05-06
BRIDGE Medical Leaderboard Medical 19 49.49 2026-05-27
BRIDGE Medical Leaderboard Medical 104 39.86 2026-05-27
BRIDGE Medical Leaderboard Medical 155 36.83 2026-05-27
MEDIC Benchmark Medical 29 68.4 (average normalized public table score) 2026-05-27
Medmarks Medical 26 0.536 2026-05-27
LanguageBench Multilingual 14 0.53 2026-05-06
Balrog Reasoning 9 23 2026-05-06
LingOly-TOO Reasoning 15 0.08 2026-05-06
SimpleBench Reasoning 24 19.90 2026-05-06
AgentLeak Safety 4 89.90 2026-05-06
LiveSecBench Safety 28 38.89 2026-05-27
ThaiSafetyBench Safety 12 16.87% (overall ASR) 2026-05-28
X-Risks Leaderboard Safety 5 17.83 2026-05-06
SciPredict Science 7 18.19 2026-05-06
Defects4J Software Engineering 28 0.234 2026-05-27
RepairBench Software Engineering 28 0.224 2026-05-27
SWE-PRBench Software Engineering 8 0.079 2026-05-27
LiveSQLBench Text to SQL 29 15.86 2026-05-06
BFCL v2 Tool Use 1 0.77 2026-05-06
VNTL Leaderboard Translation 22 68.81 2026-05-06