evaluating-llms-harness
"Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs."
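As a rough illustration of the benchmarking workflow described above, the sketch below assembles a command line for EleutherAI's lm-evaluation-harness `lm_eval` CLI. It assumes the skill drives that tool (the repository path `11-evaluation/lm-evaluation-harness` suggests this, but the listing does not confirm it) and that `lm_eval` is installed separately. The helper name `build_lm_eval_command`, the model `EleutherAI/pythia-160m`, and the chosen tasks are illustrative placeholders, not part of the skill.

```python
import shlex


def build_lm_eval_command(model_backend, model_args, tasks, batch_size=8, output_path=None):
    """Assemble an `lm_eval` invocation as an argument list (hypothetical helper).

    Nothing is executed here; the list can be passed to subprocess.run
    once lm-evaluation-harness is installed.
    """
    cmd = [
        "lm_eval",
        "--model", model_backend,        # e.g. "hf" for HuggingFace models, "vllm" for vLLM
        "--model_args", model_args,      # backend-specific arguments such as pretrained=<model id>
        "--tasks", ",".join(tasks),      # comma-separated benchmark task names
        "--batch_size", str(batch_size),
    ]
    if output_path is not None:
        cmd += ["--output_path", output_path]  # where result files are written
    return cmd


# Placeholder model and tasks, chosen only for illustration.
command = build_lm_eval_command(
    model_backend="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["hellaswag", "gsm8k"],
)
print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell-quoting problems if a model path or output directory contains spaces.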
Third-Party Agent Skill: Review the code before installing. Agent skills execute in your AI assistant's environment and can access your files.
Installation for Agentic Skill
skilz install zechenzhangAGI/AI-research-SKILLs/evaluating-llms-harness
skilz install zechenzhangAGI/AI-research-SKILLs/evaluating-llms-harness --agent opencode
skilz install zechenzhangAGI/AI-research-SKILLs/evaluating-llms-harness --agent codex
skilz install zechenzhangAGI/AI-research-SKILLs/evaluating-llms-harness --agent gemini
First time? Install Skilz: pip install skilz
Works with 22+ AI coding assistants
Cursor, Aider, Copilot, Windsurf, Qwen, Kimi, and more...
Extract and copy to ~/.claude/skills/ then restart Claude Desktop
git clone https://github.com/zechenzhangAGI/AI-research-SKILLs
cp -r AI-research-SKILLs/11-evaluation/lm-evaluation-harness ~/.claude/skills/
Agentic Skill Details
- Owner: zechenzhangAGI (GitHub)
- Repository: AI-research-SKILLs
- Stars: 62
- Forks: 2
- Type: Technical
- Meta-Domain: data ai
- Primary Domain: machine learning
- Market Score: 26
Note: For critical security issues that require immediate attention, please also email security@skillzwave.ai directly.