evaluating-llms-harness


"Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs."

Third-Party Agent Skill: Review the code before installing. Agent skills execute in your AI assistant's environment and can access your files.

Installation

skilz install zechenzhangAGI/AI-research-SKILLs/evaluating-llms-harness
skilz install zechenzhangAGI/AI-research-SKILLs/evaluating-llms-harness --agent opencode
skilz install zechenzhangAGI/AI-research-SKILLs/evaluating-llms-harness --agent codex
skilz install zechenzhangAGI/AI-research-SKILLs/evaluating-llms-harness --agent gemini

First time? Install Skilz: pip install skilz

Works with 22+ AI coding assistants

Cursor, Aider, Copilot, Windsurf, Qwen, Kimi, and more...

Download Agent Skill ZIP

Extract the archive, copy it to ~/.claude/skills/, and restart Claude Desktop.

1. Clone the repository:
git clone https://github.com/zechenzhangAGI/AI-research-SKILLs
2. Copy the agent skill directory:
cp -r AI-research-SKILLs/11-evaluation/lm-evaluation-harness ~/.claude/skills/
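To confirm the copy landed where Claude Desktop looks for skills, a quick check (assuming the default skills path used in the steps above):

    ls ~/.claude/skills/lm-evaluation-harness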



Agent Skill Details

Stars: 62
Forks: 2
Type: Technical
Meta-Domain: data ai
Primary Domain: machine learning
Market Score: 26
