llm-evaluation
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
Third-Party Agent Skill: Review the code before installing. Agent skills execute in your AI assistant's environment and can access your files.
Installation for Agentic Skill
skilz install Microck/ordinary-claude-skills/llm-evaluation
skilz install Microck/ordinary-claude-skills/llm-evaluation --agent opencode
skilz install Microck/ordinary-claude-skills/llm-evaluation --agent codex
skilz install Microck/ordinary-claude-skills/llm-evaluation --agent gemini
First time? Install Skilz: pip install skilz
Works with 22+ AI coding agents
Cursor, Aider, Copilot, Windsurf, Qwen, Kimi, and more...
Extract and copy to ~/.claude/skills/ then restart Claude Desktop
git clone https://github.com/Microck/ordinary-claude-skills
cp -r ordinary-claude-skills/skills_categorized/machine-learning/llm-evaluation ~/.claude/skills/
Related Agentic Skills
llm-evaluation
by varunisrani
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM per...
opencode_cli
by SpillwaveSolutions
This skill should be used when configuring or using the OpenCode CLI for headless LLM automation. Use when the user asks to "configure opencode", "use...
sdd
by SpillwaveSolutions
This skill should be used when users want guidance on Spec-Driven Development methodology using GitHub's Spec-Kit. Guide users through executable spec...
Agentic Skill Details
- Repository: ordinary-claude-skills
- Type: Non-Technical
- Meta-Domain: general
- Primary Domain: general
- Sub-Domain: evaluation llm
- Market Score: 22.1