llm-evaluation
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
Third-Party Agent Skill: Review the code before installing. Agent skills execute in your AI assistant's environment and can access your files.
Installation for Agentic Skill
skilz install Microck/ordinary-claude-skills/llm-evaluation
skilz install Microck/ordinary-claude-skills/llm-evaluation --agent opencode
skilz install Microck/ordinary-claude-skills/llm-evaluation --agent codex
skilz install Microck/ordinary-claude-skills/llm-evaluation --agent gemini
First time? Install Skilz: pip install skilz
Works with 14 AI coding assistants
Cursor, Aider, Copilot, Windsurf, Qwen, Kimi, and more...
Manual installation: copy the skill to ~/.claude/skills/, then restart Claude Desktop.
git clone https://github.com/Microck/ordinary-claude-skills
cp -r ordinary-claude-skills/skills_categorized/machine-learning/llm-evaluation ~/.claude/skills/
Related Agentic Skills
llm-evaluation
by varunisrani
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM per...
architect-agent
by SpillwaveSolutions
"Use this skill ONLY when user explicitly requests: (1) 'write instructions for code agent' or 'create instructions', (2) 'this is a new architect age...
confluence
by SpillwaveSolutions
This skill should be used when working with Confluence documentation - downloading pages to Markdown, converting between Wiki Markup and Markdown, cre...
design-doc-mermaid
by SpillwaveSolutions
Create Mermaid diagrams for any purpose - activity diagrams, deployment diagrams, architecture diagrams, or complete design documents. This skill uses...
Agentic Skill Details
- Repository: ordinary-claude-skills
- Type: Non-Technical
- Meta-Domain: general
- Primary Domain: general
- Sub-Domain: evaluation llm
- Market Score: 22.1