llm-evaluation
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
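As an illustration of the kind of automated metric this skill helps set up, here is a minimal sketch (hypothetical data; not code from the skill itself) that scores model outputs against reference answers with exact match and token-level F1:

```python
# Minimal automated-evaluation sketch: exact match and token-level F1
# between model predictions and reference answers (hypothetical data).
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings match exactly, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-overlap precision and recall."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)  # multiset overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Aggregate both metrics over a small (prediction, reference) test set.
pairs = [
    ("Paris", "Paris"),
    ("The capital is Paris", "Paris"),
]
em = sum(exact_match(p, r) for p, r in pairs) / len(pairs)
f1 = sum(token_f1(p, r) for p, r in pairs) / len(pairs)
print(f"exact match: {em:.2f}, token F1: {f1:.2f}")
```

In practice these per-example scores would be averaged over a held-out test set and tracked across model or prompt versions; human feedback and benchmark suites complement them for qualities string overlap cannot capture.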
Third-Party Agent Skill: review the code before installing. Agent skills execute in your AI assistant's environment and can access your files.
Installation for Agentic Skill
skilz install varunisrani/skills-claude/llm-evaluation
skilz install varunisrani/skills-claude/llm-evaluation --agent opencode
skilz install varunisrani/skills-claude/llm-evaluation --agent codex
skilz install varunisrani/skills-claude/llm-evaluation --agent gemini
First time? Install Skilz: pip install skilz
Works with 14 AI coding assistants: Cursor, Aider, Copilot, Windsurf, Qwen, Kimi, and more.
Manual installation: extract and copy the skill to ~/.claude/skills/, then restart Claude Desktop:
git clone https://github.com/varunisrani/skills-claude
cp -r skills-claude/claude_code_skills/ai-llm/llm-evaluation ~/.claude/skills/
Related Agentic Skills
- llm-evaluation (by Microck): evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking.
- architect-agent (by SpillwaveSolutions): use only when the user explicitly requests instructions for a new architect code agent.
- confluence (by SpillwaveSolutions): work with Confluence documentation, downloading pages to Markdown and converting between Wiki Markup and Markdown.
- design-doc-mermaid (by SpillwaveSolutions): create Mermaid diagrams (activity, deployment, architecture) and complete design documents.
Agentic Skill Details
- Owner: varunisrani (GitHub)
- Repository: skills-claude
- Type: Non-Technical
- Meta-Domain: general
- Primary Domain: general
- Sub-Domain: evaluation llm
- Market Score: 22.1