llm-evaluation
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
Third-Party Agent Skill: Review the code before installing. Agent skills execute in your AI assistant's environment and can access your files.
Installation for Agentic Skill
skilz install Microck/ordinary-claude-skills/llm-evaluation
skilz install Microck/ordinary-claude-skills/llm-evaluation --agent opencode
skilz install Microck/ordinary-claude-skills/llm-evaluation --agent codex
skilz install Microck/ordinary-claude-skills/llm-evaluation --agent gemini
First time? Install Skilz: pip install skilz
Works with 22+ AI coding assistants
Cursor, Aider, Copilot, Windsurf, Qwen, Kimi, and more...
Manual install: extract and copy the skill to ~/.claude/skills/, then restart Claude Desktop.
git clone https://github.com/Microck/ordinary-claude-skills
cp -r ordinary-claude-skills/skills_categorized/machine-learning/llm-evaluation ~/.claude/skills/
Need detailed installation help? Check our platform-specific guides.
Related Agentic Skills
llm-evaluation
by varunisrani
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM ...
opencode_cli
by SpillwaveSolutions
This skill should be used when configuring or using the OpenCode CLI for headless LLM automation. Use when the user asks to "configure opencode", "...
sdd
by SpillwaveSolutions
This skill should be used when users want guidance on Spec-Driven Development methodology using GitHub's Spec-Kit. Guide users through executable s...
Agentic Skill Details
- Repository: ordinary-claude-skills
- Type: Non-Technical
- Meta-Domain: general
- Primary Domain: general
- Sub-Domain: testing evaluation llm
- Market Score: 22
Browse Category
More general Agentic Skills
Found a security vulnerability in this agent skill?
Report Security Issue
Thank you for helping keep SkillzWave secure. We'll review your report and take appropriate action.
Note: For critical security issues that require immediate attention, please also email security@skillzwave.ai directly.