Skillzwave

llm-evaluation

Market Score: 22.1

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.

Third-Party Agent Skill: Review the code before installing. Agent skills execute in your AI assistant's environment and can access your files.

Installation for Agentic Skill

skilz install Microck/ordinary-claude-skills/llm-evaluation
skilz install Microck/ordinary-claude-skills/llm-evaluation --agent opencode
skilz install Microck/ordinary-claude-skills/llm-evaluation --agent codex
skilz install Microck/ordinary-claude-skills/llm-evaluation --agent gemini

First time? Install Skilz: pip install skilz

Works with 22+ AI coding agents

Cursor, Aider, Copilot, Windsurf, Qwen, Kimi, and more...

Download Agent Skill ZIP

Extract the ZIP and copy its contents to ~/.claude/skills/, then restart Claude Desktop. Alternatively, install manually from the repository:

1. Clone the repository:
git clone https://github.com/Microck/ordinary-claude-skills
2. Copy the agent skill directory:
cp -r ordinary-claude-skills/skills_categorized/machine-learning/llm-evaluation ~/.claude/skills/


Related Agentic Skills

llm-evaluation

by varunisrani

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM per...

22
general, evaluation llm

opencode_cli

by SpillwaveSolutions

This skill should be used when configuring or using the OpenCode CLI for headless LLM automation. Use when the user asks to "configure opencode", "use...

100
general, patterns skill

sdd

by SpillwaveSolutions

This skill should be used when users want guidance on Spec-Driven Development methodology using GitHub's Spec-Kit. Guide users through executable spec...

100
general, skill use

Agentic Skill Details

Type: Non-Technical
Meta-Domain: general
Primary Domain: general
Sub-Domain: evaluation llm
Market Score: 22.1
