Skillzwave

llm-evaluation

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.

Third-Party Agent Skill: Review the code before installing. Agent skills execute in your AI assistant's environment and can access your files.

Installation for Agentic Skill

skilz install varunisrani/skills-claude/llm-evaluation
skilz install varunisrani/skills-claude/llm-evaluation --agent opencode
skilz install varunisrani/skills-claude/llm-evaluation --agent codex
skilz install varunisrani/skills-claude/llm-evaluation --agent gemini

First time? Install Skilz: pip install skilz

Works with 14 AI coding assistants

Cursor, Aider, Copilot, Windsurf, Qwen, Kimi, and more...

Download Agent Skill ZIP

Extract the ZIP, copy the skill directory to ~/.claude/skills/, then restart Claude Desktop.

1. Clone the repository:
git clone https://github.com/varunisrani/skills-claude
2. Copy the agent skill directory:
cp -r skills-claude/claude_code_skills/ai-llm/llm-evaluation ~/.claude/skills/

Related Agentic Skills

llm-evaluation
by Microck
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM per...
Market Score: 22
Tags: general, evaluation llm

architect-agent
by SpillwaveSolutions
"Use this skill ONLY when user explicitly requests: (1) 'write instructions for code agent' or 'create instructions', (2) 'this is a new architect age...
Market Score: 100
Tags: general, schema query, Agents

confluence
by SpillwaveSolutions
This skill should be used when working with Confluence documentation - downloading pages to Markdown, converting between Wiki Markup and Markdown, cre...
Market Score: 100
Tags: general, documentation skill

design-doc-mermaid
by SpillwaveSolutions
Create Mermaid diagrams for any purpose - activity diagrams, deployment diagrams, architecture diagrams, or complete design documents. This skill uses...
Market Score: 100
Tags: general, react code

Agentic Skill Details

Repository: skills-claude
Type: Non-Technical
Meta-Domain: general
Primary Domain: general
Sub-Domain: evaluation llm
Market Score: 22.1