Skillzwave

llm-evaluation

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.

Third-Party Agent Skill: Review the code before installing. Agent skills execute in your AI assistant's environment and can access your files.

Installation for Agentic Skill

skilz install varunisrani/skills-claude/llm-evaluation
skilz install varunisrani/skills-claude/llm-evaluation --agent opencode
skilz install varunisrani/skills-claude/llm-evaluation --agent codex
skilz install varunisrani/skills-claude/llm-evaluation --agent gemini

First time? Install Skilz: pip install skilz

Works with 22+ AI coding agents

Cursor, Aider, Copilot, Windsurf, Qwen, Kimi, and more...

Download Agent Skill ZIP

Extract the ZIP and copy the skill directory to ~/.claude/skills/, then restart Claude Desktop.

1. Clone the repository:
git clone https://github.com/varunisrani/skills-claude
2. Copy the agent skill directory:
cp -r skills-claude/claude_code_skills/all-skills/llm-evaluation ~/.claude/skills/

Related Agentic Skills

llm-evaluation

by Microck

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM per...

22
general, evaluation llm

opencode_cli

by SpillwaveSolutions

This skill should be used when configuring or using the OpenCode CLI for headless LLM automation. Use when the user asks to "configure opencode", "use...

100
general, patterns skill

sdd

by SpillwaveSolutions

This skill should be used when users want guidance on Spec-Driven Development methodology using GitHub's Spec-Kit. Guide users through executable spec...

100
general, skill use

Agentic Skill Details

Repository: skills-claude
Type: Non-Technical
Meta-Domain: general
Primary Domain: general
Sub-Domain: evaluation llm
Market Score: 22.1
