llm-evaluation
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
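As an illustration of the kind of automated metric this skill helps set up, here is a minimal sketch (hypothetical data; not code from the skill itself) that scores model outputs against reference answers with exact match and token-level F1:

```python
# Minimal automated-evaluation sketch: exact match and token-level F1
# between model predictions and reference answers (hypothetical data).
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings match exactly, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-overlap precision and recall."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)  # multiset overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Aggregate both metrics over a small (prediction, reference) test set.
pairs = [
    ("Paris", "Paris"),
    ("The capital is Paris", "Paris"),
]
em = sum(exact_match(p, r) for p, r in pairs) / len(pairs)
f1 = sum(token_f1(p, r) for p, r in pairs) / len(pairs)
print(f"exact match: {em:.2f}, token F1: {f1:.2f}")
```

In practice these per-example scores would be averaged over a held-out test set and tracked across model or prompt versions; human feedback and benchmark suites complement them for qualities string overlap cannot capture.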
Third-Party Agent Skill: review the code before installing. Agent skills execute in your AI assistant's environment and can access your files.
Installation for Agentic Skill
skilz install varunisrani/skills-claude/llm-evaluation
skilz install varunisrani/skills-claude/llm-evaluation --agent opencode
skilz install varunisrani/skills-claude/llm-evaluation --agent codex
skilz install varunisrani/skills-claude/llm-evaluation --agent gemini
First time? Install Skilz: pip install skilz
Works with 14 AI coding assistants: Cursor, Aider, Copilot, Windsurf, Qwen, Kimi, and more.
Manual installation: extract and copy the skill to ~/.claude/skills/, then restart Claude Desktop:
git clone https://github.com/varunisrani/skills-claude
cp -r skills-claude/claude_code_skills/ai-llm/llm-evaluation ~/.claude/skills/
Related Agentic Skills
- llm-evaluation (by Microck): evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking.
- architect-agent (by SpillwaveSolutions): use only when the user explicitly requests instructions for a new architect code agent.
- confluence (by SpillwaveSolutions): work with Confluence documentation, downloading pages to Markdown and converting between Wiki Markup and Markdown.
- design-doc-mermaid (by SpillwaveSolutions): create Mermaid diagrams (activity, deployment, architecture) and complete design documents.
Agentic Skill Details
- Owner: varunisrani (GitHub)
- Repository: skills-claude
- Type: Non-Technical
- Meta-Domain: general
- Primary Domain: general
- Sub-Domain: evaluation llm
- Market Score: 22.1