llm-evaluation

Name: llm-evaluation
Rating: 1.1 (1 review)
Author: applied-artificial-intelligence

LLM evaluation and testing patterns including prompt testing, hallucination detection, benchmark creation, and quality metrics. Use when testing LLM applications, validating prompt quality, implementing systematic evaluation, or measuring LLM performance.

Also in: monitoring

Third-Party Agent Skill: Review the code before installing. Agent skills execute in your AI assistant's environment and can access your files.

Installation for Agentic Skill

Claude Code (CLI) Fast

skilz install applied-artificial-intelligence/claude-code-toolkit/llm-evaluation

OpenCode (CLI) Fast

skilz install applied-artificial-intelligence/claude-code-toolkit/llm-evaluation --agent opencode

OpenAI Codex (CLI) Native

skilz install applied-artificial-intelligence/claude-code-toolkit/llm-evaluation --agent codex

Gemini CLI (Project)

skilz install applied-artificial-intelligence/claude-code-toolkit/llm-evaluation --agent gemini

First time? Install Skilz: pip install skilz

Works with 22+ AI coding agents

Cursor, Aider, Copilot, Windsurf, Qwen, Kimi, and more...

For Claude Desktop Easy

Download the agent skill ZIP, extract it, and copy the extracted folder to ~/.claude/skills/, then restart Claude Desktop.

Manual Installation

1. Clone the repository:

git clone https://github.com/applied-artificial-intelligence/claude-code-toolkit

2. Copy the agent skill directory:

cp -r claude-code-toolkit/skills/llm-evaluation ~/.claude/skills/
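The copy step above assumes that ~/.claude/skills/ already exists and that the clone contains skills/llm-evaluation. A minimal sketch that checks both before copying is shown below; `install_skill` is a hypothetical helper name, not part of Skilz or the toolkit, and the optional third argument (destination directory) is an assumption added for illustration.

```shell
# Hypothetical helper (not part of skilz or the toolkit): copy one skill
# directory from an already-cloned repository into the skills directory.
# Usage: install_skill <repo_dir> <skill_name> [dest_dir]
install_skill() {
  repo_dir="$1"
  skill="$2"
  dest="${3:-$HOME/.claude/skills}"   # default matches the manual step above
  src="$repo_dir/skills/$skill"

  # Stop early if the clone does not contain the expected skill directory.
  if [ ! -d "$src" ]; then
    echo "skill directory not found: $src" >&2
    return 1
  fi

  # Create the destination if it is missing, then copy the skill into it.
  mkdir -p "$dest" && cp -r "$src" "$dest/"
}

# Example, run after the clone in step 1:
#   install_skill claude-code-toolkit llm-evaluation
```

If the skill directory is missing, the function prints a message to stderr and returns a non-zero status instead of creating a partial install.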

Related Agentic Skills

pytest-config

by athola

Standardized pytest configuration patterns for plugin development. Reduces duplication across parseltongue, pensive, sanctum, and other plugins. Trigger...

TECH testing

Marketplace

shell-testing-framework

by manutej

Shell script testing expertise using bash test framework patterns from unix-goto, covering test structure (arrange-act-assert), 4 test categories, ass...

TECH testing

Marketplace

+linux

markdownlint-custom-rules

by TheBushidoCollective

Create custom linting rules for markdownlint including rule structure, parser integration, error reporting, and automatic fixing.

TECH testing

Marketplace

feature-dev

by secondsky

Automate 7-phase feature development with specialized agents (code-explorer, code-architect, code-reviewer). Use for multi-file features, architectura...

TECH testing

Marketplace

Agentic Skill Details

Owner: applied-artificial-intelligence (GitHub)
Repository: claude-code-toolkit
Type: Technical
Meta-Domain: development
Primary Domain: testing
Market Score: 22.1
