llm-evaluation
LLM evaluation and testing patterns including prompt testing, hallucination detection, benchmark creation, and quality metrics. Use when testing LLM applications, validating prompt quality, implementing systematic evaluation, or measuring LLM performance.
Third-Party Agent Skill: Review the code before installing. Agent skills execute in your AI assistant's environment and can access your files.
Installation for Agentic Skill
skilz install applied-artificial-intelligence/claude-code-toolkit/llm-evaluation
skilz install applied-artificial-intelligence/claude-code-toolkit/llm-evaluation --agent opencode
skilz install applied-artificial-intelligence/claude-code-toolkit/llm-evaluation --agent codex
skilz install applied-artificial-intelligence/claude-code-toolkit/llm-evaluation --agent gemini
First time? Install Skilz: pip install skilz
Works with 22+ AI coding agents
Cursor, Aider, Copilot, Windsurf, Qwen, Kimi, and more...
Manual installation: clone the repository, copy the skill into ~/.claude/skills/, then restart Claude Desktop.
git clone https://github.com/applied-artificial-intelligence/claude-code-toolkit
cp -r claude-code-toolkit/skills/llm-evaluation ~/.claude/skills/
Agentic Skill Details
- Repository: claude-code-toolkit
- Type: Technical
- Meta-Domain: development
- Primary Domain: testing
- Market Score: 22.1