llm-agents · Featured

llm-evaluation

5.2k stars · Updated 2025-12-28

Compatible with: claude, codex

Description

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking.

How to Use

  1. Visit the GitHub repository to get the SKILL.md file
  2. Copy the file to your project root or .cursor/rules directory
  3. Restart your AI assistant or editor to apply the new skill

Full Skill Documentation

name

llm-evaluation

description

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
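To make the idea of automated metrics concrete, here is a minimal illustrative sketch of an evaluation harness: it runs a model callable over (prompt, reference) pairs and reports the mean score under an exact-match metric. All names (`exact_match`, `evaluate`, the stub model, and the tiny dataset) are hypothetical, not part of this skill's actual implementation.

```python
# Minimal illustrative eval harness (hypothetical names throughout).

def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings match, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def evaluate(model, dataset, metric=exact_match):
    """Run `model` (any callable: prompt -> str) over (prompt, reference)
    pairs and return the mean metric score."""
    scores = [metric(model(prompt), reference) for prompt, reference in dataset]
    return sum(scores) / len(scores) if scores else 0.0

# Usage with a stub "model" standing in for a real LLM call:
dataset = [("2+2=", "4"), ("Capital of France?", "Paris")]
stub = lambda prompt: {"2+2=": "4", "Capital of France?": "paris"}[prompt]
print(evaluate(stub, dataset))  # 1.0
```

In practice the exact-match metric would be swapped for task-appropriate scorers (semantic similarity, LLM-as-judge, human ratings), but the harness shape stays the same.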

Tags

#llm #evaluation #testing