operating-production-services

SRE patterns for production service reliability: SLOs, error budgets, postmortems, and incident response. Use when defining reliability targets, writing postmortems, implementing SLO alerting, or establishing on-call practices. NOT for initial service development (use scaffolding skills instead).

Also in: api monitoring

Third-Party Agent Skill: Review the code before installing. Agent skills execute in your AI assistant's environment and can access your files.

Installation for Agentic Skill

skilz install mjunaidca/mjs-agent-skills/operating-production-services
skilz install mjunaidca/mjs-agent-skills/operating-production-services --agent opencode
skilz install mjunaidca/mjs-agent-skills/operating-production-services --agent codex
skilz install mjunaidca/mjs-agent-skills/operating-production-services --agent gemini

First time? Install Skilz: pip install skilz

Works with 22+ AI coding assistants

Cursor, Aider, Copilot, Windsurf, Qwen, Kimi, and more...

Download Agent Skill ZIP

Extract the ZIP, copy it to ~/.claude/skills/, then restart Claude Desktop.

1. Clone the repository:
git clone https://github.com/mjunaidca/mjs-agent-skills
2. Copy the agent skill directory:
cp -r mjs-agent-skills/.claude/skills/operating-production-services ~/.claude/skills/

Agentic Skill Details

Stars: 1
Forks: 2
Type: Technical
Meta-Domain: cloud infrastructure
Primary Domain: kubernetes
Market Score: 17

Agent Skill Grade

A
Score: 97/100

Score Breakdown

Spec Compliance: 12/15
PDA Architecture: 27/30
Ease of Use: 24/25
Writing Style: 10/10
Utility: 19/20
Modifiers: +5
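The category scores and the modifier in this breakdown are consistent with a simple additive total. Assuming that is how the grade is computed (the rubric itself is not shown here), the 97/100 works out as:

```latex
\begin{aligned}
\text{categories} &= 12 + 27 + 24 + 10 + 19 = 92
  \quad (\text{out of } 15 + 30 + 25 + 10 + 20 = 100) \\
\text{score} &= 92 + 5 = 97
\end{aligned}
```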

Areas to Improve

  • Missing TOC in slo-alerting.md

Recommendations

  • Add trigger phrases to description for discoverability
  • Add table of contents for files over 100 lines
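The table-of-contents recommendation is easy to automate. Below is a minimal Python sketch, assuming ATX-style (`##`) headings and GitHub-style anchors; `build_toc` and `slugify` are hypothetical helpers, not part of the skill, and the slug rules only approximate GitHub's.

```python
import re

# ATX headings from level 2 to 6; level 1 (the document title) is left out of the TOC.
HEADING = re.compile(r"^(#{2,6})\s+(.+?)(?:\s+#+)?\s*$")


def slugify(title: str) -> str:
    """Approximate GitHub-style anchor: lowercase, drop punctuation, spaces -> hyphens."""
    slug = re.sub(r"[^\w\- ]", "", title.strip().lower())
    return slug.replace(" ", "-")


def build_toc(markdown: str, min_lines: int = 100) -> str:
    """Return a nested bullet TOC, or "" if the file is not longer than min_lines."""
    lines = markdown.splitlines()
    if len(lines) <= min_lines:
        return ""
    toc: list[str] = []
    seen: dict[str, int] = {}
    in_fence = False
    for line in lines:
        # Skip fenced code blocks so "# comment" lines in examples are not read as headings.
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2)
        slug = slugify(title)
        # Repeated headings get -1, -2, ... suffixes so every anchor stays unique.
        count = seen.get(slug, 0)
        seen[slug] = count + 1
        anchor = slug if count == 0 else f"{slug}-{count}"
        toc.append(f"{'  ' * (level - 2)}- [{title}](#{anchor})")
    return "\n".join(toc)
```

Running it over slo-alerting.md and pasting the output under the title would cover the missing-TOC item; files at or under the threshold return an empty string, matching the "over 100 lines" rule.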

Graded: 2026-01-24

Developer Feedback

Found your operating-production-services skill while browsing the registry. The way you've structured progressive disclosure for such a dense topic (97/100 for a reason) makes me curious how you'd handle even more edge cases around observability and incident response.

The TL;DR

You're at 97/100, solidly in A-grade territory. This is based on Anthropic's skill best practices rubric. Your strongest area is Writing Style (10/10): the skill reads like documentation written by someone who actually runs production systems, not a marketing pamphlet. Your weakest spot is Spec Compliance (12/15), mostly because you're leaving discoverability points on the table by not including trigger phrases in the description.

What's Working Well

  • Blameless postmortem framework - The 5 Whys template and postmortem meeting checklist give Claude concrete structure for handling incidents. That's the kind of thing teams actually need.
  • Token economy is chef's kiss - SKILL.md stays lean and delegates the heavy technical details to slo-alerting.md. You're not dumping a 200-line reference on someone up front; you're layering it thoughtfully.
  • Practical burn rate guidance - The multi-window alerting patterns with specific Prometheus queries and Grafana dashboard structure mean Claude can actually implement this, not just read philosophy.
  • Clear scope boundaries - Your description explicitly calls out SLO alerting and postmortems while noting what you don't cover (deployment strategies, team structure). That's rare and helpful.
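The multi-window burn rate idea praised above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own Prometheus rules: it assumes the error ratio for each window has already been computed (for example by recording rules), and the window pairs and thresholds (14.4 for 1h/5m, 6 for 6h/30m, 1 for 3d/6h) are the commonly cited Google SRE Workbook values for a 30-day SLO, not values taken from slo-alerting.md. `WindowRule`, `RULES`, `burn_rate`, and `evaluate` are hypothetical names, and the input ratios are illustrative numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowRule:
    """One long/short window pair; both must exceed the threshold to fire."""
    long_window: str
    short_window: str
    burn_rate_threshold: float
    severity: str


# Assumed rule set: commonly cited SRE Workbook thresholds for a 30-day SLO.
RULES = [
    WindowRule("1h", "5m", 14.4, "page"),
    WindowRule("6h", "30m", 6.0, "page"),
    WindowRule("3d", "6h", 1.0, "ticket"),
]


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Budget spend speed; 1.0 uses up the error budget exactly over the SLO window."""
    return error_ratio / (1.0 - slo_target)


def evaluate(error_ratios: dict[str, float], slo_target: float) -> list[str]:
    """Return an alert string for every rule whose long AND short windows burn too fast."""
    fired = []
    for rule in RULES:
        long_rate = burn_rate(error_ratios[rule.long_window], slo_target)
        short_rate = burn_rate(error_ratios[rule.short_window], slo_target)
        if long_rate > rule.burn_rate_threshold and short_rate > rule.burn_rate_threshold:
            fired.append(
                f"{rule.severity}: {rule.long_window}/{rule.short_window} "
                f"burn rate {long_rate:.1f}x"
            )
    return fired


# Illustrative input: a 99.9% SLO and made-up error ratios per window.
ratios = {"5m": 0.02, "30m": 0.01, "1h": 0.016, "6h": 0.004, "3d": 0.0005}
print(evaluate(ratios, slo_target=0.999))  # ['page: 1h/5m burn rate 16.0x']
```

Requiring both windows is the point of the pattern: the long window keeps brief spikes from paging, and the short window lets the alert clear soon after an incident is fixed instead of lingering for the full hour.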

The Big One

slo-alerting.md (189 lines) is missing a table of contents. This hurts your navigation score because at 100+ lines, readers need an anchor point. Right now someone has t...
