Agentic Skills by zechenzhangAGI
deepspeed
Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, sparse attention
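To make the ZeRO stages concrete, the sketch below estimates per-GPU memory for model states. It uses the accounting from the ZeRO paper. Mixed-precision Adam needs 2 bytes per parameter for FP16 weights, 2 for FP16 gradients and 12 for FP32 optimizer state. Each ZeRO stage shards one more of these across the data-parallel GPUs. This is plain arithmetic, not a DeepSpeed API. The function name is hypothetical, and activations, buffers and fragmentation are ignored.

```python
def zero_model_state_gb(num_params: float, num_gpus: int, stage: int, k: int = 12) -> float:
    """Per-GPU memory (GB = 1e9 bytes) for model states under ZeRO stage 0-3.

    Assumption: mixed-precision Adam as in the ZeRO paper, i.e. 2 bytes/param
    FP16 weights, 2 bytes/param FP16 gradients, k = 12 bytes/param of FP32
    optimizer state (master weights, momentum, variance). Activations ignored.
    """
    psi, n = num_params, num_gpus
    if stage == 0:    # plain data parallelism: everything replicated on every GPU
        total = (2 + 2 + k) * psi
    elif stage == 1:  # shard optimizer states
        total = 2 * psi + 2 * psi + k * psi / n
    elif stage == 2:  # shard optimizer states + gradients
        total = 2 * psi + (2 + k) * psi / n
    elif stage == 3:  # shard optimizer states + gradients + parameters
        total = (2 + 2 + k) * psi / n
    else:
        raise ValueError("stage must be 0, 1, 2 or 3")
    return total / 1e9


for stage in range(4):
    print(f"ZeRO-{stage}: {zero_model_state_gb(7.5e9, 64, stage):.1f} GB per GPU")
```

Consider a 7.5B-parameter model on 64 GPUs. The loop prints roughly 120.0, 31.4, 16.6 and 1.9 GB for stages 0 to 3. Sharding turns model-state memory from a fixed per-GPU cost into one that shrinks as GPUs are added.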
blip-2-vision-language
Vision-language pre-training framework bridging frozen image encoders and LLMs. Use when you need image captioning, visual question answering, image-t...
instructor
Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, parse complex JSON with type safety, and ...
outlines
Guarantee valid JSON/XML/code structure during generation, use Pydantic models for type-safe outputs, support local models (Transformers, vLLM), and m...
mlflow
Track ML experiments, manage model registry with versioning, deploy models to production, and reproduce experiments with MLflow - framework-agnostic M...
long-context
Extend context windows of transformer models using RoPE, YaRN, ALiBi, and position interpolation techniques. Use when processing long documents (32k-1...
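Here is a minimal sketch of linear position interpolation, one of the techniques named above. RoPE is not asked to handle positions beyond its trained length. Instead, every position is rescaled by train_len / target_len, so all rotation angles stay inside the range seen during training. The helper names and the 4096 to 16384 extension are illustrative assumptions. YaRN and ALiBi work differently and are not shown.

```python
def rope_angles(position: float, dim: int, base: float = 10000.0) -> list[float]:
    """Standard RoPE rotation angles for one position (one angle per 2-D pair)."""
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]


def interpolated_position(position: int, train_len: int, target_len: int) -> float:
    """Linear position interpolation: squeeze [0, target_len) into [0, train_len)."""
    scale = train_len / target_len  # < 1 when extending the context window
    return position * scale


train_len, target_len, dim = 4096, 16384, 8
pos = 12000                                   # beyond the trained range of 4096
pi_pos = interpolated_position(pos, train_len, target_len)
print(pi_pos)                                 # rescaled position, now < train_len
print(rope_angles(pi_pos, dim))               # same angles RoPE used in training for that position
```

The trade-off is resolution. Neighbouring tokens now differ by a quarter of a position. This is why position interpolation is usually followed by a short fine-tuning run at the new length.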
dspy
Build complex AI systems with declarative programming, optimize prompts automatically, create modular RAG systems and agents with DSPy - Stanford NLP'...
tensorboard
Visualize training metrics, debug models with histograms, compare experiments, visualize model graphs, and profile performance with TensorBoard - Goog...
huggingface-tokenizers
Fast tokenizers optimized for research and production. Rust-based implementation tokenizes 1GB in <20 seconds. Supports BPE, WordPiece, and Unigram al...
moe-training
Train Mixture of Experts (MoE) models using DeepSpeed or HuggingFace. Use when training large-scale models with limited compute (5× cost reduction vs ...
audiocraft-audio-generation
PyTorch library for audio generation including text-to-music (MusicGen) and text-to-sound (AudioGen). Use when you need to generate music from text de...
guidance
Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with ...
pyvene-interventions
Provides guidance for performing causal interventions on PyTorch models using pyvene's declarative intervention framework. Use when conducting causal ...
quantizing-models-bitsandbytes
Quantizes LLMs to 8-bit or 4-bit for 50-75% memory reduction with minimal accuracy loss. Use when GPU memory is limited, need to fit larger models, o...
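The quoted 50-75% range follows directly from bit widths relative to a 16-bit baseline. The arithmetic below covers weights only. It ignores quantization constants, outlier handling, activations and the KV cache, so real savings are somewhat smaller. The 7B model size is an arbitrary example, and the function is hypothetical, not part of bitsandbytes.

```python
def weight_memory_gb(num_params: float, bits: int) -> float:
    """Memory for the weights alone, in GB (1e9 bytes)."""
    return num_params * bits / 8 / 1e9


params = 7e9                              # example model size (assumption)
fp16 = weight_memory_gb(params, 16)       # 16-bit baseline
for bits in (8, 4):
    q = weight_memory_gb(params, bits)
    print(f"{bits}-bit: {q:.1f} GB ({1 - q / fp16:.0%} smaller than FP16)")
```

This prints 7.0 GB (50% smaller) for 8-bit and 3.5 GB (75% smaller) for 4-bit, against 14 GB at FP16.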
chroma
Open-source embedding database for AI applications. Store embeddings and metadata, perform vector and full-text search, filter by metadata. Simple 4-...
constitutional-ai
Anthropic's method for training harmless AI through self-improvement. Two-phase approach: supervised learning with self-critique/revision, then RLAIF...
optimizing-attention-flash
Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training/running transformers with long s...
grpo-rl-training
Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training
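The core idea of GRPO (Group Relative Policy Optimization) is to replace a learned value function with a group baseline. Several completions are sampled for the same prompt, and each reward is normalized against the mean and standard deviation of its own group. The sketch below shows only that advantage step, not the clipped policy-gradient update or a TRL API. It uses the population standard deviation and a small epsilon; actual implementations may use the sample standard deviation or a different epsilon. The 0/1 reward is a hypothetical example.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own group.

    All rewards belong to completions sampled for the *same* prompt, so the
    group mean acts as the baseline and no critic network is needed.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std (implementations differ here)
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt, scored by a hypothetical 0/1 correctness reward.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Correct completions get a positive advantage and incorrect ones a negative advantage. If every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal.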
langchain
Framework for building LLM-powered applications with agents, chains, and RAG. Supports multiple providers (OpenAI, Anthropic, Google), 500+ integrati...
llama-cpp
Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when ...
llamaindex
Data framework for building LLM applications with RAG. Specializes in document ingestion (300+ connectors), indexing, and querying. Features vector i...
mamba-architecture
State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware...
nanogpt
Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for learning transformers. By Andrej Karpa...
nemo-curator
GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features: fuzzy deduplication (16× faster), quality filtering (30+ h...
nemo-guardrails
NVIDIA's runtime safety framework for LLM applications. Features: jailbreak detection, input/output validation, fact-checking, hallucination detectio...
pinecone
Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and nam...
pytorch-fsdp
Expert guidance for Fully Sharded Data Parallel training with PyTorch FSDP - parameter sharding, mixed precision, CPU offloading, FSDP2
sentence-transformers
Framework for state-of-the-art sentence, text, and image embeddings. Provides 5000+ pre-trained models for semantic similarity, clustering, and retri...
sentencepiece
Language-independent tokenizer treating text as raw Unicode. Supports BPE and Unigram algorithms. Fast (50k sentences/sec), lightweight (6MB memory),...
serving-llms-vllm
Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference la...
crewai-multi-agent
Multi-agent orchestration framework for autonomous AI collaboration. Use when building teams of specialized agents working together on complex tasks, ...
evaluating-llms-harness
Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, ...
nnsight-remote-interpretability
Provides guidance for interpreting and manipulating neural network internals using nnsight with optional NDIF remote execution. Use when needing to ru...
fine-tuning-with-trl
Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and...
sparse-autoencoder-training
Provides guidance for training and analyzing Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable featu...
speculative-decoding
Accelerate LLM inference using speculative decoding, Medusa multiple heads, and lookahead decoding techniques. Use when optimizing inference speed (1....
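The sketch below shows greedy speculative decoding with a separate draft model; Medusa heads and lookahead decoding are not shown. A cheap draft model proposes k tokens. The target model then keeps the longest prefix it agrees with and appends its own token at the first mismatch. The output is therefore identical to greedy decoding with the target alone. The "models" here are hypothetical toy functions that map a token prefix to a next token. The sampling variant, which uses a probabilistic acceptance rule, is also not shown.

```python
from typing import Callable, List

Model = Callable[[List[int]], int]  # toy stand-in: prefix -> greedy next token


def speculative_generate(target: Model, draft: Model, prompt: List[int],
                         num_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft proposes k tokens, target verifies them."""
    seq = list(prompt)
    while len(seq) - len(prompt) < num_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks them. In a real system this is ONE batched forward
        #    pass over all k positions; here it is a loop for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # keep the target's correction, then stop
                break
            accepted.append(tok)
        else:
            accepted.append(target(seq + accepted))  # bonus token: all k accepted
        seq.extend(accepted)  # at least one new token per round, so the loop ends
    return seq[: len(prompt) + num_new]


def target(s: List[int]) -> int:  # hypothetical "large" model: count upward mod 10
    return (s[-1] + 1) % 10


def draft(s: List[int]) -> int:   # hypothetical cheap model: wrong right after a 5
    return 0 if s[-1] == 5 else (s[-1] + 1) % 10


print(speculative_generate(target, draft, [0], 8))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The speedup depends on how often the draft agrees with the target. In this toy run, two of the three rounds accept all four drafted tokens plus a bonus token, and one round accepts only the target's correction.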
implementing-llms-litgpt
Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model...
weights-and-biases
Track ML experiments with automatic logging, visualize training in real-time, optimize hyperparameters with sweeps, and manage model registry with W&B...
model-merging
Merge multiple fine-tuned models using mergekit to combine capabilities without retraining. Use when creating specialized models by blending domain-sp...
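The simplest way to "blend" fine-tuned models is a linear merge: a weighted average of corresponding parameters. The sketch below shows that idea on plain Python lists standing in for tensors. It is not mergekit's code or configuration format. It assumes every model shares one architecture (identical parameter names and shapes), typically fine-tuned from the same base checkpoint. The model names and values are hypothetical. Methods such as SLERP, TIES or DARE are not shown.

```python
def linear_merge(state_dicts: list[dict[str, list[float]]],
                 weights: list[float]) -> dict[str, list[float]]:
    """Weighted average of same-named, same-shaped parameters across models."""
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)  # normalize so the weights need not sum to 1
    merged = {}
    for name in state_dicts[0]:
        tensors = [sd[name] for sd in state_dicts]
        merged[name] = [
            sum(w * t[i] for w, t in zip(weights, tensors)) / total
            for i in range(len(tensors[0]))
        ]
    return merged


# Hypothetical tiny "models" fine-tuned from the same base.
math_model = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
code_model = {"layer.weight": [3.0, 0.0], "layer.bias": [1.0]}
print(linear_merge([math_model, code_model], [0.5, 0.5]))
```

No gradient step is involved; the merge is a single pass over the parameters. This is why merging is cheap compared with retraining on combined data.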
transformer-lens-interpretability
Provides guidance for mechanistic interpretability research using TransformerLens to inspect and manipulate transformer internals via HookPoints and a...
qdrant-vector-search
High-performance vector similarity search engine for RAG and semantic search. Use when building production RAG systems requiring fast nearest neighbor...
segment-anything-model
Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or masks as pr...
evaluating-code-models
Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ benchmarks with pass@k metrics. Use when benchmarking code models, compari...
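pass@k is usually computed with the unbiased estimator introduced alongside HumanEval (Chen et al., 2021). Generate n ≥ k samples per problem and count the c that pass the unit tests. pass@k is then the probability that a random size-k subset contains at least one correct sample: 1 - C(n-c, k) / C(n, k). A benchmark score averages this over problems. The sketch below computes the per-problem value; the function name is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem.

    n: samples generated, c: samples that passed, k: sampling budget (k <= n).
    """
    if n - c < k:  # every size-k subset must contain a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(n=10, c=3, k=1))  # pass@1 reduces to c / n
print(pass_at_k(n=10, c=3, k=5))  # 1 - C(7,5)/C(10,5) = 1 - 21/252
```

Sampling n > k and using this estimator gives lower variance than drawing exactly k samples and checking whether any pass.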
gptq
Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on consumer GPUs, when you need 4× me...
stable-diffusion-image-generation
State-of-the-art text-to-image generation with Stable Diffusion models via HuggingFace Diffusers. Use when generating images from text prompts, perfor...
sglang
Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflow...
training-llms-megatron
Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parame...
model-pruning
Reduce LLM size and accelerate inference using pruning techniques like Wanda and SparseGPT. Use when compressing models without retraining, achieving ...
peft-fine-tuning
Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ methods. Use when fine-tuning large models (7B-70B) with limited GPU memory, when ...
lambda-labs-gpu-cloud
Reserved and on-demand GPU cloud instances for ML training and inference. Use when you need dedicated GPU instances with simple SSH access, persistent...
knowledge-distillation
Compress large language models using knowledge distillation from teacher to student models. Use when deploying smaller models with retained performanc...
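The classic distillation objective (Hinton et al., 2015) trains the student to match the teacher's temperature-softened output distribution. The loss is T² · KL(p_teacher ‖ p_student), where both distributions are computed as softmax(logits / T). The sketch below computes that soft-target term for a single prediction in pure Python. For an LLM the same term is applied per token over the vocabulary. It is usually mixed with the ordinary hard-label cross-entropy, and that weighting is a tuning choice not shown here. Function names are illustrative.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: list[float], teacher_logits: list[float],
                      temperature: float = 2.0) -> float:
    """Soft-target loss: T^2 * KL(teacher || student) on softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # identical logits: zero loss
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))  # disagreement: positive loss
```

A higher temperature flattens the teacher's distribution. This exposes how the teacher ranks the wrong answers, which is the extra signal distillation adds over hard labels.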
hqq-quantization
Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision without needing calibration datasets,...
phoenix-observability
Open-source AI observability platform for LLM tracing, evaluation, and monitoring. Use when debugging LLM applications with detailed traces, running e...
openrlhf-training
High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-...
pytorch-lightning
High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scale...
gguf-quantization
GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware, Apple Silicon, or when needing...
huggingface-accelerate
Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic de...
langsmith-observability
LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, mo...
skypilot-multi-cloud-orchestration
Multi-cloud orchestration for ML workloads with automatic cost optimization. Use when you need to run training or batch jobs across multiple clouds, l...
autogpt-agents
Autonomous AI agent platform for building and deploying continuous agents. Use when creating visual workflow agents, deploying persistent autonomous a...
ray-train
Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of nodes. Built-in hyperparameter tunin...
rwkv-architecture
RNN+Transformer hybrid with O(n) inference. Linear time, infinite context, no KV cache. Train like GPT (parallel), infer like RNN (sequential). Linux ...
modal-serverless-gpu
Serverless GPU cloud platform for running ML workloads. Use when you need on-demand GPU access without infrastructure management, deploying ML models ...
awq-quantization
Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on ...
axolotl
Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support
simpo-training
Simple Preference Optimization for LLM alignment. Reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No refere...
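SimPO's loss, as given in the SimPO paper, can be written for one preference pair as shown below. It is a minimal sketch, not a TRL or reference implementation. Each response's implicit reward is its average token log-probability under the policy being trained, scaled by β. No reference model is involved. The loss is -log σ(r_chosen - r_rejected - γ), which pushes the chosen reward above the rejected one by a margin γ. The β = 2.0 and γ = 1.0 defaults are illustrative assumptions, not recommended settings.

```python
import math


def simpo_loss(logp_chosen: float, len_chosen: int,
               logp_rejected: float, len_rejected: int,
               beta: float = 2.0, gamma: float = 1.0) -> float:
    """SimPO loss for one (chosen, rejected) pair.

    logp_* are *summed* token log-probabilities from the policy itself;
    dividing by the length gives the length-normalized reward.
    """
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    margin = reward_chosen - reward_rejected - gamma
    # -log(sigmoid(margin)) == softplus(-margin), computed in a stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Chosen: -20 over 10 tokens (reward -4); rejected: -60 over 20 tokens (reward -6).
print(simpo_loss(-20.0, 10, -60.0, 20))  # margin = -4 - (-6) - 1 = 1
```

Length normalization keeps the reward from favouring short responses simply because they accumulate less negative log-probability. The margin γ then requires a clear preference rather than a tie.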
ray-data
Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, T...
llama-factory
Expert guidance for fine-tuning LLMs with LLaMA-Factory - WebUI no-code, 100+ models, 2/3/4/5/6/8-bit QLoRA, multimodal support
tensorrt-llm
Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when...
llava
Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicuna/LLaMA l...
llamaguard
Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, sel...
whisper
OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six mode...
clip
OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M...
unsloth
Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization
faiss
Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index ...
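The baseline operation behind FAISS's flat (exact) indexes is a full scan that scores every stored vector against the query and keeps the top k. The pure-Python sketch below performs that scan with inner-product scoring. It does not call the FAISS API. FAISS runs the same computation in optimized, batched form, optionally on GPU. Its approximate index types, such as IVF and HNSW, avoid scanning everything at some cost in recall. The function name and toy vectors are illustrative.

```python
import heapq


def flat_ip_search(database: list[list[float]], query: list[float],
                   k: int) -> list[tuple[int, float]]:
    """Exact top-k search by inner product: score every vector, keep the best k."""
    scores = ((i, sum(a * b for a, b in zip(vec, query)))
              for i, vec in enumerate(database))
    return heapq.nlargest(k, scores, key=lambda item: item[1])


db = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]  # unit-length vectors, so IP == cosine
print(flat_ip_search(db, [0.0, 1.0], k=2))  # ids and scores of the 2 best matches
```

The full scan costs O(N · d) per query. That cost is what motivates approximate indexes once the collection reaches millions or billions of vectors.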