Skillzwave

tensorrt-llm

422 stars · 30 forks · Updated Dec 17, 2025 · Quality score: 34.3

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.

Tags: NVIDIA GPUs, GPU, GPUs, throughput, latency
Also in: github, api, machine learning

Third-Party Skill: Review the code before installing. Skills execute in your AI assistant's environment and can access your files.

skilz install zechenzhangAGI_AI-research-SKILLs/tensorrt-llm
skilz install zechenzhangAGI_AI-research-SKILLs/tensorrt-llm --agent opencode
skilz install zechenzhangAGI_AI-research-SKILLs/tensorrt-llm --agent codex
skilz install zechenzhangAGI_AI-research-SKILLs/tensorrt-llm --agent gemini

First time? Install Skilz: pip install skilz

Works with 14 AI coding assistants

Cursor, Aider, Copilot, Windsurf, Qwen, Kimi, and more...

Download Skill ZIP

Extract the archive, copy it to ~/.claude/skills/, then restart Claude Desktop.

1. Clone the repository:
git clone https://github.com/zechenzhangAGI/AI-research-SKILLs
2. Copy the skill directory:
cp -r AI-research-SKILLs/12-inference-serving/tensorrt-llm ~/.claude/skills/
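The two manual steps above amount to placing the skill directory under ~/.claude/skills/. A minimal sketch of that layout, using a scratch directory in place of your real home directory so it can run anywhere (the SKILL.md file and its contents here are placeholders, not the skill's actual files):

```shell
# Scratch area standing in for $HOME (use your real home directory
# for an actual install).
SCRATCH=$(mktemp -d)

# Stand-in for the cloned repo's skill directory
# (real path: AI-research-SKILLs/12-inference-serving/tensorrt-llm).
mkdir -p "$SCRATCH/AI-research-SKILLs/12-inference-serving/tensorrt-llm"
echo "name: tensorrt-llm" > "$SCRATCH/AI-research-SKILLs/12-inference-serving/tensorrt-llm/SKILL.md"

# Step 2: copy the skill into the directory Claude Desktop reads.
mkdir -p "$SCRATCH/.claude/skills"
cp -r "$SCRATCH/AI-research-SKILLs/12-inference-serving/tensorrt-llm" "$SCRATCH/.claude/skills/"

# The skill now lives at $SCRATCH/.claude/skills/tensorrt-llm
ls "$SCRATCH/.claude/skills/tensorrt-llm"
```

After the copy, restarting Claude Desktop picks up anything under ~/.claude/skills/.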

Need detailed installation help? See the platform-specific guides.


Details

Stars: 422
Forks: 30
Type: Technical
Meta-Domain: cloud infrastructure
Primary Domain: kubernetes
Sub-Domain: deployment path
Skill Size: 26.5 KB
Files: 4
Quality Score: 34.3

AI-Detected Topics

Extracted using NLP analysis

NVIDIA GPUs, GPU, GPUs, throughput, latency

Report Security Issue

Found a security vulnerability in this skill?