ai-multimodal
Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis, up to 9.5 hours), understanding images (captioning, object detection, OCR, visual Q&A, segmentation), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), generating images (text-to-image, editing, com
Third-Party Agent Skill: Review the code before installing. Agent skills execute in your AI assistant's environment and can access your files.
Installation for Agentic Skill
skilz install Microck/ordinary-claude-skills/ai-multimodal
skilz install Microck/ordinary-claude-skills/ai-multimodal --agent opencode
skilz install Microck/ordinary-claude-skills/ai-multimodal --agent codex
skilz install Microck/ordinary-claude-skills/ai-multimodal --agent gemini
First time? Install Skilz: pip install skilz
Works with 22+ AI coding assistants
Cursor, Aider, Copilot, Windsurf, Qwen, Kimi, and more...
Extract and copy to ~/.claude/skills/, then restart Claude Desktop:
git clone https://github.com/Microck/ordinary-claude-skills
cp -r ordinary-claude-skills/skills_categorized/media/ai-multimodal ~/.claude/skills/
Related Agentic Skills
opencode_cli
by SpillwaveSolutions
This skill should be used when configuring or using the OpenCode CLI for headless LLM automation. Use when the user asks to "configure opencode", "...
treatment-plans
by davila7
Generate concise (3-4 page), focused medical treatment plans in LaTeX/PDF format for all clinical specialties. Supports general medical treatment,...
google-gemini-api
by jezweb
Integrate Gemini API with correct current SDK (@google/genai v1.27+, NOT deprecated @google/generative-ai). Supports text generation, multimodal ...
elevenlabs-agents
by jezweb
Build conversational AI voice agents with ElevenLabs Platform using React, JavaScript, React Native, or Swift SDKs. Configure agents, tools (clie...
Agentic Skill Details
- Repository: ordinary-claude-skills
- Type: Non-Technical
- Meta-Domain: general
- Primary Domain: general
- Sub-Domain: api patterns skill
- Market Score: 20