google-gemini/gemini-live-api-dev

Use this skill when building real-time, bidirectional streaming applications with the Gemini Live API. Covers WebSocket-based audio/video/text streaming, voice activity detection (VAD), native audio features, function calling, session management, ephemeral tokens for client-side auth, live translation, and all Live API configuration options. SDKs covered: google-genai (Python) and @google/genai (JavaScript/TypeScript).

Compatible platforms: ~ Claude Code, ~ Codex CLI, ~ Cursor, ✓ Gemini CLI

npx skills add https://github.com/google-gemini/gemini-skills/tree/main/skills/gemini-live-api-dev

Documentation

Individual skills in this repo

This repo contains 3 individual skills — each has its own dedicated page.

google-gemini/gemini-api-dev

Use this skill when building applications with Gemini API hosted models, including Gemini and Gemma 4, working with multimodal content (text, images, audio, video), implementing function calling, using structured outputs, or needing current model specifications. Covers SDK usage (google-genai for Python, @google/genai for JavaScript/TypeScript, com.google.genai:google-genai for Java, google.golang.org/genai for Go), model selection, and API capabilities.

google-gemini/gemini-interactions-api

Use this skill when writing code that calls the Gemini API for text generation, multi-turn chat, multimodal understanding, image generation, streaming responses, background research tasks, function calling, structured output, or migrating from the old generateContent API. This skill covers the Interactions API, the recommended way to use Gemini models and agents in Python and TypeScript.

google-gemini/vertex-ai-api-dev

Guides the usage of Gemini API on Google Cloud Vertex AI with the Gen AI SDK. Use when the user asks about using Gemini in an enterprise environment or explicitly mentions Vertex AI. Covers SDK usage (Python, JS/TS, Go, Java, C#), capabilities like Live API, tools, multimedia generation, caching, and batch prediction.

Related skills

nirholas/three.ws

Open-source 3D AI agent framework — GLB/glTF avatars with LLM brains, memory, emotions, and autonomous payments. MCP server · x402 · Solana/EVM · Three.js. Embed anywhere as a web component. Character studio, animation gallery, OAuth 2.1. Browser-native.

community

skills-shell/ai-video-generation

Generate AI videos with Google Veo, Seedance 2.0, HappyHorse, Wan, Grok and 40+ models via inference.sh CLI. Models: Veo 3.1, Veo 3, Seedance 2.0, HappyHorse 1.0, Wan 2.5, Grok Imagine Video, OmniHuman, Fabric, HunyuanVideo. Capabilities: text-to-video, image-to-video, reference-to-video, video editing, lipsync, avatar animation, video upscaling, foley sound. Use for: social media videos, marketing content, explainer videos, product demos, AI avatars. Triggers: video generation, ai video, text to video, image to video, veo, animate image, video from image, ai animation, video generator, generate video, t2v, i2v, ai video maker, create video with ai, runway alternative, pika alternative, sora alternative, kling alternative, seedance, happyhorse

community

skekdkkddk/elementsix-skills

🎬 Transform ideas into professional Seedance 2.0 video storyboard prompts with ease using this Claude Code Skill.

community

ewro/didntwatch

Too long; didn't watch — your agent did. A Claude Code skill that summarizes YouTube videos, answers follow-ups, and remembers every video it has seen.

community

doany-ai/lipsync

Lip-sync a face to a specific audio track on RunComfy via the `runcomfy` CLI. Routes across ByteDance OmniHuman (audio-driven full-body avatar from a portrait + audio), Sync Labs sync v2 / Pro (state-of-the-art mouth sync onto a video), Kling lipsync (audio-to-video and text-to-video with synced speech), and Creatify lipsync. The skill picks the right endpoint for the user's actual intent: portrait still + audio (avatar-style), source video + audio (mouth-swap on existing footage), or generate-and-sync from a script. Triggers on "lip sync", "lipsync", "make this video speak", "match audio to mouth", "dub video", "sync lips to voice", "Sync Labs", "voiceover sync", or any explicit ask to drive a face's mouth from an audio track.

community

carreras66848657/claude-skill-ugc-prompt

Generate production-ready UGC video ad prompts for Higgsfield AI Marketing Studio from your reference images and product descriptions.

community
