name: gemini-ocr-cli description: "Gemini-powered OCR via a global CLI, with onboarding-first setup, Markdown-first output, and explicit PDF mode control."
gemini-ocr-cli
When to use
Use this skill when a task needs high-quality OCR from a local terminal workflow and the host model’s own OCR is not reliable enough.
Use it especially when an agent must:
- OCR a local image into faithful Markdown,
- OCR a local PDF into faithful Markdown,
- preserve page structure, tables, and labels as well as possible,
- work through a deterministic CLI instead of ad-hoc multimodal prompting.
Preconditions
- Ensure the CLI is globally available on
PATH:- preferred:
npm install -g @codecell-germany/gemini-ocr-agent-skill - verify:
gemini-ocr --help
- preferred:
- Install the skill payload if needed:
gemini-ocr-skill install --force
- Required secret env:
GEMINI_API_KEY=<api-key>
- Supported alias secret env:
GOOGLE_GENERATIVE_AI_API_KEY=<api-key>
- Supported fallback secret env:
GOOGLE_API_KEY=<api-key>
- Optional model override:
GEMINI_OCR_MODEL=gemini-3-flash-preview
Core workflow
- Verify the public CLI surface:
gemini-ocr --help
- Validate the environment:
gemini-ocr doctor --json
- If the environment is incomplete, print the setup guide:
gemini-ocr setup --language engemini-ocr setup --language de
- Export one of the supported API key env vars and rerun:
gemini-ocr doctor --json
- OCR an image:
gemini-ocr scan-image /absolute/path/to/image.png
- OCR a PDF:
gemini-ocr scan-pdf /absolute/path/to/document.pdf
- Use JSON output only when the calling workflow needs the structured OCR object:
gemini-ocr scan-pdf /absolute/path/to/document.pdf --format json
Guardrails
- Use the public CLI names
gemini-ocrandgemini-ocr-skill. - Do not bypass the product surface with repo-local entrypoints such as
node dist/index.js. - Do not call hidden installed runtime paths such as
~/.codex/tools/gemini-ocr-cli/dist/index.js. - Inputs are sent to Gemini for remote processing. Do not use the tool on documents that must stay fully local unless that remote-processing policy is acceptable.
- Default output is Markdown on stdout.
- Diagnostics and warnings belong on stderr.
--format jsonis the machine-readable escape hatch.--pdf-mode autocan retry a failed native PDF request as raster OCR whenpdftoppmis available.--pdf-mode rasterrequirespdftoppm.- API keys remain in shell env. Do not paste them into prompts, tickets, screenshots, or chats.
- Prefer
doctorbefore the first real OCR run in a new shell or environment.
References
- Main overview:
references/overview.md - Agent onboarding:
references/agent-onboarding.md - OCR first run:
references/ocr-first-run.md - Command cheat sheet:
references/command-cheatsheet.md - Architecture:
knowledge/ARCHITECTURE.md - Release checklist:
knowledge/RELEASE_CHECKLIST.md