haoyiyin/pdf-translate

Agent skill for translating PDFs between 20+ languages with full layout preservation. Uses PDFMathTranslate + Google Translate. Zero API keys needed.

Supported agents: Claude Code, Codex CLI, Cursor
npx skills add haoyiyin/pdf-translate

Documentation

PDF Translation with Layout Preservation (pdf2zh)

Translate PDF documents between languages while fully preserving original layout, images, tables, formulas, and typography. Uses pdf2zh (PDFMathTranslate, accepted at EMNLP 2025) with Google Translate as the free default backend.

Output: two files — *-mono.pdf (translated only) and *-dual.pdf (bilingual side-by-side).

When to Use

  • You need to translate a PDF from one language to another
  • The original layout, images, and formatting must be preserved
  • You want a bilingual version for comparison
  • The document contains tables, formulas, or complex layouts
  • The document is a scanned PDF or contains embedded images

Don't use for: simple text extraction where layout does not matter (use the ocr-and-documents skill instead).

Prerequisites

# Install pdf2zh
pip install pdf2zh

The Google Translate backend needs no API key (free, no sign-up).

Usage

Basic Translation

# Translate Chinese to English
pdf2zh input.pdf -li zh -lo en -s google

# Translate English to Chinese
pdf2zh input.pdf -li en -lo zh -s google

# Translate Japanese to English
pdf2zh input.pdf -li ja -lo en -s google

# Translate French to Chinese
pdf2zh input.pdf -li fr -lo zh -s google

This generates:

  • input-mono.pdf — translated-only version
  • input-dual.pdf — bilingual version (original left, translated right)
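
For scripting, the naming rule above can be captured in a small helper. This is a minimal sketch, not part of pdf2zh: `expected_outputs` is a hypothetical name, and it assumes that without `-o` the files are written to the current directory.

```shell
# Print the two output paths pdf2zh is expected to write for one input PDF.
# Assumption: without an output directory, files land in the current directory.
# Usage: expected_outputs INPUT_PDF [OUTPUT_DIR]
expected_outputs() {
    input=$1
    outdir=${2:-.}
    outdir=${outdir%/}              # drop one trailing slash, if any
    base=$(basename "$input" .pdf)  # "docs/input.pdf" -> "input"
    printf '%s/%s-mono.pdf\n' "$outdir" "$base"
    printf '%s/%s-dual.pdf\n' "$outdir" "$base"
}
```

Calling `expected_outputs input.pdf ./output/` prints the mono path on the first line and the dual path on the second, which makes it easy to feed into later checks.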

Specify Custom Output Directory

pdf2zh input.pdf -li zh -lo en -s google -o ./output/

Translate Specific Pages Only

pdf2zh input.pdf -li zh -lo en -s google -p 1-10

Translate from URL (Online PDF)

pdf2zh https://example.com/document.pdf -li zh -lo en -s google

Batch Translate an Entire Folder

pdf2zh --dir ./pdfs/ -li zh -lo en -s google
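
When a batch run is interrupted, it can help to see which files still lack a translation. The helper below is a rough sketch under stated assumptions: `pending_pdfs` is a hypothetical name, and it treats a non-empty `*-mono.pdf` in the output directory as proof that a file was already translated.

```shell
# List PDFs in DIR that have no non-empty "<name>-mono.pdf" in OUTDIR yet.
# Usage: pending_pdfs DIR OUTDIR
pending_pdfs() {
    dir=${1%/}
    outdir=${2%/}
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue          # no match: the glob stays literal
        base=$(basename "$f" .pdf)
        case $base in
            *-mono|*-dual) continue ;;   # skip earlier outputs in the same folder
        esac
        [ -s "$outdir/$base-mono.pdf" ] || printf '%s\n' "$f"
    done
}

# Example (not run here): translate only the files that are still pending.
# pending_pdfs ./pdfs ./output | while read -r pdf; do
#     pdf2zh "$pdf" -li zh -lo en -s google -o ./output/
# done
```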

GUI Mode (Browser Interface)

pdf2zh -i
# Opens at http://localhost:7860

Use Different Translation Backends

Backend         Command     API Key Needed
Google (free)   -s google   No
DeepL           -s deepl    Yes (higher quality)
OpenAI          -s openai   Yes
Ollama (local)  -s ollama   No (local LLM)
Azure           -s azure    Yes
Claude          -s claude   Yes

Language Codes (Common)

Language               Code
Chinese (Simplified)   zh
Chinese (Traditional)  zh-TW
English                en
Japanese               ja
Korean                 ko
French                 fr
German                 de
Spanish                es
Russian                ru
Arabic                 ar
Portuguese             pt
Italian                it

Parameter Reference

Flag            Description                                   Example
-li             Source language                               -li zh
-lo             Target language                               -lo en
-s              Translation service                           -s google
-p              Specific pages                                -p 1,3,5-10
-o              Output directory                              -o ./output/
-t              Thread count                                  -t 4
-f              Regex of formula fonts to leave untranslated  -f "(MS.*)"
--dir           Batch translate folder                        --dir ./pdfs/
-i              Launch GUI                                    -i
--compatible    Compatibility mode                            --compatible
--ignore-cache  Bypass translation cache                      --ignore-cache
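
The flags above combine freely, so a thin wrapper can keep long invocations readable. The following is a sketch only: `translate_pdf` and the `SERVICE`, `PAGES`, `OUTDIR`, `THREADS`, and `PDF2ZH` variables are hypothetical names introduced for illustration, not part of pdf2zh.

```shell
# Assemble a pdf2zh call from optional environment variables and run it.
# Usage: translate_pdf INPUT SRC_LANG DST_LANG
# Optional: SERVICE (default: google), PAGES, OUTDIR, THREADS.
# Set PDF2ZH=echo to print the arguments instead of running pdf2zh.
translate_pdf() {
    input=$1
    src=$2
    dst=$3
    set -- "$input" -li "$src" -lo "$dst" -s "${SERVICE:-google}"
    if [ -n "${PAGES:-}" ]; then set -- "$@" -p "$PAGES"; fi
    if [ -n "${OUTDIR:-}" ]; then set -- "$@" -o "$OUTDIR"; fi
    if [ -n "${THREADS:-}" ]; then set -- "$@" -t "$THREADS"; fi
    "${PDF2ZH:-pdf2zh}" "$@"
}
```

A dry run such as `PDF2ZH=echo PAGES=1-10 THREADS=4 translate_pdf input.pdf zh en` shows the arguments that would be passed, without translating anything.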

One-Shot Recipes

Quick Translate a Local PDF

pip install -q pdf2zh && pdf2zh input.pdf -li zh -lo en -s google

Translate with Higher Quality (OpenAI)

pdf2zh input.pdf -li zh -lo en -s openai
# Requires OPENAI_API_KEY in environment
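
Since a missing key only surfaces once translation starts, a guard that fails early can save a wasted run. This is a minimal sketch; `require_env` is a hypothetical helper, not a pdf2zh feature.

```shell
# Fail early when a required environment variable is unset or empty.
# Usage: require_env VAR_NAME
require_env() {
    eval "value=\${$1:-}"   # indirect lookup of the variable named by $1
    if [ -z "$value" ]; then
        echo "error: $1 is not set" >&2
        return 1
    fi
}

# Example (not run here):
# require_env OPENAI_API_KEY && pdf2zh input.pdf -li zh -lo en -s openai
```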

Common Pitfalls

  1. Model download failure: pdf2zh downloads a layout detection model (wybxc/DocLayout-YOLO-DocStructBench-onnx) on first run. If the download is blocked on your network, point Hugging Face at a mirror endpoint:

    export HF_ENDPOINT=https://hf-mirror.com
    
  2. Large PDF processing: For documents with 100+ pages, increase thread count:

    pdf2zh input.pdf -li zh -lo en -s google -t 4
    
  3. Translation rate limits: Very large documents may hit the translation service's rate limit. Try splitting the document:

    pdf2zh input.pdf -li zh -lo en -s google -p 1-20
    pdf2zh input.pdf -li zh -lo en -s google -p 21-40
    
  4. Compatibility issues: If the output has rendering problems, try compatibility mode:

    pdf2zh input.pdf -li zh -lo en -s google --compatible
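
The page-splitting workaround from pitfall 3 can be generated rather than typed by hand. The helper below is a sketch: `page_chunks` is a hypothetical name, and the commented example writes each range to its own directory on the assumption that every run produces the same `input-mono.pdf` and `input-dual.pdf` names and would otherwise overwrite the previous chunk.

```shell
# Print one "START-END" page range per line, covering pages 1..TOTAL
# in chunks of at most SIZE pages.
# Usage: page_chunks TOTAL SIZE
page_chunks() {
    total=$1
    size=$2
    if [ "$size" -lt 1 ]; then
        echo "error: SIZE must be at least 1" >&2
        return 1
    fi
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + size - 1))
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        printf '%s-%s\n' "$start" "$end"
        start=$((end + 1))
    done
}

# Example (not run here): translate a 45-page file 20 pages at a time.
# page_chunks 45 20 | while read -r range; do
#     mkdir -p "./output/$range"
#     pdf2zh input.pdf -li zh -lo en -s google -p "$range" -o "./output/$range/"
# done
```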
    

Verification Checklist

  • *-mono.pdf exists and contains translated text
  • *-dual.pdf exists with bilingual layout
  • Images/graphics from original are preserved
  • Tables and formulas render correctly
  • Page count matches original
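
The file-existence and page-count items in this checklist can be partly automated. The sketch below makes two assumptions that are not from pdf2zh itself: `check_outputs` and `page_count` are hypothetical helper names, and the page count relies on `pdfinfo` from poppler-utils being installed. Visual checks (images, tables, formulas) still need a human look.

```shell
# Check that both expected outputs exist and are non-empty.
# Usage: check_outputs INPUT_PDF [OUTPUT_DIR]
check_outputs() {
    base=$(basename "$1" .pdf)
    outdir=${2:-.}
    outdir=${outdir%/}
    failed=0
    for kind in mono dual; do
        file="$outdir/$base-$kind.pdf"
        if [ -s "$file" ]; then
            echo "ok: $file"
        else
            echo "missing or empty: $file"
            failed=1
        fi
    done
    return "$failed"
}

# Print the page count of a PDF. Assumes pdfinfo (poppler-utils) is installed.
page_count() {
    pdfinfo "$1" | awk '/^Pages:/ { print $2 }'
}

# Example (not run here): compare the original with the translated-only file.
# check_outputs input.pdf ./output/ &&
#     [ "$(page_count input.pdf)" = "$(page_count ./output/input-mono.pdf)" ]
```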
