How PageTurner Works

PageTurner uses a sophisticated 5-phase AI pipeline to achieve professional-grade translations while preserving React components, technical terminology, and document structure.

This isn't just Google Translate for docs; it's a purpose-built system for translating technical documentation.


Why Not Just Use Google Translate?

The Problem with Generic Translation:

Generic translation tools process sentences in isolation, leading to:

  • ❌ Inconsistent terminology - "repository" might be "repositorio" on page 1 and "repo" on page 50
  • ❌ Lost context - Technical concepts misunderstood without document context
  • ❌ Broken structure - React components, code blocks, and formatting destroyed
  • ❌ Poor quality - Generic tools average 65-70/100 for technical content

PageTurner's Approach:

PageTurner treats documentation as a structured, interconnected system:

  • ✅ Perfect term consistency - "authentication" translates identically across all 200 pages
  • ✅ Context-aware - Understands your documentation's domain and terminology
  • ✅ Structure preservation - React components, MDX, and formatting stay intact
  • ✅ High quality - Average 91.3/100 quality score

The 5-Phase Pipeline

Phase 1: Parallel Intelligence Extraction

What happens:

  1. Content Analysis - Scans entire documentation to understand structure and domain
  2. Keyterm Extraction - Identifies critical technical terms requiring consistent translation
    • API names (e.g., "useEffect", "useState")
    • Technical concepts (e.g., "authentication", "middleware")
    • Product-specific terms (e.g., "Webhook", "OAuth")
  3. Initial Translation - Performs first-pass translation with full document context

Why this matters:

  • Identifies terms that must be translated consistently across all pages
  • Understands your documentation's domain (database, web framework, cloud platform, etc.)
  • Provides context for better translation decisions

Example:

For WatermelonDB documentation:

Extracted keyterms:
- "WatermelonDB" → Keep untranslated (product name)
- "database" → Must be consistent
- "query" → Technical term, needs consistency
- "reactive" → Core concept, critical consistency
- "synchronization" → Feature name, must match everywhere
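
Phase 1 relies on an LLM, and its internals are not described here. To make the expected output concrete, the sketch below uses a simple non-AI stand-in: it counts how many pages each candidate term appears on, keeps terms that recur, and marks names with internal capitals (such as "WatermelonDB") as "keep untranslated". The `extract_keyterms` function, the `min_pages` threshold, and the capitalization rule are hypothetical.

```python
import re
from collections import Counter

# Hypothetical heuristic: a lowercase-to-uppercase transition inside a term
# (WatermelonDB, useEffect) marks a name that should stay untranslated.
IDENTIFIER = re.compile(r"[a-z][A-Z]")


def extract_keyterms(pages, candidates, min_pages=2):
    """Return {term: policy} for candidates found on at least `min_pages` pages.

    `candidates` stands in for the terms an LLM would propose in Phase 1;
    policy is "keep" (leave untranslated) or "consistent" (translate once).
    """
    page_counts = Counter()
    for text in pages:
        lowered = text.lower()
        for term in candidates:
            # Whole-word match, counted at most once per page.
            if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered):
                page_counts[term] += 1

    return {
        term: "keep" if IDENTIFIER.search(term) else "consistent"
        for term, count in page_counts.items()
        if count >= min_pages
    }


pages = [
    "WatermelonDB is a reactive database. Run a query.",
    "Every query in WatermelonDB is reactive.",
    "Synchronization with your backend database.",
]
candidates = ["WatermelonDB", "database", "query", "reactive", "synchronization"]
keyterms = extract_keyterms(pages, candidates)
# {'WatermelonDB': 'keep', 'database': 'consistent', 'query': 'consistent', 'reactive': 'consistent'}
```

"synchronization" is dropped only because it appears on a single sample page; the threshold is a tuning knob, not a rule from the pipeline.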

Phase 2: Term Relationship Analysis

What happens:

  1. Semantic Clustering - Groups related terms using AI algorithms
  2. Relationship Mapping - Identifies terms that must maintain consistency
    • Synonyms (e.g., "repo" and "repository")
    • Hierarchical terms (e.g., "database", "database query", "database migration")
    • Related concepts (e.g., "authenticate" and "authentication")

Why this matters:

  • Ensures "database query" and "query" use the same translation for "query"
  • Prevents inconsistencies when synonyms are used (repo vs repository)
  • Maintains semantic relationships in target language

Example:

Relationship clusters identified:
Cluster 1: ["repository", "repo", "git repository"]
→ All must use consistent translation

Cluster 2: ["authenticate", "authentication", "auth"]
→ Related terms, translations must align

Cluster 3: ["synchronize", "sync", "synchronization"]
→ Ensure verb/noun consistency

Phase 3: Term Parsing & Validation

What happens:

  1. Translation Extraction - Parses term translations from Phase 1
  2. Quality Validation - Ensures term translations are appropriate
    • Checks if translation matches term meaning
    • Validates grammatical correctness
    • Verifies cultural appropriateness

Why this matters:

  • Catches translation errors early before they propagate
  • Ensures technical accuracy
  • Prevents awkward or incorrect terminology

Example:

Term validation:
✅ "database" → "base de datos" (Spanish) - Validated
✅ "query" → "consulta" (Spanish) - Validated
❌ "sync" → "sincronizar" (verb form) - Corrected to "sincronización" (noun)
✅ "authentication" → "autenticación" (Spanish) - Validated
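
The validation checks are not documented in detail. One check the example implies is a part-of-speech match: a noun term should not receive a verb translation. The sketch below is a hypothetical Spanish-only rule that flags a noun whose translation looks like an infinitive (ending in -ar, -er, or -ir) and corrects it from a lookup table. The `Term` type, the suffix rule, and `NOUN_FORMS` are illustrative assumptions; a rule this crude would misfire on genuine Spanish nouns such as "lugar".

```python
from dataclasses import dataclass


@dataclass
class Term:
    source: str
    target: str
    pos: str  # part of speech of the source term: "noun" or "verb"


# Hypothetical correction table: infinitive -> matching noun form.
NOUN_FORMS = {"sincronizar": "sincronización", "autenticar": "autenticación"}


def looks_like_infinitive(word):
    # Crude Spanish heuristic; nouns such as "lugar" would be misflagged.
    return word.endswith(("ar", "er", "ir"))


def validate(term):
    """Return (target, status); status is "validated", "corrected", or "needs review"."""
    if term.pos == "noun" and looks_like_infinitive(term.target):
        fixed = NOUN_FORMS.get(term.target)
        if fixed is not None:
            return fixed, "corrected"
        return term.target, "needs review"
    return term.target, "validated"


terms = [
    Term("database", "base de datos", "noun"),
    Term("query", "consulta", "noun"),
    Term("sync", "sincronizar", "noun"),
    Term("authentication", "autenticación", "noun"),
]
results = [validate(t) for t in terms]
```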

Phase 4: Consistency Resolution

What happens:

  1. Global Dictionary Creation - Builds consistent translations for all keyterms
  2. Conflict Resolution - Resolves any translation inconsistencies
    • Chooses best translation when multiple options exist
    • Ensures consistency across entire documentation
  3. Term Locking - Finalizes terminology dictionary for Phase 5

Why this matters:

  • This is where term consistency is enforced
  • One source of truth for all technical terms
  • Eliminates the "translated differently on different pages" problem

Example:

Global terminology dictionary (English → Spanish). The product name "WatermelonDB" maps to itself, so it is kept as-is:
{
"database": "base de datos",
"query": "consulta",
"synchronization": "sincronización",
"authentication": "autenticación",
"WatermelonDB": "WatermelonDB"
}

This dictionary is applied to ALL translations in Phase 5.
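
Phase 4 is described as choosing "the best translation when multiple options exist". A simple, common stand-in for that choice is a majority vote over the candidate translations collected from all pages, with ties going to the candidate seen first. The sketch below does that and then "locks" the result by returning a read-only mapping. The voting rule, the `build_dictionary` name, the `keep_as_is` parameter, and the extra candidates "BD" and "petición" are assumptions for illustration.

```python
from collections import Counter
from types import MappingProxyType


def build_dictionary(candidates, keep_as_is=()):
    """Resolve {term: [candidate translations]} into a locked {term: translation}.

    Every term not in `keep_as_is` needs at least one candidate.
    """
    resolved = {}
    for term, options in candidates.items():
        if term in keep_as_is:
            resolved[term] = term  # product names stay untranslated
            continue
        # most_common() lists equal counts in first-seen order,
        # so a tie goes to the translation that appeared first.
        resolved[term] = Counter(options).most_common(1)[0][0]
    return MappingProxyType(resolved)  # read-only view: the "locked" dictionary


candidates = {
    "database": ["base de datos", "base de datos", "BD"],
    "query": ["consulta", "petición", "consulta"],
    "synchronization": ["sincronización"],
    "WatermelonDB": ["WatermelonDB"],
}
glossary = build_dictionary(candidates, keep_as_is={"WatermelonDB"})
```

Returning a `MappingProxyType` means later phases can read the glossary but any attempt to assign to it raises `TypeError`, which mirrors the "term locking" step.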

Phase 5: Translation Refinement

What happens:

  1. Second Pass Translation - Retranslates with enforced terminology consistency
  2. Quality Assurance - Final validation and error correction
    • Applies global dictionary from Phase 4
    • Validates MDX/React component preservation
    • Checks link integrity
    • Verifies formatting preservation
  3. Output Generation - Creates final translated files

Why this matters:

  • Enforces the locked terminology across the entire site
  • Final quality check before deployment
  • Ensures production-ready output

Example:

Page 1: "The database query system..."
↓
Page 1: "El sistema de consulta de base de datos..."

Page 50: "Execute a database query..."
↓
Page 50: "Ejecutar una consulta de base de datos..."

✅ "database" → "base de datos" (consistent)
✅ "query" → "consulta" (consistent)
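
The Phase 5 checks are described only at a high level. One way to verify the outcome shown above is a post-check over each source/translation pair: whenever a dictionary term occurs in the English source, its locked translation must occur in the target. The sketch below is a hypothetical check of that kind (not the retranslation step itself); it uses case-insensitive substring matching, which is a simplification, and page 51 is an invented failing example.

```python
def check_consistency(pairs, glossary):
    """pairs: list of (page_id, source_text, translated_text).

    Returns (page_id, term, expected_translation) for every violation found.
    """
    violations = []
    for page_id, source, target in pairs:
        src, tgt = source.lower(), target.lower()
        for term, expected in glossary.items():
            if term.lower() in src and expected.lower() not in tgt:
                violations.append((page_id, term, expected))
    return violations


glossary = {"database": "base de datos", "query": "consulta"}
pages = [
    (1, "The database query system...", "El sistema de consulta de base de datos..."),
    (50, "Execute a database query...", "Ejecutar una consulta de base de datos..."),
    (51, "Run the query again.", "Ejecuta la petición de nuevo."),  # hypothetical drift
]
violations = check_consistency(pages, glossary)
# [(51, 'query', 'consulta')]
```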

Translation Memory: The Secret Weapon

PageTurner includes a powerful translation memory system that learns and improves over time.

How It Works

SHA-256-Based Change Detection:

Original content: "Install PageTurner with npm install pageturner"
Content hash: a3f5b8c9d2e1...

Spanish translation: "Instala PageTurner con npm install pageturner"
Stored with hash: a3f5b8c9d2e1...

When content changes:
Updated content: "Install PageTurner with npm install pageturner-cli"
New hash: b7e2c4f1a8d9...
→ Only this changed segment gets retranslated
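
A minimal sketch of the change-detection idea, assuming each segment is keyed by the SHA-256 of its source text plus the target language. `TranslationMemory`, `translate_page`, the `translate` callback, and the second segment "Run the CLI." are hypothetical; the real storage format and segmentation rules are not described here.

```python
import hashlib


class TranslationMemory:
    """In-memory store keyed by (sha256(source_segment), language)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(segment, lang):
        digest = hashlib.sha256(segment.encode("utf-8")).hexdigest()
        return digest, lang

    def translate_page(self, segments, lang, translate):
        """Translate segments, calling `translate` only on cache misses.

        Returns (translations, number_of_translate_calls).
        """
        out, calls = [], 0
        for seg in segments:
            k = self.key(seg, lang)
            if k not in self._store:
                self._store[k] = translate(seg, lang)
                calls += 1
            out.append(self._store[k])
        return out, calls


def fake_translate(text, lang):
    return f"[{lang}] {text}"  # stand-in for an LLM request


tm = TranslationMemory()
v1 = ["Install PageTurner with npm install pageturner", "Run the CLI."]
_, first_calls = tm.translate_page(v1, "es", fake_translate)    # 2 calls
v2 = ["Install PageTurner with npm install pageturner-cli", "Run the CLI."]
_, update_calls = tm.translate_page(v2, "es", fake_translate)   # 1 call
```

Only the edited segment hashes to a new key, so the unchanged "Run the CLI." is served from memory on the second run.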

Why This Matters

Cost Savings on Updates:

  • First translation: 100 pages × 3 languages = 300 translation requests ($30)
  • Update 5 pages: Only 5 pages × 3 languages = 15 requests ($1.50)
  • Savings: 95% cost reduction on updates
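
The figures above imply a flat price of $30 / 300 = $0.10 per page-language request. The short calculation below reproduces them; the per-request price is derived from this example, not a published rate, and `update_cost` is a hypothetical helper.

```python
def update_cost(pages_changed, languages, total_pages, first_run_cost):
    """Return (update_cost, savings_fraction) under a flat per-request price."""
    requests_first = total_pages * languages          # 100 × 3 = 300
    price = first_run_cost / requests_first           # $30 / 300 = $0.10
    requests_update = pages_changed * languages       # 5 × 3 = 15
    cost = requests_update * price
    savings = 1 - requests_update / requests_first    # 1 - 15/300
    return cost, savings


cost, savings = update_cost(pages_changed=5, languages=3, total_pages=100, first_run_cost=30.0)
print(f"${cost:.2f}, {savings:.0%} saved")  # $1.50, 95% saved
```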

Cross-Project Learning:

  • Translation memory is shared across all your projects
  • Translating a second Docusaurus site reuses 40-60% of translations
  • Team collaboration: Shared terminology across distributed teams

MDX & React Component Preservation

PageTurner was built specifically for Docusaurus, which uses MDX (Markdown + JSX).

What Gets Preserved

React Components:

<Tabs>
<TabItem value="js" label="JavaScript">
{/* code block with: const db = new Database(); */}
</TabItem>
</Tabs>

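Result: The component structure, props, and embedded code are preserved; only human-readable text is eligible for translation. Here the only visible text is the label "JavaScript", a proper noun, so the output is unchanged: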

<Tabs>
<TabItem value="js" label="JavaScript">
{/* code block with: const db = new Database(); */}
</TabItem>
</Tabs>

Code Blocks:

## Installation

(bash code block: npm install watermelondb)

Result: Code never translated, only surrounding text:

## Instalación

(bash code block: npm install watermelondb)
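
PageTurner's MDX handling is not documented at the code level. A common way to get the behavior described above is to mask everything that must not be translated (fenced code blocks, JSX tags, and `{...}` expressions) with placeholders, send only the remaining prose to the translator, and then restore the originals. The sketch below assumes that approach. `translate_mdx` and `fake_translate` are hypothetical, and the regular expressions are simplified: they do not handle nested braces or `>` inside attribute values, and they protect translatable attributes such as `label` along with the rest of the tag. A production tool would parse MDX into an AST instead.

```python
import re

FENCE = "`" * 3  # three backticks, spelled out so this snippet nests cleanly

# Spans that must never reach the translator (simplified patterns).
PROTECTED = re.compile(
    r"`{3}.*?`{3}"            # fenced code blocks, non-greedy, across lines
    r"|\{[^{}]*\}"            # simple JSX expressions such as {/* ... */}
    r"|</?[A-Za-z][^>]*>",    # JSX/HTML opening and closing tags
    re.DOTALL,
)


def translate_mdx(source, translate):
    """Translate prose only; protected spans are restored unchanged.

    Assumes `translate` leaves the NUL-delimited placeholders untouched.
    """
    saved = []

    def mask(match):
        saved.append(match.group(0))
        return f"\x00{len(saved) - 1}\x00"

    masked = PROTECTED.sub(mask, source)
    translated = translate(masked)
    return re.sub(r"\x00(\d+)\x00", lambda m: saved[int(m.group(1))], translated)


def fake_translate(text):
    # Stand-in for an LLM call; it would corrupt code if code reached it.
    return text.replace("Installation", "Instalación").replace("install", "instalar")


doc = f"## Installation\n\n{FENCE}bash\nnpm install watermelondb\n{FENCE}\n"
result = translate_mdx(doc, fake_translate)
```

Without masking, `fake_translate(doc)` would turn the command into `npm instalar watermelondb`; with masking, only the heading changes.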

Quality Metrics

Average Translation Quality: 91.3/100

Based on 22 production deployments across 13+ languages:

| Metric | PageTurner | Generic Tools |
|---|---|---|
| Term Consistency | 99.2% | 73% |
| Technical Accuracy | 94.1% | 68% |
| Natural Flow | 89.7% | 81% |
| Structure Preservation | 100% | 65% |
| Overall Quality | 91.3/100 | 65-70/100 |

Real-World Examples

WatermelonDB (62 pages, 3 languages):

  • Quality score: 92.4/100
  • Translation time: 20 minutes
  • Components preserved: 100%
  • Term consistency: 99.8%

Prettier (180 pages, 5 languages):

  • Quality score: 90.8/100
  • Translation time: 45 minutes
  • Components preserved: 100%
  • Term consistency: 99.1%

Performance & Scalability

Parallel Processing

  • Up to 100 concurrent translation tasks
  • Intelligent rate limiting (default: 1000 requests/minute)
  • Smart chunking for LLM context limits
  • Token optimization reduces costs by 30-40%

Typical Translation Times

| Documentation Size | Languages | Time |
|---|---|---|
| 50 pages | 3 | 10-15 min |
| 100 pages | 3 | 20-30 min |
| 200 pages | 3 | 40-60 min |
| 100 pages | 10 | 60-90 min |

Updates (with translation memory): 2-5 minutes for typical changes


Multi-LLM Provider Strategy

PageTurner uses different AI models for different tasks:

| Task | Model | Why |
|---|---|---|
| Translation | Claude Sonnet 4 | Best quality, context awareness |
| Term Extraction | GPT-4 | Excellent at identifying key concepts |
| Validation | Claude Opus | Highest quality checks |
| Cost-effective option | DeepSeek V3 | 90% of quality at 20% cost |

Smart provider selection optimizes both quality and cost.
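
The table above reads naturally as a routing map. A minimal sketch, assuming one model per task plus a single budget switch that sends every task to the cost-effective model; the task keys, the `budget_mode` flag, and `pick_model` are hypothetical, and the model strings are display labels from the table, not real API model identifiers.

```python
# Hypothetical routing table mirroring the model table above.
ROUTES = {
    "translation": "Claude Sonnet 4",
    "term_extraction": "GPT-4",
    "validation": "Claude Opus",
}
COST_EFFECTIVE_MODEL = "DeepSeek V3"


def pick_model(task, budget_mode=False):
    """Return the model label for a pipeline task."""
    if budget_mode:
        return COST_EFFECTIVE_MODEL  # trade some quality for lower cost
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
```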


What Makes PageTurner Different

| Feature | PageTurner | Generic Translation | Human Translation |
|---|---|---|---|
| Term Consistency | ✅ Near-perfect (99%+) | ❌ Inconsistent (70%) | ✅ Perfect (100%) |
| Context Awareness | ✅ Full document | ❌ Sentence-level | ✅ Full document |
| MDX/React Preservation | ✅ Native support | ❌ Breaks components | ⚠️ Manual effort |
| Translation Memory | ✅ Automatic, 60-80% savings | ❌ None | ⚠️ Manual CAT tools |
| Deployment Automation | ✅ GitHub + Vercel | ❌ Manual | ❌ Manual |
| Time to Deploy | ✅ Minutes | ❌ Hours | ❌ Weeks |
| Cost (100 pages, 3 languages) | ✅ $30 | ✅ $20 | ❌ $3,000+ |
| Quality | ✅ 91/100 | ❌ 65-70/100 | ✅ 95-98/100 |

PageTurner sweet spot: Near-human quality at 1% of the cost, 100× faster than human translation.


Next Steps

Now that you understand how PageTurner works:

Questions? Check our FAQ or contact us.