Add human‑level pronunciation assessment to your app with a single API call. Built for EdTech, speech therapy, and linguistic analysis.

Getting started

Make your first API request in minutes.

Output reference

Learn the JSON schema and fields.

API overview

The Langcraft Speech API performs pronunciation assessment and prosody analysis in a single call and returns structured JSON that you can use directly in your app. Key capabilities:
  • Word and phoneme alignment with millisecond timing
  • Phoneme‑level scoring and error detection
  • Automatic transcription with word‑level timestamps when no reference text is provided
  • Multilingual support (40+ languages)
  • Model selection: send model=aurora-1 for languages other than English, or leave model unset for standard English analysis

Highlights

  • Per‑phoneme scores with timestamps
  • Per‑word rollups and summaries
  • Pitch and stress contours at the phoneme and word levels
  • Alignment metadata to connect audio, phones, and text

Inputs

You can analyze speech with any of:
  • A reference text (reference_text) plus language code (lang) — the API runs grapheme‑to‑phoneme generation to derive canonical phones
  • A direct IPA phone sequence (reference_phones) — bypasses G2P, useful for pronunciation contrast tests
  • Audio only — the API runs automatic transcription and uses the transcript as the reference
reference_text accepts the alias text. reference_phones accepts the alias ipa.
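The three input modes and the two aliases can be sketched as a small helper that builds the non-audio request fields. This is a minimal sketch, not the official client: `build_fields` is a hypothetical name, the endpoint and the audio upload are left out because this page does not specify them, and treating `reference_text` and `reference_phones` as mutually exclusive is an assumption.

```python
from typing import Dict, Optional

# Aliases named in the docs: text -> reference_text, ipa -> reference_phones.
ALIASES = {"text": "reference_text", "ipa": "reference_phones"}
REFERENCE_FIELDS = ("reference_text", "reference_phones")


def build_fields(lang: Optional[str] = None, **reference: str) -> Dict[str, str]:
    """Build the non-audio request fields for one input mode (hypothetical helper)."""
    fields: Dict[str, str] = {}
    for name, value in reference.items():
        # Normalize an alias to its canonical field name.
        canonical = ALIASES.get(name, name)
        if canonical not in REFERENCE_FIELDS:
            raise ValueError(f"unknown field: {name}")
        if canonical in fields:
            raise ValueError(f"{canonical} supplied twice")
        fields[canonical] = value

    # Assumption: the text and phone references are alternatives, not combined.
    if len(fields) > 1:
        raise ValueError("pass reference_text or reference_phones, not both")

    # The docs pair reference_text with a language code, so require it here.
    if "reference_text" in fields and not lang:
        raise ValueError("reference_text needs a lang code")

    if lang:
        fields["lang"] = lang
    return fields


if __name__ == "__main__":
    print(build_fields(lang="en", text="hello world"))  # reference text + lang
    print(build_fields(ipa="h ə l oʊ"))                 # direct IPA phones
    print(build_fields())                               # audio only
```

An empty result corresponds to the audio-only mode, where the API transcribes the recording and uses the transcript as the reference.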

Model selection

For non-English speech, set the public model selector:
model=aurora-1
aurora-1 is an experimental model designed for languages other than English. It is currently recommended for German, French, and Spanish, and it is the only non-default value of model documented for public integrations.
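The selection rule above can be sketched as a function that decides whether to send `model` at all. This is a rough sketch under stated assumptions: `model_fields` and `is_recommended` are hypothetical helper names, and the two-letter codes (`en`, `de`, `fr`, `es`) assume `lang` takes ISO 639-1 style values, which this page does not confirm.

```python
from typing import Dict

# Assumption: lang uses ISO 639-1 style codes, optionally with a region suffix.
RECOMMENDED_FOR_AURORA = {"de", "fr", "es"}  # German, French, Spanish


def primary_subtag(lang: str) -> str:
    """Return the lowercase primary language part, e.g. 'fr' for 'fr-CA'."""
    return lang.replace("_", "-").split("-")[0].lower()


def model_fields(lang: str) -> Dict[str, str]:
    """Leave model unset for English; send model=aurora-1 for anything else."""
    if primary_subtag(lang) == "en":
        return {}
    return {"model": "aurora-1"}


def is_recommended(lang: str) -> bool:
    """True when aurora-1 is currently recommended for this language."""
    return primary_subtag(lang) in RECOMMENDED_FOR_AURORA


if __name__ == "__main__":
    for code in ("en-US", "de", "fr-CA", "ja"):
        print(code, model_fields(code), is_recommended(code))
```

Because aurora-1 is experimental, a caller might log a warning when `is_recommended` returns False for a non-English language rather than rejecting the request.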