Langcraft Speech API Docs

Add human‑level pronunciation assessment to your app with a single API call. Built for EdTech, speech therapy, and linguistic analysis.

Getting started

Make your first API request in minutes.

Output reference

Learn the JSON schema and fields.

API overview

Langcraft Speech API provides pronunciation assessment and prosody analysis in a single call and returns structured JSON you can use directly in your app. Key capabilities:

Word and phoneme alignment with millisecond timing
Phoneme‑level scoring and error detection
Automatic transcription with word‑level timestamps when no reference text is provided
Multilingual support (40+ languages)
Model selection: send model=aurora-1 for languages other than English, or leave model unset for standard English analysis

Highlights

Per‑phoneme scores with timestamps
Per‑word rollups and summaries
Pitch and stress contours at the phoneme and word levels
Alignment metadata to connect audio, phones, and text

Inputs

You can analyze speech with any of:

A reference text (reference_text) plus language code (lang) — the API runs grapheme‑to‑phoneme generation to derive canonical phones
A direct IPA phone sequence (reference_phones) — bypasses G2P, useful for pronunciation contrast tests
Audio only — the API runs automatic transcription and uses the transcript as the reference

reference_text accepts the alias text. reference_phones accepts the alias ipa.
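The three input modes and the two aliases above can be sketched as a small client-side helper. This is an illustrative sketch only: the function names normalize_fields and input_mode are hypothetical, no request is sent, and the endpoint and transport are not covered here. The page does not say what happens when both reference_text and reference_phones are supplied, so the sketch simply rejects that case rather than guess.

```python
# Alias -> canonical field name, as stated in the Inputs section.
ALIASES = {"text": "reference_text", "ipa": "reference_phones"}


def normalize_fields(fields: dict) -> dict:
    """Return a copy of `fields` with documented aliases mapped to canonical names."""
    normalized = {}
    for key, value in fields.items():
        canonical = ALIASES.get(key, key)
        # Reject an alias and its canonical name carrying different values.
        if canonical in normalized and normalized[canonical] != value:
            raise ValueError(f"conflicting values for {canonical!r}")
        normalized[canonical] = value
    return normalized


def input_mode(fields: dict) -> str:
    """Classify request fields as 'text', 'phones', or 'audio_only'."""
    fields = normalize_fields(fields)
    has_text = "reference_text" in fields
    has_phones = "reference_phones" in fields
    if has_text and has_phones:
        # Assumption: precedence is undocumented, so this sketch refuses to pick one.
        raise ValueError("supply reference_text or reference_phones, not both")
    if has_phones:
        return "phones"  # direct IPA sequence, bypasses G2P
    if has_text:
        if "lang" not in fields:
            # The docs pair reference_text with a language code.
            raise ValueError("reference_text requires lang")
        return "text"  # G2P derives canonical phones
    return "audio_only"  # API transcribes and uses the transcript as reference
```

Running the check before sending a request catches a missing lang or a conflicting alias locally instead of relying on a server-side error.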

Model selection

For non-English speech, set the official public model selector:

model=aurora-1

aurora-1 is an experimental public model selector for languages other than English. It is currently recommended for German, French, and Spanish, and it is the only non-default model selector documented for public integrations.
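The selection rule can be sketched as a helper that decides whether to include the model field. The name model_fields is hypothetical, and the sketch assumes language codes look like "en", "en-US", or "de"; this page does not specify the format of lang.

```python
def model_fields(lang: str) -> dict:
    """Return the model field to send for `lang`, or {} to leave model unset."""
    # Assumption: the primary subtag precedes the first "-" or "_" (e.g. "en-US").
    primary = lang.strip().lower().replace("_", "-").split("-")[0]
    if primary == "en":
        return {}  # leave model unset for standard English analysis
    return {"model": "aurora-1"}  # documented selector for non-English speech
```

For example, model_fields("de") yields a model field set to aurora-1, while model_fields("en-US") yields no field at all, so the request falls back to the default English analysis.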
