Transcripts

Every recording is converted into a written transcript before any other processing occurs. The transcript is the foundation for all analysis that follows.

Speakers are separated

The transcript is not a single block of text. VOIX divides the conversation into turns and attributes each turn to a speaker, so that you can see who said what and in what order. Until a name is known, speakers are labelled neutrally (for example, the first and second voice heard); once the analysis identifies who is who, those labels are mapped to real names.
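
As an illustration only, the idea of turns with neutral labels that are later mapped to names can be sketched in Python. The `Turn` class, the `apply_names` helper, and the "Speaker 1" / "Speaker 2" labels are hypothetical and are not VOIX's actual output schema; the sketch only shows how order is preserved while labels are swapped for names once they are known.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Turn:
    """One speaker turn. Hypothetical shape, not the VOIX schema."""
    speaker: str  # neutral label until a real name is known
    text: str


def apply_names(turns, names):
    """Return new turns, replacing a neutral label wherever a name is known.

    Labels without an entry in `names` are left unchanged, and the
    order of the turns is preserved.
    """
    return [replace(t, speaker=names.get(t.speaker, t.speaker)) for t in turns]


# Hypothetical transcript with neutral labels.
turns = [
    Turn("Speaker 1", "Good morning, thanks for calling."),
    Turn("Speaker 2", "Hi, I have a question about my invoice."),
    Turn("Speaker 1", "Of course, let me look that up."),
]

# Only one speaker has been identified so far.
named = apply_names(turns, {"Speaker 1": "Alex"})
```

In this sketch the second voice keeps its neutral label because no name has been supplied for it yet, which mirrors the behaviour described above.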

Speaker separation is most accurate when each participant is clearly audible. Heavy crosstalk, very short interjections, and two people who sound alike in the same room are the most difficult cases; in these, two speakers may occasionally be merged into one, or one speaker split into two.

For the most reliable separation, send channel-separated audio. When each party is recorded on their own channel (two files, or a stereo file), VOIX attributes each speaker by channel rather than acoustically. This is exact, and it also lets VOIX label the rep and customer deterministically. See multi-channel audio. The transcript looks the same either way.
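
If you want to confirm locally that a recording really is stereo before sending it, a minimal check might look like the following. It is a sketch that assumes the recording is a WAV file and uses only Python's standard `wave` module; it is not part of any VOIX API, and `channel_count` and `make_silent_wav` are illustrative helper names. The in-memory file exists only to make the example self-contained.

```python
import io
import wave


def channel_count(wav_file) -> int:
    """Return the number of channels in a WAV file (path or binary file object)."""
    with wave.open(wav_file, "rb") as reader:
        return reader.getnchannels()


def make_silent_wav(channels: int, frames: int = 8000, rate: int = 8000) -> io.BytesIO:
    """Build a short silent 16-bit WAV in memory, only to exercise the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        writer.setframerate(rate)
        writer.writeframes(b"\x00\x00" * frames * channels)
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


stereo = make_silent_wav(channels=2)
mono = make_silent_wav(channels=1)

is_channel_separated = channel_count(stereo) == 2
```

A channel count of 2 only shows that the file is stereo. Whether each party actually sits on their own channel depends on how the call was recorded.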

Language is detected automatically

VOIX detects the language spoken on the call. You do not set it, and you do not need to indicate which language to expect.

This applies beyond the transcript itself: the analysis, the summary, the tasks, and the evidence quotations are all written in the language of the call. A Dutch call produces a Dutch summary; a French call produces a French one. Numeric scores and yes/no answers are language-neutral.

Tuned for quality, not configurable

Transcription settings, including the language model and audio handling, are owned by VOIX and tuned for consistent quality. They are not exposed as per-call parameters. This keeps results comparable across every call in your account, rather than varying with whatever a sender happened to configure. The exception is speaker separation, which you can improve by sending channel-separated audio; the output shape is unchanged. See why ingest is service-owned for the same principle on the sending side.