Transcripts
Every recording is converted into a written transcript before anything else happens. The transcript is the foundation for all of the analysis that follows.
Speakers are separated
The transcript is not a single block of text. VOIX splits the conversation into turns and attributes each turn to a speaker, so you can see who said what and in what order. Until a name is known, speakers are labelled neutrally (for example, the first and second voice heard); once the analysis identifies who is who, those labels are mapped to real names.
Speaker separation works best when each person is clearly audible. The hardest cases are heavy crosstalk, very short interjections, and two similar-sounding people in the same room. In these cases, two speakers may occasionally be merged into one, or one speaker may be split into two.
Language is detected automatically
VOIX detects the language spoken on the call automatically. There is no language setting, and you do not need to tell VOIX what language to expect.
This matters beyond the transcript itself: the analysis, the summary, the tasks, and the evidence quotes are all written in the call’s own language. A Dutch call produces a Dutch summary; a French call produces a French one. Numeric scores and yes/no answers are of course language-neutral.
Tuned for quality, not configurable
Transcription settings (the language model, how speakers are separated, and how audio is handled) are owned by VOIX and tuned for consistent quality. They are not exposed as per-call options. This keeps results comparable across every call in your account, instead of letting them vary with whatever a sender happened to configure. See why ingest is service-owned for the same principle applied on the sending side.