The impact of mismatched recordings on an automatic-speaker-recognition system and human listeners

Tomáš Nechanský, Tomáš Bořil, Alžběta Houzar, Radek Skarnitzl

Issue: 1/2022
Journal: Acta Universitatis Carolinae Philologica
DOI: 10.14712/24646830.2022.25

Keywords: forensic voice comparison; temporal mismatch; language mismatch; automatic speaker recognition; voice parade

Abstract: So-called ‘mismatch’ is a factor that experts in forensic voice comparison encounter regularly. We therefore explored to what extent the ability of an automatic-speaker-recognition system and of earwitnesses to identify speakers is affected when recordings are acquired in different languages and at different times. One hundred voices in a database of 300 recordings (100 speakers, each recorded in three mutually mismatched sessions) were compared using the automatic-speaker-recognition software VOCALISE, based on i-vectors and x-vectors, and by 39 respondents in simulated voice parades. The automatic and perceptual approaches appear to have yielded similar results: the less complex the mismatch type, the more successful the identification. The results point to the superiority of the x-vector approach and to varying identification abilities among listeners.