Impact of Phonetics on Preserving Identity in Adversarial Voice Attacks

ICASSP 2025 - Identity Drift Analysis

16 Target Texts
2 Model Types
48 Total Graphs

Research Overview

This website accompanies our ICASSP paper and provides interactive access to all materials referenced in the manuscript. For each target phrase, we include two sets of normalized speaker-similarity graphs: one generated using an ECAPA-TDNN backbone and one using a ResNet backbone. These graphs visualize how the similarity distributions shift under adversarial perturbations. We also provide scatter plots illustrating the relationship between signal-to-noise ratio (SNR) and similarity scores for each phrase.

Target Metrics Table

| ID | Target Text | Phonetic Short Note | Model | #Samples | #Genuine | #Impostor | TMR@0.1%FMR | d′ |
|---|---|---|---|---|---|---|---|---|
| T1 | yes | mono-syll.; 1:2 V:C; glide+fric stop | ECAPA | 11881 | 109 | 11772 | 1.0000 | 9.68 |
| | | | ResNet50 | 11881 | 109 | 11772 | 1.0000 | 9.43 |
| T2 | open the door | 4 syll.; 4:6 V:C; dental fric. + stops | ECAPA | 11881 | 109 | 11772 | 1.0000 | 9.11 |
| | | | ResNet50 | 11881 | 109 | 11772 | 1.0000 | 8.89 |
| T3 | call emergency services | fricative-heavy; 8 syll.; 8:16 V:C | ECAPA | 11881 | 109 | 11772 | 0.9725 | 5.59 |
| | | | ResNet50 | 11881 | 109 | 11772 | 0.9817 | 6.22 |
| T4 | the quick brown fox jumped over the lazy dog | pangram; 11 syll.; broad coverage | ECAPA | 11881 | 109 | 11772 | 0.9083 | 4.80 |
| | | | ResNet50 | 11881 | 109 | 11772 | 0.9817 | 5.14 |
| T5 | shhh she sees the sea fish | fricatives /ʃ s z/ cluster; 6 syll. | ECAPA | 11881 | 109 | 11772 | 0.9908 | 7.46 |
| | | | ResNet50 | 11881 | 109 | 11772 | 1.0000 | 7.74 |
| T6 | do go big bag dig | voiced stops chain; minimal vowels; 5 syll. | ECAPA | 11881 | 109 | 11772 | 1.0000 | 8.44 |
| | | | ResNet50 | 11881 | 109 | 11772 | 1.0000 | 8.35 |
| T7 | two tall teachers talk to tim | /t/ alliteration; alveolar bursts; 7 syll. | ECAPA | 11881 | 109 | 11772 | 1.0000 | 7.79 |
| | | | ResNet50 | 11881 | 109 | 11772 | 1.0000 | 7.59 |
| T8 | i whisper while walking wildly | approximants + /w/ clusters; 8 syll. | ECAPA | 11881 | 109 | 11772 | 1.0000 | 7.17 |
| | | | ResNet50 | 11881 | 109 | 11772 | 1.0000 | 7.33 |
| T9 | pack my box with five dozen liquor jugs | pangram; many consonant clusters | ECAPA | 9025 | 95 | 8930 | 0.8632 | 4.63 |
| | | | ResNet50 | 9025 | 95 | 8930 | 0.9474 | 5.02 |
| T10 | glib jocks quiz nymph to vex dwarf | pangram; high fricative/affricate load | ECAPA | 9025 | 95 | 8930 | 0.8421 | 4.75 |
| | | | ResNet50 | 9025 | 95 | 8930 | 0.9579 | 5.34 |
| T11 | a mad boxer shot a quick gloved jab to the jaw of his dizzy opponent | mixed plosives/fricatives; many unstressed vowels | ECAPA | 9025 | 95 | 8930 | 0.7474 | 3.79 |
| | | | ResNet50 | 9025 | 95 | 8930 | 0.9263 | 4.45 |
| T12 | just before twilight the wizard quickly jabbed five boxes of hazy quartz to vex a plump knight’s jovial frog | very long pangram; many clusters; vowel centralization | ECAPA | 6561 | 81 | 6480 | 0.4444 | 3.07 |
| | | | ResNet50 | 6561 | 81 | 6480 | 0.7160 | 3.63 |
| T13 | twelve jolly grizzlies briskly danced over waxy benches while a flighty kitten kept humming jazz tunes in the background | fricative+affricate mix; many unstressed vowels | ECAPA | 3025 | 55 | 2970 | 0.5273 | 3.06 |
| | | | ResNet50 | 3025 | 55 | 2970 | 0.6364 | 3.46 |
| T14 | quantum driven flux engines jam beneath zigzagging vortex panels as cryptic bioforms whisper behind polymorphic glass domes | dense consonant clusters; many fricatives/affricates | ECAPA | 2209 | 47 | 2162 | 0.6809 | 3.10 |
| | | | ResNet50 | 2209 | 47 | 2162 | 0.7447 | 3.78 |
| T15 | while whispering winds wander westward jittery jackals jiggled jellies above velvet jars beyond flickering bonfires in a frozen jungle | repeated /w/ /ʤ/; long with many approximants | ECAPA | 3025 | 55 | 2970 | 0.6545 | 3.39 |
| | | | ResNet50 | 3025 | 55 | 2970 | 0.7091 | 3.54 |
| T16 | kindly expedite bizarre frozen jumpsuits for victors whirlwind gala to maximize xenon emissions before daybreak | ‘x/ʒ/ks’ clusters; mixed stops/fricatives; multi-syllabic | ECAPA | 5184 | 72 | 5112 | 0.5278 | 3.09 |
| | | | ResNet50 | 5184 | 72 | 5112 | 0.6806 | 3.30 |
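
For readers who want to reproduce the two summary columns from raw similarity scores, the following is a minimal sketch using common textbook definitions. It is not taken from the paper's code. The function names, the use of sample variance in d′, and the threshold rule (accept scores strictly above an impostor-derived threshold) are assumptions. The `trial_counts` helper reflects a pattern visible in the table: each #Samples value is a perfect square, with #Genuine equal to its square root, which is consistent with all-vs-all scoring and one genuine trial per speaker.

```python
import math
import statistics


def trial_counts(n_speakers):
    """All-vs-all trial counts, assuming one genuine trial per speaker."""
    total = n_speakers * n_speakers
    genuine = n_speakers
    return total, genuine, total - genuine


def d_prime(genuine, impostor):
    """Sensitivity index: distance between the two score means, divided by
    the root of the averaged variances (sample variance is an assumption)."""
    mu_g, mu_i = statistics.fmean(genuine), statistics.fmean(impostor)
    var_g, var_i = statistics.variance(genuine), statistics.variance(impostor)
    return abs(mu_g - mu_i) / math.sqrt(0.5 * (var_g + var_i))


def tmr_at_fmr(genuine, impostor, fmr=0.001):
    """True match rate at the most lenient threshold whose false match
    rate on the impostor scores stays at or below `fmr`."""
    ranked = sorted(impostor, reverse=True)
    allowed = math.floor(fmr * len(ranked))  # impostors allowed to pass
    # Accept only scores strictly above the (allowed + 1)-th highest impostor.
    threshold = ranked[allowed] if allowed < len(ranked) else -math.inf
    return sum(s > threshold for s in genuine) / len(genuine)
```

With 109 genuine trials, this definition of TMR can only take values that are multiples of 1/109, which matches entries such as 0.9725 (106/109) and 0.9817 (107/109) in the table.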

ECAPA-TDNN Normalized Results

ResNet Normalized Results

SNR vs Similarity Analysis

Scatter plots showing the relationship between Signal-to-Noise Ratio (SNR) and similarity metrics for each target text, providing insights into the trade-offs between attack strength and audio quality.
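
As a rough illustration of the x-axis quantity, the sketch below computes SNR in decibels by treating the difference between the adversarial and the original waveform as noise. This is a standard definition, but whether the paper uses exactly this formula is an assumption, and `snr_db` is a hypothetical helper name.

```python
import math


def snr_db(clean, adversarial):
    """SNR in dB, treating (adversarial - clean) as the noise signal."""
    if len(clean) != len(adversarial):
        raise ValueError("waveforms must have the same length")
    signal_power = sum(x * x for x in clean)
    noise_power = sum((a - x) ** 2 for x, a in zip(clean, adversarial))
    if noise_power == 0:
        return math.inf  # no perturbation at all
    if signal_power == 0:
        return -math.inf  # silent reference signal
    return 10.0 * math.log10(signal_power / noise_power)
```

Under this definition, a higher SNR means a quieter perturbation, so each point in a scatter plot pairs one attack's perturbation level with the similarity score it produced.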