Table 2 An overview of the query-based attacks against ASR

From: Towards the universal defense for query-based audio adversarial attacks on speech recognition system

| Attack | Task | Attack method | Attack model | Target | M or D | Avg. queries | SRoA (%) |
|---|---|---|---|---|---|---|---|
| CS Yuan et al. (2018) | ASR | GD | Kaldi-Aspire | Play music. / Open the front door. / Turn off the light. | M | \(\sim\)300 | 100 |
| DS Carlini and Wagner (2018) | ASR | GD | DeepSpeech | Okay google browse to evil dot com. | M & D | \(\sim\)1000 | 100 |
| DW Chen et al. (2020) | ASR | Alt-M | APIs | Turn off The Light / Take a picture. / Call 911. | M | \(\sim\)150 | 100 |
| DSG Taori et al. (2019) | ASR | GA & GE | DeepSpeech | Morning body. / Ball charge. / More they. | D | \(\sim\)150000 | 35 |
| Foolgle Han et al. (2019) | ASR | GA | Google-API | | D | | 86 |
| SGEA Wang et al. (2021) | ASR | SGE | DeepSpeech | Thank you. / Hello world. / Open the door. | D | \(\sim\)78000 | 98 |
| IRTA Qin et al. (2019) | ASR | Psy-M | Lingvo | Old will is a fine fellow but poor and helpless since missus rogers had her accident. | D | \(\sim\)5000 | 100 |
| PHA Schönherr et al. (2018) | ASR | Psy-M | Kaldi-WSJ | Do not blame you. / The command is planted. / The cake is a lie. | M & D | \(\sim\)500 | 98 |
| EPA Abdullah et al. (2021) | ASR | Psy-M | DeepSpeech and Wav2Letter | That is comparatively nothing. / Talking later is beneath us. / But there seemed no. | D | \(\sim\)1000 | 76 |
| Occam Zheng et al. (2021) | ASR | Co-E | DeepSpeech and APIs | Call my wife. / Navigate to my home. / Open the door. | M & D | \(\sim\)30000 | 100 |
| SirenAttack Du et al. (2020) | ASR | PSO | DeepSpeech | Read last sms from boss. / Call the police for help. | D | \(\sim\)1000 | 100 |
| MOGA-Attack Khare et al. (2018) | ASR | Mul-Obj GO | DeepSpeech and Kaldi | A cat. / All of these. / That i love you. | D | – | * |

  1. In the table, “GD”, “GA”, “GE”, and “SGE” denote Gradient Descent, Genetic Algorithm, Gradient Estimation, and Selective Gradient Estimation, respectively. “Alt-M”, “Psy-M”, “Co-E”, “PSO”, and “Mul-Obj GO” denote Alternative Models, Psychoacoustic Masking, Co-evolutionary algorithm, Particle Swarm Optimization, and Multi-Objective Genetic Optimization, respectively. “M or D” indicates whether the carrier is music (M) or dialogue (D). “–” denotes a value the authors did not report, and “*” denotes that the authors reported that the WER of the attack model on the adversarial examples (AEs) increased to \(980\%\).