
Table 2 Transfer rate (TR) of adversarial samples on commercial cloud speech-to-text APIs

From: Towards the transferable audio adversarial attack via ensemble methods

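| API service | Audio length | DS | ITRA | CS | DW | Occam | DC | RGE | DGWE |
|---|---|---|---|---|---|---|---|---|---|
| Aliyun-API | 4s | 2/50 | 0/50 | 9/50 | 15/50 | 50/50 | - | 35/50 | 20/50 |
| Xfyun-API | 4s | 0/50 | 0/50 | 20/50 | 35/50 | 50/50 | 34.5/50 | 30/50 | 35/50 |
| Baiduyun-API | 4s | 4/50 | 0/50 | 4/50 | - | - | 37.25/50 | 38/50 | 33/50 |
| Tencent-API | 4s | 2/50 | 2/50 | 5/50 | 20/50 | - | 41.2/50 | - | - |
| Microsoft-API | 4s | 15/50 | 7/50 | 15/50 | 40/50 | 50/50 | - | - | - |
| Average | 4s | 4.6/50 | 1.4/50 | 10.6/50 | 27.5/50 | 50/50 | 37.7/50 | 34.3/50 | 28.3/50 |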
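The Average row can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `average_transfer_count` is hypothetical, and skipping untested APIs when averaging is an assumption about how the row was formed, not something the table states. It is applied to the DS column, which has a value for every API.

```python
from typing import Optional, Sequence

SAMPLES_PER_API = 50  # denominator used throughout Table 2


def average_transfer_count(counts: Sequence[Optional[float]]) -> Optional[float]:
    """Mean number of transferred samples over the APIs that were tested.

    None stands for an untested API. Skipping such entries is an
    assumption about how the Average row is computed.
    """
    tested = [c for c in counts if c is not None]
    if not tested:
        return None  # no API was tested for this method
    return round(sum(tested) / len(tested), 1)


# DS column, top to bottom: Aliyun, Xfyun, Baiduyun, Tencent, Microsoft.
ds_counts = [2, 0, 4, 2, 15]
ds_average = average_transfer_count(ds_counts)
print(f"{ds_average}/{SAMPLES_PER_API}")  # prints "4.6/50"
```

For the DS column this reproduces the 4.6/50 reported in the Average row.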

  1. The abbreviations DS, ITRA, CS, and DW refer to the attack methods presented in Carlini and Wagner (2018), Qin et al. (2019), Yuan et al. (2018), and Chen et al. (2020), respectively. Occam and DC refer to the attack methods proposed in Zheng et al. (2021) and Xu et al. (2022), respectively. A “-” indicates that the attack method was not tested on that API. For DW, Chen et al. (2020) evaluated 10 adversarial examples (AEs); to match the number format of this work, we doubled the number. For DC, Xu et al. (2022) reported the attack success rate, which was converted by calculation to the format