API service | Audio length | DS | ITRA | CS | DW | Occam | DC | RGE | DGWE |
---|---|---|---|---|---|---|---|---|---|
Aliyun-API | 4s | 2/50 | 0/50 | 9/50 | 15/50 | 50/50 | – | 35/50 | 20/50 |
Xfyun-API | 4s | 0/50 | 0/50 | 20/50 | 35/50 | 50/50 | 34.5/50 | 30/50 | 35/50 |
Baiduyun-API | 4s | 4/50 | 0/50 | 4/50 | – | – | 37.25/50 | 38/50 | 33/50 |
Tencent-API | 4s | 2/50 | 2/50 | 5/50 | 20/50 | – | 41.2/50 | – | – |
Microsoft-API | 4s | 15/50 | 7/50 | 15/50 | 40/50 | 50/50 | – | – | – |
Average | 4s | 4.6/50 | 1.4/50 | 10.6/50 | 27.5/50 | 50/50 | 37.7/50 | 34.3/50 | 28.3/50 |
- The abbreviations DS, ITRA, CS, and DW refer to the attack methods presented in Carlini and Wagner (2018), Qin et al. (2019), Yuan et al. (2018), and Chen et al. (2020), respectively. Occam and DC refer to the attack methods proposed in Zheng et al. (2021) and Xu et al. (2022), respectively. A “–” indicates that the attack method was not tested on that API. For DW (Chen et al., 2020), the authors evaluated 10 AEs; to match the number format of this work, we doubled the number. For DC (Xu et al., 2022), the authors reported the attack success rate as the result, which was converted by calculation to the format