Are our clone detectors good enough? An empirical study of code effects by obfuscation

Cybersecurity

Table 2 Code clone detectors and their characteristics, including the extracted feature, the dataset used, and the supported clone types (T1, T2, ST3, MT3, T4)

Tools	Venue	Method	Feature	Dataset	T1	T2	ST3	MT3	T4
CCFinder (Kamiya et al. 2002)	TSE 2002	token normalization + token-wise comparison	Token	JDK 1.3.0, FreeBSD 4.0	✓	✓
SDD (Lee and Jeong 2005)	OOPSLA 2005	inverted index + N-neighbor	Text	JDK 1.5, httpd-2.0.54	✓	✓	✓
Deckard (Jiang et al. 2007)	ICSE 2007	Locality Sensitive Hash	AST	JDK 1.4.2, Linux kernel 2.6.16	✓	✓	✓
SourcererCC (Sajnani et al. 2016)	ICSE 2016	Filtering Heuristics	Token	BigCloneBench, Mutation/Injection	✓	✓	✓
Oreo (Saini et al. 2018)	ESEC/FSE 2018	action token + metric comparison	Token	BigCloneBench	✓	✓	✓
CCAligner (Wang et al. 2018)	ICSE 2018	code window + edit distance	Token	JDK 1.2.2, OpenNLP 1.8.1	✓	✓	✓
DeepSim (Zhao and Huang 2018)	ESEC/FSE 2018	Multilayer Perceptron	CFG	BigCloneBench, GCJ	✓	✓	✓	✓	✓
ASTNN (Zhang et al. 2019)	ICSE 2019	Bidirectional RNN	AST	BigCloneBench	✓	✓	✓	✓	✓
CCLearner (Li et al. 2017)	ICSME 2017	DNNs	Token	BigCloneBench	✓	✓	✓	✓