Performance evaluation of Cuckoo filters as an enhancement tool for password cracking

Cyberthreats continue their expansion, becoming more and more complex and varied. However, credentials and passwords are still a critical point in security. Password cracking can be a powerful tool to fight against cyber criminals if used by cybersecurity professionals and red teams, for instance, to evaluate compliance with security policies or in forensic investigations. For particular systems, one crucial step in the password-cracking process is comparison or matchmaking between password-guess hashes and real hashes. We hypothesize that using newer data structures such as Cuckoo filters could optimize this process. Experimental results show that, with a proper configuration, this data structure is two orders of magnitude more efficient in terms of size/usage compared to other data structures while keeping a comparable performance in terms of time.


Introduction
Cybercrime has grown significantly over the last decade, impacting individuals, enterprises, and governments worldwide. The variety and depth of breaches and incidents, such as social engineering or denial of service attacks, have created a permanent concern amongst security professionals, to the point that cybercrime is considered a pandemic (The Economist 2021).
Examining the latest security events compromising confidentiality, integrity, and availability of information assets, and those that resulted in the exposure of data to unauthorized parties as published in (Verizon 2021), we observe that financially motivated attacks rank first and that organized crime is still the leading actor. It is also interesting to note that basic web application attacks represent the second most common pattern in incidents and breaches. In contrast, social engineering ranks first in breaches and third in incidents.
Commonly, some of these incidents are due to stolen credentials or password cracking. Similarly, the compromise of password-based or single-factor user credentials is identified by (VMWare et al. 2022) as one of the top concerns for many security leaders.
Understanding novel and agile forms of password cracking is paramount to creating robust defense systems and increasing the culture of cybersecurity amongst professionals and organizations. This paper proposes and evaluates a novel method to optimize the password-cracking process. The idea relies on incorporating Cuckoo filter data structures to accelerate the matchmaking between guesses and targets. To evaluate this proposal, we use John the Ripper (JtR), a password-cracking tool widely used in the cybersecurity communities, to crack an anonymized, publicly available dataset of leaked password hashes. Then, we compare the performance of the Cuckoo filter data structure for matchmaking with other alternatives, specifically, binary search trees, binary search, linear search, and hash tables. Results show that (i) among the tested methods, Cuckoo filters are the second most efficient solution in terms of time consumption and that (ii) when compared in terms of memory usage, Cuckoo filters are two orders of magnitude more efficient than the second best option, hash tables.
The rest of the paper is organized as follows. Section "Related work" summarizes recent trends in password-cracking methodologies from the related literature. The operation of Cuckoo filters and how to incorporate them into the cracking process is described in Sect. "Cuckoo filters". The methodology is shown in Sect. "Methodology". The evaluation of the proposal and the corresponding results are shown and discussed in Sect. "Results". The paper concludes by highlighting the most important findings and our future work.

Related work
Textual passwords are still one of the most common authentication methods (Shi et al. 2021; Bonneau et al. 2012). They are composed of a set of alphanumeric characters, where the rule of thumb is that the longer and more complex the passwords are, the higher the security offered, which raises the known tradeoff between usability and security. Password authentication goes beyond using several characters to gain access to a system. As explained by (Ali et al. 2021), passwords can be divided into three categories: token-based, biometric-based, and knowledge-based. Within this last category, we have both graphical-based (involving a mouse password entry) and textual-based passwords. Then, a textual password could belong to a direct keying or a reformation-based scheme. In this work, we focus on direct keying passwords.
The reasons to investigate password cracking are varied, from law enforcement and forensic investigations (Kanta et al. 2020; Kanta et al. 2021; Maqbool et al. 2020) to cybersecurity training (Švábenský et al. 2021) and ethical hacking (Bishop and Klein 1995; Yang et al. 2022), through understanding the effect of people's culture and background on the use of passwords (Shin and Woob 2022; Brown et al. 2004) (Bonneau 2012; Wang et al. 2017; Wang et al. 2018). Much has been written about password cracking during the last decades. It is worth mentioning that whereas password cracking is an offline technique in which the attacker (usually) has access to the password hashes, password guessing is performed online while trying to gain unauthorized access to a system. Some research works from the related literature have proposed new methods for enhancing the performance of password cracking. In order to reduce the time needed for this operation, the work done by (Weir et al. 2009) was the first to show that it is possible to use Probabilistic Context Free Grammar (PCFG) to create either password guesses or templates via training and then employ them to crack passwords more efficiently. Numerous works have followed this trend with very interesting results (Veras et al. 2014; Houshmand et al. 2015). In (Ali et al. 2021), the authors presented a brute force algorithm to test the security of their reformation-based password scheme and compare it to other solutions from the related literature. From a different perspective, a Markov-model-based scheme was introduced by (Narayanan and Shmatikov 2005) to improve dictionary-based attacks. The authors were able to generate guesses that accelerate the process compared to rainbow attacks, and some other works have followed this research line (Dürmuth et al. 2013). Lately, with the boost in the application of machine learning and deep learning methods, several authors have presented promising proposals, such as (Hitaj et al. 2019), where the authors use a Generative Adversarial Network (GAN) to learn the distribution of real passwords and to produce guesses, (Xia et al. 2020), combining a neural network with PCFG, (Kaleel and Nhien-An 2020), studying the performance of PassGAN, and other interesting works such as (Melicher et al. 2016) (Yang et al. 2022).
To the authors' knowledge, the two most complete works on password cracking to date were presented by (Ji et al. 2017) and (Shi et al. 2021). In (Ji et al. 2017), the authors conducted a large-scale empirical study with almost 150 million real passwords. They tested the level of correlation, the effectiveness of commercial password meters (demonstrating their inconsistency), and the best strategy in terms of which algorithms are more effective (e.g., with or without training, intra- or cross-site, etc.), concluding that a hybrid option could be more appropriate. Similarly, (Shi et al. 2021) studied over 220 million plaintext passwords. Although they also concluded that no single cracking method is best overall due to the impact of multiple factors, the individual analysis of each dataset showed noteworthy results. For instance, they demonstrated that whereas English datasets were better cracked with PCFG algorithms, Chinese datasets responded better to Markov-based methods.
Regarding commercial tools for password cracking, Hashcat (Advanced Password Recovery 2022) and the latest versions of John the Ripper (JtR) (Open Wall 2022), e.g., Jumbo, are some of the most common applications. JtR enables several modes of operation, specifically:
• Wordlist mode, the simplest operation mode, where it is only necessary to specify a wordlist and one or more password files; word mangling rules can be enabled.
• Single crack mode, a faster mode than the wordlist one, but one that only uses login names and directory names, being able to apply more mangling rules than in the previous mode.
• Incremental mode, considered the most powerful mode because all possible character combinations are verified, with the consequent increase in time.
• Markov mode, one of the latest improvements to JtR, which tests guesses using statistical analysis of similarities among known passwords.
To the authors' knowledge, neither previous research works nor commercial tools for password cracking have studied the use of Cuckoo filters to enhance their operation. The following section explains the operation of this data structure, whose main advantage is twofold: guaranteeing a zero false-negative rate and a low false-positive rate while maintaining a comparatively low time consumption.

Cuckoo filters
Bloom filters (Bloom 1970) and Cuckoo filters (Fan et al. 2014) are probabilistic data structures commonly used to provide membership checking. Their two main advantages are speed and memory efficiency. Both methods offer a zero false-negative rate and a non-zero false-positive rate, i.e., the answer to a membership query will always be either "definitely not" (with no error) or "maybe yes" (with a given false-positive rate).
Some general characteristics of Bloom filters are the following. The larger the bit array (space), the lower the false-positive rate. The more hash functions are used, the slower the Bloom filter becomes and the quicker it fills up, whereas too few hash functions lead to many false positives. In addition, an essential feature of classic Bloom filters is that deleting existing items requires rebuilding the complete filter. This fact led to the introduction of new Bloom filter variants (Cao et al. 2000; Song et al. 2005; Lim et al. 2017; Wu et al. 2021). However, some of these modifications incur a significant performance or space overhead. For this reason, (Fan et al. 2014) proposed Cuckoo filters as an alternative that allows adding and deleting items dynamically, demonstrating better performance and higher space efficiency if the false-positive rate is kept below 3%.
Standard Cuckoo filters employ a modification of cuckoo hashing. As a brief note, a cuckoo hash table is composed of an array of buckets, and each bucket can hold several items. The number of items that can be stored in a bucket is denoted by b. The number of candidate buckets where a new element x can be inserted equals the number of hash functions used, k, and the outputs of the k hash functions give the bucket numbers. Consequently, to insert an item x, the k hashes of x should be calculated. Then, we check whether any of x's candidate buckets has an empty slot. If several do, the algorithm chooses one of them and inserts the item. If all candidate buckets are in use, the algorithm ejects one of the existing items in one of x's candidate buckets and re-inserts it into its alternate bucket. These relocations can be executed recursively several times until a free bucket is found or until a maximum number of relocations is reached.
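The insertion procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any particular library: the class name `CuckooHashTable`, the salted use of Python's built-in `hash` as the k hash functions, and the default parameter values are all hypothetical choices made for the example.

```python
import random


class CuckooHashTable:
    """Toy cuckoo hash table: k candidate buckets per item, b slots per bucket."""

    def __init__(self, num_buckets=64, b=4, k=2, max_kicks=50):
        self.buckets = [[] for _ in range(num_buckets)]
        self.b, self.k, self.max_kicks = b, k, max_kicks

    def _candidates(self, item):
        # Assumption: the k hash functions are emulated by salting the
        # built-in hash with the function index i.
        return [hash((i, item)) % len(self.buckets) for i in range(self.k)]

    def insert(self, item):
        for _ in range(self.max_kicks):
            candidates = self._candidates(item)
            for idx in candidates:
                if len(self.buckets[idx]) < self.b:  # free slot found
                    self.buckets[idx].append(item)
                    return True
            # All candidate buckets are full: eject a random resident item
            # and try to re-insert it in the next iteration.
            idx = random.choice(candidates)
            slot = random.randrange(self.b)
            item, self.buckets[idx][slot] = self.buckets[idx][slot], item
        # Relocation limit reached; in this toy version the last ejected
        # item is dropped, whereas a real table would resize or rehash.
        return False

    def contains(self, item):
        return any(item in self.buckets[idx] for idx in self._candidates(item))


table = CuckooHashTable()
inserted = [table.insert(x) for x in range(50)]
print(all(inserted), table.contains(7), table.contains(10_000))
```

Note that the full items are stored here, which is what distinguishes cuckoo hashing from the fingerprint-based Cuckoo filter.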
The next advance in the use of these data structures was the introduction of Cuckoo filters. In contrast to Bloom filters, which store 1 bit for each item, or cuckoo hashing, which stores the complete item, a Cuckoo filter stores a fingerprint for each inserted item x. The fingerprint is a bit string obtained from the item x using another hash function. Furthermore, unlike cuckoo hashing, Cuckoo filters apply partial-key cuckoo hashing. When a set membership query for item x is required, the algorithm outputs true if and only if an identical fingerprint of x is found in one of its candidate buckets. It is important to observe that the fingerprint is not the hash of x, and the original key-value pairs are not stored, consequently being non-retrievable. This fact implies that we could not calculate an item's alternate bucket as we did with cuckoo hashing. Partial-key cuckoo hashing was introduced to enable this capacity by stating that only two hash functions will be used. In other words, only two bucket candidates given by h_1(x) and h_2(x) are employed, and these two hashes follow the rule depicted in (1). This rule guarantees that x's alternate bucket can be obtained using the location of either bucket h_1(x) or h_2(x) and x's fingerprint, due to the XOR operation. That is, there is no need to know x. Figure 1 depicts an example of inserting elements in a Cuckoo filter and checking membership.
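To make the role of the fingerprint and the XOR rule concrete, the following is a minimal Cuckoo filter sketch. It is an illustration under assumptions, not the library used in the evaluation: the class `CuckooFilter`, the helper `_h` (a truncated SHA-256), the power-of-two number of buckets (which keeps the XOR result inside the table), and the default parameters are choices made for this example only.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    """Illustrative hash: first 8 bytes of SHA-256 read as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    def __init__(self, num_buckets=1024, b=4, f=8, max_kicks=500):
        # Assumption: a power-of-two table so that idx XOR hash stays in range.
        assert num_buckets & (num_buckets - 1) == 0
        self.m, self.b, self.f, self.max_kicks = num_buckets, b, f, max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _fingerprint(self, item: bytes) -> int:
        return _h(b"fp:" + item) % (1 << self.f)  # f-bit fingerprint

    def _alt(self, idx: int, fp: int) -> int:
        # Partial-key rule: the other bucket depends only on idx and fp.
        return (idx ^ _h(fp.to_bytes(4, "big"))) % self.m

    def insert(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = _h(item) % self.m
        i2 = self._alt(i1, fp)
        for idx in (i1, i2):
            if len(self.buckets[idx]) < self.b:
                self.buckets[idx].append(fp)
                return True
        idx = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            # Eject a resident fingerprint and move it to its alternate bucket.
            slot = random.randrange(self.b)
            fp, self.buckets[idx][slot] = self.buckets[idx][slot], fp
            idx = self._alt(idx, fp)
            if len(self.buckets[idx]) < self.b:
                self.buckets[idx].append(fp)
                return True
        return False  # filter considered full

    def contains(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = _h(item) % self.m
        return fp in self.buckets[i1] or fp in self.buckets[self._alt(i1, fp)]

    def delete(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = _h(item) % self.m
        for idx in (i1, self._alt(i1, fp)):
            if fp in self.buckets[idx]:
                self.buckets[idx].remove(fp)
                return True
        return False


cf = CuckooFilter()
items = [f"item-{i}".encode() for i in range(1000)]
ok = [cf.insert(x) for x in items]
print(all(ok), all(cf.contains(x) for x in items))
```

Because only fingerprints are stored, a query for an item that was never inserted may still return true when its fingerprint collides with a stored one; inserted items are always found.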
As with Bloom filters, the Cuckoo filter's false-negative rate is zero, and the false-positive rate ε is shown in (2), where n is the number of items expected to be inserted into the set, m is the size of the bit array, b is the number of items that a bucket can hold, and f is the fingerprint length in bits. The Bloom filter's false-positive rate is shown in (3), where k is the number of hash functions (Reviriego et al. 2020). Table 1 summarizes some characteristics of Bloom filters and Cuckoo filters that should be considered in their implementation.
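As a numerical aid, the sketch below evaluates the forms of these rates that are commonly cited in the literature: the upper bound 1 − (1 − 2^(−f))^(2b) ≈ 2b/2^f for a Cuckoo filter and (1 − e^(−kn/m))^k for a Bloom filter. These are assumed here as standard expressions and are not necessarily identical to the ones numbered (2) and (3); the function names are hypothetical.

```python
import math


def cuckoo_fpr_bound(b: int, f: int) -> float:
    """Commonly cited upper bound: a query compares against up to 2b fingerprints."""
    return 1.0 - (1.0 - 2.0 ** -f) ** (2 * b)


def bloom_fpr(n: int, m: int, k: int) -> float:
    """Classical approximation for n items, an m-bit array and k hash functions."""
    return (1.0 - math.exp(-k * n / m)) ** k


# Effect of the fingerprint length f for a bucket size of b = 4.
for f in (8, 16, 24, 32):
    print(f"f = {f:2d} bits  ->  bound = {cuckoo_fpr_bound(4, f):.3e}")

# A Bloom filter with 10 bits per item and 7 hash functions.
print(f"Bloom: {bloom_fpr(1000, 10_000, 7):.3e}")
```

Under this bound, b = 4 and f = 8 give roughly 3%, and each additional fingerprint bit halves the rate.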
$$h_1(x) = \mathrm{hash}(x), \qquad h_2(x) = h_1(x) \oplus \mathrm{hash}(\mathrm{fingerprint}(x)) \tag{1}$$
Fig. 1 An example of the Cuckoo Filter operation using a bit array of size 18 (m = 18), where each bucket can store only one fingerprint (b = 1) and the fingerprint length in bits is 5 (f = 5). The hash function is hash(x) = 2*x mod 18 and fingerprint(x) = (x − 1) mod 18. (a) Inserting elements: To insert item 93, the value fingerprint(93) = 2 should be stored either in position h_1(93) = 6 or in position h_2(93) = 2; because position 6 is occupied, the value 2 is inserted in position 2. To insert item 20, the value 1 should be stored either in position 4 or in position 7; both are available, so the value 1 is inserted in position 4. To insert item 24, the value 5 should be stored either in position 12 or in position 6, and both buckets are occupied. Consequently, we randomly choose bucket 6 and displace the current fingerprint 10 to its alternate bucket 4. To calculate the alternate bucket, we know that h_1(x) = 6, then h_2(x) = 6 ⊕ hash(10) = 6 ⊕ 2 = 4, so the new location should be position 4. Because bucket 4 is also in use, we displace the current fingerprint 1 to its alternate bucket 7 (if h_1(x) = 4 then h_2(x) = 4 ⊕ hash(1) = 4 ⊕ 2 = 7), which is free. (b) Verifying membership: To test the membership of element 15, we calculate its fingerprint, fingerprint(15) = 14, and the two possible buckets where the fingerprint could be located (positions h_1(15) = 12 and h_2(15) = 6); because position 12 stores the fingerprint of item 15, we can state that the item belongs to the set with a non-zero false-positive rate
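The candidate positions quoted in the example for items 93, 24, and 15 can be recomputed with the toy functions given in the caption. The sketch below assumes the rule h_2(x) = h_1(x) ⊕ hash(fingerprint(x)); the function names are hypothetical. Since 18 is not a power of two, the XOR may in general point outside the array for other items, so these toy functions are only meant to illustrate the mechanism.

```python
M = 18  # array size used in the example


def toy_hash(x: int) -> int:
    return (2 * x) % M


def toy_fingerprint(x: int) -> int:
    return (x - 1) % M


def candidate_buckets(x: int) -> tuple[int, int, int]:
    """Return (fingerprint, h1, h2) for item x under the partial-key rule."""
    fp = toy_fingerprint(x)
    h1 = toy_hash(x)
    h2 = h1 ^ toy_hash(fp)  # alternate bucket needs only h1 and the fingerprint
    return fp, h1, h2


for item in (93, 24, 15):
    fp, h1, h2 = candidate_buckets(item)
    print(f"item {item}: fingerprint = {fp}, h1 = {h1}, h2 = {h2}")
```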

Methodology
In this section, we explain how to improve the performance of password cracking by incorporating Cuckoo filters. Our hypothesis is that the time required to perform password cracking can be reduced if (1) the corresponding target hashes are inserted into a Cuckoo filter and (2) the guessed passwords are checked against this data structure.
In order to evaluate this proposal, we selected one leaked password dataset with New Technology LAN Manager (NTLM) hashes that may have resulted from different types of attacks (see footnote 1). The NTLM authentication protocols authenticate users and computers based on a challenge/response mechanism that proves to a server or domain controller that a user knows the password associated with an account. The format is shown in Fig. 2, where we can see the password's hash and the number of times this password had been seen in the source data breaches, separated by a colon. It is important to state that we only employ these password hashes for research purposes, and no personally identifiable information is being used, explored, or disclosed.
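A line in this hash-and-count format can be parsed as sketched below. The helper name `parse_ntlm_line` and the sample line are made up for illustration (the sample is not taken from the dataset); the only facts relied upon are that an NTLM hash is 128 bits long, i.e., 32 hexadecimal characters, and that a colon separates it from the count.

```python
def parse_ntlm_line(line: str) -> tuple[str, int]:
    """Split a 'HASH:COUNT' line into a normalized hex hash and an integer count."""
    hash_hex, _, count = line.strip().partition(":")
    if len(hash_hex) != 32:
        raise ValueError(f"expected 32 hex characters, got {len(hash_hex)}")
    int(hash_hex, 16)  # raises ValueError if the field is not hexadecimal
    return hash_hex.upper(), int(count)


# Made-up sample line, for illustration only.
sample = "0123456789abcdef0123456789abcdef:42\n"
ntlm_hash, seen = parse_ntlm_line(sample)
print(ntlm_hash, seen)
```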
The general method of password cracking, for instance, using well-known tools such as JtR, is as follows. A password guess is created. Then, the corresponding password hash of the guess is computed and tested (compared) against the target hash value. This target hash value is frequently included in a large file with hundreds or thousands of other "leaked" hashes. Therefore, once the guess is generated and the program looks for a match to see if it is a valid crack, which technique is most commonly used for searching? There could be many options: linear search, a hash map, etc. Particularly with JtR, if salts are correctly used, there is only one target hash per salt value. Therefore, it would be a one-to-one comparison, and there would be no need for a search. In other cases, it is more efficient to compute multiple hashes at once from multiple password guesses using newly available computing power such as Graphics Processing Units (GPU). In this case, the comparison would be many-to-one (Open Wall 2017). However, for some systems that use NTLM hashes, the comparison step requires using a searching algorithm, such as bitmap structures, hash tables, or linear searches (Open Wall 2023). NTLM hashes are still used by current Windows systems. They are very fast to compute and do not use salts, so when cracking them the comparison step is in fact a bottleneck. Consequently, we propose to include a Cuckoo filter in the search process to improve its efficiency by speeding up the comparison stage.
In order to evaluate the performance of our proposal, we designed several tests with the following methodology (Fig. 3). Given that our goal is to measure the effectiveness of incorporating a Cuckoo filter into the cracking process, we decided not to modify JtR or Hashcat, which will be considered for future work, but to compare the performance of a linear search, a hash table, a binary search tree, a binary search, and a Cuckoo filter under the same case study of password cracking.
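The kind of comparison described here can be sketched with standard-library Python only. The snippet below times three of the search methods (linear search over a list, binary search with `bisect`, and a hash table via `set`) on synthetic 128-bit hexadecimal strings that stand in for NTLM hashes. The data sizes, the helper names, and the use of `time.perf_counter` are assumptions for this illustration and are not the scripts used in the evaluation; the binary search tree and the Cuckoo filter are omitted for brevity.

```python
import bisect
import random
import time

rng = random.Random(1)


def fake_hash() -> str:
    """Synthetic 32-character hex string standing in for an NTLM hash."""
    return "%032X" % rng.getrandbits(128)


targets = [fake_hash() for _ in range(2000)]
# Half of the guesses are real targets, half are random fillers.
guesses = rng.sample(targets, 500) + [fake_hash() for _ in range(500)]

sorted_targets = sorted(targets)  # binary search needs sorted data
target_set = set(targets)         # hash table


def linear_lookup(h: str) -> bool:
    return h in targets            # O(n) scan of the list


def binary_lookup(h: str) -> bool:
    i = bisect.bisect_left(sorted_targets, h)
    return i < len(sorted_targets) and sorted_targets[i] == h


def hash_lookup(h: str) -> bool:
    return h in target_set


def timed_search(lookup):
    """Return (number of matches, elapsed seconds) for one search method."""
    start = time.perf_counter()
    hits = sum(1 for g in guesses if lookup(g))
    return hits, time.perf_counter() - start


results = {
    "linear search": timed_search(linear_lookup),
    "binary search": timed_search(binary_lookup),
    "hash table": timed_search(hash_lookup),
}
for name, (hits, seconds) in results.items():
    print(f"{name:14s} matches = {hits}  time = {seconds:.6f} s")
```

All three methods are exact, so they must report the same number of matches; only the elapsed time differs.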
First, we split the leaked password dataset with NTLM hashes into n sub-datasets of different sizes (see Table 2). Each sub-dataset is called target_i (i = 1..n). For each sub-dataset target_i, we generate the corresponding Cuckoo filter using the cuckoo filter library for Python (Guan 2019). That is, we fill the Cuckoo data structure by inserting the items of the sub-dataset target_i. Next, we crack the passwords of the sub-dataset target_i using JtR and store the obtained cracked hashes in a temporary file tmp_i. New random, fake hashes that do not belong to any of the cracked passwords are then inserted into the temporary file tmp_i. The reason is to get as close as possible to reality, so that there will be correct and incorrect guesses during the cracking process. Therefore, the temporary file tmp_i includes the correct hashes, i.e., the target hashes corresponding to the cracked passwords, and filler (fake) hashes (50%).
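The construction of the mixed list of correct and filler hashes can be sketched as follows. This is an illustration under assumptions: the function name `build_guess_list` is hypothetical, the 50% figure is interpreted as the share of fillers in the final list, and the fillers are drawn as random 128-bit values and rejected if they happen to coincide with any target hash.

```python
import random


def build_guess_list(cracked, targets, filler_fraction=0.5, seed=0):
    """Mix cracked (correct) hashes with random fillers that match no target."""
    rng = random.Random(seed)
    target_set = set(targets)
    # Number of fillers so that they make up filler_fraction of the result.
    n_fake = round(len(cracked) * filler_fraction / (1 - filler_fraction))
    fakes = set()
    while len(fakes) < n_fake:
        candidate = "%032X" % rng.getrandbits(128)
        if candidate not in target_set:
            fakes.add(candidate)
    mixed = list(cracked) + sorted(fakes)
    rng.shuffle(mixed)  # interleave correct and incorrect guesses
    return mixed


# Made-up hashes standing in for a sub-dataset and its cracked subset.
demo_targets = ["%032X" % i for i in range(100)]
demo_cracked = demo_targets[:10]
tmp = build_guess_list(demo_cracked, demo_targets)
print(len(tmp), sum(h in set(demo_targets) for h in tmp))
```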
Then, the search process is implemented using as input the temporary file tmp_i, emulating the hashes created by the password cracking program, and checking whether the hashes that it contains match the target hashes target_i (meaning that we were able to crack the corresponding passwords). The search time t_i^s is computed as a performance indicator. Figure 3 summarizes the evaluation process. The comparison has been carried out using a Virtual Machine (Virtual Box) with Debian (64-bit) (Kali Linux), 8192 MB RAM, six processors, 80 GB storage capacity, VMSVGA 128 MB graphics, and Python 3.11.2. In addition, specifically for the Cuckoo filter data structure, additional tests are carried out varying its configuration parameters to evaluate their impact on performance.
Fig. 2 An example of NTLM hashes from the dataset in hexadecimal format
Footnote 1: https://haveibeenpwned.com/Passwords

Results
Results are illustrated in Fig. 4. In Fig. 4a, we compare the different search methods with the Cuckoo filter using its default parameters (m = total number of hashes, b = 4, f = 8). We can observe that the time t_i^s required to find matches between the computed hashes (from the guessed passwords) and the target hashes is lowest for the hash table. The second and third fastest methods are the Cuckoo filter and binary search, respectively. Nevertheless, we can deduce from Fig. 4b that the sizes of the data structures are not negligible, and that the Cuckoo filter is the most compact one, being two orders of magnitude smaller than the hash table.
Linear search is simple but inefficient for large data sets. Although it is easy to implement, its search time increases linearly with data size, making it unsuitable for large data sets. However, it could be a viable option for very small data sets.
Binary search significantly improves performance compared to linear search by using an ordered data structure. It is an acceptable method for large data sets and presents a logarithmic search time. In addition, it offers high accuracy and has no false positives or negatives. However, it requires the data to be pre-sorted and can be more complex to implement.
Binary search trees offer reasonable search time and high accuracy. Their performance is better than that of linear search but inferior to that of binary search. The size of the data structure depends on the number of nodes and the depth of the tree, which implies significant memory consumption. Binary search trees are useful when an ordered data structure is required and accuracy is critical.
Hash tables are highly efficient regarding search time, as they allow direct access to items through a hash function. They provide high search speed, especially for large data sets. However, the size of the data structure grows with the number of elements, which can be a limitation in terms of memory consumption. In addition, it is critical to select a suitable hash function and consider possible collisions.
Cuckoo filters are an attractive option for balancing speed, accuracy, and memory consumption. They offer fast search time and a compact data structure. The memory consumption of the Cuckoo filter is lower than that of the other methods, as depicted in Fig. 4b, which makes it an efficient alternative.
Correctly setting the Cuckoo filter parameters is important to minimize false positives. To study the Cuckoo filter's performance in more detail, we show in Figs. 5, 6, and 7 the effect of using different bucket sizes b, fingerprint lengths f, and sizes m (expressed relative to total, where total is the total number of hashes to be tested). For b = 4, i.e., each bucket can store four fingerprints, we can see in Fig. 5 that varying the size has little impact on the searching time. However, using a fingerprint length of more than 8 bits decreases the false-positive rate to almost zero. This effect is also noticeable for b = 2 and b = 6, i.e., each bucket can store two or six fingerprints, respectively, as shown in Figs. 6a and 7a. The reason is that a larger fingerprint reduces the likelihood of a collision.
As expected from the definition of a Cuckoo filter, larger values of m increase the size of the data structure. If this parameter grows, there will be more "space" available to store items. Nevertheless, this does not mean that the performance will be better, as shown in Figs. 5, 6, and 7. Observing these figures for f = 16, f = 24, and f = 32 bits, we can see that the best configurations in terms of searching time are the combinations {b = 2, f = 16} (Fig. 6b) and {b = 6, f = 24} (Fig. 7c). In the former, the best result is obtained with m = 3, which carries a slightly higher size cost; we therefore opt for {b = 2, f = 16} with m = 1 as the best configuration considering both searching time and size.

Conclusion
New data structures, such as Cuckoo filters, have been proven efficient in several computer network applications. Nevertheless, their use in security has been limited mainly to authentication tasks. In this work, we have introduced a new use of Cuckoo filters as a valuable tool within the password-cracking process. The proposed method is particularly interesting for systems that use NTLM hashes because, in this scenario, the comparison step between generated hashes and target hashes requires a searching algorithm. Results show that whereas there is no direct reduction in time, the gain in terms of memory usage is of two orders of magnitude compared to commonly employed data structures, which opens the door to further research in this direction.

Fig. 3 Flow diagram for the evaluation process. This process is repeated for each data structure under study. (*) The search methods are linear, hash table, binary search tree, binary search, and Cuckoo filter

Fig. 4 Comparison of the use of different data structures for matchmaking. a Time (s) required for matchmaking, i.e., finding matches between the computed hashes (from the guessed passwords) and the target hashes for each search algorithm. b Size (MB) of the data structure used for matchmaking. The represented size is the output of the Python primitive sys.getsizeof(), which returns the size of an object in bytes, considering that only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to
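The shallow nature of sys.getsizeof() mentioned in the caption can be seen in a short example. The sketch below uses made-up hexadecimal strings; it compares the size reported for a list object alone with the size obtained when the strings it refers to are added. The variable names are illustrative.

```python
import sys

# Made-up 32-character hex strings standing in for hashes.
hashes = ["%032X" % i for i in range(1000)]

# Size of the list object itself: header plus one reference per element.
shallow = sys.getsizeof(hashes)

# Adding the size of each referenced string gives a fuller picture.
with_elements = shallow + sum(sys.getsizeof(h) for h in hashes)

print(f"list only:          {shallow} bytes")
print(f"list plus elements: {with_elements} bytes")
```

For containers such as lists, sets, or dictionaries, the value reported for the container alone is therefore a lower bound on the memory actually held.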

Table 1
Comparative characteristics of Bloom filters and Cuckoo filters (Notation: m≡number of buckets for Cuckoo or size of the array for Bloom; n≡number of items; b≡bucket size for Cuckoo; α≡load factor 0 ≤ α ≤ 1; k≡number of hash functions; f≡fingerprint length in bits for Cuckoo; n/a≡not applicable)

Table 2
Details of the files and hashes employed for the performance evaluation