TIM: threat context-enhanced TTP intelligence mining on unstructured threat data

TTPs (Tactics, Techniques, and Procedures), which represent an attacker's goals and methods, are long-term and essential features of the attacker. Defenders can use TTP intelligence to perform penetration tests and compensate for defense deficiencies. However, most TTP intelligence is described in unstructured threat data, such as APT analysis reports. Manually converting natural-language TTP descriptions to standard TTP names, such as ATT&CK TTP names and IDs, is time-consuming and requires deep expertise. In this paper, we define the TTP classification task as a sentence classification task. We annotate a new sentence-level TTP dataset with 6 categories and 6061 TTP descriptions from 10761 security analysis reports. We construct a threat context-enhanced TTP intelligence mining (TIM) framework to mine TTP intelligence from unstructured threat data. The TIM framework uses TCENet (Threat Context Enhanced Network) to find and classify TTP descriptions, which we define as three continuous sentences, from textual data. Meanwhile, we use the element features of TTPs in the descriptions to enhance the TTP classification accuracy of TCENet. The evaluation results show that the average classification accuracy of our proposed method on the 6 TTP categories reaches 0.941, and that adding TTP element features improves classification accuracy compared to using only text features. TCENet also achieved the best results compared to previous document-level TTP classification works and other popular text classification methods, even in the case of few-shot training samples. Finally, the TIM framework organizes TTP descriptions and TTP elements into STIX 2.1 format as final TTP intelligence for sharing the long-term and essential attack behavior characteristics of attackers. In addition, we transform TTP intelligence into Sigma detection rules for attack behavior detection.
Such TTP intelligence and rules can help defenders deploy long-term effective threat detection and perform more realistic attack simulations to strengthen defense.


Introduction
Cyber threat intelligence (CTI) is the information and knowledge used for defense and detection in cyber warfare. Traditional threat protection uses IOC (indicator of compromise) intelligence, such as IPs, domain names, and malicious file hashes, to generate detection rules. However, there are many problems with IOC intelligence-based protection. Zhu and Dumitras (2018) mentioned that, although the intelligence sharing standard STIX 2.1 (OASIS 2021) includes an attack pattern field, most open-source intelligence feeds do not provide IOC-related attack patterns or attack technique information to enrich IOC intelligence (e.g., IP threat type: a brute-force IP or a command & control server IP). This makes IOC-based protection more blind. When an IOC-based protection device generates an alert, the defender does not know what kind of attack is going on behind this IOC alert and cannot respond effectively because the IOC does not have an attack technique description. Nation-state APT (advanced persistent threat) groups can also easily evade this IOC protection mechanism by shifting their attack infrastructures, such as changing attack IPs, phishing domain names, and modifying malicious code. Thus, the Pyramid of Pain (DavidJBianco 2021) considers IOCs low-value and easily accessible intelligence.
TTPs (Tactics, Techniques, and Procedures) can describe the long-term behavior and essential features of an attacker. MITRE constructed a TTP knowledge base named ATT&CK (MITRE 2021) to provide unified TTP names and procedure examples of attack techniques. It uses tactics to represent the goals of each attack campaign stage and techniques to describe how attackers accomplish these goals. TTP intelligence can thus be used in enterprise penetration tests to compensate for defense deficiencies.
However, high-value TTP intelligence is difficult to obtain directly. Cybersecurity analysts generally use natural language to describe TTP intelligence in security analysis reports (Tartare 2021). Figure 1 shows TTP description examples in a security analysis report. The left side shows the report text, in which different colors annotate the different TTP descriptions. It also shows that a TTP description contains a great deal of IOC information and other security terms, which we annotate with a gray background. This paper refers to these IOCs and other security terms in the TTP description as TTP elements. The right side shows the ATT&CK TTP names and the TTP elements obtained from the description text on the left.
Manually converting these TTP descriptions into ATT&CK standard names is very time-consuming and requires in-depth expert knowledge. The existing NLP (natural language processing) methods (training on SST, AG News, DBpedia corpus, etc.) cannot be directly used on cybersecurity-related text because there is no available security corpus to train a TTP classification model.
Several existing TTP classification methods (Ayoade et al. 2018; Legoy 2019; Li et al. 2019; Niakanlahiji et al. 2018) work at the document level, which may cause low-accuracy problems since an article may contain several different kinds of TTPs. These methods also have the limitation that they can only provide static names with a confidence coefficient, without providing further details, such as the related TTP elements, which are also significant to cyber defenders. The static bag-of-words method proposed by Husari et al. (2017) is not robust enough to classify TTP intelligence in complex security analysis articles and needs a prebuilt knowledge base. The subsequent work of Husari et al. (2018) tries to find threat behaviors consisting of one verb and one noun object. These verb-object pairs are too simple to describe complete TTPs. Moreover, none of the existing TTP classification methods have evaluated their performance in few-shot training sample cases, although many TTPs have only a few description texts.

Fig. 1 The left figure is the original security analysis report, and the right figure shows the corresponding normalized TTP names and TTP elements. Security analysts need to manually extract these attack descriptions to normalize TTP names with the ATT&CK framework. Different colors in the figure represent different TTPs, and elements mentioned in the context are annotated with a gray background.
In addition, previous methods take static TTP names as their final outcome and lack TTP description details. As a result, such TTP information is difficult for defenders to use.

Motivation & Challenges
Based on the above discussion, our motivation is to automatically obtain long-term valid and more essential features of attackers, such as TTPs, from unstructured threat data. Specifically, this paper uses text classification to find and classify TTP descriptions from security analysis reports to represent such features and use them for better protection. Our work therefore faces two major challenges: the lack of an available sentence-level TTP dataset and the need to accurately find and classify TTP-related descriptions, even in the case of few-shot training samples.

Our study
We define TTP intelligence as the detailed description and elements of TTPs in unstructured threat data. To obtain TTP intelligence from unstructured threat data, we built a threat context-enhanced TTP intelligence mining framework named TIM that crawls analysis reports from security websites and mines TTP intelligence from them.
To address the lack of an available dataset, we build a new sentence-level TTP dataset. We use the MITRE ATT&CK framework to define TTPs rather than the incomplete, simplistic threat actions used by Husari et al. (2018). Based on the suggestions of frontline threat intelligence analysts, we selected 6 categories of TTPs as classification targets to validate the feasibility of the TTP intelligence mining framework in this paper. These TTP categories cover the 7 major tactics of ATT&CK and are highly representative. We annotate 6061 TTP descriptions from 10761 security reports.
To solve the low-accuracy problem of previous document-level TTP classification methods, we design and propose the TCENet (Threat Context Enhanced Network) model to classify TTPs at the sentence level. We define the threat context as the TTP description plus the TTP elements. A TTP description consists of three continuous sentences and carries the textual information of the threat context. TTP elements are the 12 categories of security terms contained in the threat context, such as IPs, URLs, file hashes, CVE IDs, protocols, and encryption algorithms. We design a TTP element correlation coefficient calculation method to justify adding TTP element features to the classification.
To demonstrate that the TCENet used by the TIM framework can accurately find and classify TTP descriptions in security analysis reports, we perform a variety of evaluations on our annotated dataset.
The evaluation results show that TCENet outperforms previous document-level methods and mainstream text classification methods, and that the TTP element feature improves classification performance. With few-shot training samples, the TTP classification accuracy of TCENet still exceeds that of the comparison models.
Finally, we organize the TTP descriptions and corresponding TTP elements into shareable intelligence in STIX 2.1 format as well as Sigma detection rules (SigmaHQ 2021). Cyber defenders can obtain more valuable and direct intelligence about an attack and update their defense rules and mechanisms without reading the whole security analysis report. TTP intelligence containing specific TTP elements allows for more realistic attack simulation. Sigma detection rules that include TTP names and TTP elements provide richer threat context, enabling defenders to make more targeted defenses.

Contributions
In general, our contributions are as follows:

• We annotate 10761 reports from 5 security vendor websites and build a TTP corpus containing 6 types of popular ATT&CK TTPs and 6061 descriptions. This is the first study to build a sentence-level TTP dataset.

• We propose a framework named TIM for mining TTP intelligence from unstructured threat data such as security analysis reports. The TIM framework uses our proposed TCENet to find and classify TTP descriptions from reports. This is the first work to perform TTP classification at the sentence level using pretrained language models and TTP element features. The TIM framework eventually generates shareable TTP intelligence in STIX 2.1 format as well as Sigma detection rules for better protection.

• The final experimental results show that the sentence-level TCENet in our TIM framework achieves better precision, recall, and F1 than previous document-level TTP classification work and mainstream text classification methods, even in the case of few-shot training samples. The experimental results also demonstrate that our work generalizes to mining TTP intelligence in most categories, in particular for TTPs that have only a few description texts.

Related works

Ayoade et al. (2018) used TF-IDF with an SVM classifier to classify TTPs at the document level. Legoy (2019) used TF-IDF and Word2Vec to represent whole security analysis reports and leveraged AdaBoost, linear SVC, decision trees, etc., to classify TTPs at the document level; the linear SVC with the TF-IDF article vector achieved the best performance in their experiments. Li et al. (2019) used latent semantic analysis to generate topics for target articles and compared the topic vectors with the TF-IDF vectors of the ATT&CK description pages to obtain cosine similarities. They then used these similarity vectors with naive Bayes and decision trees to classify TTPs. Niakanlahiji et al.
(2018) used the TF-IDF scores of independent noun phrases in security analysis articles to find keywords representing the TTPs and used these keywords to query analysis articles in their corpus. The above document-level methods can only output several static TTP names and cannot provide more detailed and specific information about an attack. Our work uses regular expressions and a gazetteer to extract TTP elements from the TTP description text, making the result more concrete than a static TTP name. The IOC intelligence extracted in the TTP context can be used for intelligence sharing and detection rule generation, which has more security value than a general IOC without TTP information.

Other previous methods tried to find more atomistic descriptions of TTPs, such as a verb-noun phrase, which they defined as a threat action. Husari et al. (2017) used TF-IDF with an enhanced BM25 weight function to generate a bag of words for candidate threat-action text and compared it with the bags of words in their knowledge base to obtain the threat actions in the text. The subsequent work of Husari et al. (2018) used entropy and mutual information to find object-verb pairs with high mutual information in malware-related Wikipedia articles and used these pairs to find threat actions of equal mutual information in security analysis reports.
Compared with the above two works, our work does not need a heavyweight prebuilt knowledge base. Unlike the bag-of-words method used by Husari et al. (2017), our work uses a pretrained language model for word embedding, which provides more context features than bag-of-words methods or static word-vector models and can therefore improve classification accuracy. Our work uses 6061 TTP descriptions from 10761 reports for TCENet training, which makes it more robust than the bag-of-words method of Husari et al. (2017). We classify TTPs at the ATT&CK technique level rather than the threat-action level of Husari et al. (2018) because a threat action is too simple to represent complete TTPs.
None of the current works evaluates the performance of TTP classification in few-shot training sample cases. However, many TTPs have only a few description texts, which limits the generalizability of existing work on the TTP classification task.
Our work uses both textual and TTP element features to enhance our TCENet. The evaluation result also shows that the classification accuracy of our TCENet model is better than that of other methods, even in the case of few-shot training samples. This means that our method can be generalized to most TTP classification tasks.

TTPs
Tactics, Techniques, and Procedures are three levels of description of a cyberattack campaign, derived from military terminology. In this work, we select the 5 most popular techniques and 1 tactic from ATT&CK as our TTP classification targets, as shown in Table 1.
Tactics represent the multiphase objectives of an attack campaign, such as initial access, persistence, and privilege escalation. Techniques represent the methods used to accomplish a stage objective, such as using phishing or drive-by compromise to enter the victim's network. Procedures represent the specific implementation instances of a technique.

TTP intelligence
Previous TTP intelligence mining works only output static TTP names as results. These results lack description details about the TTPs and are difficult to use for defense. We define TTP intelligence as the detailed description and elements of TTPs in unstructured threat data.

Threat context
In this paper, the threat context of TTP intelligence is defined as two parts: TTP description and TTP elements.

TTP description
Rather than treating a whole paragraph as a TTP description, as in the work of Li et al. (2019), we focus on TTPs at the sentence level to achieve more accurate classification. We define three continuous sentences as a TTP description.

TTP element
As shown in Fig. 1, the TTP description contains a number of terms that are closely related to a particular TTP: IPs, domain names, URLs, CVE IDs, and security terms. We refer to these terms as elements. In this paper, we define 12 types of TTP elements: IPv4, domain, email, filename, URL, file hash, file path, regkey, CVE, encoding & encryption algorithm, communication protocol, and data object keyword (e.g., clipboard, screen, snapshot, keylogging, password, outlook, etc.), as shown in Table 2. For example, attackers exploit vulnerabilities such as CVE-2017-0199 and CVE-2014-6352 to implement phishing attacks, and APT groups frequently use this method to gain initial access to victims' networks. This is an example of a TTP description that describes the procedures of two different APT groups; it belongs to the phishing TTP category, and CVE-2017-0199 and CVE-2014-6352 are the TTP elements of the two procedure examples. Based on the TTP elements in the intelligence, defenders can perform different attack simulations and generate threat detection rules to improve the effectiveness of the defense.

TTP classification definition
In this paper, we define the TTP classification problem as a text classification task. Given a sentence S_n in an analysis report, we first obtain its context C_n = {S_{n−1}, S_n, S_{n+1}} and the sentence embedding CE_n using Sentence-BERT (Reimers and Gurevych 2019). We then use regular expressions and a gazetteer to extract the TTP elements in the context C_n and use the number of occurrences of each TTP element type in C_n as the TTP element features. The element feature vector is Elms_n = {Elm_1, Elm_2, ..., Elm_k, ..., Elm_m}, where Elm_k is the number of occurrences of the k-th TTP element type in the context C_n. The length of Elms_n is 12 because we define 12 element types in this work. Our proposed TCENet takes the description context embedding CE_n and the TTP element vector Elms_n as input and classifies the TTP type TTP_i of sentence S_n, as denoted in Eq. 1:

TTP_i = TCENet(CE_n, Elms_n) (1)
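As a minimal sketch of how the element vector Elms_n could be built from a 3-sentence context, the snippet below counts regex matches per element type. The element ordering and the patterns are illustrative assumptions, far simpler than Cyobstract and the paper's gazetteer:

```python
import re

# Illustrative element-type order; the paper defines 12 types, but this
# exact vector ordering is an assumption.
ELEMENT_TYPES = ["ip", "fqdn", "email", "filename", "url", "hash",
                 "file_path", "regkey", "cve", "code_method", "protocol",
                 "data_object"]

# Toy regex patterns for a few element types (far less complete than
# Cyobstract or the paper's gazetteer).
PATTERNS = {
    "ip": re.compile(r"\b(?:\d{1,3}(?:\[\.\]|\.)){3}\d{1,3}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "hash": re.compile(r"\b[a-fA-F0-9]{32,64}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,7}\b"),
}

def element_features(context):
    """Build Elms_n: per-type occurrence counts over a 3-sentence context."""
    return [len(PATTERNS[t].findall(context)) if t in PATTERNS else 0
            for t in ELEMENT_TYPES]

ctx = ("The actor sent spear-phishing mails from evil@example.com. "
       "The attachment exploits CVE-2017-0199. "
       "A second lure abuses CVE-2014-6352.")
print(element_features(ctx))  # [0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0]
```

The resulting 12-dimensional count vector is what the element feature path of TCENet consumes.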

TTP intelligence mining
TTP intelligence mining is the process of finding TTP-related description texts in security analysis reports and organizing them into a shareable intelligence format (e.g., STIX 2.1). In this paper, we propose a threat context-enhanced TTP intelligence mining framework, TIM, that finds and classifies TTP descriptions in security analysis reports using the TCENet proposed in this paper. The TIM framework then organizes the TTP descriptions and TTP elements into shareable intelligence in STIX 2.1 format as well as Sigma detection rules. Algorithm 1 describes the TTP intelligence mining process for one cybersecurity analysis report.

Data source
A major contribution of our work is to annotate the first sentence-level TTP dataset. We build our dataset by crawling the security analysis reports from security vendor websites, including: Malwarebytes (Malwarebytes 2021), Securelist (Securelist 2021), Welivesecurity (ESET 2021), Trendmicro (Trendmicro 2021), and Threatpost (Threatpost 2021). We crawl security analysis reports using the category tags of the reports (e.g., malware, analysis, apt, etc.), so ads and other non-security reports can be removed directly. We finally acquired 10761 security analysis reports.
The statistics of the reports used by previous document-level TTP mining methods (Ayoade et al. 2018; Legoy 2019; Li et al. 2019) and of our annotated dataset are shown in Table 3.
Our sentence-level dataset is larger than those of Legoy (2019) and Li et al. (2019). While the 17600 reports of Ayoade et al. (2018) come from the single source Symantec, ours come from five different vendors with a more balanced distribution. Therefore, our dataset is more general.
The distribution of reports in our dataset according to different vendors is shown in Table 4.

Annotation
Our annotation work is done by three threat intelligence researchers.
Annotators first learn the specific concepts of the 6 ATT&CK TTPs used in this paper. Then, they read the 10761 collected security analysis reports. We split each report into sentences, and the annotators manually extract the TTP descriptions (three continuous sentences) from the report and save them in the file of the corresponding TTP.
From these reports, we annotated 6061 TTP descriptions and used regular expressions with a gazetteer to obtain the TTP elements. The annotation count for each TTP category is shown in Table 5. The annotation results were revised by three other cybersecurity researchers.

Dataset validation
As mentioned above, our dataset has been annotated by threat intelligence researchers and revised by cybersecurity researchers with domain expertise. To objectively validate the dataset, we additionally perform a keyword-matching check. To obtain objective and accurate TTP keywords, we use the TTP procedure description instances from the ATT&CK website as the corpus. After removing stop words, we calculate the TF-IDF score of the terms in the TTP description instances and select the top ten scoring words. We use these keywords to query the dataset and calculate how many TTP descriptions are matched. The matching result shows that keyword matching reaches an average of 0.925 on the positive samples. This indicates that the vast majority of our labeled positive samples are consistent with the keywords found in the TTP descriptions on the ATT&CK website, i.e., that our labeled dataset is valid and can be used for model training.
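The validation step above can be sketched end to end with a hand-rolled TF-IDF (the toy corpus, stop-word list, and smoothing are illustrative assumptions, standing in for the ATT&CK procedure descriptions):

```python
import math
import re
from collections import Counter

STOP = {"the", "a", "an", "to", "of", "and", "is", "on", "in", "over", "with"}

def tokenize(text):
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP]

def top_tfidf_terms(target_doc, corpus, k=10):
    """Rank target_doc terms by TF-IDF against the corpus, keep the top k."""
    doc_sets = [set(tokenize(d)) for d in corpus]
    tokens = tokenize(target_doc)
    tf = Counter(tokens)
    scores = {}
    for term, count in tf.items():
        df = sum(term in d for d in doc_sets)
        idf = math.log((len(corpus) + 1) / (df + 1)) + 1  # smoothed IDF
        scores[term] = (count / len(tokens)) * idf
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:k]]

def keyword_match_ratio(keywords, descriptions):
    """Fraction of labeled descriptions that mention at least one keyword."""
    hits = sum(any(kw in tokenize(d) for kw in keywords) for d in descriptions)
    return hits / len(descriptions)

# Toy stand-ins for ATT&CK procedure descriptions and labeled samples
corpus = ["phishing email malicious attachment",
          "scheduled task persistence windows",
          "c2 traffic http protocol"]
keywords = top_tfidf_terms(corpus[0], corpus, k=5)
descs = ["the actor sent a phishing email to the victim",
         "unrelated text about weather patterns"]
print(keywords, keyword_match_ratio(keywords, descs))
```

A ratio near the paper's 0.925 on real positive samples would indicate agreement between the annotations and the ATT&CK keywords.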

TTP correlation
The TTP description example in Fig. 1 shows that TTP elements appear in the TTP threat context and correlate with specific TTPs.
Therefore, we designed a TTP element correlation calculation method based on our dataset to evaluate the correlation between TTP elements and TTP categories.
We use Cyobstract (cmu-sei 2021) and our TTP element gazetteer to extract TTP elements of each TTP description in our TTP dataset.
We first calculate the support coefficients of each TTP element category at different positions in the TTP description. This coefficient represents the distribution of TTP element categories in our dataset, as shown in Eq. 2, where i denotes the i-th type of TTP element and p denotes the TTP element's position in the TTP description. If an element is described in the middle sentence, it is a direct element, represented as d in Eq. 2; otherwise, it is a context element, represented as c. Elms^i_p denotes the i-th type of elements described at position p. We use a logarithmic fraction to measure the support coefficient of Elms^i at position p. Elms^i_sup denotes the two position-specific support coefficients of Elms^i.
For each element in a specific TTP description, we select three verbs closest to the target element and then compare these verbs to the verbs in the corresponding ATT&CK TTP description page with BERT (Devlin et al. 2018) embedding. We select ATT&CK description verbs by using TF-IDF. Then we calculate the cosine similarity between the candidate verb and the TTP description page verb to find the most similar verb in a TTP description. SimV denotes the max cosine similarity, and the Vmax denotes the most similar verb.
We also take into account the distance between the verb Vmax and the element, denoted dist(Vmax, Elms). Thus, the text relevance of Elms^i over its textual and lexical features is computed from Eq. 3, where j denotes the j-th instance of the TTP description. With the data distribution, textual, and lexical features, we calculate the average correlation coefficient of the i-th element type with a specific TTP using Eq. 4, where |Elms^i| is the number of elements of the i-th type.
We then normalize each TTP correlation coefficient score and show the coefficient by using a heat map in Fig. 2. The normalized coefficient score takes values from 0 to 1. Scores close to 1 indicate strong relevance, and scores closer to 0 indicate weak relevance.
The result in Fig. 2 shows a strong correlation between some elements and particular TTPs. The code method, protocol, and data object elements are the most correlated elements for the Obfuscate, C2 Application Layer Protocol, and Collection TTPs, respectively. These elements provide details for the TTP instance, so they frequently appear in the specific context. The filename element is the most correlated element for the Scheduled Task/Job TTP because attackers use scripts (e.g., .bat or .vbs) and malicious files to create a scheduled task, or create a scheduled task to execute other malicious files and scripts for a further attack. It is worth noting that the email element obtains a low correlation score because only a few reports disclose the attacker's email address or the victim's address. The URL element also obtains a low correlation value because only a few phishing description instances in our dataset mention a specific URL.
Many phishing description texts only mention that the attacker performed the attack by using URLs, without giving URL details, or the phishing URLs are mentioned outside the context window, so they are not considered TTP elements.
The regkey element encounters the same problem that it may be described outside the context window, so there are only a few instances in our dataset. This is the limitation of our work, and we will discuss how to solve it in the future work section.

Threat context-enhanced TTP intelligence mining framework
We designed a threat context-enhanced TTP intelligence mining framework named TIM, as shown in Fig. 3. The TIM framework consists of five modules: crawling, preprocessing, feature embedding, TTP classification, and TTP intelligence generation.

Crawling and preprocessing Crawling
We first crawl 10761 security analysis reports from 5 data sources using category tags such as malware, threat analysis, etc., to filter security-related articles.

Preprocessing
We use BeautifulSoup (Richardson 2021) to remove all HTML tags and consecutive line breaks. As defined above, a TTP description contains three continuous sentences, so we split each article with a sliding window of size 3 over sentences obtained with the NLTK tools in Python. Next, we extract the TTP elements from each TTP description. We use Cyobstract (cmu-sei 2021) to extract TTP elements of the IOC type. The default output of Cyobstract is normalized-format IOCs, such as 192.168.1.1; thus, we modify the output function of Cyobstract to return the original (defanged) IOC elements found in the report, such as 192[.]168[.]1[.]1. We also construct a TTP element gazetteer, as shown in Table 6, to match non-IOC elements such as protocol names and encryption algorithms.
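The sliding-window split and the defanged-IOC handling can be sketched as follows (the sentences and the refang rules are illustrative; real reports use many more defanging conventions):

```python
def sliding_contexts(sentences, size=3):
    """All windows of `size` continuous sentences (the TTP-description unit)."""
    return [sentences[i:i + size] for i in range(len(sentences) - size + 1)]

DEFANG = [("[.]", "."), ("hxxp", "http"), ("hXXp", "http")]

def refang(text):
    """Normalize defanged IOCs (e.g., 192[.]168[.]1[.]1) for matching, while
    the original defanged string can be kept for the generated intelligence."""
    for bad, good in DEFANG:
        text = text.replace(bad, good)
    return text

sents = ["The dropper beacons to 192[.]168[.]1[.]1.",
         "It downloads a second stage.",
         "Persistence is achieved via a scheduled task.",
         "Finally, data is exfiltrated over HTTPS."]
windows = sliding_contexts(sents)
print(len(windows))                 # 2 windows of 3 sentences
print(refang("192[.]168[.]1[.]1"))  # 192.168.1.1
```

Each window is one candidate TTP description that TCENet classifies.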
We then replace all found TTP elements with the element placeholder $[Elms. ] to avoid unexpected tokens when tokenizing the whole TTP description. We restore these elements after TTP classification and use them to generate TTP intelligence.

TCENet architecture

Figure 4 shows the architecture of our TCENet. Based on the TTP element correlation result in Fig. 2, the TCENet model consists of two paths: an element feature extraction path (upper) and a description feature extraction path (lower). A fully connected layer jointly learns the features extracted by these two paths for the final classification. We use binary relevance to train the TCENet model on the 6 types of TTP data.

Elements feature path
We use regex and a TTP elements gazetteer to extract 12 types of TTP elements in the TTP description. We construct the number of occurrences of each element type as a TTP element feature vector Elms.
Each type of element corresponds to one dimension of the vector. For example, if there are 2 hashes, 3 email addresses, and 1 CVE ID described in the spear-phishing TTP threat context, the element embedding vector Elms would be [0, 0, 3, 0, 0, 2, 0, 0, 1, 0, 0, 0]. Elements are organized in the following order: [ip, fqdn, email, filename, url, hash, file path, regkey, cve, code method, protocol, data object]. We then normalize the element vector. Inspired by the malware analysis work of Nataraj et al. (2011), which transforms a binary file into a 2D grayscale map, we resize the element vector to a 4×3 matrix and use two different CNN filters to extract the element features Elmf_n.
In the TTP element correlation section, we showed that some types of elements co-occur in a specific TTP description context. Transforming the TTP element vectors into 2D matrices therefore expresses the spatial relationship of co-occurring TTP elements in the matrix. We use a CNN to obtain the spatial features of TTP elements in the matrix, which cannot be obtained by a 1D TTP element vector and a fully connected layer. Elmf_n is computed from Eq. 5:

Elmf_n = σ(W_k · Elms_n + b) (5)

where σ is the ReLU activation function and W_k denotes the different filters.

Fig. 3 Threat context-enhanced TTP intelligence mining framework (TIM). The whole workflow starts with the crawling module. Via the preprocessing, feature embedding, TTP classification (TCENet), and intelligence & detection rule generation modules, we finally obtain TTP intelligence in STIX 2.1 format and Sigma detection rules, which we use for intelligence sharing and defense.

We compare the accuracy, recall, and F1 of feature extraction using a fully connected layer versus the CNN layer in the evaluation section. The result shows that the 2D element matrix with a CNN performs better than the 1D vector with a fully connected layer. The element correlation heat map (Fig. 2) also shows a co-occurrence relationship among TTP elements.
At the end of the element feature path, we use two different max-pooling layers to handle the feature maps from the two CNN filters. The max-pooling layer lowers the feature dimension and retains the main features. This path finally outputs a 4×1-dimensional vector Elmfp_n.
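The reshape-convolve-pool pipeline of the element path can be sketched in NumPy; the filter shapes here are assumptions (the paper only states that two different CNN filters are used):

```python
import numpy as np

def conv2d_valid(x, w, b=0.0):
    """Minimal 'valid' 2-D cross-correlation followed by ReLU."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return np.maximum(out, 0.0)  # ReLU

# 12-dim element vector reshaped to a 4x3 matrix, as in the paper
elms = np.array([0, 0, 3, 0, 0, 2, 0, 0, 1, 0, 0, 0], dtype=float)
mat = elms.reshape(4, 3)

# Two hypothetical filters of different sizes
f1 = np.ones((2, 2))
f2 = np.ones((3, 3))
feat1 = conv2d_valid(mat, f1)  # feature map of shape (3, 2)
feat2 = conv2d_valid(mat, f2)  # feature map of shape (2, 1)

# Max pooling over each feature map keeps the strongest response
pooled = np.array([feat1.max(), feat2.max()])
print(feat1.shape, feat2.shape, pooled)
```

The 2-D windows let each filter respond to neighboring (i.e., co-occurring) element counts, which a flat 12-vector into a dense layer cannot express spatially.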
Description feature embedding

We use Sentence-BERT (Reimers and Gurevych 2019) to embed the description text into three 768-dimensional vectors CE_n. These vectors capture the word features inside each sentence, and with the Sentence-BERT mean-pooling embedding, the embedding vector can represent the sentence and be used for downstream tasks.
Then, we feed the sentence embeddings into a stacked Bi-LSTM. The output of the stacked Bi-LSTM layers is computed from Eq. 6:

→h_j = LSTM(x_j, →h_{j−1}), ←h_j = LSTM(x_j, ←h_{j+1}), h_j = [→h_j, ←h_j] (6)

where →h_j and ←h_j represent the j-th states produced by the LSTM in the two directions, x_j is the j-th input vector, h_j is the j-th output state, and [·, ·] represents the concatenation operation. After that, the attention mechanism (Shen and Lee 2016) outputs the weighted sum of the Bi-LSTM output sequence H = [h_1, h_2, ..., h_n], computed from Eq. 7:

α = softmax(w⊤ tanh(H)), Z = Σ_j α_j h_j (7)

where Z is the TTP description representation. Next, we use a fully connected layer to lower the dimension of the attention layer output. The final output of the description feature path is a 128-dimensional vector Z_l.
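The attention pooling step can be sketched in NumPy with a generic scoring form (the scoring vector w is a hypothetical learned parameter; the paper's exact attention variant follows Shen and Lee (2016)):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, w):
    """Generic attention over Bi-LSTM states: score each state, softmax the
    scores into weights alpha, and return the weighted sum Z."""
    scores = np.tanh(H) @ w  # one scalar score per time step
    alpha = softmax(scores)  # attention weights, sum to 1
    return alpha @ H         # Z: weighted sum of hidden states

H = np.array([[1.0, 0.0],   # 3 time steps, hidden size 2
              [0.0, 1.0],
              [1.0, 1.0]])
w = np.array([1.0, 1.0])    # hypothetical learned scoring vector
Z = attention_pool(H, w)
print(Z.shape)              # (2,)
```

Because the weights sum to 1, Z is a convex combination of the Bi-LSTM states, emphasizing the sentences most indicative of the TTP.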

TTPs classification
After TTP element feature embedding and TTP description feature embedding, we obtain the element features Elmfp_n and the textual features Z_l. At the end of our TCENet architecture, we concatenate these two feature vectors into a 132-dimensional vector Z_c and use a fully connected layer to output the final vector Z_f. We use the position of the maximum value in the final two-dimensional vector as the class label (Eq. 8): if the maximum value appears in the first dimension, the prediction is negative; otherwise, it is positive:

pred = argmax(softmax(Z_f)) (8)

We use cross-entropy as the loss function and binary relevance to train six different TTP classification models. The loss L is computed from Eq. 9:

L = −(α · y · log(p) + β · (1 − y) · log(1 − p)) (9)

where y is the true label of the TTP description and p is the predicted positive-class probability output by TCENet. α and β are cross-entropy weights used to balance the positive and negative training samples. We minimize the loss L to train TCENet. Algorithm 3 summarizes the training process.

Fig. 4 The TCENet architecture. The upper path uses a CNN and max pooling to extract element features, and the lower path uses a stacked Bi-LSTM with attention to extract text features. A fully connected layer obtains the final feature vector after concatenating the results of the two paths.

TTP intelligence generation
Based on our proposed TIM framework, we organize the TTP descriptions and TTP elements into Sigma (MSig-maHQ 2021) attack detection rules and shareable intelligence in STIX 2.1 format, as shown in Fig. 5.
Sigma is a generic and open signature format that allows defenders to describe cyber-attack log events. Sigma rules can be used to transform TTPs into search criteria for system logs and SIEM alert events, as well as detection rules for defensive devices such as firewalls to detect threats in the system. Sigma rules can also be used for direct sharing, such as in the MISP intelligence community.
STIX 2.1 (OASIS 2021) is a language and a serialization format used to exchange cyber threat intelligence (CTI). Defenders can also use STIX 2.1 TTP intelligence for penetration testing to simulate attack methods and optimize protection strategies.
As shown in Fig. 5, we organize the TTP description and TTP element information obtained from TCENet into STIX 2.1 intelligence and Sigma rules for querying relevant threats in the log data of multiple protection devices. Using TTP intelligence, the defender can also better grasp the long-period and more essential attack characteristics of the attacker. At the same time, we share the TTP intelligence and Sigma rules in the intelligence community so that defenders can defend against threats in a more timely and effective manner. Examples of TTP intelligence in STIX 2.1 format and Sigma detection rules can be found in our anonymous GitHub repository (TCENet 2021).
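Packaging a classified TTP description as a STIX 2.1 attack-pattern object can be sketched as below. This is an illustrative rendering only: the field values and the helper name are hypothetical, and the paper's actual STIX output (see the repository examples) may include additional objects and properties.

```python
import json
import uuid

def ttp_to_stix(ttp_name, attack_id, description):
    """Build a minimal STIX 2.1 attack-pattern SDO (as a JSON string)
    linking a classified TTP description to its ATT&CK technique ID."""
    obj = {
        "type": "attack-pattern",
        "spec_version": "2.1",
        "id": f"attack-pattern--{uuid.uuid4()}",
        "name": ttp_name,
        "description": description,
        "external_references": [{
            "source_name": "mitre-attack",
            "external_id": attack_id,
        }],
    }
    return json.dumps(obj, indent=2)
```

The `external_references` entry is what lets consumers of the shared intelligence map the object back to the standard ATT&CK technique.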

Evaluation
In this section, we evaluate the proposed TCENet using our labeled dataset.

Metrics
We evaluate the precision, recall, and F1 metrics of the TCENet and other models in comparison experiments and ablation experiments.
TP (True Positives) and TN (True Negatives) denote correctly classified data, while FP (False Positives) and FN (False Negatives) denote misclassified data.
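These metrics follow directly from the confusion counts; a minimal sketch:

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts.
    tp/fp/fn: true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```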

Evaluation data
Since the model in this paper uses a binary-relevance method, we construct a negative sample set for each TTP category.

Negative samples
The negative samples consist of the non-TTP descriptions, which are also annotated by annotators, and the TTPs of the other categories. For model training, we use non-TTP descriptions equal in number to the positive samples, together with the other categories' TTP descriptions, as negative samples. The negative sample composition is shown in Eq. 14:

$$Nsam_j = NonTTP + \sum_{i=1,\, i \neq j}^{m} Other_i^{TTP} \qquad (14)$$

where $Nsam_j$ denotes the number of negative samples of the j-th type of TTP; $NonTTP$ denotes the number of non-TTP descriptions, which equals the number of positive samples of the j-th type; and $Other_i^{TTP}$ denotes the number of positive samples of the i-th TTP category, where $i \neq j$. $m$ denotes all 6 TTP categories. Since we do not use positive samples of the j-th type of TTP as its own negative samples, only positive samples of the other $m-1$ TTP categories are used as negatives. Table 7 shows the number of positive and negative samples for the six types of TTP.
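Eq. 14 amounts to the following count (a sketch; `positives` maps each TTP category to its positive-sample count):

```python
def negative_sample_count(j, positives):
    """Negative samples for category j per Eq. 14: non-TTP descriptions
    equal in number to j's positives, plus the positive samples of the
    other m-1 categories."""
    non_ttp = positives[j]  # matched to j's positive-sample count
    others = sum(n for cat, n in positives.items() if cat != j)
    return non_ttp + others
```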

Dataset validation result
In the dataset section, we propose using the TTP keyword matching method to validate our dataset. Table 8 shows the matching rate of TTP keywords in both positive and negative samples. Table 8 also shows the accuracy of classifying TTP descriptions directly by TTP keywords.
The results show that the average matching rate of TTP keywords in the positive sample is 92.5%. This indicates that the vast majority of our labeled positive samples are consistent with the keywords mentioned in the TTP descriptions of the ATT&CK website. This means that our dataset is valid and can be used for model training.
However, TTP keywords cannot be used directly to classify TTP descriptions. The matching results show that TTP keywords also match many negative samples. Meanwhile, due to the limited nature of keyword enumeration, not all TTP description samples can be covered by keywords. Moreover, classifying TTP descriptions directly with TTP keywords would introduce 28.3% false positives. Therefore, training a deep learning model for TTP classification can address the limitations of keyword enumeration and also identify the false positive samples that the keyword matching approach easily confuses. Our subsequent experiments show that TCENet achieves an accuracy of 0.94 on the 6 TTP classifications, much higher than the accuracy of keyword-based TTP classification (0.82).
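The keyword-based baseline evaluated here is essentially substring matching, which can be sketched as follows (an illustration; the paper's actual keyword lists are drawn from the ATT&CK technique pages):

```python
def keyword_classify(description, keywords):
    """Naive keyword matcher: flag a description as a TTP candidate
    if any keyword occurs in it. Prone to the false positives and
    coverage gaps discussed above."""
    text = description.lower()
    return any(kw.lower() in text for kw in keywords)
```

Because a match only proves lexical overlap, not attacker behavior, a sentence such as a product description mentioning "scheduled task" would be flagged, which is exactly the false-positive class the trained model filters out.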

Baseline model
The models we chose for comparison can be divided into four categories: document-level methods from previous work, machine learning methods based on static word embeddings, deep learning methods based on static word embeddings, and deep learning methods based on pre-trained models. Ayoade et al. (2018) and Legoy (2019) both use TF-IDF with an SVM classifier to classify TTPs at the document level. Li et al. (2019) leverage latent semantic indexing to compare the targeting analysis articles with ATT&CK description articles and use SVM with cosine similarity to classify TTPs. Machine learning methods based on static word embeddings include: Doc2Vec (Le and Mikolov 2014) with Linear SVC, Doc2Vec with Decision Tree (DT), and Doc2Vec with Random Forest (RF). Deep learning methods based on static word embeddings include: FastText (Joulin et al. 2016), TextCNN (Rakhlin 2016) with GloVe word embeddings (Pennington et al. 2014), and Bi-LSTM + Attention with GloVe word embeddings. Methods based on pre-trained models include BERT-CLS and our proposed TCENet.

Train settings
We grid search for the best performance hyperparameter of our TCENet and other baseline models. Table 9 shows the results of our experiments on the hidden layer size and layer number of the Bi-LSTM network.
Based on the experimental results, we finally used a 3-layer Bi-LSTM with a hidden layer size of 200. The other hyperparameters are shown in Table 10. For the cross-entropy weights α and β in Eq. 9, we use the inverse ratio of positive and negative samples as the weights to train the model.
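The inverse-ratio weighting can be sketched as below (a normalized variant for illustration; the paper does not spell out the exact normalization, only that the weights follow the inverse of the class ratio):

```python
def inverse_ratio_weights(n_pos, n_neg):
    """Cross-entropy weights from the inverse class ratio: the rarer
    class receives the larger weight, so alpha/beta == n_neg/n_pos."""
    total = n_pos + n_neg
    alpha = n_neg / total  # weight on the positive term
    beta = n_pos / total   # weight on the negative term
    return alpha, beta
```

With 1 positive for every 3 negatives, positives are weighted three times as heavily, offsetting their scarcity in the loss.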

Overall results
We evaluate the overall accuracy of our TCENet on all six TTPs. We divide each TTP-labeled dataset into training, validation, and testing sets at a 7:1:2 ratio. We train each model for 80 epochs. Table 11 shows the overall accuracy on the six TTPs using TCENet. The Phishing classification model achieves the best performance because it has the largest dataset (2599 positive samples). The accuracies of Obfuscated Files or Information and Deobfuscate/Decode Files or Information are 0.92 and 0.916, respectively, because they have smaller annotated datasets (439 and 475 samples, respectively). Table 12 shows the precision, recall, and F1 of TCENet compared to the three previous methods and six baseline models. The comparison evaluation is performed on the Phishing TTP data.
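The 7:1:2 split can be sketched as follows (illustrative only; the random seed and shuffling strategy are assumptions, not stated by the paper):

```python
import random

def split_712(samples, seed=0):
    """Shuffle and split a dataset into train/val/test at a 7:1:2 ratio."""
    data = list(samples)
    random.Random(seed).shuffle(data)  # deterministic shuffle for the sketch
    n = len(data)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])
```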

Comparison evaluation
The results show that our TCENet method achieves the best performance on all three metrics, and sentence-level methods are clearly more accurate than document-level methods. The results also indicate that methods based on pre-trained language models perform better than static word embedding methods such as GloVe and FastText. Language models such as BERT and its variant Sentence-BERT consider the context features of a word and generate dynamic word embeddings, in contrast to static word embedding methods built on co-occurrence matrices. The three Doc2Vec-based baseline models achieve similar results; among them, the Random Forest (RF) classifier performs better than Linear SVC and Decision Tree (DT).
FastText considers the n-gram features of words and achieves the best precision among the three static word embedding models. BiLSTM+Attention considers the temporal features of text, uses the attention mechanism to weight the context, and achieves the best recall among the three static word embedding models. TextCNN uses multiple convolution kernels to capture the spatial features of the text and achieves the best F1 score among the three static word embedding models.
Our TCENet and the mainstream BERT-CLS model performed better than the above baseline. TCENet outperforms BERT by 3-4% on three metrics. TCENet uses a pretrained model, considers the differences between contextual sentences, and assigns weights to contexts using bidirectional LSTM and attention. Additionally, TCENet uses TTP element features to enhance the classification effect.
We then conducted ablation experiments to explore the effects of text features and TTP element features on the final classification results.

Ablation experiment
To demonstrate the effectiveness of each component of TCENet, we perform an ablation experiment. We evaluate the classification accuracy results using only TTP element features, only text features, and TCENet variants, as shown in Table 13. For TCENet variants, we change or remove different parts of TCENet to prove the validity of each part, e.g., using different neural networks to extract the text or the TTP element features, or not using TTP elements.
We first evaluate the TTP classification performance of the TCENet model without considering text features using only TTP element features and CNN. We denote this model as Only TTPs Elms. in Table 13.
The TCENet w/o Elms with CNN model in Table 13 uses Sentence-BERT for text embedding and CNN to extract context textual features, without any element features. The TCENet w/o Elms with BiLSTM model uses BiLSTM to extract context textual features, without element features. The TCENet with FC_E (TCENet with a fully connected layer for element features) uses contextual text features and TTP element features for TTP classification, extracting the TTP element features with an FC layer. The TCENet with FC_C (TCENet with a fully connected layer for context features) instead extracts the context features with an FC layer. In contrast, inspired by Nataraj et al. (2011), who transformed binary files into matrices, TCENet transforms the 1D TTP element vectors into 2D TTP element matrices and uses CNN to implicitly extract the spatial features of TTP element co-occurrences in the matrices.
The results show that both context description features and TTP element features improve TTP classification performance. The results also show that the model cannot classify effectively when using only TTP element features. Without TTP element features, the two TCENet variants (the w/o Elms. models in Table 13) drop 3-4% compared with the full TCENet. The TCENet with FC_E and the TCENet with FC_C leverage FC layers to extract TTP element features and context features, respectively; these two TCENet variants capture the text and TTP element features less well with FC than TCENet does.
Therefore, TCENet obtains the best evaluation results using CNN to extract elemental features and BiLSTM to extract contextual features.

Few-shot evaluation
Some ATT&CK TTPs may have only a small amount of description text. Therefore, we performed a few-shot evaluation on the Obfuscated Files or Information TTP dataset, which has the least data.
In this experiment, we divided the positive sample data into training and test data at a ratio of 8:2. We then progressively reduced the positive training samples (from 350 down to 50) to evaluate the performance of the different models with few-shot training data. The results are shown in Fig. 6.
As the training set shrinks from 350 to 50 samples, the performance of Doc2Vec+RF and FastText drops sharply at 200 samples. TextCNN also drops sharply at 50 samples, reaching an accuracy of only 0.638.
In this experiment, the results of BERT-CLS and TCENet w/o Elms are similar. Without element features, the performance of TCENet w/o Elms is also affected when the number of training samples drops to 100. Our full TCENet achieves the most stable performance on small training sets, maintaining an accuracy of 0.857 even when the training set shrinks to 50 samples.
The results demonstrate that our TCENet can be generalized to most TTP classification tasks, even in the few-shot training data case.

Annotation cost reduction
TTP data annotation requires expert knowledge and is time-consuming. Moreover, no sentence-level TTP description dataset was previously available, which has hindered research on TTP classification; our dataset is therefore necessary and valuable. We annotated 6061 TTP descriptions in 6 TTP categories from a total of 10761 security articles. Based on our annotation experience and the results of our proposed method, we believe the annotation cost can be reduced, and the approach of this paper extended to other TTP annotation tasks, in two ways.
To reduce the time cost of TTP annotation, the annotation process can use the aforementioned TTP keyword matching method to prioritize sentences containing TTP keywords in the security analysis reports. The annotator then only needs to confirm whether each matched description is a false positive, and the false positives can in turn be used as negative samples for model training. As a result, the annotator does not need to read the full analysis report to obtain the TTP description data.
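This keyword-prioritized annotation workflow can be sketched as below (an illustration of the described process, not tooling from the paper):

```python
def prioritize_sentences(report_sentences, ttp_keywords):
    """Surface only sentences containing a TTP keyword so annotators
    confirm or reject matches instead of reading whole reports.
    Returns (candidate sentences, remaining sentences)."""
    hits, rest = [], []
    for sentence in report_sentences:
        text = sentence.lower()
        if any(kw.lower() in text for kw in ttp_keywords):
            hits.append(sentence)
        else:
            rest.append(sentence)
    return hits, rest  # rejected hits can seed the negative-sample pool
```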
The few-shot evaluation experiment (shown in Fig. 6) shows that our TCENet achieves an accuracy of 0.857 even for 50 training data samples and 0.93 for 350 training data samples. Therefore, we believe that the absolute number of data annotations can be reduced when the TCENet model is extended to other TTP classification tasks.

Limitations
In this paper, we use a sliding window of size 3 to obtain each TTP description, and the annotators keep only these three sentences in the final dataset when annotating the data. However, we find that some TTP elements may fall outside the sliding window, so some elements in the TTP element association heat map show weak associations with their TTPs, such as Phishing-URL and Scheduled Task/Job-Regkey. TTP elements and TTPs that should in theory be strongly correlated may also show weak correlations in the heat map (Fig. 2) due to insufficient data. In future work, we will retain longer contextual information to introduce more TTP element features into TTP classification.

Conclusions
In this work, we propose a threat context-enhanced TTP intelligence mining framework named TIM to mine TTP intelligence from unstructured threat data. The framework uses TCENet to classify sentences in security analysis reports into TTP intelligence using threat context features consisting of TTP descriptions and TTP elements. TCENet achieves an average classification accuracy of 0.94 on 6 types of TTP data and outperforms previous document-level methods and mainstream text classification methods. The TTP element features improve overall performance by 2-3%. TCENet maintains considerable performance (0.857) even with few-shot training samples, which means our proposed method can generalize to classifying most ATT&CK TTPs with only a small amount of training data.
Our TIM framework finally organizes the TTP descriptions and TTP elements into the STIX 2.1 intelligence format and Sigma attack detection rules. TTP intelligence and Sigma detection rules can be used for attack simulation and threat detection, greatly benefiting security defenders in enterprise security operations centers.
In the future, we will mine the relationships between TTPs and their elements across the whole document to address the limitations of this work. We will also expand our dataset and apply our proposed TCENet to all ATT&CK TTPs. Combining TTP intelligence with other cybersecurity entities, we will build a cyber threat knowledge graph to analyze APT attack campaigns in a broader threat context.