
Table 3 Evaluation metrics and their descriptions (a computation sketch follows the table)

From: Enhanced detection of obfuscated malware in memory dumps: a machine learning approach for advanced cybersecurity

Evaluation metrics

Description

Accuracy (ACC)

Accuracy, defined as the ratio of correctly predicted observations to the total observations, is calculated as \(Accuracy = \frac{True\,Positives\,(TP) + True\,Negatives\,(TN)}{Total\,Observations}\), providing a fundamental measure of the model’s overall predictive correctness

Precision (PR)

Precision, a key metric in model evaluation, is quantified as \(Precision = \frac{True\,Positives\,(TP)}{True\,Positives\,(TP) + False\,Positives\,(FP)}\), reflecting the proportion of true positive predictions among all positive predictions made by the model

Recall (RE)

Recall, an essential metric in classification models, is determined by the formula \(Recall = \frac{True\,Positives\,(TP)}{True\,Positives\,(TP) + False\,Negatives\,(FN)}\), capturing the model’s ability to correctly identify all relevant instances within a dataset

F1-score (FS)

F1-score, a crucial metric that balances precision and recall, is calculated using the formula \(F1\text{-}score = 2 \times \frac{Precision \times Recall}{Precision + Recall}\), thereby providing a harmonic mean that encapsulates the model’s accuracy in classifying data points correctly

False positive rate (FPR)

FPR, a critical metric in assessing classification errors, is computed as \(FPR = \frac{False\,Positives\,(FP)}{False\,Positives\,(FP) + True\,Negatives\,(TN)}\), quantifying the proportion of negative instances incorrectly classified as positive by the model

Error rate (ER)

Error rate, a metric that quantifies the overall prediction inaccuracies of a model, is calculated as \(Error\,Rate = \frac{False\,Positives\,(FP) + False\,Negatives\,(FN)}{Total\,Observations}\), effectively measuring the proportion of all predictions that the model got wrong

AUC score (AUC)

The Area Under the Curve (AUC) score, a measure of the model’s ability to distinguish between classes, is computed as the area under the curve obtained by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings, with the AUC value ranging from 0 to 1, where a higher value indicates better classification performance

Cohen’s Kappa (KP)

Cohen’s Kappa, a statistical measure of inter-rater agreement for categorical items, is calculated as \(Kappa = \frac{P_{o} - P_{e}}{1 - P_{e}}\), where \(P_{o}\) represents the relative observed agreement among raters, and \(P_{e}\) is the hypothetical probability of chance agreement, providing a robust assessment of the model’s predictive accuracy beyond random chance
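The formulas in Table 3 map directly onto standard library calls. The following is a minimal sketch, not the authors' evaluation code, assuming a binary classification setting and scikit-learn; the arrays `y_true`, `y_pred`, and `y_score` are hypothetical placeholders used only for illustration.

```python
# Minimal sketch of the Table 3 metrics for a binary classifier (assumes scikit-learn).
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, cohen_kappa_score,
                             confusion_matrix)

# Hypothetical ground-truth labels, hard predictions, and predicted scores.
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred  = np.array([1, 0, 1, 0, 0, 1, 1, 0])
y_score = np.array([0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3])

# Confusion-matrix counts for the metrics not exposed directly by scikit-learn.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("Accuracy (ACC): ", accuracy_score(y_true, y_pred))        # (TP + TN) / total
print("Precision (PR): ", precision_score(y_true, y_pred))       # TP / (TP + FP)
print("Recall (RE):    ", recall_score(y_true, y_pred))          # TP / (TP + FN)
print("F1-score (FS):  ", f1_score(y_true, y_pred))              # 2*PR*RE / (PR + RE)
print("FPR:            ", fp / (fp + tn))                        # FP / (FP + TN)
print("Error rate (ER):", (fp + fn) / (tp + tn + fp + fn))       # (FP + FN) / total
print("AUC score:      ", roc_auc_score(y_true, y_score))        # area under the ROC curve
print("Cohen's Kappa:  ", cohen_kappa_score(y_true, y_pred))     # (Po - Pe) / (1 - Pe)
```

Note that AUC is the only metric above computed from the continuous scores rather than the hard predictions, since it sweeps the decision threshold when tracing the ROC curve.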