Exploring best-matched embedding model and classifier for charging-pile fault diagnosis

Wang, Wen; Wang, Jianhua; Peng, Xiaofeng; Yang, Ye; Xiao, Chun; Yang, Shuai; Wang, Mingcai; Wang, Lingfei; Li, Lin; Chang, Xiaolin

doi:10.1186/s42400-023-00138-z

Research
Open access
Published: 04 April 2023

Xiaolin Chang ORCID: orcid.org/0000-0002-2975-8857

Cybersecurity volume 6, Article number: 7 (2023)


Abstract

The continuous growth in the number of electric vehicles is driving the large-scale deployment of distributed charging piles. Guaranteeing the normal operation of charging piles is crucial, which makes diagnosing charging-pile faults important. Existing fault-diagnosis approaches are based on physical fault data such as mechanical log data and sensor data streams. However, other types of fault data exist that these approaches cannot use for diagnosis. This paper aims to fill this gap by considering 8 types of faults for diagnosis, namely physical installation error fault, charging-pile mechanical fault, charging-pile program fault, user personal fault, signal fault (offline), pile compatibility fault, charging platform fault, and other faults. By conducting experiments on a realistic dataset, we aim to find out how existing feature-extraction and machine learning techniques can best be combined for diagnosis. Four word embedding models are investigated for feature extraction of fault data: N-gram, GloVe, Word2vec, and BERT. We then classify the word embedding results using 10 machine learning classifiers: Random Forest (RF), Support Vector Machine, K-Nearest Neighbor, Multilayer Perceptron, Recurrent Neural Network, AdaBoost, Gradient Boosted Decision Tree, Decision Tree, Extra Tree, and VOTE. We also apply a paraphrasing-based data augmentation method, which improves classification accuracy by up to 10.40% compared with the original fault record dataset. Our extensive experimental results reveal that the RF classifier combined with the GloVe embedding model achieves the best accuracy with acceptable training time. In addition, we discuss the interpretability of RF and GloVe.

Introduction

Recently, with the acceleration of global warming, people have realized that unrestricted use of fossil energy is harmful to the earth. Electric vehicles (EVs), which are environmentally friendly and energy efficient, are considered a replacement for traditional fuel vehicles (Yan et al. 2019). As the number of EVs increases, distributed charging piles have become essential infrastructure (Chen et al. 2020). Generally, a large number of charging piles are located outdoors and exposed to uncontrollable environmental factors, which causes frequent charging-pile faults. Therefore, it is crucial to maintain the effectiveness of charging piles (Zhang et al. 2022; Wei et al. 2021).

Charging-pile service companies have put a series of measures in place to guarantee the effectiveness of charging piles. For example, they offer a service hotline and a WeChat (Hao et al. 1087) mini program through which customers who encounter problems can submit emergency work orders. We now explain why a service provider needs to predict charging-pile faults in order to improve the efficiency of its repair service. A charging-pile work order may be caused by a mechanical fault or by a cyber-security issue. Consider a mechanical-fault scenario: (a) a customer describes a charging-pile fault via the service hotline; (b) the staff receives the fault work order, records the fault description, and dispatches maintenance workers to repair the pile; (c) the maintenance workers complete the work order and submit the fault category to the service system. However, dispatching maintenance workers wastes human and material resources if the fault lies in the software platform or the online electric system. Moreover, from the cyber-security perspective, security analysis and protection mechanisms are needed to improve the communication security between EVs and charging piles (Li et al. 2021). These considerations underline the importance of predicting charging-pile faults.

Recently, machine learning (ML) and deep learning (DL)-based techniques have played a crucial role in charging-pile fault diagnosis (Shuai et al. 2022; Du et al. 2021) and anomaly detection (Li et al. 2021). In particular, Li et al. (2021) used a Random Forest (RF) classifier for anomaly detection. However, existing studies on charging-pile fault diagnosis focus on mechanical log data or sensor data streams (Gao et al. 2020, 2018; Wang et al. 2021; Yong and Ji 1650). In contrast, we concentrate on work-order fault descriptions recorded by staff, which differ from mechanical log data and sensor data streams, and classify them into 8 fault types: installation error fault, charging-pile mechanical fault, charging-pile program fault, user personal fault, signal fault (offline), pile compatibility fault, charging platform fault, and other faults.

Figure 1 presents a simplified workflow of our paper. We first collect raw data from real-world electric service work orders to build a fault record dataset. Then, we preprocess the data by using the Jieba (Junyi 2022) tokenizer to segment the Chinese fault descriptions into words. After that, we extract fault features from the fault descriptions by adopting widely used word embedding models, namely N-gram (Suen 1979), Word2vec (Mikolov et al. 2013), GloVe (Pennington et al. 2014), and BERT (Devlin et al. 2018). Finally, we use 10 ML or DL classifiers, namely RF (Breiman 2001), Support Vector Machine (SVM) (Cortes and Vapnik 1995), K-Nearest Neighbor (KNN) (Sebastiani 2002), Multilayer Perceptron (MLP) (Rumelhart et al. 1986), Recurrent Neural Network (RNN) (Elman 1990), AdaBoost (AB) (Freund and Schapire 1997), Gradient Boosted Decision Tree (GBDT) (Friedman 2001), Decision Tree (DT) (Breiman et al. 2017), Extra Tree (ET) (Geurts et al. 2006), and VOTE, to classify the word embedding features for fault diagnosis.

We summarize the following main contributions:

We create a dataset of realistic charging-pile faults. Specifically, we collect original, long-term, real-world electric service work orders from June to December 2021, and select the fault description and fault category fields to build a structured fault record dataset. The “Fault record dataset” section details how the dataset is built.
We carry out extensive experiments to find the best-matched combination of the 4 fault-description feature extraction models and the 10 classifiers for effective fault diagnosis. To the best of our knowledge, we are the first to diagnose all types of charging-pile faults using fault descriptions (“Experimental result and discussion” section).

The rest of the paper is organized as follows. The “Preliminary” section overviews word embedding approaches and classifiers. The “Fault record dataset” section describes the fault record dataset. Experimental results and discussion are provided in the “Experimental result and discussion” section. The “Conclusion” section concludes the paper.

Preliminary

Word embedding is a crucial feature extraction approach: the vectors of individual words can be aggregated into a cumulative sentence embedding on which ML classifiers operate. This section first introduces the 4 word embedding approaches investigated in this paper, namely TF-IDF N-gram, Word2vec, GloVe, and BERT. Then the 10 ML/DL classifiers are presented.

Word embedding approaches

Four word embedding approaches are discussed.

N-gram (Suen 1979)

N-gram is a well-known language feature extraction method. Owing to its strong performance in handling sequence information, N-gram has been widely and successfully used in text feature extraction and classification. N-gram uses a sliding window to divide a sequence into slices of n consecutive items. By computing the term frequency-inverse document frequency (TF-IDF) weights and one-hot encoding of these slices, we obtain a sequence embedding. In Fig. 2, the red box denotes the sliding window, whose size is 2, 3, or 4. In this way, each Chinese Word (CW) sentence in our corpus is mapped into a vector.
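To make the sliding-window and TF-IDF steps concrete, the following minimal Python sketch extracts 2-, 3- and 4-grams from tokenized sentences and weights them with the plain tf × log(N/df) variant of TF-IDF. The function names and the English tokens are hypothetical stand-ins for Jieba-segmented Chinese words; this is not the authors' code, and library implementations (e.g. scikit-learn's TF-IDF vectorizer) apply smoothing and normalization by default, so their weights differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Slide a window of size n over the token list and return the n-grams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_ngram_vectors(corpus, sizes=(2, 3, 4)):
    """Map each tokenized sentence to a sparse {n-gram: TF-IDF weight} dict.

    Uses the unsmoothed variant tf * log(N / df), where tf is the relative
    frequency of the n-gram in the sentence and df is the number of sentences
    containing it.
    """
    docs = [Counter(g for n in sizes for g in ngrams(sent, n)) for sent in corpus]
    n_docs = len(docs)
    df = Counter(g for doc in docs for g in doc)  # document frequency per n-gram
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({g: (c / total) * math.log(n_docs / df[g]) for g, c in doc.items()})
    return vectors


# Hypothetical tokenized fault descriptions.
corpus = [["charging", "pile", "offline"], ["charging", "pile", "cannot", "start"]]
vectors = tfidf_ngram_vectors(corpus)
```

An n-gram shared by every sentence, such as ("charging", "pile") here, receives weight 0 because it does not help distinguish one fault description from another.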

Word2vec (Mikolov et al. 2013)

Word2vec is a neural network-based algorithm for training word vectors. It has two architectures: the Continuous Bag-Of-Words (CBOW) model, which predicts the current word from its context, and the continuous skip-gram model, which predicts the surrounding words from the current word. CBOW is similar to the Feedforward Neural Net Language Model (Bengio et al. 2000), except that the non-linear hidden layer is removed and the projection layer is shared by all words. After training converges, words with similar meanings are mapped to nearby positions in the vector space (illustrated in Fig. 3).
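The two architectures differ mainly in how training examples are formed from a context window. The hypothetical helpers below (a sketch, not the reference implementation) build CBOW pairs (context words, centre word) and skip-gram pairs (centre word, one context word); the shallow network that is then trained on these pairs, and optimizations such as negative sampling, are omitted.

```python
def cbow_pairs(tokens, window=2):
    """CBOW examples: (list of context words, centre word to predict)."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens, window=2):
    """Skip-gram examples: (centre word, one context word to predict)."""
    pairs = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs
```

For example, with `window=1` the sentence `["a", "b", "c"]` yields the CBOW example `(["a", "c"], "b")` and the skip-gram examples `("b", "a")` and `("b", "c")` for the middle word.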

GloVe (Pennington et al. 2014)

GloVe (Global Vectors) is a word embedding model proposed in 2014. It combines the advantages of global matrix factorization and local context window methods, and efficiently leverages the statistical information of a large corpus. By training only on the non-zero elements of the word-word co-occurrence matrix, GloVe produces a vector space of fixed dimension with meaningful structure. Figure 4 shows the GloVe training flow. Our corpus is the input. We count the CW term frequencies and compute the co-occurrence matrix, then train GloVe with suitable hyper-parameters. Finally, we obtain word embeddings of a specific dimension.
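As an illustration of the counting step, the sketch below builds a symmetric co-occurrence matrix in which, following the GloVe paper, two words d positions apart contribute 1/d, and evaluates one term of the GloVe weighted least-squares objective, f(X_ij)(w_i·w̃_j + b_i + b̃_j − log X_ij)², using the paper's default x_max = 100 and α = 0.75. Function names are hypothetical, and the optimizer that actually fits the vectors over all non-zero entries (AdaGrad in the original paper) is not shown.

```python
import math
from collections import defaultdict


def cooccurrence(corpus, window=2):
    """Symmetric co-occurrence counts; a pair d words apart adds 1/d."""
    X = defaultdict(float)
    for tokens in corpus:
        for i, word in enumerate(tokens):
            for d in range(1, window + 1):
                if i + d >= len(tokens):
                    break
                X[(word, tokens[i + d])] += 1.0 / d
                X[(tokens[i + d], word)] += 1.0 / d
    return dict(X)  # only non-zero entries are stored


def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(x) that caps the influence of very frequent pairs."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_pair_loss(w_i, w_tilde_j, b_i, b_tilde_j, x_ij):
    """One term f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2 of the objective."""
    dot = sum(a * b for a, b in zip(w_i, w_tilde_j))
    return glove_weight(x_ij) * (dot + b_i + b_tilde_j - math.log(x_ij)) ** 2
```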

BERT

Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al. 2018) models bidirectional contexts and is pre-trained as a denoising autoencoder. Such pre-training performs better than pre-training approaches based on autoregressive language modeling (Yang et al. 2019). As illustrated in Fig. 5, when our corpus is input, each CW receives a token embedding, a sentence (segment) embedding, and a position embedding. These embeddings are summed and fed into a two-layer bidirectional Transformer. The resulting contextual representation is output as a vector of a specific dimension for subsequent training.
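The input representation described above can be sketched as an element-wise sum of three lookup tables. In this minimal example the tables are hypothetical, randomly initialized lists and the token IDs are made up; real BERT learns the tables during pre-training and then passes the summed vectors through its Transformer layers, which are not shown here.

```python
import random


def make_table(rows, dim, rng):
    """Hypothetical embedding table; real BERT learns these values in pre-training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def bert_input_embeddings(token_ids, segment_ids, tok_table, seg_table, pos_table):
    """Element-wise sum of token, segment (sentence) and position embeddings."""
    return [
        [t + s + p for t, s, p in zip(tok_table[tok], seg_table[seg], pos_table[pos])]
        for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids))
    ]


rng = random.Random(0)
dim = 4
tok_table = make_table(10, dim, rng)  # hypothetical vocabulary of 10 token IDs
seg_table = make_table(2, dim, rng)   # sentence A / sentence B
pos_table = make_table(8, dim, rng)   # positions 0..7
token_ids = [1, 5, 5, 2]              # hypothetical IDs, e.g. [CLS] w w [SEP]
segment_ids = [0, 0, 0, 0]
embeddings = bert_input_embeddings(token_ids, segment_ids, tok_table, seg_table, pos_table)
```

Because the position embedding is part of the sum, the repeated token ID 5 at positions 1 and 2 receives two different input vectors.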

Classifiers

RF (Breiman 2001)

This classifier is based on ensemble learning and consists of many independent decision trees. Each tree is trained on a bootstrap sample of the training data, and RF obtains the final classification result by majority voting over the trees' predictions. By aggregating many predictions, RF overcomes the over-fitting of a single tree.
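The sketch below illustrates the two ingredients named above, bootstrap sampling and majority voting. As a deliberate simplification, each "tree" is a depth-1 stump trained on one randomly chosen feature, whereas a real RF grows deep trees and samples features at every split; all function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Return (feature, threshold, left_label, right_label) with the fewest errors."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
            errors = sum(lab != left_lab for lab in left) + sum(lab != right_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, left_lab, right_lab)
    return best[1:]


def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        feature = rng.randrange(len(X[0]))  # random feature subset of size 1
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], [feature]))
    return forest


def predict_forest(forest, row):
    """Majority vote over the stumps' predictions."""
    votes = [left if row[f] <= thr else right for f, thr, left, right in forest]
    return Counter(votes).most_common(1)[0][0]
```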

SVM (Cortes and Vapnik 1995)

SVM non-linearly maps input vectors into a high-dimensional feature space, in which it constructs a separating hyperplane. It aims to maximize the margin on both sides of the separating hyperplane.
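A minimal sketch of margin maximization, assuming a linear kernel (the non-linear mapping is omitted) and substituting the Pegasos stochastic subgradient method on the L2-regularized hinge loss for the solvers that SVM libraries typically use. Labels are +1/−1, a constant feature of 1 plays the role of the bias (so the bias is regularized too, a common simplification), and the function names are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent on (lam/2)||w||^2 + mean hinge loss."""
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]  # append bias feature
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, label in data:  # cyclic order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # step on the regularizer
            if margin < 1:  # hinge loss active: move towards label * x
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

The step size 1/(λt) and the shrink-then-push update follow the Pegasos scheme; a kernelized SVM would instead work with inner products in the mapped feature space.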

KNN (Sebastiani 2002)

KNN is a widely used text classifier owing to its simplicity and efficiency. It classifies each point by a majority vote among its k nearest neighbors.
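A minimal KNN sketch using Euclidean distance. The fault labels and 2-D feature vectors are hypothetical; in the paper's setting the inputs would be sentence embeddings of fault descriptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["offline"] * 3 + ["mechanical"] * 3
```

KNN has no training phase beyond storing the samples; all the work happens at prediction time, which is why its cost grows with the size of the training set.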

MLP (Rumelhart et al. 1986)

MLP is a feedforward artificial neural network model. Given a set of features, MLP can learn a non-linear function approximator for classification.
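To show why a hidden layer gives an MLP its non-linearity, the sketch below runs the forward pass of a one-hidden-layer ReLU network. The weights are hand-picked (hypothetical) so that the network computes XOR, which no linear classifier can represent; a real MLP learns its weights by backpropagation (Rumelhart et al. 1986).

```python
def relu(v):
    return max(0.0, v)


def mlp_forward(x, W1, b1, W2, b2):
    """One hidden ReLU layer followed by a linear output layer."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


# Hand-picked weights: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1), out = h1 - 2 * h2.
W1, b1 = [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]
W2, b2 = [[1.0, -2.0]], [0.0]
```

For the inputs (0, 0), (1, 0), (0, 1), (1, 1) the outputs are 0, 1, 1, 0, i.e. XOR.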

RNN (Elman 1990)

RNN is a kind of neural network that is effective for classifying sequential text data. Unlike feedforward neural networks, an RNN feeds its hidden state back into itself at each time step, which yields a better sequence representation.
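The recurrence can be written as h_t = tanh(W_xh x_t + W_hh h_(t−1) + b_h). The sketch below implements this Elman-style forward pass in plain Python; the weights are hypothetical, training by backpropagation through time is omitted, and for classification the last hidden state would typically be fed to an output layer.

```python
import math


def elman_rnn(inputs, W_xh, W_hh, b_h):
    """Return the hidden state after each time step, starting from h_0 = 0."""
    h = [0.0] * len(b_h)
    states = []
    for x in inputs:
        # The comprehension reads the previous h before h is reassigned.
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_xh[j], x))
                + sum(w * hj for w, hj in zip(W_hh[j], h))
                + b_h[j]
            )
            for j in range(len(b_h))
        ]
        states.append(h)
    return states
```

Because the hidden state is carried forward, the same inputs in a different order give a different final state, e.g. `[[1.0], [0.0]]` versus `[[0.0], [1.0]]` with `W_xh=[[1.0]]`, `W_hh=[[0.5]]`, `b_h=[0.0]`.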

AB (Freund and Schapire 1997)

AB adds a new weak classifier in each training round until a predetermined error rate is reached. Each training sample is assigned a weight indicating the probability that it is selected into the training set of the next weak classifier.
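The re-weighting can be sketched for one round of discrete (binary) AdaBoost: a weak classifier with weighted error ε receives the vote α = ½ ln((1 − ε)/ε), and each sample weight is multiplied by exp(−α·y·h(x)) and renormalized, so misclassified samples gain weight. The weights may be used for resampling, as described above, or directly as sample weights; the helper name is hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost re-weighting step for labels in {+1, -1}.

    Returns the weak classifier's vote alpha and the renormalized weights.
    Assumes the weighted error lies strictly between 0 and 0.5.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1 - err) / err)
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return alpha, [w / total for w in new]


alpha, weights = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
```

In this example the single misclassified sample ends up with half of the total weight, so the next weak classifier focuses on it.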

GBDT (Friedman 2001)

The GBDT classifier is composed of multiple decision trees, and the outputs of all trees are added up to produce the final classification result. Notably, each new decision tree is fitted to the residuals left by the trees built before it.
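To make residual fitting concrete, the sketch below boosts one-dimensional regression stumps under squared loss, where the negative gradient is exactly the residual y − F(x). This is a simplifying assumption: GBDT classifiers fit gradients of a classification loss (e.g. log-loss) instead, and real trees are deeper. The learning rate and helper names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares stump on one feature: (threshold, left_mean, right_mean)."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), None, mean, mean)  # no split possible: predict the mean
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]


def stump_predict(stump, x):
    thr, lm, rm = stump
    return lm if thr is None or x <= thr else rm


def gradient_boost(xs, ys, n_rounds=20, lr=0.5):
    """Start from the mean, then add lr * (stump fitted to current residuals)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of (y - F)^2 / 2
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, preds
```

Since each least-squares stump removes part of the remaining residual, the training error cannot increase from one round to the next.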

DT (Breiman et al. 2017)

DT is a non-parametric supervised learning method for classification. It learns a set of if-else decision rules from data, which makes DT simple to understand and interpret.
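Decision-tree learners such as CART choose each if-else rule by minimizing an impurity measure. The sketch below, with hypothetical names and labels, computes Gini impurity and finds the single best "if x[f] <= thr" rule; applying it recursively to each child node grows a full tree.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 (0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_rule(X, y):
    """Return (weighted child impurity, feature, threshold) of the best split."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not right:
                continue  # not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best
```

For `X = [[1, 10], [2, 10], [8, 10], [9, 10]]` with labels `["ok", "ok", "fault", "fault"]`, the best rule is "if x[0] <= 2" with impurity 0, while the constant second feature offers no split.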

ET (Geurts et al. 2006)

This classifier fits many randomized decision trees on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting.
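What makes extra trees "extremely randomized" (Geurts et al. 2006) is the split rule: instead of searching for the optimal threshold, each node draws random cut-points for randomly chosen features and keeps the best of these candidates. The sketch below assumes Gini impurity as the score (the original paper scores splits with a normalized information gain), and its names are hypothetical; the ensemble then averages the predictions of many such trees.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def extra_tree_split(X, y, k=2, rng=None):
    """Draw k random (feature, cut-point) candidates and keep the lowest-impurity one.

    Returns (weighted impurity, feature, threshold), or None if no feature varies.
    """
    if rng is None:
        rng = random.Random(0)
    n_features = len(X[0])
    best = None
    for f in rng.sample(range(n_features), min(k, n_features)):
        lo, hi = min(row[f] for row in X), max(row[f] for row in X)
        if lo == hi:
            continue  # constant feature: no valid cut-point
        thr = rng.uniform(lo, hi)  # random threshold, not the optimal one
        left = [lab for row, lab in zip(X, y) if row[f] < thr]
        right = [lab for row, lab in zip(X, y) if row[f] >= thr]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            best = (score, f, thr)
    return best
```

Skipping the exhaustive threshold search makes each split cheaper than in a standard decision tree, and the extra randomness further decorrelates the trees in the ensemble.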

VOTE

The VOTE classifier trains an ensemble of several models and combines their outputs, predicting the class with the highest probability across the ensemble. In its simplest form, it aggregates the predictions of the individual classifiers and outputs the class that receives the majority of votes. Instead of building separate dedicated models and reporting the accuracy of each classifier, VOTE creates a single model that is trained from these models and predicts each output from their combined majority vote.
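The two aggregation rules mentioned above, majority (hard) voting and highest-probability (soft) voting, can be sketched in a few lines. Inputs are hypothetical: per-classifier predicted labels for hard voting, and per-classifier class-probability dicts for soft voting.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote over the class labels predicted by each base classifier."""
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average each class's predicted probability and pick the highest mean."""
    classes = probabilities[0].keys()
    mean = {c: sum(p[c] for p in probabilities) / len(probabilities) for c in classes}
    return max(mean, key=mean.get)


probs = [{"a": 0.9, "b": 0.1}, {"a": 0.4, "b": 0.6}, {"a": 0.4, "b": 0.6}]
```

The two rules can disagree: for `probs`, two of three classifiers favour "b", so hard voting returns "b", but the first classifier's strong confidence in "a" gives "a" the higher mean probability (about 0.57), so soft voting returns "a".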

Fault record dataset

In this section, we first present an example of the raw data. Then, we analyze the raw data, including the work order sources, the top 10 cities or provinces by number of fault records, and the relationship between month and number of fault records. Finally, we build a fault record dataset for subsequent studies.

Raw data sample

We collected 8,481 raw records from an actual Internet of Vehicles platform service center from June to December 2021. Table 1 gives one example of the raw data, which includes the pile number, work order source, work date, work city, work order number, client type, fault description, work order state, fault category, and fault reason. To protect data privacy, actual numbers are replaced with ‘xxx’.

Table 1 One example of raw data
