Credit Card Fraud Detection using SMOTE and Ensemble Methods
Abstract
This study applies mathematical modeling and machine learning to large-scale transaction data in order to detect credit card fraud, a serious real-world problem. After reviewing recent research, we chose the models most widely used for credit card fraud detection: Random Forest (RF) and artificial neural networks (ANN), including a multi-layer variant (DNN). We evaluated the accuracy and recall of these models with and without SMOTE. SMOTE training produced no significant improvement in accuracy, although RF with SMOTE had a slight advantage over the others. Recall, in contrast, improved significantly for all three models with SMOTE training, and ANN and DNN achieved better recall than RF. We therefore combined RF and DNN into a hybrid model that delivers more stable accuracy and recall. The study found that neural network models have greater potential for finding abnormal data in large data streams. This finding can guide credit card companies in choosing mathematical models to monitor cash flow and to alert customers to possible credit card fraud.
Introduction
Credit cards are convenient to use and easy to carry. They support both offline and online payment. With the development of internet technology, more and more people use credit cards, and most people now choose them for transactions. However, as credit card transactions grow, credit card fraud is also on the rise.
To reduce the growing number of credit card frauds, many detection methods have been developed. Among them, machine learning models have proved to be good solutions for credit card fraud detection. These models may be supervised or unsupervised and include logistic regression, support vector machine (SVM), random forest (RF), k-nearest neighbor, and k-means clustering. In addition, neural networks have become popular in recent years and have proved powerful in many fields, including credit card fraud detection. In 2014, Sitaram Patel and Sunita Gond found that an SVM algorithm that uses the user profile rather than only the spending profile can improve the true positive (TP) and true negative (TN) rates and decrease the false positive (FP) and false negative (FN) rates [7]. In 2017, S. Akila and U. Srinivasulu Reddy analyzed the internal factors that affect the abnormal data found in credit card transactions and tried to find a way to eliminate these factors. Their simulation experiments showed that the Non-overlapped Risk based Bagged Ensemble model (NRBE) can improve performance by 5% in terms of BCR and BER and by 50% in terms of recall, and can reduce cost by a factor of 2 to 2.5 [1]. Their research suggested an idea for later work: existing historical data can be re-sampled with a new method to generate more effective training data, thereby improving the accuracy and recall of credit card fraud detection.
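The re-sampling idea above is what SMOTE does: it creates synthetic minority-class (fraud) samples by interpolating between an existing fraud sample and one of its nearest fraud neighbours. The following is a minimal pure-Python sketch of that interpolation step, not the implementation used in this study. The function name `smote_oversample`, its parameters, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy two-feature "fraud" samples (hypothetical, for illustration only).
fraud = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_oversample(fraud, n_new=6, k=2)
```

Because each synthetic point lies on a segment between two real fraud samples, the new points stay inside the region already occupied by the minority class. In practice a library implementation would normally be used, and only the training split should be oversampled so that the test data keeps its original class balance.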
Conclusion
In this research, we evaluated RF, a one-hidden-layer ANN, and DNN models with and without SMOTE. After comparison and analysis, we reached the following conclusions:
1) With or without SMOTE training, the RF model has slightly higher accuracy than ANN and DNN.
2) With SMOTE training, the accuracy of RF, ANN, and DNN all improves, but ANN and DNN both achieve better recall than RF. In particular, DNN outperforms ANN in both accuracy and recall.
3) Based on an optimal combination, we built a hybrid model from RF and DNN with SMOTE training that performs stably in both accuracy and recall.
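The text here does not specify how RF and DNN are combined. One common option, shown purely as an assumption, is soft voting: take a weighted average of the fraud probabilities predicted by the two models and threshold the result. The function `soft_vote`, the weight `w_rf`, the threshold, and the probabilities below are all hypothetical.

```python
def soft_vote(p_rf, p_dnn, w_rf=0.5, threshold=0.5):
    """Blend two models' fraud probabilities and threshold the result.

    p_rf and p_dnn hold per-transaction fraud probabilities from the
    two models; w_rf is the weight given to the RF model.
    """
    if len(p_rf) != len(p_dnn):
        raise ValueError("probability lists must have the same length")
    if not 0.0 <= w_rf <= 1.0:
        raise ValueError("w_rf must lie in [0, 1]")
    blended = [w_rf * a + (1.0 - w_rf) * b for a, b in zip(p_rf, p_dnn)]
    labels = [1 if p >= threshold else 0 for p in blended]
    return blended, labels


# Illustrative probabilities only; these are not results from the study.
p_rf = [0.10, 0.40, 0.90, 0.30]
p_dnn = [0.20, 0.70, 0.80, 0.20]
blended, labels = soft_vote(p_rf, p_dnn)
```

In this toy case the second transaction is flagged only because the DNN score pulls the blended probability above the threshold, which is the kind of recall gain a hybrid can inherit from the neural network. The weight and threshold would need to be tuned on validation data.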
In other words, we proposed a method for credit card fraud detection based on SMOTE, ensemble methods, and popular existing models. By comparing models with and without SMOTE, we showed that applying SMOTE to imbalanced data can increase model performance. We then showed that the proposed model is well suited to credit card fraud detection by comparing it with RF and ANN. RF performs well on accuracy and precision, while ANN is better on recall. The proposed model combines the advantages of these two models and provides both high recall and high accuracy.
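Accuracy, precision, and recall can move very differently on imbalanced data. The short sketch below computes them from confusion-matrix counts. The helper name `classification_metrics` and the counts are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    A ratio whose denominator is zero is reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts: 10,000 transactions, 20 of them fraudulent, and a
# classifier that labels every transaction as legitimate.
always_legit = classification_metrics(tp=0, fp=0, tn=9980, fn=20)

# Hypothetical counts for a classifier that catches 16 of the 20 frauds
# at the cost of 40 false alarms.
detector = classification_metrics(tp=16, fp=40, tn=9940, fn=4)
```

The classifier that never flags fraud still reaches 99.8% accuracy with zero recall, while the second classifier has slightly lower accuracy but 80% recall. This is why recall is the more informative measure when fraud is rare.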