Intelligent Fraud Detection: Applying Advanced Analytics and Cybersecurity Insights in U.S. Finance

Mohammad Shahidullah; Hammed Esa; Md Abdur Rob; Md Bayzid Kamal; Md Mohaimin Rashid; Md Fakhrul Hasan Bhuiyan; Md Shayakh Alam; Durga Shahi

doi:10.63332/joph.v4i3.3593

Authors

Mohammad Shahidullah, Department of Business Administration, International American University
Hammed Esa, Department of Business Administration, International American University
Md Abdur Rob, Department of Economics, Ohio University, Athens
Md Bayzid Kamal, Department of Business Analytics, Brooklyn College, CUNY (City University of New York)
Md Mohaimin Rashid, Department of Business Administration, International American University
Md Fakhrul Hasan Bhuiyan, Department of Information Studies, Trine University
Md Shayakh Alam, Master of Engineering Management, Trine University
Durga Shahi, Department of Business Administration, Westcliff University

Keywords:

Fraud Detection, Machine Learning, Financial Crime, Random Forest, Gradient Boosting, Logistic Regression, Precision-Recall Analysis.

Abstract

Fraud detection in financial transactions remains a crucial and persistent problem, mainly because transaction datasets are heavily imbalanced and legitimate and fraudulent activity must be distinguished with high accuracy. In this study, we assess the performance of three common machine learning models, Logistic Regression, Random Forest, and Gradient Boosting, for fraud detection on a real-world dataset of 284,807 transactions, of which only 0.173% are labelled fraudulent. The models were evaluated on key performance metrics, including precision, recall, F1-score, and the Area Under the Receiver Operating Characteristic Curve (AUC), to determine which model best handles class imbalance and false positives. Of the models analyzed, Random Forest performed best: its AUC of 0.98 exceeded that of Logistic Regression (AUC = 0.97) and matched that of Gradient Boosting (AUC = 0.98), while it also achieved higher recall (0.88) and precision (0.44). This indicates a greater capacity to detect fraud cases without raising the false-alarm rate excessively. Feature importance analysis further showed that features V14, V10, and V4 were the most predictive and contributed most to classification accuracy. In addition, calibration analysis revealed that Random Forest produced the most reliable probability estimates, with outputs closely following the ideal calibration curve, which implies greater reliability in practical applications. These findings suggest that ensemble machine learning models, especially Random Forest, can improve the effectiveness of fraud detection systems. The study motivates future research on real-time deployment and integration with deep learning methods to strengthen fraud detection in constantly changing financial environments.
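The evaluation metrics named in the abstract can be computed directly from a model's scores. The following is a minimal, dependency-free Python sketch, not the authors' code: the helper names (`precision_recall_f1`, `roc_auc`, `reliability_points`), the toy labels and scores, and the 0.5 decision threshold are all illustrative assumptions. It also shows why accuracy alone says little at a 0.173% fraud rate: labelling every transaction legitimate would already be about 99.83% accurate.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy data: 1 = fraud, 0 = legitimate. Scores stand in for a
# model's predicted fraud probabilities; none of this comes from the study.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.25, 0.15, 0.30, 0.65, 0.70, 0.45, 0.90, 0.35]
THRESHOLD = 0.5  # assumed cut-off; lowering it typically raises recall at the cost of precision


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (fraud = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed fraud
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random fraud case outscores a random
    legitimate one (Mann-Whitney formulation; ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def reliability_points(y_true: Sequence[int], probs: Sequence[float], n_bins: int = 5) -> List[Tuple[float, float]]:
    """(mean predicted probability, observed fraud rate) for each non-empty
    equal-width bin; a well-calibrated model's points lie near the diagonal."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((t, p))
    points = []
    for b in bins:
        if b:
            mean_p = sum(p for _, p in b) / len(b)
            frac_pos = sum(t for t, _ in b) / len(b)
            points.append((mean_p, frac_pos))
    return points


if __name__ == "__main__":
    # Accuracy of always predicting "legitimate" at the abstract's 0.173% fraud rate.
    print(f"trivial-baseline accuracy: {1 - 0.00173:.2%}")
    y_pred = [1 if s >= THRESHOLD else 0 for s in scores]
    precision, recall, f1 = precision_recall_f1(y_true, y_pred)
    print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
    print(f"AUC={roc_auc(y_true, scores):.3f}")
    for mean_p, frac in reliability_points(y_true, scores, n_bins=5):
        print(f"predicted {mean_p:.2f} -> observed {frac:.2f}")
```

On the toy data, precision, recall and F1 each come out to 2/3 and AUC to 19/21 (about 0.905). The pairwise AUC loop is O(positives × negatives), which is fine for illustration but slow in pure Python at the study's scale; a library routine such as scikit-learn's `roc_auc_score` is the practical choice there.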
