Intelligent Fraud Detection: Applying Advanced Analytics and Cybersecurity Insights in U.S. Finance

Authors

  • Mohammad Shahidullah, Department of Business Administration, International American University
  • Hammed Esa, Department of Business Administration, International American University
  • Md Abdur Rob, Department of Economics, Ohio University, Athens
  • Md Bayzid Kamal, Department of Business Analytics, Brooklyn College, CUNY (City University of New York)
  • Md Mohaimin Rashid, Department of Business Administration, International American University
  • Md Fakhrul Hasan Bhuiyan, Department of Information Studies, Trine University
  • Md Shayakh Alam, Master of Engineering Management, Trine University
  • Durga Shahi, Department of Business Administration, Westcliff University

DOI:

https://doi.org/10.63332/joph.v4i3.3593

Keywords:

Fraud Detection, Machine Learning, Financial Crime, Random Forest, Gradient Boosting, Logistic Regression, Precision-Recall Analysis.

Abstract

Fraud detection in financial transactions is a persistent and critical problem, mainly because the available datasets are severely imbalanced and legitimate and fraudulent activity must be distinguished with very high accuracy. In this study, we assess the performance of three common machine learning models, Logistic Regression, Random Forest, and Gradient Boosting, for fraud detection using a real-world dataset of 284,807 transactions, of which only 0.173% are labelled fraudulent. The models were evaluated on key performance metrics, including precision, recall, F1-score, and the Area Under the Receiver Operating Characteristic Curve (AUC), to determine which are best suited to handling class imbalance and false positives. Among the models analyzed, Random Forest performed best: its AUC of 0.98 exceeded that of Logistic Regression (AUC = 0.97) and matched that of Gradient Boosting (AUC = 0.98), while it achieved higher recall (0.88) and precision (0.44). This indicates a greater capacity to detect fraud cases without an excessive increase in the false-alarm rate. Feature importance analysis further showed that features V14, V10, and V4 were the most predictive and contributed most to classification accuracy. In addition, calibration analysis revealed that Random Forest produced the most reliable probability estimates, with outputs closely following the ideal calibration curve, which implies better reliability in practical applications. These findings suggest that ensemble machine learning models, especially Random Forest, can improve the effectiveness of fraud detection systems. The study points to future research on real-time deployment and on integration with deep learning methods to strengthen fraud detection in constantly changing financial environments.
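The abstract relies on precision, recall, F1-score, AUC, and calibration rather than plain accuracy because only 0.173% of transactions are fraudulent. The following is a minimal illustrative sketch of how those metrics behave under heavy imbalance, written in plain Python so it runs without dependencies. It is not the authors' code: the function names (`confusion_counts`, `f1_from_pr`, `precision_recall_f1`, `roc_auc`, `calibration_bins`) and the toy data are hypothetical. In practice one would more likely use scikit-learn's `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`, and `sklearn.calibration.calibration_curve`.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn), treating label 1 as fraud (the positive class)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1  # legitimate transaction flagged: a false alarm
        elif t == 1 and p == 0:
            fn += 1  # fraud that slipped through
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of alerts that are real fraud
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of fraud that is caught
    return precision, recall, f1_from_pr(precision, recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random fraud scores above random legitimate); ties count 0.5. O(n_pos * n_neg)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_bins(y_true: Sequence[int], probs: Sequence[float], n_bins: int = 5) -> List[Tuple[float, float]]:
    """Equal-width reliability bins: (mean predicted probability, observed fraud rate) per non-empty bin."""
    stats = [[0.0, 0, 0] for _ in range(n_bins)]  # per bin: [sum of probs, frauds, count]
    for t, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # put p == 1.0 in the last bin
        stats[idx][0] += p
        stats[idx][1] += t
        stats[idx][2] += 1
    return [(s / c, f / c) for s, f, c in stats if c]


if __name__ == "__main__":
    # Toy imbalance: 2 frauds in 1,000 transactions; a model that never flags fraud.
    y_true = [0] * 998 + [1] * 2
    never_flag = [0] * 1000
    accuracy = sum(t == p for t, p in zip(y_true, never_flag)) / len(y_true)
    print(accuracy)                                 # 0.998
    print(precision_recall_f1(y_true, never_flag))  # (0.0, 0.0, 0.0)

    # Ranking quality and calibration on four scored transactions.
    labels, scores = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(labels, scores))                  # 0.75
    print(calibration_bins(labels, scores, n_bins=2))

    # F1 implied by the precision/recall pair reported for Random Forest.
    print(round(f1_from_pr(0.44, 0.88), 3))         # 0.587
```

The toy case shows why accuracy alone is misleading here: a model that never flags fraud scores 99.8% accuracy yet has zero recall. The pairwise AUC definition is easy to read but quadratic in time, so it only suits small examples. The F1 of about 0.587 is simple arithmetic on the reported precision (0.44) and recall (0.88), not a figure stated in the abstract.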

Published

2024-11-20

How to Cite

Shahidullah, M., Esa, H., Rob, M. A., Kamal, M. B., Rashid, M. M., Bhuiyan, M. F. H., … Shahi, D. (2024). Intelligent Fraud Detection: Applying Advanced Analytics and Cybersecurity Insights in U.S. Finance. Journal of Posthumanism, 4(3), 2157–2184. https://doi.org/10.63332/joph.v4i3.3593

Issue

Vol. 4 No. 3 (2024)

Section

Articles