The Impact of Data Distribution and Feature Selection on Machine Learning Performance in Fake Audio Detection

Authors

  • Asmaa M. S. Abo Alfadl, Department of Information Systems, Faculty of Computers and Artificial Intelligence, Fayoum University, Fayoum 63514, Egypt
  • Mohamed H. Khafagy, Department of Computer Science, Faculty of Computers and Artificial Intelligence, Fayoum University, Fayoum, Egypt
  • Engy R. Abdelmaksoud, Department of Basic Science, Faculty of Computers and Artificial Intelligence, Fayoum University, Fayoum, Egypt
  • Ahmed S. Ismail, Department of Information Systems, Faculty of Computers and Artificial Intelligence, Fayoum University, Fayoum 63514, Egypt

DOI:

https://doi.org/10.63332/joph.v6i1.3838

Keywords:

Audio deepfakes, Artificial intelligence, Deep fakes, Feature selection, Fake audio detection, Machine learning

Abstract

This research investigates how machine learning algorithms perform in fake audio detection under different data distribution patterns and feature selection techniques. The study considers three data distribution scenarios: a balanced dataset with equal numbers of real and synthetic samples, a real-dominant dataset, and a fake-dominant dataset. The datasets come from two sources: the original clean CFAD dataset and the rerecorded "For-rerec" version of the Fake-or-Real dataset. Three classification algorithms, the random forest classifier (RFC), support vector machine (SVM), and gradient boosting (GB), are implemented to analyze how audio feature sets of different sizes perform across the classifiers. The experimental results show that RFC and GB outperformed SVM, reaching 99% accuracy under balanced conditions. The results indicate that data distribution, in conjunction with feature richness, plays a crucial role in developing dependable fake audio detection systems.
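
The three data distribution scenarios described in the abstract can be illustrated with a minimal sketch. The abstract does not state the class ratios for the real-dominant and fake-dominant datasets, so the 80/20 and 20/80 splits below are assumptions, as are the function name `make_scenario`, the `SCENARIOS` table, and the placeholder file names; this is not the authors' implementation.

```python
import random
from collections import Counter

# Assumed share of real samples per scenario. Only the balanced case (equal
# real and synthetic samples) is stated in the abstract; the other two
# ratios are hypothetical.
SCENARIOS = {
    "balanced": 0.5,
    "real_dominant": 0.8,
    "fake_dominant": 0.2,
}


def make_scenario(real_items, fake_items, real_fraction, total, seed=0):
    """Return a shuffled list of (item, label) pairs with the requested share of real samples."""
    rng = random.Random(seed)
    n_real = round(total * real_fraction)
    n_fake = total - n_real
    if n_real > len(real_items) or n_fake > len(fake_items):
        raise ValueError("not enough samples for the requested distribution")
    # Sample without replacement from each class, then mix the two classes.
    subset = [(item, "real") for item in rng.sample(real_items, n_real)]
    subset += [(item, "fake") for item in rng.sample(fake_items, n_fake)]
    rng.shuffle(subset)
    return subset


if __name__ == "__main__":
    # Placeholder file names standing in for real and synthetic audio clips.
    real = [f"real_{i}.wav" for i in range(1000)]
    fake = [f"fake_{i}.wav" for i in range(1000)]
    for name, fraction in SCENARIOS.items():
        data = make_scenario(real, fake, fraction, total=1000)
        print(name, Counter(label for _, label in data))
```

Each resulting subset could then be passed through feature extraction and the three classifiers, so that accuracy is compared under the same total sample count and only the class balance varies.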

Published

2025-12-31

How to Cite

Alfadl, A. M. S. A., Khafagy, M. H., Abdelmaksoud, E. R., & Ismail, A. S. (2025). The Impact of Data Distribution and Feature Selection on Machine Learning Performance in Fake Audio Detection. Journal of Posthumanism, 6(1), 1–30. https://doi.org/10.63332/joph.v6i1.3838

Section

Articles