The Impact of Data Distribution and Feature Selection on Machine Learning Performance in Fake Audio Detection
DOI:
https://doi.org/10.63332/joph.v6i1.3838
Keywords:
Audio deepfakes, Artificial intelligence, Deep fakes, Feature selection, Fake audio detection, Machine learning
Abstract
This study investigates how data distribution patterns and feature selection techniques affect the performance of machine learning algorithms in fake audio detection. It considers three data distribution scenarios: a balanced dataset with equal numbers of real and synthetic samples, a real-dominant dataset, and a fake-dominant dataset. Three classification algorithms, random forest classifier (RFC), support vector machine (SVM), and gradient boosting (GB), are evaluated on audio feature sets of different sizes. In the experiments, RFC and GB outperformed SVM, reaching 99% accuracy under balanced conditions. The data come from two sources: the original clean CFAD dataset and the rerecorded "For-rerec" version of the Fake-or-Real dataset. The results indicate that data distribution, together with feature richness, plays a crucial role in building dependable fake audio detection systems.
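The three distribution scenarios named in the abstract can be illustrated with a small sampling helper. This is only a sketch under stated assumptions: the abstract does not give the actual class ratios or dataset sizes, so the 80/20 splits, the `make_scenario` helper, and the placeholder clip names below are hypothetical and are not the authors' procedure.

```python
import random
from collections import Counter

# Fraction of REAL samples in each scenario.
# Assumption: the abstract does not state the ratios; these are illustrative.
SCENARIOS = {
    "balanced": 0.5,
    "real_dominant": 0.8,
    "fake_dominant": 0.2,
}


def make_scenario(real, fake, real_frac, total, seed=0):
    """Draw `total` labeled items without replacement, `real_frac` of them real."""
    n_real = round(total * real_frac)
    n_fake = total - n_real
    if n_real > len(real) or n_fake > len(fake):
        raise ValueError("not enough samples for the requested ratio")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    subset = [(clip, "real") for clip in rng.sample(real, n_real)]
    subset += [(clip, "fake") for clip in rng.sample(fake, n_fake)]
    rng.shuffle(subset)
    return subset


# Placeholder identifiers standing in for audio clips (hypothetical names).
real_clips = [f"real_{i}.wav" for i in range(1000)]
fake_clips = [f"fake_{i}.wav" for i in range(1000)]

splits = {
    name: make_scenario(real_clips, fake_clips, frac, total=500)
    for name, frac in SCENARIOS.items()
}
counts = {
    name: Counter(label for _, label in subset)
    for name, subset in splits.items()
}
```

Each resulting split could then be passed through the same feature extraction and classifier training, so that any difference in accuracy between scenarios reflects the class distribution rather than the pipeline.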
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
