Using Natural Language Processing Techniques for Indexing and Analyzing Archive Documents

Authors

  • Wiem Ben Khalifa, Information Science Department, College of Arts, Imam Abdulrahman Bin Faisal University

DOI:

https://doi.org/10.63332/joph.v5i1.1394

Keywords:

Natural Language Processing (NLP), Automated Archival Document Indexing, Named Entity and Keyphrase Extraction, Topic Modeling in Historical Texts, Enhanced Access to Archival Data

Abstract

This research explores how Natural Language Processing (NLP) techniques can improve the accuracy and efficiency of indexing historical archival documents. It aims to enhance indexing precision, extract metadata, and uncover hidden patterns in historical texts using named entity recognition, keyphrase extraction, and topic modeling. Following a descriptive research design, the study uses a diverse archival corpus prepared through Optical Character Recognition (OCR) and data cleaning. NLP-generated index terms are evaluated against manually created ones using metrics such as precision, recall, and F1 score, with emphasis on the accuracy gains and time savings that automation offers. The research also highlights NLP’s ability to reveal semantic relationships, generate enriched metadata, and identify latent themes or cultural trends, with the goal of transforming archival practices and improving access to historical insights.
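
To make the evaluation step concrete, the following is a minimal sketch of how automatically generated index terms might be scored against a manually created reference set using precision, recall, and F1. It is not the study's implementation: the function name `evaluate_index_terms`, the case-insensitive exact-match rule, and the sample terms are all assumptions chosen for illustration.

```python
def evaluate_index_terms(predicted, reference):
    """Score predicted index terms against a manual reference set.

    Assumption: terms match only if they are identical after
    lowercasing and trimming whitespace (no stemming or fuzzy matching).
    """
    pred = {t.strip().lower() for t in predicted if t.strip()}
    ref = {t.strip().lower() for t in reference if t.strip()}

    true_positives = len(pred & ref)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical index terms for one archival document (illustrative only).
nlp_terms = ["Harbour", "land registry", "customs ledger", "treaty"]
manual_terms = ["harbour", "land registry", "customs ledger", "shipping", "tariff"]

scores = evaluate_index_terms(nlp_terms, manual_terms)
print(scores)
```

In this toy case three of the four predicted terms appear in the five-term manual set, so precision is 3/4 and recall is 3/5. A real evaluation would likely need a looser matching rule, since OCR noise and spelling variants make exact matches too strict for historical text.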

Published

2025-05-02

How to Cite

Khalifa, W. B. (2025). Using Natural Language Processing Techniques for Indexing and Analyzing Archive Documents. Journal of Posthumanism, 5(1), 1520–1540. https://doi.org/10.63332/joph.v5i1.1394

Section

Articles