Using Natural Language Processing Techniques for Indexing and Analyzing Archive Documents
DOI:
https://doi.org/10.63332/joph.v5i1.1394
Keywords:
Natural Language Processing (NLP), Automated Archival Document Indexing, Named Entity and Keyphrase Extraction, Topic Modeling in Historical Texts, Enhanced Access to Archival Data
Abstract
This research explores the use of Natural Language Processing (NLP) techniques to improve the accuracy and efficiency of indexing historical archival documents. It aims to enhance indexing precision, extract metadata, and uncover hidden patterns in historical texts through NLP methods such as named entity recognition, keyphrase extraction, and topic modeling. Employing a descriptive research design, the study uses a diverse archival corpus prepared via Optical Character Recognition (OCR) and data cleaning. It evaluates NLP-generated index terms against manually created ones using precision, recall, and F1 score, with emphasis on the accuracy and time savings that automation offers. The research also highlights NLP’s ability to reveal semantic relationships, generate enriched metadata, and identify latent themes or cultural trends, with the goal of transforming archival practice and improving access to historical insights.
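The comparison of NLP-generated and manually created index terms described in the abstract can be illustrated with a minimal Python sketch. It assumes exact matching after lowercasing and whitespace trimming, which the abstract does not specify; the function name `score_index_terms` and the sample terms are hypothetical and are not taken from the study.

```python
def score_index_terms(predicted, reference):
    """Return (precision, recall, F1) for one document's index terms.

    Assumption: a predicted term counts as correct only if it matches a
    manual term exactly after lowercasing and trimming whitespace.
    """
    pred = {term.strip().lower() for term in predicted}
    ref = {term.strip().lower() for term in reference}

    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    # F1 is the harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical index terms for a single archival document.
nlp_terms = ["Land Registry", "tax roll", "parish council", "harvest"]
manual_terms = ["land registry", "tax roll", "parish council", "tithe", "census"]

p, r, f = score_index_terms(nlp_terms, manual_terms)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this sketch three of the four generated terms appear in the manual index, and three of the five manual terms are recovered. A real evaluation would likely need a looser matching rule (for example stemming or partial overlap), since OCR noise and spelling variation make exact matches rare in historical text.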
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.