Ethical Alignment in Large Language Models: Interpreting Moral Reasoning in Transformer-Based AI Systems
DOI:
https://doi.org/10.63332/joph.v5i4.1089

Keywords:
Moral Reasoning, Transformer Models, Ethical Alignment, Responsible AI, AI Explainability

Abstract
The adoption of transformer-based large language models (LLMs) in high-stakes contexts has intensified ethical debates over aligning machine values and moral reasoning with those of humans. This study examines the interpretability of moral reasoning in LLMs, understood as their ability to learn, apply, and justify ethical norms. Drawing on interdisciplinary frameworks from psychology, philosophy, and socio-technical studies, the article discusses how transformer architectures acquire and encode values when confronted with normative judgments. It critiques the principal methodologies for gauging ethical robustness, including value alignment frameworks, simulation environments, and transparency-enhancing tools, which can be helpful but also harmful if not applied carefully. It further scrutinizes bias detection, fairness interventions, and the limitations of moral reasoning in current AI-generated outputs. Case studies from healthcare, mental health, and the justice system illustrate the ethical consequences of misalignment. Finally, the paper offers recommendations for developing morally aligned AI systems through inclusive design, explainable decision pathways, and global ethical governance.
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.