DS Journal of Digital Science and Technology (DS-DST)

Research Article | Open Access

Volume 5 | Issue 1 | Year 2026 | Article Id: DST-V5I1P101 | DOI: https://doi.org/10.59232/DST-V5I1P101

Integrating Knowledge Graph with Retrieval-Augmented Generation for Vietnamese Legal Question Answering

Vuong T Pham, Minh Phan

Received: 20 Nov 2025 | Revised: 26 Nov 2025 | Accepted: 18 Jan 2026 | Published: 30 Jan 2026

Citation

Vuong T Pham, Minh Phan. “Integrating Knowledge Graph with Retrieval-Augmented Generation for Vietnamese Legal Question Answering.” DS Journal of Digital Science and Technology, vol. 5, no. 1, pp. 1-13, 2026.

Abstract

Large Language Models (LLMs) are increasingly used for question answering, but legal applications require grounded answers with article-level citations and minimal unsupported statements. This paper presents a framework that integrates an ontology-driven Knowledge Graph (KG) with Retrieval-Augmented Generation (RAG) for Vietnamese legal question answering. The method builds a document-aware KG that preserves the hierarchical structure of statutes (Document → Chapter → Section → Article → Clause → Point) and encodes semantic and cross-reference relations. Graph-aware retrieval is combined with a hybrid ranking function that balances semantic similarity and graph-derived confidence, and the retrieved legal provisions are provided to an LLM using a constrained prompting template to generate answers with explicit citations. Experiments on three Vietnamese law domains (Electronic Transaction Law, Land Law, and Labor Law) report 85.7% answer accuracy, outperforming BM25 keyword search (62.3%) and a pure LLM baseline (71.5%), while improving citation quality. The framework is intended to increase traceability by grounding generation in retrieved statutory provisions.
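
As an illustration of how a hybrid ranking function might balance the two signals named above, the following minimal sketch scores each retrieved provision with a convex combination of semantic similarity and graph-derived confidence. The abstract does not give the actual formula, so the linear form, the weight `alpha`, the `Candidate` structure, and the placeholder citations are all assumptions made for this example, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A retrieved legal provision (hypothetical structure, for illustration)."""
    citation: str            # placeholder label for an article-level citation
    semantic_sim: float      # query-to-provision similarity, assumed in [0, 1]
    graph_confidence: float  # confidence derived from KG relations, assumed in [0, 1]


def hybrid_score(c: Candidate, alpha: float = 0.7) -> float:
    """Convex combination of both signals; alpha is an assumed weight."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * c.semantic_sim + (1.0 - alpha) * c.graph_confidence


def rank(candidates: list[Candidate], alpha: float = 0.7, k: int = 3) -> list[Candidate]:
    """Return the top-k candidates by hybrid score, highest first."""
    ordered = sorted(candidates, key=lambda c: hybrid_score(c, alpha), reverse=True)
    return ordered[:k]


# Placeholder candidates: "A" matches the query text best,
# while "B" is better supported by the graph structure.
cands = [
    Candidate("A", semantic_sim=0.9, graph_confidence=0.1),
    Candidate("B", semantic_sim=0.6, graph_confidence=0.9),
    Candidate("C", semantic_sim=0.2, graph_confidence=0.2),
]
top = rank(cands)
```

With `alpha = 0.7`, candidate "B" ranks ahead of "A" because its graph confidence offsets its lower text similarity. Setting `alpha = 1.0` reduces the ranking to pure semantic similarity and puts "A" first.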

Keywords

Knowledge Graph, Large Language Models, Legal Question Answering, Ontology, Retrieval-Augmented Generation, Vietnamese NLP.
