Research Article | Open Access
Volume 3 | Issue 4 | Year 2024 | Article Id: DST-V3I4P103 | DOI: https://doi.org/10.59232/DST-V3I4P103
Constructing Model for Entity Extraction from Specification Documents of the Database
Giang Ma, Nhan Pham, Bao Thai, Thanh Cao, Hai Tran
| Received | Revised | Accepted | Published |
|---|---|---|---|
| 05 Oct 2024 | 06 Nov 2024 | 01 Dec 2024 | 24 Dec 2024 |
Citation
Giang Ma, Nhan Pham, Bao Thai, Thanh Cao, Hai Tran. “Constructing Model for Entity Extraction from Specification Documents of the Database.” DS Journal of Digital Science and Technology, vol. 3, no. 4, pp. 19-25, 2024.
Abstract
Generating Entity-Relationship Diagrams (ERDs) from raw software or system requirements is still predominantly performed manually, incurring significant design costs. Several methods have recently been proposed to assist users with this task. However, they often rely on rigid rule-based approaches, which do not generalize across all requirement scenarios. Deep learning-based models generalize better than rule-based methods, but they need large-scale labeled datasets. This paper therefore recognizes the similarity between the natural-language-to-ERD (NL2ERD) problem and the text-to-SQL problem and proposes an approach to transform existing text-to-SQL datasets into NL2ERD data. Combined with data collected from various types of Natural Language (NL) text, this approach yields a large-scale NL2ERD dataset. Since NL2ERD can be regarded as a specific Information Extraction (IE) task, we use this dataset for relation extraction modeling. Experimental results demonstrate that our model achieves high performance.
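The idea of deriving NL2ERD data from text-to-SQL data can be pictured with a small sketch. It is not the authors' procedure, which this page does not describe. It only assumes that a text-to-SQL example ships with a relational schema (tables, columns, foreign keys), and that each table maps to an entity and each foreign key to a relationship. The names `ERD` and `schema_to_erd` and the sample schema are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ERD:
    # entity name -> list of attribute names
    entities: dict = field(default_factory=dict)
    # (entity_a, entity_b) pairs, one per distinct foreign-key link
    relationships: list = field(default_factory=list)


def schema_to_erd(tables, foreign_keys):
    """Map a relational schema to a rough ERD (hypothetical helper).

    tables: dict mapping table name -> list of column names
    foreign_keys: list of (table, column, referenced_table) tuples
    """
    erd = ERD()
    fk_columns = {(table, column) for table, column, _ in foreign_keys}
    for table, columns in tables.items():
        # Assumption: foreign-key columns encode relationships,
        # so they are not kept as attributes of the entity.
        erd.entities[table] = [c for c in columns if (table, c) not in fk_columns]
    for table, _, ref_table in foreign_keys:
        pair = (table, ref_table)
        if pair not in erd.relationships:
            erd.relationships.append(pair)
    return erd


# Made-up schema, for illustration only.
tables = {
    "student": ["student_id", "name"],
    "course": ["course_id", "title"],
    "enrollment": ["enrollment_id", "student_id", "course_id", "grade"],
}
foreign_keys = [
    ("enrollment", "student_id", "student"),
    ("enrollment", "course_id", "course"),
]

erd = schema_to_erd(tables, foreign_keys)
print(erd.entities)
print(erd.relationships)
```

In such a setup, the natural-language side of each text-to-SQL example would be paired with the entities and relationships derived from its schema to form one labeled NL2ERD instance. How the paper actually performs this pairing is not stated here.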
Keywords
Entity Extraction, Entity-Relationship Diagrams (ERDs), Text-to-SQL, NL2ERD.