Transforming BIM data interaction: A user-centric framework leveraging lightweight ontology and large language model integration

Zhikun Ding; Yongchang Li; Zhiwei Liu; Hongping Yuan

doi:10.70401/jbde.2026.0034


*Correspondence to: Hongping Yuan, School of Management, Guangzhou University, Guangzhou 510006, Guangdong, China. E-mail: hpyuan@gzhu.edu.cn

J Build Des Environ. 2026;4:202599. 10.70401/jbde.2026.0034

Received: November 05, 2025; Accepted: March 09, 2026; Published: March 12, 2026

This manuscript is made available in its unedited form to allow early access to the reported findings. Further editing will be completed before final publication. As such, the content may include errors, and standard legal disclaimers are applicable.

Abstract

This study addresses the interoperability of building information modeling (BIM) across different systems and platforms. Because the semantics of the industry foundation classes (IFC) standard are large and complex, traditional full semantic conversion methods are general-purpose, but they often lead to data expansion, which reduces system responsiveness and usability. In addition, the strong dependence of BIM models on specialized software further limits flexibility in practical use. Our novel framework combines a streamlined ontology with IFC, significantly improving memory efficiency and operational effectiveness compared to existing systems. Additionally, we integrate large language models for enhanced natural language processing in BIM data interactions. This harmonization of technologies not only simplifies system extension but also makes BIM data services more user-friendly and adaptable to various industry needs. By streamlining BIM data management and enriching data services, our approach broadens BIM’s applicability and improves data integration and extraction, establishing a more interactive and user-centric paradigm.

Keywords

Building information modeling, ontology interaction, large language models, intelligence semantic interaction, data management

1. Introduction

Building information modeling (BIM) plays a crucial role in the digital transformation of the construction industry, creating digital replicas of physical structures and compiling comprehensive lifecycle data of building components^[1,2]. However, integrating BIM data across various phases of engineering projects presents considerable challenges, mainly due to inconsistent data standards among different BIM software, leading to significant interoperability issues^[3]. Industry foundation classes (IFC), as a universal BIM data standard, alleviate some interoperability issues but have limitations. The complexity of IFC’s structure and its extensive learning curve hinder its widespread adoption, while the text-heavy nature of IFC files reduces efficiency^[4].

Researchers have sought to combine the IFC standard with database or ontology technologies to improve BIM data management^[5]. The core of these studies is the conversion of the IFC schema into higher-level data models, such as database schemas and the Industry Foundation Classes Web Ontology Language (IFCOWL) schema^[6]. After such a conversion, users can query data efficiently with Structured Query Language (SQL) or SPARQL Protocol and RDF Query Language (SPARQL) commands and can simplify querying further by designing customized views, which is difficult to achieve with the native IFC schema. Current research in this area has two main shortcomings. First, the complex structure of the IFC schema leads to system redundancy^[6,7]. Second, human-computer data interaction (HDI) in existing methods is mainly static (SHDI) and operates on fixed datasets, so direct user intervention is required whenever data requirements change. This is inefficient and demands a high level of user expertise and operational skill^[8]. Therefore, a lightweight system and a Dynamic Human-Data Interaction (DHDI) approach are essential.

Based on the analysis above, this study addresses two primary issues. The first is data redundancy in IFCOWL. To mitigate it, we propose a simplified IFC data management method centered on a concise ontology dedicated to representing building topology, which intentionally excludes the conversion of detailed entities in the IFC schema such as geometry and materials. This ontology captures building component instances and their spatial relationships. Users can then retrieve entity instances with SPARQL queries, which allows precise retrieval of components at the spatial level. Finally, the globally unique identifier (GUID) of each retrieved building element instance is used to extract its geometric, material, and other detailed attribute values from the text-based IFC dataset^[9]. Keeping only the topology in the ontology and deferring detailed attributes to the IFC file keeps the system lightweight and efficient.
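The final step described above, looking up detailed attributes in the text-based IFC file by GUID, can be illustrated with a minimal sketch. It assumes a STEP-encoded IFC file with one instance per line and the GUID as the first attribute of each rooted entity. The sample lines, GUID values, and the function names `index_by_guid` and `fetch_details` are hypothetical and are not taken from the paper's implementation.

```python
import re

# Hypothetical three-line excerpt of an IFC (STEP) file; real files are far larger.
SAMPLE_IFC = """\
#10=IFCWALL('2O2Fr$t4X7Zf8NOew3FLOH',#2,'Wall-01',$,$,#11,#12,$);
#20=IFCDOOR('0jf0rYHfX3RAB3bSIRjmoa',#2,'Door-01',$,$,#21,#22,$,2.1,0.9);
#30=IFCSLAB('3cUkl32yn9qRSPvBJVyWYp',#2,'Slab-01',$,$,#31,#32,$,.FLOOR.);
"""

# Assumed line shape: "#<id>=<TYPE>('<GUID>',<remaining attributes>);"
INSTANCE_RE = re.compile(r"^#(\d+)\s*=\s*(IFC\w+)\('([^']+)',(.*)\);\s*$")


def index_by_guid(ifc_text):
    """Build a GUID -> record index in a single pass over the IFC text."""
    index = {}
    for line in ifc_text.splitlines():
        match = INSTANCE_RE.match(line.strip())
        if match:
            step_id, ifc_type, guid, rest = match.groups()
            index[guid] = {
                "step_id": int(step_id),
                "type": ifc_type,
                "raw_attributes": rest,  # left unparsed in this sketch
            }
    return index


def fetch_details(guids, index):
    """Return the IFC records for GUIDs produced by an ontology query."""
    return [index[g] for g in guids if g in index]


# GUIDs as they might come back from a SPARQL query on the topology ontology.
details = fetch_details(["0jf0rYHfX3RAB3bSIRjmoa"], index_by_guid(SAMPLE_IFC))
```

Because the index is built once, each later lookup costs a dictionary access rather than a rescan of the file. A production version would need a proper STEP parser to handle multi-line instances and nested attribute lists.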

To address the second issue, we focus on constructing a DHDI framework. Unlike SHDI, DHDI enables computers to participate directly in the workflow. In this model, computers act as intelligent agents that take over tasks previously performed by the user^[10]. The key challenge in implementing DHDI lies in accurately parsing the semantic information of natural language and converting it into precise structured data instructions. Traditional keyword recognition and instance matching technologies only work for sentences with clear structures and struggle to handle complex or implied semantic expressions^[11]. Large language models (LLMs), which have developed rapidly, have proven effective in addressing this issue, thereby opening up new possibilities for intelligent agents and DHDI^[12]. Applying LLMs in professional domains allows originally complex tasks to be completed through natural language dialogue, significantly enhancing the naturalness, intuitiveness, and efficiency of human-computer interaction^[13].

Therefore, this study introduces LLMs as the core intelligent agents. LLMs receive and analyze data requirements expressed by users in natural language and, through semantic analysis, convert these into SPARQL commands usable by subsequent modules. Throughout the system’s operational cycle, LLMs manage data validation and transfer tasks between different submodules. This integration significantly reduces the need for users to deeply engage in the DHDI process, simplifies the workflow, and lowers the learning curve. In our proposed framework, we synergistically combine IFC-expressed BIM data, ontology technology, and LLM technology to provide users with enhanced BIM data services. This integration allows for intuitive, natural language interactions with BIM data, accommodating diverse industrial needs and use cases. It significantly improves the efficiency of data integration and extraction, effectively addressing the current shortcomings in BIM data management.
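The conversion of a natural language request into a validated SPARQL command can be sketched as follows. This is an illustration under stated assumptions, not the framework's actual agent: the `onto:` prefix and namespace URI, the prompt wording, the retry limit, and the `fake_llm` stand-in (used instead of a real model call) are all hypothetical. The vocabulary reuses class and property names from the lightweight ontology.

```python
import re

# Class and property names from the lightweight ontology; the "onto:" prefix
# and its namespace URI are assumptions made for this sketch.
VOCAB = {"OntoBuildingStorey", "OntoSpace", "OntoWall", "OntoDoor",
         "hasBuildingElement", "hasSpace"}
PREFIX = "PREFIX onto: <http://example.org/onto#>"


def build_prompt(question):
    """Compose a prompt that restricts the model to known ontology terms."""
    terms = ", ".join(sorted(VOCAB))
    return ("Translate the request into one SPARQL SELECT query.\n"
            f"Use only these onto: terms: {terms}\n"
            f"Request: {question}\nSPARQL:")


def validate_sparql(query):
    """Cheap sanity check: SELECT form and only known onto: terms."""
    if not re.search(r"\bSELECT\b", query, re.IGNORECASE):
        return False
    used = set(re.findall(r"\bonto:(\w+)", query))
    return bool(used) and used <= VOCAB


def to_sparql(question, llm, max_rounds=3):
    """Ask the model for a query, re-prompting until it validates."""
    prompt = build_prompt(question)
    for _ in range(max_rounds):
        candidate = llm(prompt).strip()
        if validate_sparql(candidate):
            return candidate
        prompt += "\nThe previous answer was invalid. Try again.\nSPARQL:"
    raise ValueError("no valid SPARQL produced")


def fake_llm(prompt):
    """Stand-in for a real model call; always returns a canned query."""
    return (PREFIX + "\nSELECT ?wall WHERE { "
            "?storey a onto:OntoBuildingStorey ; "
            "onto:hasBuildingElement ?wall . ?wall a onto:OntoWall . }")


query = to_sparql("Which walls belong to each storey?", fake_llm)
```

Validating the generated query before it reaches the query engine is one simple way to keep malformed or out-of-vocabulary commands from propagating to later modules. A real deployment would parse the query with a SPARQL library rather than rely on regular expressions.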

The remainder of this paper is organized as follows. Section 2 reviews the literature, summarizing the common methods of BIM data processing and analyzing their advantages and disadvantages, and outlines the objectives of this study on that basis. Section 3 designs a technical framework to achieve these objectives and details how each technology and method collaborates within it. Section 4 tests the effectiveness of the proposed methods through a case study. Finally, we analyze the effectiveness and limitations of our research, providing insights for future scholars to build upon this work.

2. Review of Related Literature

In this section, we systematically review the literature related to our study, focusing on two main aspects. The first aspect concerns BIM data processing methods related to the IFC. The second aspect involves HDI driven by natural language within the BIM domain.

2.1 Processing methods of BIM data

2.1.1 Direct processing methods

The proprietary format-based approach to BIM data processing relies on application programming interfaces (APIs) provided by BIM software vendors to parse vendor-specific formats, such as .rvt and .dgn. This relatively common method involves creating data import/export tools that facilitate the movement and use of BIM data across various software platforms^[14-16]. Because it is tailored to individual software formats, it constitutes a ‘point-to-point’ BIM data processing strategy, and its main limitation is its specificity to certain software platforms.

The IFC functions as a 3D building product data standard based on the object-oriented EXPRESS data specification language, essential for articulating BIM. As an open data standard, IFC seeks to overcome data interoperability challenges among diverse BIM software platforms by offering an intermediary format. To address different data application requirements across various project phases, buildingSMART has developed the information delivery manual (IDM)^[17] and model view definition (MVD)^[18] based on IFC. IDM delineates data requirements for distinct business phases, and MVD enables the extraction of specific segments of BIM model data. This method requires precise definitions and standardized methodologies to establish IDM/MVD, targeting specialized domain-specific data needs. Based on standardized IDM/MVDs, tools are developed by domain experts to extract data from IFC datasets. These tools support specific domain requirements and enhance data exchange processes in areas such as engineering measurement, cost estimation^[19], energy consumption analysis^[20], and construction schedule generation^[21]. Beyond standardized development processes, some researchers have explored more streamlined operations and development workflows to support the implementation of customized data extraction requirements. Won et al.^[22] introduced a method for extracting partial model data without requiring MVD definitions, thereby streamlining the operational process. This innovative approach depends solely on the relationships between data instances within the IFC instance model file to extract partial models. Further enhancing this concept, Mingjuan^[23] advanced the IFC partial model extraction algorithm, focusing on the identification and removal of invalid instances.
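The idea of extracting a partial model purely from the relationships between instances in an IFC file can be illustrated with a simple reference-closure traversal. The sketch below is not the algorithm of Won et al.^[22] or its later refinement^[23]; it only shows the general principle, assuming one STEP instance per line and following forward `#id` references from a set of seed instances. The sample data and function names are hypothetical.

```python
import re
from collections import deque

# Hypothetical IFC (STEP) excerpt: a wall with its placement chain, plus
# two instances that the wall does not reference.
SAMPLE = """\
#1=IFCCARTESIANPOINT((0.,0.,0.));
#2=IFCAXIS2PLACEMENT3D(#1,$,$);
#3=IFCLOCALPLACEMENT($,#2);
#4=IFCWALL('2O2Fr$t4X7Zf8NOew3FLOH',$,'Wall-01',$,$,#3,$,$);
#5=IFCCARTESIANPOINT((5.,0.,0.));
#6=IFCDOOR('0jf0rYHfX3RAB3bSIRjmoa',$,'Door-01',$,$,$,$,$,2.1,0.9);
"""

LINE_RE = re.compile(r"^#(\d+)\s*=\s*(.*);\s*$")
REF_RE = re.compile(r"#(\d+)")  # naive: would also match '#' inside strings


def parse_instances(text):
    """Map each instance id to the raw text of its definition."""
    instances = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            instances[int(match.group(1))] = match.group(2)
    return instances


def extract_partial(instances, seed_ids):
    """Keep the seeds and everything reachable through #id references."""
    keep, queue = set(), deque(seed_ids)
    while queue:
        current = queue.popleft()
        if current in keep or current not in instances:
            continue
        keep.add(current)
        queue.extend(int(ref) for ref in REF_RE.findall(instances[current]))
    return {i: instances[i] for i in sorted(keep)}


partial = extract_partial(parse_instances(SAMPLE), [4])
```

Starting from the wall (`#4`), the traversal keeps its placement chain and drops the unrelated point and door. Inverse relationships, such as relationship objects that point at the wall, are not followed here and would need a reverse index.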

However, the intricate data structure of BIM and the advanced programming skills it requires pose significant challenges and inefficiencies for practitioners. The lack of uniformity in proprietary BIM data formats exacerbates interoperability issues across different BIM platforms. While IFC offers a universal standard for BIM data storage and mitigates interoperability challenges to a degree, several critical issues remain, including semantic limitations, complex data structures, difficulties in data processing and analysis, and constrained extensibility^[24]. Because the IFC schema is subject to alternative definitions and interpretations of domain knowledge, its flexible data representation can be an obstacle to defining strict rules and robust validation processes for object attributes and references^[25]. This flexibility aims to provide a comprehensive representation of the various objects in the construction domain, despite the inherent incompleteness of IFC. The detailed data representation complicates processing and analysis, which then require specialized knowledge and advanced techniques. Finally, the overly detailed representation makes IFC-based BIM datasets excessively large. Consequently, efficient algorithms and tools are often required to extract, transform, and analyze IFC data to meet the demands of various application scenarios.

These challenges underscore the need for ongoing development and enhancement of BIM data standards to enable more effective and comprehensive data integration across various BIM systems. Current ontology methods fail to effectively balance expressive completeness and system efficiency, highlighting the need for a selective transformation strategy that retains only high-level semantic structures to support rapid querying.

2.1.2 Indirect processing methods

Researchers have converted the IFC schema into database schemas so that BIM data can be stored and managed in databases. Compared with extending the IFC schema, extending at the database schema level is easier and more efficient. The conversion from the IFC schema to a Web Ontology Language (OWL) schema is motivated by the same reasoning. The difference is that OWL offers advantages in semantic extension and in the expression, storage, and processing of unstructured data.

A notable example is BIMserver^[26], an open-source platform based on the IFC standard, providing centralized storage, querying, and sharing of BIM data. Building on BIMserver, Mazairac et al.^[27] developed BIMQL, a query language for BIM models, enhancing data management efficiency. Furthermore, researchers including Cheng et al.^[28], Solihin et al.^[29], and Guo et al.^[30] have explored converting IFC models to relational database models, using relational databases for IFC data management. Wyszomirski^[31] applied NoSQL databases for IFC data processing, aiming for BIM-GIS data integration, while Yang et al.^[32] and Gradišar et al.^[33] examined the combination of IFC data with other data sources using graph databases. These studies demonstrate the exploration of various database systems integrated with IFC to address different research challenges. However, it is important to note that constructing such systems is often complex, relying on specialized databases and resulting in a sophisticated ecosystem. Although these systems mainly focus on associating, storing, and querying structured data, they still face challenges in handling unstructured knowledge and data integration.

Semantic web technologies have been developed to provide a structured context for unstructured data. These technologies enable data to be understood not only by humans but also by machines, thereby supporting more intelligent information retrieval, data mining, and knowledge management. Their integration with BIM promises substantial benefits for the architecture, engineering, and construction industry^[34]. During the design phase, an ontology model of building codes can be developed using the Semantic Web, enabling the automation of various tasks. For instance, the system can automatically verify whether the fire-resistance rating of walls complies with relevant code requirements or generate building energy performance reports, thereby eliminating the need for manual, item-by-item checks^[35,36]. In practical engineering applications, semantic reasoning engines can perform complex queries similar to those in databases. For example, one can query all walls that are constructed with sustainable materials and meet specific seismic performance standards, a capability that is difficult to achieve with traditional BIM approaches^[37]. Furthermore, by leveraging semantic web technologies, such as the BIM-to-GEO standard, semantic mapping between GIS and BIM can be achieved, supporting the unified management of 3D city models^[38].

Ontologies, fundamental to the semantic web, offer a structured conceptual model for domain knowledge, enabling effective machine understanding and interpretation of data. OWL is designed for representing these ontologies, providing an array of tools and syntax to define and elaborate on classes, properties, relationships, and other complex semantic elements. In the realm of BIM, integrating BIM with the semantic web chiefly involves translating IFC schema into OWL schema. Pauwels’ pioneering work in converting IFC into an ontology format led to the creation of the IFCOWL ontology, now recognized as a standard by buildingSMART^[39]. The IFCOWL offers a comprehensive transformation of IFC schema, creating a corresponding individual in the IFCOWL ontology for each IFC instance.

Notably, IFCOWL files tend to occupy considerably more memory than their IFC counterparts, potentially obstructing efficient ontology-based reasoning and querying processes. As indicated in Table 1, examinations of five distinct models demonstrate that the memory usage of IFCOWL files is approximately 3-10 times greater than that of IFC files, with the variance contingent on the model’s complexity. This becomes particularly burdensome in large-scale projects and significantly limits the performance of ontology-based reasoning and querying. To address this issue, researchers have proposed semantics-oriented simplification strategies. For example, Wu et al. introduced a geometry simplification method based on semantic constraints: the level of geometric detail retained for each building component is dynamically determined according to its semantic attributes, such as type and function^[40]. This approach achieves selective geometric optimization while preserving, or even enhancing, the semantic completeness of the model, offering a practical way to balance semantic expressiveness with computational efficiency.

Table 1. Comparison of data volume between IFC and IFCOWL.

	Model 1	Model 2	Model 3	Model 4	Model 5
IFC (.ifc)	12 KB	3.09 MB	25.8 MB	49.1 MB	110 MB
IFCOWL (.ttl)	98 KB	9.50 MB	259 MB	427 MB	1.09 GB
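The size ratios behind the "approximately 3-10 times" statement can be recomputed directly from Table 1. The short calculation below is a sketch that assumes binary units (1 GB = 1024 MB, 1 MB = 1024 KB); with decimal units the last ratio changes slightly.

```python
# File sizes from Table 1 as (IFC, IFCOWL) pairs.
SIZES = {
    "Model 1": ("12 KB", "98 KB"),
    "Model 2": ("3.09 MB", "9.50 MB"),
    "Model 3": ("25.8 MB", "259 MB"),
    "Model 4": ("49.1 MB", "427 MB"),
    "Model 5": ("110 MB", "1.09 GB"),
}

# Conversion factors to megabytes (binary units assumed).
TO_MB = {"KB": 1 / 1024, "MB": 1.0, "GB": 1024.0}


def to_mb(size):
    """Convert a string such as '3.09 MB' to a float number of megabytes."""
    value, unit = size.split()
    return float(value) * TO_MB[unit]


# Ratio of IFCOWL size to IFC size for each model.
ratios = {name: to_mb(owl) / to_mb(ifc) for name, (ifc, owl) in SIZES.items()}

for name, ratio in ratios.items():
    print(f"{name}: IFCOWL is {ratio:.1f}x the IFC size")
```

The ratios range from roughly 3x (Model 2) to roughly 10x (Models 3 and 5), in line with the range stated in the text.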

Class	SubClass
OntoProject
OntoBuildingElement	OntoColumn, OntoBeam, OntoSlab, OntoWall, OntoDoor, OntoWindow, OntoStair, OntoChimney, OntoCovering, OntoCurtainWall, OntoFooting, OntoMember, OntoPile, OntoRailing, OntoRamp, OntoRampFlight, OntoRoof, OntoShadingDevice, OntoStairFlight, OntoBuildingElementProxy
OntoSpatialStructure	OntoSite, OntoBuilding, OntoBuildingStorey, OntoSpace

Object Property	Sub-Property	Domains	Ranges
hasSpatialStructure	hasSite	OntoProject	OntoSite
	hasBuilding	OntoSite	OntoBuilding
	hasBuildingStorey	OntoBuilding	OntoBuildingStorey
	hasSpace	OntoBuilding, OntoBuildingStorey	OntoSpace
hasBuildingElement		OntoSpatialStructure	OntoBuildingElement
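How these object properties support retrieval at the spatial level can be sketched with a small in-memory triple list. The example is illustrative only: the instance names are hypothetical, and a plain traversal stands in for the SPARQL query that the framework would run against the ontology.

```python
# Sub-properties of hasSpatialStructure, as listed in the table above.
SPATIAL_PROPS = {"hasSite", "hasBuilding", "hasBuildingStorey", "hasSpace"}

# Hypothetical instance data as (subject, property, object) triples.
TRIPLES = [
    ("project1", "hasSite", "site1"),
    ("site1", "hasBuilding", "bldg1"),
    ("bldg1", "hasBuildingStorey", "storey1"),
    ("bldg1", "hasBuildingStorey", "storey2"),
    ("storey1", "hasSpace", "space101"),
    ("storey1", "hasBuildingElement", "wall_a"),
    ("space101", "hasBuildingElement", "door_b"),
    ("storey2", "hasBuildingElement", "slab_c"),
]


def elements_under(root, triples):
    """Collect every building element contained in or below a spatial node."""
    found, stack, seen = [], [root], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for subject, prop, obj in triples:
            if subject != node:
                continue
            if prop in SPATIAL_PROPS:
                stack.append(obj)      # descend the spatial hierarchy
            elif prop == "hasBuildingElement":
                found.append(obj)      # element attached at this level
    return sorted(found)


storey1_elements = elements_under("storey1", TRIPLES)
```

Because the ontology holds only topology, a query like this touches a handful of triples per element instead of the geometry and material individuals that a full conversion would create.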

Data Property	Sub-Property	Data Type	Usage
InstanceID	Name	String	UniqueID
	ID
DataExtractTag		Bool	Data extraction marker
DataPrecisionTag	Geometry	Enumerate (A, B, C)	Data precision marker
	Property
	Material

Performance Dimension	Metric	Full IFCtoRDF Conversion	Proposed Lightweight Method
Space Complexity	Converted file size	1,393 MB	14 MB
	Size ratio to original IFC	870.6%	8.75%
Loadability	Protégé loading result	Memory overflow (failed)	Successfully loaded in 1s
Rendering Efficiency	Geometric primitives	876,131	148,876 (↓83.0%)
	GPU memory usage	98.59 MB	66.44 MB (↓32.6%)
	Primitive variable storage	18.8 MB	3.58 MB (↓81.0%)
LLM Interaction	Avg. rounds per task	—	2.3 rounds*
	Expert intervention rate	—	1 out of 3 rounds (33.3%)*
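The percentage figures in the performance table above can be checked from the raw values. The sketch below recomputes the reductions and the size ratios; the original IFC file size is not stated in the table, so it is inferred here from the 8.75% ratio, which is an assumption of this calculation.

```python
def reduction_pct(before, after):
    """Relative reduction from 'before' to 'after', in percent."""
    return (before - after) / before * 100


# (full conversion, lightweight method) pairs from the table above.
METRICS = {
    "Geometric primitives": (876_131, 148_876),
    "GPU memory usage (MB)": (98.59, 66.44),
    "Primitive variable storage (MB)": (18.8, 3.58),
}

reductions = {name: reduction_pct(b, a) for name, (b, a) in METRICS.items()}

# Original IFC size inferred from "14 MB is 8.75% of the original" (assumption).
original_ifc_mb = 14 / 0.0875
full_ratio_pct = 1393 / original_ifc_mb * 100

for name, value in reductions.items():
    print(f"{name}: reduced by {value:.1f}%")
print(f"Inferred original IFC size: {original_ifc_mb:.0f} MB")
print(f"Full conversion size ratio: {full_ratio_pct:.1f}%")
```

The recomputed values agree with the table: reductions of 83.0%, 32.6%, and 81.0%, and a full-conversion ratio of about 870.6% for an inferred original file of about 160 MB.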

	Primitives	GPU Memory	Primitive Variable	Topology
Geometry	876,131	98.59 MB	18.8 MB	21.37 MB
Simple Geometry	148,876	66.44 MB	3.58 MB	21.37 MB

	Geometry	Material	Property
A	Fine	Photorealistic Rendering	All Properties
B	Medium	Base Color	Partial Properties
C	Simple	White Model	No Properties

Journal of Building Design and Environment
