Chaoqun Zhuang, College of Urban Construction, Nanjing Tech University, Nanjing 211816, Jiangsu, China. E-mail: zhuangcq@njtech.edu.cn
1. Introduction
Automated fault detection and diagnosis (FDD) of heating, ventilation and air conditioning (HVAC) systems is a key technique for modern building energy system management, maintenance, and predictive control within the indoor and built environment[1,2]. Early and accurate HVAC FDD is essential for maintaining indoor air quality, ensuring occupant comfort, improving energy efficiency, and reducing carbon emissions[3]. Notably, HVAC faults can lead to energy losses of up to 20% of total building energy consumption[4]. Therefore, implementing effective FDD strategies is critical to ensuring the reliable operation and optimization of building energy systems.
With advances in sensor technologies and artificial intelligence (AI), data-driven FDD methods[5] have gained considerable attention, as they reduce reliance on expert knowledge and enable scalable, automated diagnostics. However, their performance and widespread adoption are constrained by two key challenges: (1) the shortage of fault data, owing to the infrequent and costly nature of fault occurrences; and (2) the scarcity of labelled samples, given the high cost and specialized expertise required for manual annotation.
(1) Lack of fault data. HVAC systems generally operate under normal conditions, and faults often develop gradually. This makes the collection of real-world fault data both challenging and costly, especially for large equipment such as chillers. As a result, the number of available fault samples is significantly lower than that of normal samples. This imbalance, commonly referred to as the data imbalance problem in computer science[6], poses a major challenge. In such scenarios, machine learning models may struggle to accurately identify fault samples because of highly biased classifiers, ultimately reducing the effectiveness of fault diagnosis.
(2) Scarcity of labelled fault samples. In real-world building management systems, the majority of collected HVAC operational data are unlabeled. Identifying fault categories within these data and assigning accurate labels requires manual inspection, which is time-consuming and labor-intensive. Therefore, obtaining labelled fault samples remains a significant challenge for data-driven methods.
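The effect of data imbalance on a biased classifier can be illustrated with a small numerical sketch. The class counts below (950 normal samples, 50 fault samples) and the helper name `accuracy_and_fault_recall` are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
import numpy as np

def accuracy_and_fault_recall(y_true, y_pred):
    """Return (overall accuracy, recall on the fault class labelled 1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    fault_mask = y_true == 1
    # Fraction of true fault samples that were predicted as faults.
    recall = float(np.mean(y_pred[fault_mask] == 1))
    return accuracy, recall

# Hypothetical imbalanced data set: 950 normal samples (0), 50 fault samples (1).
y_true = np.array([0] * 950 + [1] * 50)

# A degenerate classifier that always predicts "normal".
y_pred = np.zeros_like(y_true)

accuracy, recall = accuracy_and_fault_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, fault recall={recall:.2f}")
# prints: accuracy=0.95, fault recall=0.00
```

A classifier that never reports a fault still reaches 95% accuracy on these data, which is why class-wise metrics such as recall are more informative than overall accuracy when fault samples are rare.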
2. Generative Models for Data Augmentation in HVAC FDD
Recent advances in generative models offer promising solutions for data augmentation in HVAC FDD scenarios, directly tackling challenges related to data scarcity and imbalance. Common generative techniques include the synthetic minority oversampling technique (SMOTE), variational autoencoder (VAE), generative adversarial network (GAN), diffusion model (DM), and hybrids of these methods. These methods generate synthetic fault samples within the data space, thereby enhancing the diagnostic performance of classifiers.
SMOTE is a classic oversampling technique that generates new samples by interpolating between minority class instances and their nearest neighbors to achieve class balance[7]. Because its data augmentation approach is straightforward, SMOTE is often combined with other machine learning methods, such as support vector machines and principal component analysis (PCA), to enhance its effectiveness. For example, the integration of PCA and SMOTE has been applied to address class imbalance in fault datasets of packaged rooftop units, thereby improving the model’s capability to accurately detect and diagnose minority fault classes[8].
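The interpolation step described above can be sketched in a few lines of NumPy. This is a minimal illustration of the SMOTE idea under stated assumptions, not the implementation used in the cited studies: the function name `smote_like_oversample`, the brute-force neighbor search, and the random stand-in data are all hypothetical. For real projects, a maintained implementation such as the one in the imbalanced-learn package is a more suitable choice.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbors.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Brute-force pairwise Euclidean distances (fine for small fault sets).
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample, one of its neighbors, and an interpolation weight.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Hypothetical minority (fault) class: 20 samples with 4 sensor features.
X_fault = np.random.default_rng(1).normal(size=(20, 4))
X_new = smote_like_oversample(X_fault, n_new=60, k=5, seed=0)
print(X_new.shape)  # (60, 4)
```

Because every synthetic point is a convex combination of two real fault samples, the generated data never leave the region already covered by the minority class, which is both the strength and the main limitation of this approach.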
The VAE is a type of probabilistic generative model that learns to encode input data into a latent space and to reconstruct the data through a decoder[9]. The encoder maps the input data to a probability distribution within the latent space, while the decoder takes samples drawn from this distribution and generates new data that resemble the original data distribution. CVAE-based techniques have been applied to fault diagnosis in air handling units (AHUs) to address data imbalance issues[10]. Additionally, a variable-β convolutional VAE has been developed to extract discriminative 2D features and enable robust classification across machine health states, especially under noisy and complex operating conditions[11]. VAEs have a relatively simple network structure, but they often have limited capacity to capture complex, high-dimensional dependencies, which restricts their broader application in the FDD of HVAC systems.
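For reference, the encoder-decoder training objective described above is commonly written as a β-weighted evidence lower bound. The expression below is the standard textbook form, given here as a sketch; the notation (encoder q_φ, decoder p_θ, prior p(z), weight β) is assumed for illustration, and the exact losses used in the cited works[10,11] may differ.

```latex
\mathcal{L}(\theta,\phi;x)
  = \mathbb{E}_{q_\phi(z\mid x)}\!\left[\log p_\theta(x\mid z)\right]
  - \beta\, D_{\mathrm{KL}}\!\left(q_\phi(z\mid x)\,\|\,p(z)\right),
\qquad
z = \mu_\phi(x) + \sigma_\phi(x)\odot\epsilon,\quad
\epsilon\sim\mathcal{N}(0,I)
```

The first term rewards accurate reconstruction, while the second keeps the encoded distribution close to the prior so that new samples can be generated by decoding draws from p(z). Setting β = 1 recovers the standard VAE objective.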
GANs are widely adopted unsupervised generative models[12]. In GANs, the generator and discriminator are trained adversarially, enabling the model to learn the underlying distribution of the input data and generate new samples that closely resemble the original ones. An automatic FDD framework was developed based on a Conditional Wasserstein GAN (CWGAN)[13], where a multilayer perceptron was integrated to enhance diagnostic performance in chiller systems. Moreover, recent studies have highlighted the effectiveness of adversarial autoencoder variants, such as the adaptive adversarial autoencoder[14], in unsupervised fault detection tasks for AHUs. These methods leverage latent space regularization and compactness strategies to identify anomalies without prior knowledge of fault labels, achieving superior detection performance compared to traditional shallow models or VAE-based approaches. Although existing GAN-based methods have shown promising results in addressing the scarcity of fault samples, they typically depend on labeled fault data for training, neglecting the use of unlabeled samples and thereby limiting further improvements in the quality of synthetic data.
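The adversarial training described above corresponds to the minimax objective of the original GAN formulation[12], reproduced here for reference. The symbols G (generator), D (discriminator), p_data (real data distribution) and p_z (noise prior) follow the usual convention.

```latex
\min_{G}\max_{D}\; V(D,G)
  = \mathbb{E}_{x\sim p_{\mathrm{data}}(x)}\!\left[\log D(x)\right]
  + \mathbb{E}_{z\sim p_{z}(z)}\!\left[\log\left(1 - D(G(z))\right)\right]
```

Conditional variants additionally feed a class label, for example a fault category, to both G and D so that samples of a specific fault class can be generated on request.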
Although SMOTE, VAE and GAN each offer distinct advantages for synthesizing fault data in HVAC systems, they also have specific limitations. To capitalize on the strengths of these methods, hybrid frameworks have been developed and have shown promising results in HVAC FDD. A representative example is the integration of VAE with Wasserstein GANs using gradient penalty (WGAN-GP) for fault diagnosis in variable refrigerant flow systems, which has led to significant improvements in diagnostic accuracy and model robustness[15]. Despite these encouraging research outcomes, GANs and their variants often suffer from training instabilities[16], such as convergence difficulties and mode collapse, which can considerably degrade the quality of generated fault data in HVAC applications.
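As background on the WGAN-GP component mentioned above, the critic loss is usually written in the following standard form. It is given as a sketch only; the notation (critic D, real distribution P_r, generator distribution P_g, penalty weight λ) is assumed for illustration, and the hybrid framework in [15] may combine it with further terms.

```latex
L_{D}
  = \mathbb{E}_{\tilde{x}\sim\mathbb{P}_{g}}\!\left[D(\tilde{x})\right]
  - \mathbb{E}_{x\sim\mathbb{P}_{r}}\!\left[D(x)\right]
  + \lambda\,\mathbb{E}_{\hat{x}\sim\mathbb{P}_{\hat{x}}}\!\left[\left(\lVert\nabla_{\hat{x}} D(\hat{x})\rVert_{2} - 1\right)^{2}\right],
\qquad
\hat{x} = \epsilon x + (1-\epsilon)\tilde{x},\quad \epsilon\sim U[0,1]
```

The gradient penalty term encourages the critic to have unit gradient norm along lines between real and generated samples, which tends to make training more stable than the original GAN objective.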
The denoising DM is an emerging data augmentation technique that learns to reverse a gradual noising process, removing noise step by step to generate synthetic data samples that approximate the real data distribution[17]. This model exhibits stable training dynamics and has shown strong potential in image-based data generation. Several studies have shown that DMs can provide a more robust solution than GANs, stabilizing the synthetic data generation process in HVAC FDD applications[18]. For example, the Gramian angular field transformation has been employed to convert one-dimensional chiller fault time-series data into image representations, which are then processed using a denoising diffusion probabilistic model to generate high-quality synthetic fault samples[19]. The results reported in that study confirm the effectiveness and superiority of the denoising diffusion model for generating chiller fault diagnosis data.
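The two ingredients of such a pipeline, the Gramian angular field transformation and the forward noising process that a diffusion model learns to invert, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of [19]: the function names, the summation variant of the Gramian angular field, the sine wave standing in for a sensor trace, and the linear noise schedule (1e-4 to 0.02 over 1000 steps, a common choice for denoising diffusion probabilistic models) are all assumptions made here.

```python
import numpy as np

def gramian_angular_summation_field(series):
    """Convert a 1-D, non-constant time series into a 2-D image (GASF variant)."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    # Rescale to [-1, 1] so that arccos is defined; clip guards rounding errors.
    x = np.clip(2.0 * (x - lo) / (hi - lo) - 1.0, -1.0, 1.0)
    phi = np.arccos(x)  # polar-coordinate angle of each time step
    return np.cos(phi[:, None] + phi[None, :])

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t from the closed-form forward process q(x_t | x_0)."""
    alpha_bar = np.cumprod(1.0 - betas)
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise

rng = np.random.default_rng(0)

# Hypothetical stand-in for a chiller sensor trace with 32 time steps.
signal = np.sin(np.linspace(0.0, 2.0 * np.pi, 32))
image = gramian_angular_summation_field(signal)  # shape (32, 32)

# Assumed linear noise schedule with 1000 steps.
betas = np.linspace(1e-4, 0.02, 1000)
noisy, eps = forward_diffuse(image, t=999, betas=betas, rng=rng)
print(image.shape, noisy.shape)  # (32, 32) (32, 32)
```

During training, a neural network is fitted to predict `eps` from `noisy` and the step index `t`; generation then starts from pure noise and applies the learned denoiser step by step. The network and the sampling loop are omitted here.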
Recent developments in large language models (LLMs) show great potential for data augmentation in HVAC FDD scenarios[20,21]. The large and complex neural network architectures of LLMs offer new paradigms for generating reliable fault data based on existing HVAC configurations. It is anticipated that further research leveraging LLMs will drive significant advances in FDD within the indoor and built environment sectors.
3. Conclusion and Outlook
This editorial has reviewed the current state and recent advances in generative modelling for FDD in HVAC systems, focusing particularly on data generation techniques designed to address data imbalance under conditions of limited fault samples. The primary generative models applied in HVAC systems include SMOTE, VAE, GAN, DM, and their variants. Recent studies have demonstrated that these generative models play a crucial role in improving FDD performance and enhancing the identification of minority fault classes. Among them, GAN-based data generation frameworks have been the most extensively studied and applied due to their strong data fitting and generation capabilities. Various GAN-based enhancements have been proposed, effectively mitigating the adverse effects of fault sample scarcity on FDD model performance. By contrast, the application of denoising DMs for fault data generation in HVAC systems is still in its infancy, with relatively few studies conducted to date. Given the superior performance of DMs in fields such as image generation, their potential for HVAC system FDD merits further investigation, offering promising new directions to improve diagnostic accuracy and generalization. Furthermore, the emergence of LLMs presents exciting opportunities for data augmentation and diagnostic enhancement in next-generation FDD systems.
Authors’ contribution
Yan K: Methodology, writing-original draft.
Yang B, Zhuang C: Conceptualization, writing-review & editing.
Conflicts of interest
Bin Yang is an Editorial Board member of the Journal of Building Design and Environment. The other authors declare no conflicts of interest.
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Availability of data and materials
Not applicable.
Funding
None.
Copyright
© The Author(s) 2025.
References
1. Bi J, Wang H, Yan E, Wang C, Yan K, Jiang L, et al. AI in HVAC fault detection and diagnosis: A systematic review. Energy Rev. 2024;3(2):100071.
2. Mirnaghi MS, Haghighat F. Fault detection and diagnosis of large-scale HVAC systems in buildings using data-driven methods: A comprehensive review. Energy Build. 2020;229:110492.
3. Shaban IA, Salem H, Abdullah AY, Ameri HMAQA, Alnahdi MM. Maintenance 4.0 for HVAC systems: Addressing implementation challenges and research gaps. Smart Cities. 2025;8(2):66.
4. Teraoka H, Balaji B, Zhang R, Nwokafor A, Narayanaswamy B, Agarwal Y. Buildingsherlock: Fault management framework for HVAC systems in commercial buildings. 2014. Available from: https://www.synergylabs.org/Balaji_TechnicalReport2014_BDSherlock.pdf
5. Mirnaghi MS, Haghighat F. Fault detection and diagnosis of large-scale HVAC systems in buildings using data-driven methods: A comprehensive review. Energy Build. 2020;229:110492.
6. Ali H, Salleh MNM, Saedudin R, Hussain K, Mushtaq MF. Imbalance class problems in data mining: A review. Indones J Electr Eng Comput Sci. 2019;14(3):1560-1571.
7. Yan K, Chong A, Mo Y. Generative adversarial network for fault detection diagnosis of chillers. Build Environ. 2020;172:106698.
8. Tra V, Amayri M, Bouguila N. Unsupervised outlier detection using neural network-based mixtures of probabilistic principal component analyzers for building chiller fault diagnosis. Build Environ. 2022;225:109620.
9. Asesh A. Variational autoencoder frameworks in generative AI model. In: 2023 24th International Arab Conference on Information Technology (ACIT); 2023 Dec 6-8; Ajman, United Arab Emirates. Piscataway: IEEE; 2023. p. 1-6.
10. Fan C, Li X, Zhao Y, Wang J. Quantitative assessments on advanced data synthesis strategies for enhancing imbalanced AHU fault diagnosis performance. Energy Build. 2021;252:111423.
11. Dewangan G, Maurya S. Fault diagnosis of machines using deep convolutional beta-variational autoencoder. IEEE Trans Artif Intell. 2022;3(2):287-296.
12. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial networks. Commun ACM. 2020;63(11):139-144.
13. Yan K, Su J, Huang J, Mo Y. Chiller fault diagnosis based on VAE-enabled generative adversarial networks. IEEE Trans Autom Sci Eng. 2022;19(1):387-395.
14. Tra V, Amayri M, Bouguila N. Latent code description for unsupervised AHU fault detection using adaptive adversarial autoencoder. IEEE Trans Autom Sci Eng. 2024.
15. Zhang J, Li Z, Chen H, Cheng H, Xing L, Wang Y, et al. Integrated generative networks embedded with ensemble classifiers for fault detection and diagnosis under small and imbalanced data of building air condition system. Energy Build. 2022;268:112207.
16. Ahmad Z, Jaffri ZA, Chen M, Bao S. Understanding GANs: Fundamentals, variants, training challenges, applications, and open problems. Multimed Tools Appl. 2025;84:10347-10423.
17. Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, editors. Advances in Neural Information Processing Systems 33 (NeurIPS 2020); 2020 Dec 6-12; Vancouver, Canada. United States: Curran Associates Inc.; 2020. p. 6840-6851. Available from: https://graphics.stanford.edu/courses/Denoising%20Diffusion%20Probabilistic%20Models%202006.11239.pdf
18. Song MK, Niaz A, Umraiz M, Iqbal E, Soomro S, Choi KN. Denoising diffusion-based image generation model using principal component analysis. IEEE Access. 2024;12:170487-170498.
19. Zhang X, Zhang W, Wen S, Ding Q. Augmentation framework for HVAC fault diagnosis based on denoising diffusion models. J Build Eng. 2025;106:112646.
20. Zhang J, Zhang C, Lu J, Zhao Y. Domain-specific large language models for fault diagnosis of heating, ventilation, and air conditioning systems by labeled-data-supervised fine-tuning. Appl Energy. 2025;377(Part A):124378.
21. Langer G, Hirsch T, Kern R, Kohl T, Schweiger G. Large language models for fault detection in buildings’ HVAC systems. In: Jørgensen BN, Ma ZG, Wijaya FD, Irnawan R, Sarjiya S, editors. Energy Informatics. Cham: Springer; 2024. p. 49-60.
Copyright
© The Author(s) 2025. This is an Open Access article licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.