Surface electromyography-based action recognition towards exoskeleton control for construction tasks

Zihao Zhou; Yantao Yu

doi:10.70401/jbde.2026.0038

*Correspondence to: Yantao Yu, Department of Civil and Environmental Engineering, Hong Kong University of Science and Technology, Hong Kong, China. E-mail: ceyantao@ust.hk

J Build Des Environ. 2026;4:2025131. 10.70401/jbde.2026.0038

Received: December 05, 2025 | Accepted: April 09, 2026 | Published: April 14, 2026

This article belongs to the Special Issue Health and Safety Management in Construction: Innovations and Challenges

Abstract

Construction work involves heavy, repetitive loads and awkward postures that increase the risk of musculoskeletal injury. As a result, there is a recognized demand for dependable exoskeleton support to improve workers’ safety and productivity. The main objective of this study was to improve exoskeleton control by using surface electromyography (sEMG) to infer the motion intentions of users, thereby preventing mistimed torque and excessive force. To address signal noise and inter-individual variability, this study proposes a hybrid model that combines nonnegative matrix factorization to extract transferable muscle coordination patterns, a convolutional neural network to learn robust local cross-channel features, and a Transformer encoder to capture long-range temporal dependencies. The approach was tested on a bricklaying dataset that mimics the actual conditions on construction jobsites. The model achieved 87% action-recognition accuracy and generalized across subjects, illustrating robustness to signal artifacts as well as to differences in tools, postures, and workloads. The results support sEMG as a practical, non-invasive control signal for exoskeletons and indicate that the proposed model can enable timely and precise assistance across a variety of tasks. This study extends previous research on intent recognition for field-ready exoskeleton control and establishes a foundation for future multi-task adaptation and personalized assistance in construction.

Keywords

Surface electromyography, activity recognition, transformer

1. Introduction

The construction industry remains a labour-intensive sector in which most operations are performed manually. Because of this reliance, musculoskeletal injuries, which are generally caused by elevated physical loads and awkward postures, are the most frequent type of injury among workers^[1]. The introduction and scaling of new technologies such as modular integrated construction, systematic automation, and human-machine collaboration can significantly increase the sector’s productivity and reduce worker hazards on site^[2]. Nevertheless, obstacles such as a shortage of on-site data, highly repetitive tasks, and processes that are difficult to standardize impede the adoption of such technologies^[3,4]. The industry therefore still faces the problem of how to increase worker productivity while decreasing the risk of injuries.

Exoskeletons, as one class of wearable technology, have been shown to increase productivity, mitigate fatigue, and lower the risk of injuries^[5,6]. Both passive and powered exoskeletons are increasingly being employed in the construction sector for activities that involve manual lifting and repetitive movements^[7]. Recent studies show that, under specific experimental conditions and for workers adopting certain postures, these devices can lessen muscle activation levels and prolong the time before fatigue sets in, thereby reducing the ergonomic risk associated with physical labour^[8]. More precisely, research on shoulder and back support has revealed decreased activation of the lumbar muscles during manual handling tasks^[9].

Recent developments have seen soft pneumatic actuators increasingly applied in exoskeletons. These devices assist muscle movement through airbag support, offering the advantage of greater portability. For instance, new devices utilizing flat inflatable artificial muscle technology can alleviate strain on back muscles^[10], while asymmetrically expanding artificial pneumatic muscles reduce load on the biceps brachii^[11]. However, a current limitation of such devices is their insufficient dynamic responsiveness, preventing timely and precise delivery of optimal muscle assistance.

Nevertheless, deploying exoskeletons in the construction sector remains difficult. Real construction environments are non-ideal, characterized by noise and complex decision-making, so accurately recognizing the user’s intent is very hard^[12]. A delayed or false input can not only lower worker efficiency but also induce incorrect postures that increase the likelihood of injury^[13]. To address these problems, surface electromyography (sEMG) has been adopted as a suitable, non-invasive interface that can serve as an input to assistive systems, allowing them to respond dynamically to construction task demands in different situations^[14]. Using sEMG signals as the information source for exoskeleton systems can significantly improve the identification and classification of workers’ movement intentions. Advances in deep learning over the last few years have paved the way for the use of multi-channel sEMG data in action classification. With feature-learning strategies based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs), these methods can be applied efficiently to the precise recognition of complex worker activities^[15].

sEMG faces many challenges in field applications, including noise from electrode displacement and perspiration. These challenges significantly affect signal quality and classification performance^[16]. The issues are compounded in construction. Under such physically demanding conditions, irregular movements and fatigue further degrade sensing accuracy, because they alter how actions and intentions are expressed in the signal. As workers multitask and handle several tools, changes in working posture and transitions between tasks introduce uncertainty, leading to large individual differences in electromyographic (EMG) behavioural patterns and their spatiotemporal manifestations.

One way to tackle this variability is to use modelling techniques that learn action representations transferable across individuals, without having to adapt to the unique features of each person. Without robust cross-individual generalization, classifiers struggle to recognize the same behaviours when they are performed by different workers. Classification performance can therefore depend on changes in muscle activation patterns or differences in tasks. Such differences may cause force feedback to be delivered too early or too late, adversely affecting construction safety, worker comfort, and construction efficiency. Hence, real-time, high-accuracy action classification remains a difficult problem. To solve it, systems should suppress signal artifacts, improve cross-individual representation learning, and accurately recognize similar action patterns across multiple workers.

This study focuses on the effective classification of worker movements using sEMG in real construction sites. The research objective is to develop a robust recognition framework capable of operating effectively across diverse worker populations, task types, and environmental conditions. Additionally, this study employs non-negative matrix factorization (NMF) to extract muscle co-activation patterns from subjects, effectively reducing noise interference in multi-channel signals. CNNs focus on local features to identify distributed characteristics of specific actions. Furthermore, the Transformer encoder enables recognition in construction scenarios by capturing long-term temporal context.

Consequently, the model demonstrates robust classification accuracy under local noise and across different subjects, revealing significant potential for exoskeleton control applications. These findings indicate that the proposed model enhances bricklayer motion classification precision and signal robustness, establishing a viable foundation for real-time sEMG-based exoskeleton systems suitable for field deployment.

2. Literature Review

2.1 Applications of exoskeletons in construction

In the construction industry, exoskeleton devices are widely recognized as having significant potential to enable workers to perform tasks more efficiently and safely. However, numerous obstacles remain due to limitations in equipment configuration and the inability to fully adapt to construction environments^[17]. Brunner et al.^[18] show that specific devices, such as shoulder support systems, are effective in alleviating stress during high-position work by applying external joint torque or load transfer mechanisms. Back support exoskeletons reduce musculoskeletal stress on the torso during bending and lifting tasks, maintaining spinal stability while decreasing pressure on the spine^[19]. De Looze et al.^[20] classify exoskeletons by actuation mechanism as active or passive. Passive exoskeletons employ mechanical structures, such as springs, elastic bands, or dampers, to store and release energy without needing an external power source. Active exoskeletons, in contrast, use motors or pneumatic, hydraulic, or similar actuators to produce force. Sensors such as force, position, inertial measurement unit (IMU), and electromyography sensors are fused within the control structures of active exoskeletons to detect human intention and accurately enable human-driven mechanical behaviour^[21,22].

2.2 Application of sEMG in worker exoskeletons

sEMG is a promising source of feedback signals for exoskeleton control. It allows the device not only to recognize the worker’s movement but also to respond to it intelligently and proactively, thereby increasing the exoskeleton’s usability. sEMG-based control methods fall into three broad categories: threshold control, proportional control, and pattern recognition, which range in complexity from simple threshold processing to complicated intent inference. Threshold control is inherently capable of handling only simple tasks and is insufficiently flexible^[23]. More sophisticated approaches therefore rely on mathematical models. For example, an inverse proportional relationship between exoskeleton output and sEMG signal amplitude can be set to linearly optimize control performance. In recent years, machine learning has enabled task pattern recognition, in which muscle coordination is analysed through sEMG signals to predict worker intent. Apart from its role as a feedback signal, sEMG is also useful for ergonomic assessment. For instance, Bangaru et al.^[24] highlight that a major use of sEMG in worker systems is the real-time measurement of workload and muscle fatigue. Additionally, muscle movement pattern recognition helps enhance a model’s generalization ability on multi-person datasets^[25]. Furthermore, it has been suggested that combining sEMG with IMU Euler angle data provides a more comprehensive approach for identifying hazardous postures^[26].

2.3 Current challenges and limitations

Among the limitations noted in the literature, cross-individual distribution shift remains a key source of recognition error. Differences in muscle morphology, subcutaneous fat distribution, and movement strategies lead to drifts in amplitude and spectral features. As a result, significant performance drops are often observed during leave-one-subject-out (LOSO) validation or testing. A study by Gong et al.^[27] shows that standard deep-learning models exhibit limitations in cross-subject generalization, particularly in real-world scenarios involving micro displacement of electrodes and perspiration artifacts. Ling et al.^[28] demonstrate that adversarial domain adaptation produces positive, albeit modest, improvements, while Chen et al.^[29] show that contrastive learning remains robust to tool switching. A further line of work concerns modelling the coordination of muscle groups. Several recent designs incorporate graph and spatial topology, including adaptive graph convolutional networks (GCNs) and spatial-temporal GCNs, to improve cross-subject alignment^[30]. Yet Ferdousi et al.^[31] contend that data-driven topology learning may be unstable when the array geometry is biased or when electrode positions deviate from the learned topology.

Furthermore, the interpretability of these models is hindered by their lack of physiological a priori constraints. For HD-sEMG, although manifold convolutions provide robustness against local noise, performance degrades, especially with sparse or irregular electrode arrays and missing channels^[32]. Meta-learning combined with personalized adaptation has therefore emerged as a promising avenue for enhancing inter-person performance. Ye et al.^[33] show that architectures such as Adapter/FiLM, when trained with sparse calibration or meta-learning, can achieve fast convergence and performance improvements for novice operators. However, empirical data show that many industrial environments face challenges related to memory constraints and cumulative bias during computation, driven by a combination of calibration costs, delayed operation, and long-term drift. Moreover, Al-Li et al.^[34] emphasize that online self-training methods require stronger safeguards to ensure the validity of non-standard labels, such as pseudo-labels, and to reduce error propagation. Finally, regarding multi-modal fusion, which combines sEMG with IMUs, force sensors, or visual input, Rolandino et al.^[35] demonstrate stable improvements in inter-subject motion alignment and show that temporal anchors reduce inter-individual variability in sEMG. However, recent studies reveal challenges to widespread use, especially in noisy construction settings. According to Bi et al.^[36], these include cross-modal synchronization errors, heterogeneous devices, and constraints on energy usage and privacy.

3. Methods

This study implemented the framework shown in Figure 1 for collecting and processing data from subjects before feeding it into the classification model. Multi-channel sEMG signals recorded from designated muscle regions in the arm and lower back undergo fixed pre-processing before serving as input signals. Following self-organizing clustering decomposition, data is projected onto the subspace of the muscle synergy matrix to extract stable H-statistic features. Finally, a CNN-Transformer hybrid model integrates local and long-range temporal information to achieve action classification.

Figure 1. Data processing flowchart. CNN: convolutional neural network.

3.1 Data acquisition and pre-processing

A total of 10 workers were organized into teams to perform bricklaying, plastering, and rebar tying tasks. The behavioural patterns for each action are shown in Figure 2. The experiment was conducted at an outdoor construction site, with each worker working for 5 to 10 minutes. The workers’ actions were classified into four categories: bending, laying bricks, plastering, and measuring verticality. The dataset is characterized by many repetitive construction actions. The sEMG sensors captured six muscle channels: the trapezius, biceps, and triceps on both sides of the body. Electrode placement followed SENIAM or equivalent standards, with protocols ensuring consistent electrode positioning and fatigue control. The skin was cleaned using 75% alcohol swabs, and electrode spacing was kept uniform.

Figure 2. Action classifications of bricklaying.

The sEMG signals were sampled at 2 kHz. To strictly prevent any potential data leakage, the dataset was partitioned into training and testing sets prior to the sliding window segmentation. Specifically, for each subject, sEMG recordings from the first 80% of the experimental trials were assigned to the training set, while the remaining 20% of the trials were reserved for the test set.

Only after this temporal separation were the continuous signals segmented into overlapping time windows of 200 ms with a 50% overlap rate. After extracting and processing these discrete segments, the total number of valid samples reached 25,844. Among these, Category 2 accounted for the largest proportion (58%), while the remaining three categories were distributed relatively evenly.
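
The split-then-segment order described above can be sketched as follows. This is a minimal illustration rather than the authors’ code: the helper names `split_trials` and `sliding_windows`, the trial length, and the random placeholder data are assumptions. Only the 2 kHz sampling rate, the 200 ms window, the 50% overlap, and the 80/20 trial split come from the text.

```python
import numpy as np

FS = 2000                   # sampling rate in Hz, as stated in the text
WIN = FS * 200 // 1000      # 200 ms window -> 400 samples
STEP = WIN // 2             # 50% overlap -> hop of 200 samples


def split_trials(trials, train_frac=0.8):
    """Return (train, test): the first 80% of trials, then the remainder."""
    n_train = int(round(train_frac * len(trials)))
    return trials[:n_train], trials[n_train:]


def sliding_windows(signal, win=WIN, step=STEP):
    """Cut a (channels, samples) array into (n_windows, channels, win) segments."""
    n_samples = signal.shape[1]
    segments = [signal[:, s:s + win] for s in range(0, n_samples - win + 1, step)]
    if not segments:
        return np.empty((0, signal.shape[0], win))
    return np.stack(segments)


# Hypothetical subject: 10 trials of 1 s each, 6 channels (random placeholder data).
rng = np.random.default_rng(0)
trials = [rng.standard_normal((6, FS)) for _ in range(10)]

# Split first, window second, so no window straddles the train/test boundary.
train_trials, test_trials = split_trials(trials)
X_train = np.concatenate([sliding_windows(t) for t in train_trials])
X_test = np.concatenate([sliding_windows(t) for t in test_trials])
```

Because windows are cut inside each trial only after the split, overlapping windows never share samples between the training and test sets.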

Line interference was removed with 50/60 Hz notch filtering (including harmonics when needed), followed by a 20-450 Hz band-pass filter. Signals were then full-wave rectified and smoothed with a moving root mean square (RMS) envelope of 10-20 ms, implemented as follows.

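(1) $\mathrm{RMS}[t] = \sqrt{\frac{1}{M} \sum_{\tau = t - M + 1}^{t} x[\tau]^{2}}$

where $x[\tau]$ is the signal sample at index $\tau$ and M is the RMS window length in samples.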
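
A minimal NumPy sketch of the moving RMS in Eq. (1) is given below. The function name `moving_rms`, the 15 ms window (chosen from inside the stated 10-20 ms range), and the shortened window at the start of the signal are assumptions; the paper does not specify edge handling. The notch and band-pass filters are not shown.

```python
import numpy as np


def moving_rms(x, m):
    """Causal moving RMS over the last m samples, following Eq. (1).

    For the first m - 1 samples the window is shortened to the samples
    available so far (an edge-handling assumption, not from the paper).
    """
    x = np.asarray(x, dtype=float)
    # Prefix sums of x^2 let each window sum be read off in O(1).
    csum = np.concatenate(([0.0], np.cumsum(x ** 2)))
    t = np.arange(len(x))
    lo = np.maximum(t - m + 1, 0)          # first index inside each window
    counts = t - lo + 1                    # number of samples in each window
    return np.sqrt((csum[t + 1] - csum[lo]) / counts)


FS = 2000                  # sampling rate in Hz, as stated in the text
M = FS * 15 // 1000        # hypothetical 15 ms window -> 30 samples
```

For a 6-channel recording, the function would be applied to each channel separately, for example `np.stack([moving_rms(ch, M) for ch in recording])`.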

3.2 NMF

Given a nonnegative windowed sEMG matrix $V \in \mathbb{R}_{+}^{C \times T}$ with C channels and T time samples, the NMF is:

(2) $V \approx W H, \quad W \in \mathbb{R}_{+}^{C \times K}, \quad H \in \mathbb{R}_{+}^{K \times T}$

where W is the nonnegative basis (muscle synergy patterns; each column spans a cross-channel co-activation), H contains the nonnegative activations (time-varying weights of each synergy), and K is the number of synergies. This additive, parts-based representation aligns with muscle coordination and yields a low-dimensional, noise-robust embedding. The number of components K was selected via nested cross-validation using a composite criterion (reconstruction error, downstream validation accuracy, and cross-subject stability of W measured by bootstrap cosine similarity). For sEMG, K = 6-8 consistently maximized performance; K = 8 was selected for this experiment because a higher K provides higher feature resolution to the subsequent models, thereby improving recognition accuracy for similar actions. The NMF module is implemented as a fixed offline pre-processing step. The muscle synergy basis matrix is extracted solely from the training dataset. During the testing phase, this learned basis matrix remains frozen and synergy patterns are not re-estimated. Instead, unseen test signals are projected onto the fixed basis matrix via non-negative least squares to compute the time-varying activation coefficients.
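
The train-then-freeze procedure can be illustrated with a small NumPy sketch. It is not the authors’ implementation: the solver (Lee-Seung multiplicative updates for the Frobenius objective), the function names `nmf_fit` and `project_on_basis`, the synthetic data, and the small demo value of K are all assumptions. In particular, the projection step below uses multiplicative updates with W held fixed as a simple stand-in for the non-negative least squares solve mentioned in the text; an exact column-wise solver such as `scipy.optimize.nnls` could be substituted.

```python
import numpy as np


def nmf_fit(V, k, n_iter=500, eps=1e-9, seed=0):
    """Factor V (C x T, nonnegative) as W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_ch, n_t = V.shape
    W = rng.random((n_ch, k)) + eps
    H = rng.random((k, n_t)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update synergy basis
    return W, H


def project_on_basis(V_new, W, n_iter=500, eps=1e-9, seed=0):
    """Estimate H >= 0 for unseen data while W stays frozen (only H changes)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V_new.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V_new) / (W.T @ W @ H + eps)
    return H


# Synthetic placeholder data: 6 channels, 3 hidden synergies (demo sizes only).
rng = np.random.default_rng(1)
C, K, T = 6, 3, 200
V_train = rng.random((C, K)) @ rng.random((K, T))
V_test = rng.random((C, K)) @ rng.random((K, 50))

W, H_train = nmf_fit(V_train, k=K)        # basis learned from training data only
H_test = project_on_basis(V_test, W)      # test activations on the frozen basis
```

Keeping W fixed at test time means the test windows never influence the synergy basis, which is the leakage safeguard the text describes.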

3.3 Residual passthrough of the raw waveform

To retain high-frequency and subject-specific details, a parallel residual passthrough was introduced. A learnable nonnegative projection produced synergy activations as follows.

(3) $\hat{H} = W^{\top} V, \quad W \geq 0$

The two streams were concatenated along channels as input to the backbone.

(4) $X_{in} = \mathrm{Concat}(\hat{H}, V') \in \mathbb{R}^{(K + C) \times T}$
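
The shapes in Eqs. (3) and (4) can be checked with a short NumPy sketch. The function name `build_input` is hypothetical, and treating V' as a passthrough copy of the pre-processed window is an assumption, since the text does not define V' explicitly. The values C = 6 and K = 8 come from the text; T = 400 assumes one 200 ms window at 2 kHz.

```python
import numpy as np


def build_input(W, V, V_pass):
    """Stack synergy activations and the passthrough signal along channels.

    W: (C, K) nonnegative projection, V: (C, T) window,
    V_pass: (C, T) passthrough stream (assumed to be the pre-processed window).
    """
    if np.any(W < 0):
        raise ValueError("W must be nonnegative")
    H_hat = W.T @ V                                   # Eq. (3): shape (K, T)
    return np.concatenate([H_hat, V_pass], axis=0)    # Eq. (4): shape (K + C, T)


rng = np.random.default_rng(0)
C, K, T = 6, 8, 400
W_demo = rng.random((C, K))          # placeholder for the learned projection
V_demo = rng.random((C, T))          # placeholder window
X_in = build_input(W_demo, V_demo, V_demo)
```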

3.4 Hybrid CNN-transformer model

To address complementary challenges in sEMG classification under construction-site conditions, a 1D convolutional front-end is coupled with transformer encoders. The CNN primarily mitigates local variability and channel-level artifacts; the transformer resolves long-range temporal ambiguity and rate variability.

3.4.1 1D-CNN module

Convolutions extract locally stable, cross-channel co-activation patterns from NMF components, providing invariance to small time shifts, electrode displacement, amplitude scaling, and perspiration-induced impedance changes. Pooling attenuates transient spikes and motion artifacts, improving signal-to-noise ratio.

The module comprises three 1D convolutional blocks with kernel sizes of 5, 3, and 3 and filter counts of 32, 64, and 128, respectively. Each block includes batch normalization, ReLU activation, and max pooling with stride 2 for downsampling and denoising.

3.4.2 Transformer encoder module

Self-attention integrates information over extended horizons without a fixed receptive field, enabling discrimination of locally similar segments that correspond to different actions, loads, or execution styles, and accommodating variable-duration contractions typical of job-site work. Attention mechanisms emphasize salient time steps, improving robustness when informative segments are sparse within noisy windows.

The encoder comprises one to two layers with multi-head attention (4 heads; head dimension 32; model width 128), pre-layer normalization for each sublayer, residual connections throughout, position encodings, and a feed-forward network with dimensionality 128.
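
To make the attention step concrete, the following NumPy sketch computes one multi-head self-attention pass with the sizes quoted in this subsection (4 heads of dimension 32, model width 128). It is an illustration only: the weights are random, the sequence length of 50 is a hypothetical value, and layer normalization, residual connections, position encodings, and the feed-forward network are omitted.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """x: (T, d_model). Returns output (T, d_model) and weights (n_heads, T, T)."""
    n_steps, d_model = x.shape
    d_head = d_model // n_heads

    def split(m):                              # (T, d_model) -> (heads, T, d_head)
        return m.reshape(n_steps, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)            # each row sums to 1 over time steps
    ctx = (attn @ v).transpose(1, 0, 2).reshape(n_steps, d_model)
    return ctx @ Wo, attn


D_MODEL, N_HEADS = 128, 4                      # sizes quoted in this subsection
T_STEPS = 50                                   # hypothetical length after the CNN
rng = np.random.default_rng(0)
x = rng.standard_normal((T_STEPS, D_MODEL))
Wq, Wk, Wv, Wo = (rng.standard_normal((D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)
                  for _ in range(4))
out, attn = multi_head_self_attention(x, Wq, Wk, Wv, Wo, N_HEADS)
```

Every time step attends to every other step in the window, which is what lets the encoder relate locally similar segments that sit far apart in time.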

In summary, CNN addresses short-term, channel-level noise and alignment variability, while the transformer captures long-range dependencies and action context. This division of labor enhances cross-subject and cross-task generalizability under field conditions, in line with decomposition-first feature extraction and supervised classification practices reported in construction ergonomics.

3.5 Training protocol

The model was trained using the following optimization strategy. The categorical cross-entropy was used as the objective function:

(5) $L = - \sum_{i = 1}^{N} \sum_{c = 1}^{C} y_{i, c} \log ({\hat{y}}_{i, c})$

where $y_{i, c}$ is the true label, ${\hat{y}}_{i, c}$ is the predicted probability, N is the number of samples, and C is the number of classes. The Adam optimizer was used with an initial learning rate of 0.001. A reduce-on-plateau strategy decreased the learning rate by a factor of 0.5 when the validation loss stopped improving for 10 epochs. A batch size of 32 was used during training. The model was trained for a maximum of 100 epochs, subject to early stopping. All random processes during training, including weight initialization, batch shuffling, and dropout, were controlled by a fixed random seed to ensure reproducibility of results.
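
The loss in Eq. (5) and the learning-rate schedule can be sketched without a deep-learning framework. The class name `ReduceOnPlateau` and its exact counting rule (halve the rate once 10 consecutive epochs pass without a new best validation loss) are assumptions about how the described strategy behaves; framework implementations differ slightly in how they count patience.

```python
import numpy as np


def categorical_cross_entropy(y_true, y_prob, eps=1e-12):
    """Eq. (5): summed over samples and classes; y_true is one-hot."""
    y_prob = np.clip(y_prob, eps, 1.0)         # avoid log(0)
    return float(-np.sum(y_true * np.log(y_prob)))


class ReduceOnPlateau:
    """Halve the learning rate after `patience` epochs without improvement."""

    def __init__(self, lr=1e-3, factor=0.5, patience=10):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        if val_loss < self.best:               # new best: reset the counter
            self.best, self.wait = val_loss, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:     # plateau reached: decay and reset
                self.lr *= self.factor
                self.wait = 0
        return self.lr
```

In a training loop, `step` would be called once per epoch with that epoch's validation loss, and the returned value passed to the optimizer.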

3.6 Model building

Four model types were employed in this study: a conventional CNN, a long short-term memory (LSTM)-based model, a feature-based multilayer perceptron (MLP) model for analysing the importance of sEMG-related features, and the proposed model. To enhance the model’s ability to capture both time-frequency information and muscle coordination patterns, several modifications were made to the baseline architectures.

3.6.1 Baselines

In addition to fundamental models such as the classic CNN, LSTM, and SVM, this study also selected bidirectional long short-term memory (BiLSTM) and recurrent convolutional neural network (RCNN). In manual handling tasks more akin to bricklaying, these two models demonstrated particularly strong performance^[37]. The selection of these two models stems from the ability of BiLSTM to capture long-term dependencies compared to classical baseline models, thereby resolving the vanishing gradient issue inherent in standard RNNs. Additionally, it leverages bidirectional contextual information to aid in classifying ambiguous actions. RCNN not only effectively extracts feature correlations across multiple channels but also enhances robustness by reducing dimensionality through CNN + RNN processing while filtering out high-noise components. Due to the scarcity of construction worker data, no extensive public datasets were incorporated into the experiments. Consequently, all models were trained using datasets collected on-site.

(a) CNN

The CNN model consists of two consecutive 1D convolutional layers with 32 and 64 filters, kernel sizes 5 and 3, respectively, followed by batch normalization and max pooling. The output feature map was flattened for processing and subsequently applied to a dense layer (128 units, ReLU activation) prior to classification using a softmax output layer.

(b) LSTM and LSTM-discrete wavelet transform (DWT)

For the LSTM model, the input sEMG windows were processed using the DWT, which decomposed the signal into different frequency sub-bands to obtain time-frequency characteristics. DWT coefficients were then fed into a stacked LSTM network (two layers, 64 units each) to capture temporal dependencies. Furthermore, a muscle coordination processing module generated a feature-aggregate expression based on these inter-channel relationships, producing the multivariable model output from which the model extracts all muscle coordination features at the LSTM output. However, the LSTM-based model with DWT and the muscle coordination module showed poor classification performance, suggesting that it is challenging to model the features of complex muscle attributes.

(c) Feature-based MLP

Instead of raw signals, this model used a broader set of time-domain and frequency-domain features that describe sEMG patterns well. The time-domain features were Mean Absolute Value, RMS, Variance, Waveform Length, Zero Crossing Rate, Slope Sign Changes, Signal Energy, and higher-order moments (Skewness, Kurtosis); the frequency-domain features were Mean Frequency, Median Frequency, Root Mean Square Frequency, Total Power, Peak Power, and Peak Frequency. From the initial feature pool, the SelectKBest algorithm was employed to select the 30 most discriminative features, using the ANOVA F-statistic as the screening criterion, thereby maximizing inter-class separability while limiting dimensionality. The standard MLP model has two hidden layers (128 and 64 neurons) with ReLU activation functions, dropout regularization (0.4, 0.3), batch normalization after each hidden layer, and L2 regularization (λ = 0.001). The result is shown in Figure 3. Feature importance analysis with the Random Forest model indicates that statistical and energy-related metrics hold the highest significance, with variance and RMS being the most important. Kurtosis and the third moment also show a large influence. Features of medium importance (such as total energy, skewness, instantaneous energy, standard deviation, maximum absolute value, and peak characteristics) help to differentiate subtle changes in motion, so the model can recognize more complex behavioural patterns robustly. Essentially, the model’s decisions depend mostly on features that capture the general pattern and energy amplitude of the signal, whereas many local characteristics have little effect.
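
As an illustration of the time-domain part of this feature pool, the sketch below computes six of the listed features for a single-channel window. The function name `time_domain_features`, the dictionary keys, and the zero-crossing amplitude threshold are assumptions; the paper does not give its exact feature definitions, and the frequency-domain features and the SelectKBest step are not shown.

```python
import numpy as np


def time_domain_features(x, zc_threshold=0.0):
    """Return common sEMG time-domain features for one channel window."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    # Zero crossing: adjacent samples differ in sign and the jump exceeds a threshold.
    zero_crossings = int(np.sum((x[:-1] * x[1:] < 0) & (np.abs(dx) > zc_threshold)))
    # Slope sign change: consecutive first differences differ in sign.
    slope_changes = int(np.sum(dx[:-1] * dx[1:] < 0))
    return {
        "mav": float(np.mean(np.abs(x))),         # mean absolute value
        "rms": float(np.sqrt(np.mean(x ** 2))),   # root mean square
        "var": float(np.var(x)),                  # variance (population form)
        "wl": float(np.sum(np.abs(dx))),          # waveform length
        "zc": zero_crossings,
        "ssc": slope_changes,
    }
```

Applying the function to each channel of each window and concatenating the results would give one feature vector per window for the selection step.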

Figure 3. Random forest feature importance. RMS: root mean square; MAE: mean absolute error; MAV: mean absolute value.

(d) BiLSTM and RCNN

Following the experiment of Bassani et al.^[37], with corresponding modifications to the data format and processing steps, appropriate parameters were selected for the baseline models in this experiment. For BiLSTM, the input is sliced with a sliding window of 200 time steps and a stride of 100. For the training data, a hybrid resampling strategy is employed: under-sampling the majority class, over-sampling the minority classes, and introducing a small amount of Gaussian noise, so that the sample count of each class matches the average class size. The model’s input layer takes time series data with shape (200, 4). The bidirectional LSTM layer contains 100 hidden units, and L2 regularization is applied to this layer to prevent overfitting. The fully connected layer comprises 100 neurons with the ReLU activation function, also with L2 regularization at a coefficient of 0.0001. The output layer comprises 4 neurons with the Softmax activation function to output classification probabilities. The Adam optimizer is employed with an initial learning rate of 0.001 and a multi-class cross-entropy loss function. The batch size is set to 128, with a maximum of 100 iterations. Learning rate decay and early stopping are adopted as training strategies. Five-fold stratified cross-validation is used for evaluation.
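
The hybrid resampling strategy can be sketched as below. The function name `balance_to_mean`, the noise level, and the choice to add noise only to the duplicated samples are assumptions; the text states only that classes are brought to the average class size with under-sampling, over-sampling, and a small amount of Gaussian noise. Such resampling would be applied to the training split only.

```python
import numpy as np


def balance_to_mean(X, y, noise_std=0.01, seed=0):
    """Resample every class to the mean class size.

    Classes at or above the target are under-sampled without replacement.
    Classes below it keep all originals and gain jittered duplicates.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = int(round(counts.mean()))
    X_parts, y_parts = [], []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        if n >= target:
            keep = rng.choice(idx, size=target, replace=False)
            Xc = X[keep]
        else:
            extra = rng.choice(idx, size=target - n, replace=True)
            jitter = rng.normal(0.0, noise_std, size=X[extra].shape)
            Xc = np.concatenate([X[idx], X[extra] + jitter])
        X_parts.append(Xc)
        y_parts.append(np.full(target, c))
    return np.concatenate(X_parts), np.concatenate(y_parts)
```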

Due to significant differences in data format from the reference study, the RCNN model was modified from the original baseline. The time window was set to 240 time steps to cover the complete action cycle, with a stride of 120. Hierarchical feature extraction employed 32, 64, and 128 convolutional kernels, respectively. The resulting features were input into a two-layer stacked bidirectional GRU, with 64 units in the first layer and 32 units in the second. Dropout was set to 0.3 to suppress overfitting. A dynamic learning rate and early stopping were used. Five-fold stratified cross-validation was also employed.

(e) GCN-based model

To clearly evaluate the model’s effectiveness in modeling inter-channel relationships and muscle synergies, a GCN-based baseline model was compared. The temporal module employed a two-layer 1D-CNN (with kernel sizes of 5 and 3, and output channel counts of 16 and 32 respectively), combined with adaptive average pooling layers, to independently extract temporal dynamic features from six sEMG channels. For spatial modeling, a two-layer GCN (with a hidden layer dimension of 64) was employed. This GCN utilized a learnable 6 × 6 adjacency matrix normalized via Softmax, with a dropout rate of 0.5.
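
The spatial part of this baseline can be illustrated with a NumPy forward pass. It is a sketch under several assumptions: the softmax is taken row-wise over the adjacency logits, both GCN layers use ReLU and width 64, the graph readout is a mean over channels, and all weights are random placeholders. Dropout and the 1D-CNN temporal module are omitted; the 32 input features per channel correspond to the stated 32 output channels of the temporal module.

```python
import numpy as np


def softmax_rows(a):
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)


def gcn_layer(x, a_norm, w):
    """One graph convolution: mix channel features via a_norm, project, ReLU."""
    return np.maximum(a_norm @ x @ w, 0.0)


N_CH, F_IN, HID = 6, 32, 64                      # six sEMG channels as graph nodes
rng = np.random.default_rng(0)
A_logits = rng.standard_normal((N_CH, N_CH))     # learnable in a trained model
A_norm = softmax_rows(A_logits)                  # each row sums to 1
X_nodes = rng.standard_normal((N_CH, F_IN))      # placeholder per-channel features
W1 = rng.standard_normal((F_IN, HID)) / np.sqrt(F_IN)
W2 = rng.standard_normal((HID, HID)) / np.sqrt(HID)

h = gcn_layer(gcn_layer(X_nodes, A_norm, W1), A_norm, W2)
graph_embedding = h.mean(axis=0)                 # readout choice is an assumption
```

Because the adjacency matrix is learned rather than fixed by anatomy, the strength of each inter-channel link is fitted to the data.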

(f) Proposed model

To extract the activation coefficients that reflect the muscle synergy patterns, NMF was performed on each pre-processed sEMG window. These NMF-derived features were combined as extra inputs to the Transformer, together with the features obtained by a 1D-CNN module. This combined architecture is referred to as NMF-CNN-Transformer, with its parameters shown in Table 1.

Table 1. Model parameters.

Properties	Value
Number of NMF components	8
Sequence Length	20
Model dimension	64
Number of Attention heads	8
Number of Transformer layers	3
Feed-forward network dimension	25
Dropout probability	0.001
Number of output classes	4

NMF: non-negative matrix factorization.

4. Results

Table 2 compares the proposed model with the baseline models. The traditional CNN and LSTM models exhibit significantly lower classification accuracy on the real-world dataset than the other models. Therefore, this paper focuses on comparing the proposed model with BiLSTM, RCNN, and Transformer. As shown in Figure 4, the proposed model achieved the highest classification accuracy (86%) among all tested models. The MLP achieved an accuracy of 76%, slightly higher than the Transformer model (73%). Additionally, BiLSTM and RCNN both achieved accuracies of around 75%, as shown in Figure 5. However, none of these models meet the requirements for practical application.

Figure 4. Model performance comparison. MLP: multilayer perceptron.

Figure 5. K-fold cross-validation results plot (top: BiLSTM, bottom: RCNN). BiLSTM: bidirectional long short-term memory; RCNN: recurrent convolutional neural network.

Table 2. Accuracy of the different recognition models (reported as fractions).

Method	Accuracy
CNN	0.41
LSTM	0.45
MLP	0.76
Transformer	0.73
BiLSTM	0.77
RCNN	0.76
GCN-based	0.64
Our Model	0.86

CNN: convolutional neural network; LSTM: long short-term memory; MLP: multilayer perceptron; BiLSTM: bidirectional long short-term memory; RCNN: recurrent convolutional neural network; GCN: graph convolutional network.

Because accuracy rates across cross-validation folds may not strictly follow a normal distribution, the Wilcoxon signed-rank test was employed. Statistical testing was conducted between the proposed model and the best-performing benchmark model and yielded p < 0.05, below the standard significance level. To examine the independent contribution of each module, basic ablation experiments were also conducted; the results are shown in Table 3. When NMF is removed, feeding the raw sEMG signal directly into the deep network introduces substantial noise and high-dimensional redundancy, and accuracy drops, which indicates that NMF is crucial for extracting effective features of muscle synergy patterns. Conversely, relying solely on the CNN after NMF filtering leads to a significant decline in action classification performance due to the lack of long-term dependencies, indicating that the self-attention mechanism of the Transformer is well suited to capturing complex structural patterns.
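
For reference, the paired Wilcoxon signed-rank test can be computed exactly for small samples by enumerating sign patterns, as in the pure-Python sketch below. The function name and the per-fold accuracies are hypothetical; the paper does not report per-fold values or the number of paired observations. The sketch assumes no zero differences and no tied absolute differences.

```python
from itertools import product


def wilcoxon_signed_rank_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired samples."""
    d = [x - y for x, y in zip(a, b)]
    abs_d = [abs(v) for v in d]
    if any(v == 0 for v in d) or len(set(abs_d)) != len(d):
        raise ValueError("zero or tied differences are not handled in this sketch")
    n = len(d)
    order = sorted(range(n), key=lambda i: abs_d[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):      # rank 1 = smallest |difference|
        ranks[i] = rank
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    stat = min(w_plus, w_minus)
    # Under the null hypothesis every rank is positive or negative with probability 1/2.
    count = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(range(1, n + 1), signs) if s) <= stat
    )
    return stat, min(1.0, 2.0 * count / 2 ** n)


# Hypothetical per-fold accuracies (%) for two models, for illustration only.
proposed = [86.1, 85.4, 87.2, 86.8, 85.9]
baseline = [77.0, 76.1, 78.4, 76.9, 77.3]
stat, p_value = wilcoxon_signed_rank_exact(proposed, baseline)
```

With five pairs in which one model wins every fold, the exact two-sided p-value is 2/32 = 0.0625, so the attainable significance level depends on how many paired results are compared.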

Table 3. Results of the ablation study.

Model Variant	Accuracy	F1-Score
Proposed Full Model	0.86	0.84
w/o NMF (Raw sEMG)	0.75	0.74
w/o Transformer	0.55	0.53

NMF: non-negative matrix factorization; sEMG: surface electromyography.

To evaluate the generalization ability of the proposed model, leave-one-subject-out (LOSO) cross-validation was performed on all 10 subjects. As shown in Table 4, the overall average classification accuracy reached 65.90% ± 4.98%. Results varied across individuals because of subject-specific characteristics, with accuracy ranging from 58.00% to 73.50%.

Table 4. Results of LOSO validation.

Test Subject (unseen)	Accuracy (%)
Subject 1	69.50
Subject 2	58.00
Subject 3	65.50
Subject 4	71.00
Subject 5	62.50
Subject 6	68.00
Subject 7	59.50
Subject 8	73.50
Subject 9	64.00
Subject 10	67.50
Average	65.90

LOSO: leave-one-subject-out.
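As a minimal sketch of the LOSO protocol, the snippet below enumerates the ten train/test partitions and recomputes the summary statistics from the per-subject accuracies in Table 4. The helper name `loso_splits` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the estimator behind the reported ± value is not stated.

```python
from statistics import mean, stdev

# Per-subject LOSO accuracies (%) copied from Table 4.
loso_accuracy = {
    1: 69.50, 2: 58.00, 3: 65.50, 4: 71.00, 5: 62.50,
    6: 68.00, 7: 59.50, 8: 73.50, 9: 64.00, 10: 67.50,
}


def loso_splits(subject_ids):
    """Yield (training subjects, held-out subject), one pair per subject."""
    subject_ids = list(subject_ids)
    for held_out in subject_ids:
        train = [s for s in subject_ids if s != held_out]
        yield train, held_out


splits = list(loso_splits(loso_accuracy))
avg = mean(loso_accuracy.values())
sd = stdev(loso_accuracy.values())  # sample standard deviation (n - 1)
print(f"{len(splits)} folds: {avg:.2f}% ± {sd:.2f}%")  # 10 folds: 65.90% ± 4.97%
```

Recomputed from the rounded table entries, the sample standard deviation is about 4.97, which is close to the reported 4.98; the small gap is consistent with rounding of the per-subject values.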

Figure 6 shows the training curves of the model. The fluctuations in the curves arise because the real-world dataset departs from an ideal data distribution. The validation and test curves do not reach an ideal fit, indicating that there is room for improvement. More precise results are given in Table 5.

Figure 6. Training process of the proposed model.

Table 5. Recognition performance of the proposed model.

Metric	Mean	Std
Precision	0.8690	0.0499
Recall	0.8501	0.0501
F1	0.8405	0.0499
Accuracy	0.8652	0.0363
Specificity	0.9460	0.0193

The confusion matrix for the four action classes is shown in Figure 7. The values on the main diagonal indicate high classification performance, with most samples correctly identified. Mortar spreading has the highest accuracy, at 90%. Bricklaying follows closely: most samples are classified correctly, but some are misclassified into adjacent categories. By contrast, accuracy is lower for line drawing and measurement, which are more prone to confusion. A similar pattern of misclassification was observed in the bending actions, with most errors concentrated in a single target class rather than distributed evenly.

Figure 7. Confusion matrix.
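Both the per-class accuracies on the main diagonal and summary metrics of the kind listed in Table 5 can be derived from a confusion matrix. The sketch below shows the arithmetic in a one-vs-rest fashion. The counts are hypothetical (they are not the values behind Figure 7; only the 90% figure for mortar spreading is taken from the text), and macro-averaging over classes is an assumption about how the summary values were aggregated.

```python
labels = ["mortar spreading", "bricklaying", "line drawing", "measurement"]

# Hypothetical counts (rows: true class, columns: predicted class).
cm = [
    [90, 4, 3, 3],
    [5, 85, 7, 3],
    [4, 10, 80, 6],
    [3, 4, 13, 80],
]


def per_class_metrics(cm):
    """Return one-vs-rest precision, recall, F1 and specificity per class."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true k, predicted other
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true other
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        specificity = tn / (tn + fp) if tn + fp else 0.0
        results.append({"precision": precision, "recall": recall,
                        "f1": f1, "specificity": specificity})
    return results


metrics = per_class_metrics(cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))
macro = {name: sum(m[name] for m in metrics) / len(metrics)
         for name in ("precision", "recall", "f1", "specificity")}

for label, m in zip(labels, metrics):
    print(f"{label:17s} recall={m['recall']:.2f} precision={m['precision']:.2f}")
print(f"accuracy={accuracy:.4f}", {k: round(v, 4) for k, v in macro.items()})
```

Because a macro-averaged F1 is the mean of per-class F1 scores rather than the harmonic mean of the averaged precision and recall, it can fall below both averages, which is one possible reading of the F1 value in Table 5.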

The confusion centres on the line-drawing and bricklaying actions on the one hand, and on a shift of measurement actions toward one specific category on the other. This boundary overlap may arise from similarity between the classes’ features or from an imbalance in their sample distributions. By contrast, mortar spreading exhibits more distinct, stable discriminative features, making it easier to classify. The proposed method can be used to model the muscle activation patterns associated with manual labour and can serve as a tool for ergonomic assessment, safety monitoring, and work-efficiency analysis in the construction industry. Its accuracy and real-time feedback also make it a candidate for exoskeleton control systems that assist workers’ movements.

5. Discussion

From a theoretical perspective, the suboptimal performance of the classical baseline models suggests that purely data-driven architectures lack the inductive bias needed to handle both the high noise in data beyond the receptive field and the non-stationary characteristics of sEMG signals when field test data are scarce. BiLSTM and RCNN partially mitigate noise, for example by enhancing contextual sensitivity through bidirectional processing and by reducing dimensionality while filtering high-frequency noise. However, their failure to fully resolve the classification task indicates that multi-resolution spatiotemporal features alone are theoretically insufficient to address cross-subject heterogeneity in data-scarce scenarios. Conversely, the success of the NMF-CNN-Transformer architecture supports the importance of physiological foundations. It is consistent with the view that muscle synergies form stable low-dimensional manifolds in neuromuscular control and, unlike raw sensor-level features, are robust across individuals.

Compared with the intra-subject evaluation, accuracy dropped markedly in the LOSO scenario. This decline is primarily attributed to inter-subject variability: slight electrode shifts and differences in motor characteristics among individuals led to considerable discrepancies between subjects. From an algorithmic perspective, the spatial muscle synergy basis matrix extracted via NMF is strongly subject-specific, and this static feature extraction mechanism lacks dynamic adaptability. Despite these challenges in cross-subject generalization, the proposed model still achieves higher classification accuracy than the baseline model.

In practice, these findings are important for deploying reliable human-machine interfaces. Integrating synergy patterns with Transformer sequence modelling focuses the model on task-critical actions, which is particularly relevant for exoskeleton-assisted construction tasks. On the one hand, noise is suppressed directly at the data-processing level. On the other hand, the approach improves action classification across subjects, offering a response to persistent problems in worker-facing applications, such as poorly coordinated force feedback and inaccurate motion recognition when using exoskeletons. Consequently, the approach is less bound to the strict constraints of laboratory environments and enables preliminary field trials.

Although overall performance improved significantly after incorporating muscle synergy features, the current model architecture still has limitations. First, while synergy features provide some interpretability, errors persist in distinguishing similar movements in practice. The unsupervised decomposition in NMF may introduce bias. Incorporating an L1 sparsity penalty term could help extract synergy patterns that are better aligned with true physiological significance. Combining this with supervised feature extraction is also feasible, though such methods carry an inherent risk: in pursuit of higher accuracy, the algorithm may produce distorted synergy patterns that do not exist in human physiology. Second, the dataset is based on a small sample of 10 bricklayers and exhibits class imbalance; for example, class 1 may account for a larger proportion of samples. Constrained by sample size, the model risks overfitting to specific movement habits and may develop a bias toward high-frequency repetitive actions. Consequently, the algorithm requires further validation on new tasks with different class distributions. Third, although the model performs well when trained on data from diverse subjects, true zero-shot cross-subject generalization (e.g., under LOSO evaluation) remains challenging because of the inherent physiological variability of sEMG. Future research will explore domain adaptation techniques to enable subject-independent recognition without individual calibration. In addition, individual worker characteristics could be extracted via the NMF module to form more universal coordination patterns compatible with diverse subjects. Finally, for the bricklaying task, six muscle channels are currently used, chosen for the convenience and safety of sEMG sensor placement. When extending to other occupations in the future, additional muscle sites can be selected based on task characteristics and worker clothing requirements to enhance the model’s generalization capability.
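The L1 sparsity penalty mentioned above can be sketched as follows. This is not the implementation used in this study; it is one common heuristic (multiplicative updates with unit-norm synergy vectors) that minimizes 0.5·||V − WH||² + λ·ΣH, where V is a non-negative channels × samples envelope matrix. The six channels follow the text; the number of synergies (3), the penalty weight, the synthetic data, and the function name `sparse_nmf` are all illustrative assumptions.

```python
import numpy as np


def sparse_nmf(V, n_synergies, l1=0.01, n_iter=300, seed=0):
    """Factorize V (channels x samples) as W @ H with an L1 penalty on H.

    W holds the synergy vectors (columns), H their activations over time.
    Columns of W are kept at unit L2 norm so that the penalty on H cannot
    be escaped simply by rescaling the two factors.
    """
    rng = np.random.default_rng(seed)
    n_channels, n_samples = V.shape
    W = rng.random((n_channels, n_synergies)) + 1e-3
    H = rng.random((n_synergies, n_samples)) + 1e-3
    eps = 1e-12
    for _ in range(n_iter):
        # The L1 term adds a constant to the denominator, shrinking H.
        H *= (W.T @ V) / (W.T @ W @ H + l1 + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Renormalize synergy vectors; rescale H so W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H


# Synthetic stand-in for rectified, smoothed sEMG envelopes (6 channels).
rng = np.random.default_rng(1)
W_true = rng.random((6, 3))
H_true = rng.random((3, 500)) * (rng.random((3, 500)) < 0.3)  # sparse bursts
V = W_true @ H_true

W, H = sparse_nmf(V, n_synergies=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.3f}")
print(f"fraction of near-zero activations: {(H < 1e-6).mean():.2f}")
```

A larger `l1` trades reconstruction accuracy for sparser activations; whether the resulting synergies are physiologically more plausible would still need to be checked against the recorded muscles.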

The findings of this research illustrate the following accomplishments. Processing based on time-frequency decomposition makes the system more robust to signal artifacts. Context-aware sequence modelling that combines CNNs and Transformers can distinguish subtle differences between similar actions. Feature extraction based on synergy decomposition makes the system less sensitive to inter-subject variability. However, significant computational constraints remain. To translate these advances into practical applications, future research should prioritize data-efficient generalization strategies.

Furthermore, deploying this model in actual construction sites still faces numerous challenges. For instance, workers must wear specific work attire (such as belts and vests), which may prevent electromyography sensors from accurately adhering to target muscles. Accidental events like collisions could also cause sensor displacement or detachment. Expanding diverse field datasets and standardizing electrode placement protocols will help reduce variability and validate the robustness of research findings in real-world environments.

6. Conclusions

The aim of this study was to improve field-based sEMG recognition for construction by addressing limited and heterogeneous data, high noise, non-stationarity, and pronounced inter-subject variability. The analysis demonstrated that physiologically grounded feature learning combined with expressive temporal modelling is a promising approach. The proposed NMF–CNN–Transformer architecture provides explainable, low-dimensional co-activation patterns by combining muscle synergy factorization, a compact convolutional front end that captures local structure, and a Transformer that models long-range temporal dependencies. On a real-world bricklaying dataset, the proposed model outperformed the CNN, LSTM, MLP, and Transformer baselines, obtaining the highest accuracy while maintaining good precision, recall, F1, and specificity. Removing the DWT-based time–frequency features led to a smaller but still significant reduction in accuracy, demonstrating the importance of robustness to the motion artifacts and transient activations common in on-site recordings. Notably, the proposed NMF–CNN–Transformer framework uses a sequential pipeline. This design aims to preserve the physiological integrity of the extracted muscle synergies: jointly optimizing NMF with the deep learning classifier could drive the synergies toward minimizing classification loss and thereby compromise their biomechanical significance. Furthermore, unlike existing paradigms such as GCNs, this physiologically constrained approach does not rely on rigid spatial graph structures. Muscle coordination patterns represent relative states of cooperation, which makes them more adaptable to the physical spatial variations caused by different body types and enables dynamic capture of functional muscle connectivity. Overall, these results indicate that mapping sEMG into a synergy-informed subspace and then modelling multi-scale temporal structure improves recognition accuracy and interpretability under more realistic operational constraints.

In current research on construction worker motion recognition, due to the scarcity of relevant datasets, there is limited literature available for reference regarding applications on construction sites. Based on literature reviews of similar motion recognition studies, ergonomic assessment research using sEMG has introduced end-to-end deep learning models to capture features of continuous motions and temporal dependencies^[38]. However, while these methods have proven effective in controlled laboratory settings, their feasibility in real-world construction sites with noise interference remains unvalidated. In attempts to reduce model complexity, transfer learning strategies combined with attention mechanisms have been shown effective in lower-limb motion recognition experiments^[28]. However, these approaches demand a substantial quantity and quality of initial user datasets. Such purely data-driven methods overlook the biomechanical principles underlying human movement. This study enhances model interpretability by integrating physiological principles with deep learning techniques. It explicitly incorporates NMF to extract muscle synergy factor decomposition, a fundamental physiological feature.

In recent advancements, multimodal fusion (such as combining sEMG and IMU) has demonstrated strong performance in recognition. From a practical application perspective, relying solely on sEMG can significantly reduce hardware complexity and worker workload while maintaining performance. Furthermore, sEMG has broader applications in exoskeleton devices and is better suited for providing feedback for the control of exoskeletons used by workers. Simultaneously, the proposed model can also be integrated into future multimodal systems to further improve overall performance.

Future research should therefore focus on three interrelated directions. First, to address task diversity and scale, multi-site, multi-task datasets covering masonry, rebar tying, and material handling should be assembled so that cross-site, cross-device, and cross-season generalization can be tested systematically. Second, to increase learning efficiency and reduce computational cost, future studies might use self-supervised pretraining and knowledge distillation to reduce annotation requirements and model size, and investigate lightweight temporal modules to reduce latency. Lastly, for real-world efficacy, closed-loop validation is needed, e.g., applying the model in exoskeleton controllers for real-time intent recognition to quantify assistance timing accuracy, ergonomic load reduction, and safety outcomes within live construction workflows.

Acknowledgements

Some statements in this paper were translated and polished with the assistance of Gemini 3.1, which may result in a higher AI detection rate. However, the core academic arguments and experimental data are entirely original.

Authors contribution

Zhou Z: Writing-original draft, data curation, validation, formal analysis.

Yu Y: Writing-review & editing, supervision, resources, project administration, methodology, investigation, funding acquisition, data curation, conceptualization.

Conflicts of interest

Yantao Yu is an Editorial Board Member of Journal of Building Design and Environment. The other author declares no conflicts of interest.

Ethical approval

This study was approved by the Human and Artefacts Research Ethics Committee of The Hong Kong University of Science and Technology (Approval No. HREP-2023-0245).

Consent to participate

All participants provided written informed consent prior to participation. All participants were fully informed about the study’s purpose, procedures, and potential risks, and they provided explicit consent to participate.

Consent for publication

Written informed consent was obtained from the participant for the publication of their identifiable image and any associated data in this article.

Availability of data and materials

Data supporting the findings of this study are available from the corresponding author upon reasonable request.

Funding

This work was supported by the Research Grants Council (Hong Kong) (Grant Nos. 26208323 and C6044-23GF).

References

1. Chen X, Yu Y. Automatic repetitive action counting for construction worker ergonomic assessment. Autom Constr. 2024;167:105726.

2. Delgado JMD, Oyedele L, Ajayi A, Akanbi L, Akinade O, Bilal M, et al. Robotics and automated systems in construction: Understanding industry-specific challenges for adoption. J Build Eng. 2019;26:100868.

3. Jin R, Hong J, Zuo J. Environmental performance of off-site constructed facilities: A critical review. Energy Build. 2020;207:109567.

4. Gamage ANKK, Kumar S. Review of alternative dispute resolution methods in construction projects. Saudi J Eng Technol. 2024;9(2):75-87.

5. Fournier DE, Yung M, Somasundram KG, Du BB, Rezvani S, Yazdani A. Quality, productivity, and economic implications of exoskeletons for occupational use: A systematic review. PLoS One. 2023;18(6):e0287742.

6. Ojha A, Guo H, Jebelli H, Martin A, Akanmu A. Assessing the impact of active back support exoskeletons on muscular activity during construction tasks: Insights from physiological sensing. In: Turkan Y, Louis J, Leite F, Ergan S, editors. Computing in civil engineering 2023. Reston: ASCE; 2024. p. 340-347.

7. Oh J, Cho GY, Kim H. Performance analysis of wearable robotic exoskeleton in construction tasks: Productivity and motion stability assessment. Appl Sci. 2025;15(7):3808.

8. Golabchi A, Miller L, Rouhani H, Tavakoli M. Impact of passive back-support exoskeletons on manual material handling postures in construction. In: 39th International Symposium on Automation and Robotics in Construction (ISARC 2022); 2022 Jul 13-15; Bogotá, Colombia. Seoul: IAARC; 2022.

9. Al-Khiami MI, Lindhard SM, Wandahl S. Integrating exoskeletons in the construction sector: A systematic review of empirical evaluation tools and future directions. Eng Constr Archit Manag. 2026;33(3):2364-2399.

10. Hong T, Lee C, Chang S, Choi E, Kim B, Ahn J, et al. Design of a fully-soft lift-assist wearable suit powered by flat inflatable artificial muscles. IEEE Robot Autom Lett. 2025;10(5):4428-4435.

11. Lu J, Lv X, Liu J, Liu H, Huang T. Fully flexible wearable pouch pneumatic artificial muscle based on asymmetric expansion. Chin J Mech Eng. 2026;39:100085.

12. Masengo G, Zhang X, Dong R, Alhassan AB, Hamza K, Mudaheranwa E. Lower limb exoskeleton robot and its cooperative control: A review, trends, and challenges for future research. Front Neurorobot. 2023;16:913748.

13. Ruhrberg Estévez S, Mallah J, Kazieczko D, Tang C, Occhipinti LG. Deep learning for motion classification in ankle exoskeletons using surface EMG and IMU signals. Sci Rep. 2025;15:38242.

14. Golabchi A, Chao A, Tavakoli M. A systematic review of industrial exoskeletons for injury prevention: Efficacy evaluation metrics, target tasks, and supported body postures. Sensors. 2022;22(7):2714.

15. Hu Y, Wong Y, Wei W, Du Y, Kankanhalli M, Geng W. A novel attention-based hybrid CNN-RNN architecture for sEMG-based gesture recognition. PLoS One. 2018;13(10):e0206049.

16. Chen B, Chen C, Hu J, Nguyen T, Qi J, Yang B, et al. A real-time EMG-based fixed-bandwidth frequency-domain embedded system for robotic hand. Front Neurorobot. 2022;16:880073.

17. Okunola A, Afolabi A, Akanmu A, Jebelli H, Simikins S. Facilitators and barriers to the adoption of active back-support exoskeletons in the construction industry. J Saf Res. 2024;90:402-415.

18. Brunner A, van Sluijs R, Bartenbach V, Bee D, Kos M, Aryananda L, et al. Effect of a passive shoulder support exoskeleton on fatigue during working with arms over shoulder level. In: Tarnita D, Dumitru N, Pisla D, Carbone G, Geonea I, editors. New trends in medical and service robotics. Cham: Springer; 2023. p. 188-197.

19. Raghuraman RN, Srinivasan D. The effects of soft vs. rigid back-support exoskeletons on trunk dynamic stability and trunk-pelvis coordination in young and old adults during repetitive lifting. J Biomech. 2024;176:112348.

20. De Looze MP, Bosch T, Krause F, Stadler KS, O’Sullivan LW. Exoskeletons for industrial application and their potential effects on physical work load. Ergonomics. 2016;59(5):671-681.

21. Zhang X, Qu Y, Zhang G, Wang Z, Chen C, Xu X. Review of sEMG for exoskeleton robots: Motion intention recognition techniques and applications. Sensors. 2025;25(8):2448.

22. Sedighi P, Li X, Tavakoli M. EMG-based intention detection using deep learning for shared control in upper-limb assistive exoskeletons. IEEE Robot Autom Lett. 2024;9(1):41-48.

23. Fleischer C, Hommel G. A human-exoskeleton interface utilizing electromyography. IEEE Trans Robot. 2008;24(4):872-882.

24. Bangaru SS, Wang C, Aghazadeh F. Automated and continuous fatigue monitoring in construction workers using forearm EMG and IMU wearable sensors and recurrent neural network. Sensors. 2022;22(24):9729.

25. Bangaru SS, Wang C, Busam SA, Aghazadeh F. ANN-based automated scaffold builder activity recognition through wearable EMG and IMU sensors. Autom Constr. 2021;126:103653.

26. Mudiyanselage SE, Nguyen PHD, Rajabi MS, Akhavian R. Automated workers’ ergonomic risk assessment in manual material handling using sEMG wearable sensors and machine learning. Electronics. 2021;10(20):2558.

27. Gong Q, Jiang X, Liu Y, Yu M, Hu Y. A flexible wireless sEMG system for wearable muscle strength and fatigue monitoring in real time. Adv Electron Mater. 2023;9(9):2200916.

28. Ling L, Wei L, Feng B, Lin Z, Jin L, Wang Y, et al. A lightweight multi-scale convolutional attention network for lower limb motion recognition with transfer learning. Biomed Signal Process Control. 2025;99:106803.

29. Chen J, Qiu J, Ahn C. Construction worker’s awkward posture recognition through supervised motion tensor decomposition. Autom Constr. 2017;77:67-81.

30. Sun X, Li X, Ren B, Chen J. Construction posture recognition with primitive joints extended planar normal vector quaternions. Autom Constr. 2024;161:105356.

31. Ferdousi A, Islam MJ, Ahmad S, Islam MR, Chowdhury MNH, Haque F, et al. Complicacy in electrode position shift and its solution in sEMG pattern recognition: A review. IEEE Sens J. 2025;25(14):26269-26288.

32. Zhao Y, Liu Z, Yu J, Jing S, Li H, López MB. HD-sEMG-CORE: An open-source hybrid network algorithm for efficient compression and accurate restoration of high-density surface electromyography signals. IEEE Sens J. 2025;25(3):5478-5490.

33. Ye Y, Shi Y, Srinivasan D, Du J. Sensation transfer for immersive exoskeleton motor training: Implications of haptics and viewpoints. Autom Constr. 2022;141:104411.

34. Li QX, Chan PPK, Zhou D, Fang Y, Liu H, Yeung DS. Improving robustness against electrode shift of sEMG based hand gesture recognition using online semi-supervised learning. In: 2016 International Conference on Machine Learning and Cybernetics (ICMLC); 2016 Jul 10-13; Jeju Island, South Korea. Piscataway: IEEE; 2016. p. 344-349.

35. Rolandino G, Zangrandi C, Vieira T, Cerone GL, Andrews B, FitzGerald JJ. HDE-array: Development and validation of a new dry electrode array design to acquire HD-sEMG for hand position estimation. IEEE Trans Neural Syst Rehabil Eng. 2024;32:4004-4013.

36. Bi L, Feleke AG, Guan C. A review on EMG-based motor intention prediction of continuous human upper limb motion for human-robot collaboration. Biomed Signal Process Control. 2019;51:113-127.

37. Bassani G, Avizzano CA, Filippeschi A. Deep learning algorithms for human activity recognition in manual material handling tasks. Sensors. 2025;25(21):6705.

38. Han Y, Tao Q. Lower limb movement recognition based on a hybrid deep learning model using surface electromyography. IEEE Access. 2025;13:91693-91705.


Copyright

© The Author(s) 2026. This is an Open Access article licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Publisher’s Note

Science Exploration maintains a neutral stance on jurisdictional claims in published maps and institutional affiliations. The views expressed in this article are solely those of the author(s) and do not reflect the opinions of the Editors or the publisher.

Share And Cite

Science Exploration Style

Zhou Z, Yu Y. Surface electromyography-based action recognition towards exoskeleton control for construction tasks. J Build Des Environ. 2026;4:2025131. https://doi.org/10.70401/jbde.2026.0038
