
Applied Ergonomics

Volume 80, October 2019, Pages 75-88

RGB-D ergonomic assessment system of adopted working postures

https://doi.org/10.1016/j.apergo.2019.05.004

Highlights

  • This paper proposes a vision-based semi-automated RULA ergonomic posture assessment system using deep learning techniques.

  • The proposed method analyzes the posture holistically and estimates body joint angles directly from a single depth image.

  • We presented a novel inverse kinematic modeling process to obtain ground truth joint angles from motion capture data.

  • This allows training learning algorithms to estimate joint angles from input images acquired using different modalities.

  • The developed system supports Kinect and ASUS Xtion depth cameras and does not rely on skeleton data from the Kinect SDK.

  • The holistic posture analysis approach ensures robustness to different forms of occlusions.

Abstract

Ensuring a healthier working environment is of utmost importance for companies and global health organizations. In manufacturing plants, the ergonomic assessment of adopted working postures is indispensable for avoiding risk factors of work-related musculoskeletal disorders. This process has received considerable research interest and requires extracting plausible postural information as a preliminary step. This paper presents a semi-automated, end-to-end ergonomic assessment system of adopted working postures. The proposed system analyzes the human posture holistically, does not rely on any attached markers, uses low-cost depth technologies and leverages state-of-the-art deep learning techniques. In particular, we train a deep convolutional neural network to analyze the articulated posture and predict body joint angles from a single depth image. The proposed method relies on learning from synthetic training images, which allows simulating several physical tasks, different body shapes and rendering parameters, and thus obtaining a highly generalizable model. The corresponding ground truth joint angles have been generated using a novel inverse kinematics modeling stage. We validated the proposed system in real environments and achieved a joint angle mean absolute error (MAE) of 3.19 ± 1.57 degrees and a rapid upper limb assessment (RULA) grand score prediction accuracy of 89% with a Kappa index of 0.71, indicating substantial agreement with reference scores. This work facilitates evaluating several ergonomic assessment metrics, as it provides direct access to the necessary postural information and overcomes the need for computationally expensive post-processing operations.

Introduction

Musculoskeletal disorders (MSDs) are a common concern across labor-intensive industries. A recent statistical study by the Bureau of Labor Statistics (BLS) showed that MSD cases account for 31% of all work-related injury and illness cases (Bureau of Labor Statistics, 2016). These injuries most commonly involve the muscular components of the neck, back, arms and legs (Luttmann et al.). In addition to the personal impact these injuries can have on workers, compensation costs and days away from work can greatly affect the productivity of the organization itself (Bureau of Labor Statistics, 2016; Bernard and Putz-Anderson). The manufacturing industries endeavor to constantly provide a safe working environment via the early identification of, and intervention in, problematic procedures. Currently, proactive task planning using digital human models and virtual facilities helps minimize MSD risk factors; however, given the complex interactions between force and frequency during automotive assembly tasks, advancements in injury prevention technology must continue to be implemented to ensure sustained harm minimization. Adopting ergonomically invalid or awkward working postures while performing these manual tasks has the potential to cause long-term MSDs (Krüger and Nguyen, 2015; Bernard and Putz-Anderson; Luttmann et al.). Therefore, ergonomics specialists have been investigating methods and tools to evaluate the adopted working posture and identify potential MSD risks.

The Rapid Upper Limb Assessment (RULA) (McAtamney and Corlett, 1993) is one of the most popular ergonomic assessment tools in industry (Plantard et al.; Liebregts et al., 2016). Despite its limitations and low resolution in pinpointing problematic working procedures, RULA is simple, easy to compute and does not require prior knowledge of biomechanics or ergonomics. The RULA score quantifies the exposure of the adopted posture to MSD risk factors, with a focus on the neck, trunk and upper limbs. It ranges from one to seven, representing the level of MSD risk and suggesting an action level that describes whether intervention is required (McAtamney and Corlett, 1993), with one being an acceptable posture and seven requiring immediate intervention. Automating the RULA score evaluation process has gained much attention from ergonomics researchers (Manghisi et al., 2017; Plantard et al., 2017) to overcome the intra- and inter-rater variability problem (Manghisi et al., 2017). However, developing an automated RULA-based ergonomic feedback system requires estimating the joint angles of the upper body parts.
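The grand score to action level mapping described above can be sketched as follows; the bands follow the levels defined by McAtamney and Corlett (1993), while the function name and wording of the recommendations are illustrative:

```python
# Sketch: map a RULA grand score (1-7) to its action level, following the
# bands defined in McAtamney and Corlett (1993). Function name and the
# phrasing of the recommendations are our own, not from the original paper.
def rula_action_level(grand_score: int) -> tuple[int, str]:
    """Return (action level, recommended action) for a RULA grand score."""
    if not 1 <= grand_score <= 7:
        raise ValueError("RULA grand score must be in [1, 7]")
    if grand_score <= 2:
        return 1, "Posture acceptable if not maintained or repeated for long periods"
    if grand_score <= 4:
        return 2, "Further investigation needed; changes may be required"
    if grand_score <= 6:
        return 3, "Investigation and changes required soon"
    return 4, "Investigation and changes required immediately"

print(rula_action_level(7))  # action level 4: immediate intervention
```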

Recent studies have proposed automating ergonomic assessment methods relying on computer vision and machine learning techniques (Diego-Mas and Alcaide-Marzal, 2014). In particular, the Kinect camera, alongside its software development kit (SDK), has been extensively used to analyze the adopted posture and evaluate the RULA score (Plantard et al., 2015; Liebregts et al., 2016; Plantard et al., 2017; Manghisi et al., 2017; Abobakr et al., 2017a). The Kinect SDK tracks the human body and estimates the 3D Cartesian coordinates of 20 joint positions. It uses a random decision forest classifier to segment the body into parts, followed by a localization algorithm to infer joint positions (Abobakr et al., 2017a). However, several difficulties result from using the Kinect SDK. First, it relies on local body part detectors, and hence may produce unrealistic skeletons under occlusions due to cluttered environments (Abobakr et al., 2018; Plantard et al., 2017). The Kinect SDK also has difficulty tracking self-occluded postures involving arm crossing, trunk bending, trunk lateral flexion and trunk rotation (Manghisi et al., 2017). This requires applying preprocessing operations to correct the resulting kinematic structure, as suggested in (Plantard et al., 2015; Plantard et al.). Second, an additional processing stage is required to convert the 3D Cartesian coordinates of body joint positions into joint angles. For instance, Plantard et al. (2017) corrected Kinect data using the method presented in (Plantard et al., 2017) and estimated missing anatomical landmarks using the approach proposed in (Bonnechere et al., 2014), to make the reconstructed skeleton compatible with the ISB recommendations (Wu et al., 2005) and compute the joint angles. Clark et al. (2012) used the inverse tangent method to convert 3D joint positions into joint angles.
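The position-to-angle conversion mentioned above can be sketched with basic vector geometry: the angle at a joint is recovered from the 3D positions of the joint and its two neighboring landmarks. This is a minimal illustration of the idea, not the pipeline used in the cited works; the joint names are hypothetical:

```python
import numpy as np

# Sketch: recover a joint angle (e.g. at the elbow) from three 3D joint
# positions, using atan2 of the cross-product magnitude and the dot product,
# which is numerically more stable than arccos near 0 and 180 degrees.
def joint_angle_deg(parent: np.ndarray, joint: np.ndarray, child: np.ndarray) -> float:
    """Angle at `joint` between segments joint->parent and joint->child, in degrees."""
    u = parent - joint
    v = child - joint
    return float(np.degrees(np.arctan2(np.linalg.norm(np.cross(u, v)), np.dot(u, v))))

# Toy example: shoulder, elbow and wrist forming a right angle at the elbow.
shoulder = np.array([0.0, 1.0, 0.0])
elbow = np.array([0.0, 0.0, 0.0])
wrist = np.array([1.0, 0.0, 0.0])
print(joint_angle_deg(shoulder, elbow, wrist))  # → 90.0
```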
Although these approaches have been successful in obtaining high-quality joint angles, they may exhibit large errors due to their reliance on the Kinect skeleton data, especially for occluded postures (Plantard et al., 2017; Manghisi et al., 2017). Improving the quality of the Kinect skeleton data for ergonomic studies remains an open area of research (Plantard et al., 2017). This work focuses on addressing the limitations of the Kinect V1 sensor, as it uses structured light technology, which has been incorporated in a wide range of depth sensors (Abobakr et al., 2018). This allows better generalization to different depth cameras, for instance the ASUS Xtion. The Kinect V2, on the other hand, uses time-of-flight imaging technology, which helps produce more robust skeleton data; however, it consumes more power and requires cooling (Fankhauser et al., 2015).

In this paper, we propose a skeleton-free holistic posture analysis system that accurately predicts body joint angles from a single depth image without utilizing the temporal information between subsequent images, as shown in Fig. 1. Although incorporating a temporal dynamics modeling stage can help ensure consistency between subsequent frame predictions and achieve higher frame rates, tracking algorithms require regular initialization to avoid drift anomalies (Shotton et al., 2013). The fundamental building block of the proposed method is a cascade of two deep convolutional neural network (ConvNet) models. The depth sensor produces two synchronized video feeds of RGB and depth images. First, we segment the body from the background by passing the RGB image to an object instance segmentation deep ConvNet model. This network computes segmentation masks for a predefined set of objects in a given scene, and we apply the obtained person segmentation mask to the corresponding depth image. Second, the depth values of the posture are encoded using a proposed depth encoding algorithm. Third, the encoded image is passed through the second ConvNet model to predict body joint angles. Finally, the estimated joint angles are used to compute the RULA score. Thus, the overall ergonomic evaluation procedure is simplified to directly mapping the predicted joint angles to a RULA score. Using this score, the MSD risk level is identified and a recommended action is suggested to decrease the risk of work-related injuries, as defined in (McAtamney and Corlett, 1993). This is made possible by training our models on a large amount of highly varied synthetic training images with ground truth joint angles that have been biomechanically modeled using a novel inverse kinematics step.
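The first two pre-processing steps above (masking the depth image with the person segmentation mask, then encoding the depth values) can be sketched as follows. The simple per-body min-max normalization used here is an assumption for illustration; the paper's actual depth encoding algorithm may differ:

```python
import numpy as np

# Sketch of the pre-processing steps: zero out background depth pixels using
# the person mask, then rescale the remaining body depth values to 8 bits.
# The min-max normalization is an illustrative assumption, not the paper's
# exact encoding algorithm.
def encode_masked_depth(depth_mm: np.ndarray, person_mask: np.ndarray) -> np.ndarray:
    """Return an 8-bit encoded depth image with the background zeroed out."""
    masked = np.where(person_mask, depth_mm, 0).astype(np.float32)
    body = masked[person_mask & (depth_mm > 0)]  # valid body pixels only
    if body.size == 0:
        return np.zeros_like(masked, dtype=np.uint8)
    lo, hi = body.min(), body.max()
    scale = 255.0 / max(float(hi - lo), 1.0)
    encoded = np.where(person_mask, (masked - lo) * scale, 0.0)
    return np.clip(encoded, 0, 255).astype(np.uint8)

# Toy example: a 2x2 depth image (millimeters) where only the left column
# belongs to the person.
depth = np.array([[1200, 3000], [1400, 3000]], dtype=np.float32)
mask = np.array([[True, False], [True, False]])
print(encode_masked_depth(depth, mask))
```

The background is suppressed before encoding so that the posture network sees only depth variation across the body, independent of the person's absolute distance from the camera.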

The remainder of this paper is structured as follows. Section 2 describes the proposed method and the used deep ConvNet models. Section 3 presents the experiments and results. Key aspects and limitations of the proposed method are discussed in Section 4. Section 5 highlights the conclusion and future work.

Section snippets

Material and methods

We propose a vision-based ergonomic posture assessment system composed of two cascaded ConvNets: an object instance segmentation network and a holistic posture analysis network. We utilize both the RGB video and depth feeds of a low-cost depth sensor. The input feeds are synchronized, which means that each RGB image has an associated depth image. In particular, we employed the segmentation network to detect and segment the person from an input RGB image and reject other background objects. This…

Results

We have trained a deep ConvNet model on a synthetic training dataset to predict 15 body joint angles from a single depth image. The estimated joint angles are then used to compute the RULA score for the adopted posture. In this section, we evaluate the performance of the proposed method and explore the generalization capabilities on a real test dataset. Table 4 provides a detailed description of the synthetic and real datasets used in training and evaluating the proposed system. We also examine…
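The two evaluation metrics reported for this system (joint-angle MAE, and Cohen's kappa agreement between predicted and reference RULA grand scores) can be computed as below. The sample score arrays are illustrative only, not data from the paper:

```python
import numpy as np

# Sketch of the two evaluation metrics used above. Sample data are made up.
def mean_absolute_error(pred: np.ndarray, true: np.ndarray) -> float:
    """Mean absolute error between predicted and reference values."""
    return float(np.mean(np.abs(pred - true)))

def cohens_kappa(pred: np.ndarray, true: np.ndarray) -> float:
    """Cohen's kappa (Cohen, 1960): chance-corrected agreement between raters."""
    labels = np.union1d(pred, true)
    po = float(np.mean(pred == true))  # observed agreement
    # expected agreement under independent marginal label distributions
    pe = sum(float(np.mean(pred == c)) * float(np.mean(true == c)) for c in labels)
    return (po - pe) / (1.0 - pe)

# Illustrative predicted vs. reference RULA grand scores for 8 frames.
pred_scores = np.array([3, 4, 4, 5, 2, 7, 3, 3])
true_scores = np.array([3, 4, 5, 5, 2, 7, 3, 4])
print(mean_absolute_error(pred_scores, true_scores))  # → 0.25
print(cohens_kappa(pred_scores, true_scores))
```

A kappa above 0.61 is conventionally interpreted as "substantial agreement", which is the band the reported value of 0.71 falls into.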

Discussion

This paper proposed a semi-automated ergonomic assessment system of adopted working postures. The proposed method analyzes the posture holistically and estimates body joint angles directly from a single depth and RGB image pair. Hence, we do not exploit temporal dependencies or skeleton data from the Kinect SDK. The estimated joint angles are used to compute the RULA score, the MSD risk level and subsequently the urgency of intervention required to reduce the risk of injury. The RULA score is…

Conclusions

This paper proposed a semi-automated holistic ergonomic posture assessment system. It is composed of an instance segmentation model that detects and segments the person in the scene and a deep convolutional neural network that we trained to estimate body joint angles directly from a single depth image. The joint angles prediction model is trained on synthetic depth images. This allows simulating a wide range of manual tasks performed by workers of different body shapes and sizes from several…

Acknowledgement

This research was supported by the Institute for Intelligent Systems Research and Innovation (IISRI) at Deakin University, Australia. The project was funded via Ford's university research program (URP 2014-4055R), Ford Motor Co., USA. The data used in this project was obtained from the CMU graphics lab motion capture database, mocap.cs.cmu.edu. The database was created with funding from NSF EIA-0196217, USA.

References (56)

  • M.E. Raabe et al.

    An investigation of jogging biomechanics using the full-body lumbar spine model: model development and validation

    J. Biomech.

    (2016)
  • J.A. Reinbolt et al.

Simulation of human movement: applications using OpenSim

    Procedia IUTAM

    (2011)
  • A. Seth et al.

OpenSim: a musculoskeletal modeling and simulation framework for in silico investigations and exchange

    Procedia IUTAM

    (2011)
  • N. Vignais et al.

    Innovative system for real-time ergonomic feedback in industrial manufacturing

    Appl. Ergon.

    (2013)
  • N. Vignais et al.

    Physical risk factors identification based on body sensor network combined to videotaping

    Appl. Ergon.

    (2017)
  • E. Weston et al.

    A biomechanical and physiological study of office seat and tablet device interaction

    Appl. Ergon.

    (2017)
  • G. Wu et al.

ISB recommendation on definitions of joint coordinate systems of various joints for the reporting of human joint motion - Part II: shoulder, elbow, wrist and hand

    J. Biomech.

    (2005)
  • W. Wu et al.

    Subject-specific musculoskeletal modeling in the evaluation of shoulder muscle and joint function

    J. Biomech.

    (2016)
  • A. Abobakr et al.

    Body joints regression using deep convolutional neural networks

  • A. Abobakr et al.

A Kinect-based workplace postural analysis system using deep residual networks

  • A. Abobakr et al.

RGB-D human posture analysis for ergonomic studies using deep convolutional neural network

  • A. Abobakr et al.

    A skeleton-free fall detection system from depth images using random decision forest

    IEEE Syst. J.

    (2018)
  • B.P. Bernard et al.

    Musculoskeletal Disorders and Workplace Factors; a Critical Review of Epidemiologic Evidence for Work-Related Musculoskeletal Disorders of the Neck, Upper Extremity, and Low Back

    (1997)
  • Nonfatal Occupational Injuries and Illnesses Resulting in Days Away from Work in 2015

    (2016)
  • J. Cohen

    A coefficient of agreement for nominal scales

    Educ. Psychol. Meas.

    (1960)
  • C. Couprie, C. Farabet, L. Najman, Y. LeCun, Indoor Semantic Segmentation Using Depth Information, arXiv preprint...
  • S.L. Delp et al.

    An interactive graphics-based model of the lower extremity to study orthopaedic surgical procedures

    IEEE Trans. Biomed. Eng.

    (1990)
  • S.L. Delp et al.

OpenSim: open-source software to create and analyze dynamic simulations of movement

    IEEE Trans. Biomed. Eng.

    (2007)