Construction of any civil and structural work involves workers and equipment interacting with each other in a synchronized pattern to accomplish a given task. Planning, delegating, and coordinating the various activities among work groups on site is important because it enables comprehension of the jobsite context, which in turn allows the interpretation of worker intentions, the prediction of their movements, and the detection of inappropriate interactions that are counterproductive and may cause disastrous consequences such as struck-by accidents. Therefore, recognizing construction activities and the working groups involved is critical to enhancing construction safety and improving productivity. At present, the majority of existing studies use videos that contain only one activity with its involved entities and rely solely on the spatial-temporal relationships among entities. However, as mentioned earlier, many workers and machines co-exist and collaborate to accomplish different activities, and not all of them are relevant to the same activity, even when they are spatially close.
There is still a critical knowledge gap: methods are needed to identify working groups and recognize their corresponding activities from images/videos that contain many entities collaborating on various tasks. To address this gap, researchers from the Lyles School of Civil Engineering at Purdue University, Jiannan Cai (Ph.D. candidate) and Yuxi Zhang (Ph.D. student), led by Professor Hubo Cai, proposed a two-step deep learning-based classification approach – working group identification followed by activity recognition – that leverages both positional and attentional cues to recognize complex interactions and their involved entities from videos containing different activities with multiple entities. Their work is published in the journal Automation in Construction.
In their work, the spatial and attentional states of individual entities were represented numerically, and the corresponding positional and attentional cues between pairs of entities were computed. The researchers then designed long short-term memory (LSTM) networks to, first, classify whether two entities belong to the same group and, second, recognize the activities they were involved in. The newly created method was validated using two sets of construction videos.
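The article does not spell out how the pairwise cues are defined, but the idea of turning tracked positions and headings into a per-frame feature sequence for an LSTM can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, the specific cue definitions (Euclidean separation as the positional cue, heading-to-line-of-sight alignment as the attentional cue), and the feature layout are all assumptions.

```python
import numpy as np

def pairwise_cues(pos_a, pos_b, head_a, head_b):
    """Sketch: per-frame positional and attentional cues for one entity pair.

    pos_a, pos_b   : (T, 2) arrays of 2-D positions over T video frames
    head_a, head_b : (T,) arrays of heading/gaze angles in radians

    Returns a (T, 3) feature sequence [distance, att_a_to_b, att_b_to_a]
    that could serve as input to an LSTM classifier.
    """
    disp = pos_b - pos_a                          # vector from A to B, per frame
    dist = np.linalg.norm(disp, axis=1)           # positional cue: separation
    bearing_ab = np.arctan2(disp[:, 1], disp[:, 0])    # direction A -> B
    bearing_ba = np.arctan2(-disp[:, 1], -disp[:, 0])  # direction B -> A
    # Attentional cue: cosine alignment between an entity's heading and the
    # line of sight to the other entity (+1 = facing it, -1 = facing away).
    att_a = np.cos(head_a - bearing_ab)
    att_b = np.cos(head_b - bearing_ba)
    return np.stack([dist, att_a, att_b], axis=1)
```

In a two-step pipeline of the kind described, one LSTM would consume such sequences to decide whether the pair belongs to the same working group, and a second LSTM would classify the group's activity using only the entities retained by the first step.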
The authors reported that the proposed framework achieved over 95% accuracy in correctly identifying the working groups and recognizing the activities. The performance obtained by integrating positional and attentional cues was markedly higher than that obtained using positional cues alone. Better still, dividing group activity recognition into a two-step cascading process yielded better performance than one-step activity recognition.
In summary, the study by Professor Hubo Cai and his research team presented a novel two-step deep learning-based approach that integrates positional and attentional cues to identify construction working groups and recognize their corresponding group activities. By identifying working groups before recognizing activities, group-irrelevant entities are effectively excluded, which improves the performance of group activity recognition. In a statement to Advances in Engineering, Professor Hubo Cai highlighted that leveraging both positional and attentional cues raised the reported accuracy from 85% to 95% compared with cases using positional cues alone.
Jiannan Cai, Yuxi Zhang, Hubo Cai. Two-step long short-term memory method for identifying construction activities through positional and attentional cues. Automation in Construction, volume 106 (2019) 102886.