Proxemics-net++: classification of human interactions in still images

This project introduces Proxemics-net++, a method that addresses the problem of human interaction recognition (HIR) in images from two perspectives: 1) classification of physical interactions (proxemics) and 2) classification of social interactions between pairs of people. To do so, it combines both RGB and body pose information of the pairs.

Abstract

Human interaction recognition (HIR) is a significant challenge in computer vision that focuses on identifying human interactions in images and videos. HIR is highly complex due to factors such as pose diversity, varying scene conditions, and the presence of multiple individuals. Recent research has explored different approaches to address it, with an increasing emphasis on human pose estimation. In this work, we propose Proxemics-Net++, an extension of the Proxemics-Net model, capable of addressing the problem of recognizing human interactions in images through two different tasks: the identification of the types of "touch codes" or proxemics and the identification of the type of social relationship between pairs. To achieve this, we use RGB and body pose information together with the state-of-the-art deep learning architecture ConvNeXt as the backbone. We performed an ablative analysis to understand how the combination of RGB and body pose information affects these two tasks. Experimental results show that body pose information contributes significantly to proxemics recognition (first task), enabling an improvement over the existing state of the art. Its contribution to the classification of social relationships (second task) is limited due to the ambiguity of the labelling in this problem, so RGB information is more influential in that task.
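The two-branch design described above (an RGB branch and a body-pose branch whose features are fused before classification) can be sketched as follows. This is a minimal, hypothetical illustration, not the released implementation: Proxemics-Net++ uses ConvNeXt backbones, whereas here each branch is a tiny placeholder CNN so the sketch stays self-contained, and `num_classes=6` assumes the six proxemics touch codes.

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    """Hypothetical sketch of a two-branch (RGB + body pose) classifier.

    Proxemics-Net++ uses ConvNeXt backbones; tiny placeholder CNNs are
    used here instead so the example runs without pretrained weights.
    """

    def __init__(self, num_classes: int = 6):
        super().__init__()

        def branch() -> nn.Sequential:
            # Placeholder feature extractor standing in for a ConvNeXt branch
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )

        self.rgb_branch = branch()    # RGB crop of the pair of people
        self.pose_branch = branch()   # rendered body-pose map of the pair
        # Late fusion: concatenate both feature vectors, then classify
        self.classifier = nn.Linear(32 + 32, num_classes)

    def forward(self, rgb: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.rgb_branch(rgb), self.pose_branch(pose)], dim=1)
        return self.classifier(feats)

model = TwoBranchFusion(num_classes=6)
rgb = torch.randn(2, 3, 224, 224)   # batch of 2 RGB inputs
pose = torch.randn(2, 3, 224, 224)  # matching pose-map inputs
logits = model(rgb, pose)
print(logits.shape)  # torch.Size([2, 6]): one score per touch code
```

The fusion step (feature concatenation followed by a linear classifier) is the part that lets the ablation compare RGB-only, pose-only, and combined inputs by simply dropping or keeping a branch.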

Figure 1: Examples of human-human interactions. These images illustrate the great complexity inherent in the problem of recognizing human interactions in images. The images in (a) highlight situations where it is confusing to determine the type of physical contact (hand-elbow, hand-shoulder, elbow-shoulder, etc.) due to clothing and partial occlusion. In (b), the images show ambiguity in determining the type of social relationship between individuals (family, friends, co-workers, etc.) without additional context.

Code

To support the research community and encourage exploration, we provide access to our code, instructions, and a demo. If you find it interesting, you can try the code available on our GitHub as well as test our models in the Google Colab demo we have prepared for you.

Citing

If you use this library in your research, please cite:

  1. Jiménez-Velasco, I., Zafra-Palma, J., Muñoz-Salinas, R., Marín-Jiménez, M.J. Proxemics-net++: classification of human interactions in still images. Pattern Anal Applic 27, 49 (2024). doi: 10.1007/s10044-024-01270-3.
  2. Jiménez-Velasco, I., Muñoz-Salinas, R., Marín-Jiménez, M.J. Proxemics-Net: Automatic Proxemics Recognition in Images. In IbPRIA 2023: Pattern Recogn. Image Anal. (pp. 402-413). doi: 10.1007/978-3-031-36616-1_32.

Contact

If you have any further questions, please contact isajimenez@uco.es.