Reconstructing two hands from monocular RGB images is challenging due to frequent occlusion and mutual confusion. Existing methods mainly learn an entangled representation that encodes the two interacting hands jointly, which is extremely fragile to impaired interaction such as truncated hands, separated hands, or external occlusion. This paper presents ACR (Attention Collaboration-based Regressor), which makes the first attempt to reconstruct hands in arbitrary scenarios. To achieve this, ACR explicitly mitigates interdependencies between hands and between parts by leveraging center-based and part-based attention for feature extraction. However, reducing interdependence relaxes the input constraints but weakens the mutual reasoning needed to reconstruct interacting hands. ACR therefore also learns a cross-hand prior, built on the center attention, to better handle interacting hands. We evaluate our method on various types of hand reconstruction datasets. It significantly outperforms the best interacting-hand approaches on the InterHand2.6M dataset while yielding performance comparable to state-of-the-art single-hand methods on the FreiHAND dataset. Further qualitative results on in-the-wild images, hand-object interaction datasets, and web images/videos demonstrate the effectiveness of our approach for arbitrary hand reconstruction.
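As a rough formalization (notation ours, not taken verbatim from the paper), the per-hand center attention $A_h$ can be read as a soft spatial weighting that collapses a dense parameter map $P$ into one parameter vector per hand, so each hand is regressed from its own attended region rather than from an entangled two-hand feature:

$$
\hat{\theta}_h = \frac{\sum_{x} A_h(x)\,P(x)}{\sum_{x} A_h(x) + \epsilon}, \qquad h \in \{\mathrm{left}, \mathrm{right}\},
$$

where $x$ ranges over spatial locations and $\epsilon$ keeps the expression well defined when a hand is absent from the image.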
ACR takes a full-person image and uses a feature map encoder to extract hand-center maps, part segmentation maps, cross-hand prior maps, and parameter maps. The feature aggregator then generates the final feature for hand model regression from these feature maps.
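Below is a minimal PyTorch-style sketch of this pipeline under stated assumptions: the module names, head designs, channel sizes, and the additive combination of own-hand and cross-hand aggregation are illustrative choices for exposition, not the released ACR implementation.

```python
# Illustrative sketch only -- names, head designs, and channel sizes are
# assumptions, not the released ACR code.
import torch
import torch.nn as nn

class ACRSketch(nn.Module):
    def __init__(self, feat_dim=256, n_parts=16, n_mano_params=61):
        super().__init__()
        # Backbone producing a dense feature map from the full-person image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Per-pixel heads: hand-center maps (left/right), part segmentation,
        # cross-hand prior maps, and dense hand-model parameter maps.
        self.center_head = nn.Conv2d(feat_dim, 2, 1)
        self.part_head = nn.Conv2d(feat_dim, 2 * n_parts, 1)
        self.prior_head = nn.Conv2d(feat_dim, 2, 1)
        self.param_head = nn.Conv2d(feat_dim, n_mano_params, 1)

    @staticmethod
    def aggregate(attn, param_map):
        # Soft, attention-weighted pooling of the dense parameter map:
        # one parameter vector per attention channel (here: per hand).
        b, k, h, w = attn.shape
        attn = attn.reshape(b, k, h * w).softmax(dim=-1)   # (B, K, HW)
        params = param_map.flatten(2).transpose(1, 2)      # (B, HW, C)
        return attn @ params                               # (B, K, C)

    def forward(self, image):
        feat = self.encoder(image)
        centers = self.center_head(feat)       # hand-center attention maps
        parts = self.part_head(feat)           # part segmentation logits
        prior = self.prior_head(feat)          # cross-hand prior attention
        params = self.param_head(feat)         # dense parameter maps
        # Each hand's parameters come from its own attended region, plus a
        # cross-hand term that re-introduces interaction reasoning.
        own = self.aggregate(centers, params)  # (B, 2, n_mano_params)
        cross = self.aggregate(prior, params)  # (B, 2, n_mano_params)
        return own + cross, parts
```

The aggregated per-hand parameters would then typically be decoded by a MANO layer into hand meshes; that step is omitted here.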
@inproceedings{yu2023acr,
  title     = {ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction},
  author    = {Yu, Zhengdi and Huang, Shaoli and Chen, Fang and Breckon, Toby P. and Wang, Jue},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2023}
}