TY - GEN
T1 - Accelerating a computer vision algorithm on a mobile SoC using CPU-GPU co-processing - A case study on face detection
AU - Lee, Youngwan
AU - Jang, Cheolyong
AU - Kim, Hakil
PY - 2016/5/14
Y1 - 2016/5/14
N2 - Recently, mobile devices have become equipped with sophisticated hardware components such as heterogeneous multi-core SoCs that combine a CPU, GPU, and DSP. This provides opportunities to realize computationally intensive computer vision applications using General Purpose GPU (GPGPU) programming tools such as Open Graphics Library for Embedded Systems (OpenGL ES) and Open Computing Language (OpenCL). As a case study, this research aimed to accelerate the Viola-Jones face detection algorithm, which is computationally expensive and of limited use on mobile devices because irregular memory access and imbalanced workloads lead to long processing times. To address these challenges, the proposed method adopted CPU-GPU task parallelism, sliding window parallelism, scale image parallelism, dynamic allocation of threads, and local memory optimization to reduce the computation time. The experimental results show that the proposed method runs 3.3 to 6.29 times faster than the well-optimized OpenCV implementation on a CPU. The proposed method can be adapted to other applications using mobile GPUs and CPUs.
AB - Recently, mobile devices have become equipped with sophisticated hardware components such as heterogeneous multi-core SoCs that combine a CPU, GPU, and DSP. This provides opportunities to realize computationally intensive computer vision applications using General Purpose GPU (GPGPU) programming tools such as Open Graphics Library for Embedded Systems (OpenGL ES) and Open Computing Language (OpenCL). As a case study, this research aimed to accelerate the Viola-Jones face detection algorithm, which is computationally expensive and of limited use on mobile devices because irregular memory access and imbalanced workloads lead to long processing times. To address these challenges, the proposed method adopted CPU-GPU task parallelism, sliding window parallelism, scale image parallelism, dynamic allocation of threads, and local memory optimization to reduce the computation time. The experimental results show that the proposed method runs 3.3 to 6.29 times faster than the well-optimized OpenCV implementation on a CPU. The proposed method can be adapted to other applications using mobile GPUs and CPUs.
KW - CPU-GPU co-processing
KW - Computer vision
KW - Mobile GPGPU
KW - OpenCL
KW - OpenGL ES 2.0
UR - http://www.scopus.com/inward/record.url?scp=84983498962&partnerID=8YFLogxK
U2 - 10.1145/2897073.2897081
DO - 10.1145/2897073.2897081
M3 - Conference contribution
AN - SCOPUS:84983498962
T3 - Proceedings - International Conference on Mobile Software Engineering and Systems, MOBILESoft 2016
SP - 70
EP - 76
BT - Proceedings - International Conference on Mobile Software Engineering and Systems, MOBILESoft 2016
PB - Association for Computing Machinery, Inc
T2 - IEEE/ACM International Conference on Mobile Software Engineering and Systems, MOBILESoft 2016
Y2 - 16 May 2016 through 17 May 2016
ER -