Depth-supported video segmentation with Kinect sensor using GPU

Optical flow is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (an eye or a camera) and the scene. In prior work on depth-supported video object segmentation with the Kinect sensor, the optical flow process was used to reduce the number of computationally demanding Metropolis iterations required to reach an equilibrium state, and the segmentation itself was accelerated by offloading the computation to an Nvidia graphics processing unit (GPU). In this study, we explore ways to restructure the segmentation flow in order to improve its throughput and quality.
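
To fix ideas, the kernel below sketches what one checkerboard sweep of such a Metropolis update over pixel labels might look like in CUDA, assuming a Potts-style energy with precomputed bond strengths (Jr, Jd) to each pixel's right and down neighbors. The identifiers, the temperature constant, and the host-side RNG initialization (omitted here) are illustrative assumptions, not the project's actual code.

```cuda
// Sketch: one checkerboard Metropolis sweep over pixel labels (assumed
// Potts-style energy). Jr[i] / Jd[i] hold the bond strength from pixel i
// to its right / down neighbor; TEMPERATURE is an assumed constant.
#include <cuda_runtime.h>
#include <curand_kernel.h>

#define TEMPERATURE 0.5f  // assumed annealing temperature

__global__ void metropolisSweep(int *labels, const float *Jr, const float *Jd,
                                curandState *rng, int w, int h, int parity)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h || ((x + y) & 1) != parity) return;  // checkerboard
    int i = y * w + x;

    curandState st = rng[i];
    // Propose adopting the label of a random 4-neighbor.
    int dir = curand(&st) & 3;
    int nx = x + (dir == 0) - (dir == 1);
    int ny = y + (dir == 2) - (dir == 3);
    if (nx < 0 || nx >= w || ny < 0 || ny >= h) { rng[i] = st; return; }
    int proposal = labels[ny * w + nx];
    int old = labels[i];
    if (proposal == old) { rng[i] = st; return; }

    // Energy change: bonds to neighbors matching the old label are lost,
    // bonds to neighbors matching the proposed label are gained.
    float dE = 0.f;
    if (x > 0)     dE += Jr[i - 1] * ((labels[i - 1] == old) - (labels[i - 1] == proposal));
    if (x < w - 1) dE += Jr[i]     * ((labels[i + 1] == old) - (labels[i + 1] == proposal));
    if (y > 0)     dE += Jd[i - w] * ((labels[i - w] == old) - (labels[i - w] == proposal));
    if (y < h - 1) dE += Jd[i]     * ((labels[i + w] == old) - (labels[i + w] == proposal));

    // Metropolis acceptance: always accept downhill moves, otherwise
    // accept with probability exp(-dE / T).
    if (dE <= 0.f || curand_uniform(&st) < expf(-dE / TEMPERATURE))
        labels[i] = proposal;
    rng[i] = st;
}
```

Running the sweep twice per iteration (parity 0, then parity 1) updates every pixel without read/write races between neighboring threads.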

Objectives

  • Develop and implement a GPU-based solution for video object segmentation,

  • The solution shall combine depth and color information obtained directly from the Kinect sensor to segment video objects,

  • The GPU-based solution shall achieve a performance of at least 25 frames per second at 320x256 video resolution on an Nvidia GTX480 GPU or equivalent,

  • The solution shall visually demonstrate the benefits of incorporating depth information into the segmentation process.

Constraints and demonstration

  • Indoors, the depth data obtained by the Kinect can suffer from several side effects: multiple or glossy reflections, ambient light, light absorption by objects in the scene, and artifacts at object boundaries. Capture is therefore limited to indoor scenes, and objects of interest move only slowly in the video.

Outcomes

We evaluate the impact of several design options on performance and present a method for video object segmentation that eliminates the need for the optical flow process without sacrificing segmentation quality or throughput. We further improve the performance of our method through two incremental changes. First, we introduce a scaling factor that amplifies the bond between neighboring pixels in each frame and increases the clarity of borderlines, which reduces the number of required Metropolis iterations from 10 to 5 for the base Metropolis run and from 25 to 10 for the relaxation Metropolis run. Second, we replace the Gaussian filter with a bilateral filter before the image is passed on to the Metropolis algorithm. The complete CPU-GPU segmentation process (from the capture of depth and color data from the Kinect sensor to the generation of the segmented frame) consistently operates at around 34 fps for 320x256 video sequences.
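
To make the first change concrete, the helper below sketches one plausible form of the amplified bond computation, assuming the Gaussian coupling commonly used in superparamagnetic clustering. The value of BOND_SCALE, the exact functional form, and the per-frame normalization distance are assumptions rather than the project's actual formula.

```cuda
// Illustrative helper: bond strength between two neighboring pixels.
// BOND_SCALE is the scaling factor discussed above (value assumed); a
// larger scale strengthens intra-object bonds, so the Metropolis runs
// reach equilibrium in fewer sweeps.
#include <math.h>

#define BOND_SCALE 2.0f  // assumed value of the amplification factor

// colorDist: distance between the two pixels' colors;
// meanDist:  per-frame mean neighbor distance used for normalization.
static inline float bondStrength(float colorDist, float meanDist)
{
    // Gaussian coupling (assumed form), amplified by the scale factor:
    // similar pixels (small colorDist) get an exponentially stronger bond.
    return BOND_SCALE * expf(-colorDist * colorDist /
                             (2.0f * meanDist * meanDist));
}
```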

Demonstration

Version-1:

Video object segmentation that eliminates the need for the optical flow process without sacrificing segmentation quality or throughput. This version operates at 25-27 fps.

test 1: Segmentation of a scene including a white box, a white kettle, and color tower blocks against a white background
(a) with depth, (b) without depth

test 2: Segmentation of a scene including a white box, a white kettle, and a white plate against a white background
(a) with depth, (b) without depth

test 3: Segmentation of a scene including a color tower against a blue background
(a) with depth, (b) without depth

Improvement-1: Improving the Throughput and Segmentation Quality through Bilateral Filtering

  • A scaling factor that amplifies the bond between neighboring pixels in each frame reduces the number of required Metropolis iterations from 10 to 5 for the base Metropolis run and from 25 to 10 for the relaxation Metropolis run.
  • Replacing the Gaussian filter with a bilateral filter before the image is passed on to the Metropolis algorithm improves the quality of segmentation (see the sketch after this list).
  • Together, these two modifications increase the throughput from 25 fps to 34 fps.
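
As an illustration of the filtering change, the kernel below sketches a straightforward GPU bilateral filter (grayscale for brevity; the actual pipeline filters color frames). The kernel radius and the SIGMA_S / SIGMA_R values are assumptions. Unlike the Gaussian filter, the range term leaves strong edges intact, which is what helps the Metropolis stage find cleaner borderlines.

```cuda
// Sketch: bilateral pre-filter on the GPU, one thread per output pixel.
// Radius and sigmas are assumed values, not the project's settings.
#include <cuda_runtime.h>

#define RADIUS  3
#define SIGMA_S 3.0f   // spatial sigma (assumed)
#define SIGMA_R 0.1f   // range sigma, intensities in [0,1] (assumed)

__global__ void bilateralFilter(const float *in, float *out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    float center = in[y * w + x];
    float sum = 0.f, norm = 0.f;
    for (int dy = -RADIUS; dy <= RADIUS; ++dy)
        for (int dx = -RADIUS; dx <= RADIUS; ++dx) {
            int nx = min(max(x + dx, 0), w - 1);   // clamp at borders
            int ny = min(max(y + dy, 0), h - 1);
            float v  = in[ny * w + nx];
            float ds = (float)(dx * dx + dy * dy);       // spatial dist^2
            float dr = (v - center) * (v - center);      // range dist^2
            float wgt = expf(-ds / (2.f * SIGMA_S * SIGMA_S)
                             - dr / (2.f * SIGMA_R * SIGMA_R));
            sum  += wgt * v;
            norm += wgt;
        }
    out[y * w + x] = sum / norm;
}
```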

Segmentation of a scene including color tower blocks, a white kettle, and a white plate against a white background
(a) with depth: raw video, segmented video
(b) without depth: raw video, segmented video

Improvement-2: Improving the Segmentation Quality through Depth-based Coupling Matrix Computations on the GPU

For this demonstration we use the Gaussian filter and the default number of Metropolis iterations. Segmentation of a scene including color tower blocks, a white kettle, and a white plate against a white background (segmented video).
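
The sketch below illustrates how such a depth-based coupling matrix computation might be laid out on the GPU: one thread per pixel, writing bond strengths to the right and down neighbors from both color and depth differences. The blend weight ALPHA, the Gaussian forms, and the scale constants are assumptions, chosen only to show why a depth gap can break a bond between pixels of identical color (e.g., a white kettle in front of a white wall).

```cuda
// Sketch: coupling ("bond") matrix from color AND depth, one thread per
// pixel. Jr/Jd, ALPHA, and the sigmas are illustrative assumptions.
#include <cuda_runtime.h>

#define ALPHA   0.5f   // assumed color/depth blend weight
#define SIGMA_C 0.1f   // assumed color scale (normalized RGB)
#define SIGMA_D 50.0f  // assumed depth scale (Kinect units, mm)

__device__ float bond(float dc, float dd)
{
    // Strong bond only when pixels are close in color AND in depth; a
    // large depth gap suppresses the bond even when the colors match.
    float wc = expf(-dc * dc / (2.f * SIGMA_C * SIGMA_C));
    float wd = expf(-dd * dd / (2.f * SIGMA_D * SIGMA_D));
    return ALPHA * wc + (1.f - ALPHA) * wd;
}

__global__ void couplingMatrix(const float3 *color, const float *depth,
                               float *Jr, float *Jd, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    int i = y * w + x;
    float3 c = color[i];
    float  d = depth[i];

    if (x < w - 1) {  // bond to right neighbor
        float3 cr = color[i + 1];
        float dc = fabsf(c.x - cr.x) + fabsf(c.y - cr.y) + fabsf(c.z - cr.z);
        Jr[i] = bond(dc, d - depth[i + 1]);
    }
    if (y < h - 1) {  // bond to lower neighbor
        float3 cd = color[i + w];
        float dc = fabsf(c.x - cd.x) + fabsf(c.y - cd.y) + fabsf(c.z - cd.z);
        Jd[i] = bond(dc, d - depth[i + w]);
    }
}
```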

Resources

  • Final Report (word / pdf)
  • Source code version-1 (tar.gz)
  • Source code version-2, with bilateral filter and reduced number of Metropolis iterations (tar.gz)