Lab 1.2 - Tracking

Registration tracking

Overview

In this lab you’ll implement the Lucas-Kanade tracking algorithm. The goal of this algorithm is to minimize the sum of squared error between two regions, the template \(T(\mathbf{x})\), and the warped image \(I(\mathbf{W}(\mathbf{x} ; \mathbf{p}))\):

\begin{equation} \sum_{\mathbf{x}}[I(\mathbf{W}(\mathbf{x} ; \mathbf{p}))-T(\mathbf{x})]^2 \end{equation}
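
As a minimal sketch of this objective, assuming the template and the warped region have already been cropped to two grayscale arrays of the same size, the sum of squared error can be computed as follows. The function name ssd is hypothetical and is not part of the code template.

```python
import numpy as np

def ssd(template, patch):
    """Sum of squared differences between two equally sized grayscale patches."""
    # Cast to float first: subtracting uint8 arrays directly would wrap around.
    diff = patch.astype(np.float64) - template.astype(np.float64)
    return float(np.sum(diff ** 2))

template = np.array([[10, 20], [30, 40]], dtype=np.uint8)
patch = np.array([[12, 20], [30, 37]], dtype=np.uint8)
print(ssd(template, patch))  # 2**2 + 3**2 = 13.0
```

A perfect match gives 0, so a tracker can use this value to compare candidate regions or to monitor convergence.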

\(\mathbf{p}\) is the vector of parameters that defines our warp. The most basic warp is the simple 2-DoF translation in Equation 2. Higher-order warps, such as the affine warp in Equation 3 or the homography warp in Equation 4, can also be applied.

\begin{equation} \mathbf{W}(\mathbf{x} ; \mathbf{p})=\begin{pmatrix} x+p_1 \\ y+p_2 \end{pmatrix} \end{equation}

\begin{equation} \mathbf{W}(\mathbf{x} ; \mathbf{p})=\begin{pmatrix} p_1 x+p_3 y+p_5 \\ p_2 x+p_4 y+p_6 \end{pmatrix} \end{equation}

\begin{equation} \mathbf{W}(\mathbf{x} ; \mathbf{p})=\frac{1}{1+p_7 x+p_8 y}\begin{pmatrix} p_1 x+p_3 y+p_5 \\ p_2 x+p_4 y+p_6 \end{pmatrix} \end{equation}
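
The three warps can be written as small NumPy functions acting on an array of (x, y) points. This is only an illustrative sketch: the function names and the convention of one point per row are assumptions, not part of the lab template.

```python
import numpy as np

def warp_translation(pts, p):
    """Equation 2: shift each (x, y) row of pts by (p1, p2)."""
    return pts + np.asarray(p, dtype=float)

def warp_affine(pts, p):
    """Equation 3: p = [p1, ..., p6]."""
    x, y = pts[:, 0], pts[:, 1]
    p1, p2, p3, p4, p5, p6 = p
    return np.stack([p1 * x + p3 * y + p5, p2 * x + p4 * y + p6], axis=1)

def warp_homography(pts, p):
    """Equation 4: the affine numerator divided by 1 + p7*x + p8*y."""
    x, y = pts[:, 0], pts[:, 1]
    denom = 1.0 + p[6] * x + p[7] * y
    return warp_affine(pts, p[:6]) / denom[:, None]

pts = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 5.0]])
print(warp_translation(pts, [2, -1]))
print(warp_affine(pts, [2, 0, 0, 2, 1, 1]))
print(warp_homography(pts, [1, 0, 0, 1, 0, 0, 0.1, 0]))
```

With p7 = p8 = 0 the homography reduces to the affine warp, and the affine parameters [1, 0, 0, 1, p5, p6] reduce to a pure translation.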

The user first defines the tracking region of interest, which becomes the template. We aim to find the warped region in subsequent frames that best matches our template. When a suitable match is found, we plot our tracking region and then repeat on the next frame.

This zip file contains the sample videos you can use to complete this lab. However, you are encouraged to capture your own.

Lab Introduction Slides: found here.

Important: You can obtain 2 bonus marks for each section that you finish and demo in person during the lab period in which this lab is introduced.


Prelab Questions (5%)

The link to the prelab questions can be found here. You will need your University of Alberta email for access; they are also available under the Canvas assignment. Due Tuesday, January 20 at 5pm.


Setup

First, download the lab code and report templates. These contain the structure for your report, and the file structure for your code. You can write your code however you wish within the files; however, please ensure that it is adequately commented, and that each part of the lab can be run by running the file corresponding to that question.

You will need to use your University of Alberta email to access the below templates.

Code template: can be found here.

Report template (same document as 1.1): can be found here.


1. 2-DoF SSD Tracker (60)

Create a function simple_tracker(roi, im0, im1, max_iterations, threshold) that takes in the roi and a pair of images and returns the coordinates of the updated tracking region.

a) Apply your tracker to a pair of images.

  • Outside of your function, display the first image and define a bounding box roi to track using cv2.selectROI().
    • We obtain the template from roi and im0. This can be updated for each new frame or left to be the original template from the first frame.
  • For subsequent frames, use your function to solve for \(\mathbf{u}=[u,v]^T\) on the roi of the next frame like in optical flow. Your roi is the patch.
    • Use \(\mathbf{u}\) to update the tracked region using a 2 DoF transformation given by Equation 2.
      • Repeatedly solve for \(\mathbf{u}\) and update the tracked region until norm(u_k)/norm(u_{k-1}) < threshold or k > max_iterations, where k is the number of iterations (you can define your own end condition, this is just what we suggest).
    • Your function should return the new roi. Update and display this bounding box overlaid onto im1 (see cv2.rectangle for plotting).
    • Note: We can choose to update the template every frame or we can keep the template we obtain from the first frame for all subsequent comparisons.
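
To illustrate the "solve for u" step in the list above, here is a minimal sketch of a single linearized least-squares solve between two patches. It is not a complete tracker: there is no ROI cropping, no iteration and no stopping test. It assumes both patches are float arrays of the same size, and it uses np.gradient as the gradient operator, which is a choice made for this example. The name translation_step is hypothetical.

```python
import numpy as np

def translation_step(template, patch):
    """One least-squares solve for u = [u, v] between two same-size float patches."""
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    Iy, Ix = np.gradient(patch)
    # Linearize I(x + u) ~ I(x) + grad(I) . u and require it to equal T(x).
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = (template - patch).ravel()
    u, *_ = np.linalg.lstsq(A, b, rcond=None)
    return u

# Synthetic check: a smooth image whose content moves by (0.3, 0.2) pixels.
ys, xs = np.mgrid[0:40, 0:40].astype(float)

def f(x, y):
    return np.sin(0.3 * x) + np.cos(0.2 * y) + np.sin(0.1 * (x + y))

template = f(xs, ys)           # patch from im0
patch = f(xs - 0.3, ys - 0.2)  # same region in im1 after the content moved
u = translation_step(template, patch)
print(u)  # approximately [0.3, 0.2]
```

The linearization only holds for small motions, so in the tracker this solve is repeated after shifting the region by the current estimate.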

Deliverables:

  • Display an image of just the ROI and one of the full image with the updated bounding box overlaid.

Report Question 1a: Why do we have to iteratively warp our tracked region instead of just solving for \(\mathbf{u}\) once and updating our bounding box?

b) Create a live implementation of your tracker.

  • Compare the results of using the same template from the first image for all subsequent frames vs updating the template for every new roi we find.

Report Question 1b: What are one benefit and one drawback of updating the template every frame?

c) Apply your tracker to a video sequence that you recorded.

Deliverables:

  • Save your video with the overlaid tracking region for your submission. Save in /media of your code submission. Ensure the video is less than 10 MB.

Report Question 1c: In what cases does the tracker perform well? In what cases does the tracker perform poorly? Name one type of image processing that may improve the performance of your tracker.


Important

  • Undergraduate students: choose one of the following two questions to complete for this lab to obtain the remaining 40 marks. The other question is a bonus.

  • Grad students: complete the following two questions to obtain the remaining 60 marks.


2. Pyramidal 2-DoF SSD Tracker

See slides 57-64 of this slide deck.

a) Implement Gaussian Pyramid Tracking

Create a function pyramidal_gaussian_tracker(roi, im0, im1, levels=4, scale=2) that performs 2-DoF SSD tracking with pyramidal Gaussian downsampling. You may call simple_tracker() within this function if you wish.

  • The Gaussian pyramid has levels layers; each layer has its resolution reduced by a factor of scale relative to the layer below it (the top layer is the coarsest).
  • See cv2.pyrDown() for Gaussian downsampling.
  • We obtain the template from roi and im0. This can be updated for each new frame or left to be the original template from the first frame.
  • For each frame, start with the top of the pyramid (coarsest) and solve for the new region. Propagate the result to the more detailed layer below and repeat for all layers.
  • Ensure the coordinates of the updated region match the scale factor of the layer below (i.e. multiply the translation by scale)
  • Update the tracked region with the bounding box of the bottom layer (the layer with no downsampling) and repeat for all frames.
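
The coordinate bookkeeping in the steps above can be sketched in plain Python. This is only an outline under stated assumptions: level 0 is the full-resolution image, the helper names are hypothetical, and solve_level is a placeholder callback standing in for whatever per-level solver you write (for example a call to simple_tracker on that level's images).

```python
def roi_at_level(roi, level, scale=2):
    """Scale a full-resolution (x, y, w, h) roi down to a pyramid level (0 = full size)."""
    factor = float(scale) ** level
    return tuple(v / factor for v in roi)

def propagate_translation(u, scale=2):
    """Carry a translation found at one level to the next finer level below it."""
    return (u[0] * scale, u[1] * scale)

def coarse_to_fine(roi, levels, scale, solve_level):
    """solve_level(level, roi_at_that_level, initial_u) -> refined u (hypothetical callback)."""
    u = (0.0, 0.0)
    for level in range(levels - 1, -1, -1):  # top (coarsest) layer first
        u = solve_level(level, roi_at_level(roi, level, scale), u)
        if level > 0:
            u = propagate_translation(u, scale)
    x, y, w, h = roi
    return (x + u[0], y + u[1], w, h)

# Fake solver that returns the "true" motion of (8, -4) full-resolution pixels.
fake_solver = lambda level, level_roi, u0: (8 / 2 ** level, -4 / 2 ** level)
print(roi_at_level((38, 323, 161, 213), 3))                    # (4.75, 40.375, 20.125, 26.625)
print(coarse_to_fine((38, 323, 161, 213), 4, 2, fake_solver))  # (46.0, 319.0, 161, 213)
```

A motion of 8 pixels at full resolution is only 1 pixel at level 3, which is why the coarse layers can recover displacements that are too large for the linearized solve at full size.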

b) Implement Laplacian Pyramid Tracking

Create a function pyramidal_laplacian_tracker(roi, im0, im1, levels=4, scale=2) that performs 2-DoF SSD tracking using a Laplacian pyramid. You may call simple_tracker() within this function if you wish.

  • First, compute a Gaussian pyramid as in part a). Then, starting at the top (smallest) layer, compute the Laplacian pyramid layers using the Gaussian pyramid and upsampling. Note that the top layer of the Laplacian pyramid should simply be the top layer of the Gaussian pyramid.
  • See cv2.pyrUp() to upsample your Gaussian pyramid levels.
  • Follow the same process as before, starting at the top of the pyramid, solving for the new region, and propagating the result to the layer below.
  • As before, ensure the updated region coordinates match the scale factor of the layer below.
  • Update the tracked region with the bounding box of the bottom (largest size) layer.
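
The relationship between the two pyramids can be sketched with NumPy alone. Note the assumptions: down2 and up2 are simplified stand-ins (2x2 block averaging and pixel repetition) for cv2.pyrDown() and cv2.pyrUp(), which apply Gaussian filtering; the image sides are assumed to be divisible by 2**(levels - 1); and all function names are hypothetical.

```python
import numpy as np

def down2(img):
    """Stand-in for cv2.pyrDown: average each 2x2 block (no Gaussian blur)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up2(img):
    """Stand-in for cv2.pyrUp: repeat every pixel twice along both axes."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def gaussian_pyramid(img, levels):
    pyr = [img.astype(np.float64)]
    for _ in range(levels - 1):
        pyr.append(down2(pyr[-1]))
    return pyr  # pyr[0] is full resolution, pyr[-1] is the top (coarsest)

def laplacian_pyramid(gauss):
    # Each layer stores the detail lost when going one step up the pyramid.
    lap = [g - up2(g_next) for g, g_next in zip(gauss[:-1], gauss[1:])]
    lap.append(gauss[-1])  # the top layer is copied from the Gaussian pyramid
    return lap

def reconstruct(lap):
    img = lap[-1]
    for layer in reversed(lap[:-1]):
        img = layer + up2(img)
    return img

img = np.random.default_rng(0).random((32, 32))
gauss = gaussian_pyramid(img, 4)
lap = laplacian_pyramid(gauss)
print([layer.shape for layer in lap])
print(np.allclose(reconstruct(lap), img))
```

Because every Laplacian layer is a difference of two Gaussian-pyramid images, adding the layers back from the top reproduces the original image, which is a convenient sanity check for your own implementation.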

Apply both pyramidal trackers to two videos that you record, one with camera motion and one with object motion.

Deliverables:

  • Save the videos with the overlaid tracking region for your submission. Save in /media of your code submission. Ensure each video is less than 10 MB.

Report Question 2a: Name one advantage and one disadvantage of implementing pyramidal downsampling to our tracker.

Report Question 2b: How do the number of pyramid levels and the chosen downsampling factor affect the trade-off between computational speed and accuracy?

Report Question 2c: Why do we use a coarse-to-fine strategy in pyramidal tracking, and are there any cases where a fine-to-coarse approach might be beneficial?

Report Question 2d: What are one advantage and one disadvantage of using a Laplacian pyramid rather than a Gaussian pyramid? Think about the features the Laplacian pyramid focuses on, and how these might affect tracking.


3. High-DoF Tracker

Create a function highdof_tracker(img0, img1, roi, max_iterations, threshold) that warps the bounding box using 4 parameters (x, y, rotation, scale). Alternatively, warp the bounding box using the 6 affine parameters \(\mathbf{p}\) for 10 additional bonus marks.

To do this, you may implement either the additive or inverse compositional version of the Lucas-Kanade algorithm. We recommend the additive algorithm as it is simpler. Information about the difference between these can be found in the Baker and Matthews paper and in the lecture slides.
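
As a reminder of what the additive algorithm computes, here is a sketch of one Gauss-Newton update in the Baker and Matthews formulation; check the notation against the lecture slides before relying on it. With \(\nabla I\) evaluated at \(\mathbf{W}(\mathbf{x} ; \mathbf{p})\), each iteration solves

\begin{equation} \Delta \mathbf{p}=H^{-1} \sum_{\mathbf{x}}\left[\nabla I \frac{\partial \mathbf{W}}{\partial \mathbf{p}}\right]^T[T(\mathbf{x})-I(\mathbf{W}(\mathbf{x} ; \mathbf{p}))], \qquad H=\sum_{\mathbf{x}}\left[\nabla I \frac{\partial \mathbf{W}}{\partial \mathbf{p}}\right]^T\left[\nabla I \frac{\partial \mathbf{W}}{\partial \mathbf{p}}\right] \end{equation}

and then updates \(\mathbf{p} \leftarrow \mathbf{p}+\Delta \mathbf{p}\). For the affine warp of Equation 3, differentiating each row with respect to \(p_1, \ldots, p_6\) gives the Jacobian

\begin{equation} \frac{\partial \mathbf{W}}{\partial \mathbf{p}}=\begin{pmatrix} x & 0 & y & 0 & 1 & 0 \\ 0 & x & 0 & y & 0 & 1 \end{pmatrix} \end{equation}

For the translation warp of Equation 2 the Jacobian is the 2x2 identity, so the update reduces to the 2-DoF solve from Question 1.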

Apply your tracker to two videos that you record, one with camera motion and one with object motion.

Deliverables:

  • Save the videos with overlaid tracking regions to include with your submission. Save in /media of your code submission. Ensure they are less than 10 MB each.

Report Question 3: Name one advantage and one disadvantage of using higher order warps for our tracker.

Report Bonus (2): Explain how the inverse compositional Lucas-Kanade algorithm differs from the additive algorithm, and discuss the benefits of the inverse compositional algorithm.


Tracker Competition 🏆:

Use the tracker you developed in this lab to track a selected region of interest (ROI) in a sample video. The target object is a cereal box, and your goal is to accurately track the specified ROI throughout the video.

The video to track is the nl_cereal_s3 video in the sample videos folder.

See the top of this lab description for the formulas for the warps discussed below.

Rules

  • Originality: You must use your own implementation. Importing prebuilt tracking libraries (e.g., OpenCV trackers) is not allowed.
  • Scope: Extensions to the tracker must align with the principles and methods discussed in this lab. If you are unsure about your method, discuss with a TA.
  • Submission: Submit your code and a short write-up on how you refined your tracker and what you learned in doing so. In your report, label this section Tracker Competition.

Evaluation

  1. Accuracy: Tracking performance will be quantitatively evaluated.
    • The tracker will be scored based on how closely the tracked ROI aligns with the ground truth.
    • You have the following options for initializing your tracker:
      • The 4 corners of the initial bounding box (note this is a warped rectangle): (38.00, 323.00), (202.00, 308.00), (216.00, 517.00), (54.00, 540.00).
      • An initial set of affine warp parameters for this box, with width=164, height=213, is as follows: [p1, p2, p3, p4, p5, p6] = [1, -9.14634146e-02, 7.51173709e-02, 1.01877934e+00, 3.8000000e+01, 3.23000000e+02]. (Note that this differs slightly from the original ground truth, because the ground truth is a homography; however, this warp is very close.)
      • An initial homography warp for this box, with width=164, height=213, is as follows: [p1, p2, p3, p4, p5, p6, p7, p8] = [1.04836019e+00, -1.77260986e-02, 7.73870451e-02, 1.04147608e+00, 3.80000000e+01, 3.23000000e+02, 2.39406870e-04, 4.20310040e-05]
      • Otherwise, if you need a normal (axis-aligned) rectangle, use this rectangle as input (x,y,w,h): (38, 323, 161, 213)
  2. Bonus Marks:
    • First Place: 10% bonus marks.
    • Last Place: 1% bonus marks.
    • Bonus marks for the places in between will be spaced evenly between these two values according to performance ranking.
  3. Trophy: The winner will also receive a 3D-printed trophy!
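
As a sanity check on the initialization options listed under Accuracy, the sketch below applies the given affine and homography parameters to the corners of a 164 x 213 box. It assumes the warp acts on box-local coordinates, with the origin at the box's upper-left corner; that convention is inferred from the numbers rather than stated in the lab. The function name warp_corners is hypothetical.

```python
import numpy as np

def warp_corners(p, width, height):
    """Map box corners (UL, UR, LR, LL) through Equation 3 (6 params) or Equation 4 (8 params)."""
    p = list(p)
    if len(p) == 6:
        p = p + [0.0, 0.0]  # an affine warp is a homography with p7 = p8 = 0
    corners = np.array([[0, 0], [width, 0], [width, height], [0, height]], dtype=float)
    x, y = corners[:, 0], corners[:, 1]
    denom = 1.0 + p[6] * x + p[7] * y
    wx = (p[0] * x + p[2] * y + p[4]) / denom
    wy = (p[1] * x + p[3] * y + p[5]) / denom
    return np.stack([wx, wy], axis=1)

affine = [1, -9.14634146e-02, 7.51173709e-02, 1.01877934e+00, 3.8e+01, 3.23e+02]
homography = [1.04836019e+00, -1.77260986e-02, 7.73870451e-02, 1.04147608e+00,
              3.8e+01, 3.23e+02, 2.39406870e-04, 4.20310040e-05]
affine_corners = warp_corners(affine, 164, 213)
homography_corners = warp_corners(homography, 164, 213)
print(np.round(affine_corners, 2))
print(np.round(homography_corners, 2))
```

Under this convention the homography reproduces the four listed corners, while the affine warp matches three of them and places the lower-right corner near (218, 525) instead of (216, 517), which is the small discrepancy mentioned in the affine option.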

Competition Submission

  • Provide a .txt file with your best tracking results in the following format:
    frame	ulx	uly	urx	ury	lrx	lry	llx	lly
    frame00001.jpg	32.00	313.00	202.00	308.00	316.00	517.00	54.00	540.00
    frame00002.jpg	32.02	312.99	202.02	307.99	316.02	516.99	54.04	539.99
    frame00003.jpg	32.03	313.00	202.03	307.99	316.03	517.00	54.03	539.99
    
    • Each row should contain:
      • frame: Frame name (e.g., frame00001.jpg).
      • ulx, uly: Upper-left corner coordinates.
      • urx, ury: Upper-right corner coordinates.
      • lrx, lry: Lower-right corner coordinates.
      • llx, lly: Lower-left corner coordinates.
    • Name the file tracking_results_CCID.txt.
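
One way to produce a file in this layout is sketched below. It assumes the columns are tab-separated and that coordinates are written with two decimal places, as the sample rows suggest; the helper names are hypothetical.

```python
HEADER = "\t".join(["frame", "ulx", "uly", "urx", "ury", "lrx", "lry", "llx", "lly"])

def format_row(frame_name, corners):
    """corners: four (x, y) pairs ordered upper-left, upper-right, lower-right, lower-left."""
    values = [f"{v:.2f}" for point in corners for v in point]
    return "\t".join([frame_name] + values)

def write_results(path, rows):
    """rows: iterable of (frame_name, corners) tuples, one per tracked frame."""
    with open(path, "w") as fh:
        fh.write(HEADER + "\n")
        for frame_name, corners in rows:
            fh.write(format_row(frame_name, corners) + "\n")

row = format_row("frame00001.jpg", [(38, 323), (202, 308), (216, 517), (54, 540)])
print(row)
```

Calling write_results("tracking_results_CCID.txt", rows), with CCID replaced by your own, then writes the header followed by one line per frame.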

Submission Details

  • Include the accompanying code used to complete each question. Ensure it is adequately commented.
  • Ensure all functions and sections are clearly labeled in your report to match the tasks and deliverables outlined in the lab.
  • Organize files as follows:
    • code/ folder containing all scripts used in the assignment.
    • media/ folder for images, videos, and results.
  • Final submission format: a single zip file named CompVisW26_lab1.2_lastname_firstname.zip containing the above structure.
  • Your combined report for Lab 1.1 and 1.2 is due shortly after (see calendar for details). The report contains all media, results, and answers as specified in the instructions above. Ensure your answers are concise and directly address the questions.
  • Total marks for this lab are 100 for undergraduate students and 120 for graduate students. Your lab assignment grade with bonus marks is capped at 110%. Report bonus marks will be applied to the report grade, capped at 110%.

Good luck, and happy tracking! 🚀