Before you start coding yourself, experiment with the already available trackers in MTF to get an idea of how translational trackers should perform. Refer to the lab tutorial page for more details about using mexMTF for tracking.
Implement a basic translational SSD tracker in MATLAB. Use your x-y motion optical flow code as a starting point. For each new frame (image), iteratively update the motion vector until the norm of the update falls below a certain fraction of its norm in the previous iteration. (You should also set a maximum number of iterations.) Display the tracker's position using `plot`. Experiment with the tracker on live video from a camera. (After updating the tracker for the current frame, capture the next frame without delay.) When does it perform well or poorly? What type of image processing operation might improve performance? What would be the advantages and disadvantages of using other warps (for example, affine warps)?
- Capture a sequence of images
- Let the user initialize the tracker in the first frame using `getrect` (`drawrectangle` is better, but is only available as of MATLAB R2018b)
- For subsequent frames:
  - Iteratively compute motion for the tracked region (x-y translation)
  - Use it to update the position of the tracker
  - Display the current position of the tracker
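The per-frame update loop described above can be sketched in Python with NumPy (the exercise itself is in MATLAB; the function name, the stopping fraction `frac`, and the Gauss-Newton step are illustrative, not prescribed):

```python
import numpy as np

def track_translation(template, frame, p, max_iters=50, frac=0.5, tol=1e-3):
    """One-frame update of a translational SSD tracker.

    Iteratively refines p = [tx, ty] (top-left corner of the tracked
    region in `frame`) until the norm of the update drops below `frac`
    times its norm in the previous iteration (or below `tol`), up to
    `max_iters` iterations.
    """
    h, w = template.shape
    t = template.astype(float)
    prev_norm = None
    for _ in range(max_iters):
        tx, ty = int(round(p[0])), int(round(p[1]))
        patch = frame[ty:ty + h, tx:tx + w].astype(float)
        gy, gx = np.gradient(patch)                    # image gradients of the patch
        err = (t - patch).ravel()                      # SSD residual
        J = np.stack([gx.ravel(), gy.ravel()], axis=1) # Jacobian for pure translation
        dp, *_ = np.linalg.lstsq(J, err, rcond=None)   # least-squares motion update
        p = p + dp
        n = np.linalg.norm(dp)
        if n < tol or (prev_norm is not None and n < frac * prev_norm):
            break
        prev_norm = n
    return p
```

For each new frame, call the function with the result from the previous frame as the starting point, then plot the updated corner position.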
Undergraduate students must choose and complete one out of the following exercises for 5 marks. Graduate students must choose and complete two exercises for 5 marks. Completing more exercises than required will give you a bonus of up to 2 marks in total.
Exercise 2: Pyramidal tracking

Implement a pyramidal version of the SSD tracker as described in lecture 4, slides 28 and 29.

- Construct a Gaussian pyramid with a user-specified number of levels. Each level of the pyramid is constructed by applying Gaussian filtering and downsampling to the level below it, where the original image forms the bottom-most level. The resize factor between adjacent levels is also specified by the user.
- Obtain the object motion at each level of the pyramid, going from the top (coarsest) to the bottom (finest) level.
- Propagate the result from each level to the one below it, and use the resulting location as the starting point of the iterations at that level.
- The final location of the object is given by the result at the bottom-most level of the pyramid, corresponding to the original image.
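The pyramid construction can be sketched in pure NumPy as follows (a MATLAB implementation could use `impyramid` or `imresize` instead; the helper names and the fixed blur sigma here are illustrative):

```python
import numpy as np

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur with a small 1-D kernel (pure NumPy)."""
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    blur_1d = lambda m: np.convolve(m, k, mode='same')
    out = np.apply_along_axis(blur_1d, 0, img.astype(float))  # blur columns
    return np.apply_along_axis(blur_1d, 1, out)               # blur rows

def build_pyramid(img, n_levels, factor=2):
    """Gaussian pyramid: the original image is the bottom level; each level
    above it is a blurred, downsampled copy of the one below."""
    pyr = [img.astype(float)]
    for _ in range(n_levels - 1):
        pyr.append(gaussian_blur(pyr[-1])[::factor, ::factor])
    return pyr
```

For coarse-to-fine tracking, divide the initial motion estimate by factor^(n_levels - 1), run the Exercise 1 tracker on the top level, and multiply the estimate by the resize factor each time you descend one level.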
Exercise 3: High dimensional registration tracking

Implement a higher-dimensional tracker. Parameterize motion using either 4 (x, y, rotation, scale) or 6 (affine) parameters; you may do both if you choose, but it is not required. For higher DOF, the forward and inverse methods differ in their warps and Jacobians; the Baker and Matthews paper explains this. How will the tracking window size affect the results? Try both camera motion and object motion sequences and compare the results.

- It is likely easiest to add one dimension at a time, e.g. do x, y, rotation first, then x, y, rotation, scale, etc.
- You need a warp function to align template and image pixels. You can write one or try to find an existing one. The matlabdirs directory in the course directory has some: ~vis/matlabdirs/texturemapping contains BiLinImInt.m and TriangImInt.m. There are also a number of OpenGL functions in that directory doing similar texture warps; you will need to compile OpenGL and glut with MATLAB (this does not work for all MATLAB versions).
- You can also try functions built into MATLAB, such as `imwarp`. For `imwarp`, you may need to use the transpose of your transformation matrix; test it first to check.
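As a concrete reference for the 4-DOF case, here is a hypothetical NumPy sketch of the similarity warp and an inverse-mapping patch sampler (nearest-neighbour lookup for brevity; bilinear interpolation, as in BiLinImInt.m, would be smoother):

```python
import numpy as np

def similarity_matrix(tx, ty, theta, s):
    """3x3 homogeneous matrix for the 4-DOF (x, y, rotation, scale) warp."""
    c, si = s * np.cos(theta), s * np.sin(theta)
    return np.array([[c, -si, tx],
                     [si,  c, ty],
                     [0.0, 0.0, 1.0]])

def warp_patch(img, W, size):
    """Sample an h x w patch by mapping each output pixel through W into
    `img` (inverse mapping with nearest-neighbour lookup)."""
    h, w = size
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = W @ pts                                     # template coords -> image coords
    xi = np.clip(np.round(src[0]).astype(int), 0, img.shape[1] - 1)
    yi = np.clip(np.round(src[1]).astype(int), 0, img.shape[0] - 1)
    return img[yi, xi].reshape(h, w)
```

The Jacobian of the warp with respect to (tx, ty, theta, s) then replaces the two-column translation Jacobian of Exercise 1 in the least-squares update.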
Exercise 4: Learning-based Tracking

Intensity-based tracking finds the parameters of a transformation, for example a homography, under which all pixels of a template I0 are warped to their corresponding positions in the current frame It, under the assumption of image consistency. Estimating this transformation can be treated as an approximation problem: minimize the length of the intensity error vector between the template patch and the warped current patch. In the previous exercises, you implemented the analytical approach, which relies on computing the Jacobian. In this exercise, you will implement a different kind of solution, in which the Jacobian is not computed analytically but is instead learned from a set of training samples. The goal is to fit the parameters using a large, synthetically generated training set.
Implement either of the two approaches described in:

- F. Jurie and M. Dhome. Hyperplane approximation for template matching. 2002
- T. Dick, C. Perez, A. Shademan and M. Jagersand. Realtime Registration-Based Tracking via Approximate Nearest Neighbour Search. 2013

This exercise counts as two, should you choose to complete both the hyperplane and the nearest neighbour tracker.
A quick description of this family of learning-based tracking methods can be found in the lecture slides: Lec05eJurieDhomeTrack.pptx
A quick guideline for your implementation:
- Write a function, “sample_region”, that takes a region defined by four corner coordinates and samples a template image from it. This can be done by warping the four-corner region onto a rectangular region defined around the origin using a DLT transformation. Check Dick et al. or Jurie et al. for more details on how the regions are sampled. A bonus of 1 mark will be given if you successfully implement this with four corner coordinates; use a simple rectangle to sample the region if you are short on time.
- Write a function, “synthesis”, that takes the input region and the coordinates of the rectangle, and randomly warps the region with small transformations. A good sampling strategy is to follow Dick et al. and draw the perturbations from Gaussian distributions with small parameters; check Section V of that paper for more details (the nearest neighbour paper uses 8 DOF). Start with 2 DOF (translation), then move to 4 DOF (rotation + scaling), 6 DOF (affine), and 8 DOF (homography).
- Implement the learning procedure for either the hyperplane tracker or the nearest neighbour tracker as a function “learn”. (Implementing both will count as two exercises.) Check either of the papers for the details of the algorithm.
- Implement the incremental updating procedure as a function “update”. Again, check either of the papers to understand in detail how the learning and updating are done.
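Under two simplifying assumptions, a plain rectangular region and 2-DOF (translation-only) perturbations, the sample_region / synthesis / learn steps of the hyperplane tracker can be sketched in NumPy as follows (names, region layout, and the Gaussian sigma are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_region(img, top_left, size):
    """Simple axis-aligned version of sample_region; the four-corner DLT
    variant described above is the bonus."""
    x, y = top_left
    w, h = size
    return img[y:y + h, x:x + w].astype(float)

def synthesis(img, top_left, size, n_samples, sigma=2.0):
    """Generate training pairs: random small translations (2 DOF) and the
    intensity differences they induce relative to the template."""
    x, y = top_left
    template = sample_region(img, top_left, size).ravel()
    dP, dI = [], []
    for _ in range(n_samples):
        d = rng.normal(0.0, sigma, size=2)       # small Gaussian perturbation
        dx, dy = int(round(d[0])), int(round(d[1]))
        patch = sample_region(img, (x + dx, y + dy), size).ravel()
        dP.append([dx, dy])
        dI.append(patch - template)
    return np.array(dP, float), np.array(dI)

def learn(dP, dI):
    """Fit the linear map A with dP ~ A @ dI by least squares: the learned
    'Jacobian' of the hyperplane tracker."""
    A, *_ = np.linalg.lstsq(dI, dP, rcond=None)  # (n_pixels, 2)
    return A.T                                   # (2, n_pixels)
```

At runtime the motion update is A @ (current_patch - template); “update” would re-estimate A incrementally as new frames arrive, following whichever paper you implement.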
The main difficulty with this family of trackers is choosing good samples of geometric perturbations for training. Start by generating 1000-2000 random samples with the parameters suggested in the nearest neighbour paper, and try a moderate region size, for example from 50x50 to 100x100. You can also implement the grid sampling procedure originally proposed by Jurie et al., as it decreases the trackers' learning time significantly while maintaining good performance.
For debugging, validate the “Static Image Motion” experiment before moving on to more complicated tests.
You are encouraged to use Python + OpenCV here. Should you choose Python for your implementation, note that raw nearest neighbour computation can be slow; use pyflann to get a performance boost. numpy alone should be sufficient for all other numerical computation.
Test your tracker with the provided sample videos; a Python script is provided to run the tracking procedure (so you don't have to write it yourself if you choose to work in Python), and you are also welcome to record your own videos for testing: