Course readings

We will read this material in roughly chronological order. As indicated in the listings, some readings contain key principles and should be read carefully. Others serve as an orientation to the field and can be read cursorily (i.e., on the computer rather than in print). Optional readings are starting points for those who would like to do their projects in a particular area, or who are just interested in general.

Introduction

If you would like to read an introduction and brief overview of the field, consider Ch 1 in Shapiro and Stockman's textbook Computer Vision.

Optic flow, motion

We'll start by studying the classic x,y optic flow, then generalize it to include translations, rotations and scale:

Chapter 4.3 in Ma et al., An Invitation to 3-D Vision, is the closest to the course.
Supplementary
Chapter 9 from Shapiro and Stockman has a basic presentation: Motion from 2D Image Sequences. Read the introductory sections cursorily. Pay attention to the definitions in Section 9.3, then look closely at the optic flow equations in Section 9.3.5. Notice that we can write image variability in terms of more than x,y image plane translations, as in Eq. 1 in Hong Zhou's report (just look at Eq. 1; there is no need to read the rest unless you want to).
Black, M.J., Yacoob, Y., Jepson, A.D., and Fleet, D.J. (1997) Learning parameterized models of image motion. IEEE Conference on Computer Vision and Pattern Recognition, Puerto Rico, June, pp. 561-567 (compressed postscript). How to learn a low-dimensional subspace that organizes the flow of individual points.
Barron, J.L., Fleet, D.J., and Beauchemin, S. (1994) Performance of optical flow techniques. International Journal of Computer Vision, 12(1):43-77. An overview paper for those interested in what other optic flow techniques there are. (Optional; cursory reading sufficient.)
Barron, J.L., and Beauchemin, S. Computation of optical flow. Complementary (optional or cursory) reading to the lectures.
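
As a pointer for the optic flow equations in the readings above, the classic x,y optic flow constraint and its least-squares solution over an image patch can be written as below. This is the standard textbook form and not necessarily the notation used in the lectures or in Section 9.3.5. The symbols are assumptions introduced here: I_x, I_y and I_t are the spatial and temporal image derivatives, (u, v) is the flow, and W is the patch of pixels.

```latex
% Brightness constancy, linearized for small motion
I(x+u,\; y+v,\; t+1) = I(x, y, t)
\quad\Rightarrow\quad
I_x u + I_y v + I_t \approx 0

% One equation per pixel but two unknowns: solve in least squares over a patch W
\begin{bmatrix}
\sum_W I_x^2 & \sum_W I_x I_y \\
\sum_W I_x I_y & \sum_W I_y^2
\end{bmatrix}
\begin{bmatrix} u \\ v \end{bmatrix}
= -
\begin{bmatrix} \sum_W I_x I_t \\ \sum_W I_y I_t \end{bmatrix}
```

The 2x2 matrix is only invertible when the patch has gradients in more than one direction, which is why flow is hard to estimate along a straight edge or in a uniform region.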

Tracking

A good treatment of tracking, with coverage similar to the lectures, is
S. Baker and I. Matthews: Lucas-Kanade 20 Years On: A Unifying Framework. Part 1: The Quantity Approximated, the Warp Update Rule, and the Gradient Descent Approximation. IJCV 2004.
The core of SSD tracking is described on pages 2-4.
The rest of the paper explains four variations. Algorithms can be additive (what we talked about in class) or compositional (an alternative), and each of these can be forward or inverse. The inverse algorithms are more efficient. In class we talked about the inverse additive algorithm.
Efficient Region Tracking With Parametric Models of Geometry and Illumination (Greg Hager and Peter Belhumeur). IEEE PAMI, 20(10), pp. 1125-1139, 1998. This is the original paper on the inverse additive algorithm. Reading either Hager's or Baker's paper is sufficient to follow the lectures.
An adaptive model for tracking objects by color. Focus on how an image region can be characterized by its color, and how to use PCA to find a best-fit ellipsoid for an arbitrary region. The rest of the paper can be read cursorily.
Incremental Focus of Attention. In this paper Kentaro Toyama achieves robust tracking by employing several modalities (color, edges, SSD, etc.) in a hierarchy. Cursory reading is sufficient.
The tracking library XVision, which underlies the Matlab mexVison interface we use in the lab, is described in Sam Lang's MSc thesis. An older description can be found in Toyama and Hager: XVision a Portable Substrate...
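
To make the SSD tracking idea from the Baker and Matthews reading above concrete, here is a minimal sketch of the simplest case: a forward additive Gauss-Newton update for a pure x,y translation. It is not the inverse additive algorithm covered in class and it is not XVision code. All names (`make_blob`, `ssd_translation_step`, `track`) are hypothetical, and the "image" is a smooth synthetic function rather than a pixel array, so that it can be sampled at non-integer positions without interpolation code.

```python
import math

def make_blob(cx, cy, sigma=3.0):
    """Synthetic smooth image: a Gaussian blob defined at continuous (x, y)."""
    def image(x, y):
        return math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
    return image

def ssd_translation_step(image, template, coords, p, h=0.5):
    """One Gauss-Newton step on sum((T(x) - I(x + p))^2) for a translation p."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x, y), t in zip(coords, template):
        wx, wy = x + p[0], y + p[1]          # template pixel warped into the image
        e = t - image(wx, wy)                # residual T(x) - I(x + p)
        # Image gradient at the warped position, by central differences
        gx = (image(wx + h, wy) - image(wx - h, wy)) / (2.0 * h)
        gy = (image(wx, wy + h) - image(wx, wy - h)) / (2.0 * h)
        a11 += gx * gx
        a12 += gx * gy
        a22 += gy * gy
        b1 += gx * e
        b2 += gy * e
    # Solve the 2x2 normal equations [a11 a12; a12 a22] dp = [b1; b2]
    det = a11 * a22 - a12 * a12
    dpx = (a22 * b1 - a12 * b2) / det
    dpy = (a11 * b2 - a12 * b1) / det
    return (p[0] + dpx, p[1] + dpy)

def track(image, template, coords, p0=(0.0, 0.0), iters=20):
    """Iterate the update from an initial guess p0 and return the translation."""
    p = p0
    for _ in range(iters):
        p = ssd_translation_step(image, template, coords, p)
    return p

# Template from frame 1; in frame 2 the same blob has moved by (1.5, -0.8).
frame1 = make_blob(10.0, 10.0)
frame2 = make_blob(11.5, 9.2)
coords = [(float(x), float(y)) for x in range(5, 16) for y in range(5, 16)]
template = [frame1(x, y) for x, y in coords]
print(track(frame2, template, coords))
```

On this synthetic pair the estimate should approach the true displacement (1.5, -0.8). Note that the forward form recomputes the gradients and the 2x2 matrix at every iteration. The inverse algorithms gain their efficiency by computing these once on the template.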
Supplementary (graduate student material. Ugrad: voluntary)
- SSD tracking can also be applied to whole meshes, as in Iain Matthews and Simon Baker's work on face tracking, or to the whole image, as in this example from medical image registration: [Stefanescu 04] R. Stefanescu, X. Pennec, N. Ayache: Grid Powered Nonlinear Image Registration with Locally Adaptive Regularization, Medical Image Analysis, 2004.
- Joint likelihood methods for mitigating visual tracking disturbances. Chris Rasmussen takes a different approach to achieving better tracking by applying probabilistic filtering and estimation. (Optional reading)
  Robust visual trackers
- Incremental Learning for Robust Visual Tracking IJCV 77(1):125-141, 2008. Matlab source code for the algorithm can be obtained here.
- Visual Tracking with Online Multiple Instance Learning CVPR 2009. No Matlab code found yet.
- Tracking-Learning-Detection IEEE PAMI, 34(7), pp. 1409-1422, 2012. Matlab source code can be obtained here. You may also need this page to help you install the code.
- Real Time Robust L1 Tracker Using Accelerated Proximal Gradient Approach CVPR 2012. Matlab source code can be found here.
- Real-time Compressive Tracking ECCV 2012. Matlab source code can be found here.
  Tracking datasets
- Metaio dataset: a dataset and evaluation methodology for template based tracking algorithms. Access it by clicking here.
- Online Object Tracking: A Benchmark CVPR 2013. The authors compare state-of-the-art trackers and provide evaluation tools. Feel free to download the trackers and evaluation tools here.
- VOT 2014: workshop on the visual tracking challenge. Please go to this page for the datasets.
- PETS 2014: Performance Evaluation of Tracking and Surveillance, another well-known dataset. You can access it by clicking here.
- IEEE ICRA 2015 Video tracking for arm and hand manipulation tasks http://webdocs.cs.ualberta.ca/~vis/trackDB/

Geometric and mathematical basics

Chapter 3.3.2 in Ma et al., An Invitation to 3-D Vision, is a compact mathematical formulation.
Review of 2D and 3D geometry: You can also read a short review of general geometric transforms in the Szeliski book. Read 2D first (Ch 2 - 2.1.2), then 3D the following week (2.1.3, to page 42). We will only use rotation matrices and Euler angles, though we will use skew-symmetric matrices in the exponential notation in a slightly different way. Szeliski book Ch 2
For viewing geometry in the formulation common to undergraduate courses, see e.g. Shapiro and Stockman Ch 12 (local link) and Ch 13 (local link). Read 12.1-4 cursorily if interested. Focus on the perspective imaging model in 12.5, until page 435. Ch 12.6 covers the basic stereo camera experiment you will do in the lab; see Fig 12.24. Homogeneous coordinates and the basic transforms are covered in the beginning of Ch 13, and applied to the camera in 13.3. Read 13.4 onward cursorily if interested.

Non-Euclidean geometry

Hartley and Zisserman Ch 2 and 3 (read through Ch 1 cursorily). In the first week, read Ch 2 until Ch 2.6 (p47). Read the basic projective geometry and transforms carefully. Notice how elegantly and naturally homogeneous coordinates fit this framework (Ch 2). The conics and dual conics (in Ch 2) and quadrics (Ch 3) can be left to a second reading, as we won't need them immediately. Focus instead on the classes of transforms.
Hartley and Zisserman Ch 4: Estimation of transforms. The most important concepts are the DLT with coordinate normalization, and the difference between geometric and algebraic error.
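
The coordinate normalization that goes with the DLT can be sketched in a few lines. This assumes the isotropic variant described in Hartley and Zisserman: translate the points so their centroid is at the origin, then scale so their mean distance from the origin is sqrt(2). The function names are hypothetical and this is not one of the lab's Matlab helper routines.

```python
import math

def normalizing_transform(points):
    """Return a 3x3 similarity T (nested lists) that moves the centroid of the
    2D points to the origin and scales their mean distance to sqrt(2).
    Assumes the points are not all identical (mean distance must be nonzero)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_dist = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    s = math.sqrt(2.0) / mean_dist
    return [[s, 0.0, -s * cx],
            [0.0, s, -s * cy],
            [0.0, 0.0, 1.0]]

def apply_homogeneous(T, point):
    """Apply a 3x3 transform to a 2D point using homogeneous coordinates."""
    x, y = point
    xh = T[0][0] * x + T[0][1] * y + T[0][2]
    yh = T[1][0] * x + T[1][1] * y + T[1][2]
    w = T[2][0] * x + T[2][1] * y + T[2][2]
    return (xh / w, yh / w)

# Example pixel coordinates (made up for illustration)
pts = [(100.0, 200.0), (300.0, 250.0), (220.0, 400.0), (150.0, 320.0)]
T = normalizing_transform(pts)
print([apply_homogeneous(T, q) for q in pts])
```

In the normalized DLT, each image gets its own transform (T and T'), the homography is estimated from the normalized points, and the result is mapped back as H = T'^(-1) H_normalized T.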

Camera models

Hartley and Zisserman Ch 6 and 7. Read Ch 6 thoroughly. Notice how projective geometry naturally captures and lets us derive the relationship between finite focal length perspective and infinite focal length affine cameras. Ch7 is essentially similar to Ch4 but applied to the 3x4 projective transform.
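
As a reading aid for Ch 6, the two camera classes mentioned above can be summarized by the form of the 3x4 matrix P. The symbols below (K for the calibration matrix, R and t for the camera pose, m_ij and t_i for the affine entries) are the usual textbook ones and are introduced here as assumptions about notation.

```latex
% Finite projective (pinhole) camera: homogeneous image point x, world point X
\mathbf{x} \simeq P\,\mathbf{X}, \qquad P = K\,[\,R \mid \mathbf{t}\,]

% Affine camera: the last row of P is (0, 0, 0, 1), so there is no perspective division
P_A =
\begin{bmatrix}
m_{11} & m_{12} & m_{13} & t_1 \\
m_{21} & m_{22} & m_{23} & t_2 \\
0 & 0 & 0 & 1
\end{bmatrix}
```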

Multiple view geometry

Hartley and Zisserman Ch 9 and 10 in detail. For Ch 11, coverage of the normalized linear fundamental matrix computation is sufficient. You will find the techniques and issues similar to those of Ch 4.
We will then jump to the N-view factorization methods in Ch 18. We will finish with an orientation (cursory treatment) of auto-calibration (Ch 19).
The treatment of computational methods for N-view factorization is a bit brief in the textbook. A collection of the original papers is available here. The affine factorization papers at the beginning of the list are particularly reader friendly.
Supplementary
- Bundler, a structure-from-motion (SfM) system for unordered image collections, written in C/C++. Please check here for details. You may also want to read Bundle adjustment - a modern synthesis, from which they derived the optimization engine for the system.
- PTAM (Parallel Tracking and Mapping), a camera tracking system for augmented reality. You can check the project website here for details. You may also be interested in reading their paper Parallel Tracking and Mapping for Small AR Workspaces.
- DTAM, Dense Tracking and Mapping in Real-Time. Unlike PTAM, which relies on feature extraction, DTAM relies on dense, every-pixel methods. There is an OpenDTAM implementation on GitHub; please check here.
- KinectFusion provides 3D object scanning and model creation using a Kinect sensor. You may be interested in reading their paper KinectFusion: Real-time dense surface mapping and tracking. You may want to download the Windows SDK here and check the documents here.

Practical matlab exercises on geometry

Below are some exercises on the more mathematical geometry chapter. They are more practical and use helper routines already implemented in Matlab (in the Matlab libraries on the lab machines).

Exercises 1-6

Ch 1: Matlab intro
Ch 2: p 3-5 Geom, Proj
Ch 6, p12-13 Multiple view geom (topic later in course)

Exercises 7-8

Ch 7: Factorization and Cross ratios (included in exam 1)
Ch 8: Autocalibration: Not in course.

Various exercises on geometry and cameras.

Radiometry, reflectance and Image cues

Radiometry: Modeling light and reflectance
David Forsyth and Jean Ponce: Computer Vision - Chapter 2. External link.

Szeliski's book also has some of the material on radiometry in sections 2.2.1 and 2.2.2, and on BRDF estimation in section 12.7.1.

Another slide deck from Gatech with more details on photometric stereo

Coursera video lecture
Supplementary
Spherical harmonics:
Ravi Ramamoorthi and Pat Hanrahan: On the Relationship between Radiance and Irradiance: Determining the illumination from images of a convex Lambertian object, J. Opt. Soc. Am. A, 18(10), 2001
R. Basri and D. Jacobs: Lambertian Reflectance and Linear Subspaces, PAMI, 25(2), 2003
Ravi Ramamoorthi: Analytic PCA construction for theoretical analysis of lighting variability in images of a Lambertian object, PAMI, 24(2), 2002
Illumination cone:
A.S. Georghiades, P.N. Belhumeur, D.J. Kriegman: From few to many: illumination cone models for face recognition under variable lighting and pose, PAMI 23(6), 2001

Shading and image cues

David Forsyth and Jean Ponce: Computer Vision - Chapter 3.

Szeliski's book has a short section about SFS and photometric stereo - 12.1.1.
Also some sections related to stereo and correlation scores in chapter 11 - section 11.1 on Epipolar geometry, rectification and section 11.3 on similarity measures.
Supplementary
Integrating normals (in Fourier space)
Frankot, Chellappa: A method for enforcing integrability in SFS algorithms, PAMI 1988
Uncalibrated photometric stereo
Peter N. Belhumeur and David J. Kriegman and Alan L. Yuille: The Bas-Relief Ambiguity, IJCV 1999
SFS - well posed
Emmanuel Prados: Shape From Shading, chapter in Mathematical Models of Computer Vision: The Handbook. Editors: N. Paragios, Y. Chen and O. Faugeras; Springer, 2005.

Multi-view methods - general

Surface representations - related sections from Szeliski's book - 12.3, 12.4, 12.5

Voxel carving

Szeliski's book has some material about volumetric reconstruction and shape from silhouette in chapter 11, section 11.6.
Supplementary
Good review paper
Gregory G. Slabaugh and W. Bruce Culbertson and Thomas Malzbender and Ronald W. Schafer: A Survey of Methods for Volumetric Scene Reconstruction from Photographs 2001
Voxel coloring
S. M. Seitz and C. R. Dyer: Photorealistic Scene Reconstruction by Voxel Coloring CVPR 1997
Space carving
Kiriakos N. Kutulakos and Steven M. Seitz: A Theory of Shape by Space Carving, IJCV 2000

Graph cuts

Szeliski's book has some material about global optimization for multiview stereo including dynamic programming in chapter 11, section 11.5
Geometric graph
Paris, Sylvain and Sillion, Francois and Quan, Long: A Surface Reconstruction Method Using Global Graph Cut Optimization, IJCV 2005
Graph cuts and continuous hypersurfaces
Boykov, Y. and Kolmogorov, V. : Computing geodesics and minimal surfaces via graph cuts, ICCV 2003
Binary graphs
V. Kolmogorov and R. Zabih : What energy functions can be minimized via graph cuts, PAMI 2004
Approximation for multi-labeling (alpha expansion)
Yuri Boykov and Olga Veksler and Ramin Zabih: Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001
Application to multi-view reconstruction (accounts for occlusions)
Vladimir Kolmogorov and Ramin Zabih: Multi-camera Scene Reconstruction via Graph Cuts, ECCV 2002

Variational and level set methods

Szeliski's book has a small section on level sets for 3D reconstruction - section 12.5.1.
Geometric interpretation of variational surface problems (also good overview for related theory)
Solem J.E. Variational Problems and Level Set Methods in Computer Vision - Theory and Applications, PhD thesis, Dept of Mathematics, Lund University, 2007
Variational stereo (some proof for the evolution flows in the Appendix)
H. Jin: Variational Methods for Shape Reconstruction in Computer Vision, Ph.D. thesis, Electrical Engineering Department, Washington University, August 2003.
Level set: popular introduction
J.A. Sethian: Level Set Methods - an act of violence, American Scientist, 1997
Level sets : first application in CV - stereo
O. Faugeras and R. Keriven: Variational principles, surface evolution, PDEs, level set methods, and the stereo problem, TR 2006 (similar to IEEE Trans. Image Processing 1998)
Level sets for shape from specular reflectance
S. Soatto and A. Yezzi and H. Jin: Tales of Shape and Radiance in Multi-view Stereo, ICCV 2003
Level sets: One of the first efficient numerical methods
J.A. Sethian: A Fast Marching Level Set Method for Monotonically Advancing Fronts, Proc Natl Acad Sci 1996
Variational disparity map - anisotropic regularizer
Luc Robert and Rachid Deriche: Dense Depth Map Reconstruction: A Minimization and Regularization Approach which Preserves Discontinuities, ECCV 96
Variational disparity map - wide baseline
C. Strecha and T. Tuytelaars and L.J. Van Gool: Dense matching of multiple wide-baseline views, ICCV 2003
Variational mesh
P. Fua and Y. Leclerc: Object-centered surface reconstruction: combining multi-image stereo and shading, Image Understanding Workshop 1993
Variational mesh and reflectance
N. Birkbeck, D. Cobzas, P. Sturm, M. Jagersand: Variational Shape and Reflectance Estimation under Changing Light and Viewpoints, ECCV 2006

Modeling dynamic scenes

Integrate both stereo and motion flow constraints into the minimization
Zhang and Chandra Kambhamettu: Integrated 3D Scene Flow and Structure Recovery from Multiview Image Sequences, CVPR 2000
Spacetime stereo
Li Zhang, Brian Curless, and Steven M. Seitz: Spacetime Stereo: Shape Recovery for Dynamic Scenes, CVPR 2003
3D Scene flow
S. Vedula, S. Baker, P. Rander, R. Collins, and T. Kanade: Three-Dimensional Scene Flow, ICCV 1999
Carving in 6D
S. Vedula, S. Baker, S. Seitz, and T. Kanade: Shape and Motion Carving in 6D, CVPR 2000
Modeling with surfels
Rodrigo L. Carceroni, Kiriakos N. Kutulakos: Multi-View Scene Capture by Surfel Sampling: From Video Streams to Non-Rigid 3D Motion, Shape and Reflectance, IJCV 2002
Variational method for modeling dynamic scenes
J.-P. Pons, R. Keriven and O. Faugeras: Modelling dynamic scenes by registering multi-view image sequences, CVPR 2005

Hand-eye coordination

The basics of vision-based specification and control are in Dodds Z., Jägersand M., Hager G., Toyama K.: A Hierarchical Vision Architecture for Robotic Manipulation Tasks. In Proc. of Int. Conf. on Computer Vision Systems 99.
For those interested in more readings, you can get a good introduction to the field in Chapters 1 and 2 of Alexa's thesis. The visual-motor coordinate transforms and basics of image-based visual servoing are described in 3, 3.1, 3.2, and 3.3 of this tutorial on uncalibrated control.
The details of visual alignment specifications are treated in Task Specification Languages for Uncalibrated Visual Servoing, Zach Dodds, PhD thesis, Yale University, 2000. (186 pages, 13M)
The high-level visual task composition and execution are described in M. Jägersand: Image Based Visual Simulation and Tele-Assisted Robot Control. In IROS '97, Proc. New Trends in Image Based Visual Servoing.
Misc other references: IROS 2004 Tutorial on Visual Servoing, 2002 summer school, Matlab toolbox

General resources

Michael T. Heath: Scientific Computing: An Introductory Survey, 2nd Ed, McGraw Hill, 2002.
Heath provides a mathematically careful treatment of core topics.
Cleve Moler: Numerical Computing with MATLAB, SIAM, 2004.
A more hands-on introduction to numerical computing in Matlab.
Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, editors: Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, SIAM, Philadelphia, 2000.
Dahlquist, Björck: Numerical Methods, Prentice Hall.
Golub, Van Loan: Matrix Computations, Johns Hopkins University Press.
