The pinhole camera model is limited because, from a single image, we cannot tell where along a ray an object lies: depth is lost in projection.
Multiple eyes are an evolutionary trait because they help us perceive depth. Objects that are closer to us appear to move faster across the visual field than objects farther away.
Edges, because they are relatively invariant to lighting and noise. They are easy to detect because of their large gradient magnitude.
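As a minimal sketch of this idea (the image is a made-up synthetic step edge, and central differences stand in for a proper Sobel filter): the gradient magnitude is large along the edge and zero in flat regions.

```python
import numpy as np

# Synthetic 8x8 image with a vertical step edge: dark left half, bright right half.
img = np.zeros((8, 8))
img[:, 4:] = 1.0

# Central-difference gradients (a simple stand-in for Sobel filtering).
gy, gx = np.gradient(img)
grad_mag = np.hypot(gx, gy)

# The gradient magnitude peaks along the edge and vanishes in flat regions.
print(grad_mag[4, 3], grad_mag[4, 0])  # large at the edge, 0.0 in the flat region
```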
The main idea is that translating a patch over a good feature should produce a large change in intensity.
Suppose we have a window $W$ that we can shift by $(u,v)$. At each patch, we have a vector $\phi_0$ which is a list of intensities. To see how much the intensity changes, we compute the distance between the initial and shifted vectors. Let $E$ be the change in appearance when the window moves by $(u,v)$.
$$
\begin{aligned}
\phi_0 &= [I(0,0), I(0,1), \dots, I(n,n)]\\
\phi_1 &= [I(u,v), I(u, 1+v), \dots, I(n+u, n+v)]\\
E(u,v) &= \|\phi_0 - \phi_1\|_2^2\\
&= \sum_{(x,y)\in W}[I(x,y) - I(x+u, y+v)]^2.
\end{aligned}
$$
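The sum above can be computed directly as a sum of squared differences between the original and shifted windows. A minimal NumPy sketch (the image, window position, and window size are made-up illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
I = rng.random((32, 32))  # hypothetical grayscale image

def E(I, x0, y0, n, u, v):
    """Sum of squared differences between an n x n window at (x0, y0)
    and the same window shifted by (u, v)."""
    w0 = I[x0:x0 + n, y0:y0 + n]
    w1 = I[x0 + u:x0 + n + u, y0 + v:y0 + n + v]
    return np.sum((w0 - w1) ** 2)

print(E(I, 8, 8, 5, 0, 0))  # zero shift -> E = 0
print(E(I, 8, 8, 5, 1, 2))  # nonzero shift on a textured patch -> E > 0
```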
To approximate $I(x+u, y+v)$, we can use a Taylor series expansion.
$$ f(x) = f(x_0) + Df(x_0)(x-x_0) + \frac{1}{2}D^2f(x_0)(x-x_0)^2 + \dots $$
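Truncating after the first-order term gives $I(x+u, y+v) \approx I(x,y) + I_x u + I_y v$. A quick numerical check of this approximation, using a made-up smooth intensity function whose derivatives we know in closed form:

```python
import numpy as np

# Hypothetical smooth intensity function and its partial derivatives.
def I(x, y):
    return np.sin(0.1 * x) + np.cos(0.1 * y)

def Ix(x, y):
    return 0.1 * np.cos(0.1 * x)

def Iy(x, y):
    return -0.1 * np.sin(0.1 * y)

x, y, u, v = 3.0, 5.0, 0.2, -0.1
exact = I(x + u, y + v)
approx = I(x, y) + Ix(x, y) * u + Iy(x, y) * v
print(abs(exact - approx))  # small error for a small shift (u, v)
```

The error shrinks quadratically as $(u,v)$ shrinks, which is why the first-order approximation is reasonable for small window shifts.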