Since we have fairly powerful processing capabilities, we can point a webcam at the ground and track points so as not to drift along the x and y axes. To do this, we can use the Lucas-Kanade algorithm. Here are the main ideas involved.
We have two grayscale pictures, A and B, taken at two moments separated by a time interval δt.
We want to know the movement vector of some interesting pixels, which can, for example, be chosen at random, found with a Sobel filter (edge detection), or selected in a heavier but optimal way that we will explain at the end of this article. Of course, a high number of interesting pixels leads to a better movement estimation, but it also increases computing time: we have to find a compromise when targeting embedded systems.
For each interesting pixel we take a search window of size (2ωx + 1) by (2ωy + 1) pixels. Once again, a larger window allows bigger movements to be detected, but takes more iterations (this problem is mostly solved by the pyramidal form, which we will explain later).
We just have to find the movement which minimizes the error:
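Writing p = (p_x, p_y) for the tracked pixel and d = (d_x, d_y)^T for the candidate move (notation borrowed from the standard formulation of the algorithm), the error is:

```latex
\epsilon(d) = \epsilon(d_x, d_y) = \sum_{x=p_x-\omega_x}^{p_x+\omega_x} \; \sum_{y=p_y-\omega_y}^{p_y+\omega_y} \bigl( A(x,y) - B(x+d_x,\, y+d_y) \bigr)^2
```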
We begin by taking the derivative of the error with respect to the move:
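Following the standard derivation (sums are taken over the same window as the error):

```latex
\frac{\partial \epsilon(d)}{\partial d} = -2 \sum_{x,y} \bigl( A(x,y) - B(x+d_x,\, y+d_y) \bigr) \begin{bmatrix} \dfrac{\partial B}{\partial x} & \dfrac{\partial B}{\partial y} \end{bmatrix}
```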
Then, we apply a first-order Taylor expansion around the move [0, 0] (valid if we consider the move to be small, which we will assume):
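Substituting the first-order expansion of B(x + d_x, y + d_y) into the derivative gives:

```latex
\frac{\partial \epsilon(d)}{\partial d} \approx -2 \sum_{x,y} \left( A(x,y) - B(x,y) - \begin{bmatrix} \dfrac{\partial B}{\partial x} & \dfrac{\partial B}{\partial y} \end{bmatrix} d \right) \begin{bmatrix} \dfrac{\partial B}{\partial x} & \dfrac{\partial B}{\partial y} \end{bmatrix}
```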
At this point, we can briefly pause the computations to consider some facts.
First, A(x, y) − B(x, y) looks like the picture's temporal derivative at the point (x, y), which we will write δI(x, y).
Then, B's partial derivatives in x and y form a gradient. Since we can consider the window to be big compared with the movement vector, the windowed picture's derivative is the same in image B as in image A. We therefore have the following gradient, for any (x, y) belonging to the window:
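In the usual formulation this gradient is computed on A with central differences (a standard choice, assumed here):

```latex
\nabla I = \begin{bmatrix} I_x \\ I_y \end{bmatrix}, \qquad
I_x(x,y) \approx \frac{A(x+1,\,y) - A(x-1,\,y)}{2}, \qquad
I_y(x,y) \approx \frac{A(x,\,y+1) - A(x,\,y-1)}{2}
```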
With those notations, we obtain:
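That is, with δI(x, y) = A(x, y) − B(x, y):

```latex
\frac{\partial \epsilon(d)}{\partial d} \approx -2 \sum_{x,y} \bigl( \delta I(x,y) - \nabla I^{T} d \bigr) \, \nabla I^{T}
```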
Finally, a slight transformation (taking half of the transpose) gives us:
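```latex
\frac{1}{2} \left( \frac{\partial \epsilon(d)}{\partial d} \right)^{T} \approx \sum_{x,y} \bigl( \nabla I \, \nabla I^{T} \, d \; - \; \delta I \, \nabla I \bigr)
```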
If we use:
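namely the 2×2 gradient matrix G and the mismatch vector b (with δI = A − B and ∇I = (I_x, I_y)^T as before):

```latex
G = \sum_{x,y} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}, \qquad
b = \sum_{x,y} \begin{bmatrix} \delta I \, I_x \\ \delta I \, I_y \end{bmatrix}
```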
we can rewrite the previous equation:
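```latex
\frac{1}{2} \left( \frac{\partial \epsilon(d)}{\partial d} \right)^{T} \approx G\,d - b
```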
Since the goal, let us not forget, is to cancel out the error's derivative, we get:
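Setting the derivative to zero gives the optimal move:

```latex
d_{\mathrm{opt}} = G^{-1} b
```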
G is invertible if the gradient of A is well defined, i.e. if the window is contained in the picture, and if the contrast inside the window is high enough (see the last part, about choosing good pixels).
We now have a way to compute a 'most coherent' movement vector, which lets us go from A to B in a given window around a given pixel.
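As a sanity check, here is a minimal single-level, single-pixel sketch in NumPy, including the iterative refinement described in the next section. All names are ours and borders or degenerate windows are not handled; a production implementation, such as OpenCV's cv2.calcOpticalFlowPyrLK, adds pyramids and tracks many pixels at once.

```python
import numpy as np

def bilinear(img, rows, cols):
    """Sample img at fractional (row, col) positions by bilinear interpolation."""
    r0 = np.floor(rows).astype(int)
    c0 = np.floor(cols).astype(int)
    fr, fc = rows - r0, cols - c0
    return ((1 - fr) * (1 - fc) * img[r0, c0] + (1 - fr) * fc * img[r0, c0 + 1]
            + fr * (1 - fc) * img[r0 + 1, c0] + fr * fc * img[r0 + 1, c0 + 1])

def lk_track(A, B, p, omega=7, iters=20, eps=1e-3):
    """Iterative Lucas-Kanade for one pixel p = (row, col), single level.

    Returns d = (d_row, d_col) such that A(x) ~= B(x + d) over the window.
    """
    r, c = p
    gy, gx = np.gradient(A)                      # gradient of A, computed once
    rr, cc = np.mgrid[r - omega:r + omega + 1, c - omega:c + omega + 1]
    Ix, Iy = gx[rr, cc], gy[rr, cc]
    G = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])   # 2x2 gradient matrix
    G_inv = np.linalg.inv(G)                     # needs enough contrast in the window
    A_win = A[rr, cc]
    d = np.zeros(2)                              # accumulated move (row, col)
    for _ in range(iters):
        dI = A_win - bilinear(B, rr + d[0], cc + d[1])   # mismatch at this iteration
        b = np.array([np.sum(dI * Ix), np.sum(dI * Iy)])
        eta_x, eta_y = G_inv @ b                 # per-iteration move (col, row order)
        d += (eta_y, eta_x)
        if np.hypot(eta_x, eta_y) < eps:         # stop once the new move is tiny
            break
    return d

# Toy check: a Gaussian blob moved by (+2 rows, -1 column).
yy, xx = np.mgrid[0:64, 0:64]
A = np.exp(-((yy - 30.0) ** 2 + (xx - 30.0) ** 2) / (2 * 4.0 ** 2))
B = np.exp(-((yy - 32.0) ** 2 + (xx - 29.0) ** 2) / (2 * 4.0 ** 2))
d = lk_track(A, B, (30, 30))
```

On this toy pair, d converges to approximately (2, −1), the displacement of the blob.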
k iterations to sharpen the result
The previous computations are just a single iteration of the algorithm. We can refine the result by applying them several times, until, for example, the norm of the newly found move is smaller than a chosen epsilon.
For an iteration index k, if we have an accumulated result ν^(k−1), we transform B this way:
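Writing ν^(k−1) for the move accumulated over the previous iterations (with ν^0 = [0, 0]^T), the standard iteration warps B as:

```latex
B_k(x,\,y) = B\bigl( x + \nu_x^{\,k-1},\; y + \nu_y^{\,k-1} \bigr)
```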
and then we compute:
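the residual mismatch and the corresponding update, reusing G unchanged:

```latex
\delta I_k(x,y) = A(x,y) - B_k(x,y), \qquad
b_k = \sum_{x,y} \begin{bmatrix} \delta I_k \, I_x \\ \delta I_k \, I_y \end{bmatrix}, \qquad
\eta^{k} = G^{-1} b_k, \qquad \nu^{k} = \nu^{k-1} + \eta^{k}
```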
We can now see how useful it was to compute the gradient of A rather than the gradient of B: we do not have to recompute it at each iteration.
We finally get the final move:
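after K iterations:

```latex
d = \nu^{K} = \sum_{k=1}^{K} \eta^{k}
```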
The larger-move problem: the pyramidal form
We saw in the first section that windows alone were not enough to get a good compromise between processing speed and large-move detection.
A way to fix it is the use of pyramids.
We take Lm pyramidal levels (often 3) and we apply the whole algorithm at each level, from Lm down to 0.
At each level L, we build a new picture at a resolution 2^L times coarser.
We keep the same window size and simply transform the coordinates of the tracked pixels to remain coherent with the new resolution. To obtain the final move, we just have to sum all the rescaled moves that were found:
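with d^L the move found at pyramid level L:

```latex
d = \sum_{L=0}^{L_m} 2^{L} \, d^{L}
```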
Here is a small illustration of the procedure:
Here we want to track Lena’s nose from the first picture to the second one.
We use a 4-level (Lm = 3) pyramidal form, and in the end we just have to sum the rescaled moves to get the whole one.
Some pixels are easier to follow than others
We can choose the best pixels to follow by thinking about what we will do with them.
Here, it is the inverse of G that matters: we need a non-zero determinant over each pixel's window or, put more mathematically, we want the minimal eigenvalue of G to be as large as possible.
We can proceed this way:
- Compute G and its minimal eigenvalue λ for each pixel of picture A
- Save the biggest eigenvalue λ_max of the whole picture (and stay far enough from the borders to avoid any window problem)
- Keep all pixels whose λ is above a percentage of λ_max (e.g. 5 or 10%)
- Keep only those whose λ is the maximum in a 3×3 neighborhood
- Keep a minimal distance between pixels (e.g. 5 or 10 pixels)
- Finally, remove the lowest-λ pixels until we are down to the maximal number of tracked pixels
This procedure is quite heavy, mostly because it has to be repeated for each new pair of images. That is why we could, for example, keep the tracked pixels from one image to the next, or always keep the same reference picture and its pixels if we just want to know how far we have moved from a given spot.
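The selection procedure can be sketched as follows (again NumPy, with our own naming; OpenCV exposes the same idea as cv2.goodFeaturesToTrack). As a simplification, the 3×3 non-maximum suppression is folded into the greedy minimal-distance pass, which visits pixels by decreasing λ and therefore keeps the strongest pixel of each neighborhood first:

```python
import numpy as np

def box_sum(img, w):
    """Sum of img over a (2w+1) x (2w+1) window, via an integral image."""
    ii = np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    H, W = img.shape
    out = np.zeros_like(img)
    out[w:H - w, w:W - w] = (ii[2 * w + 1:, 2 * w + 1:] - ii[:H - 2 * w, 2 * w + 1:]
                             - ii[2 * w + 1:, :W - 2 * w] + ii[:H - 2 * w, :W - 2 * w])
    return out

def good_pixels(A, w=3, quality=0.1, min_dist=5, max_pts=50):
    """Pick trackable pixels: large minimal eigenvalue of G, spread apart."""
    gy, gx = np.gradient(A)
    sxx = box_sum(gx * gx, w)                    # window sums building G per pixel
    sxy = box_sum(gx * gy, w)
    syy = box_sum(gy * gy, w)
    # Minimal eigenvalue of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    lam = (sxx + syy) / 2 - np.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam[:w + 1, :] = 0                           # stay away from the borders
    lam[-w - 1:, :] = 0
    lam[:, :w + 1] = 0
    lam[:, -w - 1:] = 0
    thresh = quality * lam.max()                 # keep a percentage of lambda_max
    picked = []
    for idx in np.argsort(lam, axis=None)[::-1]: # pixels by decreasing lambda
        r, c = np.unravel_index(idx, lam.shape)
        if lam[r, c] < thresh or len(picked) >= max_pts:
            break
        if all((r - pr) ** 2 + (c - pc) ** 2 >= min_dist ** 2 for pr, pc in picked):
            picked.append((r, c))
    return picked

# Toy check: a white square on black -- only its corners are good to track,
# since straight edges suffer from the aperture problem (lambda_min = 0).
A = np.zeros((64, 64))
A[20:40, 20:40] = 1.0
pts = good_pixels(A, w=3, quality=0.2, min_dist=5, max_pts=10)
```

On this toy image, every selected pixel lands near one of the square's four corners.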
We have presented the version of the Lucas-Kanade algorithm implemented in the OpenCV library. This algorithm is easy to understand and easy to customize to fit the most demanding embedded systems.