Computer Vision PART I
I have complicated feelings about CV
Only because the previous courses focused more on the general view; I hope this part may change that. I wish to learn more about real implementation, rather than well-packaged MATLAB functions.
Lec 1. Intro
More topics than ever.
The segmentation part comes earlier here than usual, though in practice it seems routine.
And again with many detailed requirements on report.
Distance measurement
Disparity map:
for each pixel in the left picture with coordinates $(x_0,y_0)$,
a pixel is searched for in the right picture with coordinates $(x_0-d,y_0)$ .
- $Z$ – distance
- $f$ – focal length
- $L$ – base length
- $D$ – disparity
- $x'$ and $x''$ – coordinates of the object in the image plane in the right and left pictures of the stereo pair, respectively.
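From similar triangles these quantities combine into the standard relation $Z = \frac{fL}{D}$ with $D = x'' - x'$. A minimal Python sketch (my own illustration, not course code; assumes rectified images and consistent units, function names are mine):

```python
def depth_from_disparity(d, f, L):
    """Z = f * L / D: distance from disparity d, focal length f, baseline L.

    All quantities must be in consistent units (e.g. f and d in pixels,
    L in meters); d must be non-zero.
    """
    return f * L / d

def depth_map(disparity, f, L):
    """Apply the relation pixel-wise; zero disparity (infinite depth) -> None."""
    return [[depth_from_disparity(d, f, L) if d else None for d in row]
            for row in disparity]
```

Note that depth is inversely proportional to disparity: near objects give large disparity, far objects give small disparity.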
Some packages and notations
No need to recite I suppose, nothing particular.
Lec 2. Segmentation
Scoping
- Selection of connected areas on binary images
- Neighbourhood:
- Cross: 4-connectivity
- Square: 8-connectivity
The connected area of the image is the area (a set of points):
- all points of which have the same value;
- between any two points from the given area there is a continuous path consisting of points that also belong to the given area and are “neighbors” at the same time.
Available algo for selecting
- Forest fire (Y&U collision forms possible).
- Two pass algo (avoid collision)
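The two-pass algorithm avoids collisions with a union-find (disjoint-set) structure over provisional labels. A minimal Python sketch (my own illustration, 4-connectivity; all names are mine):

```python
def label_components(img):
    """Two-pass connected-component labeling (4-connectivity).

    img: 2-D list of 0/1 values. Returns a label matrix where each
    connected region of 1-pixels gets one positive label.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = {}                      # union-find forest over provisional labels

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    next_label = 1
    # Pass 1: assign provisional labels, record collisions as unions
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if up == 0 and left == 0:
                labels[y][x] = next_label
                parent[next_label] = next_label
                next_label += 1
            elif up and left:
                labels[y][x] = min(up, left)
                parent[find(max(up, left))] = find(min(up, left))
            else:
                labels[y][x] = up or left
    # Pass 2: resolve collisions to final representative labels
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels
```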
Segmentation
Dividing an image into non-overlapping areas, each of which is represented by a color or texture of the same type.
$R$ is the entire area of the frame. Segmentation is the process of partitioning $R$ into a set of connected regions $R_i,\ i=1,\dots,n$, such that the following basic conditions are met for them:
- $R=\bigcup_{i=1}^{n} R_i$ – the splitting areas completely cover the frame;
- $R_i\cap R_j=\emptyset,\ \forall i\ne j$ – the partitioning areas do not intersect pairwise;
- $Pred(R_i) = TRUE,\ i = 1,\dots,n$, where $Pred(R)$ is the Boolean homogeneity predicate of the area;
- $Pred(R_i\cup R_j)=FALSE,\ \forall i\ne j$ – the pairwise union of any two regions of the given partition does not satisfy the same homogeneity condition.
Merging areas:
1. Perform presegmentation of the image into “starting” areas using a non-iterative (single-pass) method.
2. Determine the criterion for merging two neighboring regions.
3. Iteratively find and merge all pairs of neighboring regions that satisfy the merge criterion.
4. If no pair of candidates for merging is found, stop and exit the algorithm.
Splitting of regions
The partitioning begins with the representation of the entire image as a simple area, which does not always meet the uniformity condition.
During the segmentation process, the current areas of the image are sequentially split in accordance with the specified uniformity conditions.
The methods of merging and splitting areas do not always lead to the same segmentation results, even if they use the same homogeneity criterion.
Basic Image Segmentation Methods
Threshold image segmentation by brightness levels
The main issue is the definition of the segmentation threshold.
Range Threshold Segmentation
Instead of threshold, we use a range to discriminate.
Multithreshold segmentation
Instead of single range, we use multiple ones.
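The three thresholding variants above can be sketched in a few lines of Python (my own illustration; function names are mine):

```python
import bisect

def threshold_segment(img, t):
    """Binary segmentation: 1 where brightness exceeds threshold t."""
    return [[1 if p > t else 0 for p in row] for row in img]

def range_segment(img, lo, hi):
    """Range segmentation: 1 where lo <= brightness <= hi."""
    return [[1 if lo <= p <= hi else 0 for p in row] for row in img]

def multithreshold_segment(img, thresholds):
    """Multithreshold: map each pixel to the index of its brightness band.

    thresholds must be sorted ascending; k thresholds give k+1 classes.
    """
    return [[bisect.bisect_right(thresholds, p) for p in row] for row in img]
```

The hard part, as the notes say, is choosing the threshold(s), not applying them.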
k-means segmentation algorithm
1. Specify the number of classes k into which the image should be divided. All pixels are considered as a set of vectors $x_i,\ i=1,\dots,p$.
2. Determine $k$ vectors $m_j,\ j=1,\dots,k$, which are declared the initial cluster centers. Choose the values $m_j$ (for example, randomly).
3. Update the values of the mean vectors $m_j,\ j=1,\dots,k$ (cluster centers). For this:
   - calculate the distance from each $x_i,\ i=1,\dots,p$ to each $m_j,\ j=1,\dots,k$
   - assign each $x_i$ to the cluster $j^*$ whose center $m_{j^*}$ is the closest
   - recalculate the mean values $m_j$ for all clusters
4. Repeat step 3 until the cluster centers stop changing.
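Steps 1–4 can be sketched for the 1-D case (pixel intensities only) in Python — my own illustration, assuming random initialization as suggested:

```python
import random

def kmeans_segment(pixels, k, iters=100, seed=0):
    """1-D k-means over pixel intensities.

    pixels: flat list of intensities. Returns (centers, assignment).
    """
    rng = random.Random(seed)
    centers = rng.sample(pixels, k)          # step 2: random initial centers
    assign = [0] * len(pixels)
    for _ in range(iters):
        # step 3: assign each pixel to the nearest center
        new_assign = [min(range(k), key=lambda j: abs(p - centers[j]))
                      for p in pixels]
        # step 3: recompute each center as the mean of its cluster
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(pixels, new_assign) if a == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers and new_assign == assign:
            break                            # step 4: centers stopped changing
        centers, assign = new_centers, new_assign
    return centers, assign
```

The 1-D case extends to color images by treating pixels as vectors and replacing `abs(p - c)` with a vector distance.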
Weber segmentation algo
See page 17/39
Weber principle: a person does not distinguish between gray levels within $[I(n),\ I(n)+W(I(n))]$.
Iterative algo of Vezhnevets
See page 19/39
Segmentation by skin color
Advantages: skin color is independent of face orientation; pixel color analysis is computationally efficient.
Task: finding a criterion for evaluating the proximity of the color of each pixel to the skin tone.
Development of skin color model:
1. Accumulation of training data using images with labeled “skin” and “non-skin” areas.
2. Based on this data, statistics of skin tones are accumulated.
3. Processing of the obtained statistics and selection of skin color model parameters for subsequent use; selection of criteria for evaluating whether pixels belong to the “skin” area.
4. Image processing using the obtained criteria.
Texture Segmentation
Approaches to texture segmentation:
- Statistical - allows you to characterize the texture of the area as smooth, rough and grainy.
- Structural – define and describe the relative position of the simplest repeating image elements, for example, segments of parallel lines passing at a constant step, or cells on a chessboard.
- Spectral.
Segmentation by the morphological watershed method
A grayscale image is a digital terrain model, where the brightness values are heights relative to a certain level, i.e. image is a matrix of heights.
- If it rains on such an area, many pools are formed. Water fills small pools, then overflows from overflowing pools and the pools combine into larger pools according to the heights of the water level.
- The places where pools merge are marked as watershed lines. As a result, the entire area may be flooded.
- The result of segmentation depends on the moment when the water supply stops. If the process is stopped early, the image will be segmented into small areas, if it is stopped late, into very large ones.
Area Detectors
IBR detector (Intensity-extrema based regions)
It is necessary to go from the points of a local brightness extremum $I$ along the rays, calculating some value $f$.
As soon as the peak of the value 𝑓 is found, it is necessary to stop. This point will be the boundary of the region.
The areas on a pair of similar images may differ, so we describe ellipses around them. If the ellipses are turned into circles, then we get complete similarity up to rotation.
MSER detector (Maximally Stable Extreme Regions)
Solves the problem of invariance of keypoints when scaling.
MSER detector algorithm:
Sort the set of all image pixels in ascending/descending order of intensity.
Construction of a pyramid of connected components. For each pixel of the sorted set, perform the following sequence of actions:
updating the list of points included in the component;
updating the areas of the next components, so that the pixels of the previous level become a subset of the pixels of the next level.
For all components, search for local minima (we find pixels that are present in this component, but are not part of the previous ones). The set of local level minima corresponds to the extreme region in the image.
Lec 3. Hough Transform
1. Voting points method
Idea: find the loci of points satisfying the given criteria, then find the intersection of the loci
Locus is a set of points whose location satisfies the specified conditions
Generalization of the method:
Points of each circle “voted” in favor of the possible position of the vertex
There are two “votes” at the intersection of the circles; this point “won”, since it gained the maximum number of “votes”
The rest of the points of the plane received zero or one “vote”
The shape of the “voting curve” is determined by a priori knowledge about the voting object (in the example, the sides of the triangle are given, so the vertex is located in the circle intersection area)
• Task: detection of a circle of known radius in a binary set of points
• Solution:
- The set of centers of all circles of radius $R$ passing through each point forms a circle of radius $R$ around that point
- The locus of the points is a circle of the same size with the center at the voting point
- The best solution for finding the position of the center of a circle with the given radius is the intersection point of the maximum number of voting circles
2. Hough transform for straight lines
The classic Hough Transform (HT) was proposed in 1962 for finding lines in a binary image.
Idea: convert the set of points to a parameter space
A straight line can be defined as
$Y = kX + b$
or, in normal form,
$X\cos\theta + Y\sin\theta = \rho$
Since a straight line on a plane is characterized by two parameters, the parameter space has the dimension 𝑛 = 2
The classical Hough transform uses parameters (𝜌, 𝜃)
Let the contour image be the set of points (𝑥, 𝑦) in the original space 𝐸 = (𝑋, 𝑌)
The set of lines passing through each point $(x, y)$ can be depicted as a set of points $(\rho, \theta)$ in the space $\{\rho, \theta\}$
The function of mapping a point in Hough space is called a “response function”
- General idea of the method
- For each point of the parameter space, the number of votes given for it is summed up (the number of points in the source image space that generate response in the parameter space that passes through a given point (𝜌, 𝜃))
- Each point of the source space generates a sinusoidal response function in the parameter space
- Any two sinusoids in the parameter space intersect at a point $(\rho, \theta)$ only if the points of the source image space which generated them lie on a straight line
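The voting scheme can be sketched as follows (my own Python illustration; the grid resolutions are arbitrary values I chose):

```python
import math

def hough_lines(points, n_theta=180, rho_max=200.0, n_rho=200):
    """Accumulate votes in (rho, theta) space for a set of contour points.

    Each point (x, y) generates the sinusoid rho = x*cos(theta) + y*sin(theta);
    returns the accumulator cell (votes, rho_index, theta_index) with the
    maximum number of votes.
    """
    acc = [[0] * n_theta for _ in range(n_rho)]
    for x, y in points:
        for ti in range(n_theta):
            theta = math.pi * ti / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            # quantize rho in [-rho_max, rho_max] to a row index
            ri = int(round((rho + rho_max) * (n_rho - 1) / (2 * rho_max)))
            if 0 <= ri < n_rho:
                acc[ri][ti] += 1
    return max((acc[r][t], r, t)
               for r in range(n_rho) for t in range(n_theta))
```

Each point votes exactly once per $\theta$ column, so a cell can collect at most one vote per source point; collinear points pile up in one cell.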
Properties of the Hough transform
- The Hough transform is invariant to translation, scaling, and rotation
- Since straight lines under any projective transformations of three-dimensional space always pass only into straight lines (in the degenerate case, into points), so the Hough transform makes it possible to detect lines invariantly not only to planar affine transformations, but also to the group of projective transformations in 3D space
Transformation efficiency
- Under projective transformations, the straight line always turns into a straight line; therefore, the parameters space of low dimension (𝑛 = 2) is formed
- Each pixel of the source image is processed only once, and further calculations are performed only for pixels that carry useful information (i.e., outline). The fewer the number of pixels that have the useful information, the higher is the computational efficiency
Disadvantages of Hough transform:
- Manual selection of phase space sampling
- Noise leads to blurring of maxima
- Evenly spaced points can lead to random peaks in the parameters space
- Missing parts of source image data results in blurry values in the accumulator
Optimization of the Hough transform
Filter the unnecessary features: for lines it is worth taking only edge points with a large gradient
Choose the correct grid resolution (sampling step):
Too coarse: several close lines will vote for the same cell
Too fine: lines can be missed because noisy points vote for different cells
To better find the maxima, you can smooth the values in the accumulator
It is necessary to mark the votes: which point corresponds to which line
In order not to enumerate all possible angles, the gradient can be taken into account, since its value has already been calculated when detecting the contour
Methods for parametrizing the straight lines
As is known, images in digital systems are defined on a discrete rectangular grid, which allows only some appropriate discrete parameterization of the family of straight lines.
Consider the natural set of straight lines generated by an integer grid 𝑁×𝑁 of points containing $𝑁^2$ elements.
Any two different points of the lattice define a line, so the size of the set will be $N^2(N^2-1)$ lines.
Many lines will be defined multiple times by their different line segments if they contain more than two points of the original grid.
Ways to parameterize the Hough transform
1. Perimeter points (𝒎, 𝒏)
Lines are described by a pair of end points lying on the perimeter of the 𝑁×𝑁 grid.
Obviously, the number of perimeter points will be equal to $4N$, so (taking symmetry into account) there are $\frac{1}{2}\cdot 4N\cdot 4N = 8N^2$ lines
Since a quarter of these pairs lie on the same straight line (a side of the square), the final size of the parameter array will be $6N^2$
Advantage: when the image is divided into smaller areas, it is easy to connect lines passing through several such areas, since their end points are close to each other along the perimeter
Disadvantage: information about the angular position of straight lines is not explicitly stored
2. Perimeter point and angle 𝒂, 𝒏
- One point of intersection of the straight line with the grid perimeter $n\ (0 \le n < N)$ is used
- and an angle defined by the perimeter point $a\ (-N+1 \le a \le N-1)$, so that the straight line passing through the grid center and point $a$ is parallel to the given one
- The accumulator array contains $4N^2$ elements
3. Tilt and offset 𝒂, 𝒅
The angle $a$ is defined by the direction of a line from the grid center to a perimeter point
The displacement of the line vertically or horizontally from the center is fixed using the distance from the center to the intersection of the straight line with the $Oy$ or $Ox$ axes
Generates $3N^2$ or $4N^2$ accumulator cells
The $(a, d)$-parametrization is closely related to the $(\rho, \theta)$-parametrization and to the parameters of the equation $y = ax + b$, where $a$ is interpreted as the slope of a straight line
3. Hough transform for circles
The described algorithm works in exactly the same way to detect any curve that can be described on the plane by a finite number of parameters: $F(a_1, a_2, \dots, a_n, x, y) = 0$
In the previously considered problem of finding circles of a given radius $R$, there was a family of curves defined by a two-parameter equation $(x - x_0)^2+(y - y_0)^2 = R^2$
It is necessary to search for the maximum of the accumulator function $A(x, y)$ in the parameter space $(x, y)$
In this case, the parameter space practically coincides with that of the source image $(x, y)$
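A sketch of the accumulator for circles of known radius (my own Python illustration; each contour point votes, at most once per cell, for the circle of possible centers around itself):

```python
import math

def hough_circle_centers(points, R, width, height):
    """Vote for circle centers: each contour point (x, y) votes for every
    accumulator cell lying on the circle of radius R around it."""
    acc = [[0] * width for _ in range(height)]
    for x, y in points:
        # set of cells: dedupe so each point votes a cell at most once
        cells = {(round(x - R * math.cos(math.radians(d))),
                  round(y - R * math.sin(math.radians(d))))
                 for d in range(360)}
        for cx, cy in cells:
            if 0 <= cx < width and 0 <= cy < height:
                acc[cy][cx] += 1
    # the best estimate of the center gains the maximum number of votes
    return max((acc[cy][cx], cx, cy)
               for cy in range(height) for cx in range(width))
```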
4. Hough transform for an arbitrary shape
Algorithm for finding circles of a given radius on halftone images
- Detect edge pixels that surround the perimeter of an object. A gradient operator can be used, such as the Sobel operator
Voting contour points are ones with a high gradient
For each detected edge pixel, an estimate of the position and orientation of the contour is used to estimate the center of a circle of radius $R$ by moving a distance $R$ from the edge pixel in the direction of the normal to the contour (in the direction of the gradient vector).
If the radius of the circle is unknown or is varying, it is necessary to include 𝑅 as an additional parameter in the parametric accumulator space
The local maxima search procedure should determine the radius as well as the position of the center by considering changes along the third dimension of the parametric space
If the size of the detected circle is not of interest and you only want to find its center, the dimension of the parameter space need not be increased
For every possible direction to the “center”, the contour point votes with a ray in that direction.
As a result, all possible positions of the “center” will be involved at any scale of the object, which will allow searching for circles regardless of their radius
- After finding the potential centers of the circles, it is necessary to find the radius of the circles with the centers at the found points
Analyzing the accumulator function when searching for geometric primitives
Direct search for a fixed number of local maxima (one global maximum) in the parameter space. There are various ways to find such maxima
Threshold segmentation of the accumulator function and subsequent analysis of connected regions of the parameter space
If we choose the threshold to be equal to the value of the minimum local maximum, the second method will give the same result as the first one
The problem of the optimal choice of the threshold for a particular image remains
Short lines (curve segments) give relatively low peaks in the accumulator function compared to long ones. They will only be detected if they are known to be present in the image prior to setting the threshold
How to avoid thresholding?
- At each stage of the analysis, we search for single global maximum of the accumulator function
- After finding the global maximum we subtract the income of all source image points that resulted in this line from the parameter space
- Then start searching for the next global maximum
The result: a greater sensitivity to short lines and robustness to noise
Summary
- Hough transform modifications provide invariant detection of geometric primitives and objects in the image with a high degree of noise immunity and significant accuracy in determining the location and orientation parameters
- The described algorithms do not detect grayscale objects themselves, but their outlines. Objects that do not have a well-defined outline cannot be detected using locus
Lec 4. Image Features
Marr Paradigm.
Image analysis goes from «iconic» data representation to «symbolic» representation
Steps of image processing:
- Image preprocessing
- Primary image segmentation
- Detecting the geometric structure
- Detecting of the relative structure and semantics of the scene
Types of possible features
Point, edge, area, line, corner
I. Feature points
A feature point $p$ of an image $I$ is a point with a characteristic (special) neighborhood. This point is different from all other points in a certain neighborhood
II. Feature point detectors
Corner Detectors
Corners are well repeatable and distinguishable.
The main property of a corner: the image gradient has two dominant directions in the area around the corner.
Moravec Detector
Measure the change in pixel intensity $I(x, y)$ by shifting a square window centered at $(x, y)$ by one pixel in each of the eight principal directions (2 horizontal, 2 vertical and 4 diagonal).
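A sketch of the Moravec measure (my own Python illustration): the cornerness of a pixel is the minimum SSD over the eight shifts, which stays near zero both on flat areas and along edges (a shift along the edge changes nothing), and is large only at corners:

```python
def moravec(img, x, y, w=1):
    """Cornerness = minimum over 8 shift directions of the sum of squared
    intensity differences between the window and its shifted copy."""
    shifts = [(1, 0), (-1, 0), (0, 1), (0, -1),
              (1, 1), (1, -1), (-1, 1), (-1, -1)]
    best = None
    for sx, sy in shifts:
        ssd = 0.0
        for dy in range(-w, w + 1):
            for dx in range(-w, w + 1):
                d = img[y + dy][x + dx] - img[y + dy + sy][x + dx + sx]
                ssd += d * d
        best = ssd if best is None else min(best, ssd)
    return best
```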
Sigma Filter
Feature detection can be written as $H * f$, where:
- $f$ – source image
- $H$ – operator of feature detection
- $*$ – operation of applying the operator
FAST Detector
Features from Accelerated Segment Test
For each pixel $P$ of the image, a neighbouring Bresenham circle around this pixel is considered. If a 7×7 kernel is used, the circle contains 16 pixels.
A pixel is considered a corner point if it has $N$ adjacent pixels on the circle whose intensities all satisfy the “darker” condition $d$ or the “brighter” condition $b$.
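A sketch of the segment test (my own Python illustration), using the 16-pixel radius-3 Bresenham circle and requiring a contiguous run of $n$ darker or brighter pixels, which is the common FAST-$n$ reading of the condition above:

```python
# 16 offsets of the radius-3 Bresenham circle, in ring order
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def fast_corner(img, x, y, t=20, n=10):
    """Segment test: (x, y) is a corner if at least n contiguous circle
    pixels are all brighter than I+t ("b" state) or darker than I-t ("d")."""
    p = img[y][x]
    states = []
    for dx, dy in CIRCLE:
        q = img[y + dy][x + dx]
        states.append(1 if q > p + t else (-1 if q < p - t else 0))
    for s in (1, -1):
        run = best = 0
        for st in states + states:      # doubled list handles wrap-around
            run = run + 1 if st == s else 0
            best = max(best, run)
        if best >= n:
            return True
    return False
```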
Harris Detector
Let the variation in image intensities depend on the shift $(u, v)$. It can be evaluated according to the formula:
$E(u,v) = \sum_{x,y} w(x,y)\,[I(x+u, y+v) - I(x,y)]^2$
where $I(x, y)$ – intensity in a point $(x, y)$,
$w(x, y)$ – window function,
$I(x+u, y+v)$ – intensity after the shift.
The window function $w(x, y)$ can be defined either in a binary form or in the form of a Gaussian function.
By expanding the squared difference in intensities into a second-order Taylor series at the point $(x, y)$ (a bilinear form) for small shifts, we obtain the approximation $E(u,v) \approx \begin{pmatrix} u & v \end{pmatrix} M \begin{pmatrix} u \\ v \end{pmatrix}$, where $M = \sum_{x,y} w(x,y) \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$
Algorithm:
Calculate the gradient of the image in each pixel.
Calculate the matrix 𝐌 from the window around each pixel.
Calculate the response of the corner 𝐑.
Cut off weak corners at threshold 𝐑.
Find the local maxima of the response function in a neighborhood of a given radius.
Choose 𝑵 the strongest local maxima.
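The first steps of the algorithm can be sketched directly (my own Python illustration, using a flat 3×3 window, central-difference gradients, and the response $R = \det M - k(\operatorname{tr} M)^2$; thresholding and non-maximum suppression are left out):

```python
def harris_response(img, k=0.04):
    """Harris response R = det(M) - k * tr(M)^2 for every interior pixel,
    using central-difference gradients and a flat 3x3 summation window."""
    h, w = len(img), len(img[0])
    Ix = [[0.0] * w for _ in range(h)]
    Iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            Ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            Iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    R = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = b = c = 0.0  # M = [[a, b], [b, c]] summed over the window
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = Ix[y + dy][x + dx], Iy[y + dy][x + dx]
                    a += gx * gx
                    b += gx * gy
                    c += gy * gy
            R[y][x] = a * c - b * b - k * (a + c) ** 2
    return R
```

The sign of $R$ separates the cases: both eigenvalues large (corner) gives $R > 0$, one dominant gradient direction (edge) gives $R < 0$, and a flat area gives $R \approx 0$.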
Forstner Detector
Detector Invariance
• Invariance Property – if we apply a geometric (affine) or photometric (affine intensity change $I \to aI + b$) transformation to the image, the detector must find the same set of points.
- Light invariance:
- To light shifting (𝐼 → 𝐼 + 𝑏);
- To light scaling (𝐼 → 𝑎𝐼).
- Rotation Invariance
- Scaling Invariance
III. Blob detectors
LoG (Laplacian of Gaussian) Detector
- Blobs are compact groups of connected pixels that share a common property and differ from the surrounding regions
- Blob centers are feature points
- Blobs are described by 4 parameters
- center coordinates (x, y)
- scale
- direction
A Gaussian is the Gauss function used to blur images.
$f$ - edge
$\frac{d}{dx}g$ - Derivative of Gaussian
$f*\frac{d}{dx}g$ - Edge = maximum of derivative
$f*\frac{d^2}{dx^2}g$ - Edge = zero transition of the second derivative
«Edge» is a burst of function
«Blob» is a combination of two bursts
The magnitude of the Laplacian of Gaussian response reaches a maximum in the center of the blob if the size of the Laplacian “corresponds” to the size of the blob
Search for blob characteristic size: convolution with Laplacian at several scales and search for maximum response
The response of the Laplacian fades as the scale increases, which is why it must be scale-normalized
Multiscale blob detector: convolution of a blurry image using the normalized Laplace filter at several scales and the choice of scale with the maximum Laplacian response.
DoG (Difference of Gaussians)
Method for the approximate calculation of the Laplacian of a Gaussian: take the difference of two Gaussians with different scales: $DoG = G(x, y, k\sigma) - G(x, y, \sigma) \approx (k-1)\,\sigma^2\,\nabla^2 G$
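A 1-D sketch of the approximation (my own Python illustration): blur a signal with two Gaussians whose scales differ by a factor $k$ and subtract; the response extremum lands at the blob center:

```python
import math

def gauss_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of the given radius."""
    k = [math.exp(-i * i / (2 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def convolve(signal, kernel):
    """1-D convolution with replicated borders."""
    r, n = len(kernel) // 2, len(signal)
    return [sum(kernel[r + j] * signal[min(max(i + j, 0), n - 1)]
                for j in range(-r, r + 1)) for i in range(n)]

def dog(signal, sigma, k=1.6):
    """Difference of Gaussians G(k*sigma) - G(sigma) applied to the signal:
    an approximation of the scale-normalized Laplacian of Gaussian."""
    radius = int(4 * k * sigma)
    narrow = convolve(signal, gauss_kernel(sigma, radius))
    wide = convolve(signal, gauss_kernel(k * sigma, radius))
    return [b - a for a, b in zip(narrow, wide)]
```

For a bright blob the DoG response is a negative extremum at the blob center, mirroring the LoG behavior described above.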
Harris-Laplace Detector
Selecting the corners in the image, considering the characteristic size: looking for points that maximize the response of the Harris corner in the image and the Laplacian response in scale.