Classification II
Last week we covered the univariate Gaussian classifier for image segmentation. This week, we will extend this classifier to utilize multiple features using a multivariate Gaussian classifier. The multivariate case is a straightforward generalization of the univariate case to higher dimensions, so this note will be quite similar to the Classification I notes.
Multivariate Gaussian classifier
As before, the task is to assign a class $\omega \in \Omega$ to every pixel in an image. Each pixel will now be associated with $n$ observable (or computable) features, which will be collected in a vector $\mathbf{x} \in \mathbb{R}^n$. We will use these features simultaneously to determine which class the pixel belongs to, or equivalently, which class the feature vector $\mathbf{x}$ belongs to. We will let $X$ denote the set of feature vectors associated with the image, such that each pixel will have a unique feature vector $\mathbf{x} \in X$. Note that even though each pixel has an associated feature vector, the value of the feature vector can be shared by multiple pixels.
Probability model
For each pixel we will model its associated feature vector as a continuous random vector $\mathbf{x}$ whose range is $\mathbb{R}^n$. Likewise, we will model the class as a discrete random variable $\omega$ whose range is the set of classes $\Omega = \{\omega_1, \dots, \omega_K\}$. Each class $\omega_k$ will have an a priori probability $P(\omega_k)$, which is the probability of labeling a pixel as belonging to $\omega_k$ before any features are observed. Likewise, we have the probability density of $\mathbf{x}$, $p(\mathbf{x})$, which, by total probability, can be written as

$$p(\mathbf{x}) = \sum_{k=1}^{K} p(\mathbf{x}, \omega_k) = \sum_{k=1}^{K} p(\mathbf{x} \mid \omega_k) P(\omega_k),$$

where $p(\mathbf{x}, \omega_k)$ is the joint probability of the feature vector taking the value $\mathbf{x}$ and the class taking the value $\omega_k$.

We now define the likelihood $p(\mathbf{x} \mid \omega_k)$, which is the probability of the feature vector taking the value $\mathbf{x}$ if the pixel indeed belongs to class $\omega_k$. With this, we can now define the a posteriori probability

$$P(\omega_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_k) P(\omega_k)}{p(\mathbf{x})} = \frac{p(\mathbf{x} \mid \omega_k) P(\omega_k)}{\sum_{l=1}^{K} p(\mathbf{x} \mid \omega_l) P(\omega_l)}.$$

Note that the denominator $p(\mathbf{x})$ does not depend on the particular class $\omega_k$, and we can view it as simply a normalization factor.
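As a tiny illustration (my own, not from the original notes), here is how Bayes' rule looks in numpy with made-up likelihood values and uniform priors:

```python
import numpy as np

# Hypothetical class-conditional likelihoods p(x | omega_k), evaluated
# at one observed feature vector x, for K = 3 classes.
likelihoods = np.array([0.02, 0.15, 0.05])
priors = np.full(3, 1 / 3)  # uniform a priori probabilities P(omega_k)

# Bayes' rule: the denominator p(x) is just a normalization factor.
joint = likelihoods * priors
posteriors = joint / joint.sum()

print(posteriors)             # a posteriori probabilities P(omega_k | x)
print(np.argmax(posteriors))  # index of the most probable class
```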
In this classification, we will label the pixel with the class that produces the largest posterior probability given the observed feature vector $\mathbf{x}$. We define the discrimination function $g$ to be the mapping from an observed feature vector to a class,

$$g(\mathbf{x}) = \arg\max_{\omega_k \in \Omega} P(\omega_k \mid \mathbf{x}).$$
Gaussian distribution
The previous section was very similar to the corresponding discussion in the notes about the univariate classifier. This section is also a straightforward generalization of the univariate case.
We assume that $\omega$ is uniformly distributed over $\Omega$,

$$P(\omega_k) = \frac{1}{K}, \quad k = 1, \dots, K,$$

where $K$ is the number of classes in $\Omega$.
The conditional likelihood $p(\mathbf{x} \mid \omega_k)$ will be a multivariate Gaussian,

$$p(\mathbf{x} \mid \omega_k) = \frac{1}{(2\pi)^{n/2} |\Sigma_k|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \Sigma_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) \right),$$

where $\boldsymbol{\mu}_k$ is the feature mean vector of class $\omega_k$ and $\Sigma_k$ is the feature covariance matrix of class $\omega_k$; these are the parameters that are to be estimated in the training process. $(\mathbf{x} - \boldsymbol{\mu}_k)^T$ denotes the transpose of the vector $(\mathbf{x} - \boldsymbol{\mu}_k)$, and all vectors are column vectors, such that the quadratic form in the exponent is just a scalar.
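To make the formula concrete, here is a rough numpy sketch (my own, not from the course material) that evaluates this density for a made-up two-feature example:

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Evaluate the multivariate Gaussian density p(x | omega_k)
    with mean vector mu and covariance matrix sigma."""
    n = x.size
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** n * np.linalg.det(sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

# Hypothetical class parameters for n = 2 features.
x = np.array([1.2, 0.7])
mu = np.array([1.0, 1.0])
sigma = np.array([[0.5, 0.1],
                  [0.1, 0.3]])
print(gaussian_density(x, mu, sigma))
```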
Note that the discrimination function is not dependent on the marginal density $p(\mathbf{x})$, so, as mentioned above, we will not specify it further. Since we have now described our distributions, we can revisit the discrimination function,

$$
\begin{aligned}
g(\mathbf{x}) &= \arg\max_{\omega_k \in \Omega} P(\omega_k \mid \mathbf{x}) = \arg\max_{\omega_k \in \Omega} p(\mathbf{x} \mid \omega_k) P(\omega_k) \\
&= \arg\max_{\omega_k \in \Omega} \left[ \ln p(\mathbf{x} \mid \omega_k) + \ln P(\omega_k) \right] \\
&= \arg\max_{\omega_k \in \Omega} \left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \Sigma_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) - \frac{1}{2} \ln |\Sigma_k| + \ln P(\omega_k) \right],
\end{aligned}
$$
where the last representation is what is actually implemented.
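A minimal sketch of that implementation idea, assuming numpy and the notation above (the function names are my own):

```python
import numpy as np

def log_discriminant(x, mu, sigma, prior):
    """ln p(x | omega_k) + ln P(omega_k) with the constant
    -n/2 ln(2 pi) term dropped, as in the expression above."""
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(sigma) @ diff
            - 0.5 * np.log(np.linalg.det(sigma))
            + np.log(prior))

def classify(x, mus, sigmas, priors):
    """Label the feature vector x with the class whose discriminant is largest."""
    scores = [log_discriminant(x, mu, sigma, p)
              for mu, sigma, p in zip(mus, sigmas, priors)]
    return int(np.argmax(scores))
```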
Training
To be able to determine $\boldsymbol{\mu}_k$ and $\Sigma_k$, we need to have a training set for each class label $\omega_k$,

$$T_k = \{ \mathbf{x} : \mathbf{x} \text{ is the feature vector of a pixel known to belong to class } \omega_k \}.$$
As with the univariate case, we can obtain these training features using a classification mask that masks out pixels belonging to one class, for every class. The difference is that, instead of gathering one feature from one feature image for each pixel, we now gather a feature vector from an ensemble of feature images for each pixel. So if a pixel is included in the mask for class $\omega_k$, we gather the feature value at that pixel location in every feature image, and collect them in a vector $\mathbf{x}$ which is then included in $T_k$. This is done for all pixels in the training mask for all the different classes.
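A sketch of this gathering step, assuming numpy, a stack of feature images of shape `(n_features, rows, cols)`, and a boolean training mask per class (names and shapes are my own conventions):

```python
import numpy as np

def gather_training_vectors(features, mask):
    """Collect the feature vector of every pixel selected by the boolean
    mask. `features` has shape (n_features, rows, cols); the result has
    shape (n_pixels_in_mask, n_features), one training vector per row."""
    return features[:, mask].T

# Hypothetical example: 3 feature images of size 4x4, and a mask
# selecting the pixels known to belong to one class.
features = np.random.rand(3, 4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
T_k = gather_training_vectors(features, mask)
print(T_k.shape)  # (4, 3): four training pixels, three features each
```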
We can now estimate the mean as

$$\hat{\boldsymbol{\mu}}_k = \frac{1}{|T_k|} \sum_{\mathbf{x} \in T_k} \mathbf{x},$$

and the covariance matrix as

$$\hat{\Sigma}_k = \frac{1}{|T_k|} \sum_{\mathbf{x} \in T_k} (\mathbf{x} - \hat{\boldsymbol{\mu}}_k)(\mathbf{x} - \hat{\boldsymbol{\mu}}_k)^T,$$

where $|T_k|$ denotes the number of elements in the set. As before, we use hat notation to emphasize that these quantities are estimates of the true mean and covariance of the class (which are unknown).
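These estimators map directly onto numpy; a small sketch, assuming `T_k` is an array with one training feature vector per row as gathered above:

```python
import numpy as np

def estimate_parameters(T_k):
    """Estimate the class mean vector and covariance matrix from an
    (n_samples, n_features) array of training feature vectors."""
    mu_hat = T_k.mean(axis=0)
    diff = T_k - mu_hat
    sigma_hat = (diff.T @ diff) / T_k.shape[0]  # the 1/|T_k| normalization
    return mu_hat, sigma_hat

# Equivalent covariance via numpy directly (bias=True gives 1/|T_k|):
# sigma_hat = np.cov(T_k, rowvar=False, bias=True)
```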
The procedure for classifying an image is then to classify each pixel by evaluating the discrimination function $g$ for the feature vector associated with the pixel, using the mean and covariance estimates obtained from the training.
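Putting the pieces together, here is a sketch (my own, not the course's reference implementation) that classifies every pixel at once, assuming uniform priors and the `(n_features, rows, cols)` feature stack convention used above:

```python
import numpy as np

def classify_image(features, mus, sigmas):
    """Assign to every pixel the class maximizing the log-discriminant.
    `mus` and `sigmas` hold the estimated mean vector and covariance
    matrix per class; the ln P(omega_k) term is constant under uniform
    priors and is therefore dropped."""
    n, rows, cols = features.shape
    X = features.reshape(n, -1).T             # (n_pixels, n_features)
    scores = np.empty((len(mus), X.shape[0]))
    for k, (mu, sigma) in enumerate(zip(mus, sigmas)):
        diff = X - mu
        inv = np.linalg.inv(sigma)
        maha = np.einsum('ij,jk,ik->i', diff, inv, diff)  # squared Mahalanobis distance
        scores[k] = -0.5 * maha - 0.5 * np.log(np.linalg.det(sigma))
    return scores.argmax(axis=0).reshape(rows, cols)
```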
For details on classification, dataset partitioning and evaluation, I refer to the notes on the univariate Gaussian classifier.
Singular covariance matrix
When classifying, one can encounter situations where the classifier breaks down; this section covers one such case.
By definition, the covariance matrix is non-negative definite. To see this, consider an arbitrary set of vectors $\{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ with sample mean $\hat{\boldsymbol{\mu}}$ and sample covariance $\hat{\Sigma}$, and an arbitrary vector $\mathbf{a}$; then

$$\mathbf{a}^T \hat{\Sigma} \mathbf{a} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{a}^T (\mathbf{x}_i - \hat{\boldsymbol{\mu}}) (\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T \mathbf{a} = \frac{1}{N} \sum_{i=1}^{N} \left( \mathbf{a}^T (\mathbf{x}_i - \hat{\boldsymbol{\mu}}) \right)^2 \geq 0,$$

and $\mathbf{a}^T \hat{\Sigma} \mathbf{a} \geq 0$ for every $\mathbf{a}$ holds if and only if $\hat{\Sigma}$ is non-negative definite.
However, it can still be singular (i.e. not invertible), a property it has if and only if its determinant is zero. This occurs if, for a class $\omega_k$, two or more of the feature images from that class are linearly dependent. To see this, consider the $j$th column of the covariance matrix of class $\omega_k$,

$$\hat{\boldsymbol{\sigma}}_j = \frac{1}{|T_k|} \sum_{\mathbf{x} \in T_k} (x_j - \hat{\mu}_j) (\mathbf{x} - \hat{\boldsymbol{\mu}}_k),$$

where $x_j$ is the $j$th entry (corresponding to feature $j$) of the feature vector $\mathbf{x}$, and likewise $\hat{\mu}_j$ is the $j$th entry of $\hat{\boldsymbol{\mu}}_k$. Now, if a set of feature images are linearly dependent, then the corresponding centered feature values $(x_j - \hat{\mu}_j)$ are linearly dependent for every $\mathbf{x} \in T_k$, which in turn makes the corresponding columns $\hat{\boldsymbol{\sigma}}_j$ linearly dependent. Linearly dependent columns is one of many equivalent characterizations of a singular matrix, and the original statement follows.
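A small numerical illustration (my own) of this effect, where one feature image is a linear function of another:

```python
import numpy as np

rng = np.random.default_rng(0)
f1 = rng.random(100)          # first feature over 100 training pixels
f2 = 2.0 * f1 + 1.0           # second feature, linearly dependent on the first
T_k = np.column_stack([f1, f2])

sigma_hat = np.cov(T_k, rowvar=False, bias=True)
print(np.linalg.det(sigma_hat))  # (numerically) zero -> not invertible
```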
Eigenvalues and eigenvectors
Without diving too deep into the fascinating world of eigenvectors and eigenvalues, we will present the basic concepts and how to compute them. In this course, we will mostly apply this theory to visualize feature clusters, via the eigenvectors and eigenvalues of the estimated covariance matrix of the class feature vectors. The covariance matrix contains information about the spread of the data, and since it is symmetric, we can gain information about its appearance (and hence the appearance of the data) by studying the eigenvalues and eigenvectors of the covariance matrix. I will not go into detail about why this is so, but rather refer the reader to this excellent explanation, and some beautiful visualizations. In the next paragraphs, I will mostly discuss how to actually compute the eigenvalues and eigenvectors.
An eigenvector of a linear transformation is a characteristic vector that does not change its direction when the transformation is applied to it. A square matrix $A$ can represent a linear transformation, and in this case we get the expression relating $A$ to an eigenvector $\mathbf{v}$ and its corresponding eigenvalue $\lambda$,

$$A\mathbf{v} = \lambda \mathbf{v}. \tag{1}$$

This is equivalent to finding the roots of the equation

$$(A - \lambda I)\mathbf{v} = \mathbf{0},$$

which has a non-zero solution $\mathbf{v}$ if and only if the determinant of the matrix $A - \lambda I$ is zero (where $I$ denotes the identity matrix),

$$\det(A - \lambda I) = 0.$$
Interested readers are referred to Wikipedia, from which the above paragraph is paraphrased. This section is mostly included for reference, as we will be computing eigenvalues and eigenvectors in the assignments, where this is assumed known.
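In practice one would rarely solve this by hand; a short numpy sketch on a made-up covariance matrix:

```python
import numpy as np

# Hypothetical estimated 2x2 covariance matrix for one class.
sigma_hat = np.array([[2.0, 0.8],
                      [0.8, 1.0]])

# eigh is meant for symmetric matrices such as covariance matrices; it
# returns real eigenvalues in ascending order and orthonormal eigenvectors
# (one per column).
eigvals, eigvecs = np.linalg.eigh(sigma_hat)
print(eigvals)        # spread of the data along the principal directions
print(eigvecs[:, 1])  # eigenvector of the largest eigenvalue
```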
Computation in 2D
In the 2D case, with

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},$$

we need to solve

$$\det(A - \lambda I) = \det \begin{bmatrix} a - \lambda & b \\ c & d - \lambda \end{bmatrix} = 0,$$

which, when evaluating the determinant, becomes

$$(a - \lambda)(d - \lambda) - bc = \lambda^2 - (a + d)\lambda + (ad - bc) = 0,$$

which is an ordinary second order equation in $\lambda$, with solutions

$$\lambda = \frac{(a + d) \pm \sqrt{(a + d)^2 - 4(ad - bc)}}{2}.$$
Evidently, we get two solutions, which we call $\lambda_1$ and $\lambda_2$, and these are the eigenvalues of $A$. We can substitute these back into equation (1) and find the corresponding eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$.
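As a sanity check, here is a small sketch (my own) comparing the closed-form 2D solution above with numpy:

```python
import numpy as np

def eigenvalues_2d(A):
    """Eigenvalues of a 2x2 matrix via the quadratic formula above."""
    a, b = A[0]
    c, d = A[1]
    disc = np.sqrt((a + d) ** 2 - 4 * (a * d - b * c))
    return (a + d - disc) / 2, (a + d + disc) / 2

A = np.array([[2.0, 0.8],
              [0.8, 1.0]])
print(eigenvalues_2d(A))     # closed-form solutions
print(np.linalg.eigvals(A))  # should agree, up to ordering
```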
Note that for a general $2 \times 2$ matrix we cannot be sure to have real eigenvalues, since the expression under the square root can be negative. For a symmetric matrix such as a covariance matrix, however, $b = c$ and the expression becomes $(a - d)^2 + 4b^2 \geq 0$, so the eigenvalues are always real.