2. Context in Image Classification
d. Dependence Trees (Chow)

A dependence tree applies a tree-structured dependence to approximate a probability distribution: it indicates which pixels depend on which other pixels. For example, the figure above shows, for each pixel x_k, which pixels it depends on: pixel x2 depends on pixel x1, and pixels x3, x4, and x5 depend on x2. As we did with Markov chains, consider the joint probability distribution P(X), where X is a vector of N discrete random variables. To determine the probability of a particular configuration of X, we can use an approximation that is a product of lower-order probability distributions. This is valid because any such product approximation is itself a valid probability distribution. Assume once again that we have a predefined dependence tree for each class of object that we wish to recognize; the dependence tree method will then help classify images into their predefined classes.
Consider a probability distribution that is permissible as an approximation if it has the following form:

$$P_a(X) = \prod_{i=1}^{N} P\bigl(x_{m_i} \mid x_{m_{j(i)}}\bigr), \qquad 0 \le j(i) < i,$$

where (m_1, m_2, ..., m_N) is an unknown permutation of the integers 1, 2, ..., N, so that each of the variables is represented exactly once, and with the convention that P(x_{m_1} | x_{m_0}) = P(x_{m_1}). The function j(i) is called the dependence mapping of the distribution and represents the mapping that defines the dependence tree. Let each x_i be drawn as a point in the plane, one node of the tree. Consider now two variables x_q and x_p such that the mapping function yields p = j(q). The dependence mapping then tells us that x_q is dependent on x_p; in the dependence tree graph this is drawn as an arrow pointing from x_p to x_q. If j(q) = 0, there is no arrow pointing to x_q: it is the root of the tree.
Optimum Approximation

In order to determine the dependencies, we must choose among approximations of the form discussed above. Consider again a probability distribution P(X) of N discrete variables. The closeness of an approximation P_a(X) to the actual distribution P(X) can be measured by the mutual information between the two distributions:

$$I(P, P_a) = \sum_{X} P(X) \log \frac{P(X)}{P_a(X)},$$

which has the property that I(P, P_a) >= 0. I(P, P_a) can be viewed as the difference in information content between the two probability distributions: it is always positive if the two distributions are different and zero if they are the same. The measure is easily applied when the approximation is a product expansion, and we will use it to choose the dependence tree that best approximates a probability distribution.

An exhaustive search is out of the question: for a set of N variables there are N^(N-2) possible trees with N vertices, so for any reasonable value of N the search would take an enormous amount of time, and we are better off constructing the optimal tree directly. For an Nth-order probability distribution P, then, we wish to find the tree-dependent distribution P_t that is closest to P. Let us first consider the following two definitions to aid with the optimization problem.

Definition 1: The mutual information between two random variables x_i and x_j is defined as

$$I(x_i, x_j) = \sum_{x_i, x_j} P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i)\,P(x_j)}.$$

If we use the mapping j(i), the mutual information I(x_{m_i}, x_{m_{j(i)}}) between each variable and its parent will be a useful quantity.

Definition 2: A maximum-weight dependence tree is a tree in which the total branch weight is the maximum possible, where the weight of each branch is the mutual information between the two variables it connects.
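The closeness measure can be sketched directly from its definition. The two toy distributions over pairs of binary pixels below are hypothetical, chosen only to show that the measure is zero for identical distributions and positive otherwise.

```python
import math

# A sketch of the closeness measure I(P, Pa) = sum_X P(X) log(P(X)/Pa(X)).
# The two distributions over pairs of binary pixels are hypothetical.

def divergence(p, pa):
    """Zero when the distributions match, positive when they differ."""
    return sum(px * math.log(px / pa[x]) for x, px in p.items() if px > 0)

p  = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
pa = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

print(divergence(p, p))   # 0.0 -- identical distributions
print(divergence(p, pa))  # positive -- Pa differs from P
```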
A tree-dependent distribution P_t(X) is an optimum approximation to P(X) if and only if its dependence tree is of maximum weight. To see this, expand the mutual information between P(X) and P_t(X):

$$I(P, P_t) = \sum_{X} P(X) \log P(X) - \sum_{X} P(X) \sum_{i=1}^{N} \log P\bigl(x_{m_i} \mid x_{m_{j(i)}}\bigr). \qquad (*)$$

We can also define the entropies H(X) and H(x_i), alongside the mutual information function of Definition 1:

$$H(X) = -\sum_{X} P(X) \log P(X), \qquad H(x_i) = -\sum_{x_i} P(x_i) \log P(x_i).$$
Now if we substitute these definitions into (*), we get the following expression for the mutual information function:

$$I(P, P_t) = -\sum_{i=1}^{N} I\bigl(x_{m_i}, x_{m_{j(i)}}\bigr) + \sum_{i=1}^{N} H(x_i) - H(X).$$

H(x_i) and H(X) are both independent of the dependence tree, and I(P, P_t) is non-negative; therefore minimizing the closeness measure I(P, P_t) is equivalent to maximizing the total branch weight. Of course, we do not want to search through every possible tree in order to find the optimal one, so we use an optimization procedure instead.

Optimization Procedure

The problem is now to construct a dependence tree of maximum total branch weight:

1) Read in samples from the image to be classified.
2) Directly evaluate the pairwise mutual information measures I(x_i, x_j) for every pair of variables.
3) Order the candidate branches according to decreasing weight, where the weight of a branch is the pairwise mutual information measure presented above.
4) Make the branch of maximum weight the first branch of the dependence tree, then keep adding the heaviest remaining branches, skipping any branch that would close a cycle, until the tree is complete.

An Example and an Animation

Consider the following probability distribution for four pixels. The numbers here are fairly irrelevant; what is most important is to understand the procedure needed in order to create a dependence tree.

A Binary Probability Distribution
From this, the pairwise mutual information values can be calculated, for example: I(x1,x2)=0.079
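The sketch below shows how such a pairwise value is computed from a joint table, following Definition 1. Since the page's probability table did not survive, the joint distribution here is made up, so it will not reproduce I(x1, x2) = 0.079; only the procedure matters.

```python
import math
from itertools import product

# A sketch of computing pairwise mutual information I(x_i, x_j) from a
# joint distribution over four binary pixels. The joint table is
# hypothetical (the page's own table is not reproduced here).

def pairwise_mi(joint, i, j):
    """I(x_i, x_j) = sum P(xi,xj) log( P(xi,xj) / (P(xi) P(xj)) )."""
    pij, pi, pj = {}, {}, {}
    for x, p in joint.items():
        pij[(x[i], x[j])] = pij.get((x[i], x[j]), 0.0) + p
        pi[x[i]] = pi.get(x[i], 0.0) + p
        pj[x[j]] = pj.get(x[j], 0.0) + p
    return sum(p * math.log(p / (pi[a] * pj[b]))
               for (a, b), p in pij.items() if p > 0)

# Hypothetical joint distribution over (x1, x2, x3, x4): a perturbed
# uniform distribution, which makes the pixels slightly dependent.
joint = {x: 1 / 16 for x in product([0, 1], repeat=4)}
joint[(0, 0, 0, 0)] += 0.05
joint[(1, 1, 1, 1)] -= 0.05

print(pairwise_mi(joint, 0, 1))  # small positive value
```

For a perfectly uniform joint table the pixels are independent and every pairwise measure is zero; the perturbation is what makes I(x1, x2) positive.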
The following animation will demonstrate how to construct a dependence tree.
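In code, the construction the animation illustrates amounts to a maximum-weight spanning tree: sort the candidate branches by decreasing mutual information and add each branch that does not close a cycle (Kruskal's algorithm with a union-find structure). The weights below are hypothetical, with pixels indexed from 0; only I(x1, x2) = 0.079 comes from the example above.

```python
# A sketch of the maximum-weight tree construction: order branches by
# decreasing mutual information, add each branch that does not close a
# cycle, and stop at n - 1 branches. Weights are placeholder values.

def max_weight_tree(n, weights):
    """weights: {(i, j): I(x_i, x_j)}. Returns the n-1 branches of the tree."""
    parent = list(range(n))

    def find(a):                      # union-find with path compression
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:                  # branch joins two separate components
            parent[ri] = rj
            tree.append((i, j))
        if len(tree) == n - 1:
            break
    return tree

# Hypothetical pairwise mutual information values for four pixels x0..x3.
mi = {(0, 1): 0.079, (0, 2): 0.189, (0, 3): 0.005,
      (1, 2): 0.050, (1, 3): 0.139, (2, 3): 0.020}
print(max_weight_tree(4, mi))  # [(0, 2), (1, 3), (0, 1)]
```

Note that branch (1, 2) is skipped even though it outweighs (2, 3) and (0, 3): by the time it is considered, pixels 1 and 2 are already connected through the tree, so adding it would close a cycle.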