The Use of Context in Pattern Recognition

 
1. Introduction
2. Context in Image Classification
   a. Lower Order Markov Chains
   b. Hilbert Space Filling Curves
   c. Markov Meshes
   d. Dependence Trees
3. Context in Text Recognition
   a. A Quick Bit on Compound Decision Theory
   b. Dictionary Look-up Methods
4. Conclusions
References

 

2.  Context in Image Classification

d. Dependence Trees (Chow)

A dependence tree is used to approximate a joint probability distribution by a product of first-order (tree) dependences.  The tree indicates which pixels depend on which other pixels.

[Figure: an example dependence tree over pixels x1 through x5.]

The figure above shows, for each pixel xk, which other pixels it depends on.  For example, pixel x2 depends on pixel x1, and pixels x3, x4, and x5 depend on x2.  As with the Markov chains, consider the joint probability distribution P(X), where X is a vector of N discrete random variables.  To determine the probability of a particular configuration of X, an approximation can be used that is a product of lower-order probability distributions.  This is valid because any such product approximation is itself a valid probability distribution.

Assume once again that we have a predefined dependence tree for each class of object that we wish to recognize.  The dependence-tree method will then help classify images into their predefined classes.
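As a rough sketch of how such class-conditional tree models could be used for classification (the function and variable names here are illustrative assumptions, not part of the original method description):

    def classify(X, class_models, priors):
        """Assign the pixel configuration X to the class c that maximizes
        the tree-approximated posterior  Pt(X | c) * P(c).

        class_models -- dict mapping each class label c to a function that
                        returns the tree-dependent probability Pt(X | c)
        priors       -- dict mapping each class label c to its prior P(c)
        """
        return max(class_models, key=lambda c: class_models[c](X) * priors[c])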
 

Consider a probability distribution permissible as an approximation if it has the following form:

P_t(X) = \prod_{i=1}^{N} P\left(x_{m_i} \mid x_{m_{j(i)}}\right), \qquad 0 \le j(i) < i,

where (m_1, m_2, \ldots, m_N) is an unknown permutation of the integers 1, 2, \ldots, N, so that each of the variables is represented exactly once, and P(x_{m_1} \mid x_{m_0}) is taken to be simply P(x_{m_1}).  The mapping j(i) is called the dependence of the distribution and is what defines the dependence tree.

Let each variable xi be represented by a node in the plane.  Now consider two variables xq and xp such that the mapping function yields p = j(q).  The dependence function then tells us that xq is dependent on xp.  This is drawn in the dependence-tree graph as an arrow pointing from xq to xp.  If j(q) = 0, then there is no arrow pointing away from the variable xq; it is the root of the tree.
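For the example tree described above, in which x2 depends on x1 and x3, x4, x5 each depend on x2, this product approximation takes the concrete form:

P_t(x_1, x_2, x_3, x_4, x_5) = P(x_1)\, P(x_2 \mid x_1)\, P(x_3 \mid x_2)\, P(x_4 \mid x_2)\, P(x_5 \mid x_2).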
 

Optimum Approximation

In order to determine the dependencies, an approximation of the form discussed above must be found.  Consider again a probability distribution P(X) of N discrete variables, X = (x_1, x_2, \ldots, x_N).  The mutual information (closeness measure) between the approximation Pa(X) and the actual distribution P(X) can be defined by

I(P, P_a) = \sum_{X} P(X) \log \frac{P(X)}{P_a(X)},

and has the property that I(P, P_a) \ge 0.

This quantity is a measure of the closeness of the approximation: I(P, Pa) can be viewed as the difference in information content between the two probability distributions.  The measure is always positive if the two distributions are different and zero if they are the same.  It is especially easy to evaluate when the approximating distribution is a product expansion.  We will use this measure to find the best approximation of a probability distribution by a dependence tree.
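As a minimal sketch of how this closeness measure could be evaluated numerically (the function name and the dictionary representation of the distributions are illustrative assumptions):

    import math

    def closeness(P, Pa):
        """Closeness measure I(P, Pa) = sum over X of P(X) * log(P(X) / Pa(X)).

        P and Pa are dictionaries mapping each configuration X (a tuple of
        pixel values) to its probability.  Configurations with P(X) = 0
        contribute nothing to the sum.
        """
        total = 0.0
        for X, p in P.items():
            if p > 0.0:
                total += p * math.log(p / Pa[X])
        return total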

An exhaustive search is impractical: for a set of N variables there are N^(N-2) possible trees on N vertices, so for any reasonable value of N this would take an enormous amount of time (for example, a mere 16 pixels already admit 16^14, roughly 7 x 10^16, candidate trees).  We therefore need a more direct way to find the optimum.  For an Nth-order probability distribution P, we wish to find the best tree-dependent distribution Pt.  Let us first consider the following two definitions to aid with the optimization problem:

Definition 1:

The mutual information between two random variables xi and xj is defined as follows:

I(x_i, x_j) = \sum_{x_i, x_j} P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i)\, P(x_j)}.

If we use the mapping j(i), then the mutual information between each variable and the variable it is mapped onto, I(xi, xj(i)), is the useful quantity: it will serve as the branch weight in the dependence tree.
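A small sketch of how these pairwise measures could be computed from a joint distribution over the pixels (the dictionary representation and function name are assumptions made for illustration):

    import math
    from collections import defaultdict

    def pairwise_mutual_information(joint, i, j):
        """I(xi, xj) = sum over (xi, xj) of P(xi, xj) * log(P(xi, xj) / (P(xi) * P(xj))).

        joint maps each configuration (a tuple of pixel values) to its
        probability; i and j are the positions of the two pixels in that tuple.
        """
        p_ij = defaultdict(float)   # joint distribution of (xi, xj)
        p_i = defaultdict(float)    # marginal distribution of xi
        p_j = defaultdict(float)    # marginal distribution of xj
        for X, p in joint.items():
            p_ij[(X[i], X[j])] += p
            p_i[X[i]] += p
            p_j[X[j]] += p
        info = 0.0
        for (a, b), p in p_ij.items():
            if p > 0.0:
                info += p * math.log(p / (p_i[a] * p_j[b]))
        return info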

Definition 2:

What we would like is a maximum-weight dependence tree: one in which the total weight of the branches is as large as possible, where the weight of a branch is the mutual information I(xi, xj(i)) between the variables it joins.  The tree-dependent distribution Pt(X) is an optimum approximation to P(X) if and only if its dependence tree is of maximum weight.

If we expand I(P, Pt), using the tree-dependent form of Pt(X), we obtain:

I(P, P_t) = \sum_{X} P(X) \log P(X) - \sum_{X} P(X) \sum_{i=1}^{N} \log P\left(x_{m_i} \mid x_{m_{j(i)}}\right).   -----(*)

We can also define the entropies H(X) and H(xi) as follows, and recall the mutual information function from Definition 1:

H(X) = -\sum_{X} P(X) \log P(X), \qquad H(x_i) = -\sum_{x_i} P(x_i) \log P(x_i).

Now if we substitute into (*), using \log P(x_{m_i} \mid x_{m_{j(i)}}) = \log \frac{P(x_{m_i}, x_{m_{j(i)}})}{P(x_{m_i})\, P(x_{m_{j(i)}})} + \log P(x_{m_i}), we get the following expression for the closeness measure:

I(P, P_t) = -\sum_{i=1}^{N} I\left(x_{m_i}, x_{m_{j(i)}}\right) + \sum_{i=1}^{N} H(x_i) - H(X).

H(xi) and H(X) are both independent of the dependence tree, and I(P, Pt) is non-negative; therefore minimizing the closeness measure is equivalent to maximizing the total branch weight, the sum of the I(x_{m_i}, x_{m_{j(i)}}) over the branches of the tree.

Of course, we do not want to search through every possible tree and compare branch weights in order to find the optimal one.  Instead we use the following optimization procedure.

Optimization Procedure

The problem is now to construct a dependence tree of maximum total branch weight.

1) Read in samples of the image to be classified; these are used to estimate the required probabilities.

2) Directly evaluate the pairwise mutual information measures I(xi, xj) for every pair of variables.

3) Find the pair with the largest mutual information measure and make it the first branch of the dependence tree, then keep adding branches in order of decreasing mutual information.  The goal is a dependence tree of maximum total branch weight.  More precisely (a code sketch of this greedy construction follows the list below):

- order the candidate branches according to decreasing weight, where the weight is the pairwise mutual information measure presented above
- that is, the weight of branch bi is greater than or equal to the weight of branch bj whenever i < j
- select b1 and b2, then add b3 (and each later branch) only if it does not form a cycle with the branches already selected, stopping when N-1 branches have been chosen
- the solution is unique if all the branch weights are different, but several solutions are possible if some of the weights are the same
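A minimal sketch of this greedy (Kruskal-style) construction, assuming the pairwise mutual information values have already been collected into a dictionary keyed by pixel pairs (the function and variable names are illustrative):

    def maximum_weight_dependence_tree(n, weights):
        """Greedily build a dependence tree of maximum total branch weight.

        n       -- number of pixels/variables, labelled 0 .. n-1
        weights -- dict mapping each pair (i, j) with i < j to I(xi, xj)
        Returns the list of the n-1 selected branches.
        """
        # Order the candidate branches by decreasing weight.
        branches = sorted(weights, key=weights.get, reverse=True)
        parent = list(range(n))            # union-find structure for cycle detection

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        tree = []
        for (i, j) in branches:
            ri, rj = find(i), find(j)
            if ri != rj:                   # adding (i, j) does not create a cycle
                parent[ri] = rj
                tree.append((i, j))
                if len(tree) == n - 1:     # a tree on n vertices has n-1 branches
                    break
        return tree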
An Example and an Animation

Consider the following probability distribution for four pixels:

The particular numbers here are fairly unimportant; what matters most is to understand the procedure needed to create a dependence tree.

A Binary Probability Distribution

x1 x2 x3 x4    P(x1,x2,x3,x4)    P(x1)P(x2)P(x3)P(x4)
 0  0  0  0        0.100               0.046
 0  0  0  1        0.100               0.046
 0  0  1  0        0.050               0.056
 0  0  1  1        0.050               0.056
 0  1  0  0        0.000               0.056
 0  1  0  1        0.000               0.056
 0  1  1  0        0.100               0.068
 0  1  1  1        0.050               0.068
 1  0  0  0        0.050               0.056
 1  0  0  1        0.100               0.056
 1  0  1  0        0.000               0.068
 1  0  1  1        0.000               0.068
 1  1  0  0        0.050               0.068
 1  1  0  1        0.050               0.068
 1  1  1  0        0.150               0.083
 1  1  1  1        0.150               0.083

From this the following Pairwise Mutual Information values can be calculated:

I(x1,x2)=0.079
I(x1,x3)=0.00005
I(x1,x4)=0.0051
I(x2,x3)=0.189
I(x2,x4)=0.0051
I(x3,x4)=0.0051
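Putting the pieces together for this four-pixel example, using the hypothetical helper functions sketched earlier (the names and data layout are illustrative):

    # Joint distribution from the table above; the tuple positions
    # 0, 1, 2, 3 correspond to the pixels x1, x2, x3, x4.
    joint = {
        (0,0,0,0): 0.100, (0,0,0,1): 0.100, (0,0,1,0): 0.050, (0,0,1,1): 0.050,
        (0,1,0,0): 0.000, (0,1,0,1): 0.000, (0,1,1,0): 0.100, (0,1,1,1): 0.050,
        (1,0,0,0): 0.050, (1,0,0,1): 0.100, (1,0,1,0): 0.000, (1,0,1,1): 0.000,
        (1,1,0,0): 0.050, (1,1,0,1): 0.050, (1,1,1,0): 0.150, (1,1,1,1): 0.150,
    }

    # Pairwise mutual information for every pair of pixels.
    weights = {(i, j): pairwise_mutual_information(joint, i, j)
               for i in range(4) for j in range(i + 1, 4)}

    # Build the maximum-weight dependence tree.  The branch (x2, x3) is
    # selected first (I = 0.189), then (x1, x2) (I = 0.079); the remaining
    # weights are tied at 0.0051, so any one branch involving x4 that does
    # not form a cycle completes the tree -- hence several optimal trees exist.
    tree = maximum_weight_dependence_tree(4, weights)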
 

The following animation demonstrates how to construct the dependence tree for this example.

[Animation: step-by-step construction of the dependence tree.]

Previous: 2c. Markov Meshes
Next: 3. Context in Text Recognition