More on Orthogonality and Correlation of Features



In the last section I raised an important, but possibly confusing, issue. Intuitively, we reason along the following lines: the amount of information I can obtain from a certain system (see some of the links on independence, uncorrelatedness, and related notions) grows if I make orthogonal measurements. In other words, if I measure attributes that are mutually correlated, I am not gaining more information, since some pieces of the information I get are shared between the attributes. In some cases this natural reasoning is justified, as, for example, in Communications Theory, where the signal is transmitted along the temporal axis. For uncorrelated signals it is true that we get more information with respect to the overall signal space. See the following two figures:

The figure on the left shows the case of uncorrelated signals. We have more degrees of freedom in the choice of the successive signal, since the previous signal imposes no constraints (x: previous, y: successive), and hence we have a greater information capacity available. With correlated signals (figure on the right), the value of the successive signal is constrained by the previous one.
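
This intuition can be made concrete with a small numerical sketch. For a bivariate Gaussian pair of signals, the joint differential entropy is h(X, Y) = (1/2) ln((2*pi*e)^2 det(Sigma)), with det(Sigma) = sigma_x^2 * sigma_y^2 * (1 - rho^2), so for fixed marginal variances the joint information is maximal at rho = 0 and shrinks as the correlation grows. The following Python snippet (the Gaussian model and the particular rho values are my own illustrative choices, not anything from the figures) evaluates this:

    import numpy as np

    def joint_entropy_gaussian(sigma_x, sigma_y, rho):
        # Differential entropy (in nats) of a bivariate Gaussian:
        #   h = 0.5 * ln((2*pi*e)^2 * det(Sigma)),
        # where det(Sigma) = sigma_x^2 * sigma_y^2 * (1 - rho^2).
        det_sigma = (sigma_x ** 2) * (sigma_y ** 2) * (1.0 - rho ** 2)
        return 0.5 * np.log((2 * np.pi * np.e) ** 2 * det_sigma)

    # For fixed marginal variances, the joint information peaks at rho = 0
    # (uncorrelated signals) and drops steeply as |rho| -> 1.
    for rho in (0.0, 0.5, 0.9, 0.99):
        h = joint_entropy_gaussian(1.0, 1.0, rho)
        print(f"rho = {rho:4.2f}  ->  h(X, Y) = {h:.3f} nats")

At rho = 0 the pair carries the full ln(2*pi*e) nats per signal; at rho = 0.99 the second signal adds almost nothing, which is exactly the "restricted choice" pictured on the right.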

But in pattern recognition we are NOT looking at correlations over the overall set of objects. We look at the relations within each class individually, because we are concerned not only with the mutual relationships within the feature set, but also with the localization of the different classes in the feature space; correlations computed over the superset of all the classes would, by themselves, give us no information about that localization.
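
Here is a small illustration of that distinction (the class means, covariance, and sample sizes are invented purely for the example): two classes whose features are strongly correlated within each class, with the class means offset so that the pooled correlation over the union of both classes nearly cancels. The superset statistics would suggest the features are unrelated, while the per-class structure, which is what matters for recognition, is very strong.

    import numpy as np

    rng = np.random.default_rng(0)

    # Within each class the two features are strongly correlated (rho = +0.9).
    # The class means sit on the anti-diagonal, chosen so that the
    # between-class spread almost exactly cancels the within-class
    # correlation in the pooled statistics.
    within_cov = [[1.0, 0.9], [0.9, 1.0]]
    class_a = rng.multivariate_normal([-0.95, 0.95], within_cov, size=2000)
    class_b = rng.multivariate_normal([0.95, -0.95], within_cov, size=2000)
    pooled = np.vstack([class_a, class_b])

    print("within-class rho (A):", round(np.corrcoef(class_a.T)[0, 1], 3))
    print("within-class rho (B):", round(np.corrcoef(class_b.T)[0, 1], 3))
    print("pooled rho          :", round(np.corrcoef(pooled.T)[0, 1], 3))

The pooled correlation comes out near zero even though each class, taken individually, is almost degenerate along its correlation axis.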

Now I am going to show that the informativeness of jointly observed, strongly correlated features (or object attributes) is greater than the sum of the informativenesses of the individual features (that is, greater than it would be if they were uncorrelated)... Of course, you'll have to ''turn the page'' again.
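
Before turning the page, here is a quick numerical taste of the effect (the Gaussian setup with correlations of ±0.9 is my own illustrative choice): two classes share identical standard-normal marginals for each feature, so either feature alone is useless for discrimination, yet the feature pair, through its correlation structure, separates the classes well.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 5000

    # Both classes: zero mean, unit-variance marginals for each feature;
    # they differ ONLY in the sign of the feature correlation.
    cov_a = [[1.0, 0.9], [0.9, 1.0]]    # class A: rho = +0.9
    cov_b = [[1.0, -0.9], [-0.9, 1.0]]  # class B: rho = -0.9
    a = rng.multivariate_normal([0.0, 0.0], cov_a, size=n)
    b = rng.multivariate_normal([0.0, 0.0], cov_b, size=n)

    # Either feature alone: the class-conditional marginals are identical,
    # so a threshold rule cannot beat chance (~0.5).
    single_acc = ((a[:, 0] > 0).mean() + (b[:, 0] <= 0).mean()) / 2
    print(f"single-feature threshold accuracy ~ {single_acc:.2f}")

    # Both features jointly: for equal priors and these covariances the
    # likelihood-ratio rule reduces to the sign of the product x*y
    # (positive product -> class A). Its expected accuracy is
    # 1/2 + arcsin(0.9)/pi ~ 0.86.
    joint_acc = ((a[:, 0] * a[:, 1] > 0).mean() + (b[:, 0] * b[:, 1] <= 0).mean()) / 2
    print(f"joint-feature (sign of x*y) accuracy ~ {joint_acc:.2f}")

The sum of the individual informativenesses here is essentially zero, while the joint pair carries nearly all of the discriminating information, which is the claim in a nutshell.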



