This web site and all of its contents Copyright © 2002 Peer Science LLC. All rights reserved

Feature Subset Selection

One capability that the Peer Science technical team has developed over the years, and that differentiates it from other data analysis companies, is feature selection. Pattern recognition systems make their decisions based on predictor variables (a.k.a. independent inputs, or features) associated with an object or event. Therefore, the choice of which features to provide to the recognizer is critically important to its performance. For example, if you wanted to detect patterns of fraud and abuse of service providers in a medical insurance database, you might consider such features as:
It would be easy to come up with hundreds of other possible features. Some would be good predictors of fraud; others would not. Generally, 10 to 50 known features (individual pieces or chunks of data) associated with each example are selected from a potential feature set of as many as 200 to 800. But what is the best set of features -- that is, which subset of all the possible features produces the best model and gives you the most accurate prediction?
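To give a rough sense of why this question is hard, the short sketch below counts how many distinct subsets exist at the small and large ends of the ranges quoted above. The helper name subset_count is hypothetical and used only for illustration; the point is that even the smallest case (10 features chosen from 200) yields on the order of 10^16 candidate subsets, so trying every one is not practical.

```python
import math


def subset_count(total_features, subset_size):
    """Number of distinct feature subsets of a given size (hypothetical helper)."""
    return math.comb(total_features, subset_size)


# Smallest case from the text: 10 features chosen from a pool of 200.
low = subset_count(200, 10)

# Largest case from the text: 50 features chosen from a pool of 800.
high = subset_count(800, 50)

print(f"10 of 200: about {low:.2e} subsets")
print(f"50 of 800: about {high:.2e} subsets")
```

Any practical selection method therefore has to search this space selectively rather than exhaustively.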

One of the breakthroughs the Peer Science team has made in the pattern recognition area has been to develop a methodology for selecting the optimal feature set for a specific problem. In fact, Dr. DeRouin, Peer Science's CTO, wrote his doctoral thesis on this topic. Statistical techniques such as principal component analysis (closely related to Singular Value Decomposition) focus on the representation of the data and its orthogonality. Our automated saliency technique, by contrast, uses the performance of the classifier itself as the indicator of feature strength. Thus, our techniques optimize the overall performance of the system rather than the data representation.
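The general idea of scoring features by classifier performance can be illustrated with a minimal sketch. This is not Peer Science's saliency method, which is not described here; it is a generic greedy forward selection, assuming a simple nearest-centroid classifier scored by leave-one-out accuracy. All function names and the toy data are hypothetical: feature 1 tracks the class label, while features 0 and 2 are identical across the two classes and so carry no information.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    that only looks at the given feature indices."""
    correct = 0
    for i in range(len(rows)):
        # Build per-class sums from every example except the held-out one.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(rows, labels)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # Squared distance from the held-out example to a class centroid.
            return sum((rows[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        predicted = min(sums, key=dist)
        correct += predicted == labels[i]
    return correct / len(rows)


def forward_select(rows, labels, max_features):
    """Greedily add the feature that most improves classifier accuracy,
    stopping when no remaining feature improves it."""
    selected, best_score = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining and len(selected) < max_features:
        scored = [(loo_accuracy(rows, labels, selected + [f]), f)
                  for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy data (assumed for illustration): 20 examples, 3 features, 2 classes.
labels = [i % 2 for i in range(20)]
rows = [
    [
        (i // 2) % 5,                     # feature 0: same in both classes
        labels[i] * 10 + (i // 2) % 3,    # feature 1: separates the classes
        ((i // 2) * 3) % 7,               # feature 2: same in both classes
    ]
    for i in range(20)
]

selected, score = forward_select(rows, labels, max_features=2)
print("selected features:", selected, "accuracy:", score)
```

Note that the selection criterion here is the classifier's own accuracy, not any property of the data representation. A variance-based method such as principal component analysis would rank these features without ever consulting the class labels.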

This methodology allows Peer Science to build pattern recognition systems that have a performance edge over any other system available. This feature selection capability is provided as our eXtreme FeatureSelector product, so that you, too, can have this edge.