|
Feature selection is one of the capabilities the Peer Science technical team has developed over the years that sets
it apart from other data analysis companies. Pattern recognition systems make their decisions
based upon predictor variables (a.k.a. independent inputs, or features) associated with an object or event. Therefore, the selection of
which features are to be provided to the recognizer is critically important to its performance. For example, if you wanted to detect
patterns of fraud and abuse of service providers in a medical insurance database, you might consider such features as:
It would be easy to come up with hundreds of other possible features. Some would be good predictors of fraud; others would not.
Generally, 10 to 50 known features (individual pieces or chunks of data) associated with each example are selected from a potential
feature set of as many as 200 to 800. But what is the best set of features -- that is, which subset of all the possible features
produces the best model and gives you the most accurate prediction?
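To get a feel for how large that search is, the number of distinct subsets can be counted directly. The short Python sketch below is
illustrative only: the function name is hypothetical, and the pool and subset sizes are simply the ends of the ranges quoted above.

```python
from math import comb


def count_subsets(pool_size: int, subset_size: int) -> int:
    """Number of distinct feature subsets of a given size drawn from a pool."""
    return comb(pool_size, subset_size)


# Smallest and largest cases from the ranges quoted in the text.
print(f"10 of 200 features: {count_subsets(200, 10):.3e} candidate subsets")
print(f"50 of 800 features: {count_subsets(800, 50):.3e} candidate subsets")
```

Even the smallest case yields far too many candidate subsets to train and score one model per subset, which is why a guided search
is needed rather than brute-force enumeration.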
One of the breakthroughs the Peer Science team has made in the pattern recognition area is a methodology for selecting
the optimal feature set for a specific problem. In fact, Dr. DeRouin, Peer Science's CTO, wrote his doctoral
thesis on this topic. Unlike statistical techniques such as principal components analysis (closely related to Singular Value
Decomposition), which focus on data representation and its orthogonality, the automated saliency technique uses the performance of
the classifier as the indicator of feature strength. Thus, our techniques optimize the overall performance of the system rather than
optimizing the data representation.
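The general idea of letting classifier performance drive feature selection can be sketched in a few lines of Python. This is not
Peer Science's automated saliency technique or the eXtreme FeatureSelector implementation, neither of which is described here; it is
a generic greedy forward search, assuming a simple nearest-centroid classifier scored by leave-one-out accuracy. All function names
and the toy data are hypothetical.

```python
import random


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier using only `features`."""
    correct = 0
    for held_out, (test_row, test_label) in enumerate(zip(rows, labels)):
        # Build one centroid per class from every row except the held-out one.
        sums, counts = {}, {}
        for i, (row, label) in enumerate(zip(rows, labels)):
            if i == held_out:
                continue
            total = sums.setdefault(label, [0.0] * len(features))
            for j, f in enumerate(features):
                total[j] += row[f]
            counts[label] = counts.get(label, 0) + 1
        # Predict the class whose centroid is nearest (squared Euclidean distance).
        distances = {
            label: sum((test_row[f] - sums[label][j] / counts[label]) ** 2
                       for j, f in enumerate(features))
            for label in sums
        }
        predicted = min(distances, key=distances.get)
        correct += predicted == test_label
    return correct / len(rows)


def forward_select(rows, labels, max_features):
    """Greedily add the feature that most improves classifier accuracy."""
    selected, best_score = [], 0.0
    while len(selected) < max_features:
        candidate, candidate_score = None, best_score
        for f in range(len(rows[0])):
            if f in selected:
                continue
            score = loo_accuracy(rows, labels, selected + [f])
            if score > candidate_score:  # keep only strict improvements
                candidate, candidate_score = f, score
        if candidate is None:  # no remaining feature helps, so stop
            break
        selected.append(candidate)
        best_score = candidate_score
    return selected, best_score


def make_toy_data(n_per_class=20, seed=0):
    """Hypothetical data: features 0 and 1 are noise, feature 2 separates the classes."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            rows.append([
                rng.uniform(-1, 1),                 # feature 0: noise
                rng.uniform(-1, 1),                 # feature 1: noise
                10.0 * label + rng.uniform(-1, 1),  # feature 2: informative
            ])
            labels.append(label)
    return rows, labels


rows, labels = make_toy_data()
selected, score = forward_select(rows, labels, max_features=2)
print("selected features:", selected, "accuracy:", score)
```

The point of the sketch is the scoring rule: a feature is kept only if the classifier actually performs better with it, whereas a
representation-oriented method such as principal components analysis never consults the classifier at all.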
This methodology allows Peer Science to build pattern recognition systems that have a performance edge over any
other system available. This feature selection capability is available as our
eXtreme FeatureSelector product, so that you, too, can have this edge.
|