Pre-Grant Publication Number: 20100262568
Filing Date: April 10, 2009
Priority Date: April 10, 2008
Inventors: Anton Schwaighofer, Joaquin Quinonero Candela, Thomas Borchert, Thore Graepel, Ralf Herbrich
Assignee(s): Microsoft Corporation
Current U.S. Classification: 706, 706/012000, 706/050000
Abstract

A scalable clustering system is described. In an embodiment, the clustering system is operable for extremely large scale applications in which millions of items having tens of millions of features are clustered. In an embodiment, the clustering system uses a probabilistic cluster model which models uncertainty in the data set, where the data set may be, for example, advertisements which are subscribed to keywords, text documents containing text keywords, images having associated features, or other items. In an embodiment, the clustering system is used to generate additional features for associating with a given item. For example, additional keywords are suggested to which an advertiser may like to subscribe. In some embodiments, the generated features have associated probability values which may be used to rank those features. In some examples, user feedback about the generated features is received and used to revise the feature generation process.
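
The feature generation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the probabilistic cluster model, so everything here is an assumption made for illustration: the model is taken to be a mixture of independent Bernoulli keyword probabilities, and the names `cluster_posterior`, `suggest_keywords`, `priors`, `keyword_probs` and the toy data are hypothetical. The sketch infers which cluster an advertisement probably belongs to, then ranks the keywords it is not yet subscribed to by their predicted probability.

```python
from math import prod

# Illustrative sketch only: a Bernoulli mixture is ASSUMED here; the abstract
# does not state the form of the cluster model. All names are hypothetical.


def cluster_posterior(item, priors, keyword_probs, vocabulary):
    """Return the probability of each cluster given the item's keyword set.

    keyword_probs[j][k] is the assumed probability that an item in cluster j
    subscribes to keyword k; values must lie strictly between 0 and 1.
    """
    weights = []
    for prior, probs in zip(priors, keyword_probs):
        likelihood = prod(
            probs[k] if k in item else 1.0 - probs[k] for k in vocabulary
        )
        weights.append(prior * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


def suggest_keywords(item, priors, keyword_probs, vocabulary, top_n=3):
    """Rank keywords the item lacks by their predicted subscription probability."""
    posterior = cluster_posterior(item, priors, keyword_probs, vocabulary)
    scores = {
        k: sum(r * probs[k] for r, probs in zip(posterior, keyword_probs))
        for k in vocabulary
        if k not in item
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Toy data (made up): two clusters over five keywords.
vocabulary = ["shoes", "boots", "sandals", "laptop", "tablet"]
priors = [0.5, 0.5]
keyword_probs = [
    {"shoes": 0.9, "boots": 0.8, "sandals": 0.7, "laptop": 0.05, "tablet": 0.05},
    {"shoes": 0.05, "boots": 0.05, "sandals": 0.05, "laptop": 0.9, "tablet": 0.8},
]

advertisement = {"shoes"}
suggestions = suggest_keywords(advertisement, priors, keyword_probs, vocabulary)
for keyword, probability in suggestions:
    print(f"{keyword}: {probability:.3f}")
```

Multiplying likelihoods directly is adequate for a five-keyword toy example; at the scale the abstract mentions (tens of millions of features), a practical implementation would need sparse representations and log-space arithmetic, which this sketch omits.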