Title: Bayes Clusterer

Edward R. Dougherty and Marcel Brun
Department of Electrical Engineering
Texas A&M University
Email: edward@ee.tamu.edu

Abstract: The theory of pattern classification is based on designing an operator on feature random variables that outputs a label. There exists a joint feature-label distribution whose conditional distributions with respect to the labels determine the feature classes. Within this framework, there exist an error criterion, an optimal (Bayes) classifier, and a theory of learning. For clustering, there has historically been little theory; rather, clustering has developed as a collection of ad hoc algorithms applied to sets of data points. Because it is not grounded in a distributional theory, clustering has lacked an error criterion, except in the case of some model-based approaches. Consequently, optimality has been absent. Moreover, there has been no learning, only the application of an algorithm based on some heuristic understanding of empirical relations among the data. This talk discusses a mathematical theory of cluster operators that includes a general error criterion, optimality, and learning. It focuses on the distributional setting and the Bayes (optimal) clusterer.