20.04.2012

A. Borusan

## 07.05.2012, 4 p.m. c.t., TU Berlin, EN building, seminar room EN 719 (7th floor), Einsteinufer 17, 10587 Berlin: "A Projection and Probability Estimation Method for Knowledge Discovery" (Adam Stanski, Datenmeer GmbH)

A key ingredient of modern data analysis is probability estimation. In theory, it could be used to answer the major questions arising in problems such as regression, ranking, classification, clustering, or outlier detection. However, it is well known that the curse of dimensionality prevents proper estimation of probabilities in high dimensions. The problem is typically circumvented by imposing a fixed set of assumptions on the data, e.g., partial independence of features, data lying on a manifold, or a customized kernel. These fixed assumptions limit the applicability of a method.

In this talk we propose a framework that uses a flexible set of assumptions instead. It allows a model to be tailored to various problems by means of 1D decompositions. The approach achieves a fast runtime and is not limited by the curse of dimensionality, as all estimations are performed in one-dimensional space.
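To make the idea of estimating only in one dimension concrete, the following is a minimal sketch and not the speaker's method, which the abstract does not specify. It assumes, purely for illustration, that the joint density factorizes along a set of orthonormal directions, so that each factor can be estimated with a 1D Gaussian kernel density estimate. The names `kde_1d`, `project`, and `fit_product_of_1d`, the bandwidth value, and the synthetic data are all hypothetical.

```python
import math
import random


def kde_1d(samples, bandwidth):
    """Return a 1D Gaussian kernel density estimate as a callable."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def project(point, direction):
    """Scalar projection of a point onto a direction vector."""
    return sum(p * d for p, d in zip(point, direction))


def fit_product_of_1d(data, directions, bandwidth=0.3):
    """Illustrative assumption: the joint density is a product of 1D
    densities along the given orthonormal directions."""
    estimators = [
        kde_1d([project(x, d) for x in data], bandwidth) for d in directions
    ]

    def density(point):
        result = 1.0
        for d, estimate in zip(directions, estimators):
            result *= estimate(project(point, d))
        return result

    return density


# Synthetic 3D data: independent standard normal features.
random.seed(0)
data = [tuple(random.gauss(0.0, 1.0) for _ in range(3)) for _ in range(500)]
axes = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

model = fit_product_of_1d(data, axes)
p_center = model((0.0, 0.0, 0.0))  # near the bulk of the data
p_far = model((4.0, 4.0, 4.0))     # far out in the tails
print(p_center, p_far)
```

Every estimate here uses all 500 samples in a single dimension, so the cost and the data requirement grow linearly with the number of directions rather than exponentially. The price is the factorization assumption itself, which is exactly the kind of fixed assumption the abstract contrasts with its more flexible choice of decompositions.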

The wide range of applications is demonstrated using the example of an industrial project. Its goal was to discover patterns and relations in a large, heterogeneous database. The problem was solved with new data mining software that allows fully automatic discovery of patterns. The software, which is publicly available for evaluation and academic use, will be demonstrated during the talk.