## Group Member

## News

Book chapter submitted on Copula Eigenfaces with Attributes.

## Links

- SNF project Copula Distributions in Machine Learning

## About me

I am a PhD candidate in the Biomedical Data Analysis group of Prof. Dr. Volker Roth at the University of Basel. In 2012, I received my Master's degree in Information Technology and Electrical Engineering from the Swiss Federal Institute of Technology in Zurich (ETHZ). During my studies, I completed internships in the research and development divisions of Albis technologies, Zürich, and Siemens Building Technologies, Zug. My research interests include copula models, information theory, archetypal analysis and signal processing, as well as applications in neuroscience and computer vision. I am currently working on detecting directed information in electroencephalograms. My leisure activities include hiking and playing ultimate frisbee.

## My projects

- Copula Distributions in Machine Learning
- Network reconstruction for diagnosis and early risk assessment for neurodegenerative diseases

## Contact Information

University of Basel

Dinu Kaufmann

Department of Mathematics and Computer Science

Room 06.003

Spiegelgasse 1

CH - 4051 Basel, Switzerland

**E-mail:** dinu.kaufmann@unibas.ch

**Phone:** +41 61 207 0542

## List of Publications

- Bayesian Markov Blanket Estimation. Dinu Kaufmann, Sonali Parbhoo, Aleksander Wieczorek, Sebastian Keller, David Adametz, Volker Roth -- In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS) 2016, Cadiz, Spain. JMLR: W&CP volume 51.

Graphical models represent dependencies among several random variables. In a typical application, the network of dependencies is unknown and the goal is to identify it from observations. In a high-dimensional setting, identifying the full network can be difficult. It may also be undesirable or irrelevant when only parts of the network are of interest. Consider an example from gene analysis, where only the dependencies between a few clinical factors and hundreds of genetic markers are required. In such situations it is advisable to focus on estimating a sub-network rather than the entire network of all associations. Here, we consider undirected networks and focus on a specific sub-network, namely the Markov blanket. This is the set of variables that, when conditioned on, render the variables of interest conditionally independent of the rest of the network. The goal is to identify the Markov blanket, i.e. to identify, among a large set of candidates, the nodes that are neighbours of the query variables.
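As a small illustration of the Markov blanket definition above, the following sketch reads the blanket off a known precision matrix of a Gaussian graphical model, where two nodes are neighbours exactly when the corresponding precision entry is non-zero. This is not the Bayesian estimator from the paper: the function name `markov_blanket`, the tolerance `tol` and the toy chain graph are assumptions made for illustration only.

```python
import numpy as np

def markov_blanket(precision, query, tol=1e-8):
    """Return the Markov blanket of `query` for a known Gaussian precision matrix.

    Assumption: in a Gaussian graphical model, nodes i and j are neighbours
    iff precision[i, j] is non-zero (here: larger than `tol` in magnitude).
    """
    precision = np.asarray(precision, dtype=float)
    query = list(query)
    others = [j for j in range(precision.shape[0]) if j not in query]
    # Block of the precision matrix linking the query nodes to all other nodes.
    block = precision[np.ix_(query, others)]
    # A candidate belongs to the blanket if it is linked to any query node.
    linked = np.any(np.abs(block) > tol, axis=0)
    return [j for j, is_linked in zip(others, linked) if is_linked]

# Hypothetical chain graph 0 - 1 - 2 - 3, encoded as a tridiagonal precision matrix.
precision = np.array([
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
])

blanket = markov_blanket(precision, query=[1])
print(blanket)  # [0, 2]
```

Only the rows of the precision matrix that belong to the query variables are inspected, which mirrors the idea that the blanket can be identified without looking at the rest of the network.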

We provide a Bayesian perspective on estimating the Markov blanket in an undirected network. This view enables the computation of a posterior distribution and thus offers a means to assess the (un-)certainty of the network. This contrasts with the maximum likelihood approach of the graphical lasso, which only provides a point estimate of the network. We show how to avoid a limiting inversion when estimating a network in the context of a Markov blanket. Our posterior distribution has an analytic form and can be sampled from efficiently. Overall, we show that the Markov blanket can be estimated efficiently without explicitly inferring the entire network.

- Copula Archetypal Analysis. Dinu Kaufmann, Sebastian Keller, Volker Roth -- In Pattern Recognition--GCPR 2015, pp 117--128, 2015. Erratum: Fig. 6, right panel

Finding structure in heterogeneous and high-dimensional data sets can be challenging, but basis transformations to compact representations often facilitate the analysis. Archetypal analysis is a data-adaptive compression technique which represents the data on a lower-dimensional manifold. As a special virtue, it preserves the extremal characteristics of the data set. When combining different data sources, the data is often normalised or transformed to enable a meaningful analysis. However, finding a suitable transformation can be demanding because structure within the data set emerges or vanishes depending on the transformation. We use the copula framework to approach this problem in a principled way. Using this framework, we present an algorithm which is invariant to monotone transformations. This property is useful in many situations, since this class of transformations is rather large. Copula archetypal analysis thus provides a unified method for absorbing monotone transformations. Moreover, we motivate the Gaussian copula as a justified approximation for the probabilistic and generative model we consider. Finally, we highlight additional benefits of the copula extension, such as missing value imputation and robustness to outliers.

- Copula Eigenfaces - Semiparametric Principal Component Analysis for Facial Appearance Modeling. Bernhard Egger, Dinu Kaufmann, Sandro Schoenborn, Volker Roth and Thomas Vetter -- International Conference on Computer Graphics Theory and Applications--GRAPP 2016, in press, 2016.

Parametric Appearance Models (PAMs) describe objects in an image in terms of pixel intensities. In the context of faces, Active Appearance Models and 3D Morphable Models are established PAMs to model appearance and shape. The dominant method for learning the parameters of a PAM is principal component analysis (PCA). PCA is used to describe the dependency and variance in the data. The method requires that the observed data be Gaussian-distributed. We show that this requirement is not fulfilled in the context of analysis and synthesis of facial appearance. The model mismatch leads to unnatural artifacts which human observers perceive as severe. In order to prevent these artifacts, we propose to use a semiparametric Gaussian copula model, where dependency and variance are modeled separately. The model enables us to use arbitrary marginal distributions and hence relaxes the restrictive Gaussian requirement on the data distribution. The new flexibility provides scale invariance and robustness to outliers as well as a higher specificity in generated images. Moreover, the new model makes a combined analysis of facial appearance and shape data possible.

The separation of marginals and dependency pattern enhances the model flexibility. In practice, the proposed model can easily enhance the performance obtained by PCA in existing pipelines. We showed qualitatively that the copula extension models facial appearance better than PCA. This finding is supported by a quantitative evaluation using specificity as a model metric. Moreover, the copula model makes it possible to add further data to the model: age, weight, size, and other data such as social attributes can be incorporated in a unified way. To demonstrate this feature, we showed that the inclusion of shape also increased the specificity of the model.
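The separation of marginals and dependency described above can be sketched in a few lines. The example below is a minimal illustration, not the implementation used in the papers: it assumes that the marginals are handled by a rank-based normal-scores transform and that the dependency is then summarised by PCA on the correlation matrix of those scores. The function names `normal_scores` and `copula_pca` and the synthetic data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def normal_scores(data):
    """Map each column to Gaussian scores using only its empirical ranks."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # Ranks 1..n per column (assumes continuous data, i.e. no ties).
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    # Rescale to the open interval (0, 1) so the inverse normal CDF is finite.
    uniform = ranks / (n + 1)
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    return inv_cdf(uniform)

def copula_pca(data, n_components):
    """PCA on the correlation matrix of the rank-based normal scores."""
    scores = normal_scores(data)
    corr = np.corrcoef(scores, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    # eigh returns ascending eigenvalues; keep the largest ones first.
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]

# Synthetic example: Gaussian dependency with non-Gaussian marginals.
rng = np.random.default_rng(0)
cov = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.5], [0.3, 0.5, 1.0]]
latent = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=200)
# Strictly increasing transforms change the marginals but not the ranks.
data = np.column_stack([np.exp(latent[:, 0]), latent[:, 1] ** 3, latent[:, 2]])

vals_data, vecs_data = copula_pca(data, n_components=2)
vals_latent, vecs_latent = copula_pca(latent, n_components=2)
print(np.allclose(vals_data, vals_latent))
```

Because the transform depends on the data only through ranks, the components computed from `data` and from `latent` agree, which illustrates the invariance to monotone transformations mentioned in the abstracts.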