PECOK: a convex optimization approach to variable clustering

Florentina Bunea; Christophe Giraud; Martin Royer; Nicolas Verzelen

Preprint, working paper. Year: 2016


Florentina Bunea

Role: Author

Cornell University [New York]

Christophe Giraud

Role: Author
PersonId : 8435
IdHAL : christophe-giraud
ORCID : 0000-0002-9320-6328
IdRef : 074599798

Université Paris-Saclay

Martin Royer

Role: Author
PersonId : 169527
IdHAL : martinroyer
ORCID : 0000-0002-5911-1907

Université Paris-Saclay

Nicolas Verzelen

Role: Author
PersonId : 737715
IdHAL : nicolas-verzelen
ORCID : 0009-0009-3411-0076
IdRef : 137391293

Mathématiques, Informatique et STatistique pour l'Environnement et l'Agronomie

Abstract

The problem of variable clustering is that of grouping similar components of a $p$-dimensional vector $X=(X_{1},\ldots,X_{p})$ and estimating these groups from $n$ independent copies of $X$. When cluster similarity is defined via $G$-latent models, in which each group of $X$-variables has a common latent generator and the groups are given by a partition $G$ of the index set $\{1, \ldots, p\}$, the most natural clustering strategy is $K$-means. We explain why this strategy cannot lead to perfect cluster recovery and offer a correction, based on semidefinite programming, that can be viewed as a penalized convex relaxation of $K$-means (PECOK). We introduce a cluster separation measure tailored to $G$-latent models and derive its minimax lower bound for perfect cluster recovery. The clusters estimated by PECOK are shown to recover $G$ at a near-minimax-optimal cluster separation rate, a result that holds even when $K$, the number of clusters, is estimated adaptively from the data. We compare PECOK with appropriate corrections of spectral clustering-type procedures and show that the former outperforms the latter for perfect recovery of minimally separated clusters.
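The abstract describes PECOK as a penalized convex relaxation of $K$-means applied to variables. A small numerical sketch can make the link to convex optimization concrete. Treat each column $X_a$ of the $n \times p$ data matrix as a point and let $\hat\Sigma = X^\top X / n$. For a partition $G$, define the matrix $B_G$ with $(B_G)_{ab} = 1/|G_k|$ when $a$ and $b$ both lie in $G_k$ and $0$ otherwise. Then the $K$-means criterion divided by $n$ equals $\operatorname{tr}(\hat\Sigma) - \langle \hat\Sigma, B_G \rangle$. Because $\operatorname{tr}(\hat\Sigma)$ does not depend on $G$, minimizing $K$-means is the same as maximizing the linear function $\langle \hat\Sigma, B_G \rangle$ over such matrices. Convex relaxations of $K$-means typically enlarge this discrete set to a convex set of matrices. The Python code below uses only the standard library. It simulates a toy $G$-latent model $X_a = Z_k + E_a$ for $a \in G_k$ and checks the identity numerically. The simulator, all helper names, and the noise level are hypothetical illustrations, not the authors' code. The sketch stops before the semidefinite program and the penalty that define PECOK, which the abstract does not specify.

```python
import random


def simulate_g_latent(groups, n, noise_sd=0.5, seed=0):
    """Hypothetical toy G-latent model: X_a = Z_k + E_a for every a in G_k.

    Each Z_k and each E_a is an independent Gaussian, so variables in the
    same group share one latent generator. Returns an n x p list of rows.
    """
    rng = random.Random(seed)
    p = sum(len(g) for g in groups)
    X = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for g in groups:
            z = rng.gauss(0.0, 1.0)  # latent generator Z_k for observation i
            for a in g:
                X[i][a] = z + rng.gauss(0.0, noise_sd)  # independent noise E_a
    return X


def empirical_cov(X):
    """Sigma_hat = X^T X / n (the toy model has mean zero, so no centering)."""
    n, p = len(X), len(X[0])
    return [[sum(X[i][a] * X[i][b] for i in range(n)) / n for b in range(p)]
            for a in range(p)]


def partnership_matrix(groups, p):
    """B_ab = 1/|G_k| if a and b are both in G_k, else 0."""
    B = [[0.0] * p for _ in range(p)]
    for g in groups:
        for a in g:
            for b in g:
                B[a][b] = 1.0 / len(g)
    return B


def kmeans_criterion(X, groups):
    """Within-group sum of squares of the columns (points in R^n), divided by n."""
    n = len(X)
    total = 0.0
    for g in groups:
        centroid = [sum(X[i][a] for a in g) / len(g) for i in range(n)]
        total += sum((X[i][a] - centroid[i]) ** 2 for a in g for i in range(n))
    return total / n


def inner(S, B):
    """Frobenius inner product <S, B>."""
    p = len(S)
    return sum(S[a][b] * B[a][b] for a in range(p) for b in range(p))


if __name__ == "__main__":
    true_groups = [[0, 1, 2], [3, 4, 5]]
    X = simulate_g_latent(true_groups, n=200)
    S = empirical_cov(X)
    trace = sum(S[a][a] for a in range(len(S)))
    # Compare the true partition with a wrong one: both columns should agree.
    for G in (true_groups, [[0, 1, 3], [2, 4, 5]]):
        B = partnership_matrix(G, len(S))
        print(G, kmeans_criterion(X, G), trace - inner(S, B))
```

For any partition, the two printed numbers agree up to floating-point rounding. This is why $K$-means over variables can be rewritten as a linear objective in $B$, the form that a convex relaxation starts from.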

Subject areas

Statistics [math.ST]

Contributor: Nicolas Verzelen

https://hal.inrae.fr/hal-02966884

Submitted on: Wednesday, 14 October 2020, 15:02:56

Last modified on: Thursday, 14 March 2024, 03:14:06

Dates and versions

hal-02966884 , version 1 (14-10-2020)

Identifiers

HAL Id : hal-02966884 , version 1
ARXIV : 1606.05100

Cite

Florentina Bunea, Christophe Giraud, Martin Royer, Nicolas Verzelen. PECOK: a convex optimization approach to variable clustering. 2020. ⟨hal-02966884⟩


Collections

INRA AGROPOLIS INSTITUT-AGRO-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER MISTEA MATHNUM

