A family of unsupervised sampling algorithms
Abstract
Three algorithms for unsupervised sampling are introduced. They are easy to tune, scalable, and yield a small sample. They share the same concepts: they combine density and distance, they use the farthest-first traversal, which allows for runtime optimization, they yield a coreset, and they are driven by a single user parameter. DIDES gives priority to distance while still taking density into account. In DENDIS, density is the primary concern while space coverage is ensured. Both are tuned by a meaningful parameter called granularity: the lower its value, the larger the sample. The third algorithm in the family, ProTraS, explicitly aims to build a coreset; the sampling cost is its only parameter and serves as the stopping criterion. This chapter studies their common properties and differences.
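
For readers unfamiliar with the farthest-first traversal on which all three algorithms build, the following is a minimal sketch of the generic traversal only. It is not DIDES, DENDIS, or ProTraS: it ignores density and stops at a fixed sample size `k` rather than at a granularity or cost threshold. The function name, the `start` argument, and the toy data are illustrative assumptions.

```python
import math


def farthest_first_traversal(points, k, start=0):
    """Return the indices of k points chosen by a generic farthest-first traversal.

    Illustrative sketch: distance only, fixed sample size k (an assumption).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    chosen = [start]
    # dist_to_sample[i] is the distance from point i to its nearest chosen point.
    dist_to_sample = [math.dist(p, points[start]) for p in points]

    while len(chosen) < k:
        # The next representative is the point farthest from the current sample.
        nxt = max(range(len(points)), key=dist_to_sample.__getitem__)
        chosen.append(nxt)
        # Only distances to the newly added representative need to be checked.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < dist_to_sample[i]:
                dist_to_sample[i] = d

    return chosen


# Toy data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 5.0)]
sample = farthest_first_traversal(points, k=3)
print(sample)
```

Keeping, for every point, the distance to its nearest representative means each iteration compares against the newest representative only, which is one simple form of the runtime optimization the traversal permits.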