Synthetic Data for Cluster Analysis

repliclust is a Python package for generating synthetic data sets with clusters. It is built around data set archetypes: high-level geometric blueprints from which you can sample many data sets that share the same overall geometric structure.
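The archetype idea is easiest to see in a small example. The following is a minimal, stdlib-only sketch of "one blueprint, many data sets". It does not use repliclust's API. `ToyArchetype`, `sample_dataset`, and the parameters `min_center_dist` and `cluster_std` are hypothetical names chosen only for illustration. The real package defines archetypes through its own parameters, which this sketch does not reproduce.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyArchetype:
    """Hypothetical blueprint shared by every data set sampled from it.

    Illustration only; this is NOT repliclust's own archetype class.
    """
    n_clusters: int
    dim: int
    n_samples: int
    min_center_dist: float  # minimum distance between any two cluster centers
    cluster_std: float      # standard deviation of each isotropic Gaussian cluster


def sample_dataset(archetype, rng, max_tries=10_000):
    """Draw one (points, labels) data set that follows the archetype."""
    centers = []
    tries = 0
    # Rejection-sample centers in a fixed box until all pairs are well separated.
    while len(centers) < archetype.n_clusters:
        tries += 1
        if tries > max_tries:
            raise ValueError("could not place centers; lower min_center_dist")
        candidate = [rng.uniform(-10.0, 10.0) for _ in range(archetype.dim)]
        if all(math.dist(candidate, c) >= archetype.min_center_dist for c in centers):
            centers.append(candidate)

    points, labels = [], []
    for i in range(archetype.n_samples):
        k = i % archetype.n_clusters  # round-robin assignment gives balanced clusters
        points.append([rng.gauss(mu, archetype.cluster_std) for mu in centers[k]])
        labels.append(k)
    return points, labels


# One blueprint, many data sets: the geometry is fixed, the random draws differ.
archetype = ToyArchetype(n_clusters=3, dim=2, n_samples=300,
                         min_center_dist=6.0, cluster_std=1.0)
rng = random.Random(0)
datasets = [sample_dataset(archetype, rng) for _ in range(5)]
```

Each of the five data sets has different centers and points. All of them still share the blueprint's geometry: three balanced, well-separated Gaussian clusters in two dimensions. The separation constraint is enforced by simple rejection sampling, which works here only because the requested separation is small relative to the sampling box.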

Note

This project forms part of the author’s PhD thesis at Caltech.