Namespace for everything in the cluster library. More...
Classes | |
| struct | counter_iterator |
| Counting output iterator that records how many times an output iterator was assigned to, but ignores the value stored. More... | |
| struct | matrix_distance |
| Adaptor for passing a matrix by reference to template functions that take a callable distance function. More... | |
| struct | lazy_distance_functor |
| Functor for computing distance lazily from an object array and a distance metric. More... | |
| struct | id_pair |
| MPI-packable struct for an MPI-packable type plus its object id. More... | |
| class | kmedoids |
| Implementations of the classic clustering algorithms PAM and CLARA, from Finding Groups in Data, by Kaufman and Rousseeuw. More... | |
| class | multi_gather |
| Asynchronous, some-to-some gather operation used by parallel clustering algorithms to simultaneously send members of sample sets to a set of distributed worker processes. More... | |
| class | par_kmedoids |
| This class implements the CAPEK and XCAPEK scalable parallel clustering algorithms. More... | |
| struct | par_partition |
| par_partition represents a partitioning of a distributed data set. More... | |
| struct | partition |
| This represents a partitioning of a data set. More... | |
| struct | trial |
| This struct represents parameters for a single trial run of kmedoids. More... | |
| class | trial_generator |
| Class to generate a set of trials for clustering. More... | |
Typedefs | |
| typedef boost::numeric::ublas::symmetric_matrix < double > | dissimilarity_matrix |
| Packed repersentation of symmetric dissimilarity matrix. | |
| typedef size_t | medoid_id |
| More descriptive type for medoid index. | |
| typedef size_t | object_id |
| More descriptive type for object index. | |
| typedef std::vector< std::set < object_id > > | cluster_list |
| Explicit representation of a clustering. | |
Functions | |
| template<typename D > | |
| double | bic (const partition &p, D distance, size_t M) |
| Directly computes the BIC from a partition object based on the cluster centroids and the number of clusters. | |
| template<typename SizeIterator , typename DissimIterator > | |
| double | bic (size_t k, SizeIterator cluster_sizes, DissimIterator sum2_dissim, size_t dimensionality) |
| This version of the BIC assumes some precomputed information. | |
| template<class T > | |
| counter_iterator< T > | counter (T &ref) |
| Adaptor for creating type-inferred counters. | |
| template<class T , class D > | |
| void | build_dissimilarity_matrix (const std::vector< T > &objects, D dissimilarity, dissimilarity_matrix &mat) |
| Computes a dissimilarity matrix from a vector of objects. | |
| template<class T , class D > | |
| void | build_dissimilarity_matrix (const std::vector< T > &objects, const std::vector< size_t > &subset, D dissimilarity, dissimilarity_matrix &mat) |
| Computes a dissimilarity matrix from a subset of a vector of objects. | |
| template<class T , class D > | |
| lazy_distance_functor< T, D > | lazy_distance (const std::vector< T > &objs, D dist) |
| Type-inferred syntactic sugar for constructing lazy_distance_functor. | |
| template<class T > | |
| id_pair< T > | make_id_pair (const T &elt, int id) |
| Helper function for making arbitrary id_pairs with type inference. | |
| template<class T > | |
| std::ostream & | operator<< (std::ostream &out, const id_pair< T > &p) |
| Print out an id_pair as a tuple of its element and its source rank. | |
| std::ostream & | operator<< (std::ostream &out, const par_partition &par) |
| Right now this just uses parition::operator<<() by making a partition with this par_partition's cluster_ids and medoid_ids vectors and outputting it. | |
| ostream & | operator<< (ostream &out, const cluster_list &clusters) |
| double | mirkin_distance (const cluster_list &c1, const cluster_list &c2) |
| Mirkin distance bt/w two clusterings. | |
| std::ostream & | operator<< (std::ostream &out, const partition &km) |
| For convenience. | |
| double | mirkin_distance (const partition &c1, const partition &c2) |
| Convenience overload for comparing partition objects directly. | |
| void | expand (cluster_list &list, size_t level=1) |
| Expand a cluster_list by l levels. | |
| std::ostream & | operator<< (std::ostream &out, const partition::member_writer &mw) |
| std::ostream & | operator<< (std::ostream &out, const cluster_list &list) |
| Prints out nicely formatted clustering. | |
| template<typename D > | |
| double | total_dissimilarity (const partition &p, D dist) |
| Compute the total dissimilarity between all objects and their medoids. | |
| template<typename D > | |
| double | total_dissimilarity (const partition &p, D dist, medoid_id m) |
| Compute the total dissimilarity between all objects in a particular cluster and its medoid. | |
| template<typename D > | |
| double | total_squared_dissimilarity (const partition &p, D dist) |
| Compute the total squared dissimilarity between all objects and their medoids. | |
| template<typename D > | |
| double | total_squared_dissimilarity (const partition &p, D dist, medoid_id m) |
| Compute the total squared dissimilarity between all objects in a particular cluster and its medoid. | |
| template<class OutputIterator , class Random > | |
| void | random_subset (size_t numElements, size_t sample_size, OutputIterator out, Random &random) |
| This is Knuth's algorithm R for taking a sample of indices from 0 to numElements. | |
| long | get_time_seed () |
| Returns a reasonably distributed seed for random number generators. | |
Namespace for everything in the cluster library.
| typedef std::vector< std::set<object_id> > cluster_list |
Explicit representation of a clustering.
Instead of a vecto of representative ids, this has k sets of object_ids indicating which objects are in a particular cluster. You can convert a partition to a cluster_list with to_cluster_list().
Definition at line 60 of file partition.h.
| typedef boost::numeric::ublas::symmetric_matrix<double> dissimilarity_matrix |
Packed repersentation of symmetric dissimilarity matrix.
Definition at line 48 of file dissimilarity.h.
| typedef size_t medoid_id |
More descriptive type for medoid index.
Definition at line 51 of file partition.h.
| typedef size_t object_id |
More descriptive type for object index.
Definition at line 52 of file partition.h.
| double cluster::bic | ( | const partition & | p, |
| D | distance, | ||
| size_t | M | ||
| ) |
Directly computes the BIC from a partition object based on the cluster centroids and the number of clusters.
| [in] | p | A partition object describing the clustering to be evaluated. |
| [in] | distance | A distance function callable on two indices from the partition p. |
| [in] | M | Dimensionality parameter -- degrees of freedom in the input dataset. |
| double cluster::bic | ( | size_t | k, |
| SizeIterator | cluster_sizes, | ||
| DissimIterator | sum2_dissim, | ||
| size_t | dimensionality | ||
| ) |
This version of the BIC assumes some precomputed information.
This is useful for parallel implementations, where it is more efficient to compute some global sums as a reduction rather than aggregating a full partition to one process. Here, we assume that the sizes of the distributed clusters as well as the total squared intra-cluster dissimilarity (between each object and its medoid) is known.
| [in] | k | Number of clusters in the clustering. Same as k from k-means or k-medoids. |
| [in] | cluster_sizes | Start of range of k sizes. *cluster_sizes .. *(cluster_sizes + k) should be the sizes of clusters 1 to k |
| [in] | sum2_dissim | Sum of squared dissimilarities of each object w.r.t. its nearest medoid. |
| [in] | dimensionality | Dimensionality of clustered data. e.g., 2 for 2-dimensional points. |
| void cluster::build_dissimilarity_matrix | ( | const std::vector< T > & | objects, |
| D | dissimilarity, | ||
| dissimilarity_matrix & | mat | ||
| ) |
Computes a dissimilarity matrix from a vector of objects.
| [in] | objects | Vector of any type T. |
| [in] | dissimilarity | A dissimilarity measure that gives the distance between two T's. Needs to be callable on (T, T). |
| [out] | mat | Output parameter. Dissimiliarity matrix is stored here. |
Definition at line 59 of file dissimilarity.h.
| void cluster::build_dissimilarity_matrix | ( | const std::vector< T > & | objects, |
| const std::vector< size_t > & | subset, | ||
| D | dissimilarity, | ||
| dissimilarity_matrix & | mat | ||
| ) |
Computes a dissimilarity matrix from a subset of a vector of objects.
| objects | Vector of any type T. |
| subset | Indirection vector. Contains indices into objects for elements to be compared. |
| dissimilarity | A dissimilarity measure that gives the distance between two T's. Needs to be callable(T, T). |
| mat | Output parameter. Dissimiliarity matrix is stored here. |
Definition at line 83 of file dissimilarity.h.
| counter_iterator<T> cluster::counter | ( | T & | ref ) |
Adaptor for creating type-inferred counters.
Example Usage:
#include <algorithm> // construct two sets set<int> s1, s2; // insert some things so that their intersection has 2 ints. s1.insert(1); s1.insert(2); s1.insert(3); s2.insert(2); s2.insert(3); s2.insert(4); // Compute intersection, but throw away the values and just // store the size in the count variable. size_t count; set_intersection(s1.begin(), s1.end(), s2.begin(), s2.end(), counter(count)); // now count == 2, since 2 items were inserted // by <code>set_intersection.
| void expand | ( | cluster_list & | list, |
| size_t | level = 1 |
||
| ) |
Expand a cluster_list by l levels.
That is, replace each index i in the cluster_list with indices in [2^l * i ... 2^l * (i+1) - 1]
Definition at line 170 of file partition.cpp.
| long cluster::get_time_seed | ( | ) | [inline] |
| lazy_distance_functor<T,D> cluster::lazy_distance | ( | const std::vector< T > & | objs, |
| D | dist | ||
| ) |
Type-inferred syntactic sugar for constructing lazy_distance_functor.
Definition at line 124 of file dissimilarity.h.
| id_pair<T> cluster::make_id_pair | ( | const T & | elt, |
| int | id | ||
| ) |
| double mirkin_distance | ( | const cluster_list & | c1, |
| const cluster_list & | c2 | ||
| ) |
Mirkin distance bt/w two clusterings.
Definition at line 125 of file partition.cpp.
| double mirkin_distance | ( | const partition & | c1, |
| const partition & | c2 | ||
| ) |
Convenience overload for comparing partition objects directly.
Definition at line 162 of file partition.cpp.
| std::ostream& cluster::operator<< | ( | std::ostream & | out, |
| const id_pair< T > & | p | ||
| ) |
| std::ostream& cluster::operator<< | ( | std::ostream & | out, |
| const partition::member_writer & | mw | ||
| ) | [inline] |
Definition at line 141 of file partition.h.
| std::ostream & operator<< | ( | std::ostream & | out, |
| const par_partition & | par | ||
| ) |
Right now this just uses parition::operator<<() by making a partition with this par_partition's cluster_ids and medoid_ids vectors and outputting it.
Definition at line 110 of file par_partition.cpp.
| ostream& cluster::operator<< | ( | ostream & | out, |
| const cluster_list & | clusters | ||
| ) |
Definition at line 113 of file partition.cpp.
| std::ostream & operator<< | ( | std::ostream & | out, |
| const partition & | p | ||
| ) |
For convenience.
Definition at line 154 of file partition.cpp.
| std::ostream& cluster::operator<< | ( | std::ostream & | out, |
| const cluster_list & | list | ||
| ) |
Prints out nicely formatted clustering.
| void cluster::random_subset | ( | size_t | numElements, |
| size_t | sample_size, | ||
| OutputIterator | out, | ||
| Random & | random | ||
| ) |
This is Knuth's algorithm R for taking a sample of indices from 0 to numElements.
We sample size indices from this (the superset) and put them in the subset's mapping.
| numElements | total number of elements to select from |
| sample_size | number of elements to select |
| out | destination for selected elements |
| random | model of STL Random Number Generator. must be callable as random(N), returning a random number in [0,N). |
| double cluster::total_dissimilarity | ( | const partition & | p, |
| D | dist, | ||
| medoid_id | m | ||
| ) |
Compute the total dissimilarity between all objects in a particular cluster and its medoid.
Definition at line 182 of file partition.h.
| double cluster::total_dissimilarity | ( | const partition & | p, |
| D | dist | ||
| ) |
Compute the total dissimilarity between all objects and their medoids.
Definition at line 170 of file partition.h.
| double cluster::total_squared_dissimilarity | ( | const partition & | p, |
| D | dist, | ||
| medoid_id | m | ||
| ) |
Compute the total squared dissimilarity between all objects in a particular cluster and its medoid.
Definition at line 210 of file partition.h.
| double cluster::total_squared_dissimilarity | ( | const partition & | p, |
| D | dist | ||
| ) |
Compute the total squared dissimilarity between all objects and their medoids.
Definition at line 196 of file partition.h.
Muster.
Copyright © 2010, Lawrence Livermore National Laboratory, LLNL-CODE-433662.