Implementations of the classic clustering algorithms PAM and CLARA, from Finding Groups in Data, by Kaufman and Rousseeuw. More...

#include <kmedoids.h>

Inheritance diagram for kmedoids:

[legend]

Collaboration diagram for kmedoids:

[legend]

List of all members.

Public Member Functions
	kmedoids (size_t num_objects=0)
	Constructor.
	~kmedoids ()
double	average_dissimilarity () const
	Get the average dissimilarity of objects w/their medoids for the last run.
void	set_sort_medoids (bool sort_medoids)
	Set whether medoids will be sorted by object id after clustering is complete.
void	set_epsilon (double epsilon)
	Set tolerance for convergence.
void	pam (const dissimilarity_matrix &distance, size_t k, const object_id *initial_medoids=NULL)
	Classic K-Medoids clustering, using the Partitioning-Around-Medoids (PAM) algorithm as described in Kaufman and Rousseeuw.
double	xpam (const dissimilarity_matrix &distance, size_t max_k, size_t dimensionality)
	Classic K-Medoids clustering, using the Partitioning-Around-Medoids (PAM) algorithm as described in Kaufman and Rousseeuw.
template<class T , class D >
void	clara (const std::vector< T > &objects, D dmetric, size_t k)
	CLARA clustering algorithm, as per Kaufman and Rousseuw and R.
template<class T , class D >
void	center_medoids (const std::vector< T > &objects, D distance)
	Takes existing clustering and reassigns medoids by taking closest medoid to mean of each cluster.
template<class T , class D >
double	xclara (const std::vector< T > &objects, D dmetric, size_t max_k, size_t dimensionality)
	K-Agnostic version of CLARA.
void	set_init_size (size_t sz)
void	set_max_reps (size_t r)
void	set_xcallback (void(*)(const partition &part, double bic))
	Set callback function for XPAM and XCLARA. default is none.
Protected Types
typedef boost::mt19937	random_type
typedef boost::random_number_generator < random_type, unsigned long >	rng_type
	Randomness source for this algorithm.
Protected Member Functions
void	init_medoids (size_t k, const dissimilarity_matrix &distance)
	KR BUILD algorithm for assigning initial medoids to a partition.
double	cost (medoid_id i, object_id h, const dissimilarity_matrix &distance) const
	Total cost of swapping object h with medoid i.
template<class DM >
double	assign_objects_to_clusters (DM distance)
	Assign each object to the cluster with the closest medoid.
Protected Attributes
random_type	random
	Type for RNG used in this algorithm.
rng_type	rng
std::vector< medoid_id >	sec_nearest
double	total_dissimilarity
	Index of second closest medoids. Used by PAM.
bool	sort_medoids
	Total dissimilarity bt/w objects and their medoid.
double	epsilon
	Whether medoids should be canonically sorted by object id.
size_t	init_size
	Normalized sensitivity for convergence.
size_t	max_reps
	initial sample size (before 2*k)
void(*	xcallback )(const partition &part, double bic)
	initial sample size (before 2*k)

Detailed Description

Implementations of the classic clustering algorithms PAM and CLARA, from Finding Groups in Data, by Kaufman and Rousseeuw.

Definition at line 60 of file kmedoids.h.

Member Typedef Documentation

typedef boost::mt19937 random_type [protected]

Definition at line 262 of file kmedoids.h.

typedef boost::random_number_generator<random_type, unsigned long> rng_type [protected]

Randomness source for this algorithm.

Adaptor for STL algorithms.

Definition at line 266 of file kmedoids.h.

Constructor & Destructor Documentation

kmedoids ( size_t num_objects = 0 )

Constructor.

Can optionally specify number of objects to be clustered. and this will start out with all of them in one cluster.

The random number generator associated with this kmedoids object is seeded with the time in microseconds since the epoch.

Definition at line 55 of file kmedoids.cpp.

~kmedoids ( )

Definition at line 68 of file kmedoids.cpp.

Member Function Documentation

double assign_objects_to_clusters ( DM distance ) [inline, protected]

Assign each object to the cluster with the closest medoid.

Returns:: Total dissimilarity of objects w/their medoids.

Parameters:

distance a callable object that computes distances between indices, as a distance matrix would. Algorithms are free to use real distance matrices (as in PAM) or to compute lazily (as in CLARA medoid assignment).

Definition at line 296 of file kmedoids.h.

double average_dissimilarity ( ) const

Get the average dissimilarity of objects w/their medoids for the last run.

Definition at line 70 of file kmedoids.cpp.

void center_medoids	(	const std::vector< T > &	objects,
		D	distance
	)		`[inline]`

Takes existing clustering and reassigns medoids by taking closest medoid to mean of each cluster.

This is O(n) and can give better representatives for CLARA clusterings. This is needed to apply gaussian-model BIC criteria to clusterings.

Template Parameters:

T	To use this function, T needs to support algebraic operations. Specifically, T must support enough to construct a mean: addition `T + T = T` scalar division `T / c = T`

Definition at line 195 of file kmedoids.h.

void clara	(	const std::vector< T > &	objects,
		D	dmetric,
		size_t	k
	)		`[inline]`

CLARA clustering algorithm, as per Kaufman and Rousseuw and R.

Ng and J. Han, "Efficient and Effective Clustering Methods for Spatial Data Mining."

Template Parameters:

T	Type of objects to be clustered.
D	Dissimilarity metric type. D should be callable on (T, T) and should return a double.

Parameters:

objects	Objects to cluster
dmetric	Distance metric to build dissimilarity matrices with
k	Number of clusters to partition

Definition at line 130 of file kmedoids.h.

double cost	(	medoid_id	i,
		object_id	h,
		const dissimilarity_matrix &	distance
	)		const `[protected]`

Total cost of swapping object h with medoid i.

Sums costs of this exchagne for all objects j.

Definition at line 131 of file kmedoids.cpp.

void init_medoids	(	size_t	k,
		const dissimilarity_matrix &	distance
	)		`[protected]`

KR BUILD algorithm for assigning initial medoids to a partition.

Definition at line 86 of file kmedoids.cpp.

void pam	(	const dissimilarity_matrix &	distance,
		size_t	k,
		const object_id *	initial_medoids = `NULL`
	)

Classic K-Medoids clustering, using the Partitioning-Around-Medoids (PAM) algorithm as described in Kaufman and Rousseeuw.

Parameters:

distance	dissimilarity matrix for all objects to cluster
k	number of clusters to produce
initial_medoids	Optionally supply k initial object ids to be used as initial medoids.

See also:: build_dissimilarity_matrix(), a function to automatically construct a dissimilarity matrix given a vector of objects and a distance function.

Definition at line 157 of file kmedoids.cpp.

void set_epsilon ( double epsilon )

Set tolerance for convergence.

This is relative error, not absolute error. It will be multiplied by the mean distance before it is used to test convergence. Defaults to 1e-15; may need to be higher if there exist clusterings with very similar quality.

Definition at line 78 of file kmedoids.cpp.

void set_init_size ( size_t sz ) [inline]

Definition at line 254 of file kmedoids.h.

void set_max_reps ( size_t r ) [inline]

Definition at line 255 of file kmedoids.h.

void set_sort_medoids ( bool sort_medoids )

Set whether medoids will be sorted by object id after clustering is complete.

Defaults to true.

Definition at line 74 of file kmedoids.cpp.

void set_xcallback ( void(*)(const partition &part, double bic) xpc )

Set callback function for XPAM and XCLARA. default is none.

Definition at line 82 of file kmedoids.cpp.

double xclara	(	const std::vector< T > &	objects,
		D	dmetric,
		size_t	max_k,
		size_t	dimensionality
	)		`[inline]`

K-Agnostic version of CLARA.

This uses the BIC criterion as described in bic.h to run clara() a number of times and to select a best run of clara() from the trials. This will be slower than regular clara(). In particular, it's O(n*max_k).

Parameters:

[in]	objects	Objects to cluster
[in]	dmetric	Distance metric to build dissimilarity matrices with
[in]	max_k	Max number of clusters to find.
[in]	dimensionality	Dimensionality of objects, used by BIC.

Definition at line 235 of file kmedoids.h.

double xpam	(	const dissimilarity_matrix &	distance,
		size_t	max_k,
		size_t	dimensionality
	)

Classic K-Medoids clustering, using the Partitioning-Around-Medoids (PAM) algorithm as described in Kaufman and Rousseeuw.

Runs PAM from 1 to max_k and selects the best k using the bayesian information criterion. Sets this partition to the best partition found using PAM from 1 to k.

Based on X-Means, see Pelleg & Moore, 2000.

Parameters:

distance	dissimilarity matrix for all objects to cluster
max_k	Upper limit on number of clusters to find.
dimensionality	Number of dimensions in clustered data, for BIC.

Returns:: the best BIC value found (the bic value of the final partitioning).

See also:: build_dissimilarity_matrix(), a function to automatically construct a dissimilarity matrix given a vector of objects and a distance function.

Definition at line 218 of file kmedoids.cpp.

Member Data Documentation

double epsilon [protected]

Whether medoids should be canonically sorted by object id.

Definition at line 272 of file kmedoids.h.

size_t init_size [protected]

Normalized sensitivity for convergence.

Definition at line 273 of file kmedoids.h.

size_t max_reps [protected]

initial sample size (before 2*k)

Definition at line 274 of file kmedoids.h.

random_type random [protected]

Type for RNG used in this algorithm.

Definition at line 263 of file kmedoids.h.

rng_type rng [protected]

Definition at line 267 of file kmedoids.h.

std::vector<medoid_id> sec_nearest [protected]

Definition at line 269 of file kmedoids.h.

bool sort_medoids [protected]

Total dissimilarity bt/w objects and their medoid.

Definition at line 271 of file kmedoids.h.

double total_dissimilarity [protected]

Index of second closest medoids. Used by PAM.

Definition at line 270 of file kmedoids.h.

void(* xcallback)(const partition &part, double bic) [protected]

initial sample size (before 2*k)

Callback for each iteration of xpam. is called with the current clustering and its BIC score.

Definition at line 278 of file kmedoids.h.

The documentation for this class was generated from the following files:

src/kmedoids.h
src/kmedoids.cpp

kmedoids Class Reference

Public Member Functions

Protected Types

Protected Member Functions

Protected Attributes

Detailed Description

Member Typedef Documentation

Constructor & Destructor Documentation

Member Function Documentation

Member Data Documentation