Functions

src/clustering.c File Reference

#include "clustering.h"
#include "linalg.h"
#include <time.h>
#include "eeg.h"

Go to the source code of this file.

Functions

Dendrogram * agglomerative_clustering (const Array *distmat, LinkageFunction distfct)
 Agglomerative clustering.
double cluster_between_scatter (const Array *distmat, const Clusters *c)
 get within-scatter between clusters. (sum of distances of means of clusters)
int cluster_compare (const Clusters *c1, const Clusters *c2)
void cluster_copy (Clusters *dest, const Clusters *src)
void cluster_free (Clusters *c)
Clusterscluster_init (int K, int maxN)
void cluster_print (FILE *o, const Clusters *c)
double cluster_within_scatter (const Array *distmat, const Clusters *c)
 get within-scatter within clusters. compute$ W_k = \sum_{r=1}^{k}\frac{1}{2n_r}D_r$ where $ D_r = \sum_{i,i'\in C_r}d_{ii'} $ for a given cluster assignment.
double dgram_dist_completelinkage (const Array *d, const Dendrogram *c1, const Dendrogram *c2)
 complete linkage clustering.

\[ d_{CL}(G, H) = \max_{i\in G, j\in H} d_{ij} \]


double dgram_dist_singlelinkage (const Array *d, const Dendrogram *c1, const Dendrogram *c2)
 single linkage clustering.

\[ d_{SL}(G, H) = \min_{i\in G, j\in H} d_{ij} \]


void dgram_free (Dendrogram *t)
Dendrogram * dgram_get_deepest (Dendrogram *c)
Dendrogram * dgram_init (int val, Dendrogram *left, Dendrogram *right)
int dgram_num_leaves (const Dendrogram *t)
 return the number of leaves in the Dendrogram.
void dgram_preorder (const Dendrogram *t, int *vals, int *n)
 preorder traversal of Dendrogram.
void dgram_print (Dendrogram *t)
void dgram_print_node (Dendrogram *t)
Arraydgram_to_matlab (const Dendrogram *dgram)
 convert a Dendrogram to a N-1 x 3 array as used by MATLAB.
Clusterskmedoids (const Array *distmat, int K, OptArgList *optargs)
Clusterskmedoids_repeat (const Array *distmat, int K, int repeat)

Function Documentation

Dendrogram* agglomerative_clustering ( const Array distmat,
LinkageFunction  distfct 
)

Agglomerative clustering.

The function was tested against MATLAB's hierarchical cluster-analysis.

Parameters:
distmat distance matrix for the N objects
distfct function giving the between sub-cluster distance; defined are dgram_dist_singlelinkage(), dgram_dist_completelinkage(), dgram_dist_averagelinkage()
Returns:
Dendrogram

Definition at line 640 of file clustering.c.

double cluster_between_scatter ( const Array distmat,
const Clusters c 
)

get within-scatter between clusters. (sum of distances of means of clusters)

Parameters:
d distance matrix (NxN)
c the partioning of d
Returns:
the between-scatter

Definition at line 604 of file clustering.c.

int cluster_compare ( const Clusters c1,
const Clusters c2 
)

Compare whether two cluster structs are "identical", meaning that each cluster contains the same arguments (sequence does not matter). Superficial tests are first carried out, then intense comparison.

Definition at line 193 of file clustering.c.

void cluster_copy ( Clusters dest,
const Clusters src 
)

assumes that dest is already allocated

Definition at line 242 of file clustering.c.

void cluster_free ( Clusters c  ) 

Definition at line 255 of file clustering.c.

Clusters* cluster_init ( int  K,
int  maxN 
)

Definition at line 264 of file clustering.c.

void cluster_print ( FILE *  o,
const Clusters c 
)

Definition at line 280 of file clustering.c.

double cluster_within_scatter ( const Array distmat,
const Clusters c 
)

get within-scatter within clusters. compute$ W_k = \sum_{r=1}^{k}\frac{1}{2n_r}D_r$ where $ D_r = \sum_{i,i'\in C_r}d_{ii'} $ for a given cluster assignment.

Parameters:
d - distance matrix (NxN)
c - cluster-assigment (Clusters - struct)
Returns:
the within-scatter

Definition at line 567 of file clustering.c.

double dgram_dist_completelinkage ( const Array d,
const Dendrogram *  c1,
const Dendrogram *  c2 
)

complete linkage clustering.

\[ d_{CL}(G, H) = \max_{i\in G, j\in H} d_{ij} \]

Definition at line 879 of file clustering.c.

double dgram_dist_singlelinkage ( const Array d,
const Dendrogram *  c1,
const Dendrogram *  c2 
)

single linkage clustering.

\[ d_{SL}(G, H) = \min_{i\in G, j\in H} d_{ij} \]

Definition at line 833 of file clustering.c.

void dgram_free ( Dendrogram *  t  ) 

frees the complete Dendrogram referred to by t (all children)

Definition at line 768 of file clustering.c.

Dendrogram* dgram_get_deepest ( Dendrogram *  c  ) 
Returns:
a pointer to the deepest non-leaf node in the Dgram c (left and right tree are non-NULL)

Definition at line 956 of file clustering.c.

Dendrogram* dgram_init ( int  val,
Dendrogram *  left,
Dendrogram *  right 
)

Allocate memory for a single Dendrogram Node and set its value to val. left and right can be set to NULL.

Definition at line 754 of file clustering.c.

int dgram_num_leaves ( const Dendrogram *  t  ) 

return the number of leaves in the Dendrogram.

Definition at line 996 of file clustering.c.

void dgram_preorder ( const Dendrogram *  t,
int *  vals,
int *  n 
)

preorder traversal of Dendrogram.

Store all non-negative elements in val and return the number of these elements in n[0];

Parameters:
t the tree
vals output
n output (number of elements in vals)

Definition at line 810 of file clustering.c.

void dgram_print ( Dendrogram *  t  ) 

Definition at line 989 of file clustering.c.

void dgram_print_node ( Dendrogram *  t  ) 

Definition at line 1008 of file clustering.c.

Array* dgram_to_matlab ( const Dendrogram *  dgram  ) 

convert a Dendrogram to a N-1 x 3 array as used by MATLAB.

This can be used for conveniently plotting a dendrogram in MATLAB.

From the MATLAB-manual about the output format: (http://www.mathworks.com/access/helpdesk/help/toolbox/stats/linkage.html)

	 The output, Z, is an (m-1)-by-3 matrix containing cluster tree
	 information. The leaf nodes in the cluster hierarchy are the objects
	 in the original data set, numbered from 1 to m. They are the singleton
	 clusters from which all higher clusters are built. Each newly formed
	 cluster, corresponding to row i in Z, is assigned the index m+i, where
	 m is the total number of initial leaves.
	 
	 Columns 1 and 2, Z(i,1:2), contain the indices of the objects that
	 were linked in pairs to form a new cluster. This new cluster is
	 assigned the index value m+i. There are m-1 higher clusters that
	 correspond to the interior nodes of the hierarchical cluster tree.
	 
	 Column 3, Z(i,3), contains the corresponding linkage distances between
	 the objects paired in the clusters at each row i.
	 
	 For example, consider a case with 30 initial nodes. If the tenth
	 cluster formed by the linkage function combines object 5 and object 7
	 and their distance is 1.5, then row 10 of Z will contain the values
	 (5, 7, 1.5). This newly formed cluster will have the index
	 10+30=40. If cluster 40 shows up in a later row, that means this newly
	 formed cluster is being combined again into some bigger cluster.
	 
Parameters:
dgram a Dendrogram
N number of terminal
Returns:
a N-1 x 3 Array in the format described above

Definition at line 740 of file clustering.c.

Clusters* kmedoids ( const Array distmat,
int  K,
OptArgList optargs 
)

do K-Medoids clustering on the distance-matrix.

Parameters:
dist ance matrix
K number of medoids to compute
optargs may contain:

  • seed=int seed for initializing the cluster-configuration (if 0, use time(NULL))
Returns:
clusters - function-allocated memory. contains K cluster-structs.

Definition at line 79 of file clustering.c.

Clusters* kmedoids_repeat ( const Array distmat,
int  K,
int  repeat 
)

run the kmedoids function a couple of times and pick the best in terms of within scatter.

Parameters:
dist ance matrix
K number of medoids to compute
repeat number of repetitions
Returns:
clusters - function-allocated memory. contains K cluster-structs.

Definition at line 35 of file clustering.c.