Writing the clustering metrics purity and Jaccard in Python: evaluating clustering results with the Jaccard coefficient

Author: IT小白 | 2024-03-15 14:07:09


Evaluating clustering results with a Python implementation of the Jaccard coefficient

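I wrote my own purity and jaccard functions, and both turned out to run reasonably fast. It also seems that jaccard could be written on top of a ready-made confusion matrix (for example confusion_matrix from sklearn.metrics, which is imported below), and that this would be somewhat simpler than my version, probably because it already wraps some of the steps I wrote by hand.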
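For reference, the two quantities being computed can be written out as below. The notation is an assumption introduced here and is not part of the original post: $\omega_k$ are the predicted clusters, $c_j$ the true classes, $N$ the number of samples, $a$ the number of sample pairs that share both a cluster and a class, $b$ the pairs that share a cluster but not a class, and $c$ the pairs that share a class but not a cluster. In the jaccard code, the variables `a` and `bc` play the roles of $a$ and $b + c$.

```latex
\mathrm{purity} = \frac{1}{N} \sum_{k} \max_{j} \lvert \omega_k \cap c_j \rvert,
\qquad
J = \frac{a}{a + b + c}
```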

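from sklearn import datasets
# sklearn.utils.linear_assignment_ was removed in scikit-learn 0.23;
# scipy.optimize.linear_sum_assignment replaces it (it returns row and column index arrays).
from scipy.optimize import linear_sum_assignment
import seaborn as sns
import matplotlib.pyplot as plt
import copy
from sklearn.metrics import confusion_matrix
from sklearn import metrics
from sklearn.cluster import KMeans
import pandas as pd
import numpy as np  # the two functions below only need numpy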

Purity

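def purity(cluster, label):
    """Fraction of samples that belong to the majority true class of their cluster."""
    cluster = np.array(cluster)
    label = np.array(label)
    # Sample indices of each true class.
    indedata1 = {}
    for p in np.unique(label):
        indedata1[p] = np.argwhere(label == p)
    # Sample indices of each predicted cluster.
    indedata2 = {}
    for q in np.unique(cluster):
        indedata2[q] = np.argwhere(cluster == q)

    # count_all[i][j] = number of samples shared by class i and cluster j.
    count_all = []
    for i in indedata1.values():
        count = []
        for j in indedata2.values():
            a = np.intersect1d(i, j).shape[0]
            count.append(a)
        count_all.append(count)

    # For each cluster take its best-matching class, then divide by the sample count.
    return sum(np.max(count_all, axis=0))/len(cluster)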
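As a cross-check on the purity function above, the same number can be read off a class-by-cluster contingency table. The following is only a sketch under stated assumptions: the helper names contingency_table and purity_from_table are hypothetical and not from the original post, and the table is built with plain numpy rather than any library routine.

```python
import numpy as np

def contingency_table(cluster, label):
    """Hypothetical helper: rows are true classes, columns are clusters,
    and each entry counts the samples shared by that class and cluster."""
    _, class_idx = np.unique(np.asarray(label).ravel(), return_inverse=True)
    _, cluster_idx = np.unique(np.asarray(cluster).ravel(), return_inverse=True)
    table = np.zeros((class_idx.max() + 1, cluster_idx.max() + 1), dtype=np.int64)
    # Unbuffered add, so repeated (class, cluster) index pairs are all counted.
    np.add.at(table, (class_idx, cluster_idx), 1)
    return table

def purity_from_table(cluster, label):
    """Sum the largest entry of every column (cluster), divide by the sample count."""
    table = contingency_table(cluster, label)
    return table.max(axis=0).sum() / table.sum()

# Small hand-checkable case: each cluster holds 3 samples, 2 of them from one class.
cluster_ids = [0, 0, 0, 1, 1, 1]
true_labels = [0, 0, 1, 1, 1, 0]
print(purity_from_table(cluster_ids, true_labels))  # 4 of 6 samples sit in a majority class
```

Because the table is indexed through np.unique, this version also accepts non-numeric labels such as strings.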

Jaccard

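def jaccard(cluster, label):
    """Pair-counting Jaccard coefficient a / (a + b + c) between two labelings."""
    cluster = np.asarray(cluster)
    label = np.asarray(label)
    n = len(cluster)
    # n x n boolean matrices: True where samples i and j share a cluster / a label.
    same_cluster = cluster[:, None] == cluster[None, :]
    same_label = label[:, None] == label[None, :]
    # Every unordered pair shows up twice and the diagonal adds n entries,
    # so each pair count is (number of True entries - n) / 2.
    a = (np.count_nonzero(same_cluster & same_label) - n) / 2
    n_same_cluster = (np.count_nonzero(same_cluster) - n) / 2
    n_same_label = (np.count_nonzero(same_label) - n) / 2
    # b + c: pairs grouped together in exactly one of the two partitions.
    bc = n_same_cluster + n_same_label - 2 * a
    # Note: a + bc is 0 when no two samples share a cluster or a label.
    return a / (a + bc)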
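The remark about building jaccard from a confusion matrix can be sketched as follows: the pair counts can be derived from a class-by-cluster contingency table, which avoids the n x n matrices used above. This is a minimal sketch, not the library routine the author had in mind. The names pair_count and jaccard_from_table are hypothetical, and returning NaN when no pair shares a cluster or a label is an assumption made here.

```python
import numpy as np

def pair_count(counts):
    """Number of unordered pairs inside each group: m * (m - 1) / 2."""
    counts = np.asarray(counts, dtype=np.int64)
    return counts * (counts - 1) // 2

def jaccard_from_table(cluster, label):
    """Hypothetical helper: pair-counting Jaccard from a contingency table."""
    _, class_idx = np.unique(np.asarray(label).ravel(), return_inverse=True)
    _, cluster_idx = np.unique(np.asarray(cluster).ravel(), return_inverse=True)
    table = np.zeros((class_idx.max() + 1, cluster_idx.max() + 1), dtype=np.int64)
    np.add.at(table, (class_idx, cluster_idx), 1)

    a = pair_count(table).sum()                          # same cluster and same class
    n_same_label = pair_count(table.sum(axis=1)).sum()   # same class (row totals)
    n_same_cluster = pair_count(table.sum(axis=0)).sum() # same cluster (column totals)
    denom = n_same_label + n_same_cluster - a            # equals a + b + c
    # Assumption: report NaN when there are no co-grouped pairs at all.
    return a / denom if denom > 0 else float("nan")

cluster_ids = [0, 0, 0, 1, 1, 1]
true_labels = [0, 0, 1, 1, 1, 0]
print(jaccard_from_table(cluster_ids, true_labels))  # a = 2, a + b + c = 10
```

Memory use here grows with the number of classes times the number of clusters instead of with the square of the sample count, which matters once the data no longer fits an n x n matrix.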
