
Writing the clustering metrics purity and Jaccard in Python: evaluating clustering results with the Jaccard coefficient


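Below are hand-written purity and Jaccard functions; both ran quickly enough in my use. It also seems that Jaccard could be computed from the confusion matrix in scikit-learn (sklearn.metrics.confusion_matrix), which would be simpler than my version, presumably because it wraps some of the steps I wrote out by hand.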
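The confusion-matrix idea mentioned above can be sketched as follows. This is a minimal illustration rather than the author's code: the function name `jaccard_from_contingency` is hypothetical, and the contingency counts are built with the standard library so the sketch does not depend on a particular scikit-learn version. It assumes the pair-counting definition of the Jaccard coefficient, a / (a + b + c), and returning NaN when no pair is grouped together on either side is an assumed convention.

```python
from collections import Counter
from math import comb


def jaccard_from_contingency(cluster, label):
    """Pair-counting Jaccard coefficient from contingency counts (sketch)."""
    # cells[(y, c)]: number of samples with true class y placed in cluster c.
    cells = Counter(zip(label, cluster))
    class_sizes = Counter(label)
    cluster_sizes = Counter(cluster)

    # a: pairs that share both the class and the cluster.
    a = sum(comb(n, 2) for n in cells.values())
    # Pairs that share a cluster (a + b) and pairs that share a class (a + c).
    same_cluster = sum(comb(n, 2) for n in cluster_sizes.values())
    same_class = sum(comb(n, 2) for n in class_sizes.values())

    # a + b + c: pairs grouped together by at least one of the two partitions.
    union = same_cluster + same_class - a
    # Assumed convention: NaN when no pair is grouped together at all.
    return a / union if union else float("nan")


if __name__ == "__main__":
    print(jaccard_from_contingency([0, 0, 1, 1, 1], [0, 0, 0, 1, 1]))
```

Only the cell counts and the row and column totals are needed, so this avoids building any n-by-n matrix. The same counts could be read from an array returned by `sklearn.metrics.confusion_matrix` or `sklearn.metrics.cluster.contingency_matrix`.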

from sklearn import datasets
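# sklearn.utils.linear_assignment_ was removed from recent scikit-learn releases;
# scipy.optimize.linear_sum_assignment is the usual replacement.
from scipy.optimize import linear_sum_assignment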
import seaborn as sns
import matplotlib.pyplot as plt
import copy
from sklearn.metrics import confusion_matrix
from sklearn import metrics
from sklearn.cluster import KMeans
import pandas as pd
import numpy as np

Purity

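def purity(cluster, label):
    cluster = np.array(cluster)
    label = np.array(label)
    # Sample indices belonging to each true class
    indedata1 = {}
    for p in np.unique(label):
        indedata1[p] = np.argwhere(label == p)
    # Sample indices belonging to each predicted cluster
    indedata2 = {}
    for q in np.unique(cluster):
        indedata2[q] = np.argwhere(cluster == q)

    # count_all[i][j] = number of samples of class i that fall in cluster j
    count_all = []
    for i in indedata1.values():
        count = []
        for j in indedata2.values():
            a = np.intersect1d(i, j).shape[0]
            count.append(a)
        count_all.append(count)

    # Credit each cluster with its largest class, then divide by the sample count
    return sum(np.max(count_all, axis=0))/len(cluster)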
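As a small worked example of what purity measures, the sketch below takes the size of the majority class in each cluster, sums those sizes, and divides by the number of samples. The helper name `purity_by_majority` and the toy data are hypothetical. It uses only the standard library so it can be run on its own.

```python
from collections import Counter, defaultdict


def purity_by_majority(cluster, label):
    """Purity as the fraction of samples in their cluster's majority class."""
    # members[c] counts how often each true class occurs inside cluster c.
    members = defaultdict(Counter)
    for c, y in zip(cluster, label):
        members[c][y] += 1
    # Each cluster contributes the size of its most frequent class.
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(cluster)


if __name__ == "__main__":
    # Toy data: cluster 0 holds a, a, b and cluster 1 holds b, b, a.
    cluster = [0, 0, 0, 1, 1, 1]
    label = ["a", "a", "b", "b", "b", "a"]
    print(purity_by_majority(cluster, label))  # (2 + 2) / 6
```

Here each cluster has a majority class of size 2, so the purity is 4/6.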

Jaccard

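def jaccard(cluster, label):
    # Pair-counting Jaccard coefficient a / (a + b + c), where
    #   a     = pairs in the same cluster and the same class,
    #   b + c = pairs grouped together by only one of the two partitions.
    # Labels must be numeric; memory use grows as O(n^2).
    cluster = np.asarray(cluster)
    label = np.asarray(label)
    n = len(cluster)
    # Entry (i, j) is 0 exactly when samples i and j share a cluster (or class)
    dist_cluster = np.abs(np.tile(cluster, (n, 1)) - np.tile(cluster, (n, 1)).T)
    dist_label = np.abs(np.tile(label, (n, 1)) - np.tile(label, (n, 1)).T)
    # Each unordered pair appears twice in the matrix; the diagonal adds n zeros
    a = (np.count_nonzero(dist_cluster + dist_label == 0) - n) / 2
    same_cluster = (np.count_nonzero(dist_cluster == 0) - n) / 2  # a + b
    same_label = (np.count_nonzero(dist_label == 0) - n) / 2      # a + c
    bc = same_cluster + same_label - 2 * a
    return a / (a + bc)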
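To sanity-check a pair-counting Jaccard implementation on small inputs, one option is a brute-force reference that visits every unordered pair of samples. The sketch below is such a reference under the a / (a + b + c) definition. The name `jaccard_bruteforce` is hypothetical, and the loop is O(n^2) in time, so it is meant for testing rather than for large datasets.

```python
from itertools import combinations


def jaccard_bruteforce(cluster, label):
    """Reference Jaccard coefficient by enumerating all sample pairs."""
    a = b = c = 0
    for i, j in combinations(range(len(cluster)), 2):
        same_cluster = cluster[i] == cluster[j]
        same_label = label[i] == label[j]
        if same_cluster and same_label:
            a += 1  # together in both partitions
        elif same_cluster:
            b += 1  # together only in the clustering
        elif same_label:
            c += 1  # together only in the true classes
    # Raises ZeroDivisionError if no pair is grouped together on either side.
    return a / (a + b + c)


if __name__ == "__main__":
    print(jaccard_bruteforce([0, 0, 1, 1, 1], [0, 0, 0, 1, 1]))
```

For this toy input, a = 2, b = 2 and c = 2, so the coefficient is 2/6. A clustering that matches the true classes exactly should give 1.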