技术干货 | 基于 MindSpore 实现图像分割之豪斯多夫距离_图像分割surface distance

作者：我家自动化 | 2024-07-06 20:55:22

踩

图像分割surface distance

今天带来的内容是Hausdorff distance 豪斯多夫距离的原理介绍及MindSpore的实现代码。

当我们评价图像分割的质量和模型表现时，经常会用到各类表面距离的计算。比如：

· Mean surface distance 平均表面距离

· Hausdorff distance 豪斯多夫距离（也被称为max_surface_distance 最大表面距离MSD）

· Surface overlap 表面重叠度

· Surface dice 表面dice值

· Volumetric dice 三维dice值

等等，都可以称为表面距离系列了。今天简单的讲解一下Hausdorff distance 豪斯多夫距离。

Hausdorff distance介绍

关于这个Hausdorff Distance 豪斯多夫距离的计算，网上资料真的好多好多，但感觉有些乱糟糟的，几乎都是互相抄袭，讲的也不是很清楚，更有甚者很多人介绍的是错的。这里我按照我的思路来讲解一下（当然我这也是采百家之长，进行一个总结，好吧，其实也是借鉴）。

之前的文章已经讲过Dice系数了（基于MindSpore高效完成图像分割，实现Dice！），Dice对mask的内部填充比较敏感，而Hausdorff distance 对分割出的边界比较敏感，所以主要是用Hausdorff distance使用在图像分割任务中。

Hausdorff distance原理及公式

Hausdorff distance是描述两组点集之间相似程度的一种量度，它是两个点集之间距离的一种定义形式：假设有两组集合A={a1,…,ap},B={b1,…,bq},则这两个点集合之间的Hausdorff distance定义为：

这里,

· 式(1)称为双向Hausdorff distance，是Hausdorff distance的最基本形式；

· 式(2)中的h(A，B)和h(B，A)分别称为从A集合到B集合和从B集合到A集合的单向Hausdorff距离。即h(A,B)实际上首先对点集A中的每个点ai到距离此点ai最近的B集合中点bj之间的距离‖ai-bj‖进行排序，然后取该距离中的最大值作为h(A,B)的值；

· h(B,A)同理可得。

· 由式(1)知，双向Hausdorff距离H(A，B)是单向距离h(A，B)和h(B,A)两者中的较大者，它度量了两个点集间的最大不匹配程度。

图示及计算流程总结

刚说了那么多，是不是也不是很清楚，只看公式确实是一件不好玩的事情，那我用网上常用的图来说明一下，还有一个比较简单清晰的计算流程。给定两个点集合A{ a0, a1, … }和B{ b0, b1, b2, …}

· 取A集合中的一点a0，计算a0到B集合中所有点的距离，保留最短的距离d0

· 遍历A集合中所有点，图中一共两点a0和a1，计算出d0和d1

· 比较所有的距离{ d0, d1 }，选出最长的距离d1

· 这个最长的距离就是h，它是A→B的单向豪斯多夫距离，记为h( A, B )

· 对于A集合中任意一点a，我们可以确定，以点a为圆心，h为半径的圆内部必有B集合中的

· 交换A集合和B集合的角色，计算B→A的单向豪斯多夫距离h( B, A )，选出h( A, B )和h( B, A )中最长的距离，就是A,B集合的双向豪斯多夫距离

看完公式，配合着图和总结的计算流程，是不是就请出许多了，哈哈哈哈，反正我是这样看懂的。

MindSpore 代码实现

好了，原理已经讲完，话不多说，我们开始上代码。使用的是MindSpore框架实现的代码。


"""HausdorffDistance."""
 
from collections import abc
from abc import ABCMeta
from scipy.ndimage import morphology
import numpy as np
from mindspore.common.tensor import Tensor
from mindspore._checkparam import Validator as validator
from .metric import Metric
 
 
class _ROISpatialData(metaclass=ABCMeta):
    # 产生感兴趣区域（ROI）。支持裁剪和空间数据。应提供空间的中心和大小，如果没有，则必须提供ROI的起点和终点坐标。
    def __init__(self, roi_center=None, roi_size=None, roi_start=None, roi_end=None):
 
        if roi_center is not None and roi_size is not None:
            roi_center = np.asarray(roi_center, dtype=np.int16)
            roi_size = np.asarray(roi_size, dtype=np.int16)
            self.roi_start = np.maximum(roi_center - np.floor_divide(roi_size, 2), 0)
            self.roi_end = np.maximum(self.roi_start + roi_size, self.roi_start)
        else:
            if roi_start is None or roi_end is None:
                raise ValueError("Please provide the center coordinates, size or start coordinates and end coordinates"
                                 " of ROI.")
            # ROI起始坐标
            self.roi_start = np.maximum(np.asarray(roi_start, dtype=np.int16), 0)
            # ROI终点坐标
            self.roi_end = np.maximum(np.asarray(roi_end, dtype=np.int16), self.roi_start)
 
    def __call__(self, data):
        sd = min(len(self.roi_start), len(self.roi_end), len(data.shape[1:]))
        slices = [slice(None)] + [slice(s, e) for s, e in zip(self.roi_start[:sd], self.roi_end[:sd])]
        return data[tuple(slices)]
 
 
class HausdorffDistance(Metric):
    def __init__(self, distance_metric="euclidean", percentile=None, directed=False, crop=True):
        super(HausdorffDistance, self).__init__()
        string_list = ["euclidean", "chessboard", "taxicab"]
        distance_metric = validator.check_value_type("distance_metric", distance_metric, [str])
        # 计算Hausdorff距离的参数，支持欧氏、"chessboard"、"taxicab"三种测量方法。
        self.distance_metric = validator.check_string(distance_metric, string_list, "distance_metric")
        # 表示最大距离分位数，取值范围为0-100，它表示的是计算步骤4中，选取的距离能覆盖距离的百分比
        self.percentile = percentile if percentile is None else validator.check_value_type("percentile",
        # 可分为定向和非定向Hausdorff距离，默认为非定向Hausdorff距离，指定percentile参数得到Hausdorff距离的百分位数。percentile, [float])
        self.directed = directed if directed is None else validator.check_value_type("directed", directed, [bool])
        self.crop = crop if crop is None else validator.check_value_type("crop", crop, [bool])
        self.clear()
 
    def _is_tuple_rep(self, tup, dim):
        # 通过缩短或重复输入返回包含dim值的元组。
        result = None
        if not self._is_iterable_sequence(tup):
            result = (tup,) * dim
        elif len(tup) == dim:
            result = tuple(tup)
 
        if result is None:
            raise ValueError(f"Sequence must have length {dim}, but got {len(tup)}.")
 
        return result
 
    def _is_tuple(self, inputs):
        # 判断是否是一个元组
        if not self._is_iterable_sequence(inputs):
            inputs = (inputs,)
 
        return tuple(inputs)
 
    def _is_iterable_sequence(self, inputs):
        if isinstance(inputs, Tensor):
            return int(inputs.dim()) > 0
        return isinstance(inputs, abc.Iterable) and not isinstance(inputs, str)
 
    def _create_space_bounding_box(self, image, func=lambda x: x > 0, channel_indices=None, margin=0):
        data = image[[*(self._is_tuple(channel_indices))]] if channel_indices is not None else image
        data = np.any(func(data), axis=0)
        nonzero_idx = np.nonzero(data)
        margin = self._is_tuple_rep(margin, data.ndim)
 
        box_start = list()
        box_end = list()
        for i in range(data.ndim):
            if nonzero_idx[i].size <= 0:
                raise ValueError("did not find nonzero index at the spatial dim {}".format(i))
            box_start.append(max(0, np.min(nonzero_idx[i]) - margin[i]))
            box_end.append(min(data.shape[i], np.max(nonzero_idx[i]) + margin[i] + 1))
        return box_start, box_end
 
    def _calculate_percent_hausdorff_distance(self, y_pred_edges, y_edges):
        
        surface_distance = self._get_surface_distance(y_pred_edges, y_edges)
 
        if surface_distance.shape == (0,):
            return np.inf
 
        if not self.percentile:
            return surface_distance.max()
        # self.percentile表示最大距离分位数，取值范围为0-100，它表示的是计算步骤4中，选取的距离能覆盖距离的百分比
        if 0 <= self.percentile <= 100:
            return np.percentile(surface_distance, self.percentile)
 
        raise ValueError(f"percentile should be a value between 0 and 100, get {self.percentile}.")
 
    def _get_surface_distance(self, y_pred_edges, y_edges):
        # 使用欧式方法求表面距离
        if not np.any(y_pred_edges):
            return np.array([])
 
        if not np.any(y_edges):
            dis = np.inf * np.ones_like(y_edges)
        else:
            if self.distance_metric == "euclidean":
                dis = morphology.distance_transform_edt(~y_edges)
            elif self.distance_metric == "chessboard" or self.distance_metric == "taxicab":
                dis = morphology.distance_transform_cdt(~y_edges, metric=self.distance_metric)
 
        surface_distance = dis[y_pred_edges]
 
        return surface_distance
 
    def clear(self):
        """清除历史数据"""
        self.y_pred_edges = 0
        self.y_edges = 0
        self._is_update = False
 
    def update(self, *inputs):
        """
        更新输入数据
        """
        if len(inputs) != 3:
            raise ValueError('HausdorffDistance need 3 inputs (y_pred, y, label), but got {}'.format(len(inputs)))
        y_pred = self._convert_data(inputs[0])
        y = self._convert_data(inputs[1])
        label_idx = inputs[2]
 
        if y_pred.size == 0 or y_pred.shape != y.shape:
            raise ValueError("Labelfields should have the same shape, but got {}, {}".format(y_pred.shape, y.shape))
 
        y_pred = (y_pred == label_idx) if y_pred.dtype is not bool else y_pred
        y = (y == label_idx) if y.dtype is not bool else y
 
        res1, res2 = None, None
        if self.crop:
            if not np.any(y_pred | y):
                res1 = np.zeros_like(y_pred)
                res2 = np.zeros_like(y)
 
            y_pred, y = np.expand_dims(y_pred, 0), np.expand_dims(y, 0)
            box_start, box_end = self._create_space_bounding_box(y_pred | y)
            cropper = _ROISpatialData(roi_start=box_start, roi_end=box_end)
            y_pred, y = np.squeeze(cropper(y_pred)), np.squeeze(cropper(y))
 
        self.y_pred_edges = morphology.binary_erosion(y_pred) ^ y_pred if res1 is None else res1
        self.y_edges = morphology.binary_erosion(y) ^ y if res2 is None else res2
        self._is_update = True
 
    def eval(self):
        """
        计算定向或者非定向的Hausdorff distance.
        """
        # 要先执行clear操作
        if self._is_update is False:
            raise RuntimeError('Call the update method before calling eval.')
 
        # 计算A到B的距离
        hd = self._calculate_percent_hausdorff_distance(self.y_pred_edges, self.y_edges)
        # 计算定向的豪斯多夫
        if self.directed:
            return hd
        # 计算非定向的豪斯多夫
        hd2 = self._calculate_percent_hausdorff_distance(self.y_edges, self.y_pred_edges)
        # 如果计算的是定向的，那直接返回hd，如果是计算非定向，那hd和hd2谁大返回谁
        return max(hd, hd2)

使用方法如下：


import numpy as np
from mindspore import Tensor
from mindspore.nn.metrics import HausdorffDistance
 
x = Tensor(np.array([[3, 0, 1], [1, 3, 0], [1, 0, 2]]))
y = Tensor(np.array([[0, 2, 1], [1, 2, 1], [0, 0, 1]]))
metric = HausdorffDistance()
metric.clear()
metric.update(x, y, 0)
distance = metric.eval()
print(distance)
 
1.4142135623730951

每个batch（比如两组数据）进行计算的时候如下：


import numpy as np
from mindspore import Tensor
from mindspore.nn.metrics import HausdorffDistance
 
x = Tensor(np.array([[3, 0, 1], [1, 3, 0], [1, 0, 2]]))
y = Tensor(np.array([[0, 2, 1], [1, 2, 1], [0, 0, 1]]))
metric = HausdorffDistance()
metric.clear()
metric.update(x, y, 0)
 
x1 = Tensor(np.array([[3, 0, 1], [1, 3, 0], [1, 0, 2]]))
y1 = Tensor(np.array([[0, 2, 1], [1, 2, 1], [0, 0, 1]]))
metric.update(x1, y1, 0)
 
distance = metric.eval()
print(distance)

self.percentile 参数说明

表示最大距离分位数，取值范围为0-100，它表示的是计算步骤4中，选取的距离能覆盖距离的百分比，一般是选取了95%，那么在计算步骤4中选取的不是最大距离，而是将距离从大到小排列后，取排名为5%的距离。这么做的目的是为了排除一些离群点所造成的不合理的距离，保持整体数值的稳定性。所以Hausdorff distance也被称为Hausdorff distance-95%。

声明：本文内容由网友自发贡献，不代表【wpsshop博客】立场，版权归原作者所有，本站不承担相应法律责任。如您发现有侵权的内容，请联系我们。转载请注明出处：https://www.wpsshop.cn/w/我家自动化/article/detail/793955