Counting how many times each row occurs in a numpy.array

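You can use the answer to another question to get the counts of unique items.

An alternative to structured arrays is a view with a void dtype that joins each whole row into a single item: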

import numpy as np

a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

# View each row as a single void item spanning the row's bytes
b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
# idx holds the index of the first occurrence of each unique row
_, idx = np.unique(b, return_index=True)

unique_a = a[idx]

>>> unique_a
array([[0, 1, 1, 1, 0, 0],
       [1, 1, 1, 0, 0, 0],
       [1, 1, 1, 1, 1, 0]])
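To make the void view less opaque, here is a minimal sketch of what it produces on a smaller array. The name `row_dtype` is only an illustrative choice, not something from the original answer.

```python
import numpy as np

a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0]])

# One void item spans the raw bytes of a full row.
row_dtype = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
b = np.ascontiguousarray(a).view(row_dtype)

# The 6 columns collapse into a single item per row.
print(a.shape, "->", b.shape)

# Rows with identical contents have identical bytes, so np.unique
# can treat each row as one comparable value.
print(b[1, 0].tobytes() == b[2, 0].tobytes())
print(b[0, 0].tobytes() == b[1, 0].tobytes())
```

Because the comparison is byte-wise, the order in which `np.unique` returns the rows follows the byte order of the items, not necessarily the numeric order of the rows.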

EDIT: Added np.ascontiguousarray following @seberg's recommendation. This will slow the method down if the array is not already contiguous.
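The role of `np.ascontiguousarray` can be sketched as follows. The arrays `wide` and `cols` are made up for illustration; the point is that a column-sliced array is not contiguous, so it is copied before the void view is taken, while an already contiguous array is passed through without a copy.

```python
import numpy as np

wide = np.arange(24).reshape(4, 6)
cols = wide[:, ::2]  # every other column: a strided, non-contiguous view

print(cols.flags['C_CONTIGUOUS'])

row_dtype = np.dtype((np.void, cols.dtype.itemsize * cols.shape[1]))

# A copy is made here only because cols is not contiguous.
fixed = np.ascontiguousarray(cols)
b = fixed.view(row_dtype)
print(fixed.flags['C_CONTIGUOUS'], b.shape)

# For an array that is already contiguous, the data is shared, not copied.
same = np.ascontiguousarray(wide)
print(np.shares_memory(same, wide))
```

The extra copy is where the slowdown for non-contiguous input comes from.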

EDIT: The above can be sped up slightly, perhaps at the cost of clarity, by doing:

unique_a = np.unique(b).view(a.dtype).reshape(-1, a.shape[1])

Also, at least on my system, its performance is on par with, or even better than, the lexsort method:

a = np.random.randint(2, size=(10000, 6))

%timeit np.unique(a.view(np.dtype((np.void, a.dtype.itemsize*a.shape[1])))).view(a.dtype).reshape(-1, a.shape[1])
100 loops, best of 3: 3.17 ms per loop

%timeit ind = np.lexsort(a.T); a[np.concatenate(([True],np.any(a[ind[1:]]!=a[ind[:-1]],axis=1)))]
100 loops, best of 3: 5.93 ms per loop

a = np.random.randint(2, size=(10000, 100))

%timeit np.unique(a.view(np.dtype((np.void, a.dtype.itemsize*a.shape[1])))).view(a.dtype).reshape(-1, a.shape[1])
10 loops, best of 3: 29.9 ms per loop

%timeit ind = np.lexsort(a.T); a[np.concatenate(([True],np.any(a[ind[1:]]!=a[ind[:-1]],axis=1)))]
10 loops, best of 3: 116 ms per loop
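The `%timeit` lines above need IPython. A rough way to repeat the comparison in a plain Python script is sketched below, using the standard `timeit` module. The helper names `unique_rows_void` and `unique_rows_lexsort` are hypothetical, and the lexsort helper indexes through `ind` so that the mask, which is computed in sorted order, selects rows in that same order. Timings will differ between machines and NumPy versions, so none are quoted here.

```python
import timeit
import numpy as np

def unique_rows_void(a):
    """Unique rows via a void view (rows come back in byte-wise order)."""
    a = np.ascontiguousarray(a)
    row_dtype = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return np.unique(a.view(row_dtype)).view(a.dtype).reshape(-1, a.shape[1])

def unique_rows_lexsort(a):
    """Unique rows via lexsort, keeping the first row of each run of equal rows."""
    ind = np.lexsort(a.T)
    keep = np.concatenate(([True], np.any(a[ind[1:]] != a[ind[:-1]], axis=1)))
    return a[ind[keep]]

rng = np.random.default_rng(0)
a = rng.integers(2, size=(10000, 6))

# The two methods order their output differently, so compare them as sets.
rows_void = {tuple(r) for r in unique_rows_void(a).tolist()}
rows_lex = {tuple(r) for r in unique_rows_lexsort(a).tolist()}
print(rows_void == rows_lex)

for fn in (unique_rows_void, unique_rows_lexsort):
    best = min(timeit.repeat(lambda: fn(a), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```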

As of numpy 1.9, there is an optional return_counts keyword argument, so you can simply do:

>>> my_array
array([[1, 2, 0, 1, 1, 1],
       [1, 2, 0, 1, 1, 1],
       [9, 7, 5, 3, 2, 1],
       [1, 1, 1, 0, 0, 0],
       [1, 2, 0, 1, 1, 1],
       [1, 1, 1, 1, 1, 0]])
>>> dt = np.dtype((np.void, my_array.dtype.itemsize * my_array.shape[1]))
>>> b = np.ascontiguousarray(my_array).view(dt)
>>> unq, cnt = np.unique(b, return_counts=True)
>>> unq = unq.view(my_array.dtype).reshape(-1, my_array.shape[1])
>>> unq
array([[1, 1, 1, 0, 0, 0],
       [1, 1, 1, 1, 1, 0],
       [1, 2, 0, 1, 1, 1],
       [9, 7, 5, 3, 2, 1]])
>>> cnt
array([1, 1, 3, 1])

In earlier versions, you can do it as follows:

>>> unq, inv = np.unique(b, return_inverse=True)
>>> cnt = np.bincount(inv.ravel())  # np.bincount needs 1-D input; ravel() is a no-op if inv is already flat
>>> unq = unq.view(my_array.dtype).reshape(-1, my_array.shape[1])
>>> unq
array([[1, 1, 1, 0, 0, 0],
       [1, 1, 1, 1, 1, 0],
       [1, 2, 0, 1, 1, 1],
       [9, 7, 5, 3, 2, 1]])
>>> cnt
array([1, 1, 3, 1])
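As a side note that is not part of the original answer: newer NumPy releases (1.13 and later, as far as I know) let `np.unique` take an `axis` argument, which may make the void view unnecessary for this task. The sketch below uses it and cross-checks the counts against a plain-Python `collections.Counter` over row tuples; `expected` and `got` are illustrative names.

```python
from collections import Counter
import numpy as np

my_array = np.array([[1, 2, 0, 1, 1, 1],
                     [1, 2, 0, 1, 1, 1],
                     [9, 7, 5, 3, 2, 1],
                     [1, 1, 1, 0, 0, 0],
                     [1, 2, 0, 1, 1, 1],
                     [1, 1, 1, 1, 1, 0]])

# axis=0 treats each row as one item (assumes NumPy 1.13 or newer).
unq, cnt = np.unique(my_array, axis=0, return_counts=True)
print(unq)
print(cnt)

# Pure-Python cross-check: count rows as hashable tuples.
expected = Counter(map(tuple, my_array.tolist()))
got = {tuple(row): int(n) for row, n in zip(unq.tolist(), cnt)}
print(got == dict(expected))
```

With `axis=0` the rows come back sorted lexicographically by value, which in general need not match the byte-wise order produced by the void view.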

Feel free to share; when reposting, please credit the source: 内存溢出

Original article: https://54852.com/zaji/5642723.html
