python – MATLAB矩阵乘法性能比NumPy快5倍_python

概述我在MATLAB& amp;中设置了两个相同的测试.关于矩阵乘法与广播的 Python.对于 Python,我使用了NumPy,对于MATLAB,我使用了使用BLAS的 mtimesx库. MATLAB close all; clear;N = 1000 + 100; % a few initial runs to be trimmed off at the enda = 100;b = 我在MATLAB& amp;中设置了两个相同的测试.关于矩阵乘法与广播的 Python.对于 Python,我使用了NumPy,对于MATLAB,我使用了使用BLAS的 mtimesx库.

MATLAB

close all; clear;N = 1000 + 100; % a few initial runs to be trimmed off at the enda = 100;b = 30;c = 40;d = 50;A = rand(b,c,a);B = rand(c,d,a);C = zeros(b,a);times = zeros(1,N);for ii = 1:N    tic    C = mtimesx(A,B);    times(ii) = toc;endtimes = times(101:end) * 1e3;plot(times);grID on;Title(median(times));

Python

import timeitimport numpy as npimport matplotlib.pyplot as pltN = 1000 + 100  # a few initial runs to be trimmed off at the enda = 100b = 30c = 40d = 50A = np.arange(a * b * c).reshape([a,b,c])B = np.arange(a * c * d).reshape([a,d])C = np.empty(a * b * d).reshape([a,d])times = np.empty(N)for i in range(N):    start = timeit.default_timer()    C = A @ B    times[i] = timeit.default_timer() - starttimes = times[101:] * 1e3plt.plot(times,linewidth=0.5)plt.grID()plt.Title(np.median(times))plt.show()

>我的Python是使用OpenBLAS从pip安装的默认Python.
>我正在使用英特尔NUC i3.

MATLAB代码在1ms运行,而Python在5.8ms运行,我无法弄清楚为什么,因为它们似乎都在使用BLAS.

编辑

来自Anaconda：

In [7]: np.__config__.show()mkl_info:    librarIEs = ['mkl_rt']    library_dirs = [...]    define_macros = [('SCIPY_MKL_H',None),('HAVE_CBLAS',None)]    include_dirs = [...]blas_mkl_info:    librarIEs = ['mkl_rt']    library_dirs = [...]    define_macros = [('SCIPY_MKL_H',None)]    include_dirs = [...]blas_opt_info:    librarIEs = ['mkl_rt']    library_dirs = [...]    define_macros = [('SCIPY_MKL_H',None)]    include_dirs = [...]lapack_mkl_info:    librarIEs = ['mkl_rt']    library_dirs = [...]    define_macros = [('SCIPY_MKL_H',None)]    include_dirs = [...]lapack_opt_info:    librarIEs = ['mkl_rt']    library_dirs = [...]    define_macros = [('SCIPY_MKL_H',None)]    include_dirs = [...]

从numpy使用pip

In [2]: np.__config__.show()blas_mkl_info:NOT AVAILABLEblis_info:NOT AVAILABLEopenblas_info:    library_dirs = [...]    librarIEs = ['openblas']    language = f77    define_macros = [('HAVE_CBLAS',None)]blas_opt_info:    library_dirs = [...]    librarIEs = ['openblas']    language = f77    define_macros = [('HAVE_CBLAS',None)]lapack_mkl_info:NOT AVAILABLEopenblas_lapack_info:    library_dirs = [...]    librarIEs = ['openblas']    language = f77    define_macros = [('HAVE_CBLAS',None)]lapack_opt_info:    library_dirs = [...]    librarIEs = ['openblas']    language = f77    define_macros = [('HAVE_CBLAS',None)]

编辑2
我试图用np.matmul(A,B,out = C)代替C = A @ B,并且时间缩短2倍,例如大约11ms.这真的很奇怪.

解决方法您的MATLAB代码使用浮点数组,但NumPy代码使用整数数组.这使得时间有显着差异.对于MATLAB和NumPy之间的“苹果到苹果”比较,Python / NumPy代码也必须使用浮点数组.

然而,这不是唯一重要的问题.在NumPy github网站上的issue 7569(以及issue 8957年)中讨论的NumPy存在缺陷. “堆叠”数组的矩阵乘法不使用快速BLAS例程来执行乘法.这意味着具有两个以上维度的数组的乘法可能比预期慢得多.

2-d数组的乘法确实使用快速例程,因此您可以通过在循环中将各个2-d数组相乘来解决此问题.令人惊讶的是,尽管有Python循环的开销,但在很多情况下,它比@,matmul或einsum更快地应用于完整堆叠阵列.

这是NumPy问题中显示的函数的变体,它在Python循环中执行矩阵乘法：

def xmul(A,B):    """    Multiply stacked matrices A (with shape (s,m,n)) by stacked    matrices B (with shape (s,n,p)) to produce an array with    shape (s,p).    Mathematically equivalent to A @ B,but faster in many cases.    The arguments are not valIDated.  The code assumes that A and B    are numpy arrays with the same data type and with shapes described    above.    """    out = np.empty((a.shape[0],a.shape[1],b.shape[2]),dtype=a.dtype)    for j in range(a.shape[0]):        np.matmul(a[j],b[j],out=out[j])    return out

我的NumPy安装也使用MKL(它是Anaconda发行版的一部分).下面是A @ B和xmul(A,B)的时序比较,使用浮点值数组：

In [204]: A = np.random.rand(100,30,40)In [205]: B = np.random.rand(100,40,50)In [206]: %timeit A @ B4.76 ms ± 6.37 µs per loop (mean ± std. dev. of 7 runs,100 loops each)In [207]: %timeit xmul(A,B)582 µs ± 35.9 µs per loop (mean ± std. dev. of 7 runs,1000 loops each)

即使xmul使用Python循环,它也只需要A @ B的1/8.

总结

以上是内存溢出为你收集整理的python – MATLAB矩阵乘法性能比NumPy快5倍全部内容，希望文章能够帮你解决python – MATLAB矩阵乘法性能比NumPy快5倍所遇到的程序开发问题。

如果觉得内存溢出网站内容还不错，欢迎将内存溢出网站推荐给程序员好友。

欢迎分享，转载请注明来源：内存溢出

原文地址:https://54852.com/langs/1192140.html

python – MATLAB矩阵乘法性能比NumPy快5倍

发表评论

评论列表（0条）