Guide to using sparse arrays in Python

A very useful and relevant example is right there in the help!

import scipy.sparse as sp
help(sp)

This gives:

Example 2
---------

Construct a matrix in COO format:

>>> from scipy import sparse
>>> from numpy import array
>>> I = array([0,3,1,0])
>>> J = array([0,3,1,2])
>>> V = array([4,5,7,9])
>>> A = sparse.coo_matrix((V,(I,J)),shape=(4,4))

Also worth noting are the various constructors (again from the help):

    1. csc_matrix: Compressed Sparse Column format
    2. csr_matrix: Compressed Sparse Row format
    3. bsr_matrix: Block Sparse Row format
    4. lil_matrix: List of Lists format
    5. dok_matrix: Dictionary of Keys format
    6. coo_matrix: COOrdinate format (aka IJV, triplet format)
    7. dia_matrix: DIAgonal format

To construct a matrix efficiently, use either lil_matrix (recommended) or
dok_matrix. The lil_matrix class supports basic slicing and fancy
indexing with a similar syntax to NumPy arrays.  
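The recommendation above can be sketched as follows. This is a minimal illustration, not taken from the help text: it fills a lil_matrix one entry at a time, reusing the values from the COO example, and converts to CSR once construction is finished. The variable names are chosen here for illustration.

```python
import numpy as np
from scipy import sparse

# Start from an empty 4x4 matrix in List of Lists format,
# which is cheap to modify one entry at a time.
L = sparse.lil_matrix((4, 4), dtype=np.int64)
L[0, 0] = 4
L[3, 3] = 5
L[1, 1] = 7
L[0, 2] = 9

# Convert once the structure is final; CSR suits arithmetic and row slicing.
S = L.tocsr()
dense = S.toarray()
print(dense)
```

Building in LIL (or DOK) and converting at the end avoids repeatedly changing the sparsity structure of a CSR or CSC matrix, which is comparatively expensive.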

Your example would be as simple as:

S = sp.csr_matrix(A)
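As a quick check of that conversion, here is a small sketch that rebuilds the COO matrix from the help example, converts it to CSR, and looks at the row pointer array that gives the format its name. It assumes the same I, J and V arrays as above.

```python
import numpy as np
from scipy import sparse

I = np.array([0, 3, 1, 0])
J = np.array([0, 3, 1, 2])
V = np.array([4, 5, 7, 9])
A = sparse.coo_matrix((V, (I, J)), shape=(4, 4))

# Passing an existing sparse matrix to csr_matrix converts its format.
S = sparse.csr_matrix(A)

coo_dense = A.toarray()
csr_dense = S.toarray()

# indptr[i]:indptr[i+1] is the range of stored entries belonging to row i.
print(S.indptr)
```

Row 2 has no stored entries, so its two neighbouring indptr values are equal.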


Depending on the indexing, it might be easier to construct the extractor/indexing matrix with the coo style of inputs:

In [128]: import numpy as np
In [129]: from scipy import sparse
In [130]: M = sparse.csr_matrix(np.arange(16).reshape(4,4))
In [131]: M
Out[131]: 
<4x4 sparse matrix of type '<class 'numpy.int64'>'
    with 15 stored elements in Compressed Sparse Row format>
In [132]: M.A
Out[132]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

A square extractor matrix with the desired "diagonal" values:

In [133]: extractor = sparse.csr_matrix(([1,1],([0,3],[0,3])))
In [134]: extractor
Out[134]: 
<4x4 sparse matrix of type '<class 'numpy.int64'>'
    with 2 stored elements in Compressed Sparse Row format>

Matrix multiplication in one direction selects columns:

In [135]: M@extractor
Out[135]: 
<4x4 sparse matrix of type '<class 'numpy.int64'>'
    with 7 stored elements in Compressed Sparse Row format>
In [136]: _.A
Out[136]: 
array([[ 0,  0,  0,  3],
       [ 4,  0,  0,  7],
       [ 8,  0,  0, 11],
       [12,  0,  0, 15]])

and in the other, rows:

In [137]: extractor@M
Out[137]: 
<4x4 sparse matrix of type '<class 'numpy.int64'>'
    with 7 stored elements in Compressed Sparse Row format>
In [138]: _.A
Out[138]: 
array([[ 0,  1,  2,  3],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0],
       [12, 13, 14, 15]])
In [139]: extractor.A
Out[139]: 
array([[1, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 1]])

Indexing with M[[0,3],:] does the same thing, but with a non-square (2x4) extractor, so only the selected rows are kept:

In [140]: extractor = sparse.csr_matrix(([1,1],([0,1],[0,3])))
In [142]: (extractor@M).A
Out[142]: 
array([[ 0,  1,  2,  3],
       [12, 13, 14, 15]])
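The extractor idea can be wrapped in a small function. The sketch below is only an illustration: row_extractor is a hypothetical helper name, not part of SciPy. It builds the selection matrix from COO-style inputs and compares the product with plain fancy indexing.

```python
import numpy as np
from scipy import sparse

def row_extractor(rows, n_rows):
    """Hypothetical helper: a len(rows) x n_rows matrix with a single 1 per row."""
    rows = np.asarray(rows)
    ones = np.ones(len(rows), dtype=int)
    out_idx = np.arange(len(rows))
    return sparse.csr_matrix((ones, (out_idx, rows)), shape=(len(rows), n_rows))

M = sparse.csr_matrix(np.arange(16).reshape(4, 4))
E = row_extractor([0, 3], M.shape[0])

picked_rows = (E @ M).toarray()      # left-multiplying selects rows
picked_cols = (M @ E.T).toarray()    # right-multiplying by the transpose selects columns
indexed_rows = M[[0, 3], :].toarray()
print(picked_rows)
print(picked_cols)
```

Passing shape explicitly matters here: without it the number of columns is inferred from the largest index, which would be too small whenever the last row is not selected.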

Row and column sums are also performed with matrix multiplication:

In [149]: M@np.ones(4,int)
Out[149]: array([ 6, 22, 38, 54])
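For completeness, here is a short sketch of both sums done this way, using the same 4x4 matrix M. Multiplying M by a vector of ones gives the row sums; multiplying its transpose by the same vector gives the column sums. The results are compared against M.sum().

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.arange(16).reshape(4, 4))
ones = np.ones(4, dtype=int)

row_sums = M @ ones        # adds up the entries of each row
col_sums = M.T @ ones      # adds up the entries of each column

# M.sum(axis=...) returns a 2-D result, so flatten it before comparing.
row_check = np.asarray(M.sum(axis=1)).ravel()
col_check = np.asarray(M.sum(axis=0)).ravel()
print(row_sums, col_sums)
```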

Sometimes, while working with large sparse matrices in Python, you might want to select certain rows or certain columns of a sparse matrix. As we saw earlier, SciPy offers many types of sparse matrices, and each type is optimized for specific operations.

We will see examples of slicing a sparse matrix by row and by column. Basically, we will create a random sparse matrix and select a subset of its rows or columns using SciPy/NumPy in Python.

Let us load the modules needed.

from scipy import sparse
import numpy as np
from scipy import stats

Let us create a sparse random matrix using the random function of SciPy's sparse module. Here we generate a sparse random matrix of size 5 x 5 whose non-zero entries are drawn from a Poisson distribution.

A = sparse.random(5, 5,
                  density=0.5,
                  data_rvs=stats.poisson(10, loc=10).rvs)

We can see the contents of the sparse matrix with print() and the todense() function.

print(A.todense())

[[ 0. 18. 23. 19.  0.]
 [ 0. 20. 23.  0. 14.]
 [ 0.  0.  0. 17. 17.]
 [17.  0. 25.  0. 20.]
 [ 0. 22.  0.  0.  0.]]

Let us say we are interested in rows or columns with even indices.

select_ind = np.array([0,2,4])

How to Select Rows from a Sparse Matrix?

We can subset our original sparse matrix using a slice operation. The thing to note is that the sparse.random function creates a sparse matrix in COO format by default. However, a COO matrix is not friendly to slice operations.

So we first convert the COO sparse matrix to a CSR (Compressed Sparse Row format) matrix using the tocsr() function. Then we can slice the rows of the sparse matrix using the row indices array we created.

A.tocsr()[select_ind,:]

<3x5 sparse matrix of type '<class 'numpy.float64'>'
	with 6 stored elements in Compressed Sparse Row format>

We can see that after slicing we get a sparse matrix of size 3×5 in CSR format. To see the contents of the sliced sparse matrix, we can use the todense() function. Now we have just three rows instead of five.

A.tocsr()[select_ind,:].todense()

matrix([[ 0., 18., 23., 19.,  0.],
        [ 0.,  0.,  0., 17., 17.],
        [ 0., 22.,  0.,  0.,  0.]])
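The format claims in this section are easy to verify through the format attribute. The sketch below is a minimal check; it leaves out data_rvs and passes random_state=0 (an assumption made here so the sparsity pattern is repeatable), so the values differ from the matrix printed above.

```python
import numpy as np
from scipy import sparse

# random_state=0 is only for repeatability; the tutorial's matrix was unseeded.
A = sparse.random(5, 5, density=0.5, random_state=0)
select_ind = np.array([0, 2, 4])

print(A.format)              # format returned by sparse.random by default

A_csr = A.tocsr()            # convert before slicing
rows = A_csr[select_ind, :]
print(rows.format, rows.shape)
```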

How to Select Columns from a Sparse Matrix?

We can do the same for slicing columns of a sparse matrix. We first have to convert to a CSR or CSC matrix and then use a slice operation to select the columns we are interested in.

Let us use tocsr() like before and select the columns with even indices.

A.tocsr()[:,select_ind].todense()

matrix([[ 0., 23.,  0.],
        [ 0., 23., 14.],
        [ 0.,  0., 17.],
        [17., 25., 20.],
        [ 0.,  0.,  0.]])
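Since CSC is mentioned as an alternative, here is a small sketch of the same column selection through tocsc(), compared against the CSR route. It uses a seeded random matrix (random_state=0 is an assumption for repeatability), so the values are not those shown above.

```python
import numpy as np
from scipy import sparse

A = sparse.random(5, 5, density=0.5, random_state=0)
select_ind = np.array([0, 2, 4])

# CSC stores the matrix column by column, a natural fit for column selection.
cols_csc = A.tocsc()[:, select_ind]
cols_csr = A.tocsr()[:, select_ind]

same = bool((cols_csc.toarray() == cols_csr.toarray()).all())
print(cols_csc.shape, same)
```

Both routes return the same 5x3 result; which one is faster depends on the matrix and is worth timing on real data.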

Another option for slicing rows or columns of a sparse matrix that is not big is to convert it to a dense matrix and slice that. Obviously this approach is inefficient, or outright impossible, when the sparse matrix is large, because the dense copy stores every zero explicitly.
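A rough sketch of that trade-off follows. On a small matrix the dense route gives the same rows as sparse slicing. For a larger one, the memory needed by a dense copy is computed without actually allocating it and compared with the three arrays a CSR matrix stores. The sizes and the seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy import sparse

small = sparse.random(5, 5, density=0.5, format="csr", random_state=0)
select_ind = np.array([0, 2, 4])

via_dense = small.toarray()[select_ind, :]     # densify, then slice
via_sparse = small[select_ind, :].toarray()    # slice, then densify only the result
same = np.array_equal(via_dense, via_sparse)

big = sparse.random(1000, 1000, density=0.01, format="csr", random_state=0)
# CSR keeps only the stored values plus two index arrays.
sparse_bytes = big.data.nbytes + big.indices.nbytes + big.indptr.nbytes
# A dense copy would need one item per cell, zeros included.
dense_bytes = big.shape[0] * big.shape[1] * big.dtype.itemsize
print(same, sparse_bytes, dense_bytes)
```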