Guide: cosine similarity between two string lists in Python

First, build a dictionary (the technical term for a list of all the distinct words in a collection or corpus). Here A and B are the two input lists of words.

vocab = {}
i = 0

# loop through each list, find distinct words and map them to a
# unique number starting at zero

for word in A:
    if word not in vocab:
        vocab[word] = i
        i += 1


for word in B:
    if word not in vocab:
        vocab[word] = i
        i += 1

The vocab dictionary now maps each word to a unique number starting from zero. We will use these numbers as indices into an array (or vector).
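As a quick illustration, here is what the mapping looks like for two small word lists. The contents of A and B are made-up sample data, and `setdefault` is used only as a compact alternative to the explicit `if word not in vocab` check.

```python
# Hypothetical sample inputs (assumed for illustration only)
A = ["the", "cat", "sat", "on", "the", "mat"]
B = ["the", "dog", "sat"]

vocab = {}
for word in A + B:
    # setdefault stores the next free index only the first time a word is seen
    vocab.setdefault(word, len(vocab))

print(vocab)
```

Each distinct word gets exactly one index, so repeated words such as "the" do not enlarge the dictionary.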

In the next step we create something called a term-frequency vector for each input list. We will use a library called numpy here, which is a very popular way to do this kind of scientific computation. If you are interested in cosine similarity (or other machine learning techniques), it is well worth your time.

import numpy as np

# create a numpy array (vector) for each input, filled with zeros
a = np.zeros(len(vocab))
b = np.zeros(len(vocab))

# loop through each input and create a corresponding vector for it
# this vector counts occurrences of each word in the dictionary

for word in A:
    index = vocab[word] # get index from dictionary
    a[index] += 1 # increment count for that index

for word in B:
    index = vocab[word]
    b[index] += 1

The final step is the actual calculation of the cosine similarity.

# use numpy's dot product to calculate the cosine similarity
sim = np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b))

The variable sim now contains your answer. You can pull out each of these sub-expressions and verify that they match your original formula.
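To make that check concrete, the sub-expressions can be written out one by one. This is a minimal sketch in plain Python (no numpy) so each value can be verified by hand; the two count vectors are made-up sample data.

```python
import math

# Hypothetical term-frequency vectors (made-up counts over a 6-word vocabulary)
a = [2, 1, 1, 1, 1, 0]
b = [1, 0, 1, 0, 0, 1]

# Numerator: the dot product A.B
dot_ab = sum(x * y for x, y in zip(a, b))

# Denominator: the product of the two vector lengths ||A|| and ||B||
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(y * y for y in b))

sim = dot_ab / (norm_a * norm_b)

# The single-expression form, sqrt(A.A * B.B), gives the same value
sim_single = dot_ab / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

print(dot_ab, norm_a, norm_b, sim)
```

Taking the square root of the product of the two squared lengths is equivalent to multiplying the two lengths, which is why both forms agree.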

With a little refactoring, this technique scales quite well (to a large number of input lists with a relatively large number of words). For really large corpora (such as Wikipedia), you should check out the natural language processing libraries built for this kind of thing. Here are a few good ones.

  1. NLTK
  2. Gensim
  3. spaCy
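Before reaching for one of those libraries, the refactoring for many input lists might look something like the sketch below. It is one possible approach, not a prescribed one: each list is counted once with `collections.Counter`, and the counts are compared as sparse mappings instead of dense arrays. The function name and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity_counts(counts_a, counts_b):
    """Cosine similarity between two sparse word-count mappings."""
    # Only words present in both inputs contribute to the dot product
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention: an empty list is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample data: count each list once, then compare any pair
documents = {
    "doc1": ["the", "cat", "sat", "on", "the", "mat"],
    "doc2": ["the", "dog", "sat"],
    "doc3": ["completely", "different", "words"],
}
counts = {name: Counter(words) for name, words in documents.items()}

print(cosine_similarity_counts(counts["doc1"], counts["doc2"]))
print(cosine_similarity_counts(counts["doc1"], counts["doc3"]))
```

Because no shared vocabulary has to be built up front, adding another list only costs one more `Counter`.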

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them.
Similarity = (A · B) / (||A|| · ||B||), where A and B are vectors.
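As a small worked example, take two made-up vectors A = (1, 1, 0) and B = (1, 0, 1). Substituting them into the formula gives:

```latex
\[
\text{Similarity}
= \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
= \frac{1\cdot 1 + 1\cdot 0 + 0\cdot 1}{\sqrt{1^2 + 1^2 + 0^2}\,\sqrt{1^2 + 0^2 + 1^2}}
= \frac{1}{\sqrt{2}\,\sqrt{2}}
= 0.5
\]
```

A value of 0.5 corresponds to an angle of 60 degrees between the two vectors.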

Cosine similarity and the NLTK toolkit module are used in this program, so NLTK must be installed on your system. To install the NLTK module, follow the steps below:

1. Open a terminal (Linux).
2. sudo pip3 install nltk
3. python3
4. import nltk
5. nltk.download('all')

Functions used:

nltk.tokenize: Used for tokenization, the process by which a large quantity of text is divided into smaller parts called tokens. word_tokenize(X) splits the given sentence X into words and returns them as a list.

nltk.corpus: In this program, it is used to get a list of stop words. A stop word is a commonly used word (such as "the", "a", "an", "in").

Below is the Python implementation:
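The following is a minimal sketch of how these pieces can fit together, under stated assumptions rather than a definitive implementation. To keep it runnable without downloading NLTK data, plain `str.split` stands in for `word_tokenize` and a tiny hand-written `STOP_WORDS` set stands in for the NLTK stop-word list. Both stand-ins, the function names, and the example sentences are assumptions made for illustration.

```python
import math

# Hypothetical stand-in for NLTK's English stop-word list (far from complete)
STOP_WORDS = {"a", "an", "the", "in", "is", "out", "we", "i"}


def keywords(sentence):
    """Lowercase, split on whitespace, and drop stop words.

    str.split is a simplified stand-in for word_tokenize.
    """
    return {w for w in sentence.lower().split() if w not in STOP_WORDS}


def sentence_similarity(x, y):
    """Cosine similarity of the 0/1 keyword vectors of two sentences."""
    x_set, y_set = keywords(x), keywords(y)
    if not x_set or not y_set:
        return 0.0  # assumed convention when a sentence has no keywords
    # For 0/1 vectors the dot product is the number of shared keywords,
    # and each squared length is the number of keywords in that sentence
    shared = len(x_set & y_set)
    return shared / math.sqrt(len(x_set) * len(y_set))


# Made-up example sentences
X = "We really love horror movies"
Y = "Lights out is a horror movie"
print("similarity: ", sentence_similarity(X, Y))
```

Here the first sentence keeps four keywords, the second keeps three, and they share one ("horror"), so the similarity is 1 / sqrt(4 * 3).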

    Output:

    similarity:  0.2886751345948129
    


How do you find the cosine similarity between two strings in Python?

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them. Similarity = (A · B) / (||A|| · ||B||), where A and B are vectors.

How do you find the similarity between two lists in Python?

The standard approach to computing the similarity between two lists is to find the distinct elements and the common elements and compute their quotient. The result is then multiplied by 100 to get a percentage.
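One reading of this description is the number of common elements divided by the number of distinct elements across both lists (the Jaccard index), expressed as a percentage. The passage does not say exactly which quotient is meant, so the sketch below is an assumption, as are the function name and the sample lists.

```python
def list_similarity_percent(list_a, list_b):
    """Share of distinct elements that the two lists have in common, in percent."""
    set_a, set_b = set(list_a), set(list_b)
    distinct = set_a | set_b  # every distinct element across both lists
    common = set_a & set_b    # elements that appear in both lists
    if not distinct:
        return 0.0  # assumed convention for two empty lists
    return len(common) / len(distinct) * 100


# Made-up sample lists: 2 common elements out of 5 distinct ones
print(list_similarity_percent(["a", "b", "c", "d"], ["c", "d", "e"]))
```

Unlike cosine similarity on term-frequency vectors, this measure ignores how often each element occurs.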

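How do you compute cosine similarity in Python?

Python example:

import numpy as np

def cosine_similarity(x, y):
    # Vectors of different lengths cannot be compared (assumed handling; the snippet omits this branch)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dot_product = np.dot(x, y)
    magnitude_x = np.sqrt(np.sum(x ** 2))
    magnitude_y = np.sqrt(np.sum(y ** 2))
    return dot_product / (magnitude_x * magnitude_y)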

How do you find the similarity between two text files?

The simplest way to compute the similarity between two documents using word embeddings is to compute the document centroid vector. This is the vector that is the average of all the word vectors in the document. The centroids of the two documents can then be compared, for example with cosine similarity.
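A minimal sketch of the centroid idea follows. The three-dimensional `WORD_VECTORS` table is entirely made up for illustration; real embeddings would come from a trained model and have many more dimensions. The function names are also assumptions.

```python
import math

# Hypothetical 3-dimensional word vectors (made-up values, not from a real model)
WORD_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "pet": [0.7, 0.3, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.1, 0.0, 0.8],
}


def centroid(words):
    """Average the vectors of all known words in a document."""
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        raise ValueError("no known words in document")
    # zip(*vectors) groups the values dimension by dimension
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


doc1 = ["cat", "pet"]
doc2 = ["dog", "pet"]
doc3 = ["stock", "market"]
print(cosine(centroid(doc1), centroid(doc2)))  # related topics, close to 1
print(cosine(centroid(doc1), centroid(doc3)))  # unrelated topics, much lower
```

Averaging discards word order, so this works best as a quick baseline for comparing what documents are about.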
