
Removing Words from a Python DataFrame


For this post, the pre-processed Amazon Unlocked Mobile dataset from the statistics platform “Kaggle” was used. In addition, I use the final state of the example string and the saved frequency tables that I created in my previous post. You can download all files from my “GitHub Repository”.

2 Import the Libraries and the Data

import pandas as pd
import numpy as np

import pickle as pk

import warnings
warnings.filterwarnings("ignore")


from bs4 import BeautifulSoup
import unicodedata
import re

from nltk.tokenize import word_tokenize
from nltk.tokenize import sent_tokenize

from nltk.corpus import stopwords


from nltk.corpus import wordnet
from nltk import pos_tag
from nltk import ne_chunk

from nltk.stem.porter import PorterStemmer
from nltk.stem.wordnet import WordNetLemmatizer

from nltk.probability import FreqDist
import matplotlib.pyplot as plt
from wordcloud import WordCloud

pd.set_option('display.max_colwidth', 30)

df = pd.read_csv('Amazon_Unlocked_Mobile_small_Part_V.csv')
df.head(3).T

df['Reviews_cleaned_wo_single_char'] = df['Reviews_cleaned_wo_single_char'].astype(str)

# Load the example string saved in the previous post; the with block closes the file again
with open("clean_text_wo_single_char.pkl", 'rb') as f:
    clean_text_wo_single_char = pk.load(f)
clean_text_wo_single_char

In addition to the DataFrame and the example string, we now load the frequency tables saved in the previous post, NLP - Text Pre-Processing V (Text Exploration).

df_most_common_words = pd.read_csv('df_most_common_words.csv')
df_least_common_words = pd.read_csv('df_least_common_words.csv')
df_most_common_words_text_corpus = pd.read_csv('df_most_common_words_text_corpus.csv')
df_least_common_words_text_corpus = pd.read_csv('df_least_common_words_text_corpus.csv')
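The frequency tables can be turned into the plain word lists that the removal functions below expect. The following is a minimal sketch, assuming the saved tables have the columns `Word` and `Frequency` (the same layout that `most_freq_word_func` builds). The toy DataFrame stands in for `df_most_common_words`, its values are made up, and `top_words_from_table` is a hypothetical helper.

```python
import pandas as pd

# Toy stand-in for df_most_common_words; the column names 'Word' and
# 'Frequency' are an assumption mirroring the layout built in most_freq_word_func
freq_table = pd.DataFrame({'Word': ['phone', 'great', 'battery', 'good'],
                           'Frequency': [120, 95, 60, 58]})

def top_words_from_table(table, n_words=3):
    '''Hypothetical helper: returns the n_words most frequent words of a frequency table.'''
    return (table.sort_values(by='Frequency', ascending=False)
                 .head(n_words)['Word']
                 .tolist())

words_2_remove_list = top_words_from_table(freq_table, n_words=2)
print(words_2_remove_list)  # ['phone', 'great']
```

The resulting list could then be passed as `words_2_remove_list` to `multiple_word_remove_func`.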

3 Definition of Required Functions

All functions are summarized here. Where they are used in this post, I will show them again only if they are new and have not been explained yet.

def word_count_func(text):
    '''
    Counts words within a string
    
    Args:
        text (str): String to which the function is to be applied, string
    
    Returns:
        Number of words within a string, integer
    ''' 
    return len(text.split())
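Because `word_count_func` uses `str.split()` without arguments, runs of whitespace are collapsed and punctuation stays attached to the neighboring word. Its count can therefore differ from the number of tokens `word_tokenize()` returns, since that function would also return `,` and `!` as separate tokens. Here is a small stdlib illustration; the sample sentence is made up.

```python
def word_count_func(text):
    # Same logic as above: split on any run of whitespace and count the pieces
    return len(text.split())

sample = "Great phone,   battery lasts  all day!"
# 'phone,' and 'day!' keep their punctuation, so each counts as one word
print(word_count_func(sample))  # 6
```

On the DataFrame, the function would be applied row by row, for example with `df['Reviews_cleaned_wo_single_char'].apply(word_count_func)`.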

def single_word_remove_func(text, word_2_remove):
    '''
    Removes a specific word from string, if present
    
    Step 1: Use word_tokenize() to get tokens from string
    Step 2: Removes the defined word from the created tokens
    
    Args:
        text (str): String to which the function is to be applied, string
        word_2_remove (str): Word to be removed from the text, string
    
    Returns:
        String with removed words
    '''    
    words = word_tokenize(text)
    # Keep every token that is not exactly (case-sensitively) the word to remove
    text = ' '.join([word for word in words if word != word_2_remove])
    return text
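Two properties of `single_word_remove_func` are easy to miss. The comparison is exact and case-sensitive, so `Phone` survives when `phone` is removed. Also, because the tokens are re-joined with single spaces, punctuation ends up separated from the preceding word. The sketch below shows both. As an assumption, it uses a simple regex tokenizer in place of `word_tokenize()` so that it runs without downloading NLTK data. This tokenizer is not equivalent to NLTK's in every case (for example, contractions). `simple_tokenize` and `single_word_remove_sketch` are hypothetical names.

```python
import re

def simple_tokenize(text):
    # Rough stand-in for nltk's word_tokenize (an assumption, not equivalent):
    # runs of \w characters are words, every other non-space character is its own token
    return re.findall(r"\w+|[^\w\s]", text)

def single_word_remove_sketch(text, word_2_remove):
    # Keep every token that is not exactly equal to the word to remove
    return ' '.join(token for token in simple_tokenize(text) if token != word_2_remove)

print(single_word_remove_sketch("The phone is good. Phone works", "phone"))
# The is good . Phone works
```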

def multiple_word_remove_func(text, words_2_remove_list):
    '''
    Removes certain words from string, if present
    
    Step 1: Use word_tokenize() to get tokens from string
    Step 2: Removes the defined words from the created tokens
    
    Args:
        text (str): String to which the function is to be applied, string
        words_2_remove_list (list): Words to be removed from the text, list of strings
    
    Returns:
        String with removed words
    '''     
    words = word_tokenize(text)
    # Keep every token that does not appear in the removal list
    text = ' '.join([word for word in words if word not in words_2_remove_list])
    return text
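`multiple_word_remove_func` checks each token with `in` against a list, which scans the whole list for every token. With long removal lists, a set gives average constant-time lookups instead. The following is a hypothetical variant, not part of the original post, that also offers optional case-insensitive matching. As in the previous sketch, a regex tokenizer stands in for `word_tokenize()` as an assumption.

```python
import re

def multiple_word_remove_set_func(text, words_2_remove_list, ignore_case=False):
    # Hypothetical variant: build a set once for fast membership tests
    if ignore_case:
        words_to_remove = {word.lower() for word in words_2_remove_list}
    else:
        words_to_remove = set(words_2_remove_list)
    # Rough regex stand-in for word_tokenize (assumption, not identical)
    tokens = re.findall(r"\w+|[^\w\s]", text)
    kept = [token for token in tokens
            if (token.lower() if ignore_case else token) not in words_to_remove]
    return ' '.join(kept)

text = "Phone is great and the phone battery is great"
print(multiple_word_remove_set_func(text, ['phone', 'great']))
# Phone is and the battery is
print(multiple_word_remove_set_func(text, ['phone', 'great'], ignore_case=True))
# is and the battery is
```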

def most_freq_word_func(text, n_words=5):
    '''
    Returns the most frequently used words from a text
    
    Step 1: Use word_tokenize() to get tokens from string
    Step 2: Uses the FreqDist function to determine the word frequency
    
    Args:
        text (str): String to which the function is to be applied, string
        n_words (int): Number of words to return, integer (default = 5)
    
    Returns:
        List of the most frequently occurring words (by default = 5)
    ''' 
    words = word_tokenize(text)
    fdist = FreqDist(words) 
    
    df_fdist = pd.DataFrame({'Word': list(fdist.keys()),
                             'Frequency': list(fdist.values())})
    df_fdist = df_fdist.sort_values(by='Frequency', ascending=False)
    
    # .iloc selects by position, so the index reordered by sorting does not matter
    most_freq_words_list = list(df_fdist['Word'].iloc[0:n_words])
    
    return most_freq_words_list
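The same ranking can be obtained without building a DataFrame. NLTK's `FreqDist` subclasses `collections.Counter`, so `fdist.most_common(n_words)` is available directly. The stdlib sketch below uses `Counter` on a pre-tokenized list. `most_freq_word_counter` is a hypothetical name, and the tokens are made up.

```python
from collections import Counter

def most_freq_word_counter(tokens, n_words=5):
    # Hypothetical alternative to most_freq_word_func: most_common() sorts by
    # count (descending); words with equal counts keep first-seen order
    return [word for word, _ in Counter(tokens).most_common(n_words)]

tokens = "good phone good price bad phone good".split()
print(most_freq_word_counter(tokens, n_words=2))  # ['good', 'phone']
```

One design difference: `Counter.most_common` breaks ties by first occurrence. `sort_values` uses a non-stable sort by default, so tied words may come back in a different order via the DataFrame route.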

pd.set_option('display.max_colwidth', 30)

4 Text Pre-Processing