Python: reading a CSV file from the bottom up

This works much like reading a text file backwards: read the whole file into a list, then iterate over it in reverse:

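import csv

with open('test.csv', newline='') as textfile:
    # csv.reader yields rows top to bottom; list() materialises them so reversed() can walk back
    for row in reversed(list(csv.reader(textfile))):
        print(', '.join(row))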

For something more elaborate, you could write code that reads blocks from the end of the file and works backwards, emitting one line at a time, and feed those lines to csv.reader. However, this only works with files that support seeking, such as disk files. It does not work with standard input.

Some of us have files that don't fit in memory. Could anyone come up with a solution that does not require storing the entire file in memory?

That's a bit trickier. Fortunately, csv.reader only needs an iterator-like object that returns a string (a line) on each call to next(). So we can borrow the technique Darius Bacon presented in "Most efficient way to search the last x lines of a file in python" to read a file's lines backwards without loading the whole file:

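import os

def reversed_lines(file, encoding='utf-8'):
    "Generate the lines of a file opened in binary mode ('rb'), in reverse order."
    part = bytearray()              # current line, collected back to front
    for block in reversed_blocks(file):
        for c in reversed(block):   # iterating over bytes yields ints
            if c == ord('\n') and part:
                # decode only complete lines, so a multi-byte character
                # split across two blocks is reassembled first
                yield part[::-1].decode(encoding)
                part = bytearray()
            part.append(c)
    if part:
        yield part[::-1].decode(encoding)

def reversed_blocks(file, blocksize=4096):
    "Generate blocks of the file's contents in reverse order."
    file.seek(0, os.SEEK_END)
    here = file.tell()
    while 0 < here:
        delta = min(blocksize, here)
        here -= delta
        file.seek(here, os.SEEK_SET)
        yield file.read(delta)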

Then pass the output of reversed_lines to csv.reader, so the lines arrive already reversed. This removes the need for reversed and list:

import csv

with open('test.csv', 'rb') as textfile:   # binary mode, so seek()/tell() work with byte offsets
    for row in csv.reader(reversed_lines(textfile)):
        print(', '.join(row))

A more Pythonic solution is possible. It would avoid reversing each block character by character in memory. (Hint: build a list of the positions where lines end in the block, reverse that list, and use it to slice the block.) It would then use chain from itertools to join the line fragments from successive blocks. That is left as an exercise for the reader.
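The following sketch is one possible approach to that exercise. It works on a file opened in binary mode and uses bytes.split() to find the line ends in each block and slice them out, instead of reversing characters one by one. Unlike the hint, it keeps the unfinished line in a variable rather than using itertools.chain. It returns lines without their trailing newline and does not handle newlines inside quoted fields. The name reversed_lines_split is made up for this example.

```python
import csv
import io
import os

def reversed_lines_split(file, blocksize=4096):
    """Yield the lines of a binary-mode file, last line first, without the newline.

    Each block is split at its newline positions instead of being reversed
    byte by byte. Newlines inside quoted CSV fields are NOT handled.
    """
    file.seek(0, os.SEEK_END)
    here = file.tell()
    tail = b""            # unfinished line whose start lies in an earlier block
    first_block = True    # the first block read is the end of the file
    while here > 0:
        delta = min(blocksize, here)
        here -= delta
        file.seek(here, os.SEEK_SET)
        pieces = (file.read(delta) + tail).split(b"\n")
        tail = pieces.pop(0)              # may continue in the preceding block
        if first_block and pieces and pieces[-1] == b"":
            pieces.pop()                  # nothing follows the file's final newline
        first_block = False
        yield from reversed(pieces)       # complete lines, last one first
    if not first_block:
        yield tail                        # the file's first line

data = b"a,1\nb,2\nc,3\n"
lines = list(reversed_lines_split(io.BytesIO(data), blocksize=3))
rows = list(csv.reader(line.decode("utf-8")
                       for line in reversed_lines_split(io.BytesIO(data), blocksize=3)))
print(lines)  # [b'c,3', b'b,2', b'a,1']
print(rows)   # [['c', '3'], ['b', '2'], ['a', '1']]
```

Lines from files with \r\n endings keep a trailing b'\r', which you may want to strip.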

Note that the reversed_lines() idiom above only works if no field in the CSV file contains a newline.

Aargh! There's always something. Luckily, this isn't too hard to fix:

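def reversed_lines(file, encoding='utf-8'):
    "Generate the lines of a binary-mode file in reverse order, ignoring newlines inside quotes."
    part = bytearray()              # current line, collected back to front
    quoting = False                 # True while scanning inside a quoted field
    for block in reversed_blocks(file):
        for c in reversed(block):   # iterating over bytes yields ints
            if c == ord('"'):
                quoting = not quoting      # a doubled "" toggles twice, so escaped quotes cancel out
            elif c == ord('\n') and part and not quoting:
                yield part[::-1].decode(encoding)
                part = bytearray()
            part.append(c)
    if part:
        yield part[::-1].decode(encoding)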

Of course, you'll need to change the quote character if your CSV dialect doesn't use the double quote (").
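If your file uses a different quote character, you can pass it as a parameter instead of hard-coding '"'. The sketch below is a self-contained variant of the code above that does this. The name reversed_csv_lines and the quotechar parameter are specific to this example. It reads the file in binary mode and assumes that the quote character is ASCII and that quotes inside fields are escaped by doubling them.

```python
import csv
import io
import os

def reversed_blocks(file, blocksize=4096):
    """Yield the contents of a binary-mode file in blocks, last block first."""
    file.seek(0, os.SEEK_END)
    here = file.tell()
    while here > 0:
        delta = min(blocksize, here)
        here -= delta
        file.seek(here, os.SEEK_SET)
        yield file.read(delta)

def reversed_csv_lines(file, quotechar='"', encoding="utf-8", blocksize=4096):
    """Yield the lines of a binary-mode CSV file, last line first.

    Newlines inside fields quoted with `quotechar` do not end a line.
    Assumes an ASCII quote character and quotes escaped by doubling.
    """
    quote, newline = ord(quotechar), ord("\n")
    part = bytearray()      # current line, collected back to front
    quoting = False         # True while scanning inside a quoted field
    for block in reversed_blocks(file, blocksize):
        for c in reversed(block):          # iterating over bytes yields ints
            if c == quote:
                quoting = not quoting
            elif c == newline and part and not quoting:
                yield part[::-1].decode(encoding)
                part = bytearray()
            part.append(c)
    if part:
        yield part[::-1].decode(encoding)

data = b"id,note\n1,'single'\n2,'two\nlines'\n"
lines = reversed_csv_lines(io.BytesIO(data), quotechar="'", blocksize=5)
rows = list(csv.reader(lines, quotechar="'"))   # csv.reader needs the same quotechar
print(rows)  # [['2', 'two\nlines'], ['1', 'single'], ['id', 'note']]
```

Because lines are decoded only once they are complete, a multi-byte character that straddles two blocks is still decoded correctly.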

This tutorial explains how to read a CSV file in Python using the read_csv() function from the pandas package. Without read_csv(), importing a CSV file with plain object-oriented Python takes considerably more work. Pandas is a powerful Python package for data manipulation, and it can load and import data from many formats. Here we cover how to deal with common issues when importing CSV files.

Install and load the pandas package

Make sure pandas is installed on your system. If you set up Python with Anaconda, pandas is already included, so you don't need to install it again. Otherwise, install it with the command pip install pandas. Next, load the package by running the following command:

import pandas as pd

pd is an alias for the pandas package. We will use it instead of the full name "pandas".

Create sample data for import

The program below creates a sample pandas DataFrame that we will use in the examples that follow.

dt = {'ID': [11, 12, 13, 14, 15],
      'first_name': ['David', 'Jamie', 'Steve', 'Stevart', 'John'],
      'company': ['Aon', 'TCS', 'Google', 'RBS', '.'],
      'salary': [74, 76, 96, 71, 78]}
mydt = pd.DataFrame(dt, columns=['ID', 'first_name', 'company', 'salary'])

The sample data looks like this:

  ID first_name company  salary
0  11      David     Aon      74
1  12      Jamie     TCS      76
2  13      Steve  Google      96
3  14    Stevart     RBS      71
4  15       John       .      78

Save the data as a CSV file in the working directory

Check the working directory before you save your data file:

import os
os.getcwd()

To change the working directory, pass the new path to os.chdir(). A single backslash does not work in a normal Python string because it starts an escape sequence. For example, "C:\Users" is a syntax error because of \U. So use two backslashes when specifying the file location.

os.chdir("C:\\Users\\DELL\\Documents\\")
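If doubling every backslash feels error-prone, a raw string or pathlib's PureWindowsPath produces the same path. The sketch below uses the example folder from above. Note that a raw string cannot end with a single backslash, so the trailing separator is omitted here.

```python
from pathlib import PureWindowsPath

escaped = "C:\\Users\\DELL\\Documents"                     # each \\ is one backslash
raw = r"C:\Users\DELL\Documents"                           # raw string: backslashes kept literally
built = str(PureWindowsPath("C:/Users/DELL/Documents"))   # forward slashes become backslashes

print(escaped == raw == built)  # True
```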

The following command writes the data in CSV format to your working directory. index=False stops pandas from writing the row labels (0 to 4) as an extra column.

mydt.to_csv('workingfile.csv', index=False)

Example 1: Read a CSV file with a header row

This is the basic syntax of read_csv(). You only need to pass the file name. By default, it assumes the column names are in the first row of the CSV file.

mydata = pd.read_csv("workingfile.csv")

The data is stored correctly because the headers are in the first row of our data file. Effectively, header=0 is the default. Strictly, the default is header='infer', which behaves like header=0 when no names= argument is passed. So you don't need to specify the header= parameter. header=0 means the header is in the first row, because Python indexing starts at 0. The code above is equivalent to this line:

mydata = pd.read_csv("workingfile.csv", header=0)
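You can check this equivalence without touching the disk. The sketch below reads the same content from an in-memory io.StringIO object. CSV_TEXT is assumed to match what mydt.to_csv('workingfile.csv', index=False) writes for the sample data.

```python
import io

import pandas as pd

# Assumed to match the workingfile.csv written above with index=False
CSV_TEXT = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

default = pd.read_csv(io.StringIO(CSV_TEXT))             # header inferred
explicit = pd.read_csv(io.StringIO(CSV_TEXT), header=0)  # header stated explicitly

print(default.equals(explicit))  # True
print(default.shape)             # (5, 4)
print(list(default.columns))     # ['ID', 'first_name', 'company', 'salary']
```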

Inspect the data after importing

mydata.shape
mydata.columns

mydata.shape returns (5, 4), meaning 5 rows and 4 columns. The column names are ['ID', 'first_name', 'company', 'salary'].

Next, check the column types of the imported data. first_name and company are character (text) variables, and the remaining variables are numeric.

mydata.dtypes

Example 2: Read a CSV file with the header in the second row

Suppose the column (variable) names are in the second row. To read this kind of CSV file, run the following command:

mydata1 = pd.read_csv("workingfile.csv", header=1)

header=1 tells Python to use the second row as the header. This is not a realistic example with our file; it only illustrates how to handle such a case. To make it realistic, add some random values in the first row of the CSV file and then import it again.

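Here is a sketch of the more realistic case described above. It uses a hypothetical file whose first line holds unrelated values, so the real column names are in the second row and header=1 picks them up. The data is read from an in-memory string for convenience.

```python
import io

import pandas as pd

# Hypothetical file: the first line holds random values, the real header is line 2
CSV_TEXT = """x1,x2,x3,x4
ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
"""

mydata1 = pd.read_csv(io.StringIO(CSV_TEXT), header=1)  # row 1 (the second line) is the header
print(list(mydata1.columns))  # ['ID', 'first_name', 'company', 'salary']
print(len(mydata1))           # 2
```

Rows above the header row are discarded, so the random first line does not end up in the data.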

Define your own column names instead of using the header row from the CSV file

mydata2 = pd.read_csv("workingfile.csv", skiprows=1, names=['CustID', 'Name', 'Companies', 'Income'])   # the names are only examples

skiprows=1 skips the first row, and the names= option assigns the variable names manually.


Example 3: Skip rows but keep the header

mydata = pd.read_csv("workingfile.csv", skiprows=[1, 2])

Here we skip the second and third rows while importing. Remember that Python indexing starts at 0, so 0 refers to the first row, 1 to the second row and 2 to the third row.


Instead of [1, 2] you can also write range(1, 3). Both mean the same thing, but range() is very useful when you need to skip many rows, because it saves you from listing every row position by hand.

The hidden secret of the skiprows option

skiprows=4 skips four rows from the top. skiprows=[1, 2, 3, 4] skips the second through fifth rows. The difference is that a list tells skiprows= to skip the rows at those index positions, whereas a single integer tells it to skip that many rows from the top.
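The difference is easy to see on the sample data. In this sketch, which reads the same content from an in-memory string, skiprows=4 also drops the header, so the fifth line becomes the header. The list and range() forms drop only data rows and keep the original header.

```python
import io

import pandas as pd

CSV_TEXT = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

def read(**kwargs):
    """Read the sample data from memory with the given read_csv options."""
    return pd.read_csv(io.StringIO(CSV_TEXT), **kwargs)

top4 = read(skiprows=4)                    # skips lines 0-3, header included
by_position = read(skiprows=[1, 2, 3, 4])  # skips lines 1-4, header kept
by_range = read(skiprows=range(1, 5))      # same as the list above

print(list(top4.columns))            # ['14', 'Stevart', 'RBS', '71']
print(by_position["ID"].tolist())    # [15]
print(by_position.equals(by_range))  # True
```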

Example 4: Read a CSV file without a header row

If you specify header=None, Python assigns the numbers 0 to (number of columns - 1) as the column names. Our data file has column names in its first row, so with header=None those names are read as an ordinary data row.

mydata0 = pd.read_csv("workingfile.csv", header=None)

In the output, the columns are labelled 0, 1, 2 and 3, and the original header line (ID, first_name, company, salary) appears as the first row of data.

Add a prefix to column names

mydata0 = pd.read_csv("workingfile.csv", header=None, prefix="var")

Here we set var as the prefix, which tells Python to put this keyword before each column name (var0, var1, var2, var3). The prefix= option was deprecated in pandas 1.4 and removed in pandas 2.0. On newer versions, read the file with header=None and then call the DataFrame's add_prefix("var") method instead.

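On pandas 2.0 and later, where prefix= no longer exists, you can get the same result with header=None followed by DataFrame.add_prefix(). Here is a sketch, with part of the sample data read from an in-memory string.

```python
import io

import pandas as pd

CSV_TEXT = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
"""

mydata0 = pd.read_csv(io.StringIO(CSV_TEXT), header=None)  # columns labelled 0..3
mydata0 = mydata0.add_prefix("var")                         # labels become var0..var3

print(list(mydata0.columns))     # ['var0', 'var1', 'var2', 'var3']
print(mydata0.iloc[0].tolist())  # ['ID', 'first_name', 'company', 'salary']
```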

Example 5: Specifying missing values

The na_values= option tells pandas which values to treat as blank (missing) while importing the CSV file. In our data, the "." in John's company field is read as NaN.

mydata00 = pd.read_csv("workingfile.csv", na_values=['.'])

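A quick check on two rows of the sample data, read from an in-memory string: without na_values=, the "." stays an ordinary string; with na_values=['.'], it becomes NaN.

```python
import io

import pandas as pd

CSV_TEXT = """ID,first_name,company,salary
14,Stevart,RBS,71
15,John,.,78
"""

as_text = pd.read_csv(io.StringIO(CSV_TEXT))
as_missing = pd.read_csv(io.StringIO(CSV_TEXT), na_values=["."])

print(as_text["company"].tolist())         # ['RBS', '.']
print(as_missing["company"].isna().sum())  # 1
```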

Example 6: Setting an index column

mydata01 = pd.read_csv("workingfile.csv", index_col='ID')


With index_col='ID', the ID column becomes the row index of the DataFrame instead of the default row numbers 0 to 4.

Example 7: Read a CSV file from an external URL

You can read data directly from a CSV file hosted at a web address. This is very handy for loading publicly available datasets from GitHub, Kaggle and other websites.

mydata02 = pd.read_csv("https://example.com/data.csv")   # hypothetical URL: replace it with the link to your CSV file

The dataset used in this example contains 2311 rows and 8 columns. mydata02.shape gives you this summary.

Example 8: Skip the last 5 rows while importing a CSV file

mydata04 = pd.read_csv("https://example.com/data.csv", skipfooter=5, engine='python')   # hypothetical URL

The code above excludes the bottom 5 rows using the skipfooter= parameter (called skip_footer= in older pandas versions). The default C engine does not support skipfooter=, so engine='python' is passed explicitly.
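The sample workingfile.csv has only five data rows, so this sketch skips the last two instead of five. It passes engine='python' explicitly, because the default C engine does not support skipfooter=. The data is read from an in-memory string.

```python
import io

import pandas as pd

CSV_TEXT = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

# skipfooter is handled by the Python parsing engine only
mydata04 = pd.read_csv(io.StringIO(CSV_TEXT), skipfooter=2, engine="python")
print(mydata04["ID"].tolist())  # [11, 12, 13]
```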

Example 9: Read only the first 5 rows

mydata05 = pd.read_csv("https://example.com/data.csv", nrows=5)   # hypothetical URL

The nrows= option loads only the top K rows.

Example 10: Interpreting "," as a thousands separator

mydata06 = pd.read_csv("https://example.com/data.csv", thousands=",")   # hypothetical URL

With thousands=",", a value such as "1,234" is read as the number 1234 instead of as a string.
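Here is a sketch with hypothetical data. When the comma is also the field separator, numbers like 1,234 must be quoted in the file, and thousands=',' then converts them to integers instead of leaving them as strings.

```python
import io

import pandas as pd

# Hypothetical data: the numbers contain commas, so they are quoted
CSV_TEXT = 'city,population\nA,"1,234"\nB,"56,789"\n'

as_text = pd.read_csv(io.StringIO(CSV_TEXT))
as_numbers = pd.read_csv(io.StringIO(CSV_TEXT), thousands=",")

print(as_text["population"].tolist())     # ['1,234', '56,789']
print(as_numbers["population"].tolist())  # [1234, 56789]
```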

Example 11: Read only specific columns

mydata07 = pd.read_csv("https://example.com/data.csv", usecols=[1, 5, 7])   # hypothetical URL

The code above reads only the columns at index positions 1, 5 and 7, that is, the second, sixth and eighth columns.

Example 12: Read some rows and columns

mydata08 = pd.read_csv("https://example.com/data.csv", usecols=[1, 5, 7], nrows=5)   # hypothetical URL

The command above combines the usecols= and nrows= options, so it reads only the first 5 rows of the selected columns.
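Here is the same combination applied to the sample data. usecols= accepts either positions or column names, and nrows= limits how many data rows are read. Positions 1 and 3 are chosen only for illustration. The sketch also shows that usecols= ignores the order of its elements: columns always come back in file order.

```python
import io

import pandas as pd

CSV_TEXT = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

subset = pd.read_csv(io.StringIO(CSV_TEXT), usecols=[1, 3], nrows=2)
by_name = pd.read_csv(io.StringIO(CSV_TEXT), usecols=["salary", "first_name"], nrows=2)

print(list(subset.columns))    # ['first_name', 'salary']
print(subset.equals(by_name))  # True
```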

Example 13: Read a file with a semicolon delimiter

mydata09 = pd.read_csv("file_path.csv", sep=';')   # hypothetical file name

The sep= parameter of read_csv() lets you import a file that uses any delimiter other than the default comma. Here we use a semicolon as the separator.
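Here is a sketch with a hypothetical semicolon-separated file held in memory. With sep=';', the three fields are split correctly. With the default comma separator, each whole line would stay in a single column.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated content
CSV_TEXT = "ID;first_name;salary\n11;David;74\n12;Jamie;76\n"

with_sep = pd.read_csv(io.StringIO(CSV_TEXT), sep=";")
default_sep = pd.read_csv(io.StringIO(CSV_TEXT))

print(with_sep.shape)     # (2, 3)
print(default_sep.shape)  # (2, 1)
```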

Example 14: Change a column's type while importing a CSV file

Suppose you want to change a column's type from int64 to float64 while loading the CSV file. You can use the dtype= option for this.

mydata12 = pd.read_csv("workingfile.csv", dtype={'salary': 'float64'})
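Here is a sketch using the salary column of the sample data. dtype= accepts a dictionary that maps column names to types, so only salary is converted and ID keeps its integer type.

```python
import io

import pandas as pd

CSV_TEXT = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
"""

default = pd.read_csv(io.StringIO(CSV_TEXT))
converted = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"salary": "float64"})

print(default["salary"].dtype)    # int64
print(converted["salary"].dtype)  # float64
print(converted["ID"].dtype)      # int64 (untouched)
```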

Example 15: Measure the time taken to import a large CSV file

With verbose=True, you can see how long tokenization, type conversion and parser memory cleanup take.

mydata13 = pd.read_csv("workingfile.csv", verbose=True)
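If you only need the total import time, rather than the per-stage breakdown, time.perf_counter() works on any pandas version. In this sketch, the 100,000-row CSV is generated in memory purely for illustration, so the timing you see will depend on your machine.

```python
import io
import time

import pandas as pd

# Hypothetical "big" file generated in memory: 100,000 rows of two integer columns
CSV_TEXT = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(100_000)) + "\n"

start = time.perf_counter()
big = pd.read_csv(io.StringIO(CSV_TEXT))
elapsed = time.perf_counter() - start

print(f"Read {len(big)} rows in {elapsed:.3f} seconds")
```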

Example 16: How to read a CSV file without pandas

To import a CSV file with pure Python, you can run the following commands:

import csv

with open("workingfile.csv", newline="") as f:
    reader = csv.reader(f)   # each row comes back as a list of strings
    for row in reader:
        print(row)

You can also download a CSV file from a URL or an external website and load it:

import csv
import io
import urllib.request

url = "https://example.com/data.csv"   # hypothetical URL: replace it with your own
with urllib.request.urlopen(url) as response:
    text = response.read().decode("utf-8")
rows = list(csv.reader(io.StringIO(text)))   # StringIO keeps quoted newlines intact

Conclusion

After working through this tutorial, I hope you feel confident importing CSV files into Python and cleaning and managing the data as you load it. You can also check out the tutorial on importing files in other formats into Python. After that, you should learn how to perform common data manipulation (wrangling) tasks on a pandas DataFrame, such as filtering, selecting and renaming columns, and identifying and removing duplicates.

How do I read the first row of a CSV file in Python?

Step 1: Open the CSV file with the open() function to get a file object.
Step 2: Create a reader object by passing that file object to csv.reader().
Step 3: Loop over the reader object with a for loop to get each row, or call next() on it once to get only the first row.
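Following those steps, here is a minimal sketch that stops after the first row by calling next() on the reader. The helper name read_first_row is illustrative, and the io.StringIO sample stands in for a real file.

```python
import csv
import io

def read_first_row(file):
    """Return the first row of an open CSV file as a list of strings (None if empty)."""
    reader = csv.reader(file)
    return next(reader, None)

# io.StringIO stands in for open("workingfile.csv", newline="")
sample = io.StringIO("ID,first_name,company,salary\n11,David,Aon,74\n")
first_row = read_first_row(sample)
print(first_row)  # ['ID', 'first_name', 'company', 'salary']
```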

How do I read the first column of a CSV file in Python?

Call pd.read_csv(file_name, sep=delimiter, usecols=cols_list), where file_name is the name of the CSV file, delimiter is its field separator, and cols_list is the list of columns to read. To read only the first column, pass usecols=[0].

What is the best way to read a CSV file in Python?

Reading a CSV file with Python:

Using the csv library:

import csv
with open("./bwq.csv", 'r') as file:
    csvreader = csv.reader(file)
    for row in csvreader:
        print(row)

Here we are importing the csv library in order to use …

Using the pandas library:

import pandas as pd
data = pd.read_csv("bwq.csv")
data

How do I read specific data from a CSV file in Python?

Steps to read a CSV file:

Import the csv library: import csv

Open the CSV file: …

Use a csv.reader object to read the CSV file: csvreader = csv.reader(file)

Extract the field names: create an empty list called header. …

Extract the rows/records. …

Close the file.
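Taken together, those steps could look like the sketch below. The function name read_csv_records and the small temporary file are illustrative. The header is taken directly with next() rather than by filling an empty list, and the with block takes care of closing the file.

```python
import csv
import os
import tempfile

def read_csv_records(path):
    """Open a CSV file, read its header and rows, and close it again."""
    with open(path, newline="") as file:   # the with block closes the file
        csvreader = csv.reader(file)
        header = next(csvreader)           # field names from the first row
        rows = list(csvreader)             # the remaining records
    return header, rows

# Demo with a small temporary file (hypothetical content)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "w", newline="") as f:
        f.write("ID,first_name\n11,David\n12,Jamie\n")
    header, rows = read_csv_records(path)

print(header)  # ['ID', 'first_name']
print(rows)    # [['11', 'David'], ['12', 'Jamie']]
```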