Guide: counting rows in a CSV file with Python

2018-10-29 EDIT

Thank you for the comments.

I benchmarked several ways of counting the lines in a CSV file to compare their speed. The fastest method is below.

with open(filename) as f:
    sum(1 for line in f)
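As a minimal sketch, the same idea can be wrapped in a reusable function. The name `count_lines` and the temporary sample file are assumptions made for illustration; they are not part of the benchmark.

```python
import os
import tempfile


def count_lines(filename):
    """Count lines by iterating over the file object (hypothetical helper)."""
    with open(filename) as f:
        # The generator yields 1 per line, so no line is kept in memory.
        return sum(1 for _ in f)


# Build a small throwaway CSV: one header line plus three data lines.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id,value\n1,a\n2,b\n3,c\n")
    sample_path = tmp.name

line_total = count_lines(sample_path)
os.remove(sample_path)
print(line_total)  # 4
```

The count includes the header line, so subtract 1 if only data rows are wanted.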

Here is the code tested.

import timeit
import csv
import pandas as pd

filename = './sample_submission.csv'

def talktime(filename, funcname, func):
    # Run funcname(filename) 100 times and report the mean time per call.
    print(f"# {funcname}")
    t = timeit.timeit(f'{funcname}("{filename}")', setup=f'from __main__ import {funcname}', number=100) / 100
    print('Elapsed time : ', t)
    print('n = ', func(filename))
    print('\n')

def sum1forline(filename):
    # Count lines lazily with a generator expression.
    with open(filename) as f:
        return sum(1 for line in f)
talktime(filename, 'sum1forline', sum1forline)

def lenopenreadlines(filename):
    # Read every line into a list, then take its length.
    with open(filename) as f:
        return len(f.readlines())
talktime(filename, 'lenopenreadlines', lenopenreadlines)

def lenpd(filename):
    # pandas does not count the header row, so add 1 to match the line count.
    return len(pd.read_csv(filename)) + 1
talktime(filename, 'lenpd', lenpd)

def csvreaderfor(filename):
    # Count the records parsed by csv.reader.
    cnt = 0
    with open(filename) as f:
        cr = csv.reader(f)
        for row in cr:
            cnt += 1
    return cnt
talktime(filename, 'csvreaderfor', csvreaderfor)

def openenum(filename):
    # Loop over enumerate(f, 1) and increment a counter for each line.
    cnt = 0
    with open(filename) as f:
        for i, line in enumerate(f, 1):
            cnt += 1
    return cnt
talktime(filename, 'openenum', openenum)
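The talktime helper builds a statement string for timeit and imports the function from `__main__`. One possible variation, sketched here under assumptions, passes a callable to `timeit.timeit` instead, which avoids the string formatting. The names `time_call` and `count_items` are hypothetical, and the workload is a stand-in rather than the real CSV file.

```python
import timeit


def time_call(func, *args, number=100):
    """Return the mean seconds per call of func(*args) (hypothetical helper)."""
    # timeit accepts a zero-argument callable as the statement to time.
    total = timeit.timeit(lambda: func(*args), number=number)
    return total / number


def count_items(items):
    # Stand-in workload: counts elements the same way sum1forline counts lines.
    return sum(1 for _ in items)


mean_seconds = time_call(count_items, range(1000), number=50)
print(f"Mean time per call: {mean_seconds:.6f} s")
```

Calling through a lambda adds a small constant overhead to every measurement, so it affects all candidates equally.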

The results were as follows.

# sum1forline
Elapsed time :  0.6327946722068599
n =  2528244


# lenopenreadlines
Elapsed time :  0.655304473598555
n =  2528244


# lenpd
Elapsed time :  0.7561274056295324
n =  2528244


# csvreaderfor
Elapsed time :  1.5571560935772661
n =  2528244


# openenum
Elapsed time :  0.773000013928679
n =  2528244

In conclusion, sum(1 for line in f) was the fastest, although its difference from len(f.readlines()) may not be significant.

sample_submission.csv is 30.2MB and has 31 million characters.
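All five methods returned the same count for this file, but counting lines and counting CSV records are not always the same thing. The sketch below uses made-up in-memory data (an assumption for illustration) in which one quoted field contains a line break: plain line counting sees four physical lines, while csv.reader sees three records.

```python
import csv
import io

# Header plus two records; the first record has a quoted field
# that spans two physical lines.
data = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

# Plain iteration splits on every newline, including the one inside quotes.
physical_lines = sum(1 for _ in io.StringIO(data))

# csv.reader keeps the quoted multi-line field inside a single record.
records = sum(1 for _ in csv.reader(io.StringIO(data, newline="")))

print(physical_lines)  # 4
print(records)         # 3
```

If a file may contain quoted multi-line fields, the slower csv.reader count is the one that matches the number of records.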


    CSV (Comma Separated Values) is a simple file format used to store tabular data, such as a spreadsheet or database. A CSV file stores tabular data (numbers and text) in plain text. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format.

    In this article, we are going to discuss various approaches to count the number of lines in a CSV file using Python.

    We will use the following dataset for all operations:

    Python3

    import pandas as pd

    results = pd.read_csv('Data.csv')
    print(results)

    Output:

    There are two ways to count the number of lines/rows in a CSV file:

    • Using the len() function.
    • Using a counter.

    Using len() function

    With this method, we read the CSV file using the pandas library and then call len() on the resulting DataFrame, which returns the number of rows as an int. When pandas reads the first line as the heading, that line is not included in the count.

    Python3

    import pandas as pd

    results = pd.read_csv('Data.csv')

    # len() on a DataFrame returns its number of rows.
    print("Number of lines present:-", len(results))

    Output:

    Using a counter

    With this approach, we initialise an integer rowcount, iterate through the whole file, and increment rowcount by one for each line; at the end we print its value. Starting from 0 counts every line, including the heading; starting from -1 leaves the heading out of the count.
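As a minimal sketch of the heading adjustment, the counter can start at -1 so that only data rows are counted. The in-memory sample below is a hypothetical stand-in for Data.csv, whose contents are not shown here.

```python
import io

# Hypothetical stand-in for Data.csv: one heading line plus three data rows.
sample = io.StringIO("name,score\nAnn,90\nBob,85\nCy,70\n")

rowcount = -1  # start at -1 so the heading line is not counted
for line in sample:
    rowcount += 1

print("Number of data rows:", rowcount)  # 3
```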

    Python3

    rowcount = 0

    # Open the file in a with block so it is closed after counting.
    with open("Data.csv") as f:
        for row in f:
            rowcount += 1

    print("Number of lines present:-", rowcount)

    Output:

