How do I check if two text files are the same in Python?

Yes, I think hashing the files is the best approach if you have to compare several files and store the hashes for later comparison. Because two different contents can (rarely) produce the same hash, a byte-by-byte comparison may still be needed, depending on the use case.
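A minimal sketch of the hashing approach, assuming you want to group several files by content. It reads each file in fixed-size chunks (64 KiB is an arbitrary choice) so large files are never fully loaded into memory, and uses SHA-256 from `hashlib`. The helper names `sha256_of_file` and `group_by_content` are hypothetical. Files that end up in the same group can still be confirmed byte by byte if collisions matter.

```python
import hashlib
from collections import defaultdict


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read() until it returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_content(paths):
    """Map each digest to the list of paths whose contents hash to it."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha256_of_file(path)].append(path)
    return dict(groups)
```

The digests returned by `sha256_of_file` are plain strings, so they can be stored and compared against later without reopening the original files.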

In general, a byte-by-byte comparison is sufficient and efficient, and the filecmp module already does this (and more).

See http://docs.python.org/library/filecmp.html. For example:

>>> import filecmp
>>> filecmp.cmp('file1.txt', 'file1.txt')
True
>>> filecmp.cmp('file1.txt', 'file2.txt')
False
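One detail worth knowing: `filecmp.cmp()` is shallow by default. If both files have identical `os.stat()` signatures (file type, size and modification time), it reports them as equal without reading their contents. The sketch below, using two hypothetical temporary files of equal size with their modification times forced to the same value, shows the difference that `shallow=False` makes.

```python
import filecmp
import os
import tempfile

# Two hypothetical files with the same size but different contents
tmp_dir = tempfile.mkdtemp()
path_a = os.path.join(tmp_dir, 'a.txt')
path_b = os.path.join(tmp_dir, 'b.txt')
with open(path_a, 'w') as f:
    f.write('abc\n')
with open(path_b, 'w') as f:
    f.write('xyz\n')

# Give both files the same modification time so their stat signatures match
for path in (path_a, path_b):
    os.utime(path, (1_000_000_000, 1_000_000_000))

print(filecmp.cmp(path_a, path_b))                 # True: signatures only
print(filecmp.cmp(path_a, path_b, shallow=False))  # False: contents are read
```

When you really need to know whether the contents match, passing `shallow=False` is the safer choice.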

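Speed consideration: if only two files have to be compared, hashing both and comparing the digests is usually slower than a simple, efficient byte-by-byte comparison, because hashing must read every byte of both files, while a direct comparison can stop at the first difference. The code below gives a rough timing of hashing versus byte-by-byte comparison.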

Disclaimer: this is not the best way to time or compare two algorithms, and it could be improved, but it does give a rough idea. If you think it should be improved, let me know and I will change it.

import random
import string
import hashlib
import time

def getRandText(N):
    return "".join(random.choice(string.printable) for i in range(N))

N = 1000000
# hashlib works on bytes in Python 3, so encode the generated text
randText1 = getRandText(N).encode()
randText2 = getRandText(N).encode()

def cmpHash(text1, text2):
    # Hash each input separately, then compare the two digests
    hash1 = hashlib.md5()
    hash1.update(text1)
    hash1 = hash1.hexdigest()

    hash2 = hashlib.md5()
    hash2.update(text2)
    hash2 = hash2.hexdigest()

    return hash1 == hash2

def cmpByteByByte(text1, text2):
    return text1 == text2

for cmpFunc in (cmpHash, cmpByteByByte):
    st = time.perf_counter()
    for i in range(10):
        cmpFunc(randText1, randText2)
    print(cmpFunc.__name__, time.perf_counter() - st)

and the output on the author's machine was (timings vary between machines and runs):

cmpHash 0.234999895096
cmpByteByByte 0.0
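Part of the reason the byte-by-byte timing looks free is that two random texts almost always differ at the very first character, so `==` can stop immediately. A slightly fairer sketch, assuming Python 3, times both approaches on two equal but distinct 1 MB byte strings (the worst case for `==`, since every byte has to be checked) using `timeit`, whose default timer is `time.perf_counter`. No timings are shown because they depend on the machine.

```python
import hashlib
import os
import timeit

N = 1_000_000
data1 = os.urandom(N)
# A distinct but equal copy: comparing an object with itself could short-circuit
data2 = bytes(bytearray(data1))


def cmp_hash(b1, b2):
    # Hash both inputs and compare the digests
    return hashlib.md5(b1).digest() == hashlib.md5(b2).digest()


def cmp_byte_by_byte(b1, b2):
    return b1 == b2


for func in (cmp_hash, cmp_byte_by_byte):
    # Run each comparison 10 times and report the total elapsed seconds
    seconds = timeit.timeit(lambda: func(data1, data2), number=10)
    print(func.__name__, seconds)
```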


    In Python, there are many ways to make this comparison. In this article, we'll find out how to compare two different files line by line. Python provides several modules for this, and here we discuss approaches using them.

    This article uses two sample files for implementation.

    Files in use:

    • file1.txt

        Learning
        Python
        is
        too
        simple.

    • file2.txt

        Learning
        Python
        is
        so
        easy.

    Method 1: Using unified_diff()

    Python's standard library includes the difflib module, which is designed for comparing sequences such as the lines of two files. To get the differences as a unified diff, call its unified_diff() function.

    Syntax:

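    difflib.unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')

    Parameters:

    • a, b: the two lists of strings (lines) to compare, such as file_1_text and file_2_text
    • fromfile: name shown for the first file in the --- header line
    • tofile: name shown for the second file in the +++ header line
    • n: number of context lines shown around each change (default 3)
    • lineterm: terminator added to the generated control lines (---, +++, @@); set it to '' when the input lines have no trailing newlines, so that the output is uniformly newline free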

    Approach

    • Import module
    • Open files
    • Compare using unified_diff() with the appropriate arguments

    Example:

    Python3

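    import difflib

    # Read each file as a list of lines without trailing newlines
    with open('file1.txt') as file_1:
        file_1_text = file_1.read().splitlines()

    with open('file2.txt') as file_2:
        file_2_text = file_2.read().splitlines()

    # lineterm='' keeps the ---, +++ and @@ control lines newline free as well
    for line in difflib.unified_diff(
            file_1_text, file_2_text, fromfile='file1.txt',
            tofile='file2.txt', lineterm=''):
        print(line)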

    Output:

    --- file1.txt
    +++ file2.txt
    @@ -1,5 +1,5 @@
     Learning
     Python
     is
    -too
    -simple.
    +so
    +easy.
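Because unified_diff() yields nothing at all when the two sequences are equal, the same call can also answer the original yes/no question. A minimal sketch, comparing in-memory lists of lines; the function name `lines_identical` is hypothetical. For a plain yes/no, `file_1_text == file_2_text` is simpler, so this is mainly useful when you will print the diff anyway.

```python
import difflib


def lines_identical(lines_a, lines_b):
    """Return True when unified_diff() finds no differences."""
    diff = difflib.unified_diff(lines_a, lines_b, lineterm='')
    # The generator produces no lines at all for equal inputs
    return next(diff, None) is None


file_1_text = ['Learning', 'Python', 'is', 'too', 'simple.']
file_2_text = ['Learning', 'Python', 'is', 'so', 'easy.']
print(lines_identical(file_1_text, file_1_text))  # True
print(lines_identical(file_1_text, file_2_text))  # False
```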

    Method 2: Using Differ

    The difflib library also provides a class named Differ for comparing sequences of lines of text and producing human-readable differences, or deltas. Each line of its output begins with a two-letter code:

    Code    Meaning
    '- '    line unique to sequence 1
    '+ '    line unique to sequence 2
    '  '    line common to both sequences
    '? '    line not present in either input sequence
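The '? ' lines only appear when a changed line is similar enough to its replacement; according to the difflib documentation they guide the eye to intraline differences. A small sketch with two hypothetical lines that differ by one character:

```python
from difflib import Differ

# Two hypothetical one-line "files" that differ only in the final punctuation
before = ['Learning Python is too simple.\n']
after = ['Learning Python is too simple!\n']

result = list(Differ().compare(before, after))
for line in result:
    # Each delta line already ends with '\n'
    print(line, end='')
```

The output contains a '- ' line and a '+ ' line, each followed by a '? ' line whose '^' marker sits under the changed character.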

    Approach

    • Import module
    • Open files
    • Read the contents line by line
    • Call the compare() method of a Differ object

    Example:

    Python3

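    from difflib import Differ

    with open('file1.txt') as file_1, open('file2.txt') as file_2:
        differ = Differ()
        # compare() expects lists of lines that keep their trailing newlines
        for line in differ.compare(file_1.readlines(), file_2.readlines()):
            # Delta lines carry their own '\n', so strip it to avoid blank lines
            print(line.rstrip('\n'))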

    Output:

      Learning
      Python
      is
    - too
    - simple.
    + so
    + easy.

    Method 3: Using a while loop and the intersection() method

    Approach

    • Open both files in read mode
    • Read the lines of each file as a list of strings
    • Find the lines common to both files with the set intersection() method
    • Compare both files line by line for differences using a while loop
    • Close both files

    Example:

    Python3

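    # Print a header naming the two files and their markers
    print("Comparing files ", " @ " + 'file1.txt', " # " + 'file2.txt', sep='\n')
    print()

    # Read both files once, as lists of lines without trailing newlines
    with open('file1.txt') as file1, open('file2.txt') as file2:
        lines_1 = file1.read().splitlines()
        lines_2 = file2.read().splitlines()

    # intersection() finds lines present in both files; sets are unordered,
    # so sort the result by first appearance in file1.txt for stable output
    same = set(lines_1).intersection(lines_2)
    print("Common Lines in Both Files")
    for line in sorted(same, key=lines_1.index):
        print(line)
    print()

    print("Difference Lines in Both Files")
    with open('file1.txt') as file_1, open('file2.txt') as file_2:
        file_1_line = file_1.readline()
        file_2_line = file_2.readline()
        line_no = 1
        # readline() returns '' only at end of file, so stop when both are exhausted
        while file_1_line != '' or file_2_line != '':
            file_1_line = file_1_line.rstrip()
            file_2_line = file_2_line.rstrip()
            if file_1_line != file_2_line:
                # '@' marks file1.txt; '@-' is a line that differs from file2.txt
                if file_1_line == '':
                    print("@", "Line-%d" % line_no, file_1_line)
                else:
                    print("@-", "Line-%d" % line_no, file_1_line)
                # '#' marks file2.txt; '#+' is a line that differs from file1.txt
                if file_2_line == '':
                    print("#", "Line-%d" % line_no, file_2_line)
                else:
                    print("#+", "Line-%d" % line_no, file_2_line)
                print()
            file_1_line = file_1.readline()
            file_2_line = file_2.readline()
            line_no += 1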

    Output:

    Comparing files 
     @ file1.txt
     # file2.txt

    Common Lines in Both Files
    Learning
    Python
    is

    Difference Lines in Both Files
    @- Line-4 too
    #+ Line-4 so

    @- Line-5 simple.
    #+ Line-5 easy.
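The same line-by-line walk can be written without manual readline() calls by pairing lines with itertools.zip_longest. This is a swapped-in alternative to the while loop above, not part of the original method; the function name `differing_lines` is hypothetical, and unlike the loop above it strips only the newline, not other trailing whitespace.

```python
from itertools import zip_longest


def differing_lines(path_1, path_2):
    """Yield (line_no, line_1, line_2) wherever the two files differ.

    A line missing from the shorter file is reported as None.
    """
    with open(path_1) as file_1, open(path_2) as file_2:
        # zip_longest pads the shorter file with None instead of stopping early
        pairs = zip_longest(file_1, file_2)
        for line_no, (line_1, line_2) in enumerate(pairs, start=1):
            # Drop the newline so a missing final '\n' does not count as a change
            if line_1 is not None:
                line_1 = line_1.rstrip('\n')
            if line_2 is not None:
                line_2 = line_2.rstrip('\n')
            if line_1 != line_2:
                yield line_no, line_1, line_2
```

For the sample files, iterating over `differing_lines('file1.txt', 'file2.txt')` would yield the pairs for lines 4 and 5.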



    How do you check if a file is the same in Python?

    The snippets usually given for this question only check whether a file exists; they do not compare contents (use filecmp.cmp() or the methods above for that):

    from os.path import exists
    file_exists = exists(path_to_file)

    from pathlib import Path
    path = Path(path_to_file)
    path.is_file()

    import os.path
    file_exists = os.path.exists('readme.txt')
    print(file_exists)  # prints False if readme.txt does not exist

    How do I compare data in two text files?

    On Windows, use the fc command. To compare two similar files in ASCII mode, type the following command and press Enter: fc /L filename1.txt filename2.txt
    To abbreviate the output so that only the first and last lines of each set of differences are shown, type: fc /a filename1.txt filename2.txt

    How can I tell if the contents of two files are the same?

    On Linux and other Unix-like systems, use the diff command to compare text files. It can compare single files or the contents of directories. When run on regular files, or on text files in different directories, diff lists the lines that must be changed to make the files match; it prints nothing when they are identical.