Guide: comparing binary strings in Python

In Python, I need to print a diff of two binary files. I was looking at difflib.Differ, which does a lot.

Main contents

  • The dircmp class
  • How do I compare two binary files in Python?
  • How do I compare two binary files?
  • How do I compare two files in Python?
  • Does diff work on binary files?

However, Differ assumes lines of text, so its output lists neither the byte index nor the hex values of the bytes that differ.

What I need is output that shows which byte is different, how it is different, and the actual hex values of the two bytes.

In Python, how do you compare two binary files (output: the byte diff index, the hex values of the two bytes)?

I was doing something like:

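#!/usr/bin/env python2
import difflib
# Read both files as raw bytes ('rb'), not as text
x = open('/path/to/file1', 'rb').read()
y = open('/path/to/file2', 'rb').read()
# Differ compares the two strings element by element (here, byte by byte)
print '\n'.join(difflib.Differ().compare(x, y))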

But this doesn't output the byte index where the difference is. And it doesn't print the hex values.
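One way to get the output described above is to skip difflib and walk both files position by position. The following is a minimal Python 3 sketch, not a definitive answer: the names byte_diff, format_byte and print_byte_diff are hypothetical, and it assumes both files fit in memory and that a purely positional comparison (no detection of inserted or deleted bytes) is acceptable.

```python
def byte_diff(path_a, path_b):
    """Yield (offset, byte_a, byte_b) for every position where the files differ.

    byte_a or byte_b is None when that file has already ended.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        data_a = fa.read()
        data_b = fb.read()
    for offset in range(max(len(data_a), len(data_b))):
        # Indexing a bytes object in Python 3 returns an int in range 0..255
        a = data_a[offset] if offset < len(data_a) else None
        b = data_b[offset] if offset < len(data_b) else None
        if a != b:
            yield offset, a, b


def format_byte(value):
    """Render a byte as two hex digits, or '--' if the file ended."""
    return "--" if value is None else "%02x" % value


def print_byte_diff(path_a, path_b):
    """Print one line per differing offset: the offset and both hex values."""
    for offset, a, b in byte_diff(path_a, path_b):
        print("0x%08x: %s != %s" % (offset, format_byte(a), format_byte(b)))
```

Calling print_byte_diff('/path/to/file1', '/path/to/file2') prints the byte index and the two hex values for each mismatch. Because the comparison is positional, a single inserted byte makes every later offset show up as different.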

Source code: Lib/filecmp.py


The filecmp module defines functions to compare files and directories, with various optional time/correctness trade-offs. For comparing files, see also the difflib module.

The filecmp module defines the following functions:

filecmp.cmp(f1, f2, shallow=True)

Compare the files named f1 and f2, returning True if they seem equal, False otherwise.

If shallow is true and the os.stat() signatures (file type, size, and modification time) of both files are identical, the files are taken to be equal.

Otherwise, the files are treated as different if their sizes or contents differ.

Note that no external programs are called from this function, giving it portability and efficiency.

This function uses a cache for past comparisons and the results, with cache entries invalidated if the os.stat() information for the file changes. The entire cache may be cleared using clear_cache().
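As a small illustration of the shallow parameter, the sketch below wraps filecmp.cmp() so that file contents are always considered. The helper name files_identical is hypothetical and not part of the module.

```python
import filecmp


def files_identical(path_a, path_b):
    """Return True only if the two files have the same contents."""
    # shallow=False skips the os.stat() signature shortcut, so files of
    # equal size are read and compared instead of being assumed equal.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

This only answers whether the files differ. It does not report where they differ or which byte values are involved.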

filecmp.cmpfiles(dir1, dir2, common, shallow=True)

Compare the files in the two directories dir1 and dir2 whose names are given by common.

Returns three lists of file names: match, mismatch, errors. match contains the list of files that match, mismatch contains the names of those that don’t, and errors lists the names of files which could not be compared. Files are listed in errors if they don’t exist in one of the directories, the user lacks permission to read them or if the comparison could not be done for some other reason.

The shallow parameter has the same meaning and default value as for filecmp.cmp().

For example, cmpfiles('a', 'b', ['c', 'd/e']) will compare a/c with b/c and a/d/e with b/d/e. 'c' and 'd/e' will each be in one of the three returned lists.
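A short sketch of how the three returned lists might be consumed. The wrapper name summarize_common_files is hypothetical, and shallow=False is chosen here only so that the result depends on file contents.

```python
import filecmp


def summarize_common_files(dir1, dir2, names):
    """Compare the named files in dir1 and dir2 and label the three lists."""
    match, mismatch, errors = filecmp.cmpfiles(dir1, dir2, names, shallow=False)
    return {"match": match, "mismatch": mismatch, "errors": errors}
```

A name that exists in only one of the two directories ends up under "errors" rather than "mismatch".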

filecmp.clear_cache()

Clear the filecmp cache. This may be useful if a file is compared so quickly after it is modified that it is within the mtime resolution of the underlying filesystem.

New in version 3.4.

The dircmp class

class filecmp.dircmp(a, b, ignore=None, hide=None)

Construct a new directory comparison object, to compare the directories a and b. ignore is a list of names to ignore, and defaults to filecmp.DEFAULT_IGNORES. hide is a list of names to hide, and defaults to [os.curdir, os.pardir].

The dircmp class compares files by doing shallow comparisons as described for filecmp.cmp().

The dircmp class provides the following methods:

report()

Print (to sys.stdout) a comparison between a and b.

report_partial_closure()

Print a comparison between a and b and common immediate subdirectories.

report_full_closure()

Print a comparison between a and b and common subdirectories (recursively).

The dircmp class offers a number of interesting attributes that may be used to get various bits of information about the directory trees being compared.

Note that via __getattr__() hooks, all attributes are computed lazily, so there is no speed penalty if only those attributes which are lightweight to compute are used.

left

The directory a.

right

The directory b.

left_list

Files and subdirectories in a, filtered by hide and ignore.

right_list

Files and subdirectories in b, filtered by hide and ignore.

common

Files and subdirectories in both a and b.

left_only

Files and subdirectories only in a.

right_only

Files and subdirectories only in b.

common_dirs

Subdirectories in both a and b.

common_files

Files in both a and b.

common_funny

Names in both a and b, such that the type differs between the directories, or names for which os.stat() reports an error.

same_files

Files which are identical in both a and b, using the class’s file comparison operator.

diff_files

Files which are in both a and b, whose contents differ according to the class’s file comparison operator.

funny_files

Files which are in both a and b, but could not be compared.

subdirs

A dictionary mapping names in common_dirs to dircmp instances (or MyDirCmp instances if this instance is of type MyDirCmp, a subclass of dircmp).

Changed in version 3.10: Previously entries were always dircmp instances. Now entries are the same type as self, if self is a subclass of dircmp.
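To make the attributes above more concrete, here is a minimal sketch that collects a few of them into a dictionary for the top level of two directories. The function name summarize_dirs is hypothetical. The lists are sorted only to make the result predictable.

```python
from filecmp import dircmp


def summarize_dirs(dir_a, dir_b):
    """Collect some dircmp attributes for the top level of two directories."""
    dcmp = dircmp(dir_a, dir_b)
    return {
        "left_only": sorted(dcmp.left_only),    # names only in dir_a
        "right_only": sorted(dcmp.right_only),  # names only in dir_b
        "same_files": sorted(dcmp.same_files),  # common files judged equal
        "diff_files": sorted(dcmp.diff_files),  # common files judged different
    }
```

Because dircmp uses shallow comparisons, two files with identical os.stat() signatures are reported under same_files without their contents being read.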

filecmp.DEFAULT_IGNORES

New in version 3.4.

List of directories ignored by dircmp by default.

Here is a simplified example of using the subdirs attribute to search recursively through two directories to show common different files:

>>> from filecmp import dircmp
>>> def print_diff_files(dcmp):
...     for name in dcmp.diff_files:
...         print("diff_file %s found in %s and %s" % (name, dcmp.left,
...               dcmp.right))
...     for sub_dcmp in dcmp.subdirs.values():
...         print_diff_files(sub_dcmp)
...
>>> dcmp = dircmp('dir1', 'dir2') 
>>> print_diff_files(dcmp) 

How do I compare two binary files in Python?

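The following compares x and y (assumed here to be text strings or lists of lines) with difflib.Differ and prints the items that do not match:

import difflib
d = difflib.Differ()
e = d.compare(x, y)  # compare() returns a generator of tagged items
for i in e:  # iterate over the items themselves, not over range(len(e))
    if i.startswith("-"):  # a leading "-" marks an item found only in x
        print(i + " is different")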
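For binary data, a variant of this idea is to use difflib.SequenceMatcher directly on bytes objects, which reports offsets and can recognize inserted or deleted bytes. This is a sketch under assumptions: byte_opcodes is a hypothetical name, both inputs are held in memory, and SequenceMatcher can be slow on large inputs.

```python
from difflib import SequenceMatcher


def byte_opcodes(data_a, data_b):
    """Return (tag, offset_a, hex_a, offset_b, hex_b) for each non-equal span."""
    # autojunk=False turns off the heuristic that may treat very frequent
    # elements (for example, runs of zero bytes) as junk on long inputs.
    matcher = SequenceMatcher(None, data_a, data_b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # bytes.hex() renders the differing span as hex digits
            changes.append((tag, i1, data_a[i1:i2].hex(), j1, data_b[j1:j2].hex()))
    return changes
```

Each tuple carries the opcode tag ("replace", "delete" or "insert"), the starting offset in each input, and the hex values of the spans involved.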

How do I compare two binary files?

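Use the command cmp to check whether two files are the same, byte by byte. By default, cmp reports only the first differing byte rather than listing differences the way diff does; with the -l option it prints the byte number and the two differing byte values (in octal) for every mismatch. It is handy for a fast check of whether two files are the same (especially useful for binary data files).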
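A rough Python analogue of that quick check is sketched below. It reads both files in chunks so that large files need not fit in memory. The function name first_difference and the chunk_size default are assumptions made for illustration, and offsets here are 0-based, unlike the 1-based byte numbers that cmp prints.

```python
def first_difference(path_a, path_b, chunk_size=65536):
    """Return the 0-based offset of the first differing byte, or None if equal.

    If one file is a prefix of the other, the offset is the shorter file's length.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                # Locate the first mismatching position inside this chunk
                for i, (a, b) in enumerate(zip(chunk_a, chunk_b)):
                    if a != b:
                        return offset + i
                # No mismatch in the overlap: one file ended early
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                # Both files reached end-of-file with no difference
                return None
            offset += len(chunk_a)
```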

How do I compare two files in Python?

Approach:

  • Open both files in read mode.
  • Store each file's contents as a list of strings.
  • Find the strings common to both files with the intersection() method.
  • Compare both files for differences using a while loop.
  • Close both files.
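The steps above can be sketched as follows for text files. This is one possible reading of the approach, not a reference implementation: compare_text_files is a hypothetical name, and the comparison is line by line at matching positions.

```python
def compare_text_files(path_a, path_b):
    """Return (common_lines, differences) for two text files.

    differences holds (line_number, line_a, line_b) tuples, with None
    standing in for a line that is missing because that file is shorter.
    """
    # The with statement closes both files when the block ends
    with open(path_a) as fa, open(path_b) as fb:
        lines_a = fa.read().splitlines()
        lines_b = fb.read().splitlines()

    # Lines that occur in both files, regardless of position
    common = set(lines_a).intersection(lines_b)

    # Walk both lists in step and record positions where they disagree
    differences = []
    i = 0
    while i < max(len(lines_a), len(lines_b)):
        a = lines_a[i] if i < len(lines_a) else None
        b = lines_b[i] if i < len(lines_b) else None
        if a != b:
            differences.append((i + 1, a, b))
        i += 1
    return common, differences
```

This works on decoded text lines, so it suits text files better than arbitrary binary data.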

Does diff work on binary files?

You can also force diff to consider all files to be binary files, and report only whether they differ (but not how). Use the `--brief' option for this. In operating systems that distinguish between text and binary files, diff normally reads and writes all data as text.