Find duplicate lines in text file python

I have a text file with some 1,200 rows. Some of them are duplicates.

How could I find the duplicate lines in the file (but not worrying about case) and then print out the line's text on the screen, so I can go off and find it? I don't want to delete them or anything, just find which lines they are.

jww


asked Oct 17, 2012 at 15:26


This is pretty easy with a set:

with open('file') as f:
    seen = set()
    for line in f:
        line_lower = line.lower()
        if line_lower in seen:
            print(line, end='')   # line already ends with '\n'
        else:
            seen.add(line_lower)
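Since the question also asks *which* lines the duplicates are on, here is a small variant of the same set-based idea (a sketch, not part of the original answer) that remembers where each line was first seen, so every duplicate can be reported together with the line number of its first occurrence:

```python
def report_duplicates(lines):
    """Return (line_number, text, first_seen_number) for each duplicate,
    comparing lines case-insensitively. Works on a file object or any
    iterable of strings."""
    first_seen = {}   # lowercased text -> line number of first occurrence
    dupes = []
    for num, line in enumerate(lines, start=1):
        key = line.rstrip('\n').lower()
        if key in first_seen:
            dupes.append((num, line.rstrip('\n'), first_seen[key]))
        else:
            first_seen[key] = num
    return dupes

# Example: report_duplicates(['ABC\n', 'def\n', 'abc\n'])
# returns [(3, 'abc', 1)] -- line 3 duplicates line 1.
```

The same function works unchanged on an open file, e.g. `report_duplicates(open('file'))`.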

answered Oct 17, 2012 at 15:28

mgilson



As there are only 1,200 lines, you can also use collections.Counter():

>>> from collections import Counter

>>> with open('data1.txt') as f:
...     c = Counter(line.strip().lower() for line in f if line.strip())  # for case-insensitive search
...     for line in c:
...         if c[line] > 1:
...             print(line)
...

if data1.txt is something like this:

ABC
abc
aBc
CAB
caB
bca
BcA
acb

output is:

cab
abc
bca
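Counter tells you which texts repeat but not where they occur. If you also want the line numbers, a sketch along the same lines (my addition, not from the original answer) can collect them with a `collections.defaultdict`:

```python
from collections import defaultdict

def duplicate_line_numbers(lines):
    """Map each lowercased, stripped line to the list of line numbers
    it occurs on, keeping only lines that occur more than once."""
    positions = defaultdict(list)
    for num, line in enumerate(lines, start=1):
        key = line.strip().lower()
        if key:   # skip blank lines, as the Counter version does
            positions[key].append(num)
    return {text: nums for text, nums in positions.items() if len(nums) > 1}

# Example: duplicate_line_numbers(['ABC\n', 'abc\n', 'CAB\n'])
# returns {'abc': [1, 2]}
```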

answered Oct 17, 2012 at 15:34

Ashwini Chaudhary



Finding Case-Insensitive Duplicates

This won't give you line numbers, but it will give you a list of duplicate lines which you can then investigate further. For example:

tr 'A-Z' 'a-z' < /tmp/foo | sort | uniq -d

Example Data File

# /tmp/foo
one
One
oNe
two
three

The pipeline listed above will correctly yield:

one

Finding the Line Numbers

You could then grep for related line numbers like so:

grep --ignore-case --line-number one /tmp/foo

answered Oct 17, 2012 at 15:36

Todd A. Jacobs


Here is a short program in Python to identify the count of duplicate lines in a text file.

import tkinter as tk
from tkinter import filedialog
import collections
from pathlib import Path
import os

root = tk.Tk()

canvas1 = tk.Canvas(root, width=800, height=300)
canvas1.pack()

label1 = tk.Label(root, text='Log Analyser')
label2 = tk.Label(root, text='Import a file...')
label1.config(font=('Arial', 20))
label2.config(font=('Arial', 10))
canvas1.create_window(400, 50, window=label1)
canvas1.create_window(200, 180, window=label2)

def getLogFile():
    import_file = filedialog.askopenfilename()

    with open(import_file) as f:
        entries = Path(import_file)
        fileabspath = os.path.abspath(import_file)

        # Write "line|count" pairs next to the input file
        with open(fileabspath.replace(entries.name, "Duplicate_Log_Info.txt"), "w+") as fw:
            counts = collections.Counter(l.strip() for l in f)
            for line, count in counts.most_common():
                fw.write(line + "|" + str(count) + "\n")

        label3 = tk.Label(root, text=entries.name + ": Import is successful, please check the output file - " + fw.name + ".")
        label3.config(font=('Arial', 10))
        canvas1.create_window(400, 220, window=label3)

browseButton_Excel = tk.Button(text='Choose a file...', command=getLogFile, bg='green', fg='white', font=('helvetica', 12, 'bold'))
canvas1.create_window(400, 180, window=browseButton_Excel)

button3 = tk.Button(root, text='Close', command=root.destroy, bg='green', font=('helvetica', 11, 'bold'))
canvas1.create_window(500, 180, window=button3)

root.mainloop()
