How do you check for repeated words in Python?

I can see where you are going with sort: once the words are sorted, you can reliably tell when you have hit a new word and keep a count for each unique word. However, what you really want is a hash (a dictionary) to keep track of the counts, since dictionary keys are unique. For example:

words = sentence.split()
counts = {}
for word in words:
    if word not in counts:
        counts[word] = 0
    counts[word] += 1

That gives you a dictionary where each key is a word and each value is the number of times it appears. You can simplify this with collections.defaultdict(int), which lets you increment the count without first checking for the key:

import collections

counts = collections.defaultdict(int)
for word in words:
    counts[word] += 1

But there is something even better than that: collections.Counter, which takes your list of words and turns it into a dictionary (a subclass of dict, actually) containing the counts.

counts = collections.Counter(words)

From there you want the words in sorted order with their counts so you can print them. items() gives you (word, count) tuples, and sorted() sorts (by default) by the first item of each tuple (the word in this case), which is exactly what you want. Note that the default string ordering is case-sensitive, so "As" sorts before "are".

import collections

sentence = """As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"""
words = sentence.split()
word_counts = collections.Counter(words)
for word, count in sorted(word_counts.items()):
    print('"%s" is repeated %d time%s.' % (word, count, "s" if count > 1 else ""))

OUTPUT

"As" is repeated 1 time.
"are" is repeated 2 times.
"as" is repeated 3 times.
"certain" is repeated 2 times.
"do" is repeated 1 time.
"far" is repeated 2 times.
"laws" is repeated 1 time.
"mathematics" is repeated 1 time.
"not" is repeated 2 times.
"of" is repeated 1 time.
"reality" is repeated 2 times.
"refer" is repeated 2 times.
"the" is repeated 1 time.
"they" is repeated 3 times.
"to" is repeated 2 times.

Explanation

In this program, we need to find the duplicate words in a string and display them.

To find the duplicate words, we first split the string into words and then count the occurrences of each word. If a word's count is greater than 1, that word has duplicates in the string.

For example, in the string "big black bug bit a big black dog on his big black nose", the duplicate words are "big" and "black".

Algorithm

  1. Define a string.
  2. Convert the string to lowercase to make the comparison case-insensitive.
  3. Split the string into words.
  4. Use two loops to find duplicate words. The outer loop selects a word and initializes the variable count to 1. The inner loop compares the word selected by the outer loop with the rest of the words.
  5. If a match is found, increment count by 1 and set the duplicate word to '0' so that it is not counted again.
  6. After the inner loop, if count is greater than 1, the word has duplicates in the string.

Solution

Python
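
A minimal sketch of the algorithm above, assuming the sample string "big black bug bit a big black dog on his big black nose". The function name find_duplicate_words is hypothetical, and the sketch marks counted duplicates with None rather than '0', so that a literal "0" in the text is not mistaken for a marker.

```python
def find_duplicate_words(text):
    """Return the words that occur more than once, in order of first appearance."""
    # Lowercase so the comparison is case-insensitive, then split on whitespace.
    words = text.lower().split()
    duplicates = []
    for i in range(len(words)):
        if words[i] is None:
            # Already counted as a duplicate of an earlier word.
            continue
        count = 1
        for j in range(i + 1, len(words)):
            if words[i] == words[j]:
                count += 1
                # Mark the duplicate so the outer loop skips it later.
                words[j] = None
        if count > 1:
            duplicates.append(words[i])
    return duplicates


sample = "big black bug bit a big black dog on his big black nose"
print("Duplicate words in a given string : ")
for word in find_duplicate_words(sample):
    print(word)
```

The nested loops compare every pair of words, so the running time grows quadratically with the number of words. For long texts, a dictionary or collections.Counter is likely the better choice.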

Output:

 Duplicate words in a given string : 
big
black



In this article, we will show you how to find the most repeated word in a given text file using Python.

Assume we have a text file named ExampleTextFile.txt containing some random text. We will return the most repeated word in that file.

ExampleTextFile.txt

Good Morning TutorialsPoint
This is TutorialsPoint sample File
Consisting of Specific
source codes in Python,Seaborn,Scala
Summary and Explanation
Welcome TutorialsPoint
Learn with a joy

Algorithm (Steps)

Following are the steps to perform the desired task:

  • Import the Counter class from the collections module. (The collections module, part of Python's standard library, provides specialized container datatypes as alternatives to the general-purpose built-ins such as dictionaries, lists, and tuples. Counter is a dict subclass for counting hashable objects: its keys are the elements and its values are their counts.)

  • Create a variable to store the path of the text file.

  • Create a list to store all the words.

  • Use the open() function (opens a file and returns a file object) to open the text file in read-only mode by passing the file name and the mode as arguments (here "r" represents read-only mode).

with open(inputFile, 'r') as filedata:

  • Traverse each line of the file using a for loop.

  • Use the split() function (splits a string into a list; a separator can be specified, and the default is any whitespace) to split each line into a list of words and store it in a variable.

  • Traverse the list of words using a for loop.

  • Use the append() function (adds an element to the end of a list) to append each word to the list.

  • Use Counter() (which gives the frequency of each word as key-value pairs) to calculate the frequency (number of occurrences) of all the words.

  • Create a variable to store the maximum frequency.

  • Loop over the word-frequency dictionary using a for loop and the in keyword.

The in keyword works in two ways:
It determines whether a value exists in a sequence (list, range, string, etc.).
It is also used to iterate through a sequence in a for loop, as it is here.

  • Using an if statement, check whether the frequency of the current word is greater than the maximum frequency.

  • If it is, update the maximum frequency and store the current word in a variable as the most repeated word so far.

  • Print the most repeated word in a text file.

  • Close the input file with the close() function (closes an open file). Because the with statement already closes the file when its block ends, this call is optional.

Example

The following program traverses the lines of a text file and prints its most repeated word, using the Counter class from the collections module:

from collections import Counter

inputFile = "ExampleTextFile.txt"
newWordsList = []

# Open the file in read-only mode and collect every word in it.
with open(inputFile, 'r') as filedata:
    for textline in filedata:
        wordsList = textline.split()
        for word in wordsList:
            newWordsList.append(word)

# Count how many times each word occurs.
wordsFrequency = Counter(newWordsList)

# Find the word with the highest frequency.
maxFrequency = 0
for textword in wordsFrequency:
    if wordsFrequency[textword] > maxFrequency:
        maxFrequency = wordsFrequency[textword]
        mostRepeatedWord = textword

print("{", mostRepeatedWord, "} is the most repeated word in a text file")

# Redundant: the with block has already closed the file.
filedata.close()

Output

On executing, the above program will generate the following output −

{ TutorialsPoint } is the most repeated word in a text file

In this program, we read some random text from a text file. We read through the entire file, split each line into words, and added every word to a list. We passed that list to Counter() to count the frequency of every word, which returns a dictionary-like object with the words as keys and their frequencies as values. Then we iterated over the words, checking whether each frequency was greater than the current maximum. If it was, that word was the most frequent one so far, so we saved it in a variable and updated the maximum frequency with the frequency of the current word. Finally, we displayed the most frequent word.
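
As an alternative to the manual maximum search, Counter also provides a most_common() method that returns the elements ordered from most to least frequent. The sketch below is only an illustration: the function name most_repeated_word and the in-memory sample_lines list (standing in for an open file) are assumptions, not part of the original program.

```python
from collections import Counter


def most_repeated_word(lines):
    """Return the most frequent whitespace-separated word, or None if there are no words."""
    counts = Counter()
    for line in lines:
        # update() adds the count of each word in this line to the running totals.
        counts.update(line.split())
    if not counts:
        return None
    # most_common(1) returns a one-element list holding a (word, count) pair.
    word, _ = counts.most_common(1)[0]
    return word


# A few lines of the sample file, held in memory for the sake of the example.
sample_lines = [
    "Good Morning TutorialsPoint",
    "This is TutorialsPoint sample File",
    "Welcome TutorialsPoint",
]
print("{", most_repeated_word(sample_lines), "} is the most repeated word in a text file")
```

Because a file object yields its lines when iterated, the same function should also accept the filedata object from a with open(...) block.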

Conclusion

This article showed us how to read a file, traverse it line by line, and retrieve all the words in each line. Once we have them, we may reverse the words, change their case, check for vowels, retrieve the word lengths, and so on. We also learned how to use Counter() to determine the frequency of each word in a list. Counter may also be used to count the elements of a string, a list, a tuple, and so on.
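
To illustrate that last point, here is a short sketch that passes a string, a list, and a tuple to Counter. The sample values and variable names are made up for the example.

```python
from collections import Counter

# A string is counted character by character.
char_counts = Counter("banana")
# A list or a tuple is counted element by element.
list_counts = Counter(["red", "blue", "red"])
tuple_counts = Counter((1, 2, 2, 3, 2))

print(char_counts["a"])      # 3
print(list_counts["red"])    # 2
print(tuple_counts[2])       # 3
# A missing key yields 0 instead of raising KeyError.
print(list_counts["green"])  # 0
```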

Updated on 18-Aug-2022 08:50:24


How do you find repeated words in Python?

Python
string = "big black bug bit a big black dog on his big black nose"
# Convert the string to lowercase.
string = string.lower()
# Split the string into words using the built-in function.
words = string.split(" ")
print("Duplicate words in a given string : ")
for i in range(0, len(words)):
    count = 1

How do I search for a repeated word in a text file in Python?

This program follows the algorithm below:
Open the file in read mode.
Initialize two empty sets. ...
Iterate through the lines of the file with a loop.
For each line, get the list of words by using split.
Iterate through the words of each line by using a loop.
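
One way these steps could be sketched is shown below. The listed steps do not say what the two sets are for, so this is an assumption: one set (seen) holds every word met so far, and the other (repeated) holds the words met more than once. The function name repeated_words is hypothetical, and io.StringIO stands in for a file opened in read mode.

```python
import io


def repeated_words(lines):
    """Return the set of words that appear more than once across all lines."""
    seen = set()      # assumed role: every word encountered so far
    repeated = set()  # assumed role: words encountered at least twice
    for line in lines:
        for word in line.split():
            if word in seen:
                repeated.add(word)
            else:
                seen.add(word)
    return repeated


# io.StringIO behaves like an open text file, so the example needs no file on disk.
sample_file = io.StringIO("the cat sat\non the mat\nthe end\n")
print(sorted(repeated_words(sample_file)))
```

Set membership tests take constant time on average, so this approach makes a single pass over the words.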

How do I find a repeated character in a string in Python?

First, we will find the duplicate characters of a string using the count method.
Initialize a string.
Initialize an empty list.
Loop over the string. Check whether each character's frequency is greater than one using the count method.
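
A minimal sketch of these steps, assuming the sample string "programming". The function name duplicate_characters is hypothetical, and the check that skips characters already in the list is an addition so that each duplicate is reported only once.

```python
def duplicate_characters(text):
    """Return the characters that occur more than once, in order of first appearance."""
    duplicates = []
    for char in text:
        # str.count returns how many times char occurs in text.
        if text.count(char) > 1 and char not in duplicates:
            duplicates.append(char)
    return duplicates


print(duplicate_characters("programming"))
```

Each call to count scans the whole string, so this is fine for short strings, but collections.Counter is likely a better fit for long ones.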

How do I find the most repeated words in word?

Split a line at a time and store the words in an array. Iterate through the array, find the frequency of each word, and compare the frequency with maxcount.
import java. ...
import java. ...
import java. ...
public class MostRepeatedWord {
    public static void main(String[] args) throws Exception {
        String line, word = "";
