I can see where you are going with sort: once the words are sorted, you can reliably tell when you have hit a new word and keep a running count for each unique word. However, what you really want is a hash (dictionary) to keep track of the counts, since dictionary keys are unique. For example:
words = sentence.split()
counts = {}
for word in words:
    if word not in counts:
        counts[word] = 0
    counts[word] += 1
That gives you a dictionary where the key is the word and the value is the number of times it appears. You can simplify this with collections.defaultdict(int), so that a missing key starts at 0 and you can just add to the value:
counts = collections.defaultdict(int)
for word in words:
    counts[word] += 1
But there is something even better than that: collections.Counter, which will take your list of words and turn it into a dictionary (a subclass of dict, actually) containing the counts.
counts = collections.Counter(words)
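As a quick sanity check, here is a minimal sketch that runs all three approaches on the same input and confirms they agree. The sample sentence and the variable names plain, default and counter are made up for illustration.

```python
import collections

# Made-up sample input, just for illustration.
sentence = "the cat and the hat and the bat"
words = sentence.split()

# 1. Plain dictionary: initialise a missing key before incrementing.
plain = {}
for word in words:
    if word not in plain:
        plain[word] = 0
    plain[word] += 1

# 2. defaultdict(int): a missing key starts at int() == 0.
default = collections.defaultdict(int)
for word in words:
    default[word] += 1

# 3. Counter: counts the items of any iterable in one call.
counter = collections.Counter(words)

print(plain == dict(default) == dict(counter))  # True
print(isinstance(counter, dict))                # True, Counter subclasses dict
print(counter["missing"])                       # 0, and no KeyError
```

One practical difference: looking up a missing word in a Counter returns 0 without adding the key, whereas the same lookup on a defaultdict inserts it.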
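From there you want the list of words in sorted order with their counts so you can print them. items() will give you the (word, count) pairs as tuples, and sorted will sort (by default) by the first item of each tuple (the word in this case), which is exactly what you want. Note that the comparison is case-sensitive, so "As" and "as" are counted separately and uppercase letters sort before lowercase ones.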
import collections

sentence = """As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"""
words = sentence.split()
word_counts = collections.Counter(words)
for word, count in sorted(word_counts.items()):
    print('"%s" is repeated %d time%s.' % (word, count, "s" if count > 1 else ""))
OUTPUT
"As" is repeated 1 time.
"are" is repeated 2 times.
"as" is repeated 3 times.
"certain" is repeated 2 times.
"do" is repeated 1 time.
"far" is repeated 2 times.
"laws" is repeated 1 time.
"mathematics" is repeated 1 time.
"not" is repeated 2 times.
"of" is repeated 1 time.
"reality" is repeated 2 times.
"refer" is repeated 2 times.
"the" is repeated 1 time.
"they" is repeated 3 times.
"to" is repeated 2 times.
Explanation
In this program, we need to find the duplicate words in a string and display them.
To find the duplicate words, we first split the string into words and count the occurrences of each word. If a word's count is greater than 1, the word has duplicates in the string.
In the above example, the words highlighted in green are the duplicate words.
Algorithm
- Define a string.
- Convert the string to lowercase so that the comparison is case-insensitive.
- Split the string into words.
- Use two loops to find duplicate words. The outer loop selects a word and initializes the variable count to 1. The inner loop compares the selected word with the rest of the words.
- If a match is found, increment count by 1 and set the matching duplicate to '0' so that it is not counted again.
- After the inner loop, if the count of a word is greater than 1, the word has duplicates in the string.
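The steps above can be sketched in Python roughly as follows. This is an illustrative sketch, not the original listing: the function name find_duplicate_words and the sample sentence are assumptions, and it marks already-counted words with None instead of '0' so that a literal "0" in the input is not mistaken for the marker.

```python
def find_duplicate_words(text):
    """Return the words that occur more than once, in order of first appearance."""
    # Lowercase first so the comparison is case-insensitive, then split.
    words = text.lower().split()
    duplicates = []

    for i in range(len(words)):
        if words[i] is None:
            # Already counted as a duplicate of an earlier word.
            continue
        count = 1
        # Compare the selected word with the rest of the words.
        for j in range(i + 1, len(words)):
            if words[i] == words[j]:
                count += 1
                words[j] = None  # mark so it is not counted again
        if count > 1:
            duplicates.append(words[i])
    return duplicates


# Hypothetical sample input, chosen only for illustration.
sample = "Big black bug bit a big black dog on his big black nose"
print("Duplicate words in a given string :", " ".join(find_duplicate_words(sample)))
```

The nested loops make this quadratic in the number of words; for long inputs a dictionary or collections.Counter does the same job in a single pass.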
Solution
Python
Output:
Duplicate words in a given string : big black
C
Output:
Duplicate words in a given string : big black
JAVA
Output:
Duplicate words in a given string : big black
C#
Output:
Duplicate words in a given string : big Black
PHP
Output:
Duplicate words in a given string : big black
In this article, we will show you how to find the most repeated word in a given text file using Python.
Assume we have a text file named ExampleTextFile.txt containing some random text. We will return the most repeated word in that file.
ExampleTextFile.txt
Good Morning TutorialsPoint This is TutorialsPoint sample File Consisting of Specific source codes in Python,Seaborn,Scala Summary and Explanation Welcome TutorialsPoint Learn with a joy
Algorithm (Steps)
Following are the algorithm/steps to be followed to perform the desired task −
Import the Counter class from the collections module. (Counter is a dict subclass provided by Python 3's collections module for counting hashable objects. The collections module provides specialized container datatypes as alternatives to Python's general-purpose built-ins such as dictionaries, lists, and tuples. When called with an iterable, Counter builds a hash table that maps each element to its count.)
Create a variable to store the path of the text file.
Create a list to store all the words.
Use the open() function (opens a file and returns a file object) to open the text file in read-only mode by passing the file name and mode as arguments (here "r" represents read-only mode).
with open(inputFile, 'r') as filedata:
Traverse each line of the file using a for loop.
Use the split() function (splits a string into a list; a separator can be specified, and the default separator is any whitespace) to split each line into a list of words and store it in a variable.
Traverse the list of words using a for loop.
Use the append() function (adds an element to the end of a list) to append each word to the list.
Use Counter() (which gives the frequency of each word as key-value pairs) to calculate the frequency (number of times the word has occurred) of all the words.
Create a variable to store the maximum frequency.
Loop over the word-frequency dictionary using a for loop.
Using an if conditional statement, check whether the frequency of the current word is greater than the maximum frequency.
The in keyword works in two ways: it is used to determine whether a value exists in a sequence (list, range, string, etc.), and it is also used to iterate through a sequence in a for loop, which is how it is used here.
If the frequency of the word is greater than the maximum frequency, update the maximum frequency with that word's frequency.
Store the word in a variable that holds the most repeated word in the text file.
Print the most repeated word in the text file.
Close the input file with the close() function (used to close an opened file). Since the file is opened in a with statement, it is already closed automatically when the with block ends, so this call is optional.
Example
The following program traverses the lines of a text file and prints the most repeated word, using the Counter class from the collections module −
from collections import Counter

inputFile = "ExampleTextFile.txt"
newWordsList = []
# Read the file line by line and collect every word
with open(inputFile, 'r') as filedata:
    for textline in filedata:
        wordsList = textline.split()
        for word in wordsList:
            newWordsList.append(word)
# Count how often each word occurs
wordsFrequency = Counter(newWordsList)
maxFrequency = 0
# Keep the word with the highest frequency seen so far
for textword in wordsFrequency:
    if wordsFrequency[textword] > maxFrequency:
        maxFrequency = wordsFrequency[textword]
        mostRepeatedWord = textword
print("{", mostRepeatedWord, "} is the most repeated word in a text file")
# Redundant: the with block has already closed the file
filedata.close()
Output
On executing, the above program will generate the following output −
{ TutorialsPoint } is the most repeated word in a text file
In this program, we read some random text from a text file. We read through the entire file, broke it down into words, and added all of the words to a list. We used Counter() to count the frequency of all the words, which returns a dictionary-like object with the words as keys and their frequencies as values. Then we iterated over its words, checking whether each frequency was greater than the maximum frequency seen so far. If it was, this was the most frequent word so far, so we saved it in a variable and updated the maximum frequency with the frequency of the current word. Finally, we displayed the most frequent word.
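As an alternative to the manual maximum search, Counter also provides a most_common() method. The sketch below is one possible variant, not the article's program: the helper name most_repeated_word is hypothetical, and the demo writes the sample text to a temporary file (as a single line) so that it runs on its own.

```python
import os
import tempfile
from collections import Counter


def most_repeated_word(path):
    """Return (word, frequency) for the most frequent word, or None if the file has no words."""
    word_counts = Counter()
    with open(path, "r") as filedata:
        for textline in filedata:
            # update() adds the counts of the words on this line.
            word_counts.update(textline.split())
    if not word_counts:
        return None
    # most_common(1) returns a one-element list of (word, count) pairs.
    return word_counts.most_common(1)[0]


# Demo: write the sample text to a temporary file, then analyse it.
sample_text = (
    "Good Morning TutorialsPoint This is TutorialsPoint sample File "
    "Consisting of Specific source codes in Python,Seaborn,Scala "
    "Summary and Explanation Welcome TutorialsPoint Learn with a joy"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample_text)
    tmp_path = tmp.name

result = most_repeated_word(tmp_path)
os.remove(tmp_path)
print(result)  # ('TutorialsPoint', 3)
```

When several words share the highest frequency, most_common() returns the one encountered first, which matches the behaviour of the strict greater-than comparison in the loop version.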
Conclusion
This article showed us how to read a file, traverse it line by line, and retrieve all the words in each line. Once we have them, we may reverse the words, change the case, check the vowels, retrieve the word length, etc. We also learned how to use Counter() to determine the frequency of each word in a list. Counter may also be used to count the elements of a string, list, tuple, and so on.
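To illustrate that last point, here is a small sketch of Counter applied to a string, a list, and a tuple. The sample values and variable names are made up for the example.

```python
from collections import Counter

# A string is an iterable of characters, so Counter counts characters.
char_counts = Counter("banana")

# A list of words: Counter counts each distinct word.
word_counts = Counter(["red", "blue", "red"])

# A tuple of numbers works the same way, since integers are hashable.
number_counts = Counter((1, 2, 2, 3))

print(char_counts["a"])      # 3
print(word_counts["red"])    # 2
print(number_counts[2])      # 2
```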
Updated on 18-Aug-2022 08:50:24