I am very new to Python.
I want to change a sentence if it contains repeated words.
Correct
- Ex. "this just so so so nice" --> "this just so nice"
- Ex. "this is just is is" --> "this is just is"
Right now I am using this regex, but it also makes the change on letters. Ex. "My friend and i is happy" --> "My friend and is happy" (it removes the "i" and the space): ERROR
text = re.sub(r'(\w+)\1', r'\1', text)  # remove duplicated words in a row
How can I make the same change so that it checks whole words instead of letters?
asked Jun 21, 2013 at 15:08
text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text)  # remove duplicated words in a row
The \b matches the empty string, but only at the beginning or end of a word.
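A quick way to check the corrected pattern is to run it over the sentences from the question. This is a sketch: the helper name collapse_repeats is hypothetical, the pattern treats only a single space as the separator between repeats, and it is case-sensitive.

```python
import re

# Corrected pattern: a whole word, then one or more " word" repeats of that same word.
# \b on both sides keeps a match from starting or ending inside a longer word.
REPEATED_WORD = re.compile(r"\b(\w+)( \1\b)+")


def collapse_repeats(text: str) -> str:
    # Replace each run such as "so so so" with the single captured word (group 1).
    return REPEATED_WORD.sub(r"\1", text)


for sentence in [
    "this just so so so nice",
    "this is just is is",
    "My friend and i is happy",
]:
    print(collapse_repeats(sentence))
```

The third sentence comes back unchanged: at "i is" the back-reference matches the "i" of "is", but the closing \b fails because "s" is still part of the same word.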
answered Jun 21, 2013 at 15:15
tomtom
Non-regex solution using itertools.groupby:
>>> strs = "this is just is is"
>>> from itertools import groupby
>>> " ".join([k for k, v in groupby(strs.split())])
'this is just is'
>>> strs = "this just so so so nice"
>>> " ".join([k for k, v in groupby(strs.split())])
'this just so nice'
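The groupby version compares words exactly, so "Hello hello" is not collapsed. A possible variant, assuming case-insensitive matching is wanted, passes key=str.lower and keeps the first word of each run; dedupe_adjacent is a hypothetical name.

```python
from itertools import groupby


def dedupe_adjacent(sentence: str) -> str:
    """Keep the first word of each run of adjacent words that differ only in case."""
    words = sentence.split()
    # groupby clusters consecutive items with equal keys; key=str.lower puts
    # "Hello" and "hello" in the same run, and next(group) keeps the first one.
    return " ".join(next(group) for _, group in groupby(words, key=str.lower))


print(dedupe_adjacent("Hello hello world world"))
print(dedupe_adjacent("this just so so so nice"))
```

Like the original, this splits on whitespace and re-joins with single spaces, and a word with attached punctuation such as "world," is not grouped with "world".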
answered Jun 21, 2013 at 15:10
Ashwini Chaudhary
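\b: Matches a word boundary
\w: Any word character
\W: Any non-word character (such as a space)
(?:...)+: A non-capturing group repeated one or more times, so it does not become group 2
\1: A back-reference to the word captured by (\w+); in the replacement r"\1" it stands for that word, so each run of repeats is replaced by a single copy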
import re

def Remove_Duplicates(Test_string):
    # \b(\w+) captures a whole word; (?:\W\1\b)+ consumes its immediate repeats
    Pattern = r"\b(\w+)(?:\W\1\b)+"
    # Replace each run with the single captured word, ignoring case
    return re.sub(Pattern, r"\1", Test_string, flags=re.IGNORECASE)

Test_string1 = "Good bye bye world world"
Test_string2 = "Ram went went to to his home"
Test_string3 = "Hello hello world world"
print(Remove_Duplicates(Test_string1))
print(Remove_Duplicates(Test_string2))
print(Remove_Duplicates(Test_string3))
Result:
Good bye world
Ram went to his home
Hello world
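The token explanations above can be attached directly to the pattern with re.VERBOSE, which ignores unescaped whitespace in the pattern and treats # as the start of a comment. A sketch of the same regex written that way:

```python
import re

# The answer's pattern \b(\w+)(?:\W\1\b)+ written out so each token carries its comment.
PATTERN = re.compile(
    r"""
    \b          # word boundary: the match must start at the beginning of a word
    (\w+)       # group 1: the word itself
    (?:         # non-capturing group, so the repeat is not numbered
        \W      # exactly one non-word character (e.g. a space) between words
        \1      # back-reference: the same text that group 1 matched
        \b      # word boundary: the repeat must end where a word ends
    )+          # one or more repeats
    """,
    re.IGNORECASE | re.VERBOSE,
)

print(PATTERN.sub(r"\1", "Hello hello world world"))
```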
answered Feb 17, 2021 at 19:22
Given a string str that represents a sentence, the task is to remove duplicate words from the sentence using a regular expression in Java.
Examples:
Input: str = “Good bye bye world world”
Output: Good bye world
Explanation:
We remove the second occurrence of bye and world from Good bye bye world world.
Input: str = “Ram went went to to to his home”
Output: Ram went to his home
Explanation:
We remove the second occurrence of went and the second and third occurrences of to from Ram went went to to to his home.
Input: str = “Hello hello world world”
Output: Hello world
Explanation:
We remove the second occurrence of hello and world from Hello hello world world.
Approach
- Get the sentence.
- Form a regular expression to remove duplicate words from sentences.
regex = "\\b(\\w+)(?:\\W+\\1\\b)+";
- The details of the above regular expression can be understood as:
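- “\\b”: A word boundary. Boundaries prevent matches inside words. For example, in “My thesis is great”, the “is” at the end of “thesis” is not treated as a repeat of the following “is”.
- “\\w+”: One or more word characters: [a-zA-Z_0-9]
- “\\W+”: One or more non-word characters: [^\w]
- “\\1”: Matches whatever was matched in the first capturing group, which in this case is (\\w+)
- “(?: … )”: A non-capturing group, so “\\1” still refers to the word captured by (\\w+)
- “+”: Matches the preceding element one or more times; here it lets the group absorb any number of consecutive repeats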
- Match the sentence with the regex. In Java, this can be done using Pattern.matcher().
- Return the modified sentence.
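The steps above translate to Python roughly as follows. This is a sketch, not the article's own implementation, and remove_duplicate_words is a hypothetical name. It uses the article's \W+ (one or more non-word characters), so repeats separated by several spaces or by punctuation are also collapsed.

```python
import re

# Step 2: the article's regex, compiled once; IGNORECASE makes "Hello hello" a repeat.
DUPLICATE_WORDS = re.compile(r"\b(\w+)(?:\W+\1\b)+", re.IGNORECASE)


def remove_duplicate_words(sentence: str) -> str:
    """Collapse each run of consecutive repeated words to its first occurrence."""
    # Steps 3-4: every match spans a word plus its repeats; \1 keeps only the first.
    return DUPLICATE_WORDS.sub(r"\1", sentence)


if __name__ == "__main__":
    for s in [
        "Good bye bye world world",
        "Ram went went to to to his home",
        "Hello hello world world",
        "My thesis is great",
    ]:
        print(remove_duplicate_words(s))
```

Because the whole match is replaced by group 1, any punctuation between the repeats is dropped as well: "Good bye,  bye world" becomes "Good bye world".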
Below is the implementation of the above approach:
C++
#include <iostream>
#include <regex>
#include <string>
using namespace std;

// Collapse each run of consecutive repeated words (case-insensitive)
// to its first occurrence.
string removeDuplicateWords(string s)
{
    const regex pattern("\\b(\\w+)(?:\\W+\\1\\b)+", regex_constants::icase);
    // "$1" is the ECMAScript reference to the first captured word, so each
    // whole match is replaced in place by a single copy of that word.
    return regex_replace(s, pattern, "$1");
}

int main()
{
    string str1 = "Good bye bye world world";
cout