Python regex remove duplicate words

I am very new to Python.

I want to change a sentence if it contains repeated words.

Correct

  • Ex. "this just so so so nice" --> "this just so nice"
  • Ex. "this is just is is" --> "this is just is"

Right now I am using this regex, but it also changes letters. Ex. "My friend and i is happy" --> "My friend and is happy" (it removes the "i" and the space). ERROR

text = re.sub(r'(\w+)\1', r'\1', text)  # remove duplicated words in a row

How can I do the same thing, but have it check whole words instead of letters?

asked Jun 21, 2013 at 15:08

text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text)  # remove duplicated words in a row

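The \b matches the empty string, but only at the beginning or end of a word. The \1 matches the same text that the first group (\w+) captured, so ( \1\b)+ matches one or more further copies of that word, each preceded by a space; replacing the whole match with \1 leaves a single copy.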
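A quick side-by-side check, assuming Python 3's standard `re` module. The question's pattern, `(\w+)\1`, has no separator between the group and its backreference, so it collapses any doubled run of characters inside a single word (the "pp" in "happy"). The answer's pattern requires a space and a word boundary, so only whole repeated words are merged. The helper names `collapse_letters` and `collapse_words` are hypothetical, chosen for illustration.

```python
import re

# Pattern from the question: a run of word characters immediately followed by
# the same run with nothing in between, so it can match inside a single word.
LETTER_PATTERN = r'(\w+)\1'

# Pattern from the answer: a whole word, then one or more copies of
# " <same word>", each ending at a word boundary.
WORD_PATTERN = r'\b(\w+)( \1\b)+'


def collapse_letters(text):
    # Replace each doubled character run with a single copy (group 1).
    return re.sub(LETTER_PATTERN, r'\1', text)


def collapse_words(text):
    # Replace each run of repeated words with a single copy (group 1).
    return re.sub(WORD_PATTERN, r'\1', text)


print(collapse_letters("My friend and i is happy"))  # My friend and i is hapy
print(collapse_words("My friend and i is happy"))    # My friend and i is happy
print(collapse_words("this just so so so nice"))     # this just so nice
print(collapse_words("this is just is is"))          # this is just is
```

Note that the word pattern only treats a single space as the separator, so "so  so" (two spaces) or "so, so" is left unchanged.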

answered Jun 21, 2013 at 15:15

tomtom


Non-regex solution using itertools.groupby:

>>> strs = "this is just is is"
>>> from itertools import groupby
>>> " ".join([k for k, v in groupby(strs.split())])
'this is just is'
>>> strs = "this just so so so nice"
>>> " ".join([k for k, v in groupby(strs.split())])
'this just so nice'

answered Jun 21, 2013 at 15:10

Ashwini Chaudhary

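The `groupby` approach can also handle the case-insensitive examples further down ("Hello hello world world") by passing a `key` function: `itertools.groupby` groups adjacent items with equal keys, and the first item of each group keeps its original spelling. A sketch, assuming words are separated by whitespace; `split()` also collapses runs of whitespace into single spaces, and punctuation attached to a word ("world." vs "world") makes it a different word. The function name `dedupe_words` and its `ignore_case` parameter are hypothetical.

```python
from itertools import groupby


def dedupe_words(sentence, ignore_case=False):
    """Drop consecutive repeated words, keeping the first word of each run."""
    words = sentence.split()
    # Compare lowercased words when ignore_case is set; otherwise compare as-is
    # (key=None makes groupby use the words themselves).
    key = str.lower if ignore_case else None
    # Each group is a run of adjacent words with equal keys; next(group) is
    # the first word of that run, in its original spelling.
    return " ".join(next(group) for _, group in groupby(words, key=key))


print(dedupe_words("this just so so so nice"))                    # this just so nice
print(dedupe_words("Hello hello world world"))                    # Hello hello world
print(dedupe_words("Hello hello world world", ignore_case=True))  # Hello world
```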


  • \b: Matches Word Boundaries

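  • \w: Any word character; \w+ matches a whole word

  • \W: Any non-word character, such as the space between words

  • \1: In the pattern, matches the same text as the first group (the word); as the replacement, it replaces the whole run of repeats with that single word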

    import re


    def Remove_Duplicates(Test_string):
        # A word, then one or more copies of "<one non-word char><same word>"
        Pattern = r"\b(\w+)(?:\W\1\b)+"
        # Replace each run of repeats with the first word, ignoring case
        return re.sub(Pattern, r"\1", Test_string, flags=re.IGNORECASE)


    Test_string1 = "Good bye bye world world"
    Test_string2 = "Ram went went to to his home"
    Test_string3 = "Hello hello world world"
    print(Remove_Duplicates(Test_string1))
    print(Remove_Duplicates(Test_string2))
    print(Remove_Duplicates(Test_string3))

Result:

    Good bye world
    Ram went to his home
    Hello world

answered Feb 17, 2021 at 19:22
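One detail worth checking in this pattern: `\W` matches exactly one non-word character, so repeats separated by two spaces or by ", " are left alone. The pattern in the next section uses `\W+` instead, which merges those too; whether that is desirable depends on the input, since it also drops the comma in "world, world". A small comparison, assuming Python's `re` module; the sample sentence is made up for illustration.

```python
import re

SINGLE_SEP = r"\b(\w+)(?:\W\1\b)+"   # exactly one separator character
MULTI_SEP = r"\b(\w+)(?:\W+\1\b)+"   # one or more separator characters

text = "Good  bye  bye world, world"  # double spaces and a comma

# With \W, no repeat is separated by exactly one character, so nothing changes.
print(re.sub(SINGLE_SEP, r"\1", text, flags=re.IGNORECASE))  # Good  bye  bye world, world
# With \W+, both runs are merged, and the separator inside each run is dropped.
print(re.sub(MULTI_SEP, r"\1", text, flags=re.IGNORECASE))   # Good  bye world
```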

    Given a string str which represents a sentence, the task is to remove duplicate words from the sentence using a regular expression in Java.
    Examples: 
     

    Input: str = “Good bye bye world world” 
    Output: Good bye world 
    Explanation: 
    We remove the second occurrence of bye and world from Good bye bye world world.
    Input: str = “Ram went went to to to his home” 
    Output: Ram went to his home 
    Explanation: 
    We remove the second occurrence of went and the second and third occurrences of to from Ram went went to to to his home.
    Input: str = “Hello hello world world” 
    Output: Hello world 
    Explanation: 
    We remove the second occurrence of hello and world from Hello hello world world. 
     

    Approach
     

    1. Get the sentence.
    2. Form a regular expression to remove duplicate words from the sentence:

       regex = "\\b(\\w+)(?:\\W+\\1\\b)+";

       The details of the above regular expression can be understood as:
      • “\\b”: A word boundary. Boundaries prevent matches inside words: in “My thesis is great”, the “is” at the end of “thesis” and the following “is” are not treated as a repeated word.
      • “\\w+”: One or more word characters: [a-zA-Z_0-9]
      • “\\W+”: One or more non-word characters: [^\w]
      • “(?:...)”: A non-capturing group, so “\\1” still refers to (\w+)
      • “\\1”: Matches whatever was matched in the 1st group of parentheses, which in this case is the (\w+)
      • “+”: Matches whatever it is placed after 1 or more times
    3. Match the sentence with the regex. In Java, this can be done using Pattern.matcher().
    4. Return the modified sentence.
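The same steps carry over to Python; a minimal sketch, assuming `re.sub` with a `\1` replacement stands in for the match-and-replace step. A raw string lets each backslash be written once instead of doubled as in the Java/C++ string literal. The function name `remove_duplicate_words` is hypothetical.

```python
import re

# Step 2: the regular expression from the approach, compiled case-insensitively.
DUPLICATE_WORDS = re.compile(r"\b(\w+)(?:\W+\1\b)+", re.IGNORECASE)


def remove_duplicate_words(sentence):
    # Steps 3-4: replace every run of repeated words with its first word
    # (group 1) and return the modified sentence.
    return DUPLICATE_WORDS.sub(r"\1", sentence)


examples = [
    "Good bye bye world world",
    "Ram went went to to to his home",
    "Hello hello world world",
    "My thesis is great",  # \b keeps "thesis is" from being merged
]
for sentence in examples:
    print(remove_duplicate_words(sentence))
# Good bye world
# Ram went to his home
# Hello world
# My thesis is great
```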

    Below is the implementation of the above approach:
     

    C++

    #include <iostream>
    #include <regex>
    #include <string>
    using namespace std;

    string removeDuplicateWords(string s)
    {
      // Case-insensitive: a word followed by one or more repeats of it
      const regex pattern("\\b(\\w+)(?:\\W+\\1\\b)+", regex_constants::icase);
      string answer = s;
      for (auto it = sregex_iterator(s.begin(), s.end(), pattern);
           it != sregex_iterator(); it++)
      {
          smatch match;
          match = *it;
          // Replace the run of repeats with the first word (group 1)
          answer.replace(answer.find(match.str(0)), match.str(0).length(), match.str(1));
      }
      return answer;
    }

    int main()
    {
      string str1
          = "Good bye bye world world";
      cout << removeDuplicateWords(str1) << endl;
      return 0;
    }
