Python regex remove duplicate words

I am very new to Python.

I want to change a sentence when it contains repeated words.


  • Ex. "this just so so so nice" --> "this just so nice"
  • Ex. "this is just is is" --> "this is just is"

Right now I am using this regex, but it also changes letters. Ex. "My friend and i is happy" --> "My friend and is happy" (it removes the "i" and the space), which is wrong:

text = re.sub(r'(\w+)\1', r'\1', text) #remove duplicated words in row

How can I make the same change, but check whole words instead of letters?
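A quick check of the pattern from the question suggests why it acts on letters: nothing in `(\w+)\1` requires the group to be a whole word, and `\w` never matches a space, so a doubled run of characters inside one word matches while space-separated repeated words do not. The input "happy" is taken from the question's example sentence; this is only an illustrative sketch.

```python
import re

# The pattern from the question: a run of word characters immediately followed by itself
letters_pattern = r'(\w+)\1'

# "pp" inside "happy" is a doubled run, so one "p" is dropped
print(re.sub(letters_pattern, r'\1', "happy"))

# Repeated words are separated by spaces, which \w never matches, so nothing changes
print(re.sub(letters_pattern, r'\1', "this just so so so nice"))
```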

asked Jun 21, 2013 at 15:08

text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text) #remove duplicated words in row

The \b matches the empty string, but only at the beginning or end of a word.
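A small sketch checking this pattern against the examples from the question. The function name `remove_repeated_words` is a hypothetical helper introduced only for this illustration. Note that the pattern allows exactly one space between repeats and is case-sensitive.

```python
import re

def remove_repeated_words(text):
    # \b(\w+) captures a whole word; ( \1\b)+ matches one or more copies of it,
    # each preceded by a single space and ending at a word boundary
    return re.sub(r'\b(\w+)( \1\b)+', r'\1', text)

print(remove_repeated_words("this just so so so nice"))   # this just so nice
print(remove_repeated_words("this is just is is"))        # this is just is
# "i" followed by "is" is not a repeat, because \b fails between "i" and "s"
print(remove_repeated_words("My friend and i is happy"))  # My friend and i is happy
```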

answered Jun 21, 2013 at 15:15


Non-regex solution using itertools.groupby:

>>> strs = "this is just is is"
>>> from itertools import groupby
>>> " ".join([k for k,v in groupby(strs.split())])
'this is just is'
>>> strs = "this just so so so nice" 
>>> " ".join([k for k,v in groupby(strs.split())])
'this just so nice'
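The groupby approach can be wrapped in a reusable function. This is a sketch: the name `dedupe_consecutive` and the `ignore_case` option are assumptions added for illustration, using groupby's `key` argument so that "Hello" and "hello" count as a repeat. Unlike the regex answers, splitting on whitespace keeps punctuation attached to words, so "world, world" is not treated as a repeat.

```python
from itertools import groupby

def dedupe_consecutive(sentence, ignore_case=False):
    # split() breaks on any whitespace, so the result is re-joined with single spaces
    words = sentence.split()
    key = str.lower if ignore_case else None
    # groupby yields (key, group) for each run of equal keys; keep the first
    # original word of each run so its capitalization is preserved
    return " ".join(next(group) for _, group in groupby(words, key=key))

print(dedupe_consecutive("this is just is is"))                         # this is just is
print(dedupe_consecutive("Hello hello world world"))                    # Hello hello world
print(dedupe_consecutive("Hello hello world world", ignore_case=True))  # Hello world
```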

answered Jun 21, 2013 at 15:10


Ashwini Chaudhary

  • \b: Matches Word Boundaries

  • \w: Any word character

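  • \1: Backreference to the first captured group (\w+). In the pattern it matches a repeat of that word; in the replacement string it inserts the word once.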

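      import re

      def remove_duplicates(test_string):
          # Collapse each run of a repeated word (case-insensitive) to its first occurrence
          pattern = r"\b(\w+)(?:\W\1\b)+"
          return re.sub(pattern, r"\1", test_string, flags=re.IGNORECASE)

      test_string1 = "Good bye bye world world"
      test_string2 = "Ram went went to to his home"
      test_string3 = "Hello hello world world"

      for s in (test_string1, test_string2, test_string3):
          print(remove_duplicates(s))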


    Good bye world
    Ram went to his home
    Hello world
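To make the `\1` bullet concrete, the sketch below inspects a match from the same pattern: group 1 is the word that `\1` repeats, and the replacement `r"\1"` keeps just that word. It also shows a limitation of the single `\W`: a double space between repeats is not matched, which the `\W+` variant handles. The double-space input is an assumed example.

```python
import re

pattern = r"\b(\w+)(?:\W\1\b)+"

m = re.search(pattern, "Good bye bye world world", flags=re.IGNORECASE)
# group(0) is the whole run of repeats; group(1) is what \1 refers to
print(m.group(0))  # bye bye
print(m.group(1))  # bye

# \W matches exactly one non-word character, so two spaces break the match
print(re.sub(pattern, r"\1", "bye  bye"))                 # bye  bye
# Allowing one or more separators with \W+ handles it
print(re.sub(r"\b(\w+)(?:\W+\1\b)+", r"\1", "bye  bye"))  # bye
```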

answered Feb 17, 2021 at 19:22


    Given a string str which represents a sentence, the task is to remove consecutive duplicate words from the sentence using a regular expression.

    Input: str = “Good bye bye world world” 
    Output: Good bye world 
    We remove the second occurrence of bye and world from Good bye bye world world.
    Input: str = “Ram went went to to to his home” 
    Output: Ram went to his home 
    We remove the second occurrence of went and the second and third occurrences of to from Ram went went to to to his home.
    Input: str = “Hello hello world world” 
    Output: Hello world 
    We remove the second occurrence of hello and world from Hello hello world world. 


    1. Get the sentence.
    2. Form a regular expression to remove duplicate words from sentences:
       regex = "\\b(\\w+)(?:\\W+\\1\\b)+";
       The details of the above regular expression can be understood as:
      • “\\b”: A word boundary. Boundaries are needed for special cases. For example, in “My thesis is great”, the “is” at the end of “thesis” won't be treated as a repeat of the following word “is”.
      • “\\w+”: One or more word characters: [a-zA-Z_0-9]
      • “\\W+”: One or more non-word characters: [^\w]
      • “\\1”: Matches whatever was matched in the 1st group of parentheses, which in this case is the (\w+)
      • “+”: Matches whatever it's placed after 1 or more times
    3. Match the sentence against the regex and replace each match with the first captured word. In Java, this can be done using Pattern.matcher().
    4. Return the modified sentence.
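The word-boundary point can be checked directly in Python, whose `re` module accepts the same pattern (written as a raw string instead of a Java string literal). This sketch runs the regex with and without the leading `\b` on the sentence from the bullet; the variable names are illustrative only.

```python
import re

sentence = "My thesis is great"

with_boundary = r"\b(\w+)(?:\W+\1\b)+"
without_boundary = r"(\w+)(?:\W+\1\b)+"

# With the leading \b, group 1 must start at a word boundary, so nothing matches
print(re.sub(with_boundary, r"\1", sentence))     # My thesis is great
# Without it, group 1 can start inside "thesis" and pair its "is" with the next word
print(re.sub(without_boundary, r"\1", sentence))  # My thesis great
```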

    Below is the implementation of the above approach:

    // C++ program to remove consecutive duplicate words
    #include <iostream>
    #include <regex>
    #include <string>
    using namespace std;

    string removeDuplicateWords(string s)
    {
        // Group 1 captures a word; (?:\W+\1\b)+ matches its consecutive repeats
        const regex pattern("\\b(\\w+)(?:\\W+\\1\\b)+", regex_constants::icase);

        string answer = s;

        // Replace each run of repeats with the first word of the run
        for (auto it = sregex_iterator(s.begin(), s.end(), pattern);
             it != sregex_iterator(); it++)
        {
            smatch match = *it;
            answer.replace(answer.find(match.str(0)), match.str(0).length(),
                           match.str(1));
        }

        return answer;
    }

    int main()
    {
        string str1 = "Good bye bye world world";
        cout << removeDuplicateWords(str1) << endl;

        string str2 = "Ram went went to to his home";
        cout << removeDuplicateWords(str2) << endl;

        string str3 = "Hello hello world world";
        cout << removeDuplicateWords(str3) << endl;

        return 0;
    }

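    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class GFG {

        // Removes consecutive duplicate words (case-insensitive)
        public static String removeDuplicateWords(String input)
        {
            // Group 1 captures a word; (?:\W+\1\b)+ matches its consecutive repeats
            String regex = "\\b(\\w+)(?:\\W+\\1\\b)+";
            Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

            Matcher m = p.matcher(input);

            // Replace each run of repeats with the first word of the run.
            // String.replace is literal, so punctuation in the match is not
            // interpreted as regex syntax (unlike replaceAll).
            while (m.find()) {
                input = input.replace(m.group(), m.group(1));
            }

            return input;
        }

        public static void main(String[] args)
        {
            String str1 = "Good bye bye world world";
            System.out.println(removeDuplicateWords(str1));

            String str2 = "Ram went went to to his home";
            System.out.println(removeDuplicateWords(str2));

            String str3 = "Hello hello world world";
            System.out.println(removeDuplicateWords(str3));
        }
    }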

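    import re

    def removeDuplicateWords(text):
        # Group 1 captures a word; (?:\W+\1\b)+ matches its consecutive repeats
        regex = r'\b(\w+)(?:\W+\1\b)+'
        # Replace each run with the first word; IGNORECASE lets "Hello hello" match
        return re.sub(regex, r'\1', text, flags=re.IGNORECASE)

    str1 = "Good bye bye world world"
    print(removeDuplicateWords(str1))

    str2 = "Ram went went to to his home"
    print(removeDuplicateWords(str2))

    str3 = "Hello hello world world"
    print(removeDuplicateWords(str3))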

    Good bye world
    Ram went to his home
    Hello world
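Two properties of this regex are worth checking before relying on it: it removes only consecutive repeats, so a word that reappears later in the sentence is kept, and because `\W+` matches punctuation as well as spaces, any punctuation between the repeats is removed along with the duplicate. The sketch below repeats the Python function so it runs on its own; the input sentences are assumed examples.

```python
import re

def removeDuplicateWords(text):
    regex = r'\b(\w+)(?:\W+\1\b)+'
    return re.sub(regex, r'\1', text, flags=re.IGNORECASE)

# Only adjacent repeats are removed; "to" appears twice but not consecutively
print(removeDuplicateWords("I want to go to bed"))  # I want to go to bed
# \W+ also spans punctuation, so the comma between the repeats is removed too
print(removeDuplicateWords("Hello, hello world"))   # Hello world
```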

