Guide: replacing non-alphanumeric characters in Python

I have a string in which I want to replace any character that isn't a standard letter or digit (a-z or 0-9) with an asterisk. For example, "h^&ell`.,|o w]{+orld" should become "h*ell*o*w*orld". Note that a run of several characters such as "^&" is replaced with a single asterisk. How would I go about doing this?

Contents

  • How do you replace non-alphabetic characters in Python?
  • How do I remove non-alphabetic characters from a string?
  • How do you find a non-alphanumeric character in Python?
  • How do you replace non-alphanumeric characters with empty strings?

nneonneo

asked Oct 20, 2012 at 5:10

Regex to the rescue!

import re

s = re.sub('[^0-9a-zA-Z]+', '*', s)

Example:

>>> re.sub('[^0-9a-zA-Z]+', '*', 'h^&ell`.,|o w]{+orld')
'h*ell*o*w*orld'
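If you apply this substitution many times, one option is to compile the pattern once and wrap it in a small helper. The function name `mask_non_alnum` and its `repl` parameter are hypothetical and only serve as illustration. Because the character class lists only ASCII letters and digits, accented letters such as 'é' are replaced too.

```python
import re

# Compiled once: a run of one or more characters that are NOT ASCII letters or digits.
NON_ALNUM_RUN = re.compile(r'[^0-9a-zA-Z]+')

def mask_non_alnum(s: str, repl: str = '*') -> str:
    """Replace each run of non-alphanumeric characters with a single `repl` (hypothetical helper)."""
    return NON_ALNUM_RUN.sub(repl, s)

print(mask_non_alnum('h^&ell`.,|o w]{+orld'))   # h*ell*o*w*orld
print(mask_non_alnum('café au lait', repl='-'))  # caf-au-lait
```

In the second call, 'é' and the space that follows it form a single run, so they collapse into one '-'.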

answered Oct 20, 2012 at 5:11

nneonneo

The pythonic way.

print("".join(c if c.isalnum() else "*" for c in s))

This doesn't group multiple consecutive non-matching characters, though: "h^&i" becomes "h**i", not "h*i" as in the regex solutions.
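To get the grouping without a regex, one possible approach is `itertools.groupby` keyed on `str.isalnum`. This technique is not part of the answer above. `groupby` yields runs of consecutive characters that share the same key, so each non-alphanumeric run can be emitted as a single replacement. The helper name `mask_runs` is hypothetical.

```python
from itertools import groupby

def mask_runs(s: str, repl: str = '*') -> str:
    """Keep alphanumeric runs and collapse each non-alphanumeric run to one `repl`."""
    parts = []
    # groupby yields (key, run) pairs for consecutive characters with equal keys.
    for is_alnum, run in groupby(s, key=str.isalnum):
        parts.append(''.join(run) if is_alnum else repl)
    return ''.join(parts)

print(mask_runs('h^&i'))                  # h*i
print(mask_runs('h^&ell`.,|o w]{+orld'))  # h*ell*o*w*orld
```

Unlike `[^0-9a-zA-Z]`, `str.isalnum` is Unicode-aware, so this version keeps letters such as 'é'.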

crizCraig

answered Feb 28, 2014 at 13:27

baloan

Try (in Python 2, filter() applied to a string returns a string):

s = filter(str.isalnum, s)

In Python 3, filter() returns an iterator, so join the result back into a string:

s = ''.join(filter(str.isalnum, s))

Edit: I realized that the OP wants to replace non-alphanumeric characters with '*', so this answer, which removes them, does not fit.
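`str.isalnum` is Unicode-aware, so `filter(str.isalnum, s)` keeps letters such as 'é', while `[^0-9a-zA-Z]` would not. If you need ASCII-only behaviour, a minimal sketch is to also require `str.isascii()`, which has been available since Python 3.7. The helper name `keep_ascii_alnum` is hypothetical.

```python
def keep_ascii_alnum(s: str) -> str:
    # isascii() rules out non-ASCII letters and digits that isalnum() would accept.
    return ''.join(c for c in s if c.isascii() and c.isalnum())

s = 'café_42!'
print(''.join(filter(str.isalnum, s)))  # café42
print(keep_ascii_alnum(s))              # caf42
```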

answered Jan 5, 2015 at 5:15

Don

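Use \W, which matches any non-word character. For ASCII text it is equivalent to [^a-zA-Z0-9_]. In Python 3, \W is Unicode-aware by default, so pass the re.ASCII flag if you need exactly that set. See the documentation: https://docs.python.org/2/library/re.html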

import re
s = 'h^&ell`.,|o w]{+orld'
replaced_string = re.sub(r'\W+', '*', s)
# replaced_string is now 'h*ell*o*w*orld'

Update: this solution also leaves underscores untouched, because _ counts as a word character. If you want only letters and digits to be kept, nneonneo's solution is more appropriate.
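To keep the \W approach but also replace underscores, you can put \W and _ in one character class. This is a minimal sketch. Note that in Python 3 `[\W_]+` still differs from `[^0-9a-zA-Z]+` on non-ASCII letters such as 'é', which \w counts as word characters unless re.ASCII is set.

```python
import re

s = 'snake_case, ok?'
kept = re.sub(r'\W+', '*', s)         # underscore survives
replaced = re.sub(r'[\W_]+', '*', s)  # underscore replaced as well
print(kept)      # snake_case*ok*
print(replaced)  # snake*case*ok*
```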

Csaba Toth

answered Aug 12, 2016 at 18:54

psun

How do you replace non-alphabetic characters in Python?

Using regular expressions: a simple solution is to use a regular expression to remove or replace non-alphanumeric characters in a string. The idea is to use the special sequence \W, which matches any character that is not a word character. A word character is a letter, a digit, or an underscore, so underscores are kept.
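In Python 3, str patterns are Unicode-aware by default, so \w matches letters such as 'é' and \W does not. This minimal sketch compares the default with the re.ASCII flag, using removal (an empty replacement string):

```python
import re

s = 'café_1!'
unicode_result = re.sub(r'\W+', '', s)                 # 'é' counts as a word character
ascii_result = re.sub(r'\W+', '', s, flags=re.ASCII)   # 'é' is treated as a non-word character
print(unicode_result)  # café_1
print(ascii_result)    # caf_1
```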

How do I remove non-alphabetic characters from a string?

A common way to remove all non-alphanumeric characters from a string is to use a regular expression. The idea is to match [^A-Za-z0-9] and replace each match with an empty string, which leaves only the alphanumeric characters. You can also use the [^\w] regular expression, which for ASCII text is equivalent to [^a-zA-Z_0-9]. Note that it keeps underscores.

How do you find a non-alphanumeric character in Python?

Python's str.isalnum() method returns True if all characters in the string are alphanumeric and the string is not empty. Alphanumeric means letters (such as a-z) and digits (such as 0-9). Examples of characters that are not alphanumeric: the space, !, #, %, & and ?. To find non-alphanumeric characters, test each character with isalnum().
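To locate the offending characters, one approach is to test each character with `str.isalnum()` inside `enumerate()`, which reports each one together with its index. The helper name `find_non_alnum` is hypothetical.

```python
def find_non_alnum(s: str) -> list:
    """Return (index, character) pairs for every non-alphanumeric character."""
    return [(i, c) for i, c in enumerate(s) if not c.isalnum()]

print(find_non_alnum('ab c!'))   # [(2, ' '), (4, '!')]
print(find_non_alnum('abc123'))  # []
```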

How do you replace non-alphanumeric characters with empty strings?

In Java, the approach is to use the String.replaceAll method to replace all non-alphanumeric characters with an empty string. In Python, the equivalent is to call re.sub() with an empty string as the replacement.
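This minimal Python sketch passes an empty string as the replacement to `re.sub()`, so every match is deleted:

```python
import re

my_str = 'apple, kiwi, banana!'
# Delete every character that is not an ASCII letter or digit.
cleaned = re.sub(r'[^a-zA-Z0-9]', '', my_str)
print(cleaned)  # applekiwibanana
```

With an empty replacement, adding + to the pattern does not change the result, since deleting a run one character at a time or all at once gives the same string.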

Replace all non-alphanumeric characters in a Python string

Use the re.sub() method to replace all non-alphanumeric characters in a string, e.g. new_str = re.sub(r'[^a-zA-Z0-9]', '|', my_str). The re.sub() method will return a new string where all occurrences of non-alphanumeric characters are replaced by the provided replacement.

import re

my_str = 'apple, kiwi, banana'

# ✅ Replace all non-alphanumeric characters in string (re.sub())
new_str = re.sub(r'[^a-zA-Z0-9]', '|', my_str)
print(new_str)  # 👉️ 'apple||kiwi||banana'

# ✅ Replace one or more consecutive non-alphanumeric characters with a single character
new_str = re.sub(r'[^a-zA-Z0-9]+', '|', my_str)
print(new_str)  # 👉️ 'apple|kiwi|banana'

# ✅ Replace all non-alphanumeric characters in string, preserving whitespace
new_str = re.sub(r'[^a-zA-Z0-9\s]', '|', my_str)
print(new_str)  # 👉️ 'apple| kiwi| banana'

# ----------------------------------------------------------------

# ✅ Replace all non-alphanumeric characters in string (generator expression)
new_str = ''.join(char if char.isalnum() else '|' for char in my_str)
print(new_str)  # 👉️ 'apple||kiwi||banana'

# ✅ Preserve whitespace
new_str = ''.join(char if char.isalnum() or char == ' ' else '|' for char in my_str)
print(new_str)  # 👉️ 'apple| kiwi| banana'

The first example uses the re.sub() method to replace all non-alphanumeric characters in a string.

The re.sub method returns a new string that is obtained by replacing the occurrences of the pattern with the provided replacement.

import re

my_str = 'apple, kiwi, banana'

new_str = re.sub(r'[^a-zA-Z0-9]', '|', my_str)
print(new_str)  # 👉️ 'apple||kiwi||banana'

new_str = re.sub(r'[^a-zA-Z0-9]+', '|', my_str)
print(new_str)  # 👉️ 'apple|kiwi|banana'

new_str = re.sub(r'[^a-zA-Z0-9\s]', '|', my_str)
print(new_str)  # 👉️ 'apple| kiwi| banana'

If the pattern isn't found, the string is returned as is.
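Python strings are immutable, so `re.sub()` never modifies `my_str` itself. The sketch below checks this and the no-match case using `re.subn()`, which behaves like `re.sub()` but returns a `(new_string, number_of_substitutions)` tuple:

```python
import re

my_str = 'apple, kiwi, banana'
new_str, count = re.subn(r'[^a-zA-Z0-9]+', '|', my_str)
print(new_str, count)  # apple|kiwi|banana 2
print(my_str)          # still: apple, kiwi, banana

# No match: the result equals the input and the count is 0.
print(re.subn(r'[^a-zA-Z0-9]+', '|', 'applekiwi'))  # ('applekiwi', 0)
```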

The first argument we passed to the re.sub() method is a regular expression.

The square brackets [] are used to indicate a set of characters.

The caret ^ at the beginning of the set means "NOT". In other words, match all characters that are NOT lowercase or uppercase letters or numbers.

The a-z and A-Z character ranges match lowercase and uppercase ASCII letters, and the 0-9 range matches digits.

If you need to replace multiple, consecutive non-alphanumeric characters with a single replacement string, add a plus + at the end of the regex.

import re

my_str = 'apple, kiwi, banana'

new_str = re.sub(r'[^a-zA-Z0-9]+', '|', my_str)
print(new_str)  # 👉️ 'apple|kiwi|banana'

The plus + matches the preceding item one or more times. Here that item is the character set, so a run of characters that are neither letters nor digits is matched as a whole.

We used a pipe | as the replacement character in the examples; however, you can use any other replacement string.

If you need to replace all non-alphanumeric characters in a string and preserve the whitespace, use the following regular expression.

import re

my_str = 'apple, kiwi, banana'

new_str = re.sub(r'[^a-zA-Z0-9\s]', '|', my_str)
print(new_str)  # 👉️ 'apple| kiwi| banana'

The \s special sequence matches Unicode whitespace characters, including [ \t\n\r\f\v].
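Because \s covers more than the space character, tabs and newlines also survive the substitution. Here is a quick check on a hypothetical input string containing both:

```python
import re

my_str = 'apple,\tkiwi,\nbanana'
new_str = re.sub(r'[^a-zA-Z0-9\s]', '|', my_str)
print(repr(new_str))  # 'apple|\tkiwi|\nbanana'
```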

If you ever need help reading or writing a regular expression, consult the regular expression syntax subheading in the official docs.

The page contains a list of all of the special characters with many useful examples.

Alternatively, you can use a generator expression.

To replace all non-alphanumeric characters in a string:

  1. Use a generator expression to iterate over the string.
  2. Return the character if it's alphanumeric, otherwise return the replacement.
  3. Use the join() method to join the characters into a string.

my_str = 'apple, kiwi, banana'

new_str = ''.join(char if char.isalnum() else '|' for char in my_str)
print(new_str)  # 👉️ 'apple||kiwi||banana'

# ✅ Preserve whitespace
new_str = ''.join(char if char.isalnum() or char == ' ' else '|' for char in my_str)
print(new_str)  # 👉️ 'apple| kiwi| banana'

We used a generator expression to iterate over the string.

Generator expressions are used to perform some operation for every element or select a subset of elements that meet a condition.

On each iteration, we use the str.isalnum() method to check if the current character is alphanumeric.

The str.isalnum method returns True if all characters in the string are alphanumeric and the string contains at least one character, otherwise the method returns False.

print('C'.isalnum())  # 👉️ True
print('^'.isalnum())  # 👉️ False
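Because `str.isalnum()` checks the whole string, a few edge cases are worth knowing. An empty string returns False, a single space anywhere makes the result False, and non-ASCII letters count as alphanumeric:

```python
print(''.isalnum())         # False (empty string)
print('abc123'.isalnum())   # True
print('abc 123'.isalnum())  # False (the space is not alphanumeric)
print('é'.isalnum())        # True (non-ASCII letters count)
```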

If the character is alphanumeric, we return the character, otherwise we return the replacement string.

The last step is to join the characters into a string.

my_str = 'apple, kiwi, banana'

new_str = ''.join(char if char.isalnum() else '|' for char in my_str)
print(new_str)  # 👉️ 'apple||kiwi||banana'

The str.join method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.

The string the method is called on is used as the separator between the elements.

For our purposes, we call the join() method on an empty string to join the characters without a separator.
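Here is a small illustration of how the separator works. `str.join()` accepts any iterable of strings, including a generator, and the string it is called on goes between the elements:

```python
# Empty separator: characters are concatenated directly.
joined = ''.join(c for c in 'a-b' if c.isalnum())
# A ', ' separator is placed between each pair of elements.
listed = ', '.join(['apple', 'kiwi'])
print(joined)  # ab
print(listed)  # apple, kiwi
```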

If you need to preserve spaces, use the boolean or operator.

my_str = 'apple, kiwi, banana'

new_str = ''.join(char if char.isalnum() or char == ' ' else '|' for char in my_str)
print(new_str)  # 👉️ 'apple| kiwi| banana'

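Because we used the boolean or operator, a character is kept if at least one of the conditions is met: it has to be alphanumeric or it has to be a space. Any other character is replaced with the replacement string.

Note that char == ' ' only preserves spaces. To preserve tabs and newlines as well, as \s does in the regex version, check char.isspace() instead.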
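Here is a minimal sketch of the `isspace()` variant. It is not taken from the article, and the input string is a hypothetical example. It mirrors the `[^a-zA-Z0-9\s]` regex for ASCII input, except that `isalnum()` also keeps non-ASCII letters:

```python
my_str = 'apple,\tkiwi,\nbanana'
# isspace() is True for spaces, tabs, newlines and other Unicode whitespace.
new_str = ''.join(char if char.isalnum() or char.isspace() else '|' for char in my_str)
print(repr(new_str))  # 'apple|\tkiwi|\nbanana'
```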