Python remove special characters from string
I need to remove all special characters, punctuation and spaces from a string so that I only have letters and numbers.
asked Apr 30, 2011 at 17:41
This can be done without regex:
You can use
If you insist on using regex, other solutions will do fine. However, note that if it can be done without a regular expression, that's the best way to go about it.
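A minimal sketch of the regex-free approach, assuming the intent is to keep only the characters for which `str.isalnum()` is true (the helper name `keep_alnum` is illustrative, not taken from the answer):

```python
def keep_alnum(text):
    """Return text with every non-alphanumeric character dropped."""
    # str.isalnum() is True for letters and digits, including non-ASCII
    # letters, so accented characters survive this filter.
    return "".join(ch for ch in text if ch.isalnum())


print(keep_alnum("Special $#! characters   spaces 888323"))
# prints: Specialcharactersspaces888323
```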
answered Apr 30, 2011 at 17:47
user225312

Here is a regex to match a string of characters that are not letters or numbers:
Here is the Python command to do a regex substitution:
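A hedged sketch of such a substitution, assuming ASCII letters and digits are the only characters to keep (the pattern, helper name and sample string are illustrative):

```python
import re

# One or more consecutive characters that are NOT ASCII letters or digits.
NON_ALNUM = re.compile(r"[^A-Za-z0-9]+")


def strip_non_alnum(text):
    # Replace each run of unwanted characters with the empty string.
    return NON_ALNUM.sub("", text)


print(strip_non_alnum("Hello, World! 123"))  # prints: HelloWorld123
```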
answered Apr 30, 2011 at 17:46
Andy White

Shorter way:
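One shorter spelling, assuming the shorthand class `\W` (any non-word character) is acceptable. Note that `\w` counts the underscore as a word character, so underscores are kept, and in Python 3 it also matches non-ASCII letters and digits. The sample string is illustrative:

```python
import re

text = "Hello, World_2024! ok"

# \W+ matches runs of characters that are not letters, digits or "_".
print(re.sub(r"\W+", "", text))   # prints: HelloWorld_2024ok
print(re.sub(r"\W+", " ", text))  # prints: Hello World_2024 ok
```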
If you want spaces between words and numbers, substitute '' with ' '.
answered Aug 7, 2014 at 13:26
tuxErrante

TLDR
I timed the provided answers.
is typically 3x faster than the next fastest provided top answer. Caution should be taken when using this option. Some special characters (e.g. ø) may not be stripped using this method.

After seeing this, I was interested in expanding on the provided answers by finding out which executes in the least amount of time, so I went through and checked some of the proposed answers with
Example 1
Example 2
Example 3
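A comparison of this kind could be set up with the standard `timeit` module roughly as follows. The three candidate functions, the sample string and the repeat counts are assumptions chosen for illustration, not necessarily the ones measured in this answer:

```python
import re
import timeit

SAMPLE = "Special $#! characters   spaces 888323"

_NON_ALNUM = re.compile(r"[^A-Za-z0-9]+")
_NON_WORD = re.compile(r"\W+")


def with_char_class(text):
    return _NON_ALNUM.sub("", text)


def with_non_word(text):
    return _NON_WORD.sub("", text)


def with_isalnum(text):
    return "".join(ch for ch in text if ch.isalnum())


def best_time(func, repeat=3, number=10_000):
    # Take the minimum of several runs; the minimum is the measurement
    # least disturbed by other activity on the machine.
    return min(timeit.repeat(lambda: func(SAMPLE), repeat=repeat, number=number))


for candidate in (with_char_class, with_non_word, with_isalnum):
    print(f"{candidate.__name__}: {best_time(candidate):.4f} s")
```

Absolute numbers depend on the machine, the Python version and the input, so only the relative ordering on your own data is meaningful.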
The above results are a product of the lowest returned result from an average of:
Example 3 can be 3x faster than Example 1.
answered Aug 6, 2016 at 1:04
mbeacom

Python 2.*
I think just

Python 3.*
In Python 3,
or to pass
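As a sketch of the difference between the two versions, assuming the approach is `filter()` combined with `str.isalnum`: in Python 3, `filter()` returns a lazy iterator rather than a string, so the result has to be joined back together. The sample string is illustrative:

```python
text = "Special $#! characters   spaces 888323"

# filter() yields the characters for which str.isalnum is true;
# "".join() turns that iterator back into a string.
cleaned = "".join(filter(str.isalnum, text))

# Equivalent, but builds a list first via iterable unpacking
# (the [*iterable] syntax needs Python 3.5 or newer).
cleaned_from_list = "".join([*filter(str.isalnum, text)])

print(cleaned)  # prints: Specialcharactersspaces888323
```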
Note: unpacking in
answered Apr 14, 2016 at 9:32
Grijesh Chauhan
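One way to remove only a chosen set of characters, sketched here under the assumption that a regex character class is used. The set `?$.!` and the helper name `remove_listed` are illustrative:

```python
import re


def remove_listed(text, unwanted="?$.!"):
    # re.escape() protects characters that are special inside a class,
    # such as "]", "^" or "-", so any set of characters can be passed in.
    pattern = "[" + re.escape(unwanted) + "]"
    return re.sub(pattern, "", text)


print(remove_listed("Hello? $5.00!"))  # prints: Hello 500
```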
You can add more special characters; each one will be replaced by '' (nothing), i.e. it will be removed.
answered May 25, 2014 at 9:28
pkm

Unlike everyone else using regex, I would try to exclude every character that is not what I want, instead of explicitly enumerating what I don't want. For example, if I want only characters from 'a to z' (upper and lower case) and numbers, I would exclude everything else:
This means "substitute every character that is not a number, or a character in the range 'a to z' or 'A to Z', with an empty string". In fact, if you insert the special character

Extra tip: if you also need to lowercase the result, you can make the regex even faster and simpler, as long as you no longer expect any uppercase characters.
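A small sketch of this whitelist idea, including the lowercase variant from the tip. A `^` as the first character inside the brackets negates the character class; the function names are illustrative:

```python
import re


def keep_ascii_alnum(text):
    # [^a-zA-Z0-9] matches any single character outside the three ranges.
    return re.sub(r"[^a-zA-Z0-9]", "", text)


def keep_ascii_alnum_lower(text):
    # Lowercasing first means the A-Z range is no longer needed.
    return re.sub(r"[^a-z0-9]", "", text.lower())


print(keep_ascii_alnum("Ab-C d_3!"))        # prints: AbCd3
print(keep_ascii_alnum_lower("Ab-C d_3!"))  # prints: abcd3
```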
answered Sep 5, 2018 at 10:02
Andrea

string.punctuation contains the following characters:
You can use the translate and maketrans functions to map punctuation characters to empty values (i.e. remove them):
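A minimal Python 3 sketch of this mapping; the sample string is an assumption. Note that spaces are not part of `string.punctuation`, so they are kept:

```python
import string

# string.punctuation is the ASCII punctuation set:
# !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
# The third argument of str.maketrans lists the characters to delete.
table = str.maketrans("", "", string.punctuation)

text = "Hello, World! (test) #1"
print(text.translate(table))  # prints: Hello World test 1
```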
Output:
answered Mar 17, 2020 at 15:14
Vlad Bezden
answered Jun 15, 2018 at 12:09
sneha

Assuming you want to use a regex and you want/need Unicode-cognisant 2.x code that is 2to3-ready:
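On Python 3, where `str` patterns are Unicode-aware by default, the same idea might be sketched as follows. This is not the original 2.x code; the pattern `[\W_]+` and the helper name are assumptions:

```python
import re


def strip_non_alnum_unicode(text):
    # \W matches anything that is not a Unicode letter, digit or "_";
    # adding "_" to the class removes underscores as well.
    return re.sub(r"[\W_]+", "", text)


print(strip_non_alnum_unicode("Gr\u00fc\u00dfe, J\u00fcrgen_42!"))
# prints: GrüßeJürgen42
```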
answered Apr 30, 2011 at 21:07
John Machin

The most generic approach is using the 'categories' of the unicodedata table, which classifies every single character. E.g. the following code filters only printable characters based on their category:
Look at the given URL above for all related categories. You can of course also filter by the punctuation categories.
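A sketch of category-based filtering with the standard `unicodedata` module. It assumes that only letters and numbers should be kept, which is narrower than "printable"; the helper name is illustrative:

```python
import unicodedata


def keep_letters_and_numbers(text):
    # The first letter of a category code is its major class:
    # "L" = letters, "N" = numbers. Punctuation ("P"), symbols ("S")
    # and separators ("Z") are dropped.
    return "".join(
        ch for ch in text
        if unicodedata.category(ch)[0] in ("L", "N")
    )


print(keep_letters_and_numbers("Hello, W\u00f6rld! 42"))
# prints: HelloWörld42
```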
answered Apr 30, 2011 at 18:00
For other languages like German, Spanish, Danish, French etc. that contain special characters (such as the German "Umlaute"), here is an example for German:
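A possible sketch for German, assuming the extra letters are simply added to the allowed set of a regex character class. The exact set (umlauts plus ß) and the names are assumptions:

```python
import re

# ASCII letters and digits plus the German umlauts and sharp s.
GERMAN_NON_ALNUM = re.compile(r"[^A-Za-z0-9ÄÖÜäöüß]+")


def strip_german(text):
    return GERMAN_NON_ALNUM.sub("", text)


print(strip_german("Füße & Bären: 3!"))  # prints: FüßeBären3
```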
answered Jun 27, 2020 at 10:00
petezurich

This will remove all special characters, punctuation, and spaces from a string, leaving only numbers and letters.
answered May 11, 2021 at 8:29
Use translate:
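A Python 3 sketch of a translate-based approach, assuming a deletion table built for the ASCII range only (the table construction is illustrative):

```python
# Map every ASCII code point that is not a letter or digit to None,
# which tells str.translate() to delete that character.
DELETE_NON_ALNUM_ASCII = {
    code: None for code in range(128) if not chr(code).isalnum()
}

text = "a-b c_1!"
print(text.translate(DELETE_NON_ALNUM_ASCII))  # prints: abc1
```

Characters outside the ASCII range are not in the table and pass through unchanged.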
Caveat: Only works on ASCII strings.
answered Mar 23, 2016 at 19:37
jjmurre

This will remove all non-alphanumeric characters except spaces.
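One way this could look, assuming a regex whose allowed set includes the space character (pattern and helper name are illustrative):

```python
import re


def keep_alnum_and_spaces(text):
    # The literal space inside the brackets is part of the allowed set.
    return re.sub(r"[^A-Za-z0-9 ]+", "", text)


print(keep_alnum_and_spaces("Hello, World! 123"))  # prints: Hello World 123
```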
answered Feb 1, 2021 at 16:57
same as double quotes."""
answered Jul 16, 2018 at 11:52
Ten years on, here is what I consider the best solution. You can remove/clean all special characters, punctuation, ASCII characters and spaces from the string.
answered Oct 27, 2021 at 13:21
answered Apr 6 at 15:02
Art Bindu
and you shall see your result as 'askhnlaskdjalsdk
answered Feb 25, 2016 at 8:00
Dsw Wds

How can I remove the special characters from a string in Python?
Using str.replace(), we can replace a specific character. If we want to remove that specific character, we replace it with an empty string. The str.replace() method replaces all occurrences of the specified character.
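A short illustration of `str.replace()` used this way; the sample string is an assumption:

```python
text = "This#string%contains#special%characters"

# Each call removes every occurrence of one specific character.
without_hash = text.replace("#", "")
without_both = without_hash.replace("%", "")

print(without_hash)  # prints: Thisstring%containsspecial%characters
print(without_both)  # prints: Thisstringcontainsspecialcharacters
```

Strings are immutable, so `replace()` returns a new string and leaves `text` unchanged.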
How do I remove special characters from a string?
Example of removing special characters using the replaceAll() method (Java):
    public class RemoveSpecialCharacterExample1 {
        public static void main(String args[]) {
            String str = "This#string%contains^special*characters&.";
            // Replace every character that is not a letter or digit with a space.
            str = str.replaceAll("[^a-zA-Z0-9]", " ");
            System.out.println(str);
        }
    }

How do I remove multiple special characters from a string in Python?
Ways to remove multiple characters from a string in Python include nested replace(), translate() with maketrans(), subn(), and sub().

How do I remove special characters from a string in pandas?
Add df = df.astype(float) after the replace and you've got it. I'd skip inplace and just do df = df.replace('\*', '', regex=True).