In Python, how to check if a string only contains certain characters?
I need to check that a string contains only a..z, 0..9, and . (period), and no other characters.
I could iterate over each character and check that it is a..z, 0..9, or ., but that would be slow.
I am not clear on how to do it with a regular expression.
Is this correct? Can you suggest a simpler regular expression or a more efficient approach?
#Valid chars . a-z 0-9
import re

def check(test_str):
    #//docs.python.org/library/re.html
    #re.search returns None if no position in the string matches the pattern
    #pattern to search for any character other than . a-z 0-9
    pattern = r'[^\.a-z0-9]'
    if re.search(pattern, test_str):
        #Character other than . a-z 0-9 was found
        print('Invalid : %r' % (test_str,))
    else:
        #No character other than . a-z 0-9 was found
        print('Valid : %r' % (test_str,))

check(test_str='abcde.1')
check(test_str='abcde.1#')
check(test_str='ABCDE.12')
check(test_str='_-/>"!@#12345abcde>>
Valid : "abcde.1"
Invalid : "abcde.1#"
Invalid : "ABCDE.12"
Invalid : "_-/>"!@#12345abcde
>>> reg.match('jsdlfjdsf12324..3432jsdflsdf')
True
but match() doesn't return True; it returns a match object or None.
(2) For use with match(), the ^ at the start of the pattern is redundant, and appears to be slightly slower than the same pattern without the ^
(3) Should foster the use of raw strings, automatically and unthinkingly, for any re pattern
(4) The backslash in front of the dot/period is redundant
(5) Slower than the OP's code!
prompt>rem OP's version -- NOTE: OP used raw string!
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9\.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.43 usec per loop
prompt>rem OP's version w/o backslash
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.44 usec per loop
prompt>rem cleaned-up version of accepted answer
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[a-z0-9.]+\Z')" "bool(reg.match(t))"
100000 loops, best of 3: 2.07 usec per loop
prompt>rem accepted answer
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile('^[a-z0-9\.]+$')" "bool(reg.match(t))"
100000 loops, best of 3: 2.08 usec per loop
(6) Can produce the wrong answer!!
>>> import re
>>> bool(re.compile('^[a-z0-9\.]+$').match('1234\n'))
True # uh-oh
>>> bool(re.compile('^[a-z0-9\.]+\Z').match('1234\n'))
False
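The points about the redundant ^, the redundant backslash, and the `$` pitfall can be folded into one small helper. This is only a sketch: the names `_ALLOWED` and `is_valid` are made up for illustration, and `fullmatch` assumes Python 3.4 or later (on older versions, `match` with a trailing `\Z` should behave the same way).

```python
import re

# Hypothetical names; the character class is the cleaned-up one timed above.
_ALLOWED = re.compile(r'[a-z0-9.]+')

def is_valid(test_str):
    """Return True if test_str is non-empty and holds only a-z, 0-9 and '.'."""
    # fullmatch anchors at both ends, so no ^, $ or \Z is needed,
    # and a trailing newline is rejected.
    return _ALLOWED.fullmatch(test_str) is not None
```

Unlike the OP's `search` version, this rejects the empty string; replacing `+` with `*` would accept it.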
answered Aug 24, 2009 at 23:12
John Machin
Simpler approach? A little more Pythonic?
>>> ok = "0123456789abcdef"
>>> all(c in ok for c in "123456abc")
True
>>> all(c in ok for c in "hello world")
False
It certainly isn't the most efficient, but it's sure readable.
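A set-based variant of the same idea, as a rough sketch. The names `ALLOWED` and `only_allowed` are illustrative, and the allowed characters here are the question's a-z, 0-9 and period rather than the hex digits used above.

```python
import string

# Characters the question allows: a-z, 0-9 and the period.
ALLOWED = frozenset(string.ascii_lowercase + string.digits + '.')

def only_allowed(test_str):
    """Return True if every character of test_str is in ALLOWED."""
    # issuperset accepts any iterable, so the string is consumed directly.
    return ALLOWED.issuperset(test_str)
```

Note that an empty string passes this check, which matches the behaviour of the `all(...)` expression.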
answered Aug 24, 2009 at 16:26
Mark Rushakoff
EDIT: Changed the regular expression to exclude A-Z
Regular expression solution is the fastest pure python solution so far
reg=re.compile('^[a-z0-9\.]+$')
>>>reg.match('jsdlfjdsf12324..3432jsdflsdf')
True
>>> timeit.Timer("reg.match('jsdlfjdsf12324..3432jsdflsdf')", "import re; reg=re.compile('^[a-z0-9\.]+$')").timeit()
0.70509696006774902
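To rerun this kind of measurement, a small harness along these lines may help. It is a sketch under assumptions: the names `SETUP` and `time_stmt` and the loop count are my own choices, the pattern is the one from this answer written as a raw string, and the numbers will vary with machine and Python version.

```python
import timeit

# Setup code shared by every timed statement (names are illustrative).
SETUP = r"""
import re
reg = re.compile(r'^[a-z0-9\.]+$')
allowed = frozenset('abcdefghijklmnopqrstuvwxyz0123456789.')
t = 'jsdlfjdsf12324..3432jsdflsdf'
"""

def time_stmt(stmt, number=100000):
    """Return the best of three timings, in seconds, for `number` runs of stmt."""
    return min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=3))

if __name__ == '__main__':
    for stmt in ("bool(reg.match(t))",
                 "allowed.issuperset(t)",
                 "all(c in allowed for c in t)"):
        print('%-32s %.4f s' % (stmt, time_stmt(stmt)))
```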
Compared to other solutions:
>>> timeit.Timer("set('jsdlfjdsf12324..3432jsdflsdf')