Check if string contains only specific characters python

In Python, how to check if a string only contains certain characters?

I need to check that a string contains only a..z, 0..9, and . (period), and no other characters.

I could iterate over each character and check that it is a..z, 0..9, or ., but that would be slow.

I am not clear on how to do it with a regular expression.

Is this correct? Can you suggest a simpler regular expression or a more efficient approach?

import re

# Valid chars: . a-z 0-9
def check(test_str):
    # See docs.python.org/library/re.html
    # re.search returns None if no position in the string matches the pattern.
    # The pattern searches for any character other than . a-z 0-9
    pattern = r'[^\.a-z0-9]'
    if re.search(pattern, test_str):
        # A character other than . a-z 0-9 was found
        print('Invalid : %r' % (test_str,))
    else:
        # No character other than . a-z 0-9 was found
        print('Valid   : %r' % (test_str,))

check(test_str='abcde.1')
check(test_str='abcde.1#')
check(test_str='ABCDE.12')
check(test_str='_-/>"!@#12345abcde')
Valid   : 'abcde.1'
Invalid : 'abcde.1#'
Invalid : 'ABCDE.12'
Invalid : '_-/>"!@#12345abcde'

>>> reg.match('jsdlfjdsf12324..3432jsdflsdf')
True

but match() doesn't return True; it returns a match object, or None when there is no match.

[2] For use with match(), the ^ at the start of the pattern is redundant, and appears to be slightly slower than the same pattern without the ^.

[3] One should get into the habit of using a raw string automatically for any re pattern.

[4] The backslash in front of the dot/period is redundant inside a character class.
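
Points [2] and [4] can be checked directly. The following is a small sketch, with test strings made up for illustration: under match(), the pattern gives the same answer with or without the leading ^ and with or without the backslash before the dot. It also shows that match() hands back a match object rather than a bool.

```python
import re

# Three spellings of the same pattern; only the first uses both ^ and \.
patterns = [r'^[a-z0-9\.]+\Z', r'[a-z0-9\.]+\Z', r'[a-z0-9.]+\Z']

# Illustrative inputs, not taken from the question
samples = ['abcde.1', 'abcde.1#', 'ABCDE.12', '']

for s in samples:
    # match() already anchors at the start of the string, so ^ adds nothing
    results = [bool(re.match(p, s)) for p in patterns]
    assert len(set(results)) == 1  # all three patterns agree
    print('%r -> %s' % (s, results[0]))

# match() returns a match object (truthy) or None, never a bool
m = re.match(r'[a-z0-9.]+\Z', 'abcde.1')
print(m is not None, isinstance(m, bool))
```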

[5] Slower than the OP's code!

prompt>rem OP's version -- NOTE: OP used raw string!

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9\.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.43 usec per loop

prompt>rem OP's version w/o backslash

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.44 usec per loop

prompt>rem cleaned-up version of accepted answer

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[a-z0-9.]+\Z')" "bool(reg.match(t))"
100000 loops, best of 3: 2.07 usec per loop

prompt>rem accepted answer

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile('^[a-z0-9\.]+$')" "bool(reg.match(t))"
100000 loops, best of 3: 2.08 usec per loop
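
The same comparison can be run from inside Python with the timeit module instead of the command line. This is only a rough sketch: the helper name time_stmt and the loop count are made up here, and the absolute numbers depend on the machine and Python version, so no output is shown.

```python
import timeit

# Setup shared by all cases; %r is filled in with the pattern under test
SETUP_TEMPLATE = (
    "import re; t = 'jsdlfjdsf12324..3432jsdflsdf'; reg = re.compile(%r)"
)

def time_stmt(pattern, stmt, number=100000):
    """Return total seconds for `number` runs of stmt (hypothetical helper)."""
    setup = SETUP_TEMPLATE % pattern
    return timeit.timeit(stmt, setup=setup, number=number)

cases = [
    ("OP's search", r'[^a-z0-9.]', "not bool(reg.search(t))"),
    ("match with \\Z", r'[a-z0-9.]+\Z', "bool(reg.match(t))"),
]

for label, pattern, stmt in cases:
    print('%-15s %.4f s' % (label, time_stmt(pattern, stmt)))
```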

[6] Can produce the wrong answer!!

>>> import re
>>> bool(re.compile('^[a-z0-9\.]+$').match('1234\n'))
True # uh-oh
>>> bool(re.compile('^[a-z0-9\.]+\Z').match('1234\n'))
False
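
On Python 3.4 and later, re.fullmatch() sidesteps the $ versus \Z question because it succeeds only when the whole string matches. A minimal sketch, where the function name is_valid is made up for illustration:

```python
import re

# Allowed characters: a-z, 0-9 and the period
_VALID = re.compile(r'[a-z0-9.]+')

def is_valid(s):
    """True if s is non-empty and contains only a-z, 0-9 or '.'."""
    # fullmatch (Python 3.4+) must consume the entire string,
    # so a trailing newline is rejected without needing \Z
    return _VALID.fullmatch(s) is not None

print(is_valid('1234'))      # True
print(is_valid('1234\n'))    # False
print(is_valid('ABCDE.12'))  # False
```

Like the + patterns above, this rejects the empty string, whereas the OP's search-based version accepts it.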

answered Aug 24, 2009 at 23:12

John Machin


Simpler approach? A little more Pythonic?

>>> ok = "0123456789abcdef"
>>> all(c in ok for c in "123456abc")
True
>>> all(c in ok for c in "hello world")
False

It certainly isn't the most efficient, but it's sure readable.
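
If the all() version turns out to be a bottleneck, a set-based variant keeps most of the readability. This is a sketch using the question's character set (a-z, 0-9 and the period); the names ALLOWED and only_allowed are invented here.

```python
import string

# a-z, 0-9 and '.', as in the question
ALLOWED = frozenset(string.ascii_lowercase + string.digits + '.')

def only_allowed(s, allowed=ALLOWED):
    """True if every character of s is in allowed (True for '')."""
    # set(s) holds the distinct characters; <= is the subset test
    return set(s) <= allowed

print(only_allowed('abcde.1'))   # True
print(only_allowed('abcde.1#'))  # False
print(only_allowed('ABCDE.12'))  # False
```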

answered Aug 24, 2009 at 16:26

Mark Rushakoff


EDIT: Changed the regular expression to exclude A-Z

The regular expression solution is the fastest pure-Python solution so far:

>>> reg = re.compile('^[a-z0-9\.]+$')
>>> reg.match('jsdlfjdsf12324..3432jsdflsdf')
True
>>> timeit.Timer("reg.match('jsdlfjdsf12324..3432jsdflsdf')", "import re; reg=re.compile('^[a-z0-9\.]+$')").timeit()
0.70509696006774902

Compared to other solutions:

>>> timeit.Timer("set('jsdlfjdsf12324..3432jsdflsdf')
