I have a set of strings as follows:
text:u'MUC-EC-099_SC-Memory-01_TC-25'
text:u'MUC-EC-099_SC-Memory-01_TC-26'
text:u'MUC-EC-099_SC-Memory-01_TC-27'
I extracted this data from an XLS file and converted it to strings. Now I need to extract the text inside the single quotes and put it in a list.
Expected output:
[MUC-EC-099_SC-Memory-01_TC-25, MUC-EC-099_SC-Memory-01_TC-26, MUC-EC-099_SC-Memory-01_TC-27]
Thanks in advance.
asked Oct 18, 2013 at 12:29
Use re.findall:
>>> import re
>>> strs = """text:u'MUC-EC-099_SC-Memory-01_TC-25'
text:u'MUC-EC-099_SC-Memory-01_TC-26'
text:u'MUC-EC-099_SC-Memory-01_TC-27'"""
>>> re.findall(r"'(.*?)'", strs, re.DOTALL)
['MUC-EC-099_SC-Memory-01_TC-25',
 'MUC-EC-099_SC-Memory-01_TC-26',
 'MUC-EC-099_SC-Memory-01_TC-27']
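The `?` after `.*` matters here. As a rough sketch (the names `lazy` and `greedy` are only for illustration), this compares the non-greedy pattern with a greedy one on the same input:

```python
import re

strs = """text:u'MUC-EC-099_SC-Memory-01_TC-25'
text:u'MUC-EC-099_SC-Memory-01_TC-26'
text:u'MUC-EC-099_SC-Memory-01_TC-27'"""

# Non-greedy: each match stops at the nearest closing quote.
lazy = re.findall(r"'(.*?)'", strs, re.DOTALL)

# Greedy with DOTALL: one match running from the first quote to the last one.
greedy = re.findall(r"'(.*)'", strs, re.DOTALL)

print(len(lazy))    # 3
print(len(greedy))  # 1
```

With the greedy version the single result also contains the `text:u` prefixes of the second and third lines, which is probably not what you want.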
answered Oct 18, 2013 at 12:34
Ashwini Chaudhary
You can use the following expression:
>>> sheet.cell(2,2)
number:4.0
>>> sheet.cell(3,3)
text:u'C'
To get the unwrapped object, use .value:
>>> sheet.cell(3,3).value
u'C'
(Remember that the u here is simply telling you the string is unicode; it's not a problem.)
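If the cells expose `.value` as shown above, the list can be built directly without any string parsing. A minimal sketch: `Cell` below is a hypothetical stand-in so the snippet runs on its own, and in practice the objects would come from the spreadsheet reader.

```python
from collections import namedtuple

# Hypothetical stand-in for the reader's cell objects (an assumption,
# not the library's class); only the .value attribute is relied on.
Cell = namedtuple("Cell", ["ctype", "value"])

cells = [
    Cell("text", "MUC-EC-099_SC-Memory-01_TC-25"),
    Cell("text", "MUC-EC-099_SC-Memory-01_TC-26"),
    Cell("text", "MUC-EC-099_SC-Memory-01_TC-27"),
]

# Take the unwrapped value of each cell instead of converting it to a string.
values = [cell.value for cell in cells]
print(values)
```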
answered Oct 18, 2013 at 12:45
DSM
Extract strings between quotes in Python
Use the re.findall() method to extract strings between quotes, e.g. my_list = re.findall(r'"([^"]*)"', my_str). The re.findall method will match the provided pattern in the string and will return a list containing the strings between the quotes.
import re

# ✅ extract string between double quotes
my_str = 'One "Two" Three "Four"'
my_list = re.findall(r'"([^"]*)"', my_str)
print(my_list)  # 👉️ ['Two', 'Four']
print(my_list[0])  # 👉️ Two
print(my_list[1])  # 👉️ Four

# ---------------------------------------------------

# ✅ extract string between single quotes
my_str_2 = "One 'Two' Three 'Four'"
my_list_2 = re.findall(r"'([^']*)'", my_str_2)
print(my_list_2)  # 👉️ ['Two', 'Four']
The first example in the code snippet extracts strings between double quotes, and the second extracts strings between single quotes.
The re.findall method takes a pattern and a string as arguments and returns a list of strings containing all non-overlapping matches of the pattern in the string.
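One detail worth seeing in action: when the pattern contains a single group, re.findall returns the group's contents rather than the whole match. A small sketch (variable names are illustrative) comparing the two:

```python
import re

my_str = 'One "Two" Three "Four"'

# With a group: only the text captured inside the quotes is returned.
with_group = re.findall(r'"([^"]*)"', my_str)

# Without a group: the whole match is returned, quotes included.
without_group = re.findall(r'"[^"]*"', my_str)

print(with_group)     # ['Two', 'Four']
print(without_group)  # ['"Two"', '"Four"']
```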
Let's look at the regular expression in the first example.
import re

# ✅ extract string between double quotes
my_str = 'One "Two" Three "Four"'
my_list = re.findall(r'"([^"]*)"', my_str)
print(my_list)  # 👉️ ['Two', 'Four']
print(my_list[0])  # 👉️ Two
print(my_list[1])  # 👉️ Four
The regex starts and ends with double quotes because we want to match anything that is inside of double quotes in the string.
The parentheses () in the regular expression match whatever is inside and indicate the start and end of a group.
The group's contents can still be retrieved after the match.
The square brackets [] are used to indicate a set of characters.
The caret ^ at the beginning of the set means "NOT". In other words, match all characters that are NOT a double quote.
The asterisk * matches the preceding regular expression (anything but double quotes) zero or more times.
In its entirety, the regular expression matches zero or more characters that are not double quotes and are inside of double quotes.
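To make the "zero or more" part and the group concrete, here is a short sketch. The sample strings are made up for illustration.

```python
import re

# "Zero or more" means an empty pair of quotes still matches,
# producing an empty string in the result.
found = re.findall(r'"([^"]*)"', 'a "" b "c"')
print(found)  # ['', 'c']

# With re.search, the group's contents can be read back after the match.
match = re.search(r'"([^"]*)"', 'One "Two" Three')
whole = match.group(0)   # the full match, including the quotes
inside = match.group(1)  # only what the parentheses captured
print(whole)   # "Two"
print(inside)  # Two
```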
You can also use this approach to extract strings from between single quotes.
import re

my_str_2 = "One 'Two' Three 'Four'"
my_list_2 = re.findall(r"'([^']*)'", my_str_2)
print(my_list_2)  # 👉️ ['Two', 'Four']
print(my_list_2[0])  # 👉️ Two
print(my_list_2[1])  # 👉️ Four
All we had to do is wrap the group in single quotes instead of double quotes and place a single quote in the set of characters.
In its entirety, the regex matches zero or more characters that are not single quotes and are inside of single quotes.
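Since the two patterns differ only in the quote character, they can be folded into one helper. This is a sketch under assumptions: `extract_quoted` is a hypothetical name, and it expects `quote` to be a single character.

```python
import re


def extract_quoted(text, quote="'"):
    """Return the substrings of text enclosed in the given quote character.

    Hypothetical helper for illustration; assumes quote is one character.
    """
    q = re.escape(quote)
    # Same shape as the patterns above: quote, group of non-quote chars, quote.
    pattern = f"{q}([^{q}]*){q}"
    return re.findall(pattern, text)


single = extract_quoted("One 'Two' Three 'Four'")
double = extract_quoted('One "Two" Three "Four"', quote='"')
print(single)  # ['Two', 'Four']
print(double)  # ['Two', 'Four']
```

The same call works on strings such as text:u'MUC-EC-099_SC-Memory-01_TC-25', returning a one-element list with the text between the quotes.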