How about this algorithm? Split the text on whitespace, then trim punctuation from each piece. This removes punctuation from the edges of words without harming apostrophes inside words such as we're.
>>> text
"'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'"
>>> text.split()
["'Oh,", 'you', "can't", 'help', "that,'", 'said', 'the', 'Cat:', "'we're", 'all', 'mad', 'here.', "I'm", 'mad.', "You're", "mad.'"]
>>> import string
>>> [word.strip(string.punctuation) for word in text.split()]
['Oh', 'you', "can't", 'help', 'that', 'said', 'the', 'Cat', "we're", 'all', 'mad', 'here', "I'm", 'mad', "You're", 'mad']
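The two steps above can be wrapped in a small helper. This is only a sketch: the name split_words is made up for illustration, and the extra filtering of punctuation-only tokens is an assumption about what is wanted, not part of the session above.

```python
import string

def split_words(text):
    """Split on whitespace, then strip punctuation from the edges of each token."""
    stripped = (token.strip(string.punctuation) for token in text.split())
    # Assumption: tokens made only of punctuation (e.g. a free-standing "--")
    # strip down to '' and are dropped rather than kept as empty words.
    return [word for word in stripped if word]

sample = "'Oh, you can't help that,' said the Cat: 'we're all mad here.'"
print(split_words(sample))
```

One limitation: an apostrophe at the edge of a word, as in the possessive dogs', is stripped along with the other punctuation.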
On this page: .split(), .join(), and list().
Splitting a Sentence into Words: .split()
Below, mary is a single string. Even though it is a sentence, the words are not represented as discrete units. For that, you need a different data type: a list of strings where each string corresponds to a word. .split() is the method to use:
>>> mary = 'Mary had a little lamb'
>>> mary.split()
['Mary', 'had', 'a', 'little', 'lamb']

>>> mwords = mary.split()
>>> mwords
['Mary', 'had', 'a', 'little', 'lamb']
>>> len(mwords)    # number of items in mwords
5
>>> len(mary)      # number of characters
22

>>> chom = ' colorless green \n\tideas\n'    # ' ', '\n', '\t' bunched up
>>> print(chom)
 colorless green
        ideas

>>> chom.split()
['colorless', 'green', 'ideas']
Splitting on a Specific Substring
By providing an optional parameter, .split('x') can be used to split a string on a specific substring 'x'. Without 'x' specified, .split() simply splits on all whitespace, as seen above.
>>> mary = 'Mary had a little lamb'
>>> mary.split('a')     # splits on 'a'
['M', 'ry h', 'd ', ' little l', 'mb']
>>> hi = 'Hello mother,\nHello father.'
>>> print(hi)
Hello mother,
Hello father.
>>> hi.split()          # no parameter given: splits on whitespace
['Hello', 'mother,', 'Hello', 'father.']
>>> hi.split('\n')      # splits on '\n' only
['Hello mother,', 'Hello father.']
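One detail worth knowing when you pass a separator: an explicit ' ' does not behave like the default. A short sketch (the string spaced is made up for illustration) shows that every single occurrence of the separator counts, so repeated spaces produce empty strings:

```python
spaced = 'Mary  had a   little lamb'   # note the repeated spaces

by_whitespace = spaced.split()     # runs of whitespace count as one separator
by_space = spaced.split(' ')       # every single space is a separator

print(by_whitespace)   # ['Mary', 'had', 'a', 'little', 'lamb']
print(by_space)        # ['Mary', '', 'had', 'a', '', '', 'little', 'lamb']
```

For ordinary text, where you just want the words, calling .split() with no argument is usually the safer choice.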
String into a List of Characters: list()
But what if you want to split a string into a list of characters? In Python, characters are simply strings of length 1. The list() function turns a string into a list of its individual characters, spaces included:
>>> list('hello world')
['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
Joining a List of Strings: .join()
If you have a list of words, how do you put them back together into a single string? .join() is the method to use. Called on a "separator" string 'x', 'x'.join(y) joins the elements of the list y, with 'x' placed between each pair. Below, the words in mwords are joined back into the sentence string with a space in between:
>>> mwords
['Mary', 'had', 'a', 'little', 'lamb']
>>> ' '.join(mwords)
'Mary had a little lamb'

>>> '--'.join(mwords)
'Mary--had--a--little--lamb'
>>> '\t'.join(mwords)
'Mary\thad\ta\tlittle\tlamb'
>>> print('\t'.join(mwords))
Mary    had     a       little  lamb

>>> hi = 'hello world'
>>> hichars = list(hi)
>>> hichars
['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
>>> ''.join(hichars)
'hello world'
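The examples above all join lists of strings. That is a requirement: .join() raises a TypeError if any element is not a string. A minimal sketch of the usual workaround, converting each item with str() first (the list counts is made up for illustration):

```python
counts = [3, 1, 4]    # hypothetical list of integers, not strings

# ', '.join(counts) would raise TypeError, because join() accepts only strings,
# so each item is converted with str() before joining.
joined = ', '.join(str(n) for n in counts)
print(joined)   # 3, 1, 4
```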
How do you split a list of words in Python?
You can use the str.split(sep=None) method, which returns a list of the words in the string, using sep as the delimiter string. If sep is not specified or is None, runs of consecutive whitespace are regarded as a single separator.
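Besides sep, str.split() accepts a second optional parameter, maxsplit, which is not covered in the text above. It limits how many splits are made, and the remainder of the string stays in one piece. A brief sketch, reusing the sentence from earlier on this page:

```python
line = 'Mary had a little lamb'

# Split only once: the first word, then everything else in one piece.
first, rest = line.split(maxsplit=1)
print(first)   # Mary
print(rest)    # had a little lamb

# sep=None is the whitespace default, spelled out explicitly.
print(line.split(None, 2))   # ['Mary', 'had', 'a little lamb']
```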
How do I split a string into a list of words?
To convert a string into a list of words, you just need to split it on whitespace. You can use the split() method of the str class. Its default delimiter is whitespace, i.e., when called on a string with no arguments, it splits that string at runs of whitespace characters.
How do you break a sentence in Python?
You can split a sentence into words with the str.split() method. It takes a separator as an optional parameter and splits the calling string into a list of strings based on that separator.