How do you split a paragraph into a list of words in python?

One approach: split the text on whitespace, then strip punctuation from the ends of each word. This removes punctuation only from the edges of words, so apostrophes inside words such as we're are left intact.

>>> text = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'"

>>> text.split()
["'Oh,", 'you', "can't", 'help', "that,'", 'said', 'the', 'Cat:', "'we're", 'all', 'mad', 'here.', "I'm", 'mad.', "You're", "mad.'"]

>>> import string
>>> [word.strip(string.punctuation) for word in text.split()]
['Oh', 'you', "can't", 'help', 'that', 'said', 'the', 'Cat', "we're", 'all', 'mad', 'here', "I'm", 'mad', "You're", 'mad']
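A minimal sketch that wraps this approach in a function for a multi-line paragraph. The function name paragraph_to_words is hypothetical. One added assumption: a token made only of punctuation (such as '--') strips down to the empty string, so this version drops empty results. Note also that string.punctuation covers ASCII punctuation only, so curly quotes such as ’ would not be stripped.

```python
import string


def paragraph_to_words(paragraph):
    """Split on whitespace, then strip ASCII punctuation from both ends of each token.

    Tokens that consist only of punctuation (e.g. '--') become '' and are dropped.
    """
    words = []
    for token in paragraph.split():
        word = token.strip(string.punctuation)
        if word:  # skip tokens that were pure punctuation
            words.append(word)
    return words


para = """'Oh, you can't help that,' said the Cat --
'we're all mad here.'"""
print(paragraph_to_words(para))
# ['Oh', 'you', "can't", 'help', 'that', 'said', 'the', 'Cat', "we're", 'all', 'mad', 'here']
```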

On this page: .split(), .join(), and list().

Splitting a Sentence into Words: .split()

Below, mary is a single string. Even though it is a sentence, the words are not represented as discrete units. For that, you need a different data type: a list of strings where each string corresponds to a word. .split() is the method to use:

>>> mary = 'Mary had a little lamb'
>>> mary.split() 
['Mary', 'had', 'a', 'little', 'lamb'] 

.split() splits mary on whitespace, and the returned result is a list of the words in mary. This list contains 5 items, as len() confirms. len() on mary, by contrast, returns the number of characters in the string (including the spaces).

>>> mwords = mary.split() 
>>> mwords
['Mary', 'had', 'a', 'little', 'lamb'] 
>>> len(mwords)                # number of items in mwords
5
>>> len(mary)                  # number of characters
22
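The same idea gives a quick word count. A minimal sketch; the helper name word_count is hypothetical, and it counts whitespace-separated tokens, so 'lamb.' and 'lamb' would each count as one word.

```python
def word_count(text):
    """Count whitespace-separated tokens in text."""
    return len(text.split())


mary = 'Mary had a little lamb'
print(word_count(mary))             # 5 words
print(len(mary))                    # 22 characters, spaces included
print(len(mary.replace(' ', '')))   # 18 characters, spaces excluded
```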

Whitespace characters include the space ' ', the newline character '\n', and the tab '\t', among others. .split() treats any run of consecutive whitespace characters as a single separator and ignores whitespace at the start and end of the string:

>>> chom = ' colorless     green \n\tideas\n'       # ' ', '\n', '\t' bunched up
>>> print(chom)
 colorless     green 
	ideas

>>> chom.split()
['colorless', 'green', 'ideas'] 
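Because newlines count as whitespace, .split() also works directly on a multi-line paragraph, which is the situation in the question at the top of this page. A small sketch, using a triple-quoted string with an embedded tab and a run of spaces:

```python
paragraph = """Mary had a little lamb,
\tits fleece was white as snow;
and everywhere that Mary went,   the lamb was sure to go."""

# Newlines, the tab and the run of spaces all act as a single separator
words = paragraph.split()
print(len(words))   # 22
print(words[:7])    # ['Mary', 'had', 'a', 'little', 'lamb,', 'its', 'fleece']
```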

Splitting on a Specific Substring

By providing an optional parameter, .split('x') can be used to split a string on a specific substring 'x'. Without 'x' specified, .split() simply splits on all whitespace, as seen above.

>>> mary = 'Mary had a little lamb'
>>> mary.split('a')                 # splits on 'a'
['M', 'ry h', 'd ', ' little l', 'mb'] 
>>> hi = 'Hello mother,\nHello father.'
>>> print(hi)
Hello mother,
Hello father. 
>>> hi.split()                # no parameter given: splits on whitespace
['Hello', 'mother,', 'Hello', 'father.'] 
>>> hi.split('\n')                 # splits on '\n' only
['Hello mother,', 'Hello father.'] 
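One consequence worth knowing: with an explicit separator, .split() no longer merges consecutive separators or ignores leading and trailing ones, so splitting on a single space ' ' can produce empty strings. A short comparison:

```python
spaced = '  Mary  had a little lamb '

print(spaced.split())     # ['Mary', 'had', 'a', 'little', 'lamb']
print(spaced.split(' '))  # ['', '', 'Mary', '', 'had', 'a', 'little', 'lamb', '']

# Filtering out the empty strings recovers the whitespace-split result here
non_empty = [w for w in spaced.split(' ') if w]
print(non_empty)          # ['Mary', 'had', 'a', 'little', 'lamb']
```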

String into a List of Characters: list()

But what if you want to split a string into a list of characters? In Python, characters are simply strings of length 1. The list() function turns a string into a list of its individual characters, spaces included:

>>> list('hello world')
['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'] 
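To see the difference between the two tools on a single word, compare list() with .split(): list() breaks the string into characters, while .split() finds no whitespace and returns the whole word as one item.

```python
word = 'lamb'
print(list(word))    # ['l', 'a', 'm', 'b']
print(word.split())  # ['lamb']
```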

More generally, list() is a built-in function that turns any iterable Python object into a list. Given a string, it returns a list of the characters in it. For other data types the specifics vary, but the returned type is always a list.
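A few examples with other built-in iterables, as a sketch (the dictionary output assumes Python 3.7 or later, where dictionaries preserve insertion order):

```python
print(list(range(4)))                # [0, 1, 2, 3]
print(list(('Mary', 'had', 'a')))    # ['Mary', 'had', 'a']  (tuple -> list)
print(list({'Mary': 1, 'lamb': 2}))  # ['Mary', 'lamb']  (iterating a dict yields its keys)
print(list(enumerate('ab')))         # [(0, 'a'), (1, 'b')]
```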

Joining a List of Strings: .join()

If you have a list of words, how do you put them back together into a single string? .join() is the method to use. Called on a "separator" string 'x', 'x'.join(y) joins every element in the list y separated by 'x'. Below, words in mwords are joined back into the sentence string with a space in between:

>>> mwords
['Mary', 'had', 'a', 'little', 'lamb'] 
>>> ' '.join(mwords)
'Mary had a little lamb' 
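Splitting and then joining with a single space is a common way to collapse irregular whitespace. A minimal sketch; the name normalize_whitespace is hypothetical:

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace to one space and trim both ends."""
    return ' '.join(text.split())


chom = ' colorless     green \n\tideas\n'
print(repr(normalize_whitespace(chom)))  # 'colorless green ideas'
```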

Joining can be done on any separator string. Below, '--' and the tab character '\t' are used.

>>> '--'.join(mwords)
'Mary--had--a--little--lamb'
>>> '\t'.join(mwords)
'Mary\thad\ta\tlittle\tlamb'
>>> print('\t'.join(mwords))
Mary    had     a       little  lamb 
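.join() expects every element to be a string; it raises a TypeError when it meets anything else, such as a number. A sketch of the usual workaround, converting each element with str() first:

```python
parts = ['lamb', 1, 2.5]

try:
    '--'.join(parts)
except TypeError as err:
    print('TypeError:', err)  # join() refuses the int at index 1

print('--'.join(map(str, parts)))  # lamb--1--2.5
```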

The method can also be called on the empty string '' as the separator. The result is the list's elements joined with nothing in between. Below, a list of characters is put back together into the original string:

>>> hi = 'hello world'
>>> hichars = list(hi)
>>> hichars
['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'] 
>>> ''.join(hichars)
'hello world' 
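This round trip is useful because strings are immutable while lists are not: to change individual characters, one option is to convert to a list, modify it, and join it back. A small sketch:

```python
hichars = list('hello world')
hichars[0] = 'H'   # index 0 is 'h'
hichars[6] = 'W'   # index 6 is 'w' (index 5 is the space)
print(''.join(hichars))            # Hello World
print(''.join(reversed(hichars)))  # dlroW olleH
```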

How do you split a list of words in Python?

You can use the str.split(sep=None) method, which returns a list of the words in the string, using sep as the delimiter string. If sep is not specified or is None, consecutive runs of whitespace are regarded as a single separator.
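In Python 3, sep can also be passed by keyword, and str.split() accepts a second optional parameter, maxsplit, which limits the number of splits performed. A short sketch:

```python
line = 'Mary had a little lamb'

print(line.split(sep=None))     # ['Mary', 'had', 'a', 'little', 'lamb']
print(line.split(maxsplit=2))   # ['Mary', 'had', 'a little lamb']
print('a,b,,c'.split(sep=','))  # ['a', 'b', '', 'c']  (explicit sep keeps empty fields)
```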

How do I split a string into a list of words?

To convert a string into a list of words, split it on whitespace with the str method split(). Its default delimiter is whitespace: called with no arguments, it splits the string at runs of whitespace characters.

How do you break a sentence into words in Python?

Use the str.split() method. It takes an optional separator as a parameter and splits the calling string into multiple strings wherever that separator occurs; with no separator given, it splits on whitespace.
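The separator does not have to be a single character. For example, a comma-separated list inside a sentence can be split on ', ' so the spaces don't stick to the items:

```python
items = 'red, green, blue'
print(items.split(', '))  # ['red', 'green', 'blue']
print(items.split(','))   # ['red', ' green', ' blue']
```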
