How to split a string by multiple delimiters in Python

Python: Split a String on Multiple Delimiters


In this tutorial, you’ll learn how to use Python to split a string on multiple delimiters. You’ll learn how to do this with the built-in regular expressions library re as well as with the built-in string .split() method.

But why even learn how to split data? Splitting data is an immensely useful skill. Data comes in all shapes, and it’s often not as clean as we would like it to be. There will be many times when you want to split a string by multiple delimiters to make it easier to work with.

Now let’s get started!


How do you split a string in Python?

Python has a built-in string method, .split(), which allows you to split a string by a certain delimiter.

The method has the signature str.split(sep=None, maxsplit=-1).

In this method:

sep: the delimiter string to split on. If it is omitted (or None), the string is split on runs of whitespace.
maxsplit: the maximum number of splits to do, where the default value of -1 means that all occurrences are split.

Let’s say you had a string that you wanted to split by commas – let’s learn how to do this:
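A minimal sketch of the comma case, using made-up sample data (the string and variable names are assumptions for illustration):

```python
# Hypothetical sample data for illustration.
text = "apple,banana,cherry"

# Split on every comma; the separator itself is dropped from the result.
parts = text.split(",")
print(parts)

# Passing maxsplit=1 stops after the first split.
first_split_only = text.split(",", 1)
print(first_split_only)
```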

We can see here that what’s returned is a list that contains all of the newly split values.

Split a Python String on Multiple Delimiters using Regular Expressions

A flexible way to split a string is to use the built-in regular expression library re. The library has a built-in re.split() function, similar to the string method covered above. What sets it apart is that the delimiter is a regular expression pattern rather than a fixed string.

The function has the signature re.split(pattern, string, maxsplit=0, flags=0).

Similar to the example above, the maxsplit= argument limits how many times the string is split. If it’s set to a positive number, at most that many splits are made. The default of 0 means no limit (note that this differs from str.split(), where the no-limit default is -1).

So, let’s repeat our earlier example with the re module:
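As a rough illustration, the same comma split could look like this with re.split() (the sample string is an assumption):

```python
import re

# Hypothetical sample data for illustration.
text = "apple,banana,cherry"

# A plain comma is a valid pattern, so this behaves like text.split(",").
parts = re.split(r",", text)
print(parts)

# maxsplit=1 performs at most one split.
limited = re.split(r",", text, maxsplit=1)
print(limited)
```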

Now, say you have a string with multiple delimiters. The re method makes it easy to split this string too!

Let’s take a look at another example:
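One possible version, assuming a string that mixes commas, semicolons and spaces as delimiters:

```python
import re

# Hypothetical sample data with three different delimiters.
text = "apple,banana;cherry orange"

# The pattern matches a comma OR a semicolon OR a single space.
parts = re.split(r",|;| ", text)
print(parts)
```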

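What we’ve done here is pass in a raw string as the pattern for re to interpret. The pipe character | acts as an OR (alternation) operator between the delimiters.

We can simplify this even further by passing in a regular expression character class, written in square brackets, which matches any one of the characters listed inside it. Let’s see how we can do this: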
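A short sketch of the square-bracket form, reusing the same assumed sample string:

```python
import re

# Hypothetical sample data with three different delimiters.
text = "apple,banana;cherry orange"

# [,; ] matches any single character from the set: comma, semicolon or space.
parts = re.split(r"[,; ]", text)
print(parts)
```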

This returns the same thing as before, but it’s a bit cleaner to write and to read.

Split a Python String on Multiple Delimiters using String Split

You’re also able to avoid use of the re module altogether. The module can be a little intimidating, so if you’re more comfortable, you can accomplish this without the module as well.

In the example below, you’ll learn how to split a Python string with multiple delimiters by first replacing values. We’ll take our new string and replace all delimiters to be one consistent delimiter. Let’s take a look:
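A minimal sketch of the replace-then-split idea, assuming the comma is chosen as the one consistent delimiter:

```python
# Hypothetical sample data with three different delimiters.
text = "apple,banana;cherry orange"

# Rewrite every other delimiter as a comma, then split once on the comma.
normalized = text.replace(";", ",").replace(" ", ",")
parts = normalized.split(",")
print(parts)
```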

This method works fine when you have a small number of delimiters, but it quickly becomes messy when you have more than 2 or 3 delimiters that you would want to split your string by. It’s better to stick to the re module for more complex splits.

Create a Function to Split a Python String with Multiple Delimiters

Finally, let’s take a look at how to split a string using a function. For this function, we’ll use the re module. You’ll be able to pass in a list of delimiters and a string and have a split string returned.

Let’s get started!
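One way such a function might look. The name split_on_delimiters and its parameters are hypothetical, and the sketch assumes a non-empty list of delimiters:

```python
import re

def split_on_delimiters(text, delimiters):
    """Split text on any of the given delimiter strings (hypothetical helper)."""
    # Try longer delimiters first so that "->" wins over "-" when both are given.
    ordered = sorted(delimiters, key=len, reverse=True)
    # re.escape() makes characters such as "|" or "." match literally.
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.split(pattern, text)

print(split_on_delimiters("a,b;c|d", [",", ";", "|"]))
```

Escaping each delimiter means callers do not need to know which characters are special in regular expressions.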

Conclusion

In this post, you learned how to split a Python string by multiple delimiters. You learned how to do this using the built-in .split() method, as well as the re module’s split() function.

To learn more about splitting Python strings, check out the official documentation for str.split(). To learn more about splitting strings with re, check out the official documentation for re.split().

How to Split String with Multiple Delimiters in Python

There are several ways to split a string with multiple delimiters in Python.

Method 1: Using the re.split() method
Method 2: Using the functools.reduce() function with list comprehension
Method 3: Using the str.split() method with a custom function

Method 1: Using the re.split() method

To split a string on multiple delimiters in Python, you can use the re.split() function. It splits a string based on a regular expression pattern, which can include multiple delimiters.
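A small sketch, assuming the delimiters are a comma, a semicolon and a pipe, each optionally followed by whitespace:

```python
import re

# Hypothetical input for illustration.
input_string = "one, two;three|four"

# Match any one delimiter from the set, plus any whitespace that follows it.
substrings = re.split(r"[,;|]\s*", input_string)
print(substrings)
```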


In this example, the input string is split into a list of substrings based on the specified delimiters.

Method 2: Using functools.reduce() with list comprehension

You can combine str.split() with functools.reduce() and a list comprehension: start from a one-item list containing the whole string, then, for each delimiter, split every piece and flatten the results.
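A possible sketch of this combination. The function name split_with_reduce is an assumption, not part of any library:

```python
from functools import reduce

def split_with_reduce(text, delimiters):
    """Split text on each delimiter in turn (hypothetical helper)."""
    return reduce(
        # For one delimiter: split every current piece and flatten the result.
        lambda pieces, delim: [part for piece in pieces for part in piece.split(delim)],
        delimiters,
        [text],  # start with a single piece: the whole string
    )

print(split_with_reduce("a,b;c d", [",", ";", " "]))
```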


Method 3: Using the str.split() method with custom function

The str.split() method is a built-in Python method for splitting a string into a list of substrings based on a specified delimiter. If the delimiter is not provided, the method splits the string based on whitespace characters (spaces, tabs, and newlines).

However, the str.split() method does not directly support multiple delimiters. To achieve this, you can create a custom function that uses the str.split() method in a loop over the delimiters.
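A minimal sketch of such a function. The body is one plausible reading of the description, not necessarily the original code:

```python
def multi_split(input_string, delimiters):
    """Split input_string on every delimiter in the delimiters list."""
    segments = [input_string]
    for delimiter in delimiters:
        new_segments = []
        for segment in segments:
            # Split each existing segment on the current delimiter.
            new_segments.extend(segment.split(delimiter))
        segments = new_segments
    return segments

print(multi_split("a,b;c", [",", ";"]))
```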


In this example, the multi_split() function accepts an input string and a list of delimiters. Then, it iteratively splits the input string using each delimiter and accumulates the result in the segments list.

Split Strings into words with multiple word boundary delimiters

I think what I want to do is a fairly common task but I’ve found no reference on the web. I have text with punctuation, and I want a list of the words.

But Python’s str.split() only accepts a single separator, so after splitting on whitespace every word still has its punctuation attached. Any ideas?


re.split(pattern, string[, maxsplit=0])

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list. (Incompatibility note: in the original Python 1.5 release, maxsplit was ignored. This has been fixed in later releases.)

A case where regular expressions are justified:
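For instance, a sketch that splits on runs of non-word characters (the sample sentence is an assumption):

```python
import re

# \W+ matches one or more characters that are not letters, digits or underscore.
words = re.split(r"\W+", "Words, words, words.")
print(words)

# The trailing "." leaves an empty string at the end; filter it out if unwanted.
cleaned = [w for w in words if w]
print(cleaned)
```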

Another quick way to do this without a regexp is to replace the characters first, as below:
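Roughly, with an assumed sample string, that could look like:

```python
# Hypothetical sample data: semicolon, comma and space as separators.
text = "a;bcd,ef g"

# Turn each separator into a space, then split on whitespace.
parts = text.replace(";", " ").replace(",", " ").split()
print(parts)
```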

So many answers, yet I can’t find any solution that does efficiently what the title of the questions literally asks for (splitting on multiple possible separators—instead, many answers split on anything that is not a word, which is different). So here is an answer to the question in the title, that relies on Python’s standard and efficient re module:
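A sketch of what such a call might look like. The separator set and sample sentence are assumptions chosen to match the explanation that follows:

```python
import re

# Hypothetical sample sentence for illustration.
text = "Hey, you - what are you doing here!?"

# Split on runs of the listed separators, then drop empty strings.
words = list(filter(None, re.split(r"[, \-!?:]+", text)))
print(words)
```

where: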

the […] matches one of the separators listed inside,
the \- in the regular expression is here to prevent the special interpretation of - as a character range indicator (as in A-Z ),
the + skips one or more delimiters (it could be omitted thanks to the filter() , but this would unnecessarily produce empty strings between matched single-character separators), and
filter(None, …) removes the empty strings possibly created by leading and trailing separators (since empty strings have a false boolean value).

This re.split() precisely "splits with multiple separators", as asked for in the question title.

This solution is furthermore immune to the problems with non-ASCII characters in words found in some other solutions (see the first comment to ghostdog74’s answer).

The re module is much more efficient (in speed and concision) than doing Python loops and tests "by hand"!

Another way, without regex

Pro-Tip: Use string.translate for the fastest string operations Python has.

First, the slow way (sorry pprzemek):

Next, we use re.findall() (as given by the suggested answer). MUCH faster:
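A minimal sketch of the re.findall() variant, on an assumed sample sentence (no timings are claimed here):

```python
import re

# Hypothetical sample sentence for illustration.
text = "Hey, you - what are you doing here!?"

# Instead of splitting on separators, collect every run of word characters.
words = re.findall(r"\w+", text)
print(words)
```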

Finally, we use translate :

Explanation:

string.translate is implemented in C and applies every substitution in a single pass over the string using a lookup table, instead of one pass per character as chained replace() calls do. Because Python strings are immutable it still returns a new string, but it is about as fast as you can get for character-for-character substitution.

It’s a bit awkward, though, as it needs a translation table in order to do this magic. You can make a translation table with the maketrans() convenience function (string.maketrans() in Python 2, str.maketrans() in Python 3). The objective here is to translate all unwanted characters to spaces. A one-for-one substitute, done in one pass. So this is fast!

Next, we use good old split(). With no argument, split() operates on all whitespace characters, grouping runs of them together for the split. The result will be the list of words that you want. In my timings, this approach was almost 4x faster than re.findall()!
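A Python 3 sketch of the translate-then-split idea. It uses str.maketrans() and string.punctuation, which is an assumption about which characters count as unwanted:

```python
import string

# Hypothetical sample sentence for illustration.
text = "Hey, you - what are you doing here!?"

# Map every punctuation character to a space, one-for-one.
table = str.maketrans(string.punctuation, " " * len(string.punctuation))

# Translate in a single pass, then split on runs of whitespace.
words = text.translate(table).split()
print(words)
```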

I had a similar dilemma and didn’t want to use ‘re’ module.

First, I want to agree with others that the regex or str.translate(...) based solutions are most performant. For my use case the performance of this function wasn’t significant, so I wanted to add ideas that I considered with that criteria.

My main goal was to generalize ideas from some of the other answers into one solution that could work for strings containing more than just regex words (i.e., blacklisting the explicit subset of punctuation characters vs whitelisting word characters).

Note that, in any approach, one might also consider using string.punctuation in place of a manually defined list.

Option 1 — re.sub

I was surprised to see no answer so far uses re.sub(...). I find it a simple and natural approach to this problem.

In this solution, I nested the call to re.sub(...) inside re.split(...) — but if performance is critical, compiling the regex outside could be beneficial — for my use case, the difference wasn’t significant, so I prefer simplicity and readability.
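A sketch of the nested call, assuming a small hand-picked set of punctuation characters:

```python
import re

# Hypothetical sample sentence for illustration.
my_str = "Hey, you - what are you doing here!?"

# Inner call: replace each listed punctuation character with a space.
# Outer call: split the stripped result on runs of whitespace.
words = re.split(r"\s+", re.sub(r"[,\-!?]", " ", my_str).strip())
print(words)
```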

Option 2 — str.replace

This is a few more lines, but it has the benefit of being expandable without having to check whether you need to escape a certain character in regex.

It would have been nice to be able to map the str.replace to the string instead, but I don’t think it can be done with immutable strings, and while mapping against a list of characters would work, running every replacement against every character sounds excessive. (Edit: See next option for a functional example.)
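The str.replace option could be sketched like this (the replacements tuple is an assumed, hand-picked list):

```python
# Hypothetical sample sentence for illustration.
my_str = "Hey, you - what are you doing here!?"

# Characters to blank out; extend this tuple as needed, no escaping required.
replacements = (",", "-", "!", "?")
for r in replacements:
    my_str = my_str.replace(r, " ")

words = my_str.split()
print(words)
```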

Option 3 — functools.reduce

(In Python 2, reduce is available in global namespace without importing it from functools.)
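A Python 3 sketch of the functools.reduce option, using the same assumed list of characters:

```python
from functools import reduce

# Hypothetical sample sentence for illustration.
my_str = "Hey, you - what are you doing here!?"
replacements = (",", "-", "!", "?")

# Fold over the replacements, feeding each result into the next replace().
my_str = reduce(lambda s, sep: s.replace(sep, " "), replacements, my_str)
words = my_str.split()
print(words)
```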

Then this becomes a three-liner:
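A sketch of the three lines in question, with assumed sample text and tokens:

```python
# Hypothetical sample data; tokens may be any list of separator strings.
text = "a,b;c d"
tokens = [",", ";", " "]

fragments = [text]
for token in tokens:
    # Split every fragment on the current token and flatten into one list.
    fragments = [f for fragment in fragments for f in fragment.split(token)]

print(fragments)
```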

Explanation

This is what Haskell calls the List monad. The idea behind the monad is that once you are "in the monad" you "stay in the monad" until something takes you out. For example, say you map Python's range(n) -> [0, 1, ..., n-1] function over a list. Because each result is itself a list, the results are concatenated into one flat list instead of a list of lists, so you get something like map(range, [3,4,1]) -> [0,1,2,0,1,2,3,0]. Haskell calls this operation concatMap (the list monad's bind). The idea here is the same: you have an operation you are applying (splitting on a token), and each time you apply it, you join the results back into a single list.

You can abstract this into a function and have tokens=string.punctuation by default.

Advantages of this approach:

This approach (unlike naive regex-based approaches) can work with arbitrary-length tokens (which regex can also do with more advanced syntax).
You are not restricted to mere tokens; you could have arbitrary logic in place of each token, for example one of the «tokens» could be a function which splits according to how nested parentheses are.

I like re, but here is my solution without it:
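A sketch of the itertools.groupby approach described below. The separator string and sample sentence are assumptions:

```python
from itertools import groupby

# Hypothetical separators and sample sentence for illustration.
sep = " ,-!?"
text = "Hey, you - what are you doing here!?"

# Group consecutive characters by whether they are separators,
# keep only the non-separator groups, and join each group into a word.
words = ["".join(g) for k, g in groupby(text, sep.__contains__) if not k]
print(words)
```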

sep.__contains__ is the method used by the ‘in’ operator. Basically it is the same as lambda ch: ch in sep, but is more convenient here.

groupby gets our string and function. It splits string in groups using that function: whenever a value of function changes — a new group is generated. So, sep.__contains__ is exactly what we need.

groupby returns a sequence of pairs, where pair[0] is a result of our function and pair[1] is a group. Using ‘if not k’ we filter out groups with separators (because a result of sep.__contains__ is True on separators). Well, that’s all — now we have a sequence of groups where each one is a word (group is actually an iterable so we use join to convert it to string).

This solution is quite general, because it uses a function to separate string (you can split by any condition you need). Also, it doesn’t create intermediate strings/lists (you can remove join and the expression will become lazy, since each group is an iterator)

Python .split() – Splitting a String in Python

Quincy Larson


Do you want to turn a string into a list of strings using Python? One way to do this is with Python’s built-in .split() method.

Here’s an example of how to do this in the Python command line:
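A minimal sketch with a made-up comma-separated string. The same lines can be typed one at a time at the REPL prompt:

```python
# Hypothetical sample data for illustration.
colors = "red,green,blue"

# Split on the comma separator.
parts = colors.split(",")
print(parts)
```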

You can open up the Python REPL from your command line. Python comes preinstalled on most Linux distributions; on Windows, and on recent versions of macOS, you may need to install it first. I’ve written a guide to how you can open the latest version of Python from your Mac terminal.

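Note that the "," argument in the example above is optional, but leaving it out changes the behavior.

With no argument, the Python .split() method splits on runs of whitespace and discards empty strings; it does not infer any other separator. A space-separated string splits into words as expected, while a comma-separated string with no spaces comes back as a single item.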

How to use Python .split() with a specific separator

In practice, you will want to pass a separator as an argument. Let me show you how to do that:
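For example, with two assumed sample strings, one separated by spaces and one by commas:

```python
# Hypothetical sample data for illustration.
string1 = "test your might"
string2 = "test,your,might"

# Pass the separator that each string actually uses.
words1 = string1.split(" ")
words2 = string2.split(",")
print(words1)
print(words2)
```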

The output is the same, but it’s cleaner. Here’s a more complicated string, where specifying the separator makes a bigger difference:
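A sketch with an assumed string whose items themselves contain spaces:

```python
# Hypothetical sample data: items separated by a comma and a space.
cities = "New York, Los Angeles, Chicago"

# No separator: splits on whitespace, breaking the city names apart.
no_separator = cities.split()
print(no_separator)

# Comma only: names stay whole, but a leading space is kept on some items.
by_comma = cities.split(",")
print(by_comma)

# Comma plus space: matches the actual separator exactly.
by_comma_space = cities.split(", ")
print(by_comma_space)
```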

As you can see, it’s a safer bet to specify a separator.

Also note that leading and trailing spaces may be included in some of the strings in your resulting list; calling .strip() on each item removes them. Just something to look out for.

How do you split a string into multiple strings in Python?

You can split a string into as many parts as you need. This all depends on what character you want to split the string on.

But if you want to ensure that a string does not get split into more than a certain number of parts, you will want to pass the maxsplit argument in your function call.

How do you split a string into 3 parts in Python?

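If you want to put an upper bound on the number of parts your string will be split into, you can specify this using the maxsplit argument. Keep in mind that maxsplit counts splits, not parts: maxsplit=2 yields at most 3 strings, and maxsplit=3 yields at most 4.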
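A short sketch with an assumed sample sentence:

```python
# Hypothetical sample sentence for illustration.
sentence = "I am not a fan of spam"

# maxsplit=3: split at the first three spaces, leaving four strings.
four_parts = sentence.split(" ", 3)
print(four_parts)

# maxsplit=2: split at the first two spaces, leaving three strings.
three_parts = sentence.split(" ", 2)
print(three_parts)
```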

As you can see, with maxsplit set to 3 the split function simply stops splitting the string after the 3rd space, so that a total of 4 strings are in the resulting list.

I hope you find this is helpful. Thanks for reading, and happy coding. If you want to learn more, check out freeCodeCamp’s core curriculum.
