A Guide on Python String replace() Method
Replacing characters in strings using Python can be done in multiple ways. In this quick tutorial, you'll learn how to use the replace method to manipulate strings.
You'll learn through examples how to replace a character in a string, how to replace multiple characters in a string, and how to replace the last character in a string.
How to Replace a String in Python
If you’re looking for ways to remove or replace all or part of a string in Python, then this tutorial is for you. You’ll be taking a fictional chat room transcript and sanitizing it using both the .replace() method and the re.sub() function.
In Python, the .replace() method and the re.sub() function are often used to clean up text by removing strings or substrings or replacing them. In this tutorial, you’ll be playing the role of a developer for a company that provides technical support through a one-to-one text chat. You’re tasked with creating a script that’ll sanitize the chat, removing any personal data and replacing any swear words with emoji.
You’re only given one very short chat transcript:
Even though this transcript is short, it’s typical of the type of chats that agents have all the time. It has user identifiers, ISO time stamps, and messages.
In this case, the client johndoe filed a complaint, and company policy is to sanitize and simplify the transcript, then pass it on for independent evaluation. Sanitizing the message is your job!
The first thing you’ll want to do is to take care of any swear words.
How to Remove or Replace a Python String or Substring
The most basic way to replace a string in Python is to use the .replace() string method:
As you can see, you can chain .replace() onto any string and provide the method with two arguments. The first is the string that you want to replace, and the second is the replacement.
Note: Although the Python shell displays the result of .replace(), the string itself stays unchanged. You can see this more clearly by assigning your string to a variable:
Notice that when you simply call .replace(), the value of name doesn’t change. But when you assign the result of name.replace() to the name variable, ‘Fake Python’ becomes ‘Real Python’.
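A minimal sketch of that behavior, reusing the 'Fake Python' string mentioned above (the variable name result is just for illustration):

```python
# .replace() returns a new string; it never changes the original in place.
name = "Fake Python"
result = name.replace("Fake", "Real")

print(result)  # Real Python
print(name)    # Fake Python (the original is unchanged)

# Reassign the result to the same name to keep the change.
name = name.replace("Fake", "Real")
print(name)    # Real Python
```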
Now it’s time to apply this knowledge to the transcript:
Loading the transcript as a triple-quoted string and then using the .replace() method on one of the swear words works fine. But there’s another swear word that’s not getting replaced because in Python, the string needs to match exactly:
As you can see, even if the casing of one letter doesn’t match, it’ll prevent any replacements. This means that if you’re using the .replace() method, you’ll need to call it several times to cover the variations. In this case, you can just chain on another call to .replace():
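As a rough illustration of exact matching and chaining, here is a sketch on a made-up two-line excerpt. The message wording and the emoji are assumptions, not the original transcript:

```python
# Hypothetical excerpt; the real transcript is not reproduced here.
transcript = """\
[johndoe] I CAN'T CONNECT TO MY BLASTED ACCOUNT
[johndoe] Blast! You're right!"""

# A single call only replaces the exact, case-sensitive match.
once = transcript.replace("BLASTED", "😤")

# Chaining a second call covers the other variation.
twice = transcript.replace("BLASTED", "😤").replace("Blast", "😤")
print(twice)
```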
Success! But you’re probably thinking that this isn’t the best way to do this for something like a general-purpose transcription sanitizer. You’ll want to move toward some way of having a list of replacements, instead of having to type out .replace() each time.
Set Up Multiple Replacement Rules
There are a few more replacements that you need to make to the transcript to get it into a format acceptable for independent review:
- Shorten or remove the time stamps
- Replace the usernames with Agent and Client
Now that you’re starting to have more strings to replace, chaining on .replace() is going to get repetitive. One idea could be to keep a list of tuples, with two items in each tuple. The two items would correspond to the arguments that you need to pass into the .replace() method—the string to replace and the replacement string:
In this version of your transcript-cleaning script, you created a list of replacement tuples, which gives you a quick way to add replacements. You could even create this list of tuples from an external CSV file if you had loads of replacements.
You then iterate over the list of replacement tuples. In each iteration, you call .replace() on the string, populating the arguments with the old and new variables that have been unpacked from each replacement tuple.
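The loop described above might look like the following sketch. The transcript text, the support_tom username, and the exact replacement pairs are assumptions made up for illustration:

```python
# Hypothetical transcript in the format described earlier:
# user identifier, ISO time stamp, message.
transcript = """\
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"""

# Each tuple holds (string to replace, replacement string).
replacements = [
    ("BLASTED", "😤"),
    ("Blast", "😤"),
    ("2022-08-24T", ""),
    ("+00:00", ""),
    ("[support_tom]", "Agent "),  # trailing space lines up the columns
    ("[johndoe]", "Client"),
]

for old, new in replacements:
    transcript = transcript.replace(old, new)

print(transcript)
```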
Note: The unpacking in the for loop in this case is functionally the same as using indexing:
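A small sketch of that equivalence, using two hypothetical replacement pairs and a made-up message:

```python
replacements = [("Blast", "😤"), ("[johndoe]", "Client")]
text = "[johndoe] Blast! You're right!"

# Version 1: unpack each tuple directly in the for statement.
unpacked = text
for old, new in replacements:
    unpacked = unpacked.replace(old, new)

# Version 2: index into each tuple instead.
indexed = text
for replacement in replacements:
    indexed = indexed.replace(replacement[0], replacement[1])

print(unpacked == indexed)  # True
```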
If you’re mystified by unpacking, then check out the section on unpacking from the tutorial on Python lists and tuples.
With this, you’ve made a big improvement in the overall readability of the transcript. It’s also easier to add replacements if you need to. Running this script reveals a much cleaner transcript:
That’s a pretty clean transcript. Maybe that’s all you need. But if your inner automator isn’t happy, maybe it’s because there are still some things that may be bugging you:
- Replacing the swear words won’t work if there’s another variation using -ing or a different capitalization, like BLAst.
- Removing the date from the time stamp currently only works for August 24, 2022.
- Removing the full time stamp would involve setting up replacement pairs for every possible time—not something you’re too keen on doing.
- Adding the space after Agent in order to line up your columns works but isn’t very general.
If these are your concerns, then you may want to turn your attention to regular expressions.
Leverage re.sub() to Make Complex Rules
Whenever you’re looking to do any replacing that’s slightly more complex or needs some wildcards, you’ll usually want to turn your attention toward regular expressions, also known as regex.
Regex is a sort of mini-language made up of characters that define a pattern. These patterns, or regexes, are typically used to search for strings in find or find-and-replace operations. Many programming languages support regex, and it’s widely used. Regex will even give you superpowers.
In Python, leveraging regex means using the re module’s sub() function and building your own regex patterns:
While you can mix and match the sub() function with the .replace() method, this example only uses sub(), so you can see how it’s used. You’ll note that you can replace all variations of the swear word by using just one replacement tuple now. Similarly, you’re only using one regex for the full time stamp:
Now your transcript has been completely sanitized, with all noise removed! How did that happen? That’s the magic of regex.
The first regex pattern, "blast\w*", makes use of the \w special character, which will match alphanumeric characters and underscores. Adding the * quantifier directly after it will match zero or more characters of \w.
Another vital part of the first pattern is that the re.IGNORECASE flag makes it a case-insensitive pattern. So now, any substring containing blast, regardless of capitalization, will be matched and replaced.
Note: The "blast\w*" pattern is quite broad and will also modify fibroblast, leaving fibro followed by the replacement. It also can’t identify a polite use of the word. It just matches the characters. That said, the typical swear words that you’d want to censor don’t really have polite alternate meanings!
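To see the first pattern in isolation, here is a short sketch. The sample message and the emoji replacement are assumptions; the pattern and flag are the ones discussed above:

```python
import re

# Hypothetical message with two capitalizations of the swear word.
message = "Blast! I CAN'T CONNECT TO MY BLASTED ACCOUNT"

censored = re.sub(r"blast\w*", "😤", message, flags=re.IGNORECASE)
print(censored)  # 😤! I CAN'T CONNECT TO MY 😤 ACCOUNT

# The pattern is broad: it also matches inside longer words.
overmatched = re.sub(r"blast\w*", "😤", "fibroblast", flags=re.IGNORECASE)
print(overmatched)  # fibro😤
```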
The second regex pattern uses character sets and quantifiers to replace the time stamp. You often use character sets and quantifiers together. A regex pattern of [abc], for example, will match one character of a, b, or c. Putting a * directly after it would match zero or more characters of a, b, or c.
There are more quantifiers, though. If you used [abc]{10}, it would match exactly ten characters of a, b, or c in any order and any combination. Also note that repeating characters is redundant, so [aa] is equivalent to [a].
For the time stamp, you use an extended character set of [-T:+\d] to match all the possible characters that you might find in the time stamp. Paired with the quantifier {25}, this will match any possible time stamp, at least until the year 10,000.
Note: The special character, \d, matches any digit character.
The time stamp regex pattern allows you to select any possible date in the time stamp format. Seeing as the times aren’t important for the independent reviewer of these transcripts, you replace them with an empty string. It’s possible to write a more advanced regex that preserves the time information while removing the date.
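A quick sketch of the time stamp pattern at work, assuming a line in the format described earlier (the line itself is made up):

```python
import re

# Hypothetical transcript line with a 25-character ISO time stamp.
line = "[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"

# Character set [-T:+\d] repeated exactly 25 times.
timestamp = r"[-T:+\d]{25}"

found = re.findall(timestamp, line)
print(found)  # ['2022-08-24T10:04:03+00:00']

# Replacing with an empty string removes the time stamp entirely.
stripped = re.sub(timestamp, "", line)
print(stripped)
```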
The third regex pattern is used to select any user string that starts with the keyword "support". Note that you escape (\) the square bracket ([) because otherwise the keyword would be interpreted as a character set.
Finally, the last regex pattern selects the client username string and replaces it with "Client".
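Putting the four patterns together could look something like this sketch. The transcript, the support_tom username, and the leading space in the time stamp pattern (added here so no double space is left behind) are assumptions:

```python
import re

# Hypothetical transcript in the format described earlier.
transcript = """\
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"""

# Each tuple holds (regex pattern, replacement string).
regex_replacements = [
    (r"blast\w*", "😤"),
    (r" [-T:+\d]{25}", ""),          # leading space is an assumption
    (r"\[support\w*\]", "Agent "),   # escaped brackets match literally
    (r"\[johndoe\]", "Client"),
]

cleaned = transcript
for pattern, new in regex_replacements:
    cleaned = re.sub(pattern, new, cleaned, flags=re.IGNORECASE)

print(cleaned)
```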
Note: While it would be great fun to go into more detail about these regex patterns, this tutorial isn’t about regex. Work through the Python regex tutorial for a good primer on the subject. Also, you can make use of the fantastic RegExr web site, because regex is tricky and regex wizards of all levels rely on handy tools like RegExr.
RegExr is particularly good because you can copy and paste regex patterns, and it’ll break them down for you with explanations.
With regex, you can drastically cut down the number of replacements that you have to write out. That said, you still may have to come up with many patterns. Seeing as regex isn’t the most readable of languages, having lots of patterns can quickly become hard to maintain.
Thankfully, there’s a neat trick with re.sub() that allows you to have a bit more control over how replacement works, and it offers a much more maintainable architecture.
Use a Callback With re.sub() for Even More Control
One trick that Python and sub() have up their sleeves is that you can pass in a callback function instead of the replacement string. This gives you total control over how to match and replace.
To get started building this version of the transcript-sanitizing script, you’ll use a basic regex pattern to see how using a callback with sub() works:
The regex pattern that you’re using will match the time stamps, and instead of providing a replacement string, you’re passing in a reference to the sanitize_message() function. Now, when sub() finds a match, it’ll call sanitize_message() with a match object as an argument.
Since sanitize_message() just prints the object that it’s received as an argument, when running this, you’ll see the match objects being printed to the console:
A match object is one of the building blocks of the re module. The more basic re.match() function returns a match object. sub() doesn’t return any match objects but uses them behind the scenes.
Because you get this match object in the callback, you can use any of the information contained within it to build the replacement string. Once it’s built, you return the new string, and sub() will replace the match with the returned string.
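As an illustrative sketch of building a replacement from the match object, the hypothetical callback below (shorten_timestamp is not a name from the tutorial) keeps only the time portion of each time stamp. The sample line is made up:

```python
import re

# Hypothetical transcript line.
line = "[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"


def shorten_timestamp(match: re.Match) -> str:
    """Return only the HH:MM:SS part of a matched time stamp."""
    print(match)  # shows the match object that sub() passes in
    full = match.group()  # e.g. '2022-08-24T10:04:03+00:00'
    return full[11:19]    # characters after 'T', before the offset


result = re.sub(r"[-T:+\d]{25}", shorten_timestamp, line)
print(result)  # [johndoe] 10:04:03 : Blast! You're right!
```

Whatever string the callback returns takes the place of the matched text, so any logic that fits in a function can drive the replacement.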
Apply the Callback to the Script
In your transcript-sanitizing script, you’ll make use of the .groups() method of the match object to return the contents of the two capture groups, and then you can sanitize each part in its own function or discard it:
Instead of having lots of different regexes, you can have one top level regex that can match the whole line, dividing it up into capture groups with brackets ( () ). The capture groups have no effect on the actual matching process, but they do affect the match object that results from the match:
- \[(.+)\] matches any sequence of characters wrapped in square brackets. The capture group picks out the username string, for instance johndoe .
- [-T:+\d]{25} matches the time stamp, which you explored in the last section. Since you won’t be using the time stamp in the final transcript, it’s not captured with brackets.
- : matches a literal colon. The colon is used as a separator between the message metadata and the message itself.
- (.+) matches any sequence of characters until the end of the line, which will be the message.
The content of the capturing groups will be available as separate items in the match object by calling the .groups() method, which returns a tuple of the matched strings.
Note: The entry regex definition uses Python’s implicit string concatenation:
Functionally, this is the same as writing it all out as one single string: r"\[(.+)\] [-T:+\d]{25} : (.+)". Organizing your longer regex patterns on separate lines allows you to break them up into chunks, which not only makes them more readable but also allows you to insert comments.
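A sketch of how such a pattern could be laid out with implicit string concatenation, and what .groups() returns for it. The constant name ENTRY_PATTERN and the sample line are assumptions:

```python
import re

# Adjacent string literals inside parentheses are joined into one string.
ENTRY_PATTERN = (
    r"\[(.+)\] "       # user string in square brackets (captured)
    r"[-T:+\d]{25} "   # time stamp (not captured)
    r": "              # separator between metadata and message
    r"(.+)"            # message (captured)
)

# Hypothetical transcript line.
line = "[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"

match = re.match(ENTRY_PATTERN, line)
user, message = match.groups()
print(user)     # johndoe
print(message)  # Blast! You're right!
```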
The two groups are the user string and the message. The .groups() method returns them as a tuple of strings. In the sanitize_message() function, you first use unpacking to assign the two strings to variables:
Note how this architecture allows a very broad and inclusive regex at the top level, and then lets you supplement it with more precise regexes within the replacement callback.
The sanitize_message() function makes use of two functions to clean up usernames and bad words. It additionally uses f-strings to justify the messages. Note how censor_bad_words() uses a dynamically created regex while censor_users() relies on more basic string processing.
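The overall structure might be sketched as follows. The function names come from the description above, but their bodies, the BAD_WORDS list, the transcript, and the column width are assumptions, not the original script:

```python
import re

ENTRY_PATTERN = r"\[(.+)\] [-T:+\d]{25} : (.+)"
BAD_WORDS = ["blast", "dash"]  # hypothetical word list

# Hypothetical transcript in the format described earlier.
transcript = """\
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"""


def censor_bad_words(message: str) -> str:
    """Replace each bad word and its suffixes using a regex built on the fly."""
    for word in BAD_WORDS:
        pattern = rf"{re.escape(word)}\w*"
        message = re.sub(pattern, "😤", message, flags=re.IGNORECASE)
    return message


def censor_users(user: str) -> str:
    """Map a username to a role with plain string processing."""
    if user.startswith("support"):
        return "Agent"
    return "Client"


def sanitize_message(match: re.Match) -> str:
    """Build the replacement line from the two capture groups."""
    user, message = match.groups()
    # :<6 left-justifies the role in a six-character column.
    return f"{censor_users(user):<6} : {censor_bad_words(message)}"


cleaned = re.sub(ENTRY_PATTERN, sanitize_message, transcript)
print(cleaned)
```

Because the f-string handles the alignment, there is no need for the trailing space in "Agent " that the earlier list-based approach relied on.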
This is now looking like a good first prototype for a transcript-sanitizing script! The output is squeaky clean:
Nice! Using sub() with a callback gives you far more flexibility to mix and match different methods and build regexes dynamically. This structure also gives you the most room to grow when your bosses or clients inevitably change their requirements on you!
Conclusion
In this tutorial, you’ve learned how to replace strings in Python. Along the way, you’ve gone from using the basic Python .replace() string method to using callbacks with re.sub() for absolute control. You’ve also explored some regex patterns and deconstructed them into a better architecture to manage a replacement script.
With all that knowledge, you’ve successfully cleaned a chat transcript, which is now ready for independent review. Not only that, but your transcript-sanitizing script has plenty of room to grow.
String Manipulation and Regular Expressions
One place where the Python language really shines is in the manipulation of strings. This section will cover some of Python’s built-in string methods and formatting operations, before moving on to a quick guide to the extremely useful subject of regular expressions. Such string manipulation patterns come up often in the context of data science work, and are one big perk of Python in this context.
Strings in Python can be defined using either single or double quotations (they are functionally equivalent):
In addition, it is possible to define multi-line strings using a triple-quote syntax:
With this, let’s take a quick tour of some of Python’s string manipulation tools.
Simple String Manipulation in Python
For basic manipulation of strings, Python’s built-in string methods can be extremely convenient. If you have a background working in C or another low-level language, you will likely find the simplicity of Python’s methods extremely refreshing. We introduced Python’s string type and a few of these methods earlier; here we’ll dive a bit deeper.
Formatting strings: Adjusting case
Python makes it quite easy to adjust the case of a string. Here we’ll look at the upper(), lower(), capitalize(), title(), and swapcase() methods, using the following messy string as an example:
To convert the entire string into upper-case or lower-case, you can use the upper() or lower() methods respectively:
A common formatting need is to capitalize just the first letter of each word, or perhaps the first letter of each sentence. This can be done with the title() and capitalize() methods:
The cases can be swapped using the swapcase() method:
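The five case methods can be tried on a deliberately messy string. The sample string here is an assumption chosen for illustration:

```python
fox = "tHe qUICk bROWn fOx."  # hypothetical messy input

print(fox.upper())       # THE QUICK BROWN FOX.
print(fox.lower())       # the quick brown fox.
print(fox.title())       # The Quick Brown Fox.
print(fox.capitalize())  # The quick brown fox.
print(fox.swapcase())    # ThE QuicK BrowN FoX.
```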
Formatting strings: Adding and removing spaces
Another common need is to remove spaces (or other characters) from the beginning or end of the string. The basic method of removing characters is the strip() method, which strips whitespace from the beginning and end of the line:
To remove just space to the right or left, use rstrip() or lstrip() respectively:
To remove characters other than spaces, you can pass the desired character to the strip() method:
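A short sketch of the stripping methods, using made-up sample strings:

```python
line = "         this is the content         "  # hypothetical padded string

print(repr(line.strip()))   # 'this is the content'
print(repr(line.rstrip()))  # leading spaces kept, trailing removed
print(repr(line.lstrip()))  # trailing spaces kept, leading removed

# Passing characters strips those instead of whitespace.
num = "000000000000435"
print(num.strip("0"))  # 435
```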
The opposite of this operation, adding spaces or other characters, can be accomplished using the center(), ljust(), and rjust() methods.
For example, we can use the center() method to center a given string within a given number of spaces:
Similarly, ljust() and rjust() will left-justify or right-justify the string within spaces of a given length:
All these methods additionally accept any character which will be used to fill the space. For example:
Because zero-filling is such a common need, Python also provides zfill(), which is a special method to left-pad a string with zeros:
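The padding methods might be exercised like this; the sample strings and widths are assumptions:

```python
line = "this is the content"  # 19 characters

centered = line.center(30)
print(repr(centered))          # padded to a total width of 30
print(repr(line.ljust(30)))    # spaces added on the right
print(repr(line.rjust(30)))    # spaces added on the left

# A fill character can be given as the second argument.
print("435".rjust(10, "0"))    # 0000000435

# zfill() pads with zeros on the left and keeps a leading sign in front.
print("435".zfill(10))         # 0000000435
print("-42".zfill(6))          # -00042
```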
Finding and replacing substrings
If you want to find occurrences of a certain character in a string, the find()/rfind(), index()/rindex(), and replace() methods are the best built-in methods.
find() and index() are very similar, in that they search for the first occurrence of a character or substring within a string, and return the index of the substring:
The only difference between find() and index() is their behavior when the search string is not found; find() returns -1, while index() raises a ValueError:
The related rfind() and rindex() work similarly, except they search for the first occurrence from the end rather than the beginning of the string:
For the special case of checking for a substring at the beginning or end of a string, Python provides the startswith() and endswith() methods:
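The search methods described in this section can be compared side by side. The sample sentence is an assumption:

```python
line = "the quick brown fox jumped over a lazy dog"  # hypothetical input

# find() and index() agree when the substring exists.
print(line.find("fox"))   # 16
print(line.index("fox"))  # 16

# They differ when it does not.
print(line.find("bear"))  # -1
try:
    line.index("bear")
except ValueError as error:
    print(f"index() raised ValueError: {error}")

# rfind() searches from the end of the string.
print(line.rfind("a"))    # 35

# Checks anchored to the start or end of the string.
print(line.endswith("dog"))    # True
print(line.startswith("fox"))  # False
```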
To go one step further and replace a given substring with a new string, you can use the replace() method. Here, let’s replace ‘brown’ with ‘red’:
The replace() method returns a new string, and will replace all occurrences of the input:
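A brief sketch of both points, again on an assumed sample sentence:

```python
line = "the quick brown fox jumped over a lazy dog"  # hypothetical input

print(line.replace("brown", "red"))
# the quick red fox jumped over a lazy dog

# Every occurrence is replaced, not just the first one.
print(line.replace("o", "--"))
# the quick br--wn f--x jumped --ver a lazy d--g

# The original string is left untouched.
print(line)
```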
For a more flexible approach to this replace() functionality, see the discussion of regular expressions in Flexible Pattern Matching with Regular Expressions.
Splitting and partitioning strings
If you would like to find a substring and then split the string based on its location, the partition() and/or split() methods are what you’re looking for. Both will return a sequence of substrings.
The partition() method returns a tuple with three elements: the substring before the first instance of the split-point, the split-point itself, and the substring after:
The rpartition() method is similar, but searches from the right of the string.
The split() method is perhaps more useful; it finds all instances of the split-point and returns the substrings in between. The default is to split on any whitespace, returning a list of the individual words in a string:
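Here is a small sketch contrasting partition(), rpartition(), and split() on an assumed sample sentence:

```python
line = "the quick brown fox jumped over a lazy dog"  # hypothetical input

# partition() splits once, at the first occurrence of the split-point.
before, sep, after = line.partition("fox")
print((before, sep, after))
# ('the quick brown ', 'fox', ' jumped over a lazy dog')

# rpartition() splits once, at the last occurrence.
print(line.rpartition(" "))
# ('the quick brown fox jumped over a lazy', ' ', 'dog')

# split() with no argument splits on any run of whitespace.
words = line.split()
print(words)
```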
A related method is splitlines(), which splits on newline characters. Let’s do this with a Haiku, popularly attributed to the 17th-century poet Matsuo Bashō:
Note that if you would like to undo a split(), you can use the join() method, which returns a string built from a splitpoint and an iterable:
A common pattern is to use the special character "\n" (newline) to join together lines that have been previously split, and recover the input:
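The round trip between splitlines() and join() can be sketched with any multi-line string. The placeholder text below stands in for the poem and is an assumption:

```python
# Placeholder three-line text (not the actual haiku).
poem = "first line\nsecond line\nthird line"

lines = poem.splitlines()
print(lines)  # ['first line', 'second line', 'third line']

# join() is called on the separator and takes an iterable of strings.
print("--".join(["1", "2", "3"]))  # 1--2--3

# Joining on "\n" recovers the original input.
restored = "\n".join(lines)
print(restored == poem)  # True
```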
Format Strings
In the preceding methods, we have learned how to extract values from strings, and to manipulate strings themselves into desired formats. Another use of string methods is to manipulate string representations of values of other types. Of course, string representations can always be found using the str() function; for example:
For more complicated formats, you might be tempted to use string arithmetic as outlined in Basic Python Semantics: Operators:
A more flexible way to do this is to use format strings, which are strings with special markers (noted by curly braces) into which string-formatted values will be inserted. Here is a basic example:
Inside the {} marker you can also include information on exactly what you would like to appear there. If you include a number, it will refer to the index of the argument to insert:
If you include a string, it will refer to the key of any keyword argument:
Finally, for numerical inputs, you can include format codes which control how the value is converted to a string. For example, to print a number as a floating point with three digits after the decimal point, you can use the following:
As before, here the "0" refers to the index of the value to be inserted. The ":" marks that format codes will follow. The ".3f" encodes the desired precision: three digits beyond the decimal point, floating-point format.
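The formatting steps in this section could be sketched as follows. The value of pi and the letter arguments are assumptions chosen for illustration:

```python
pi = 3.14159  # hypothetical value to format

# Plain conversion and string arithmetic.
print(str(pi))                         # 3.14159
print("The value of pi is " + str(pi))

# Empty braces are filled in order.
print("The value of pi is {}".format(pi))

# A number inside the braces selects a positional argument.
by_index = "First letter: {0}. Last letter: {1}.".format("A", "Z")
print(by_index)

# A name inside the braces selects a keyword argument.
by_name = "First letter: {first}. Last letter: {last}.".format(last="Z", first="A")
print(by_name)

# A format code after the colon controls the conversion.
rounded = "pi = {0:.3f}".format(pi)
print(rounded)  # pi = 3.142
```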
This style of format specification is very flexible, and the examples here barely scratch the surface of the formatting options available. For more information on the syntax of these format strings, see the Format Specification section of Python’s online documentation.
Flexible Pattern Matching with Regular Expressions
The methods of Python’s str type give you a powerful set of tools for formatting, splitting, and manipulating string data. But even more powerful tools are available in Python’s built-in regular expression module. Regular expressions are a huge topic; there are entire books written on the topic (including Jeffrey E.F. Friedl’s Mastering Regular Expressions, 3rd Edition), so it will be hard to do justice within just a single subsection.
My goal here is to give you an idea of the types of problems that might be addressed using regular expressions, as well as a basic idea of how to use them in Python. I’ll suggest some references for learning more in Further Resources on Regular Expressions.
Changing one character in a string
What is the easiest way in Python to replace a character in a string?
Don’t modify strings.
Work with them as lists; turn them into strings only when needed.
Python strings are immutable (i.e. they can’t be modified). There are a lot of reasons for this. Use lists until you have no choice, only then turn them into strings.
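A minimal sketch of the list-based approach, with a made-up sample string:

```python
text = "abcdefg"  # hypothetical input

# Direct item assignment on a str fails because strings are immutable.
try:
    text[1] = "Z"
except TypeError as error:
    print(f"TypeError: {error}")

# Work on a list of characters, then join back into a string when needed.
chars = list(text)
chars[1] = "Z"
result = "".join(chars)
print(result)  # aZcdefg
```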
Fastest method?
There are three ways. For the speed seekers I recommend ‘Method 2’
Method 1
Which is pretty slow compared to ‘Method 2’
Method 2 (FAST METHOD)
Which is much faster:
Method 3:
Python strings are immutable; you change them by making a copy.
The easiest way to do what you want is probably:
The text[1:] returns the string in text from position 1 to the end. Positions count from 0, so ‘1’ is the second character.
edit: You can use the same string slicing technique for any part of the string
Or if the letter only appears once you can use the search and replace technique suggested below
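The slicing and search-and-replace ideas might look like this in practice. The sample string and replacement letter are assumptions:

```python
text = "abcdefg"  # hypothetical input

# Replace the first character: new letter plus everything from position 1.
first = "Z" + text[1:]
print(first)   # Zbcdefg

# Replace a character in the middle by slicing around it.
middle = text[:1] + "Z" + text[2:]
print(middle)  # aZcdefg

# Replace the last character.
last = text[:-1] + "Z"
print(last)    # abcdefZ

# If the letter appears only once, replace() works too;
# the optional count argument limits it to the first occurrence.
replaced = text.replace("b", "Z", 1)
print(replaced)  # aZcdefg
```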