Debugging a Running Python Process

If only it were as easy as installing debug symbols, attaching gdb to the process, and running py-bt! We have a Python agent, which distributes files, running across the fleet. On some random hosts, it went haywire: on that set of hosts, the process was using 100% of CPU and not doing any meaningful work. Restarting the process fixed the problem. I had worked on debugging a stuck process before, but this was the opposite. Time to dive deep.
First Obvious Step: strace
So for some reason, the process is constantly opening the root directory and calling fstat on it. Why? I don’t know!
Cannot work, because
- I have not worked with it before (I really should)
- Problem is not reproducible, the 100% CPU usage appeared on random hosts so stopping a process is not an option.
Attach gdb to the process?
Sure! I have heard it works with python too, with extensions. There is the official DebuggingWithGdb guide, which suggests the vanilla steps of installing debug symbols with yum or apt-get. If only life were that simple. The thing is, we do not run on the system python, the one that you find under /usr/bin/python. There is a separately compiled and packaged python installed on hosts at a non-standard location, so grabbing random debug symbols from the internet would not work.
I searched the slack chat history (poor man’s stackoverflow!) to see if someone had tried this adventure earlier, and some people sure had. I got a link to a debug symbol RPM from the chat history and installed it.
Next, find out where exactly that RPM installed the debug binary (remember the custom in-house built python? It does not install files at any standard location).
Search for a bin/python3.debug or something similar in that. Once you get the path to debug binary, it would be as simple as this, right?
Wrong! Turns out, when we attach gdb using that debug binary, it cannot load most symbols and errors out like
Despite being given the debug binary, gdb is somehow not able to load symbols. This means that the python binary running as a process and the debug binary that we provided do not correspond to the same version or build of python. The command on the last line did not work either, saying package not found. Let’s first confirm the running python binary.
[On a side note, the running pex file is a PythonEXecutable, something like zipped virtualenv]
Apparently, the binary that runs python is 3.7.0 with build version 0.0.9 (not sure what exactly that means), and that version did not match the original debug binary I installed earlier. To find the corresponding debug binary, which would not be available on the internet, I searched the local artifact store (remember, debuginfo-install was not able to find it), where there was a promising-looking python37-debuginfo_linux_rhel6_x86_64-0.0.9-3.7.0.el6.x86_64.rpm, which corresponds exactly to the binary version that the process is running. Okay, now using the same rpm -ql command we find the debug binary and run
So now the symbols are getting loaded. Now just run py-bt and get a backtrace, right? Wrong!
Okay, gdb. That was supposed to work! Why did it not? Turns out, all these python related macros do not come built in; they are added at runtime. The CPython interpreter ships with a script that gdb loads, and that is what makes these python magic macros work. In an ideal world, that script loads automatically and you don’t need to do anything. But remember where we live 🙂 The script is called python37-gdb.py, but for some strange reason it was not shipped with the debug binary installed. Again, in an ideal world, the script is supposed to come with the package and should be autoloaded. You can get it from the python source code at Tools/gdb/libpython.py. It is named a bit differently; I guess it gets renamed at build time. Anyway, here’s how you load the file, and then the macros work! Yeeeey.
Now the process was getting paused at some memcpy function. From the original strace output, it seemed to be stuck in an infinite series of open calls. So that’s our pointer. We set a breakpoint at the open call and then take a backtrace. And we get to know who’s doing it!
Cool stuff! We know from the backtrace that pex’s third party module is making those calls. Let’s go frame by frame.
The second frame looks suspicious. The function it contains has an argument prefix ‘/////////….’, and we also saw in our initial strace output the program opening ‘/’ repeatedly. So the dots connect. Next? Go to pex’s github repo and search for reported issues, and indeed there was PR#638. Here’s the code snippet from the earlier version:
So this method is passed a zip argument (the pex python file); zipfile.is_zipfile returns true and the program proceeds happily. But when it does not, the method sets path to its parent dir using os.path.dirname(...) and the while loop continues. The parent dir is not a zip either, so it moves on to that dir’s parent. And so it goes on until path is /. Now the parent of / is / itself, so the while loop continues infinitely, and we see 100% CPU usage and the process doing nothing else. The issue is explained here by the author who raised a fix.
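To make the failure mode concrete, here is a minimal sketch of that kind of loop. It is not the pex source: the function names find_archive_buggy and find_archive_fixed, the example path, and the max_steps guard are all made up for illustration. The key fact it relies on is that os.path.dirname('/') returns '/' again, so a walk up the directory tree stops changing once it reaches the root.

```python
import os
import zipfile


def find_archive_buggy(path, max_steps=50):
    """Walk up from path until a zip file is found.

    Mirrors the shape of the bug: there is no check for reaching the root.
    max_steps is a hypothetical guard so that this demo terminates.
    """
    steps = 0
    while not zipfile.is_zipfile(path):
        path = os.path.dirname(path)  # dirname('/') is '/' again
        steps += 1
        if steps >= max_steps:
            return None, steps  # an unguarded loop would spin forever here
    return path, steps


def find_archive_fixed(path):
    """Same walk, but stop once dirname() no longer changes the path."""
    while not zipfile.is_zipfile(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached '/' (or '' for a relative path)
            return None
        path = parent
    return path


missing = "/no/such/dir/agent.pex"  # hypothetical path to a deleted archive
print(find_archive_buggy(missing))
print(find_archive_fixed(missing))
```

With a path whose archive no longer exists, the guarded buggy version exhausts all of its steps sitting on the root directory, while the fixed version gives up as soon as the parent equals the current path.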
The Root Cause
So ideally the zip (.pex) file is supposed to exist. This particular scenario happened while we were moving away from pex. We install a newer packaged file and restart the process. But for some reason (which we will not discuss here) the process was not getting killed and it continued running with a pex file which did not exist anymore (hence is_zipfile fails) because of the upgrade to new packaging.
Categories: linux , python
Updated: December 18, 2019
Debugging of CPython processes with gdb
pdb has been, is and probably always will be the bread and butter of Python programmers, when they need to find the root cause of a problem in their applications, as it’s a built-in and easy to use debugger. But there are cases, when pdb can’t help you, e.g. if your app has got stuck somewhere, and you need to attach to a running process to find out why, without restarting it. This is where gdb shines.
Why gdb?
gdb is a general purpose debugger, that is mostly used for debugging of C and C++ applications (although it actually supports Ada, Objective-C, Pascal and more).
There are different reasons why a Python programmer would be interested in gdb for debugging:
gdb allows one to attach to a running process without starting an app in debug mode or modifying the app code in some way first (e.g. putting something like import rpdb; rpdb.set_trace() into the code)
gdb allows one to take a core dump of a process and analyze it later. This is useful, when you don’t want to stop the process for the duration of time, while you are introspecting its state, as well as when you do post-mortem debugging of a process that has already failed (e.g. crashed with a segmentation fault)
most debuggers available for Python (notable exceptions are winpdb and pydevd) do not support switching between threads of the application being debugged. gdb allows that, as well as debugging of threads created by non-Python code (e.g. in some native library used)
Debugging of interpreted languages
So what makes Python special when using gdb ?
In contradistinction to programming languages like C or C++, Python code is not compiled into a native binary for a target platform. Instead there is an interpreter (e.g. CPython, the reference implementation of Python), which executes compiled byte-code.
This effectively means, that when you attach to a Python process with gdb , you’ll debug the interpreter instance and introspect the process state at the interpreter level, not the application level: i.e. you will see functions and variables of the interpreter, not of your app.
To give you an example, let’s take a look at a gdb backtrace of a CPython (the most popular Python interpreter) process:
and one obtained by the means of traceback.extract_stack() :
As is, the former is of little help, when you are trying to find a problem in your Python code, and all you see is the current state of the interpreter itself.
However, PyEval_EvalFrameEx looks interesting: it’s a function of CPython, which executes bytecode of Python application level functions and, thus, has access to their state — the very state we are usually interested in.
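For comparison with the interpreter-level view, here is a small sketch of what the application-level stack looks like from inside Python. The functions inner and outer are placeholders, not part of the original example.

```python
import traceback


def inner():
    # extract_stack() returns one FrameSummary per Python-level call,
    # ordered from the oldest call to the most recent one.
    return traceback.extract_stack()


def outer():
    return inner()


stack = outer()
names = [frame.name for frame in stack]

for frame in stack[-2:]:
    print(frame.filename, frame.lineno, frame.name)
```

Each entry names a function in the application code, which is exactly the information that a raw gdb backtrace of the interpreter does not show directly.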
gdb and Python
Search results for «gdb debug python» can be confusing. The thing is, that starting from gdb version 7 it’s been possible to extend the debugger with Python code, e.g. in order to provide visualisations for C++ STL types, which is much easier to implement in Python rather than in the built-in macro language.
In order to be able to debug CPython processes and introspect the application level state, the interpreter developers decided to extend gdb and wrote a script for that in... Python, of course!
So it’s two different, but related things:
- gdb versions 7+ are extendable with Python modules
- there’s a Python gdb extension for debugging of CPython processes
Debugging Python with gdb 101
First of all, you need to install gdb :
depending on the Linux distro you are using.
The next step is to install debugging symbols for the CPython build you have:
Some Linux distros like CentOS or RHEL ship debugging symbols separately from all other packages and recommend installing them like:
The installed debugging symbols will be used by the CPython script for gdb in order to analyze the PyEval_EvalFrameEx frames (a frame essentially is a function call and the associated state in a form of local variables and CPU registers, etc) and map those to application level functions in your code.
Without debugging symbols it’s much harder to do — gdb allows you to manipulate the process memory in any way you want, but you can’t easily understand what data structures reside in what memory areas.
After all preparatory steps have been completed, you can give gdb a try. E.g. in order to attach to a running CPython process, do:
At this point you can get an application level backtrace for the current thread (note that some frames are «missing» — this is expected, as gdb counts all the interpreter level frames and only some of those are calls in application level code — PyEval_EvalFrameEx ones):
or find out what exact line of the application code is currently being executed:
or look at values of local variables:
There are more py- commands provided by the CPython script for gdb . Check out the debugging guide for details.
Gotchas
Although the described technique should work out-of-box, there are a few known gotchas.
python-dbg
The python-dbg package in Debian and Ubuntu will not only install the debugging symbols for python (which are stripped at the package build time to save disk space), but also provide an additional CPython binary python-dbg .
The latter essentially is a separate build of CPython (with the --with-pydebug flag passed to ./configure) with many run-time checks. Generally, you don’t want to use python-dbg in production, as it can be (much) slower than python, e.g.:
The good thing is, that you don’t need to: it’s still possible to debug python executable by the means of gdb , as long as the corresponding debugging symbols are installed. So python-dbg just adds a bit more confusion to the CPython/gdb story — you can safely ignore its existence.
Build flags
Some Linux distros build CPython passing the -g0 or -g1 option to gcc : the former produces a binary without debugging information at all, and the latter does not allow gdb to get information about local variables at runtime.
Both these options break the described workflow of debugging CPython processes by the means of gdb . The solution is to rebuild CPython with -g or -g2 ( 2 is the default value when -g is passed).
Fortunately, all current versions of the major Linux distros (Ubuntu Trusty/Xenial, Debian Jessie, CentOS/RHEL 7) ship the «correctly» built CPython.
Optimized out frames
For introspection to work properly, it’s crucial, that information about PyEval_EvalFrameEx arguments is preserved for each call. Depending on the optimization level used in gcc when building CPython or the concrete compiler version used, it’s possible that this information will be lost at runtime (especially with aggressive optimizations enabled by -O3 ). In this case gdb will show you something like:
i.e. some application level frames will be available, some will not. There is little you can do at this point, except for rebuilding CPython with a lower optimization level, but that often is not an option for production (not to mention the fact you’ll be using a custom CPython build, not the one provided by your Linux distro).
Update: actually, there is something you could do. This «frame information optimized out» message essentially tells you that gdb wasn’t able to figure out the location of PyFrameObject data structure in a given stack frame (DWARF debugging symbols allow gdb to calculate addresses of local variables and function arguments). But it has to be somewhere; otherwise CPython would not be able to execute your Python code.
On x86-64 machines the obvious place to check is CPU registers: there are 16 general purpose CPU registers, that compilers can use for storing the values of function call arguments and local variables.
The following command prints the values of all CPU registers in the selected stack frame:
But these are just numbers. We need to help gdb put some meaning behind them.
Note, that some of the numbers above clearly look like memory addresses. We can ask gdb to interpret the value of a CPU register as a pointer to some data type. We know, that most of CPython runtime data structures are PyObject’s, that store information on the actual type internally (e.g. ->ob_type->tp_name field contains a type name encoded as a C-string).
So what we’ll do is try to cast the value of each CPU register to PyObject* and see if we can find anything useful:
If we give gdb a memory address, that does not actually point to a PyObject instance, we’ll get an error on pointer dereference.
There are only so many CPU registers to check. And you can easily automate this search by the means of a helper gdb command similar to:
E.g., my CPython build puts the pointer to PyFrameObject in the CPU register RBX:
Note that the loaded libpython-gdb.py script provides pretty-printing for the PyFrameObject data structure, and it is also able to figure out the specific type of a given PyObject automatically. So even if high-level commands like py-bt don’t work on such stack frames, you’ll be able to get the very same information by pointing gdb to the location of PyFrameObject manually.
Of course, manually poking CPU registers and memory addresses is not pretty, but it can be the only way of debugging «optimized out» frames.
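The register search described above could be sketched roughly as follows. This is not the author's original helper: the command name py-find-objects and the function names are hypothetical, and the gdb-specific part only runs when the file is loaded inside gdb (for example with the source command), where the gdb module is available. The search logic is kept in a plain function so it can be exercised without gdb.

```python
# The 16 general purpose registers on x86-64.
X86_64_REGISTERS = (
    ["rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp"]
    + ["r%d" % i for i in range(8, 16)]
)


def find_typed_registers(read_type_name, registers=X86_64_REGISTERS):
    """Return {register: type name} for registers that look like PyObject*.

    read_type_name(register) must return the type name, or raise if the
    register value cannot be dereferenced as a PyObject pointer.
    """
    found = {}
    for reg in registers:
        try:
            found[reg] = read_type_name(reg)
        except Exception:
            continue  # not a valid PyObject pointer, try the next register
    return found


try:
    import gdb  # only importable inside a gdb session
except ImportError:
    gdb = None

if gdb is not None:

    def _type_name_via_gdb(reg):
        # Cast the register to PyObject* and read ->ob_type->tp_name.
        expr = "((PyObject *) $%s)->ob_type->tp_name" % reg
        return gdb.parse_and_eval(expr).string()

    class PyFindObjects(gdb.Command):
        """Hypothetical command: list registers that point to PyObjects."""

        def __init__(self):
            super().__init__("py-find-objects", gdb.COMMAND_DATA)

        def invoke(self, argument, from_tty):
            for reg, name in find_typed_registers(_type_name_via_gdb).items():
                gdb.write("%s: %s\n" % (reg, name))

    PyFindObjects()
```

A register reported with the type name frame would then be a candidate for the PyFrameObject pointer. Garbage values can occasionally dereference without an error, so the output is a list of candidates to inspect rather than a definitive answer.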
Virtual environments and custom CPython builds
When a virtual environment is used, it may appear that the extension does not work:
gdb can still follow the CPython frames, but information on PyEval_EvalCodeEx calls is not available.
If you scroll up the gdb output a bit, you’ll see that gdb failed to find the debugging symbols for python executable:
How is a virtual environment any different? Why did gdb not find the debugging symbols?
First and foremost, the path to python executable is different. Note, that I did not specify the executable file, when attaching to the process. In this case gdb will take the executable file of the process (i.e. /proc/$PID/exe value on Linux).
One of the ways to separate debugging symbols is to put those into a well-known directory (default is /usr/lib/debug/ , although it’s configurable via debug-file-directory option in gdb ). In our case gdb tried to load debugging symbols from /usr/lib/debug/home/rpodolyaka/workspace/venvs/default/bin/python2 and, obviously, did not find anything there.
The solution is simple — specify the executable under debug explicitly when running gdb :
Thus, gdb will look for debugging symbols in the «right» place — /usr/lib/debug/usr/bin/python2.7 .
It’s also worth mentioning, that it’s possible that debugging symbols for a particular executable are identified by a unique build-id value stored in ELF executable headers. E.g. CPython on my Debian machine:
In this case gdb will look for debugging symbols using the build-id value:
This has a nice implication — it no longer matters how the executable is called: virtualenv just creates a copy of the specified interpreter executable, thus, both executables — the one in /usr/bin/ and the one in your virtual environment will use the very same debugging symbols:
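The two lookup schemes can be summarised in a few lines of Python. This is only a sketch of the path arithmetic described above, not of everything gdb does when searching for symbols. The function names are made up, and the build-id in the usage lines is an invented short hex string rather than a real one (real build-ids are much longer).

```python
import posixpath


def path_based_debug_file(executable, debug_dir="/usr/lib/debug"):
    """Location derived from the absolute path of the executable."""
    return debug_dir + executable


def build_id_debug_file(build_id, debug_dir="/usr/lib/debug"):
    """Location derived from the ELF build-id, given as a hex string.

    The first two hex digits become a directory name, the rest the file name.
    """
    return posixpath.join(
        debug_dir, ".build-id", build_id[:2], build_id[2:] + ".debug"
    )


# The path-based location changes with the location of the executable...
print(path_based_debug_file("/usr/bin/python2.7"))
print(path_based_debug_file("/home/rpodolyaka/workspace/venvs/default/bin/python2"))

# ...while the build-id location is the same for every copy of the binary.
print(build_id_debug_file("0123456789abcdef"))  # made-up build-id
```

This is why a copied interpreter in a virtual environment breaks the path-based lookup but not the build-id one: copying the file changes its path, not its contents.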
The first problem is solved, bt output now looks much nicer, but py-bt command is still undefined:
Once again, this is caused by the fact that python binary in a virtual environment has a different path. By default, gdb will try to auto-load Python extensions for a particular object file under debug, if they exist. Specifically, gdb will look for objfile-gdb.py and try to source it on start:
If, for some reason this has not been done, you can always do it manually:
e.g. if you want to test a new version of the gdb extension shipped with CPython.
PyPy, Jython, etc
The described debugging technique is only feasible for the CPython interpreter as is, as the gdb extension is specifically written to introspect the state of CPython internals (e.g. PyEval_EvalFrameEx calls).
For PyPy there is an open issue on Bitbucket, where it was proposed to provide integration with gdb , but looks like the attached patches have not been merged yet and the person, who wrote those, lost interest in this.
For Jython you could probably use standard tools for debugging of JVM applications, e.g. VisualVM.
Conclusion
gdb is a powerful tool that allows one to debug complex problems with crashing or hanging CPython processes, as well as Python code that calls into native libraries. On modern Linux distros, debugging CPython processes with gdb should be as simple as installing debugging symbols for the concrete interpreter build, although there are a few known gotchas, especially when virtual environments are used.
Python Debugging With Pdb
Debugging applications can sometimes be an unwelcome activity. You’re busy working under a time crunch and you just want it to work. However, at other times, you might be learning a new language feature or experimenting with a new approach and want to understand more deeply how something is working.
Regardless of the situation, debugging code is a necessity, so it’s a good idea to be comfortable working in a debugger. In this tutorial, I’ll show you the basics of using pdb, Python’s interactive source code debugger.
I’ll walk you through a few common uses of pdb. You may want to bookmark this tutorial for quick reference later when you might really need it. pdb, and other debuggers, are indispensable tools. When you need a debugger, there’s no substitute. You really need it.
By the end of this tutorial, you’ll know how to use the debugger to see the state of any variable in your application. You’ll also be able to stop and resume your application’s flow of execution at any moment, so you can see exactly how each line of code affects its internal state.
This is great for tracking down hard-to-find bugs and allows you to fix faulty code more quickly and reliably. Sometimes, stepping through code in pdb and seeing how values change can be a real eye-opener and lead to “aha” moments, along with the occasional “face palm”.
pdb is part of Python’s standard library, so it’s always there and available for use. This can be a life saver if you need to debug code in an environment where you don’t have access to the GUI debugger you’re familiar with.
The example code in this tutorial uses Python 3.6. You can find the source code for these examples on GitHub.
At the end of this tutorial, there is a quick reference for Essential pdb Commands.
Getting Started: Printing a Variable’s Value
In this first example, we’ll look at using pdb in its simplest form: checking the value of a variable.
Insert the following code at the location where you want to break into the debugger:
When the line above is executed, Python stops and waits for you to tell it what to do next. You’ll see a (Pdb) prompt. This means that you’re now paused in the interactive debugger and can enter a command.
Starting in Python 3.7, there’s another way to enter the debugger. PEP 553 describes the built-in function breakpoint() , which makes entering the debugger easy and consistent:
By default, breakpoint() will import pdb and call pdb.set_trace() , as shown above. However, using breakpoint() is more flexible and allows you to control debugging behavior via its API and use of the environment variable PYTHONBREAKPOINT . For example, setting PYTHONBREAKPOINT=0 in your environment will completely disable breakpoint() , thus disabling debugging. If you’re using Python 3.7 or later, I encourage you to use breakpoint() instead of pdb.set_trace() .
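As a small illustration of that flexibility, the sketch below calls breakpoint() without ever entering pdb: first by setting PYTHONBREAKPOINT=0, then by swapping in a custom hook through sys.breakpointhook. The function total and the recording hook are invented for this example.

```python
import os
import sys


def total(values):
    result = sum(values)
    breakpoint()  # would normally drop into pdb at this point
    return result


# 1. Disable breakpoint() entirely. The default hook checks the
#    environment variable every time it is called.
os.environ["PYTHONBREAKPOINT"] = "0"
disabled_result = total([1, 2, 3])

# 2. Redirect breakpoint() to a custom hook instead of pdb.
hits = []


def recording_hook(*args, **kwargs):
    hits.append("breakpoint reached")


sys.breakpointhook = recording_hook
hooked_result = total([4, 5])
sys.breakpointhook = sys.__breakpointhook__  # restore the default hook

print(disabled_result, hooked_result, hits)
```

Both calls run to completion without a (Pdb) prompt. Because the same breakpoint() line can be silenced or redirected from outside the code, it is easier to leave in place during development than a hard-coded pdb.set_trace().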
You can also break into the debugger, without modifying the source and using pdb.set_trace() or breakpoint() , by running Python directly from the command-line and passing the option -m pdb . If your application accepts command-line arguments, pass them as you normally would after the filename. For example:
There are a lot of pdb commands available. At the end of this tutorial, there is a list of Essential pdb Commands. For now, let’s use the p command to print a variable’s value. Enter p variable_name at the (Pdb) prompt to print its value.
Let’s look at the example. Here’s the example1.py source:
If you run this from your shell, you should get the following output:
If you’re having trouble getting the examples or your own code to run from the command line, read How Do I Make My Own Command-Line Commands Using Python? If you’re on Windows, check the Python Windows FAQ.
Now enter p filename . You should see:
Since you’re in a shell and using a CLI (command-line interface), pay attention to the characters and formatting. They’ll give you the context you need:
- > starts the 1st line and tells you which source file you’re in. After the filename, there is the current line number in parentheses. Next is the name of the function. In this example, since we’re not paused inside a function and at module level, we see <module>() .
- -> starts the 2nd line and is the current source line where Python is paused. This line hasn’t been executed yet. In this example, this is line 5 in example1.py , from the > line above.
- (Pdb) is pdb’s prompt. It’s waiting for a command.
Use the command q to quit debugging and exit.
Printing Expressions
When using the print command p , you’re passing an expression to be evaluated by Python. If you pass a variable name, pdb prints its current value. However, you can do much more to investigate the state of your running application.
In this example, the function get_path() is called. To inspect what’s happening in this function, I’ve inserted a call to pdb.set_trace() to pause execution just before it returns:
If you run this from your shell, you should get the output:
- > : We’re in the source file example2.py on line 10 in the function get_path() . This is the frame of reference the p command will use to resolve variable names, i.e. the current scope or context.
- -> : Execution has paused at return head . This line hasn’t been executed yet. This is line 10 in example2.py in the function get_path() , from the > line above.
Let’s print some expressions to look at the current state of the application. I use the command ll (longlist) initially to list the function’s source:
You can pass any valid Python expression to p for evaluation.
This is especially helpful when you are debugging and want to test an alternative implementation directly in the application at runtime.
You can also use the command pp (pretty-print) to pretty-print expressions. This is helpful if you want to print a variable or expression with a large amount of output, e.g. lists and dictionaries. Pretty-printing keeps objects on a single line if it can or breaks them onto multiple lines if they don’t fit within the allowed width.
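The pp command relies on the standard pprint module, so its wrapping behaviour can be previewed outside the debugger. The two dictionaries below are made-up sample data.

```python
import pprint

short = {"a": 1, "b": 2}
long = {
    "path_%d" % i: "/usr/local/lib/python3/site-packages/pkg_%d" % i
    for i in range(5)
}

short_text = pprint.pformat(short)  # fits within the default width of 80
long_text = pprint.pformat(long)    # too wide, so it is split across lines

print(short_text)
print(long_text)
```

The short dictionary stays on one line, while the long one is broken up so that each line fits within the allowed width.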
Stepping Through Code
There are two commands you can use to step through code when debugging:
| Command | Description |
|---|---|
| n (next) | Continue execution until the next line in the current function is reached or it returns. |
| s (step) | Execute the current line and stop at the first possible occasion (either in a function that is called or in the current function). |
There’s a 3rd command named unt (until). It is related to n (next). We’ll look at it later in this tutorial in the section Continuing Execution.
The difference between n (next) and s (step) is where pdb stops.
Use n (next) to continue execution until the next line and stay within the current function, i.e. not stop in a foreign function if one is called. Think of next as “staying local” or “step over”.
Use s (step) to execute the current line and stop in a foreign function if one is called. Think of step as “step into”. If execution is stopped in another function, s will print --Call--.
Both n and s will stop execution when the end of the current function is reached and print --Return-- along with the return value at the end of the next line after ->.
Let’s look at an example using both commands. Here’s the example3.py source:
If you run this from your shell and enter n , you should get the output:
With n (next), we stopped on line 15 , the next line. We “stayed local” in <module>() and “stepped over” the call to get_path() . The function is <module>() since we’re currently at module level and not paused inside another function.
With s (step), we stopped on line 6 in the function get_path() since it was called on line 14. Notice the line --Call-- after the s command.
Conveniently, pdb remembers your last command. If you’re stepping through a lot of code, you can just press Enter to repeat the last command.
Below is an example of using both s and n to step through the code. I enter s initially because I want to “step into” the function get_path() and stop. Then I enter n once to “stay local” or “step over” any other function calls and just press Enter to repeat the n command until I get to the last source line.
Note the lines --Call-- and --Return--. This is pdb letting you know why execution was stopped. n (next) and s (step) will stop before a function returns. That’s why you see the --Return-- lines above.
Also note ->'.' at the end of the line after the first --Return-- above:
When pdb stops at the end of a function before it returns, it also prints the return value for you. In this example it’s '.'.
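The difference between s and n can also be observed without typing at a prompt, by feeding pdb a scripted list of commands. The sketch below is not one of the tutorial's example files: the functions, the path, and the run_scripted helper are made up, and the session output is captured in a string instead of being shown on the terminal.

```python
import io
import os
import pdb


def get_path(filename):
    head, tail = os.path.split(filename)
    return head


def main():
    path = get_path("/tmp/example3.py")  # hypothetical path
    return path


def run_scripted(commands):
    """Run main() under pdb, feeding it commands instead of keyboard input."""
    output = io.StringIO()
    debugger = pdb.Pdb(
        stdin=io.StringIO(commands),
        stdout=output,
        nosigint=True,   # leave the SIGINT handler alone
        readrc=False,    # ignore any .pdbrc file
    )
    debugger.use_rawinput = False  # read commands from stdin, not input()
    debugger.runcall(main)         # pauses on the first line of main()
    return output.getvalue()


step_output = run_scripted("s\nc\n")  # step into get_path(), then continue
next_output = run_scripted("n\nc\n")  # step over get_path(), then continue

print(step_output)
print(next_output)
```

In the first session pdb announces that it has entered get_path(), while in the second it stays inside main() and the call is stepped over.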
Listing Source Code
Don’t forget the command ll (longlist: list the whole source code for the current function or frame). It’s really helpful when you’re stepping through unfamiliar code or you just want to see the entire function for context.
Here’s an example:
To see a shorter snippet of code, use the command l (list). Without arguments, it will print 11 lines around the current line or continue the previous listing. Pass the argument . to always list 11 lines around the current line: l .
Using Breakpoints
Breakpoints are very convenient and can save you a lot of time. Instead of stepping through dozens of lines you’re not interested in, simply create a breakpoint where you want to investigate. Optionally, you can also tell pdb to break only when a certain condition is true.
Use the command b (break) to set a breakpoint. You can specify a line number or a function name where execution is stopped.
The syntax for break is: b(reak) [ ([filename:]lineno | function) [, condition] ]
If filename: is not specified before the line number lineno , then the current source file is used.
Note the optional 2nd argument to b : condition . This is very powerful. Imagine a situation where you wanted to break only if a certain condition existed. If you pass a Python expression as the 2nd argument, pdb will break when the expression evaluates to true. We’ll do this in an example below.
In this example, there’s a utility module util.py . Let’s set a breakpoint to stop execution in the function get_path() .
Here’s the source for the main script example4.py :
Here’s the source for the utility module util.py :
First, let’s set a breakpoint using the source filename and line number:
The command c (continue) continues execution until a breakpoint is found.
Next, let’s set a breakpoint using the function name:
Enter b with no arguments to see a list of all breakpoints:
You can disable and re-enable breakpoints using the command disable bpnumber and enable bpnumber . bpnumber is the breakpoint number from the breakpoints list’s 1st column Num . Notice the Enb column’s value change:
To delete a breakpoint, use the command cl (clear):
Now let’s use a Python expression to set a breakpoint. Imagine a situation where you wanted to break only if your troubled function received a certain input.
In this example scenario, the get_path() function is failing when it receives a relative path, i.e. the file’s path doesn’t start with / . I’ll create an expression that evaluates to true in this case and pass it to b as the 2nd argument:
After you create the breakpoint above and enter c to continue execution, pdb stops when the expression evaluates to true. The command a (args) prints the argument list of the current function.
In the example above, when you’re setting the breakpoint with a function name rather than a line number, note that the expression should use only function arguments or global variables that are available at the time the function is entered. Otherwise, the breakpoint will stop execution in the function regardless of the expression’s value.
If you need to break using an expression with a variable name located inside a function, i.e. a variable name not in the function’s argument list, specify the line number:
You can also set a temporary breakpoint using the command tbreak . It’s removed automatically when it’s first hit. It uses the same arguments as b .
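The rule about which names a condition may use is easier to see with a simplified model of how a conditional breakpoint is decided. This is an illustration, not pdb's actual code; the function should_stop and the sample values are made up. The condition is evaluated against the variables of the frame where the breakpoint is hit, and if it cannot be evaluated, the debugger stops anyway.

```python
def should_stop(condition, frame_globals, frame_locals):
    """Simplified model of a conditional breakpoint decision."""
    try:
        return bool(eval(condition, frame_globals, frame_locals))
    except Exception:
        # The condition could not be evaluated (for example, it names a
        # variable that does not exist yet), so stop to be safe.
        return True


# At function entry only the arguments exist as local variables.
locals_at_entry = {"filename": "relative/file.py"}

print(should_stop("not filename.startswith('/')", {}, locals_at_entry))
print(should_stop("not filename.startswith('/')", {}, {"filename": "/abs/file.py"}))
print(should_stop("not head.startswith('/')", {}, locals_at_entry))
```

The third call refers to head, which has not been assigned yet when the function is entered, so the model reports a stop regardless of the value. Setting the breakpoint on a line number after the assignment avoids that.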
Continuing Execution
So far, we’ve looked at stepping through code with n (next) and s (step) and using breakpoints with b (break) and c (continue).
There’s also a related command: unt (until).
Use unt to continue execution like c , but stop at the next line greater than the current line. Sometimes unt is more convenient and quicker to use and is exactly what you want. I’ll demonstrate this with an example below.
Let’s first look at the syntax and description for unt :
| Command | Syntax | Description |
|---|---|---|
| unt | unt(il) [lineno] | Without lineno , continue execution until the line with a number greater than the current one is reached. With lineno , continue execution until a line with a number greater or equal to that is reached. In both cases, also stop when the current frame returns. |
Depending on whether or not you pass the line number argument lineno , unt can behave in two ways:
- Without lineno , continue execution until the line with a number greater than the current one is reached. This is similar to n (next). It’s an alternate way to execute and “step over” code. The difference between n and unt is that unt stops only when a line with a number greater than the current one is reached. n will stop at the next logically executed line.
- With lineno , continue execution until a line with a number greater or equal to that is reached. This is like c (continue) with a line number argument.
In both cases, unt stops when the current frame (function) returns, just like n (next) and s (step).
The primary behavior to note with unt is that it stops only when execution reaches a line number greater than the current line or, when lineno is given, greater than or equal to the specified line.
Use unt when you want to continue execution and stop farther down in the current source file. You can treat it like a hybrid of n (next) and b (break), depending on whether you pass a line number argument or not.
In the example below, there is a function with a loop. Here, you want to continue execution of the code and stop after the loop, without stepping through each iteration of the loop or setting a breakpoint:
Here’s the example source for example4unt.py :
And the console output using unt :
The ll command was used first to print the function’s source, followed by unt . pdb remembers the last command entered, so I just pressed Enter to repeat the unt command. This continued execution through the code until a source line greater than the current line was reached.
Note in the console output above that pdb stopped only once on lines 10 and 11 . Since unt was used, execution was stopped only in the 1st iteration of the loop. However, each iteration of the loop was executed. This can be verified in the last line of output. The char variable’s value ‘y’ is equal to the last character in tail ’s value ‘example4unt.py’ .
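The unt behavior described above can be reproduced without typing at a prompt by feeding pdb a scripted list of commands. The block below is a minimal sketch, not the tutorial's own example file: count_chars() is a hypothetical stand-in for a function with a loop, and the scripted stdin/stdout wiring is an assumption made so the session runs unattended.

```python
import io
import os
import pdb


def count_chars(fname):
    head, tail = os.path.split(fname)
    count = 0
    for char in tail:
        count += 1  # stand-in for a per-character check
    return head, count


# Scripted pdb input: four `unt` commands, two prints, then continue.
commands = [
    "unt",                 # -> count = 0
    "unt",                 # -> for char in tail:
    "unt",                 # -> count += 1 (first iteration only)
    "unt",                 # runs every remaining iteration, stops on `return`
    "p ('char', char)",
    "p ('count', count)",
    "c",
]
session_out = io.StringIO()
debugger = pdb.Pdb(
    stdin=io.StringIO("\n".join(commands) + "\n"),
    stdout=session_out,
    readrc=False,
    nosigint=True,
)

# runcall() pauses on the first line of count_chars(), then reads the commands.
result = debugger.runcall(count_chars, "demo/example4unt.py")
print(session_out.getvalue())
```

The fourth unt is the interesting one: pdb stops on the loop body only once, yet the two prints show ('char', 'y') and ('count', 14), so every iteration ran before execution paused on the return line.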
Displaying Expressions
Similar to printing expressions with p and pp , you can use the command display [expression] to tell pdb to automatically display the value of an expression, if it changed, when execution stops. Use the command undisplay [expression] to clear a display expression.
Here’s the syntax and description for both commands:
| Command | Syntax | Description |
|---|---|---|
| display | display [expression] | Display the value of expression if it changed, each time execution stops in the current frame. Without expression , list all display expressions for the current frame. |
| undisplay | undisplay [expression] | Do not display expression any more in the current frame. Without expression , clear all display expressions for the current frame. |
Below is an example, example4display.py , demonstrating its use with a loop:
In the output above, pdb automatically displayed the value of the char variable because each time the breakpoint was hit its value had changed. Sometimes this is helpful and exactly what you want, but there’s another way to use display .
You can enter display multiple times to build a watch list of expressions. This can be easier to use than p . After adding all of the expressions you’re interested in, simply enter display to see the current values:
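As a rough illustration of how display reports changes, here is a small scripted session. It is a sketch under assumptions: scan() is a hypothetical function, and stepping with n is used in place of a breakpoint so the example does not depend on source line numbers.

```python
import io
import pdb


def scan(text):
    for char in text:
        seen = char  # stand-in for real work on each character
    return seen


commands = [
    "n",               # step to the loop body so `char` exists
    "display char",    # start watching `char`
    "n", "n",          # back to the `for` line, then into iteration 2
    "n", "n",          # back to the `for` line, then into iteration 3
    "display",         # list the current watch expressions
    "undisplay char",  # stop watching `char`
    "c",
]
session_out = io.StringIO()
debugger = pdb.Pdb(
    stdin=io.StringIO("\n".join(commands) + "\n"),
    stdout=session_out,
    readrc=False,
    nosigint=True,
)
result = debugger.runcall(scan, "abc")
print(session_out.getvalue())
```

pdb prints a display line only at the stops where the value of char differs from the last value it showed, so the stops on the for line (where char has not yet been reassigned) stay quiet.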
Python Caller ID
In this last section, we’ll build upon what we’ve learned so far and finish with a nice payoff. I use the name “caller ID” in reference to the phone system’s caller identification feature. That is exactly what this example demonstrates, except it’s applied to Python.
Here’s the source for the main script example5.py :
Here’s the utility module fileutil.py :
In this scenario, imagine there’s a large code base with a function in a utility module, get_path() , that’s being called with invalid input. However, it’s being called from many places in different packages.
How do you find who the caller is?
Use the command w (where) to print a stack trace, with the most recent frame at the bottom:
Don’t worry if this looks confusing or if you’re not sure what a stack trace or frame is. I’ll explain those terms below. It’s not as difficult as it might sound.
Since the most recent frame is at the bottom, start there and read from the bottom up. Look at the lines that start with -> , but skip the 1st instance since that’s where pdb.set_trace() was used to enter pdb in the function get_path() . In this example, the source line that called the function get_path() is:
The line above each -> contains the filename, line number (in parentheses), and function name the source line is in. So the caller is:
That’s no surprise in this small example for demonstration purposes, but imagine a large application where you’ve set a breakpoint with a condition to identify where a bad input value is originating.
Now we know how to find the caller.
But what about this stack trace and frame stuff?
A stack trace is just a list of all the frames that Python has created to keep track of function calls. A frame is a data structure Python creates when a function is called and deletes when it returns. The stack is simply an ordered list of frames or function calls at any point in time. The (function call) stack grows and shrinks throughout the life of an application as functions are called and then return.
When printed, this ordered list of frames, the stack, is called a stack trace. You can see it at any time by entering the command w , as we did above to find the caller.
To understand better and get more out of pdb, let’s look more closely at the help for w :
What does pdb mean by “current frame”?
Think of the current frame as the current function where pdb has stopped execution. In other words, the current frame is where your application is currently paused and is used as the “frame” of reference for pdb commands like p (print).
p and other commands will use the current frame for context when needed. In the case of p , the current frame will be used for looking up and printing variable references.
When pdb prints a stack trace, an arrow > indicates the current frame.
How is this useful?
You can use the two commands u (up) and d (down) to change the current frame. Combined with p , this allows you to inspect variables and state in your application at any point along the call stack in any frame.
Here’s the syntax and description for both commands:
| Command | Syntax | Description |
|---|---|---|
| u | u(p) [count] | Move the current frame count (default one) levels up in the stack trace (to an older frame). |
| d | d(own) [count] | Move the current frame count (default one) levels down in the stack trace (to a newer frame). |
Let’s look at an example using the u and d commands. In this scenario, we want to inspect the variable full_fname that’s local to the function get_file_info() in example5.py . In order to do this, we have to change the current frame up one level using the command u :
The call to pdb.set_trace() is in fileutil.py in the function get_path() , so the current frame is initially set there. You can see it in the 1st line of output above:
To access and print the local variable full_fname in the function get_file_info() in example5.py , the command u was used to move up one level:
Note in the output of u above that pdb printed the arrow > at the beginning of the 1st line. This is pdb letting you know the frame was changed and this source location is now the current frame. The variable full_fname is accessible now. Also, it’s important to realize the source line starting with -> on the 2nd line has been executed. Since the frame was moved up the stack, fileutil.get_path() has been called. Using u , we moved up the stack (in a sense, back in time) to the function example5.get_file_info() where fileutil.get_path() was called.
Continuing with the example, after full_fname was printed, the current frame was moved to its original location using d , and the local variable fname in get_path() was printed.
If we wanted to, we could have moved multiple frames at once by passing the count argument to u or d . For example, we could have moved to module level in example5.py by entering u 2 :
It’s easy to forget where you are when you’re debugging and thinking of many different things. Just remember you can always use the aptly named command w (where) to see where execution is paused and what the current frame is.
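The frame navigation described in this section can be sketched as a scripted session. The two functions below are simplified, hypothetical stand-ins for get_file_info() and get_path() defined in a single file, and s (step) is used to enter get_path() instead of a pdb.set_trace() call.

```python
import io
import pdb


def get_path(fname):
    return fname.rsplit("/", 1)[0]


def get_file_info(name):
    full_fname = "/data/" + name
    return get_path(full_fname)


commands = [
    "n",        # run the assignment to full_fname
    "s",        # step into get_path(); pdb reports --Call--
    "w",        # print the stack; the newest frame is at the bottom
    "u",        # move up to the caller, get_file_info()
    "p name",   # a local that exists only in the caller's frame
    "d",        # move back down to get_path()
    "p fname",  # the callee's argument
    "c",
]
session_out = io.StringIO()
debugger = pdb.Pdb(
    stdin=io.StringIO("\n".join(commands) + "\n"),
    stdout=session_out,
    readrc=False,
    nosigint=True,
)
result = debugger.runcall(get_file_info, "report.txt")
print(session_out.getvalue())
```

Because p evaluates names in the current frame, name resolves only after u and fname only after d, which is the same mechanism used above to inspect full_fname in the caller.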
Essential pdb Commands
Once you’ve spent a little time with pdb, you’ll realize a little knowledge goes a long way. Help is always available with the h command.
Just enter h or help <topic> to get a list of all commands or help for a specific command or topic.
For quick reference, here’s a list of essential commands:
| Command | Description |
|---|---|
| p | Print the value of an expression. |
| pp | Pretty-print the value of an expression. |
| n | Continue execution until the next line in the current function is reached or it returns. |
| s | Execute the current line and stop at the first possible occasion (either in a function that is called or in the current function). |
| c | Continue execution and only stop when a breakpoint is encountered. |
| unt | Continue execution until the line with a number greater than the current one is reached. With a line number argument, continue execution until a line with a number greater or equal to that is reached. |
| l | List source code for the current file. Without arguments, list 11 lines around the current line or continue the previous listing. |
| ll | List the whole source code for the current function or frame. |
| b | With no arguments, list all breaks. With a line number argument, set a breakpoint at this line in the current file. |
| w | Print a stack trace, with the most recent frame at the bottom. An arrow indicates the current frame, which determines the context of most commands. |
| u | Move the current frame count (default one) levels up in the stack trace (to an older frame). |
| d | Move the current frame count (default one) levels down in the stack trace (to a newer frame). |
| h | See a list of available commands. |
| h <topic> | Show help for a command or topic. |
| h pdb | Show the full pdb documentation. |
| q | Quit the debugger and exit. |
Python Debugging With pdb: Conclusion
In this tutorial, we covered a few basic and common uses of pdb:
- printing expressions
- stepping through code with n (next) and s (step)
- using breakpoints
- continuing execution with unt (until)
- displaying expressions
- finding the caller of a function
I hope it’s been helpful to you. If you’re curious about learning more, see:
- pdb’s full documentation at a pdb prompt near you: (Pdb) h pdb
The source code used in the examples can be found on the associated GitHub repository.
Also, if you’d like to try a GUI-based Python debugger, read our Python IDEs and Editors Guide to see what options will work best for you. Happy Pythoning!
About Nathan Jennings
Nathan is a member of the Real Python tutorial team who started his programmer career with C a long time ago, but eventually found Python. From web applications and data collection to networking and network security, he enjoys all things Pythonic.
How To Use the Python Debugger
In software development, debugging is the process of looking for and then resolving issues that prevent the software from running correctly.
The Python debugger provides a debugging environment for Python programs. It supports setting conditional breakpoints, stepping through the source code one line at a time, stack inspection, and more.
Prerequisites
You should have Python 3 installed and a programming environment set up on your computer or server. If you don’t have a programming environment set up, you can refer to the installation and setup guides for a local programming environment or for a programming environment on your server appropriate for your operating system (Ubuntu, CentOS, Debian, etc.).
Working Interactively with the Python Debugger
The Python debugger comes as part of the standard Python distribution as a module called pdb . The debugger is also extensible, and is defined as the class Pdb . You can read the official documentation of pdb to learn more.
Info: To follow along with the example code in this tutorial, open a Python interactive shell on your local system by running the python3 command. Then you can copy, paste, or edit the examples by adding them after the >>> prompt.
We’ll begin by working with a short program that has two global variables, a function that creates a nested loop, and the if __name__ == '__main__': construction that will call the nested_loop() function.
We can now run this program through the Python debugger by using the following command:
The -m command-line flag will import any Python module for you and run it as a script. In this case we are importing and running the pdb module, which we pass into the command as shown above.
Upon running this command, you’ll receive the following output:
In the output, the first line contains the current module name (as indicated with <module> ) with a directory path, and the printed line number that follows (in this case it’s 1 , but if there is a comment or other non-executable line it could be a higher number). The second line shows the line of source code that pdb will execute next. From here, pdb provides an interactive console for debugging. You can use the command help to list its commands, and help followed by a command name (for example, help list ) to learn more about a specific command. Note that the pdb console is different from the Python interactive shell.
The Python debugger will automatically start over when it reaches the end of your program. Whenever you want to leave the pdb console, type the command quit or exit . If you would like to explicitly restart a program at any place within the program, you can do so with the command run .
Using the Debugger to Move through a Program
When working with programs in the Python debugger, you’re likely to use the list , step , and next commands to move through your code. We’ll go over these commands in this section.
Within the shell, we can type the command list in order to get context around the current line. From the first line of the program looping.py that we displayed above — num_list = [500, 600, 700] — that will look like the following:
The current line is indicated with the characters -> , which in our case is the first line of the program file.
Since this is a relatively short program, we receive nearly all of the program back with the list command. Without providing arguments, the list command provides 11 lines around the current line, but you can also specify which lines to include, like so:
Here, we requested that the lines 3-7 be displayed by using the command list 3, 7 .
To move through the program line by line, we can use step or next :
The difference between step and next is that step will stop within a called function, while next executes called functions to only stop at the next line of the current function. We can see this difference when we work with the function.
The step command will iterate through the loops once it gets to the running of the function, showing exactly what the loop is doing, as it will first print a number with print(number) then go through to print the letters with print(letter) , return to the number, etc:
The next command, instead, will execute the entire function without showing the step-by-step process. Let’s quit the current session with the exit command and then begin the debugger again:
Now we can work with the next command:
While going through your code, you may want to examine the value passed to a variable, which you can do with the pp command, which will pretty-print the value of the expression using the pprint module:
Most commands in pdb have shorter aliases. For step that short form is s , and for next it is n . The help command will list available aliases. You can also call the last command you called by pressing the ENTER key at the prompt.
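To make the difference between step and next concrete, the sketch below runs the same function twice under a scripted pdb session. The functions double() and compute() and the helper run_session() are hypothetical names chosen for illustration, not part of the looping.py program.

```python
import io
import pdb


def double(x):
    return x * 2


def compute(x):
    y = double(x)
    return y + 1


def run_session(commands):
    """Run compute(5) under pdb with scripted input and return pdb's output."""
    out = io.StringIO()
    debugger = pdb.Pdb(
        stdin=io.StringIO("\n".join(commands) + "\n"),
        stdout=out,
        readrc=False,
        nosigint=True,
    )
    debugger.runcall(compute, 5)
    return out.getvalue()


# `step` follows the call into double(); pdb announces it with --Call--.
step_output = run_session(["step", "continue"])

# `next` runs double() to completion and stops on the next line of compute().
next_output = run_session(["next", "pp ('y', y)", "continue"])

print(step_output)
print(next_output)
```

Only the step session contains the --Call-- marker. The next session never enters double() and can already pretty-print y on the following line.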
Breakpoints
You typically will be working with larger programs than the example above, so you’ll likely be wanting to look at particular functions or lines rather than going through an entire program. By using the break command to set breakpoints, you’ll run the program up until the specified breakpoint.
When you insert a breakpoint, the debugger assigns a number to it. The numbers assigned to breakpoints are successive integers that begin with the number 1, which you can refer to when working with breakpoints.
Breakpoints can be placed at certain line numbers by following the syntax of <program_file>:<line_number> as shown in the following:
Type clear and then y to remove all current breakpoints. You can then place a breakpoint where a function is defined:
Clear the breakpoints again by typing clear and then y . You can also attach a condition to a breakpoint:
Now, if we issue the continue command, the program will break when the value of number is greater than 500 (that is, when it is set equal to 600 in the second iteration of the outer loop):
To see a list of breakpoints that are currently set to run, use the command break without any arguments. You’ll receive information about the particularities of the breakpoint(s) you’ve set:
We can also disable a breakpoint with the command disable and the number of the breakpoint. In this session, we add another breakpoint and then disable the first one:
To enable a breakpoint, use the enable command, and to remove a breakpoint entirely, use the clear command:
Breakpoints in pdb provide you with a lot of control. Some additional functionalities include skipping the next few hits of a breakpoint with the ignore command (as in ignore 1 2 , which skips the next two crossings of breakpoint 1), triggering actions to occur at a breakpoint with the commands command (as in commands 1 ), and creating temporary breakpoints that are automatically cleared the first time program execution hits the point with the command tbreak (for a temporary break at line 3, for example, you could type tbreak 3 ).
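A conditional breakpoint session like the one described in this section can be replayed non-interactively by piping commands into python -m pdb. The sketch below is an assumption-laden reconstruction: the SCRIPT text is a guess at a program shaped like looping.py (two global lists and a nested_loop() function), written to a temporary file, and the line number 7 refers to that reconstruction only.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical program shaped like the looping.py described in the text.
SCRIPT = '''\
num_list = [500, 600, 700]
alpha_list = ['x', 'y', 'z']


def nested_loop():
    for number in num_list:
        print(number)
        for letter in alpha_list:
            print(letter)


if __name__ == '__main__':
    nested_loop()
'''

# Commands typed at the (Pdb) prompt, one per line.
PDB_INPUT = "\n".join([
    "break 7, number > 500",  # conditional breakpoint on print(number)
    "continue",               # runs the first pass, stops once number is 600
    "p ('number', number)",
    "break",                  # list breakpoints and their conditions
    "quit",
]) + "\n"

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "looping.py"
    script.write_text(SCRIPT)
    session = subprocess.run(
        [sys.executable, "-m", "pdb", str(script)],
        input=PDB_INPUT,
        capture_output=True,
        text=True,
        timeout=60,
    )

print(session.stdout)
```

The captured output interleaves the program's own prints (500, x, y, z) with pdb's messages, and the breakpoint listing includes the line "stop only if number > 500".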
Integrating pdb into Programs
You can trigger a debugging session by importing the pdb module and adding the pdb function pdb.set_trace() above the line where you would like the session to begin.
In our sample program above, we’ll add the import statement and the function where we would like to enter into the debugger. For our example, let’s add it before the nested loop.
By adding the debugger into your code you do not need to launch your program in a special way or remember to set breakpoints.
Importing the pdb module and running the pdb.set_trace() function lets you begin your program as usual and run the debugger through its execution.
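Here is a minimal sketch of entering the debugger from inside a program. In normal use you would write import pdb; pdb.set_trace() and type at the (Pdb) prompt. So that the example runs unattended, it instead calls set_trace() on a Pdb instance whose input is scripted, which is an assumption of this sketch. The function sum_numbers() is hypothetical.

```python
import io
import pdb

num_list = [500, 600, 700]

# Scripted commands stand in for what you would type at the (Pdb) prompt.
session_out = io.StringIO()
debugger = pdb.Pdb(
    stdin=io.StringIO("p ('total', total)\nc\n"),
    stdout=session_out,
    readrc=False,
    nosigint=True,
)


def sum_numbers():
    total = sum(num_list)
    debugger.set_trace()  # the program pauses around here and pdb takes over
    return total


result = sum_numbers()
print(session_out.getvalue())
```

The exact line where pdb reports the pause depends on the Python version (the line after the call in older releases, the calling line itself in newer ones), but in both cases the local variable total is already available for inspection.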
Modifying Program Execution Flow
The Python debugger lets you change the flow of your program at runtime with the jump command. This lets you skip forward to prevent some code from running, or can let you go backwards to run the code again.
We’ll be working with a small program that creates a list of the letters contained in the string sammy = "sammy" :
If we run the program as usual with the python letter_list.py command, we’ll receive the following output:
With the Python debugger, let’s show how we can change the execution by first jumping ahead after the first cycle. When we do this, we’ll notice that there is a disruption of the for loop:
The above debugging session puts a break at line 5 to prevent code from continuing, then continues through code (along with pretty-printing some values of letter to show what is happening). Next, we use the jump command to skip to line 6. At this point, the variable letter is set equal to the string ‘a’ , but we jump the code that adds that to the list sammy_list . We then disable the breakpoint to proceed with the execution as usual with the continue command, so ‘a’ is never appended to sammy_list .
Next, we can quit this first session and restart the debugger to jump back within the program to re-run a statement that has already been executed. This time, we’ll run the first iteration of the for loop again in the debugger:
In the debugging session above, we added a break at line 6, and then jumped back to line 5 after continuing. We pretty-printed along the way to show that the string ‘s’ was being appended to the list sammy_list twice. We then disabled the break at line 6 and continued running the program. The output shows two values of ‘s’ appended to sammy_list .
Some jumps are prevented by the debugger, especially jumps into or out of certain flow control statements. For example, you cannot jump into functions before arguments are defined, and you cannot jump into the middle of a try:except statement. You also cannot jump out of a finally block.
The jump statement with the Python debugger allows you to change the execution flow while debugging a program to see whether flow control can be modified to different purposes or to better understand what issues are arising in your code.
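The forward jump described above can be sketched as a scripted session. This is an illustration under assumptions rather than the letter_list.py program itself: collect_letters() is a hypothetical function that returns its list, and the jump target is computed from the function's first line number so the example does not depend on where it sits in a file.

```python
import io
import pdb


def collect_letters(word):
    letters = []
    for letter in word:
        letters.append(letter)
        last = letter  # jump target: lets us skip the append above
    return letters


# Line number of `last = letter`, four lines below the `def` line.
target = collect_letters.__code__.co_firstlineno + 4

commands = [
    "n", "n",          # reach the append for 's'
    "n", "n", "n",     # append 's', loop around, reach the append for 'a'
    "pp letter",       # confirm we are paused with letter == 'a'
    f"jump {target}",  # skip the append for 'a'
    "c",
]
session_out = io.StringIO()
debugger = pdb.Pdb(
    stdin=io.StringIO("\n".join(commands) + "\n"),
    stdout=session_out,
    readrc=False,
    nosigint=True,
)
result = debugger.runcall(collect_letters, "sammy")
print(result)
```

Because the append for the second letter is jumped over, the returned list is ['s', 'm', 'm', 'y'].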
Table of Common pdb Commands
Here is a table of useful pdb commands along with their short forms to keep in mind while working with the Python debugger.
| Command | Short form | What it does |
|---|---|---|
| args | a | Print the argument list of the current function |
| break | b | Creates a breakpoint (requires parameters) in the program execution |
| continue | c or cont | Continues program execution |
| help | h | Provides list of commands or help for a specified command |
| jump | j | Set the next line to be executed |
| list | l | Print the source code around the current line |
| next | n | Continue execution until the next line in the current function is reached or returns |
| step | s | Execute the current line, stopping at first possible occasion |
| pp | pp | Pretty-prints the value of the expression |
| quit or exit | q | Aborts the program |
| return | r | Continue execution until the current function returns |
You can read more about the commands and working with the debugger from the Python debugger documentation.
Conclusion
Debugging is an important step of any software development project. The Python debugger pdb implements an interactive debugging environment that you can use with any of your programs written in Python.
With features that let you pause your program, look at what values your variables are set to, and go through program execution in a discrete step-by-step manner, you can more fully understand what your program is doing and find bugs that exist in the logic or troubleshoot known issues.