How to write bash scripts in Python
If you are using any major operating system, you are indirectly interacting with bash. If you are running Ubuntu, Linux Mint, or any other Linux distribution, you are interacting with bash every time you use the terminal. Suppose you have written a bash script that needs to be invoked from Python code. The two common modules for interacting with the system shell are os and subprocess.
Let's consider a simple example that presents the recommended approach to invoking subprocesses: you pass the command you want to invoke and its arguments, all wrapped in a list.
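A minimal sketch of that approach, using subprocess.run (capture_output requires Python 3.7+); the ls -l command is only an illustration:

```python
import subprocess

# the command and its arguments are passed as a list, so there is no
# shell involved and no quoting or injection headaches
result = subprocess.run(["ls", "-l"], capture_output=True, text=True)

print(result.returncode)  # 0 on success
```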
Writing shell scripts in Python: can it replace Bash?

This short article discusses whether Python can easily be used for scripting instead of Bash/Sh. The first question a reader will probably ask is: why not just use Bash/Sh, which were created specifically for this purpose? They were created quite a long time ago and, in my view, have a rather idiosyncratic syntax, not much like other languages and hard to remember unless you are a level-50+ sysadmin. Can you recall, off the top of your head, how to write a simple if in it?

Elementary, right? Intuitively obvious syntax. 🙂

In Python, though, these constructs are much simpler. Every time I write something in Bash, I invariably reach for a search engine to remember how to write a simple if, a case statement, or whatever else. Assignment I have finally memorized. 🙂 Python is different: even though I don't write it around the clock, I have never had to look up how to write a simple loop, because the language's syntax is simple and intuitive. On top of that, it is much closer to other mainstream languages such as Java or C++ than Bash/Sh is.
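For comparison, a branch in Python is hard to forget; a minimal sketch:

```python
def greeting(name):
    # plain if/elif/else: no brackets, no `then`, no `fi`
    if name == "world":
        return "hello, world"
    elif not name:
        return "hello, nobody"
    else:
        return "hello, " + name

print(greeting("world"))
```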
The Python standard library and third-party packages also offer far more convenient tools than console utilities. Say you want to parse JSON, XML, or YAML. Know what code I recently saw in a Bash script for doing that? Exactly:
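For instance, parsing JSON takes a couple of lines with the standard library; a small sketch:

```python
import json

# parse a JSON document into ordinary Python data structures
config = json.loads('{"name": "shellpy", "version": "0.4.0"}')
print(config["name"])  # plain dict access from here on
```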
And it wasn't my code. It was written by someone neutral between Bash and Python.

The same goes for regular expressions: sed is undeniably a convenient utility, but how many people remember how to use it correctly? Well, apart from Lee E. McMahon, who created it. Actually, plenty of people remember; even I remember how to do simple things. But, in my view, Python's re module is far more convenient.

In this short article I would like to introduce you to a dialect of Python called shellpy, whose purpose is to replace Bash with Python in scripts, as far as that is possible.

Shell python differs from ordinary Python in only one detail. An expression inside grave accent characters ( ` ) is not evaluated as Python; instead it denotes the execution of a command in the shell. For example

will run ls -l as a shell command. You can also write all of this without the ` at the end of the line

and that will be correct syntax too.
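A sketch of both forms, written in the shellpy dialect (this is not valid plain Python; it only runs after shellpy's preprocessing):

```
result = `ls -l`

result = `ls -l
```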
You can run several commands on separate lines

as well as commands that span multiple lines

Executing any expression in shellpy returns an object of the Result class

It can be either a Result or an InteractiveResult (links to the documentation on GitHub; you can look at them later 🙂 ). Let's start with the simple Result. From it you can easily get the return code of the executed command

as well as the text of stdout and stderr

You can also iterate over all the lines of the executed command's stdout in a loop

There is also plenty of syntactic sugar on the result. For example, we can easily check that the return code of the executed command is zero

Or get the stdout text in an even simpler way
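Pulling the above together, a sketch in the shellpy dialect, based on the project's documentation (verify the exact attribute names against the current README):

```
result = `ls -l

print(result.returncode)   # return code of the command
print(result.stdout)       # captured stdout as text
print(result.stderr)       # captured stderr as text

for line in result:        # iterate over the lines of stdout
    print(line)

if result:                 # sugar: truthy when the return code is zero
    print('success')

print(str(result))         # sugar: str() of a result is its stdout text
```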
Everything above is a brief overview of the syntax, just enough to convey the main idea without burying you in every detail. There is much more: interactive interaction with running commands, control over command execution. But those are details you can dive into in the documentation (in English) if the idea itself interests you.

But that isn't valid Python syntax; how does it all work?

Magic, of course; how else? 🙂 Yes, my friends, I had to resort to preprocessing, I confess, but I found no other way. I have seen other libraries that do something similar without breaking the language's syntax, along the lines of

But that syntax didn't satisfy me: despite the difficulties, I wanted the best user experience ©, and to me that means writing commands in a way that is as simple and as close to His Majesty the Shell as possible.

A reader familiar with the topic will ask: what was wrong with IPython? It's almost the same as yours, you just have to type a different symbol; maybe you're just reinventing the wheel because you were too lazy to search? Indeed, it looks like this:

I tried to use it, but ran into a couple of serious problems I could not live with. The main one is that there is no simple import, as there is in Python. That is, you cannot write some code in IPython itself and easily reuse it elsewhere. For your IPython module, you cannot write

and have everything just work like in a fairy tale. The only way to reuse a script is to execute it. After execution, all the functions and variables declared in the executed file appear in your environment. Not kosher, in my opinion.

In shellpy, code is reused easily and is imported exactly as in ordinary Python. Suppose we have a module called common where we keep some very useful code. Let's look into the directory of this module

So, what do we have here? First of all, an init, but with the .spy extension. That is what distinguishes an spy module from an ordinary one. Let's also look inside the common.spy file to see what is interesting there

We see a function declared here that uses shellpy syntax internally to return the result of executing `echo 5. How is this module used in code? Like this

See? Just like in ordinary Python: we simply imported it.

How does it all work? By means of PEP 0302 (New Import Hooks). When you import something in your code, Python first asks the hook whether it has anything to contribute; the hook scans PYTHONPATH for *.spy files or shellpython modules. If there is nothing, it says so: "Nothing here, import it yourself." If it does find something, the hook performs the import itself. Specifically, it preprocesses the file into ordinary Python and puts the result into the operating system's temp directory. Having written the new Python file or module, it adds it to PYTHONPATH, and from there the most ordinary import takes over.
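The mechanism can be sketched with the modern importlib API (the successor of the PEP 302 hooks). This is not shellpy's actual code, only an illustration: the "preprocessed source" here is a hardcoded string rather than a translated .spy file, and greeting_mod is a made-up module name:

```python
import importlib.abc
import importlib.util
import sys

# stands in for the result of preprocessing a .spy file into plain Python
GENERATED_SOURCES = {"greeting_mod": "MESSAGE = 'hello from a generated module'"}

class PreprocessingFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, name, path=None, target=None):
        if name in GENERATED_SOURCES:
            return importlib.util.spec_from_loader(name, self)
        return None  # "nothing here, import it yourself"

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        # execute the generated Python source in the new module's namespace
        exec(GENERATED_SOURCES[module.__name__], module.__dict__)

sys.meta_path.insert(0, PreprocessingFinder())

import greeting_mod  # resolved by our hook, not by a file on disk
print(greeting_mod.MESSAGE)
```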
Now let's look at an example

This script downloads the avatar of the GitHub user Python and puts it into the temp directory

Shellpython can be installed in two ways: pip install shellpy, or by cloning the repository and running setup.py install. After that you will have the shellpy utility.

Let's run something

After installation you can try shellpython out on the examples available right in the repository.

There are also the allinone examples, named that way because they exercise every single feature shellpy has. Look there to find out what else it offers, or simply run

For Python 3 the command looks like this

It works on Linux and should work on Mac for Python 2.x and 3.x. It does not work on Windows yet, but there are no fundamental obstacles: everything was written with cross-platform libraries, and there is nothing platform-specific in the code. I simply have not gotten around to testing it on Windows. I do not have a Mac either, but it seemed to work for a friend 🙂 If you have a Mac and everything works fine, please let me know.

If you find problems, write a comment, either here or wire me some other way 🙂

Documentation (in English)

Can I contribute?

Will it break anything in my production?

The current version is 0.4.0; it is not stable yet, so it is better not to tie production processes to it until everything has been debugged. In development and CI, though, it is perfectly usable. It is all covered by tests and works 🙂
Writing “shell scripts” in Python with the fun little library Plumbum
The ease of writing text to a file or reading a file charmed me. Just (echo[some_string] > filename)() and it’s written. Similarly reading a file: contents = cat[filename]() and that’s it. Piping is quite nice too and allows you to dissect it into parts without much change.
While I wouldn’t recommend using the above for regular file writing or reading (it goes through invocations of other programs), this syntax enables you to transform a Bash script into a Python script in a way that keeps the result very similar to the original.
The icing on the cake is the touch of magic that resides in plumbum.cmd. You can import anything from it:
It is equivalent to:
Even when a command has a dash in its name, you can just import the name with underscores in the dashes' places:
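A sketch of that import magic (it assumes the third-party plumbum package is installed, and that apt-get exists on the system; the names are plumbum's documented API, but treat the snippet as illustrative):

```python
from plumbum.cmd import ls, grep      # each name becomes a callable command
from plumbum.cmd import apt_get       # imports the `apt-get` binary

chain = ls["-l"] | grep["py"]         # builds a pipeline without running it
print(chain())                        # runs it and returns stdout as a string
```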
Why even bother writing shell scripts in Python?
Bash is concise, but if your script grows too big or has more advanced logic, it becomes a hardly-testable abomination that strikes fear into the hearts of fellow programmers. When a Bash script grows too big for its own good, the solution is to use a high-level language like Python that has the capability to execute shell commands. The code becomes clean and testable (just mock the shell commands).
There are multiple ways to do it in Python. The most basic and quite compact way is to use subprocess, like:
But then, as you can see, those are bytes. So:
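That basic pattern might look like the following; check_output returns bytes, which you then decode (the echo command is just a stand-in):

```python
import subprocess

out = subprocess.check_output(["echo", "hello"])  # bytes: b'hello\n'
text = out.decode()                               # str: 'hello\n'
print(text, end="")
```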
The problem that I have with this module is that it’s just way uglier than it could be. It’s like os + sys vs pathlib . Ugh.
If you already use Fabric for deployments, why not use api.local() then? Seems compact too:
It prints to stderr what command was used ( [localhost] local: which bash ). You can disable this:
Then again, it doesn’t add a newline at the end of commands (convenient at times, detrimental at others; for example, reading then writing a file, seemingly without a change), so you have to add it yourself:
Replacing Bash Scripting with Python
If I didn’t cover something you want to know about or you find another problem, please open an issue on github!
The Unix shell is one of my favorite inventions ever. It’s genius, plain and simple. The idea is that the user environment is a Turing-complete, imperative programming language. It has a dead-simple model for dealing with I/O and concurrency, which are notoriously difficult in most other languages.
For problems where the data can be expressed as a stream of similar objects separated by newlines, processed concurrently through a series of filters, with a lot of I/O, it’s difficult to think of a more ideal language than the shell. A lot of the core parts of a Unix or Linux system are designed to express data in such formats.
This tutorial is NOT about getting rid of bash altogether! In fact, one of the main goals of the section on Command-Line Interfaces is to show how to write programs that integrate well with the process orchestration faculties of the shell.
The problem is if you want to do basically anything else, e.g. write logic, use control structures, handle data. You’re going to have big problems. When Bash is coordinating external programs, it’s fantastic. When it’s doing any work whatsoever itself, it disintegrates into a pile of garbage.
For me, the fundamental problem with Bash and many other shell dialects is that text is identifiers and identifiers are text — and basically everything else is also text. In some sense, this makes the shell a homoiconic language, which theoretically means it might have an interesting metaprogramming story, until you realize that it basically just amounts to running eval on strings, which is a feature in basically any interpreted language today, and one that is frequently considered harmful. The problem with eval is that it’s a pretty direct path to arbitrary code execution. This is great if arbitrary code execution is actually what you’re trying to accomplish (like, say, in an HTML template engine), but it’s not generally what you want.
Bash basically defaults to evaling everything. This is very handy for interactive use, since it cuts down on the need for a lot of explicit syntax when all you really want to do is, say, open a file in a text editor. It is pretty darn bad in a scripting context, because it turns the entire language into an injection honeypot. Yes, it is possible and not so difficult to write safe Bash once you know the tricks, but it takes extra consideration and it is easy to forget or be lazy about it. Writing three or four lines of safe Bash is easy; two hundred is quite a bit more challenging.
Bash has other problems. The syntax that isn’t native to the Bourne Shell feels really ugly and bolted-on. For example, most modern shells have arrays. Let’s look at the syntax for iterating on an array, but let’s take the long way there.
What does this have to do with iterating on arrays? Unfortunately, the answer is "something."
To properly iterate on the strings inside of an array (the only thing which an array can possibly contain), you also use variable interpolation syntax.
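The syntax in question looks like this (a bash-specific sketch; plain POSIX sh has no arrays at all):

```shell
array=(one two three)
joined=""
for item in "${array[@]}"; do   # yes, iteration uses interpolation syntax
    joined="$joined $item"
done
echo "$joined"
```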
Why would string interpolation syntax ever be used to iterate over items in an array? I have some theories, but they are only that. I could tell you, but it wouldn’t make this syntax any less awful. If you’re not too familiar with Bash, you may also (rightly) wonder what this @ is, or why everything is in curly braces.
The answer to all these questions is more or less that they didn’t want to do anything that would break compatibility with ancient Unix shell scripts, which didn’t have these features. Everything just got shoe-horned in with the weirdest syntax you can imagine. Bash actually has a lot of features of modern programming languages, but the problem is that the syntax provided to access them is completely contrary to logic and dictated by legacy concerns.
The Bash IRC channel has a very helpful bot, greybot, written by one of the more important Bash community members and experts, greycat. This bot is written in Perl. I once asked why it wasn’t written in Bash, and only got one answer: «greycat wanted to remain sane.»
And really, that answer should be enough. Do you want to remain sane? Do you want people who maintain your code in the future not to curse your name? Don’t use Bash. Do your part in the battle against mental illness.
Ok, that was a little hyperbolic. For an opinion about when it’s all right to use Bash, see: Epilogue: Choose the right tool for the job.
No particular reason. Perl and Ruby are also flexible, easy-to-write languages that have robust support for administrative scripting and automation. I would recommend against Perl for beginners because it has some similar issues to Bash: it was a much smaller language when it was created, and a lot of the syntax for the newer features has a bolted-on feeling. However, if one knows Perl well and is comfortable with it, it’s well suited to the task and is still a much saner choice for non-trivial automation scripts, and that is one of its strongest domains.
The main reason I would recommend Python is if you already know it. If you don’t know anything besides BASH (or BASH and lower-level languages like C or even Java), Python is a reasonable choice for your next language. It has a lot of mature, fast third-party libraries in a lot of domains — science, math, web, machine learning, etc. It’s also generally considered easy to learn and has become a major teaching language.
The other very compelling reason to learn Python is that it is the language covered in this very compelling tutorial.
(Footnote) I’m referring specifically to Perl 5 here. Perl 6 is a better language, in my opinion, but suffers from a lack of adoption. https://perl6.org/
This tutorial isn’t going to teach you the Python core language, though a few built-in features will be covered. If you need to learn it, I highly recommend the official tutorial, at least through chapter 5. Through chapter 9 would be even better, and you might as well just read the whole thing at that point.
If you’re new to programming, you might try the book Introducing Python or perhaps Think Python. Dive Into Python is another popular book that is available for free online. You may see a lot of recommendations for Learn Python the Hard Way. I think this method is flawed, though I do appreciate that it was written by someone with strong opinions about correctness, which has some benefits.
This tutorial assumes Python 3.5 or higher, though it may sometimes use idioms from newer versions, and I will attempt to document when I have used an idiom which doesn’t work in 3.4, which is apparently the version that ships with the latest CentOS and SLES. Use at least 3.6 if you can. It has some cool new features, and the implementation of dictionaries (Python’s hash map) was also overhauled in this version; since dictionaries undergird the way the whole object system is implemented, that is a major win all around.
Basically, always try to use whatever the latest version of Python is. Do not use Python 2. It will be officially retired in 2020. That’s two years. If a library hasn’t been ported to Python 3 yet, it’s already dead, just that its maintainers might not know it yet.
One last note about this tutorial: it deliberately doesn’t explain much. I have no desire to rewrite things that are already in the official documentation. It frequently just points to the relevant documentation for those wishing to do the kinds of tasks that Bash scripting is commonly used for.
If you’re going to do any kind of administration or automation on a Unix system, the idea of working with files is pretty central. The great coreutils like grep, sed, awk, tr, sort, etc. are all designed to go over text files line by line and do something with the content of that line. Any shell scripter knows that these "files" aren’t always really files. As often as not, you’re really dealing with the output of another process and not a file at all. Whatever the source, the organizing principle is streams of text divided by newline characters. In Python, this is what we’d call a "file-like object."
Because the idea of working with text streams is so central to Unix programming, we start this tutorial with the basics of working with text files and will go from there to other streams you might want to work with.
One handy thing in the shell is that you never really need file handles. All you have to type to loop over lines in a file would be something like:
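The naive loop in question looks roughly like this (see the warning that follows; the sketch builds its own input file so it is self-contained):

```shell
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"
result=""
while read line; do          # naive: no IFS= and no -r, see the warning
    result="$result[$line]"
done < "$tmp"
echo "$result"
rm -f "$tmp"
```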
(Don’t use this code. You actually have to do some things with $IFS to make it safe. Don’t use any of my Bash examples. Don’t use Bash! The proper one is while IFS= read -r line , but that just raises more questions.)
In Python, you need to turn a path into a file object. The above loop would be something like this:
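A sketch of that loop; the input file is created first so the example is self-contained:

```python
from pathlib import Path

# create a small input file so the loop has something to read
Path("my_file.txt").write_text("first line\nsecond line\n")

with open("my_file.txt") as my_file:
    for line in my_file:
        # each line keeps its trailing newline, so suppress print's own
        print(line, end="")
```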
Let’s take that apart.
The open() function returns a file object. If you just give it a path name as a string, it assumes it is a text file in the default system encoding (UTF-8, right?), opened only for reading. You can, of course, do my_file = open("my_file.txt") as well. When you use with x as y: instead of assignment, it ensures the object is properly cleaned up when the block is exited, using something called a "context manager". You can call my_file.close() manually, but the with block ensures that happens even if you hit an error, without a lot of extra code.
The gross thing about context managers is that they add an extra level of indentation. Here’s a helper function you can use to open a context manager for something you want to be cleaned up after you loop.
and then you use it like this:
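A sketch of such a helper and its use; the temporary file is only there to make the example self-contained:

```python
import os
import tempfile

def lines(path):
    # the context manager lives inside the generator, so the file is
    # closed when the caller finishes iterating
    with open(path) as f:
        yield from f

tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
tmp.write("a\nb\n")
tmp.close()

collected = [line.strip() for line in lines(tmp.name)]
os.unlink(tmp.name)
```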
yield from means it’s a generator function, and it’s handing over control to a sub-iterator (the file object, in this case) until that iterator runs out of things to return. Don’t worry if that doesn’t make sense. It’s a more advanced Python topic and not necessary for administrative scripting.
If you don’t want to iterate on lines, which is the most memory-efficient way to deal with text files, you can slurp entire contents of a file at once like this:
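A sketch of slurping a whole file at once:

```python
from pathlib import Path

Path("whole.txt").write_text("all of it\nat once\n")  # sample input

with open("whole.txt") as f:
    contents = f.read()  # the entire file as one string
```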
You can also open files for writing, like this:
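A writing sketch; both f.write() and print(..., file=f) work:

```python
with open("output.txt", "w") as f:   # "w" truncates (or creates) the file
    f.write("first line\n")          # write() adds no newline itself
    print("second line", file=f)     # print() adds one for you

with open("output.txt") as f:
    contents = f.read()
```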
The second argument of open() is the mode. The default mode is 'r', which opens the file for reading text. 'w' deletes everything in the file (or creates it if it doesn’t exist) and opens it for writing. You can also use the mode 'a', which seeks to the end of the file and appends text there. In shell terms, 'r' is a bit like <, 'w' is a bit like >, and 'a' is a bit like >>.
This is just the beginning of what you can do with files. If you want to know all their methods and modes, check the official tutorial’s section on reading and writing files. File objects provide a lot of cool interfaces. These interfaces will come back with other «file-like objects» which will come up many times later, including in the very next section.
Unix scripting is all about filtering text streams. You have a stream that comes from lines in a file or output of a program and you pipe it through other programs. Unix has a bunch of special-purpose programs just for filtering text (some of the more popular of which are enumerated at the beginning of the previous chapter). Everyone using a *nix system has probably done something like this at one point or another:
This is the "normal" way to search through the output of a program for lines containing whatever it is you’re searching for. You’re connecting the stdout of program-that-prints-something to the stdin of grep.
Great CLI scripts should follow the same pattern so you can incorporate them into your shell pipelines. You can, of course, write your script with its own «interactive» interface and read lines of user input one at a time:
This is fine in some cases, but it doesn’t really promote the creation of reusable, multi-purpose filters. With that in mind, allow me to introduce the sys module.
The sys module has all kinds of great things as well as all kinds of things you shouldn’t really be messing with. We’re going to start with sys.stdin .
sys.stdin is a file-like object that, you guessed it, allows you to read from your script’s stdin . In Bash you’d write:
In Python, that looks like this:
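A sketch of the stdin loop; it is wrapped in a function here so it can be demonstrated with an in-memory stream instead of a real pipe:

```python
import io
import sys

def shout(stream=None):
    # in a real script, call shout() with no argument to read stdin
    stream = stream if stream is not None else sys.stdin
    for line in stream:
        yield line.upper()

# demo with an in-memory "file" standing in for stdin
demo = list(shout(io.StringIO("hello\nworld\n")))
```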
Naturally, you can also slurp stdin in one go, though this isn’t the most Unix-y design choice and you could use up your RAM with a very large file:
As far as stdout is concerned, you can access it directly if you like, but you’ll typically just use the print() function.
Anything you print can be piped to another process. Pipelines are great. For stderr, it’s a similar story:
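A sketch of writing diagnostics to stderr:

```python
import sys

def warn(message):
    # stderr keeps diagnostics out of the stdout pipeline
    print(message, file=sys.stderr)

warn("something went wrong")
```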
If you want more advanced logging functions, check out the logging module.
Using stdin , stdout and stderr , you can write python programs which behave as filters and integrate well into a Unix workflow.
Arguments are passed to your program as a list which you can access using sys.argv. This is a bit like $@ in Bash, or $1 $2 $3, etc. For example:
looks like this in Python:
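A sketch; taking the argument list as a parameter keeps the function testable:

```python
import sys

def main(argv=None):
    # default to the real command-line arguments, minus the program name
    argv = argv if argv is not None else sys.argv[1:]
    for arg in argv:
        print(arg)

if __name__ == "__main__":
    main([])
```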
Why sys.argv[1:]? sys.argv[0] is like $0 in Bash or argv[0] in C: it’s the name of the executable. Just a refresher (because you read the tutorial, right?): a_list[1:] is list-slice syntax that returns a new list starting at the second item of a_list and going through to the end.
If you want to build a more complete set of flags and arguments for a CLI program, the standard library module for that is argparse. The tutorial in that link leaves out some useful info, so here are the API docs. click is a popular and powerful third-party module for building even more advanced CLI interfaces.
Ok, environment variables and config files aren’t necessarily only part of CLI interfaces, but they are part of the user interface in general, so I stuck them here. Environment variables are in the os.environ mapping, so you get to $HOME like this:
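A sketch; .get() avoids a KeyError when a variable is unset:

```python
import os

home = os.environ.get("HOME", "/tmp")  # fallback if $HOME is unset

# assignments are plain dict writes, and are inherited by child processes
os.environ["MY_SCRIPT_MODE"] = "debug"
```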
As far as config files go: in Bash, you frequently just do a bunch of variable assignments inside a file and source it. You can also write valid Python files and import them as modules, or eval them. But don’t do that. Arbitrary code execution in a config file is generally not what you want.
The standard library includes configparser, which is a parser for .ini files, and also a json parser. I don’t really like the idea of human-edited json, but go ahead and shoot yourself in the foot if you want to. At least it’s flexible.
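A configparser sketch; the .ini content is inlined here instead of read from disk:

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[server]
host = example.com
port = 8080
""")

host = config["server"]["host"]          # values are strings by default
port = config.getint("server", "port")   # typed getters do the conversion
```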
PyYAML, the YAML parser, and TOML are third-party libraries that are useful for configuration files.
So far, we’ve only seen paths as strings being passed to the open() function. You can certainly use strings for your paths, and the os and os.path modules contain a lot of portable functions for manipulating paths as strings. However, since Python 3.4, we have pathlib.Path, a portable, abstract type for dealing with file paths, which will be the focus of path manipulation in this tutorial.
Again, check out the pathlib.Path documentation for more info. Since pathlib came out, more and more built-in functions and standard-library functions that take a path name as a string argument can also take a Path instance. If you find a function that doesn’t, or you’re on an older version of Python, you can always get a correct string for a path on your platform with str(my_path). If you need a file operation that isn’t provided by the Path instance, check the docs for os.path and os and see if they can help you out. In fact, os is always a good place to look if you’re doing system-level stuff with permissions and UIDs and so forth.
If you’re doing globbing with a Path instance, be aware that, like ZSH, ** may be used to glob recursively. It also (unlike the shell) will include hidden files (files whose names begin with a dot). Given this and the other kinds of attribute testing you can do on Path instances, it can do a lot of the kinds of stuff find can do.
Oh, almost forgot: p.stat(), as you can see, returns an os.stat_result instance. One thing to be aware of is that st_mode (i.e. the permission bits) is represented as an integer, so you might need to do something like oct(p.stat().st_mode) to see what that number looks like in octal, which is how you set it with chmod in the shell.
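A small sketch of recursive globbing and st_mode, run against a throwaway directory:

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
(root / "sub").mkdir()
(root / "sub" / "notes.txt").write_text("hi")

# ** globs recursively, like in ZSH (and it sees hidden files too)
found = sorted(p.name for p in root.glob("**/*.txt"))

# st_mode is an int; oct() shows it the way chmod speaks
mode = oct((root / "sub" / "notes.txt").stat().st_mode)
```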
There are certain file operations which are really easy in the shell, but less nice than you might think if you’re using python file objects or the basic system calls in the os module. Sure, you can rename a file with os.rename() , but if you use mv in the shell, it will check if you’re moving to a different file system, and if so, copy the data and delete the source — and it can do that recursively without much fuss. shutil is the standard library module that fills in the gaps. The docstring gives a good summary: «Utility functions for copying and archiving files and directory trees.»
Here’s the overview:
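A sketch of the usual high-level shutil calls, run against throwaway directories:

```python
import shutil
import tempfile
from pathlib import Path

src = Path(tempfile.mkdtemp())
(src / "a.txt").write_text("data")

dest = Path(tempfile.mkdtemp()) / "tree"
shutil.copytree(src, dest)                             # like cp -r
shutil.move(str(dest / "a.txt"), str(dest / "b.txt"))  # like mv
shutil.rmtree(src)                                     # like rm -rf
```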
That’s the thousand-foot view of the high-level functions you’ll normally be using. The module documentation is pretty good for examples, but it also has a lot of details about the functions used to implement the higher-level stuff I’ve shown which may or may not be interesting.
I should probably also mention os.link and os.symlink at this point. They create hard and soft links respectively (like ln and ln -s in the shell). Path instances also have a .symlink_to() method, if you want that.
This section is for people who know how to use programs like sed, grep and awk and wish to get similar results in Python, though short explanations will be provided of what those utilities are commonly used for. The intent is not that you should use Python wherever you might use one-liners with these programs in the course of normal shell usage (or in the middle of the kinds of process orchestration scripts that Bash does so well). The idea is rather that, when writing a Python script, you won’t be tempted to shell out for text processing.
I admit that writing simple text filters in Python will never be as elegant as it is in Perl, since Perl was more or less created to be a super-powered version of sh + awk + sed. The same thing can sort of be said about awk, the original text-filtering language on Unix. The main reason to use Python for these tasks is that the project will scale much more easily when you want to do something a bit more complex.
Another thing to keep in mind is that python has built-in operations that you can use if you just need to match a string, rather than a regular expression. Simple string operations are much faster than regular expressions, though not as powerful.
grep is the Unix utility that goes through each line of a file, tests if it contains a certain pattern, and then prints the lines that match. If you’re a programmer and you don’t use grep , start using it! Retrieving matching lines in a file is easy with Python, so we’ll start there.
If you don’t need pattern matching (i.e. something you could do with fgrep ), you don’t need regex to match a substring. You can simply use built-in syntax:
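A sketch of the built-in string tests:

```python
log_line = "2023-01-01 ERROR: disk full"

has_error = "error:" in log_line.lower()   # plain substring test
is_dated = log_line.startswith("2023")     # anchored match, no regex
```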
Otherwise, you need the re module to match patterns:
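A sketch with re.search:

```python
import re

# raw string: backslashes go to the regex engine, not to Python
match = re.search(r"error: (\w+)", "fatal error: disk")
if match:
    what = match.group(1)
```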
I’m not going to go into the details of the "match object" that is returned at the moment. The main thing for now is that it evaluates to True in a boolean context. You may also notice I use raw strings ( r"..." ). This keeps Python’s normal escape sequences from being interpreted, since regex uses its own escapes.
So, to use these to filter through strings:
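A sketch; the variable name an_iterable_containing_strings is only illustrative:

```python
import re

pattern = re.compile(r"error", re.I)
an_iterable_containing_strings = ["all good", "Error: oops", "fine"]

# generator expression: lines are produced lazily, one at a time
matches = (line for line in an_iterable_containing_strings
           if pattern.search(line))
```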
an_iterable_containing_strings here could be a list, a generator or even a file/file-like object. Anything that will give you strings when you iterate on it. I use generator expression syntax here instead of a list comprehension because that means each result is produced as needed with lazy evaluation. This will save your RAM if you’re working with a large file. You can invert the result, like grep -v simply by adding not to the if clause. There are also flags you can add to do things like ignoring the case ( flags=re.I ), etc. Check out the docs for more.
Say you want to look through the log file of a certain service on your system for errors. With grep, you might do something like this:
This will search through /var/log/some_service.log for any line containing the string error: , ignoring case. To do the same thing in Python:
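A sketch; a temporary file stands in for /var/log/some_service.log so the example can actually run:

```python
import tempfile

log = tempfile.NamedTemporaryFile("w", delete=False)
log.write("all fine\nERROR: it broke\n")
log.close()

with open(log.name) as f:
    # case-insensitive version of: grep -i 'error:' some_service.log
    matches = [line for line in f if "error:" in line.lower()]
```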
The difference here is that the Bash version prints all the lines, while the Python version is just holding on to them for further processing. If you want to print them, the next step is print(*matches) or for line in matches: print(line, end=''). However, this is in the context of a script, so you probably want to extract further information from the line and do something with it programmatically anyway.
sed can do a LOT of things. It’s more or less a "text editor" without a window. Instead of editing text manually, you give sed instructions about changes to apply to lines, and it does it all in one shot. (The default is to print what the file would look like with the modification. The file isn’t actually changed unless you use a special flag.)
I’m not going to cover all of that. Back when I wrote more shell scripts and less Python, the vast majority of my uses for sed were simply to use the substitution facilities to change instances of one pattern into something else, which is what I cover here.
re.sub has a lot of additional features, including the ability to use a function instead of a string for the replacement argument. I consider this to be very useful. If you’re new to regex, note especially the section about backreferences in replacements. You may wish to check the section in the regex HOWTO about Search and Replace as well.
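A substitution sketch with a backreference:

```python
import re

text = "name: alice, name: bob"
# \1 in the replacement reuses whatever the group captured
result = re.sub(r"name: (\w+)", r"user \1", text)
```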
The sed section needed a little disclaimer. The awk section needs a bigger one. AWK is a Turing-complete text/table processing language. I’m not going to cover how to do everything AWK can do with Python idioms. 
However, inside of shell scripts, it’s most frequently used to extract fields from tabular data, such as tsv files. Basically, it’s used to split strings.
As is implied in this example, the str.split method splits on sections of contiguous whitespace by default. Otherwise, it will split on whatever is given as a delimiter. For more on splitting with regular expressions, see re.split and Splitting Strings.
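For example, pulling the second field out of a tab-separated line, roughly what awk '{print $2}' would do (the sample data is mine):

```python
line = 'alice\t42\tadmin'

# Explicit delimiter, like awk -F'\t':
fields = line.split('\t')
print(fields[1])    # 42

# With no argument, split() collapses runs of whitespace,
# which matches awk's default field splitting:
words = '  one   two\tthree '.split()
```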
(It has been pointed out to me that sed is also Turing complete, and it seems to be the case. However, implementing algorithms in sed is not nice. AWK is really a rather pleasant language.)
I come to this section at the end of the tutorial because one generally should not be running a lot of processes inside of a Python script. One common strategy in the realm of complex administrative tasks is to do the orchestration in bash and hand data handling off to Python, which is one of the reasons it’s important for your program to have a good command-line interface. If you can read data from stdin and print to stdout and stderr, you’re in good shape!
However, there are times when this model of separation of domains between Python and the shell is not practical, and it’s easier simply to execute the external program from inside your Python script. Practicality beats purity.
Say you want to do some automation with packages on your system; you'd be nuts not to use apt or yum (spelled dnf these days) or whatever your package manager is. The same applies if you're running mkfs or using a very mature and featureful program like rsync. My general rule is that any kind of filtering utility should be avoided, but specialized programs for manipulating the system are fair game. However, in some cases there will be a third-party Python library that provides a wrapper on the underlying C code. The library will, of course, be faster than spawning a new process in most cases. Use your best judgment. Be extra judicious if you're trying to write re-usable library code.
Another thing to keep in mind (and this goes for the shell as well, it’s just much more difficult to avoid it), is don’t spawn processes inside of hot loops. Spawning new processes is a relatively expensive job for the operating system. Spawning one instance or even ten is no big deal (depending on the program, of course). Spawning a process thousands or millions of times in a loop, no matter how lightweight the process is, is a terrible idea. On the other hand, using an optimized C program that can do a lot of work at one shot may well be faster than trying to do the same work natively in Python (provided there is no well-supported C library for Python).
There are a number of functions which shall not be named in the os module that can be used to spawn processes. They have a variety of problems. Some run processes in subshells (cf. injection vulnerabilities). Some are thin wrappers on system calls in libc, which you may want to use if you implement your own processes library, but are not particularly fun to use. Some are simply older interfaces left in for legacy reasons, which have actually been re-implemented on top of the new module you're supposed to use, subprocess. For administrative scripting, just use subprocess directly.
This tutorial focuses on using the Popen constructor and the run function, the latter of which was only added in Python 3.5. If you are using Python 3.4 or earlier, you need to use the old API, though a lot of what is said here will still be relevant.
The Popen API (over which the run function is a thin wrapper) is a very flexible, securely designed interface for running processes. Most importantly, it doesn’t open a subshell by default. That’s right, it’s completely safe from shell injection vulnerabilities — or, the injection vulnerabilities are opt-in. There’s always the shell=True option if you’re determined to write bad code.
On the other hand, it is a little cumbersome to work with, so there are a lot of third-party libraries to simplify it. Plumbum is probably the most popular of these. Sarge is also not bad. My own contribution to the field is easyproc (though the documentation needs to be completely rewritten).
There are also a couple of Python supersets that allow inlining shell commands in python code. xonsh is one, which also provides a fully functional interactive system shell experience and is the program that runs every time I open a terminal. I highly recommend it!
Anyway, on with the show.
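A minimal sketch of the recommended pattern, assuming ls is on your PATH:

```python
import subprocess

# The command and its arguments go in a list; no shell is involved.
result = subprocess.run(['ls', '-l'])
print(result.args)          # ['ls', '-l']
print(result.returncode)    # 0
```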
As you can see, the first and only required argument of the run function is a list (or any other iterable) of command arguments. stdout is not captured; it just goes wherever the stdout of the script goes. What is returned is a CompletedProcess instance, which has an args attribute and a returncode attribute. More attributes become available when certain keyword arguments are used with run.
Unlike most other things in Python, a process that fails doesn’t raise an exception by default.
This is the same way it works in the shell. However, you usually want your script to stop if a command didn't work, or at least to try something else. You could do this manually:
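A sketch, using false as a stand-in for any command that fails:

```python
import subprocess

result = subprocess.run(['false'])   # a command that always exits non-zero
if result.returncode != 0:
    print('command failed with status', result.returncode)
    # log it, try a fallback, or bail out here
```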
This would be most useful in cases where a non-zero exit code indicates something other than an error. For example, grep returns 1 if no lines were matched. Not really an error, but something you might want to check for.
However, in the majority of cases, you probably want a non-zero exit code to crash the program, especially during development. This is where you need the check parameter:
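With check=True, a non-zero exit becomes a CalledProcessError; the path here is deliberately chosen to fail:

```python
import subprocess

try:
    subprocess.run(['ls', '/no/such/directory'], check=True)
except subprocess.CalledProcessError as e:
    print('failed:', e)
```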
Much better! You can also use normal Python exception handling now, if you like.
If you want to capture the output of a process, you need to use the stdout parameter. If you wanted to redirect it to a file, it’s pretty straight-forward:
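A sketch, writing the listing into a hypothetical output.txt:

```python
import subprocess

# Any writable file object works as the stdout target.
with open('output.txt', 'w') as f:
    subprocess.run(['ls', '-l'], stdout=f)
```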
Pretty similar with input:
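A sketch, feeding /etc/passwd to grep the way grep root < /etc/passwd would in the shell:

```python
import subprocess

# The open file becomes the process's standard input.
with open('/etc/passwd') as f:
    result = subprocess.run(['grep', 'root'], stdin=f)
```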
If you want to do something with input and output text inside the script itself, you need to use the special constant, subprocess.PIPE .
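For example:

```python
import subprocess

result = subprocess.run(['echo', 'hello'], stdout=subprocess.PIPE)
print(result.stdout)    # b'hello\n'
```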
What's this now? Oh, right. Streams to and from processes default to bytes, not strings. You can decode the bytes yourself, or you can pass a flag to ensure the stream is a Python string, which, in their infinite wisdom, the authors of the subprocess module chose to call universal_newlines, as if that were the most important distinction between bytes and strings in Python. (Update: as of Python 3.7, `universal_newlines` is aliased to `text`.)
So that’s awkward. In fact, this madness was one of my primary motivations for writing easyproc.
If you want to send a string to the stdin of a process, you will use a different run parameter, input (again, requires bytes unless universal_newlines=True ).
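For example, upper-casing a string with tr:

```python
import subprocess

result = subprocess.run(
    ['tr', 'a-z', 'A-Z'],
    input='shout this\n',
    universal_newlines=True,
    stdout=subprocess.PIPE,
)
print(result.stdout, end='')    # SHOUT THIS
```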
Just as there is an stdout parameter, there is also an stderr parameter for dealing with messages from the process. It works as expected:
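For example, capturing ls's complaint about a missing path:

```python
import subprocess

result = subprocess.run(
    ['ls', '/no/such/directory'],
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
print(result.stderr)    # the error text, captured instead of printed
```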
However, another common thing to do with stderr in administrative scripts is to combine it with stdout, using the oh-so-memorable shell incantation 2>&1. subprocess has a thing for that, too: the STDOUT constant.
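A sketch combining both streams, the equivalent of 2>&1 (one path here exists, the other deliberately doesn't):

```python
import subprocess

result = subprocess.run(
    ['ls', '/etc/passwd', '/no/such/file'],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True,
)
# result.stdout now contains the listing and the error message together.
```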
You can also redirect stdout and stderr to /dev/null with the constant subprocess.DEVNULL .
There’s a lot more you can do with the run function, but that should be enough to be getting on with.
subprocess.run starts a process, waits for it to finish, and then returns a CompletedProcess instance that has information about what happened. This is probably what you want in most cases. However, if you want processes to run in the background or need to interact with them while they continue to run, you need the Popen constructor.
If you simply want to start a process in the background while you get on with your script, it’s a lot like run .
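A sketch, with sleep standing in for real background work:

```python
import subprocess

proc = subprocess.Popen(['sleep', '1'])   # returns immediately
print(proc.poll())                        # None -- still running

# ... get on with the rest of the script here ...

proc.wait()                               # block until it finishes
print(proc.returncode)                    # 0
```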
This isn’t quite the same as backgrounding a process in the shell using & . I haven’t looked into what happens technically, but I can tell you that the process will keep going even if the terminal it was started from is closed. It’s a bit like nohup . However, if not redirected, stdout and stderr will still be printed to that terminal.
Other reasons to do this might be to kick off a process at the beginning of the script that you need output from, and then come back to it later to minimize wait-time. For example, I use a Python script to generate my ZSH prompt. Among other things, this script checks the git status of the folder. However, that can take some time and I want the script to do as much work as possible while it’s waiting on those commands.
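The shape of that is roughly this (assuming git is installed; the details of my actual prompt script are omitted):

```python
import subprocess

# Kick off the slow command first; Popen returns immediately.
git_proc = subprocess.Popen(
    ['git', 'status', '--porcelain'],
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,
    universal_newlines=True,
)

# ... do all the other prompt work here while git runs ...

# Only now block and collect the output.
status = git_proc.stdout.read()
git_proc.wait()
```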
Notice that stdout in this case is not a string. It’s a file-like object. This is perfect for dealing with output from a program line-by-line, as many system utilities do. This is particularly important if the program produces a lot of lines of output and reading the whole thing into a Python string could potentially use up a lot of RAM. It’s also useful for long-running programs that may produce output slowly, but you want to process it as it comes. e.g.:
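A sketch, using du as a stand-in for any long-running, line-oriented command:

```python
import subprocess

proc = subprocess.Popen(
    ['du', '-a', '/etc'],
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,    # ignore permission complaints
    universal_newlines=True,
)
entries = 0
for line in proc.stdout:
    # du prints "size<TAB>path"; handle each entry as soon as it arrives.
    size, path = line.rstrip('\n').split('\t', 1)
    entries += 1
proc.wait()
```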
You can also use this mechanism to pipe processes together, though the cases where you need to do this in Python should be rare, since text filtering is best done in Python itself. A case where you might want to pipe processes together could be extracting the contents of an rpm package:
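The plumbing for a pipeline looks like this; I use printf and tr here so the sketch runs anywhere, but rpm2cpio some.rpm | cpio -idmv would be wired up the same way:

```python
import subprocess

producer = subprocess.Popen(
    ['printf', 'hello\\n'],
    stdout=subprocess.PIPE,
)
consumer = subprocess.Popen(
    ['tr', 'a-z', 'A-Z'],
    stdin=producer.stdout,      # the pipe: producer's stdout feeds consumer
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
producer.stdout.close()         # let the producer get SIGPIPE if the consumer exits
out = consumer.communicate()[0]
print(out, end='')    # HELLO
```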
The subprocess module, as mentioned earlier, is safe from injection by default, unless shell=True is used. However, there are some programs that will give arguments to a shell after they are started. SSH is a classic example. Every argument you send with ssh gets parsed by a shell on the remote system.
As soon as a process gets a shell, you’re giving up one of the main benefits of using Python in the first place. You get back into the realm of injection vulnerabilities.
Basically, instead of this:
You need to do something like this:
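A sketch with a hypothetical host and a hostile filename:

```python
import shlex

filename = 'file with spaces; rm -rf ~'    # imagine this came from a user

# Unsafe: the remote shell would see `rm -rf ~` as a separate command.
unsafe = 'touch %s' % filename

# Safe: the remote shell sees one harmless argument.
remote_command = 'touch %s' % shlex.quote(filename)
# subprocess.run(['ssh', 'some_host', remote_command])   # hypothetical host
```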
shlex.quote will ensure that any spaces or shell metacharacters are properly escaped. The only trouble with it is that you actually have to remember to use it.
The shlex module also has a split function which will split a string into a list the same way the shell would split arguments. This is useful if you have a string that looks like a shell command and you want to send it to subprocess.run or subprocess.Popen .
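For example:

```python
import shlex

args = shlex.split('grep -i "error: disk" /var/log/some_service.log')
# args is now ['grep', '-i', 'error: disk', '/var/log/some_service.log'],
# ready to hand to subprocess.run(args)
```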
This is where all the stuff goes that doesn’t really need detailed coverage in this tutorial, but it’s something you need to do often enough in shell scripts that it deserves pointers to additional resources.
In administrative scripting, one frequently wants to put a timestamp in a file name for naming logs or whatever. In a shell script, you just use the output of date for this. Python has two libraries for dealing with time, and either is good enough to handle this. The time module wraps time functions in libc. If you want to get a timestamp out of it, you do something like this:
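For example, building a log-file name (the format string is my choice):

```python
import time

timestamp = time.strftime('%Y-%m-%d_%H.%M.%S')
logname = 'backup_%s.log' % timestamp
# e.g. backup_2021-06-01_13.45.12.log
```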
This can use any of the format specifiers you see when you run $ man date. There is also a time.strptime function, which takes a string as input and uses the same kind of format string to parse the time out of it and into a tuple.
The datetime module provides classes for working with time at a high level. It’s a little cumbersome for very simple things, and incredibly helpful for more sophisticated things like math involving time. The one handy thing it can do for our case is to give us a string of the current time without the need for a format specifier.
This means that, if you're happy with the default string representation of the datetime class, you can just do str(datetime.datetime.now()) to get the current timestamp. There is also datetime.datetime.strptime() to generate a datetime instance from a timestamp.
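For example:

```python
import datetime

now = str(datetime.datetime.now())
# e.g. '2021-06-01 13:45:12.345678'

# And parsing one back out of a string:
parsed = datetime.datetime.strptime('2021-06-01 13:45:12', '%Y-%m-%d %H:%M:%S')
```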
I’m not sure if IPC is really part of bash scripting, but sometimes administrators might need to write a daemon or whatever that runs in the background, but is still able to receive communication from the user via a client.
The simplest way to do this is with a fifo, a.k.a. a named pipe.
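A minimal server sketch, using /tmp/myfifo as the fifo path (the function name is mine):

```python
import os

def serve(fifo_path='/tmp/myfifo'):
    """Yield each line a client writes into the fifo, forever."""
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path)
    while True:
        # Opening a fifo for reading blocks until a writer connects;
        # when the writer closes it, we loop around and wait again.
        with open(fifo_path) as fifo:
            for line in fifo:
                yield line.rstrip('\n')

# for message in serve():
#     handle(message)   # hypothetical handler
```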
That's your server, which you start with your init system. The simplest client could just be echo: echo some text > /tmp/myfifo. Of course, you can do a lot more with the client if you like. The limitation of a fifo is that it's one-way communication. If you want two-way, you need two fifos. Alternatively, use a TCP socket.
Python has a dead-simple library for making a socket server, aptly named socketserver. Scroll down to the examples and they have basically everything you need to know for implementing your server and client. For a daemon that you’re just interacting with over localhost, you’re going to get better performance using the UnixStreamServer class, and you won’t use up a port. Plus, Unix sockets will make your Unix beard grow better.
The problem with either of these is that they just block until they get a message (unless you use the threaded socket server, which might be fine in some cases). If you want your daemon to do work while simultaneously listening for input, you need threads or asyncio. Unfortunately for you, this tutorial is about replacing Bash with Python, and I’m not about to try to teach you concurrency.
I'll just say that the Python threading module is fine for IO-bound multitasking on a small scale. If you need something larger-scale, use asyncio. If you need real concurrent execution, know that Python threads are a lie, and asyncio doesn't do that either; you need multiprocessing. If you need concurrent execution but processes are too expensive, use another programming language. Python has limitations in this area.
If you’re doing any kind of fancy http requests that require things like interacting with APIs, shooting data around, doing authentication, or basically anything besides downloading static assets, use requests. In fact, you should probably even use it for the simple case of downloading things. However, this is also possible with the standard library, and not particularly painful.
One of the main criticisms of this tutorial (I suspect from people who haven't read it very well) is that it goes against the philosophy of using the best tool for the job. My intention is not that people rewrite all existing Bash in Python (though sometimes rewrites might be a net gain), nor am I attempting to get people to entirely stop writing new Bash scripts.
The tutorial has also been accused of being a "commercial for Python." I would have thought the Why Python? section would show that this is not the case, but if not, let me reiterate: Python is one of many languages well suited to administrative scripting. The others also provide a safer, clearer way to deal with data than the shell. My goal is not to get people to use Python so much as it is to get people to stop handling data in shell scripts.
The "founding fathers" of Unix had already recognized the fundamental limitations of the Bourne shell for handling data and created AWK, a complementary, string-centric data-parsing language. Modern Bash, on the other hand, has added a lot of data-related features which make it possible to do many of the things you might do in AWK directly in Bash. Do not use them. They are ugly and difficult to get right. Use AWK instead, or Perl or Python or whatever.
I do believe that for a program which deals primarily with starting processes and connecting their inputs and outputs, as well as certain kinds of file-management tasks, the shell should still be the first candidate. A good example might be setting up a server. I keep config files for my shell environment in Git (like any sane person), and I use sh for all the setup. That's fine. In fact, it's great. Running some commands and symlinking files is a use case that fits the strengths of the shell perfectly.
I also have shell scripts for automating certain parts of my build, testing and publishing workflow for my programming, and I will probably continue to use such scripts for a long time. (I also use Python for some of that stuff. Depends on the nature of the task.)
Many people have a rule about the length of their Bash scripts. It is oft repeated on the Internet that "if your shell script gets to fifty lines, rewrite it in another language," or something similar. The number of lines varies from 10 to 20 to 50 to 100. Among the Unix old guard, "another language" is basically always Perl. I like Python for reasons, but the important thing is that it's not Bash.
This kind of rule isn't too bad. Length isn't the problem, but length can be a side effect of complexity, and complexity is sort of the arch-enemy of Bash. I look for the use of certain features as an indicator that it's time to consider a rewrite. (Note that a "rewrite" can mean moving certain parts of the logic into another language while still doing orchestration in Bash.) These "warning signs" are listed in order from more to less serious.
- If you ever need to type the characters IFS= , rewrite immediately. You’re on the highway to Hell.
- If data is being stored in Bash arrays, either refactor so the data can be streamed through pipelines or use a different language. As with IFS , it means you’re entering the wild world of the shell’s string splitting rules. That’s not the world for you.
- If you find yourself using braced parameter expansion syntax, ${ }, and anything is between those braces besides the name of your variable, it's a bad sign. For one, it means you might be using an array, and that's not good. If you're not using an array, it means you're using the shell's string manipulation capabilities. There are cases where this might be allowable (determining the basename of a file, for example), but the syntax for that kind of thing is very strange, and many other languages supply better string-manipulation tools. If you're doing batch file renaming, pathlib provides a much saner interface, in my opinion.
- Dealing with process output in a loop is not a great idea. If you HAVE to do it, the only right way is with while IFS= read -r line . Don’t listen to anyone who tells you differently, ever. Always try to refactor this case as a one-liner with AWK or Perl, or write a script in another language to process the data and call it from Bash. If you have a loop like this, and you are starting any processes inside the loop, you will have major performance problems. This will eventually lead to refactoring with Bash built-ins. In the final stages, it results in madness and suicide.
- Bash functions, while occasionally useful, can be a sign of trouble. All the variables are global by default. It also means there is enough complexity that you can’t do it with a completely linear control flow. That’s also not a good sign for Bash. A few Bash functions might be alright, but it’s a warning sign.
- Conditional logic, while it can definitely be useful, is also a sign of increasing complexity. As with functions, using it doesn't mean you have to rewrite, but every time you write one, you should ask yourself whether the task you're doing isn't better suited to another language.
Finally, whenever you use a $ in Bash (parameter expansion), you must use quotation marks. Always only ever use quotation marks. Never forget. Never be lazy. This is a security hazard. As previously mentioned, Bash is an injection honeypot. There are a few cases where you don’t need the quotation marks. They are the exceptions. Do not learn them. Just use quotes all the time. It is always correct.