In scripts written to automate certain tasks, we often need to run external programs and monitor their execution. When working with Python, we can use the subprocess module, part of the standard library, to create such scripts. In this tutorial we will take a brief look at subprocess and learn the basics of its usage. After reading this article you will know how to:
- Use the run function to start an external process.
- Get the standard output and standard error of a process.
- Check the process return code and raise an exception in case of failure.
- Run a process using the shell as an intermediary.
- Set the time to wait for a process to complete.
- Use the Popen class directly to create a pipe between two processes.
Since the subprocess module is almost always used on Linux, all the examples refer to Ubuntu. Windows users can follow along by installing the Ubuntu 18.04 LTS terminal (available through the Windows Subsystem for Linux).
The “run” function
The run function was added to the subprocess module in relatively recent versions of Python (3.5). It is now the recommended way to launch processes and covers the most common use cases. First of all, let’s look at the simplest way to use the run function. Suppose we want to run the command ls -al; to do so, we type the following instructions in the Python shell:
>>> import subprocess
>>> process = subprocess.run(['ls', '-l', '-a'])
The output of the external ls command is shown on the screen:
total 12
drwxr-xr-x 1 cnc cnc 4096 Apr 27 16:21 .
drwxr-xr-x 1 root root 4096 Apr 27 15:40 ..
-rw------- 1 cnc cnc 2445 May 6 17:43 .bash_history
-rw-r--r-- 1 cnc cnc 220 Apr 27 15:40 .bash_logout
-rw-r--r-- 1 cnc cnc 3771 Apr 27 15:40 .bashrc
Here we used only the first, mandatory argument of the run function, which can be either a sequence “describing” the command and its arguments (as in the example) or a string, to be used when running with the shell=True argument (we will look at the latter case later).
Capturing command output: stdout and stderr
What if we don’t want the process output to be displayed on screen, but rather captured, so that it can be referenced after the process has exited? In that case we must set the capture_output argument of the function to True:
>>> process = subprocess.run(['ls', '-l', '-a'], capture_output=True)
How can we retrieve the output (stdout and stderr) of the process afterwards? If you look at the examples above, you can see that we used the process variable to reference the CompletedProcess object returned by the run function. This object represents the process launched by the function and has many useful properties. Among others, stdout and stderr are used to “store” the output of the corresponding streams when, as already mentioned, the capture_output argument is set to True. In this case, to get stdout, we write:
>>> process = subprocess.run(['ls', '-l', '-a'], capture_output=True)
>>> process.stdout
b'total 12\ndrwxr-xr-x 1 cnc cnc 4096 Apr 27 16:21 .\ndrwxr-xr-x 1 root root 4096 Apr 27 15:40 ..\n-rw------- 1 cnc cnc 2445 May 6 17:43 .bash_history\n-rw-r--r-- 1 cnc cnc 220 Apr 27 15:40 .bash_logout...
By default, stdout and stderr are captured as byte sequences. If we want them stored as strings instead, we must set the text argument of the run function to True.
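For example, here is what the previous listing looks like when captured as text (output abbreviated; the exact listing depends, of course, on the directory contents):
>>> process = subprocess.run(['ls', '-l', '-a'], capture_output=True, text=True)
>>> process.stdout
'total 12\ndrwxr-xr-x 1 cnc cnc 4096 Apr 27 16:21 .\n...'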
Managing process failures
The command we ran in the previous examples completed without errors, but all cases must be taken into account when writing a program. So, what happens if the spawned process fails? Nothing “special” happens by default. Let’s look at an example: we run the ls command again, this time trying to list the contents of the /root directory, which is not readable by normal users:
>>> process = subprocess.run(['ls', '-l', '-a', '/root'])
We can tell whether the process ended with an error by checking its return code, which is stored in the returncode property of the CompletedProcess object:
>>> process.returncode
2
See? In this case the returncode is 2, confirming that the process ran into a permissions problem and did not complete successfully. We could also have the process outcome checked automatically, so that an exception is raised on failure: this is what the check argument of the run function is for. If it is set to True, a CalledProcessError exception is raised when the external process exits with an error:
>>> process = subprocess.run(['ls', '-l', '-a', '/root'], check=True)
ls: cannot open directory '/root': Permission denied
Traceback (most recent call last):
  ...
subprocess.CalledProcessError: Command '['ls', '-l', '-a', '/root']' returned non-zero exit status 2.
Exception handling in Python is pretty simple, so to manage process failures we could write something like:
>>> try:
... process = subprocess.run(['ls', '-l', '-a', '/root'], check=True)
... except subprocess.CalledProcessError as e:
...     print(f"{e.cmd} failed!")
...
ls: cannot open directory '/root': Permission denied
['ls', '-l', '-a', '/root'] failed!
>>>
The CalledProcessError exception, as we said, is raised when the process return code is not 0. This object has properties such as returncode, cmd, stdout and stderr; what they represent is fairly obvious. In the example above, for instance, we used the cmd property to report the sequence that was used to run the command when the exception occurred.
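If we also capture the output, the streams become available on the exception object itself; a minimal sketch, reusing the unreadable /root directory from above:
import subprocess

try:
    subprocess.run(['ls', '-l', '-a', '/root'], check=True, capture_output=True, text=True)
except subprocess.CalledProcessError as e:
    # With capture_output=True the captured streams end up on the exception too
    print(e.returncode)  # 2
    print(e.stderr)      # ls: cannot open directory '/root': Permission denied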
Executing a process in a shell
Processes launched with the run function are executed “directly”: this means that no shell is used to start them, so no variable expansions or other shell substitutions are performed on the command. Let’s look at an example involving the $HOME variable:
>>> process = subprocess.run(['ls', '-al', '$HOME'])
ls: cannot access '$HOME': No such file or directory
As you can see, the $HOME variable was not replaced with its value. Running processes this way is recommended because it avoids potential security risks. However, when we really do need to invoke a shell as an intermediate process, it is enough to set the shell parameter of the run function to True. In that case, the command and its arguments should be passed as a single string:
>>> process = subprocess.run('ls -al $HOME', shell=True)
total 12
drwxr-xr-x 1 cnc cnc 4096 Apr 27 16:21 .
drwxr-xr-x 1 root root 4096 Apr 27 15:40 ..
-rw------- 1 cnc cnc 2445 May 6 17:43 .bash_history
-rw-r--r-- 1 cnc cnc 220 Apr 27 15:40 .bash_logout
...
When the shell is used as an intermediary, all the variables in the user’s environment can be referenced in the command. While this may seem convenient, it is a source of problems, especially when dealing with potentially dangerous input, which can lead to shell injection. Running a process with shell=True is therefore discouraged and should be reserved for cases where it is safe.
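To see what can go wrong, imagine building the command string from untrusted input. When the shell really is unavoidable, the standard library shlex.quote function can neutralize such values (the filename below is a made-up hostile example):
import shlex
import subprocess

filename = 'foo; rm -rf "$HOME"'  # hypothetical malicious input

# Unsafe: the shell would happily execute the injected command:
# subprocess.run('ls -l ' + filename, shell=True)

# Safer: shlex.quote makes the shell treat the value as a single argument
subprocess.run('ls -l ' + shlex.quote(filename), shell=True)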
Limiting the running time of the process
Generally, we don’t want misbehaving processes to keep running on our system forever once they have been launched. If we use the timeout parameter of the run function, we can specify the amount of time, in seconds, within which the process must complete. If it does not finish in time, the process is killed with the SIGKILL signal which, as we know, cannot be intercepted by a process. Let’s demonstrate this by launching a long-running command and specifying a timeout in seconds:
>>> process = subprocess.run(['ping', 'google.com'], timeout=5)
PING google.com (216.58.208.206) 56(84) bytes of data.
64 bytes from par10s21-in-f206.1e100.net (216.58.208.206): icmp_seq=1 ttl=118 time=15.8 ms
64 bytes from par10s21-in-f206.1e100.net (216.58.208.206): icmp_seq=2 ttl=118 time=15.7 ms
64 bytes from par10s21-in-f206.1e100.net (216.58.208.206): icmp_seq=3 ttl=118 time=19.3 ms
64 bytes from par10s21-in-f206.1e100.net (216.58.208.206): icmp_seq=4 ttl=118 time=15.6 ms
64 bytes from par10s21-in-f206.1e100.net (216.58.208.206): icmp_seq=5 ttl=118 time=17.0 ms
Traceback (most recent call last):
File "", line 1, in
File "/usr/lib/python3.8/subprocess.py", line 495, in run
stdout, stderr = process.communicate(input, timeout=timeout)
File "/usr/lib/python3.8/subprocess.py", line 1028, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "/usr/lib/python3.8/subprocess.py", line 1894, in _communicate
self.wait(timeout=self._remaining_time(endtime))
File "/usr/lib/python3.8/subprocess.py", line 1083, in wait
return self._wait(timeout=timeout)
File "/usr/lib/python3.8/subprocess.py", line 1798, in _wait
raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '['ping', 'google.com']' timed out after 4.999637200000052 seconds
In the example above we ran the ping command without specifying a fixed number of ECHO REQUEST packets, so it could potentially run forever, and we set the timeout to 5 seconds via the timeout parameter. As we can see, ping started, and after 5 seconds a TimeoutExpired exception was raised and the process was killed.
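Like any other exception, TimeoutExpired can be caught, so a script can react to a slow command instead of crashing; a minimal sketch:
import subprocess

try:
    subprocess.run(['ping', 'google.com'], timeout=5)
except subprocess.TimeoutExpired as e:
    # e.cmd and e.timeout hold the command and the limit we set
    print(f"{e.cmd} did not finish within {e.timeout} seconds")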
The call, check_output and check_call functions
As we said before, the run function is the recommended way to launch an external process, and it should be used in the majority of cases. Before it was introduced in Python 3.5, the three main high-level API functions used to launch processes were call, check_output and check_call; let’s take a brief look at them. First of all, the call function: it runs the command described by the args parameter, waits for it to complete and returns its return code. This roughly corresponds to the basic usage of the run function. The check_call function behaves almost the same as run with the check parameter set to True: it runs the specified command and waits for it to complete, and if its return code is not 0, a CalledProcessError exception is raised. Finally, the check_output function: it works similarly to check_call, but returns the output of the program, which therefore is not displayed while the function executes.
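As a rough guide, each of these older helpers can be expressed in terms of run; the equivalences sketched below are simplified (the real functions accept more parameters):
import subprocess

# call: run the command and return its exit code
rc = subprocess.call(['ls', '-l'])            # roughly run(...).returncode

# check_call: raise CalledProcessError on a non-zero exit code
subprocess.check_call(['ls', '-l'])           # roughly run(..., check=True)

# check_output: return the command output, raising on failure
out = subprocess.check_output(['ls', '-l'])   # roughly run(..., check=True,
                                              #   stdout=subprocess.PIPE).stdout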
Working at a lower level with the Popen class
So far we have looked at the high-level API functions in the subprocess module, especially run. They all use the Popen class under the hood, so the vast majority of the time we don’t need to interact with it directly. However, when more flexibility is needed, creating Popen objects directly becomes unavoidable. Suppose, for example, that we want to connect two processes, recreating the behavior of a shell pipe. As we know, when we pipe two commands in the shell, the standard output of the command on the left of the pipe symbol “|” is used as the standard input of the one on the right. In the example below, the result of two piped commands is stored in a variable:
$ output="$(dmesg | grep sda)"
To recreate this behavior with the subprocess module, without setting the shell parameter to True as we saw earlier, we must use the Popen class directly:
import subprocess

dmesg = subprocess.Popen(['dmesg'], stdout=subprocess.PIPE)
# grep reads from dmesg's pipe; its own output is captured as well
grep = subprocess.Popen(['grep', 'sda'], stdin=dmesg.stdout, stdout=subprocess.PIPE)
dmesg.stdout.close()  # let dmesg receive SIGPIPE if grep exits first
output = grep.communicate()[0]  # wait for grep and collect its stdout
Looking at this example, keep in mind that a process launched via the Popen class does not block the execution of the script. The first thing we did in the snippet above was to create a Popen object representing the dmesg process, setting its stdout to subprocess.PIPE: this value indicates that a pipe to the stream should be opened. We then created another Popen instance for the grep process. In its constructor we specified the command and its arguments, of course, but here is the important part: we set the standard output of the dmesg process as the standard input of grep (stdin=dmesg.stdout), recreating the behavior of the shell pipeline, and we piped grep’s own standard output (stdout=subprocess.PIPE) so we can collect it later.

After creating the Popen object for the grep command, we closed the stdout stream of the dmesg process with the close() method. This, as stated in the documentation, is necessary so that the first process can receive the SIGPIPE signal. Normally, when two processes are connected by a pipe, if the one on the right of “|” (grep in our example) exits before the one on the left (dmesg), the latter receives the SIGPIPE signal (broken pipe) and by default terminates as well. When replicating a pipeline in Python, however, there is a problem: the stdout of the first process is open in the parent script as well as in the standard input of the other process. Thus, even if the grep process terminates, the pipe stays open in the calling process (our script), so dmesg would never receive SIGPIPE. That is why we close the stdout stream of the first process in our main script once the second process has been launched.

The last thing we did was to call the communicate() method on the grep object. This method can be used to optionally pass input to the stdin of the process; it waits for the process to terminate and returns a tuple whose first element is the process stdout (referenced by the output variable) and whose second element is the process stderr.
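Since we piped grep’s standard output in the sketch above, the output variable holds a bytes object; it can be decoded like any other byte sequence (what it contains depends, of course, on the machine’s kernel log):
# "output" is a bytes object; decode it to inspect the matching lines
print(output.decode(errors='replace'))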
Conclusion
In this tutorial we looked at the recommended way to launch external processes in Python, using the subprocess module and the run function. Using this function is sufficient for most cases; when a higher level of flexibility is needed, however, you should use the Popen class directly. As always, we suggest taking a look at the subprocess documentation to get complete information about the functions and classes available in this module.