How to start external processes using Python and the subprocess module

by Alex

In scripts written to automate tasks, we often want to run external programs and monitor their execution. When working with Python, we can use the subprocess module to create such scripts. This module is part of the standard library. In this tutorial we will take a brief look at subprocess and cover the basics of its usage. After reading the article you will know how to:

  • Use the run function to start an external process.
  • Get standard process output and error information.
  • Check the process return code and raise an exception in case of failure.
  • Run the process using the shell as an intermediary.
  • Set the time to wait for the process to complete.
  • Use the Popen class directly to create a pipe between two processes.

Since the subprocess module is almost always used on Linux, all the examples are based on Ubuntu. For Windows users I suggest downloading the Ubuntu 18.04 LTS terminal.

The “run” function

The run function was added to the subprocess module in a relatively recent version of Python (3.5). It is now the recommended way to create processes and covers the most common use cases. First of all, let's look at the simplest case of using the run function. Suppose we want to run the command ls -al; to do this we type the following instructions in the Python shell:


>>> import subprocess
>>> process = subprocess.run(['ls', '-l', '-a'])

The output of the external ls command is shown on the screen:

total 12
drwxr-xr-x 1 cnc cnc 4096 Apr 27 16:21 .
drwxr-xr-x 1 root root 4096 Apr 27 15:40 ..
-rw------- 1 cnc cnc 2445 May 6 17:43 .bash_history
-rw-r--r-- 1 cnc cnc 220 Apr 27 15:40 .bash_logout
-rw-r--r-- 1 cnc cnc 3771 Apr 27 15:40 .bashrc

Here we just used the first mandatory argument of the run function, which can be a sequence “describing” the command and its arguments (as in the example), or a string to be used when running with the argument shell=True (we will consider the latter case later).
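
The sequence form has a practical advantage: each list element is delivered to the program as a separate argument, so no shell quoting is needed. A minimal sketch, using a hypothetical directory name that contains a space:

>>> # 'My Documents' is a hypothetical directory name with a space;
>>> # in list form it needs no quoting or escaping
>>> subprocess.run(['ls', '-l', 'My Documents'])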

Capturing command output: stdout and stderr

What if we don’t want the process output to be displayed on the screen, but want to keep it instead, so that it can be referenced after the process has exited? In that case we should set the capture_output argument of the run function to True:


>>> process = subprocess.run(['ls', '-l', '-a'], capture_output=True)

How can we get the output (stdout and stderr) of a process afterwards? If you look at the examples above, you will see that we used the process variable to reference the CompletedProcess object returned by the run function. This object represents the process run by the function and has many useful properties. Among others, stdout and stderr store the corresponding output streams of the command if, as already mentioned, the capture_output argument is set to True. In this case, to get stdout, we write:

>>> process = subprocess.run(['ls', '-l', '-a'], capture_output=True)
>>> process.stdout
b'total 12\ndrwxr-xr-x 1 cnc cnc 4096 Apr 27 16:21 .\ndrwxr-xr-x 1 root root 4096 Apr 27 15:40 ..\n-rw------- 1 cnc cnc 2445 May 6 17:43 .bash_history\n-rw-r--r-- 1 cnc cnc 220 Apr 27 15:40 .bash_logout...

By default, stdout and stderr are byte sequences. If we want them to be stored as strings, we must set the text argument of the run function to True.
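
For example, repeating the previous command with text=True makes stdout a regular string, which we can then process with the usual string methods. A minimal sketch; the first output line matches the listing above:

>>> process = subprocess.run(['ls', '-l', '-a'], capture_output=True, text=True)
>>> process.stdout.splitlines()[0]
'total 12'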

Managing process failures

The command that we ran in the previous examples was executed without errors. However, all cases must be taken into account when writing the program. So, what happens if the resulting process fails? Nothing “special” happens by default. Let us look at an example: we run the ls command again, trying to list the contents of the /root directory, which is not readable by normal users:


>>> process = subprocess.run(['ls', '-l', '-a', '/root'])

We can tell whether the process ended with an error by checking its return code, which is stored in the returncode property of the CompletedProcess object:


>>> process.returncode
2

See? In this case the returncode is 2, confirming that the process ran into a permissions problem and did not complete successfully. Instead of checking the return code manually, we can have an exception raised when a failure occurs, using the check argument of the run function: if it is set to True, a CalledProcessError exception is raised when the external process exits with a non-zero code:

>>> process = subprocess.run(['ls', '-l', '-a', '/root'], check=True)
ls: cannot open directory '/root': Permission denied
Traceback (most recent call last):
  ...
subprocess.CalledProcessError: Command '['ls', '-l', '-a', '/root']' returned non-zero exit status 2.

Exception handling in Python is pretty simple. So to manage process failures we could write something like:

>>> try:
...     process = subprocess.run(['ls', '-l', '-a', '/root'], check=True)
... except subprocess.CalledProcessError as e:
...     print(f"{e.cmd} failed!")
...
ls: cannot open directory '/root': Permission denied
['ls', '-l', '-a', '/root'] failed!
>>>

The CalledProcessError exception, as we said before, is raised when the process return code is not 0. The exception object exposes properties such as returncode, cmd, stdout and stderr (the last two are populated only when the output is captured); what they represent is pretty obvious. In the example above we used the cmd property to print the sequence that was used to run the command when the exception occurred.
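
Since stdout and stderr are only populated when the output is captured, a common pattern is to combine check=True with capture_output=True, so that the error message of a failed command can be read from the exception. A minimal sketch:

>>> try:
...     subprocess.run(['ls', '/root'], capture_output=True, text=True, check=True)
... except subprocess.CalledProcessError as e:
...     print(e.returncode, e.stderr.strip())
...
2 ls: cannot open directory '/root': Permission denied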

Executing a process in a shell

Processes started with the run function are executed “directly”; this means that no shell is used to start them, so no variable expansion, globbing or other shell substitutions are performed on the command line. Let’s look at an example that involves the $HOME variable:

>>> process = subprocess.run(['ls', '-al', '$HOME'])
ls: cannot access '$HOME': No such file or directory

As you can see, the $HOME variable has not been replaced by its value. Running processes this way is recommended because it avoids potential security risks. However, when we do need to call a shell as an intermediary, we can set the shell parameter of the run function to True. In that case, the command and its arguments should be passed as a single string:

>>> process = subprocess.run('ls -al $HOME', shell=True)
total 12
drwxr-xr-x 1 cnc cnc 4096 Apr 27 16:21 .
drwxr-xr-x 1 root root 4096 Apr 27 15:40 ..
-rw------- 1 cnc cnc 2445 May 6 17:43 .bash_history
-rw-r--r-- 1 cnc cnc 220 Apr 27 15:40 .bash_logout
...

All variables that exist in the user environment can be expanded when the shell is used as an intermediary. While this may seem convenient, this approach is a source of problems, especially when dealing with potentially dangerous input, which can open the door to shell injection. Running a process with shell=True is therefore discouraged and should be reserved for safe cases.
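
If a shell really is needed, untrusted values should at least be escaped with shlex.quote from the standard library. A minimal sketch, where the filename variable stands in for hypothetical untrusted input:

>>> import shlex
>>> filename = '$(dangerous command)'  # hypothetical untrusted input
>>> subprocess.run('ls -l ' + shlex.quote(filename), shell=True)
ls: cannot access '$(dangerous command)': No such file or directory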

Limiting the running time of the process

Generally, we do not want misbehaving processes to run endlessly on our system once they have been started. Using the timeout parameter of the run function, we can specify the amount of time, in seconds, within which a process must complete. If it does not complete within that time, the process is killed with the SIGKILL signal, which, as we know, cannot be intercepted. Let us demonstrate this by starting a long-running process and passing a timeout in seconds:

>>> process = subprocess.run(['ping', 'google.com'], timeout=5)
PING google.com (216.58.208.206) 56(84) bytes of data.
64 bytes from par10s21-in-f206.1e100.net (216.58.208.206): icmp_seq=1 ttl=118 time=15.8 ms
64 bytes from par10s21-in-f206.1e100.net (216.58.208.206): icmp_seq=2 ttl=118 time=15.7 ms
64 bytes from par10s21-in-f206.1e100.net (216.58.208.206): icmp_seq=3 ttl=118 time=19.3 ms
64 bytes from par10s21-in-f206.1e100.net (216.58.208.206): icmp_seq=4 ttl=118 time=15.6 ms
64 bytes from par10s21-in-f206.1e100.net (216.58.208.206): icmp_seq=5 ttl=118 time=17.0 ms
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/subprocess.py", line 495, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib/python3.8/subprocess.py", line 1028, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.8/subprocess.py", line 1894, in _communicate
    self.wait(timeout=self._remaining_time(endtime))
  File "/usr/lib/python3.8/subprocess.py", line 1083, in wait
    return self._wait(timeout=timeout)
  File "/usr/lib/python3.8/subprocess.py", line 1798, in _wait
    raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '['ping', 'google.com']' timed out after 4.999637200000052 seconds

In the above example we ran the ping command without specifying a fixed number of ECHO REQUEST packets, so it could potentially run forever. We also set the timeout to 5 seconds with the timeout parameter. As we can see, ping was started and after 5 seconds a TimeoutExpired exception occurred and the process was stopped.
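
Like CalledProcessError, the TimeoutExpired exception can be handled with a try/except block. A minimal sketch (the ping output itself is omitted here):

>>> try:
...     process = subprocess.run(['ping', 'google.com'], timeout=5)
... except subprocess.TimeoutExpired as e:
...     print(f"{e.cmd} did not finish within {e.timeout} seconds")
...
['ping', 'google.com'] did not finish within 5 seconds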

The call, check_output and check_call functions

As we said before, the run function is the recommended way to start an external process and should be used in most cases. Before it was introduced in Python 3.5, the three main high-level API functions used to create processes were call, check_call and check_output; let’s take a brief look at them. First, the call function: it executes the command described by the args parameter, waits for it to complete, and returns its exit code. This roughly corresponds to the basic use of the run function. The behavior of check_call is almost the same as run with the check parameter set to True: it runs the specified command, waits for its completion, and raises a CalledProcessError exception if the return code is not 0. Finally, check_output works like check_call, but additionally captures and returns the program’s output instead of displaying it.
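
As a rough guide, here is a sketch of how the three older helpers map onto run (the /etc directory is used just as an example):

import subprocess

subprocess.call(['ls', '/etc'])          # returns the exit code, like run([...]).returncode
subprocess.check_call(['ls', '/etc'])    # raises CalledProcessError on failure, like run([...], check=True)
subprocess.check_output(['ls', '/etc'])  # returns stdout as bytes, roughly run([...], check=True, stdout=subprocess.PIPE).stdout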

Working at a lower level with the Popen class

So far we’ve looked at the high-level API functions in the subprocess module, especially run. They all use the Popen class under the hood, which is why in the vast majority of cases we don’t need to interact with it directly. However, when more flexibility is needed, we can’t avoid creating Popen objects ourselves. Suppose, for example, that we want to connect two processes, recreating the behavior of a shell pipe. As we know, when we pipe two commands in the shell, the standard output of the one on the left of the pipe symbol “|” is used as the standard input of the one on the right. In the example below, the result of two piped commands is stored in a variable:

$ output="$(dmesg | grep sda)"

To recreate this behavior with the subprocess module without setting the shell parameter to True, as we saw earlier, we must directly use the Popen class:


import subprocess

dmesg = subprocess.Popen(['dmesg'], stdout=subprocess.PIPE)
# dmesg's stdout becomes grep's stdin; grep's stdout is captured
grep = subprocess.Popen(['grep', 'sda'], stdin=dmesg.stdout, stdout=subprocess.PIPE)
dmesg.stdout.close()  # allow dmesg to receive SIGPIPE if grep exits first
output = grep.communicate()[0]

Considering this example, you should keep in mind that a process started using the Popen class does not block the execution of the script.

The first thing we did in the code snippet above was to create a Popen object representing the dmesg process, setting its stdout to subprocess.PIPE. This value indicates that a pipe to the given stream should be opened. Then we created another Popen instance for the grep process. In its constructor we of course specified the command and its arguments, but, and here is the important part, we set the standard output of the dmesg process as the standard input of grep (stdin=dmesg.stdout), to recreate the behavior of the shell pipeline.

After creating the Popen object for the grep command, we closed the stdout stream of the dmesg process in our script using the close() method. This, as stated in the documentation, is necessary so that the first process can receive the SIGPIPE signal. Normally, when two processes are connected by a pipeline and the one on the right of “|” (grep in our example) exits before the one on the left (dmesg), the latter receives the SIGPIPE signal (broken pipe) and by default terminates as well. However, when replicating a pipeline in Python, the stdout of the first process is open in the parent script as well as in the standard input of the other process; so even if the grep process terminates, that descriptor stays open in the calling process (our script) and dmesg would never receive SIGPIPE. This is why we close the stdout stream of the first process in our main script once the second one has been started.

The last thing we did was call the communicate() method on the grep object. This method can be used to optionally pass data to the stdin of the process; it waits for the process to terminate and returns a tuple whose first element is the process stdout (referenced by the output variable) and whose second element is the process stderr.
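
To illustrate the input side of communicate(), here is a minimal sketch that feeds a made-up string to grep through its standard input:

import subprocess

# Hypothetical data piped into grep via its stdin
grep = subprocess.Popen(['grep', 'sda'], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, text=True)
stdout, stderr = grep.communicate(input='sda1 mounted\nsdb1 mounted\n')
print(stdout)  # prints 'sda1 mounted'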

Conclusion

In this tutorial we have seen the recommended way to create external processes in Python, using the subprocess module and the run function. Using run should be sufficient for most cases; when a higher level of flexibility is needed, the Popen class can be used directly. As always, we suggest you take a look at the subprocess documentation for complete information about the functions and classes available in this module.
