For years, when using Python, I’ve wanted the ability to save a subprocess’s stdout/stderr for later processing while also having it print to the terminal in real time.
I have finally figured out how to do it.
My solution
import io
import logging
import select
import subprocess
import sys

logger = logging.getLogger()


def run(cmd: str | list, print_output=True, log_output=False, check=True, *args, **kwargs) -> subprocess.Popen:
    """Run a command, with superpowers

    * cmd: The command to run. If a string, it will be passed to a shell.
    * print_output: Print the command's stdout/stderr to the controlling terminal's stdout/stderr in real time.
        stdout/stderr is always captured and returned, whether this is True or False (superpowers).
        The .stdout and .stderr properties are always strings, not bytes
        (which is required because we must use universal_newlines=True).
        * <https://gist.github.com/nawatts/e2cdca610463200c12eac2a14efc0bfb>
        * <https://stackoverflow.com/questions/4417546/constantly-print-subprocess-output-while-process-is-running>
    * log_output: Log the command's stdout/stderr in a single log message (each) after the command completes.
    * check: Raise an exception if the command returns a non-zero exit code.
        Unlike subprocess.run, this is True by default.
    * *args, **kwargs: Passed to subprocess.Popen
        Do not pass the following arguments, as they are used internally:
        * shell: Determined automatically based on the type of cmd
        * stdout: Always subprocess.PIPE
        * stderr: Always subprocess.PIPE
        * universal_newlines: Always True
        * bufsize: Always 1

    The Popen object is always returned.
    """
    shell = isinstance(cmd, str)
    logger.debug(f"Running command: {cmd}")

    process = subprocess.Popen(
        cmd,
        shell=shell,
        bufsize=1,  # Output is line buffered, required to print output in real time
        universal_newlines=True,  # Required for line buffering
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        *args,
        **kwargs,
    )

    stdoutbuf = io.StringIO()
    stderrbuf = io.StringIO()

    stdout_fileno = process.stdout.fileno()
    stderr_fileno = process.stderr.fileno()

    # This returns None until the process terminates
    while process.poll() is None:
        # select() waits until there is data to read (or an "exceptional case") on any of the streams
        readready, writeready, exceptionready = select.select(
            [process.stdout, process.stderr],
            [],
            [process.stdout, process.stderr],
            0.5,
        )

        # Check if what is ready is a stream, and if so, which stream.
        # Copy the stream to the buffer so we can use it,
        # and print it to stdout/stderr in real time if print_output is True.
        for stream in readready:
            if stream.fileno() == stdout_fileno:
                line = process.stdout.readline()
                stdoutbuf.write(line)
                if print_output:
                    sys.stdout.write(line)
            elif stream.fileno() == stderr_fileno:
                line = process.stderr.readline()
                stderrbuf.write(line)
                if print_output:
                    sys.stderr.write(line)
            else:
                raise Exception(f"Unknown file descriptor in select result. Fileno: {stream.fileno()}")

        # If what is ready is an exceptional situation, blow up I guess;
        # I haven't encountered this and this should probably do something more sophisticated.
        for stream in exceptionready:
            if stream.fileno() == stdout_fileno:
                raise Exception("Exception on stdout")
            elif stream.fileno() == stderr_fileno:
                raise Exception("Exception on stderr")
            else:
                raise Exception(f"Unknown exception in select result. Fileno: {stream.fileno()}")

    # The process has exited, but output may still be sitting in the pipes.
    # Drain it so the last lines are not lost
    # (this blocks until every holder of the pipe has closed it).
    for stream, buf, out in (
        (process.stdout, stdoutbuf, sys.stdout),
        (process.stderr, stderrbuf, sys.stderr),
    ):
        for line in stream:
            buf.write(line)
            if print_output:
                out.write(line)

    # We'd like to just seek(0) on the stdout/stderr buffers, but "underlying stream is not seekable",
    # so we create new buffers above, write to them line by line, and replace the old ones with these.
    process.stdout.close()
    stdoutbuf.seek(0)
    process.stdout = stdoutbuf
    process.stderr.close()
    stderrbuf.seek(0)
    process.stderr = stderrbuf

    # Note that .getvalue() is not (always?) available on normal Popen stdout/stderr,
    # but it is available on our StringIO objects.
    # .getvalue() doesn't change the seek position.
    logger.info(f"Command completed with return code {process.returncode}: {cmd}")
    if log_output:
        logger.info(f"stdout: {process.stdout.getvalue()}")
        logger.info(f"stderr: {process.stderr.getvalue()}")

    if check and process.returncode != 0:
        raise Exception(f"Command failed with exit code {process.returncode}: {cmd}")

    return process
This requires Python 3.10 or newer, because the str | list annotation in the signature uses the union syntax added in 3.10; I’ve tested with 3.10.
Imperfections
I haven’t tested this on Windows at all, and I expect the select() call will need changes on that platform, where select.select() accepts only sockets, not pipes.
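One way to sidestep select() entirely is to read each pipe from its own thread. The following is a minimal sketch of that idea, not the code I actually use and not something verified on Windows; the names run_threaded and _tee are hypothetical, and it returns a plain tuple rather than the Popen object.

```python
import io
import subprocess
import sys
import threading


def _tee(stream, buf, out, print_output):
    """Copy lines from stream into buf, echoing them to out if requested."""
    for line in stream:  # iteration ends when the pipe reaches EOF
        buf.write(line)
        if print_output:
            out.write(line)
            out.flush()
    stream.close()


def run_threaded(cmd, print_output=True):
    """Hypothetical select()-free variant; returns (returncode, stdout, stderr)."""
    process = subprocess.Popen(
        cmd,
        shell=isinstance(cmd, str),
        bufsize=1,
        universal_newlines=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    stdoutbuf, stderrbuf = io.StringIO(), io.StringIO()
    threads = [
        threading.Thread(target=_tee, args=(process.stdout, stdoutbuf, sys.stdout, print_output)),
        threading.Thread(target=_tee, args=(process.stderr, stderrbuf, sys.stderr, print_output)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # each thread finishes once its pipe is closed
    returncode = process.wait()
    return returncode, stdoutbuf.getvalue(), stderrbuf.getvalue()
```

Because each reader blocks on its own pipe, there is no polling timeout and no risk of one stream starving the other, at the cost of two extra threads per command.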
I am not sure, but I think my solution might not clean up its stdout/stderr file descriptors properly.
I notice issues when running this in a program that I’m executing over ssh,
like ssh server /path/to/script-containing-this-code.py.
When I do that, the ssh command hangs indefinitely even after the Python script has terminated on the remote host.
To work around this, I end up doing the equivalent of:
ssh -tt server python -u /path/to/script-containing-this-code.py </dev/null
That is:
- Use -tt to force sshd to create a terminal
- Use python -u to print to stdout/stderr without buffering
- Redirect stdin to /dev/null
Using -tt fixes the hanging problem, and may be good enough for other use cases.
However, when running the ssh client inside a local Python session[1],
I also need to disable output buffering and redirect stdin,
or else the output from the script running over ssh is garbled.
It’s actually still not a perfect rendition of what the output should be –
in particular, commands that print a string without a newline always end up with a newline appended –
but it’s good enough for my use case.
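For illustration, here is roughly what that ssh invocation looks like when launched from a local Python script. This is a sketch under assumptions: build_ssh_command and run_remote_script are hypothetical names, it uses plain subprocess.run instead of my run() function so that it stands alone, and the host and script path are whatever you pass in.

```python
import subprocess


def build_ssh_command(host: str, script: str) -> list:
    """Build the argument list for the ssh workaround."""
    return [
        "ssh",
        "-tt",  # force sshd to create a terminal
        host,
        "python",
        "-u",  # remote script prints to stdout/stderr without buffering
        script,
    ]


def run_remote_script(host: str, script: str) -> int:
    """Run the remote script with stdin redirected from the null device."""
    completed = subprocess.run(
        build_ssh_command(host, script),
        stdin=subprocess.DEVNULL,  # equivalent of </dev/null in the shell
    )
    return completed.returncode
```

Passing a list instead of a string keeps the local shell out of the picture, so the stdin redirect is expressed with subprocess.DEVNULL instead of shell syntax.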
Note that there are probably performance implications for this too. I wouldn’t recommend using this to run code that dumps thousands of lines to stdout.
Prior art
- nawatts’ capture-and-print-subprocess-output.py was the original inspiration for this. It’s simpler if you don’t need to keep stdout and stderr separate, as I do.
- tokland’s Stack Overflow answer is even simpler, and it is said to work on Windows.
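To show why things get simpler when stdout and stderr don’t need to stay separate, here is a minimal sketch of the merged-stream approach. It is my own illustration rather than a copy of either nawatts’ script or tokland’s answer, and run_merged is a hypothetical name.

```python
import subprocess
import sys


def run_merged(cmd, print_output=True):
    """Hypothetical helper: capture and echo combined stdout/stderr."""
    process = subprocess.Popen(
        cmd,
        shell=isinstance(cmd, str),
        bufsize=1,
        universal_newlines=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the stdout pipe
    )
    lines = []
    for line in process.stdout:  # iteration ends when the pipe reaches EOF
        lines.append(line)
        if print_output:
            sys.stdout.write(line)
            sys.stdout.flush()
    process.stdout.close()
    returncode = process.wait()
    return returncode, "".join(lines)
```

With only one pipe to read there is nothing to multiplex, so neither select() nor threads are needed; the trade-off is that you can no longer tell which stream a given line came from.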
My code, in context
I want this for a system configuration package I’m writing. Here are links to the code as of the commit in which I implemented it; these have the advantage of continuing to work if I change my repo layout, but the disadvantage of lacking any future fixes or enhancements.
-
[1] That is, I am running a Python script, which runs ssh to connect to a server and runs another Python script.