CherryPy: Override the base URL behind a reverse proxy

2017-04-26

I wanted to write a CherryPy web application that could be mounted at any URL, even behind a reverse proxy.

That is, I wanted to run the CherryPy app on a machine on a private network, where its URL is something like http://internal-dns-name/, and put it behind an Apache reverse proxy with a public domain name, at a URL like https://example.com/application/. Note that neither the protocol, nor the hostname, nor the path matches between the two. Configured this way, the app serves web pages out of the box, but every link CherryPy generates is an http://internal-dns-name/ link, which means that CSS, images, and JavaScript assets all fail to load.

The first things I read indicated that Apache can handle the host part (and possibly the protocol part?) if I set ProxyPreserveHost On in the frontend Apache configuration. However, that would mean configuring CherryPy to serve the same path (like /application/) as the reverse proxy server, which I didn’t want to do because it would tie the configuration for the frontend to the configuration for the backend.

The Apache documentation suggests using mod_proxy_html to solve this problem. That module examines HTML coming from the backend service and munges links based on a regular expression, but I didn’t want to do this. It seemed like a fragile hack, and I was also worried about the security implications of an XML parser written in C. Lol.

However, I finally discovered that I could configure CherryPy to handle the link rewriting itself, as long as I passed in the URL that was coming in on the frontend somehow.
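To see why overriding the base is enough, here is a deliberately simplified model of link generation. As far as I can tell, cherrypy.url() builds absolute URLs from cherrypy.request.base, the mount point, and the path; the build_url() helper and the /static/site.css path below are made up for illustration and are not CherryPy's actual implementation.

```python
def build_url(base, script_name, path):
    """Join a base URL, a mount point, and a path into one absolute URL.

    Simplified stand-in for what a framework does when it generates links;
    not CherryPy's real code.
    """
    # Drop empty pieces and stray slashes so the parts join cleanly.
    parts = [p.strip("/") for p in (script_name, path) if p.strip("/")]
    return base.rstrip("/") + "/" + "/".join(parts)


# What the backend believes its base is, versus what the browser needs.
internal = build_url("http://internal-dns-name", "", "/static/site.css")
public = build_url("https://example.com/application", "", "/static/site.css")
```

The path the application asks for is the same in both calls. Only the base changes, which is why swapping the base on each request fixes protocol, hostname, and path prefix at once.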

I found some TurboGears documentation referencing CherryPy that was doing HTTPS termination on the frontend server, and reverse proxying to the private CherryPy server over HTTP. It looked like this could be adapted to do what I wanted, except that it was for CherryPy 2.x, which is quite old. I had to port the same idea to use CherryPy 3.x concepts - and in CherryPy 3, “filters” are now “tools” (1, 2).

When I did that, I ended up with BaseUrlOverride. This does what I want, but has a few moving parts. (The below code has been modified from the link above to be more generic.)

  1. I have a class implementing the tool

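    import cherrypy

    class BaseUrlOverride(cherrypy.Tool):
        """Replace the base URL CherryPy uses when it generates links."""

        def __init__(self):
            # Run setbaseurl() at the 'before_request_body' hook point
            cherrypy.Tool.__init__(self, 'before_request_body', self.setbaseurl)

        def setbaseurl(self, baseurl=None):
            # Leave the request untouched when no override is configured
            if baseurl:
                cherrypy.request.base = baseurl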
    
  2. I check an environment variable for the proper base URL (environment variables are easy to pass to my application using my Docker tooling)

    # configuration is the application's own settings dict
    if "BASEURL_OVERRIDE" in os.environ:
        configuration['baseurloverride'] = os.environ["BASEURL_OVERRIDE"]
    
    • It’s possible I could get Apache to pass a header with this information, but I haven’t looked into that
  3. I enable the tool in the CherryPy config

    # Register the tool under the name used in the config keys below
    cherrypy.tools.baseurloverride = util.BaseUrlOverride()
    cherrypy.tree.mount(server, '/', {
        '/': {
            'tools.baseurloverride.baseurl': configuration['baseurloverride'],
            'tools.baseurloverride.on': True,
        },
    })
    

Two things to note about this that I didn’t find spelled out anywhere:

  1. Setting 'tools.baseurloverride.baseurl': something passes something as an argument named baseurl to the setbaseurl() method. If that argument name doesn’t exist in setbaseurl(), CherryPy will throw an exception.
  2. Setting 'tools.baseurloverride.on': True enables the tool site-wide. Without setting this, I’d need to use a decorator for each method I wanted to use the tool on, calling something like @cherrypy.tools.baseurloverride() before each method declaration in my CherryPy application class.
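Putting those two notes together, the '/' section of the config can be built from the environment so that the tool is only switched on when an override is actually set. This is a rough sketch rather than what my application does: tool_config() is a hypothetical helper, and stripping the trailing slash is my assumption that the base should look like the scheme-and-host string CherryPy normally puts in cherrypy.request.base.

```python
import os


def tool_config(environ=None):
    """Return the '/' config section for the baseurloverride tool.

    Hypothetical helper: the keys match the tool name registered as
    cherrypy.tools.baseurloverride and the baseurl argument of setbaseurl().
    """
    if environ is None:
        environ = os.environ
    baseurl = environ.get("BASEURL_OVERRIDE")
    if not baseurl:
        # No override requested, so leave the tool off entirely.
        return {"tools.baseurloverride.on": False}
    return {
        "tools.baseurloverride.on": True,
        # Assumption: the base should not end with a slash.
        "tools.baseurloverride.baseurl": baseurl.rstrip("/"),
    }


section = tool_config({"BASEURL_OVERRIDE": "https://example.com/application/"})
```

The resulting dict would go under the '/' key of the config passed to cherrypy.tree.mount(). Passing a plain dict instead of os.environ, as in the last line, makes the helper easy to check without touching the real environment.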