An Ansible filter is a pure function

2020-06-28

I recently had a two-part realization.

  • An Ansible filter is a pure function
  • An Ansible action is a function with side effects

I think this is an intentional part of the Ansible design, but I haven’t seen it stated outright before.

The first part of the realization will change the way I use Ansible when I am working with complicated roles. I have found myself more than once wishing for a way to write Python code to handle data manipulation in an Ansible role. Writing an Ansible filter is the best way to accomplish that.

Technically, you can do non-pure things in a filter – you can write any Python you want in a filter – but they’re clearly intended for data manipulation. Pure function stuff. Initial data goes in, code executes on data, modified data goes out.

The second part of the realization follows from the first. What if you want to write code that is not a pure function, but a function with side effects? That’s what an action plugin is for. I don’t have any action plugins to share in this post as all the ones I’ve written are owned by work, but I bring them up because both types of plugins are best understood together.

Filter plugins instead of inline Python code

For whatever reason, the solution I thought I wanted was a way to write Python code inline, within an Ansible role itself. (Which, of course, is not possible in Ansible.) [1] I think this was sort of a “faster horses” conceptualization, keeping me from realizing that filters were the intended solution to this.

I have, unfortunately, modified data structures before using just set_fact tasks and the various built-in filters. It was a horrible hack and I was not pleased with the result. I had a dict I was using to configure a complex set of tasks, and I needed to update the dict part of the way through those tasks. I did it in YAML, and the result was so unreadable that I left an apology in a comment. It works to this day, reliably, and we use it for some very critical functionality at work. However, it’s extremely difficult to read and understand.

To my past self and anyone else in the same boat, if you ever find yourself writing insane-looking tasks to manipulate a data structure, if you ever wish you could just write some Python inside of an Ansible role, stop what you’re doing and write a filter instead.

Using Ansible filters

Ansible ships with a long list of built-in filters.

They can be simple to use. For example, the random filter returns a random item from a list:

- set_fact:
    rand_item: "{{ ['a','b','c'] | random }}"

They can be very powerful. The ipaddr filter can be used to extract lots of information from a string that contains an IP address, including its subnet, whether it’s IPv4 or IPv6, the address in CIDR notation, and more.

Writing Ansible filter plugins

Implementing a custom filter is very easy. A filter plugin is a Python source file with a small amount of Ansible boilerplate which lives in your Ansible repo under filter_plugins. That’s it.
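
To give a sense of how little boilerplate is involved, here is a minimal sketch of a complete plugin file. The file name and the strip_prefix filter are hypothetical, invented for illustration; the FilterModule class with a filters() method is the part Ansible looks for.

```python
# filter_plugins/example_filters.py  (hypothetical file name)


def strip_prefix(value, prefix):
    """Return value without a leading prefix, or unchanged if the prefix is absent."""
    value = str(value)
    if value.startswith(prefix):
        return value[len(prefix):]
    return value


class FilterModule(object):
    """Ansible finds filters by calling filters() on this class."""

    def filters(self):
        # Map the name used in templates to the Python callable.
        return {
            'strip_prefix': strip_prefix,
        }
```

With that file in place, a template expression such as "{{ '/etc/felix' | strip_prefix('/etc/') }}" should evaluate to felix. The value on the left of the pipe becomes the first argument, and anything in parentheses follows it.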

Example filter: dictlist_combine_uniqkey

I wrote a filter to create a new list from two input lists where each element in the list is a dict. One key in each dict serves as the identifier. If a dict with the same identifier exists in both lists, the dict in the second list replaces the dict in the first. If a dict has an identifier that is only in one list, it is copied to the output list without modification.

The code is here. You can also see code for another filter, to apply a list of default values to a list of dicts, which I wrote but turned out not to need. The dictlist_combine_uniqkey filter itself is pretty simple:

class FilterModule(object):

    def filters(self):
        return {
            'dictlist_combine_uniqkey': self.dictlist_combine_uniqkey,
        }

    def dictlist_combine_uniqkey(self, lista, listb, key):
        # Accept a single dict in place of a one-item list
        lista = lista if isinstance(lista, list) else [lista]
        listb = listb if isinstance(listb, list) else [listb]
        # Identifiers present in listb; matching lista items are dropped
        listb_uniqkeys = [bitem[key] for bitem in listb]
        result = [aitem for aitem in lista if aitem[key] not in listb_uniqkeys]
        # Everything from listb follows the surviving lista items
        result += listb
        return result

You would use it in a task like this:

- name: dictlist_combine_uniqkey example
  vars:
    list1:
    - src: /tmp/felix
      dest: /etc/felix
    - src: /var/tmp/francis
      dest: /etc/francis
    list2:
    - src: /home/clou/felix
      dest: /etc/felix
    - src: /home/clou/billiam
      dest: /etc/billiam
  set_fact:
    result_list: "{{ list1 | dictlist_combine_uniqkey(list2, 'dest') }}"

# Result:
result_list:
- src: /var/tmp/francis
  dest: /etc/francis
- src: /home/clou/felix
  dest: /etc/felix
- src: /home/clou/billiam
  dest: /etc/billiam
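
Because a filter is just a function, the same logic can be checked outside of Ansible. The sketch below copies the filter body into a plain function (dropping self) and runs it on the two lists from the task, using dest as the identifying key.

```python
def dictlist_combine_uniqkey(lista, listb, key):
    """Same logic as the filter method, written as a plain function."""
    lista = lista if isinstance(lista, list) else [lista]
    listb = listb if isinstance(listb, list) else [listb]
    listb_uniqkeys = [bitem[key] for bitem in listb]
    result = [aitem for aitem in lista if aitem[key] not in listb_uniqkeys]
    return result + listb


list1 = [
    {'src': '/tmp/felix', 'dest': '/etc/felix'},
    {'src': '/var/tmp/francis', 'dest': '/etc/francis'},
]
list2 = [
    {'src': '/home/clou/felix', 'dest': '/etc/felix'},
    {'src': '/home/clou/billiam', 'dest': '/etc/billiam'},
]

merged = dictlist_combine_uniqkey(list1, list2, 'dest')
for item in merged:
    print(item['src'], '->', item['dest'])
# /var/tmp/francis -> /etc/francis
# /home/clou/felix -> /etc/felix
# /home/clou/billiam -> /etc/billiam
```

Entries that survive from the first list come first, followed by everything in the second list. Neither input list is modified, which is the pure-function behavior described earlier.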

My use case: configuring Syncthing

I wanted to configure Syncthing using its HTTP API. The configuration API is very simple: there is one endpoint, which can be used to GET the whole existing configuration or POST a whole new one. I would have to retrieve the config, make changes to the config JSON document, and submit the whole document back to the server.

The simple case was updating a string, number, or boolean property in the JSON document. What made me wish for inline Python was updating a property that contained a list of other devices to connect to, which had a unique device ID. I wanted to update specific devices in that list, based on their device ID, without replacing the entire list.

The filter I wrote above solved that problem for me. It’s now working beautifully. You can see my Syncthing configuration role on GitHub.
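
As a rough illustration of the data manipulation involved, here is the same merge-by-identifier idea applied to a config-shaped dict in plain Python. This is a sketch, not the code from my role: the devices, deviceID, name, addresses, and folders fields and the DEVICE-A style IDs are assumptions made up for the example, and update_devices is a hypothetical helper.

```python
import copy


def update_devices(config, device_updates):
    """Return a new config whose device list has entries replaced or added by deviceID."""
    new_config = copy.deepcopy(config)  # leave the caller's document untouched
    update_ids = {dev['deviceID'] for dev in device_updates}
    kept = [dev for dev in new_config.get('devices', [])
            if dev['deviceID'] not in update_ids]
    new_config['devices'] = kept + copy.deepcopy(device_updates)
    return new_config


# Assumed shape of the retrieved configuration (illustrative values only).
config = {
    'folders': [],
    'devices': [
        {'deviceID': 'DEVICE-A', 'name': 'laptop', 'addresses': ['dynamic']},
        {'deviceID': 'DEVICE-B', 'name': 'phone', 'addresses': ['dynamic']},
    ],
}
updates = [
    {'deviceID': 'DEVICE-B', 'name': 'phone', 'addresses': ['tcp://192.0.2.10:22000']},
    {'deviceID': 'DEVICE-C', 'name': 'server', 'addresses': ['dynamic']},
]

new_config = update_devices(config, updates)
print([dev['deviceID'] for dev in new_config['devices']])
# ['DEVICE-A', 'DEVICE-B', 'DEVICE-C']
```

DEVICE-A is carried over untouched, DEVICE-B is replaced, and DEVICE-C is added, so the document that gets submitted back is complete without rebuilding the whole list by hand.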

Example filter: nbuuid

I wrote a namespace UUID filter plugin. It lets you define a namespace for your UUIDs, and will always return the same UUID for the same name in the same namespace. This is really useful if you need UUIDs to be idempotent.

You can find the code buried in my deprecated personal fork of the Algo VPN on GitHub. (I guess I should move this somewhere else at some point.) The code is again very simple:

import uuid

class FilterModule(object):
    """Name-based UUID filter
    """

    @staticmethod
    def nbuuid(name, namespace):
        """Generate a name-based UUID
        Per RFC4122, a name-based UUID is generated from a 'name' (any input) and a 'namespace'
        (itself a UUID which uniquely represents the namespace).
        Given the same name and namespace, will always return the same value.
        name        Any input
        namespace   A UUID representing the namespace
                    Must be either a valid uuid.UUID object
                    or a string that can be passed to the uuid.UUID constructor
        """
        if not isinstance(namespace, uuid.UUID):
            namespace = uuid.UUID(namespace)
        return str(uuid.uuid5(namespace, str(name)))

    @staticmethod
    def url_nbuuid(name):
        """Generate a name-based UUID in the URL namespace
        """
        return str(uuid.uuid5(uuid.NAMESPACE_URL, str(name)))

    @staticmethod
    def dns_nbuuid(name):
        """Generate a name-based UUID in the DNS namespace
        """
        return str(uuid.uuid5(uuid.NAMESPACE_DNS, str(name)))

    def filters(self):
        return {
            'nbuuid': self.nbuuid,
            'url_nbuuid': self.url_nbuuid,
            'dns_nbuuid': self.dns_nbuuid,
        }

You can use it in a task like this:

- name: Get a UUID for my domain
  vars:
    domain: example.com
  set_fact:
    domain_uuid: "{{ domain | dns_nbuuid }}"
- name: Get UUIDs for some specific things
  set_fact:
    domain_uuid_thing1: "{{ 'thing1' | nbuuid(domain_uuid) }}"
    domain_uuid_thing2: "{{ 'thing2' | nbuuid(domain_uuid) }}"

It uses the Python standard library’s implementation of RFC 4122 name-based UUIDs. The RFC defines some existing namespaces, such as the DNS namespace and the URL namespace.

Each namespace is actually just a UUID, allowing for namespaces to be nested. So you can define a namespace for example.com by applying the UUID algorithm with example.com as the name parameter, and the DNS namespace as the namespace parameter. The result is a new UUID.

Since namespaces are just UUIDs, example.com now has its own namespace, and can generate new UUIDs for things specific to that domain.
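
The nesting can be seen directly with the standard library, without Ansible in the picture. This short sketch mirrors the two tasks above; the names thing1 and thing2 are the same placeholders used there.

```python
import uuid

# A namespace for example.com, derived from the predefined DNS namespace.
domain_ns = uuid.uuid5(uuid.NAMESPACE_DNS, 'example.com')

# UUIDs for specific things, generated inside the example.com namespace.
thing1 = uuid.uuid5(domain_ns, 'thing1')
thing2 = uuid.uuid5(domain_ns, 'thing2')

# The same name in the same namespace always gives the same UUID.
assert thing1 == uuid.uuid5(domain_ns, 'thing1')
# Different names in the same namespace give different UUIDs.
assert thing1 != thing2

print(domain_ns.version, thing1.version)
# 5 5
```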

My use case: idempotent UUIDs for mobileconfig documents

I wrote this for a (now deprecated) fork of the Algo VPN which I maintained for personal use.

Upstream Algo is intended to be rebuilt periodically, incorporating disposable infrastructure to improve security posture. I see the value in this but I wanted something that was more set-and-forget, that I didn’t have to keep re-deploying to all my endpoint devices.

Those endpoints include iOS and macOS devices. Deploying a VPN to Apple devices like these uses a mobileconfig file. The mobileconfig file uses UUIDs.

In upstream Algo, since the whole infrastructure is disposable, the UUIDs are just regenerated every time. In my fork, I needed the UUIDs to be idempotent. I used this filter to define an idempotent namespace for my VPN’s hostname, a namespace within that for each client, and, within each client namespace, a UUID for each type required by the mobileconfig.
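
The three-level hierarchy looks roughly like this in plain Python. It is a sketch under assumptions rather than the code from my fork: mobileconfig_uuids is a hypothetical helper, the hostname, client names, and payload kind labels are placeholders, and rooting the hostname in the DNS namespace is one reasonable choice among several.

```python
import uuid


def mobileconfig_uuids(vpn_hostname, client_name, payload_kinds):
    """Derive one stable UUID per payload kind for a given client of a given VPN."""
    host_ns = uuid.uuid5(uuid.NAMESPACE_DNS, vpn_hostname)  # namespace for the VPN
    client_ns = uuid.uuid5(host_ns, client_name)            # namespace for the client
    return {kind: str(uuid.uuid5(client_ns, kind)) for kind in payload_kinds}


# Placeholder labels; a real mobileconfig decides which UUIDs it needs.
kinds = ['profile', 'vpn', 'certificate']

phone = mobileconfig_uuids('vpn.example.com', 'phone', kinds)
laptop = mobileconfig_uuids('vpn.example.com', 'laptop', kinds)

# Re-running for the same client reproduces the same UUIDs.
assert phone == mobileconfig_uuids('vpn.example.com', 'phone', kinds)
# Different clients get different UUIDs for the same payload kind.
assert phone['vpn'] != laptop['vpn']
```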

With this implementation, the UUIDs were the same every time. It worked very well for me until I switched to a much simpler WireGuard VPN, which I deploy without Algo.


  1. Yes, you could shell out to python -c '...real python code...'. You should write a plugin instead.