How to Use Docker to Run Workflows in Digdag

There is a good chance that you want to let multiple developers run their DAGs/tasks on your Digdag server. Maintaining the packages and dependencies their tasks need is painful, and it becomes especially difficult when those dependencies conflict with each other. One way to solve this is to give each task its own independent virtual environment, but even then you have to manage those environments on the server.

Another important consideration is isolating the underlying Digdag server environment from the tasks (workflows). This matters for both security and stability.

The best way to achieve this is to run the tasks/workflows inside a container, and Digdag supports Docker containers. The rest of this how-to walks through an example. Start by adding the docker option under _export in your .dig file. This tells Digdag to use the specified image to run your workflow's tasks.

    timezone: UTC
    _export:
      docker:
        image: docker_alpine_python:latest
        pull_always: false
        
    +setup:
      echo>: start ${session_time}
    
    +status_check:
      py>: status_check.run
            
    +teardown:
      echo>: finish ${session_time}

This assumes that Docker is installed on the server (or on your own machine, if you are testing locally) and that the Docker daemon is running. It also assumes that the image you are using to run the workflows is available to Digdag (it can also be pulled from any container registry the server can reach). We won't get into installing Docker here, but you can list the locally available images by running

    docker images

The output should list the image you are planning to use. As you can see in the .dig file, I am using the **docker_alpine_python** image. It's a custom image built on top of the base **alpine** image by adding Python and the Python packages that my workflow requires. Note that the workflow code is not part of the image, only the runtime and the libraries needed to run it. Here is my Dockerfile:

    FROM alpine:latest
    
    RUN apk add --update \
        python \
        python-dev \
        py-pip \
        build-base \
      && pip install plumbum requests \
      && rm -rf /var/cache/apk/*

Now build the image from the Dockerfile:

    docker build -t docker_alpine_python .
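
Once the image is built, a script can confirm that it is present locally before a workflow is run. The following is a minimal sketch, not something Digdag provides: the helper name `image_available` is hypothetical, and it assumes the `docker` CLI is on the PATH. It uses `docker images --format` with a Go template to print one `repository:tag` line per image.

```python
import subprocess


def image_available(image, run=subprocess.check_output):
    """Return True if `image` (e.g. "docker_alpine_python:latest") is listed locally.

    `run` is injectable so the parsing can be exercised without a Docker daemon.
    """
    # A name without an explicit tag refers to the "latest" tag.
    # Only the part after the last "/" is checked, so a registry port is not
    # mistaken for a tag.
    if ":" not in image.rsplit("/", 1)[-1]:
        image += ":latest"
    output = run(["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"])
    if isinstance(output, bytes):
        output = output.decode("utf-8")
    return image in output.splitlines()
```

Calling `image_available("docker_alpine_python")` should then return `True` on a machine where the build above succeeded, assuming the daemon is reachable.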

My workflow is called status_check, and it runs a Python script. The script simply prints the details of the OS it is running under and posts them to a webhook. So regardless of your server's OS, it should report the OS details of the Docker image. This is just to confirm that the workflow code is running inside the Docker container.


    import json
    import sys

    import digdag
    import requests
    from plumbum import local

    def run():
      # Read the OS details of the environment the task runs in
      cat = local["cat"]
      os_release = cat("/etc/os-release")
      os_properties = {}
      for line in os_release.split("\n"):
        # Split on the first "=" only, so values containing "=" stay intact
        key_value = line.split("=", 1)
        if len(key_value) > 1:
          os_properties[key_value[0]] = key_value[1]

      # Current date as seen inside the container
      date = local["date"]
      os_properties["os_date"] = date().strip()

      # Installed Python packages, as reported by pip
      pip = local["pip"]
      packages = json.loads(pip("list", "--format", "json"))

      # Python version as "major.minor"
      version = str(sys.version_info[0]) + "." + str(sys.version_info[1])

      # Session time provided by Digdag
      session_time = digdag.env.params["session_time"]

      payload = {"os": os_properties, "digdag": {"session_time": session_time}, "python": {"packages": packages, "version": version}}
      # print() with a single argument works on both Python 2 and Python 3
      print(payload)
      requests.post('http://webhook.site/2856291a-acd7-4669-8223-2ff349186668', json=payload)
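
The OS lookup in the script above shells out to `cat` through plumbum. As an alternative, the same file can be read and parsed with plain Python. This is only a sketch of that idea, not the code used in the workflow: the function names `parse_os_release` and `read_os_release` are hypothetical, and unlike the script it also strips surrounding quotes and skips comment lines.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # Values may be wrapped in single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        properties[key] = value
    return properties


def read_os_release(path="/etc/os-release"):
    """Read and parse the os-release file at `path`."""
    with open(path) as handle:
        return parse_os_release(handle.read())
```

Keeping the parsing separate from the file read makes it easy to check against a sample string without needing a container.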

To run the workflow locally:

    $ digdag run docker_digdag_test.dig --rerun

Now you can push the same workflow to the Digdag server and run it there, as long as the server has the Docker image locally or can pull it from a registry (for example, Docker Hub).

If your script is simple and doesn't depend on any external libraries, you can use one of the pre-built images, such as alpine. This removes the need to build and maintain a custom Docker image.
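
To illustrate what a dependency-free task might look like, here is a sketch of a reduced status payload built with the standard library only. It is an assumption-laden variant, not the workflow from this post: the function name `build_payload` is hypothetical, the payload omits the package list, and a Python task still needs an image that ships a Python interpreter (the plain alpine image does not include one).

```python
import json
import platform
import sys


def build_payload(session_time):
    """Build a status payload using only the standard library."""
    version = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    return {
        "os": {"system": platform.system(), "release": platform.release()},
        "digdag": {"session_time": session_time},
        "python": {"version": version},
    }


def describe(session_time):
    """Return the payload as a JSON string, ready to print or send."""
    return json.dumps(build_payload(session_time), sort_keys=True)
```

Inside a Digdag task, the caller could pass `digdag.env.params["session_time"]` as the argument, as the status_check script does.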

All the code is on GitHub for you to explore. I hope you found this interesting and useful.