This post is Part 2 of the “Offensive Infrastructure with Modern Technologies” series, and is going to focus on an automated deployment of the HashiCorp stack (i.e., the HashiStack).

Part 1 explained how to configure Consul in both single and multi-node deployments using docker-compose, while here I’m going to provide a step-by-step walkthrough that will allow you to automatically deploy the full stack with Ansible.


High Level Design

Before jumping to the core of this post I want to quickly introduce the final setup, and, in particular, both the physical and logical deployments of the different components of the HashiStack.

Multi Node (Physical) Deployment

For this post we are going to use two nodes: one playing the role of the “master” (node-1), which will run the server counterparts of the different components, and the other (node-2) playing the role of the “client”, on which we will run our applications.

The Multi Node Setup.

We are going to use Vagrant to quickly spin up these nodes. But more on this below.

Logical Deployment of the HashiStack

You have probably noticed I’ve been mentioning the term “HashiStack”. But what actually composes this stack? Well, for this setup we are going to deploy the following components:

Component Use Case
Consul Service Discovery.
Vault Secrets Management.
Nomad Applications Management.
Docker Applications Containerization.
Dnsmasq Internal Hostnames Resolution.
Traefik Reverse Proxy.

[If you need a refresher on all the different components, please refer to Part 1].

We will also need to ensure that these components are installed in the correct order, as they depend on each other. I have to thank the team at Mockingbird Consulting for tackling this problem and producing a graph (here modified by myself to adapt it to our needs) showing an ordered deployment:

The Deployment Order.

We are going to start by provisioning the two hosts, followed by Docker and Consul, so as to provide the service discovery layer with which all the other components will interact. Once Consul is installed, we are going to deploy dnsmasq and Vault. With Vault set up properly, we will add Nomad. Finally, we are going to add Traefik to simplify access to the applications.

As mentioned previously, we are going to use node-1 as a server and node-2 as a client, so this is how we are going to distribute the different components:

Logical Deployment of the HashiStack.

Environment Setup

Before jumping to the core of this post, we will have to configure a couple of requirements, namely Vagrant and Ansible.

Code Structure

To provide some reference (in case you get lost in the walkthrough), below is the tree of the final folder structure (yes, we will reconstruct it file by file):

host:~/hashistack ❯ tree .
.
├── ansible
│   ├── ansible.cfg
│   ├── Dockerfile
│   ├── requirements.yml
│   └── roles
│       ├── brianshumate.consul
│       ├── brianshumate.nomad
│       ├── brianshumate.vault
│       ├── geerlingguy.docker
│       └── kibatic.traefik
├── playbooks
│   ├── apps
│   │   ├── redis.nomad
│   │   ├── simpleserver.nomad
│   │   └── traefik
│   │       └── traefik.toml
│   ├── apps.yml
│   ├── hashistack.yml
│   ├── inventory
│   │   ├── group_vars
│   │   │   └── docker_instances
│   │   └── hosts
│   └── restart.yml
└── Vagrantfile

Vagrant Setup

We are using Vagrant in order to have an easy-to-use development environment. This will also allow you to follow along, in case you want to replicate this setup locally on your laptop.

Let’s analyze the Vagrantfile:

host:~/hashistack ❯ cat Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
    (1..2).each do |i|
        config.vm.define "node-#{i}" do |node|
            node.vm.box = "ubuntu/trusty64"
            node.vm.hostname = "node-#{i}"
            # Bridged network
            node.vm.network "public_network", bridge: "en0: Wi-Fi (AirPort)", ip:"192.168.0.11#{i}"
            # Provider-specific configuration
            node.vm.provider "virtualbox" do |vb|
                # Customize the amount of memory on the VM
                vb.memory = "2048"
                # Specify machine name
                vb.name = "hashistack_node-#{i}"
            end
        end
    end
end
  • Line 6: we are using a loop to create 2 nodes (node-1 and node-2) with similar characteristics.
  • Line 8: both are based on Ubuntu Server 14.04 (ubuntu/trusty64). I admit this is not super recent, but Vagrant doesn’t play nicely with the most recent versions of Ubuntu Server. For local development Trusty will do, and in the following post we will switch to the most recent Ubuntu LTS.
  • Line 9: as hostname, we are going to use the machine name (node-1, node-2).
  • Line 11: we are assigning a bridged network and a static IP address (192.168.0.111 and 192.168.0.112, respectively). If you want to use this Vagrantfile, remember to change these IP addresses to match your own local network.
  • By default, Vagrant will share the current directory to the /vagrant folder on the guest. If you prefer to mount it to a different location, you can manually share a folder with the guest by adding this line: node.vm.synced_folder ".", "/custom/path/hashistack/"

Let’s run Vagrant with this config file:

host:~/hashistack ❯ vagrant up
Bringing machine 'node-1' up with 'virtualbox' provider...
Bringing machine 'node-2' up with 'virtualbox' provider...
==> node-1: Importing base box 'ubuntu/trusty64'...
==> node-1: Matching MAC address for NAT networking...
==> node-1: Checking if box 'ubuntu/trusty64' is up to date...
==> node-1: Setting the name of the VM: hashistack_node-1
==> node-1: Clearing any previously set forwarded ports...
==> node-1: Clearing any previously set network interfaces...
==> node-1: Preparing network interfaces based on configuration...
    node-1: Adapter 1: nat
    node-1: Adapter 2: bridged
==> node-1: Forwarding ports...
    node-1: 22 (guest) => 2222 (host) (adapter 1)
==> node-1: Running 'pre-boot' VM customizations...
==> node-1: Booting VM...
==> node-1: Waiting for machine to boot. This may take a few minutes...
    node-1: SSH address: 127.0.0.1:2222
    node-1: SSH username: vagrant
    node-1: SSH auth method: private key
    node-1:
    node-1: Vagrant insecure key detected. Vagrant will automatically replace
    node-1: this with a newly generated keypair for better security.
    node-1:
    node-1: Inserting generated public key within guest...
    node-1: Removing insecure key from the guest if it is present...
    node-1: Key inserted! Disconnecting and reconnecting using new SSH key...
==> node-1: Machine booted and ready!
==> node-1: Checking for guest additions in VM...
==> node-1: Setting hostname...
==> node-1: Configuring and enabling network interfaces...
==> node-1: Mounting shared folders...
    node-1: /vagrant => /Users/.../hashistack
==> node-2: Importing base box 'ubuntu/trusty64'...
==> node-2: Matching MAC address for NAT networking...
==> node-2: Checking if box 'ubuntu/trusty64' is up to date...
==> node-2: Setting the name of the VM: hashistack_node-2
==> node-2: Clearing any previously set forwarded ports...
==> node-2: Fixed port collision for 22 => 2222. Now on port 2201.
==> node-2: Clearing any previously set network interfaces...
==> node-2: Preparing network interfaces based on configuration...
    node-2: Adapter 1: nat
    node-2: Adapter 2: bridged
==> node-2: Forwarding ports...
    node-2: 22 (guest) => 2201 (host) (adapter 1)
==> node-2: Running 'pre-boot' VM customizations...
==> node-2: Booting VM...
==> node-2: Waiting for machine to boot. This may take a few minutes...
    node-2: SSH address: 127.0.0.1:2201
    node-2: SSH username: vagrant
    node-2: SSH auth method: private key
    node-2:
    node-2: Vagrant insecure key detected. Vagrant will automatically replace
    node-2: this with a newly generated keypair for better security.
    node-2:
    node-2: Inserting generated public key within guest...
    node-2: Removing insecure key from the guest if it is present...
    node-2: Key inserted! Disconnecting and reconnecting using new SSH key...
==> node-2: Machine booted and ready!
==> node-2: Checking for guest additions in VM...
==> node-2: Setting hostname...
==> node-2: Configuring and enabling network interfaces...
==> node-2: Mounting shared folders...
    node-2: /vagrant => /Users/.../hashistack

Now you can access your newly provisioned VMs straight from Vagrant itself:

host:~/hashistack ❯ vagrant ssh node-1
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-164-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

 System information disabled due to load higher than 1.0

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

0 packages can be updated.
0 updates are security updates.

New release '16.04.5 LTS' available.
Run 'do-release-upgrade' to upgrade to it.


vagrant@node-1:~$ ifconfig
eth0      Link encap:Ethernet
          inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1148 errors:0 dropped:0 overruns:0 frame:0
          TX packets:870 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:127518 (127.5 KB)  TX bytes:111188 (111.1 KB)

eth1      Link encap:Ethernet
          inet addr:192.168.0.111  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:253 errors:0 dropped:50 overruns:0 frame:0
          TX packets:112 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:20003 (20.0 KB)  TX bytes:11371 (11.3 KB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:16 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1368 (1.3 KB)  TX bytes:1368 (1.3 KB)

Once both virtual machines are created, we will have our multi node setup up and running, as outlined in the “Multi Node (Physical) Deployment” section:

Multi Node Setup.

In a future post I will show how to replace Vagrant and automate the deployment on a proper infrastructure (bare metal or cloud) using Terraform. For the time being, I’ve found this cheatsheet useful for remembering all the commands available from Vagrant.
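
For quick reference, these are the standard Vagrant subcommands I found myself using most often while working on this setup (run them from the folder containing the Vagrantfile):

host:~/hashistack ❯ vagrant status        # state of all machines defined in the Vagrantfile
host:~/hashistack ❯ vagrant ssh node-1    # open an SSH session to a specific node
host:~/hashistack ❯ vagrant halt          # gracefully shut down all machines
host:~/hashistack ❯ vagrant reload        # restart the machines, picking up Vagrantfile changes
host:~/hashistack ❯ vagrant destroy -f    # tear everything down without asking for confirmation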

Ansible Setup

With our hosts up and running, we need a means of quickly deploying our components in an automated fashion. That’s where Ansible comes in to automate provisioning.

Ansible Worker

Since I don’t want to run Ansible straight from my host, I created a docker container (hereinafter called the “worker”) to play the role of the control machine.

First, let’s start by preparing the Ansible configuration file (a fully commented template can be found on the Ansible GitHub page):

host:~/hashistack ❯ cat ansible/ansible.cfg
[defaults]
host_key_checking   = False
system_warnings     = False
retry_files_enabled = False
remote_user         = vagrant
roles_path          = /etc/ansible/roles
inventory           = /playbooks/inventory
ask_pass            = True
# ask_vault_pass      = True
# vault_password_file = /playbooks/inventory/ansible_vault_pwd.txt
  • Line 6: vagrant is the default user configured by Vagrant while setting up the virtual machines. Remember to modify this if you are not using Vagrant.
  • Line 7: we are telling Ansible that we will store our roles in the /etc/ansible/roles folder of the worker.
  • Line 8: similar to the line above, we are telling Ansible that we will place inventory-related files in the /playbooks/inventory folder of the worker.
  • Line 9: we are requesting Ansible to ask for the vagrant user’s password upon running a playbook.
  • Line 11: this is the location of a file containing the Vault password used to encrypt the group_vars files. More on this in the Nomad Section below.

Next, we are going to create a Dockerfile for this worker: using alpine as a base image, we are adding a few packages and folders needed by Ansible to work properly:

host:~/hashistack ❯ cat ansible/Dockerfile
FROM alpine:latest

# Install dependencies
RUN apk update && \
    apk upgrade &&  \
    apk add --no-cache --update ansible libffi-dev py-netaddr openssh sshpass zip
# Initialize
RUN mkdir -p /etc/ansible
RUN mkdir -p /playbooks
WORKDIR /playbooks
# RUN
CMD ["/bin/sh"]

Let’s build this image and call it ansibleworker:

host:~/hashistack ❯ docker build -t ansibleworker:1.0 ansible/

Now, we can run the container by sharing the ansible folder on the host straight to /etc/ansible on the worker (so that the config files are already in place to be picked up by Ansible), and the playbooks folder to /playbooks (where we are going to store our main playbooks):

host:~/hashistack ❯ docker run -ti --rm -v $(pwd)/ansible:/etc/ansible -v $(pwd)/playbooks:/playbooks ansibleworker:1.0
/playbooks $ ls /etc/ansible/
Dockerfile        ansible.cfg       requirements.yml  roles

Ansible Roles

My original plan was to go the really hard way and manually configure and deploy every single component, but I recently came across a nice post from Mockingbird Consulting suggesting the use of pre-made Ansible roles customised for the main HashiCorp products. I considered it for a while, and ended up agreeing with this approach, which definitely saved me a considerable amount of time.

As suggested in that post, we can use Ansible Galaxy to download some pre-configured roles. This is reflected in the requirements.yml file:

host:~/hashistack ❯ cat ansible/requirements.yml
- src: geerlingguy.docker
- src: brianshumate.consul
- src: griggheo.consul-template
- src: brianshumate.vault
- src: brianshumate.nomad
- src: kibatic.traefik

Once ready, let’s use our docker container to pull these roles into the /etc/ansible/roles folder of the worker (which maps back to ansible/roles on the host):

/playbooks $ ansible-galaxy install --roles-path /etc/ansible/roles -r /etc/ansible/requirements.yml

- downloading role 'docker', owned by geerlingguy
- downloading role from https://github.com/geerlingguy/ansible-role-docker/archive/2.5.2.tar.gz
- extracting geerlingguy.docker to /etc/ansible/roles/geerlingguy.docker
- geerlingguy.docker (2.5.2) was installed successfully
- downloading role 'consul', owned by brianshumate
- downloading role from https://github.com/brianshumate/ansible-consul/archive/v2.3.1.tar.gz
- extracting brianshumate.consul to /etc/ansible/roles/brianshumate.consul
- brianshumate.consul (v2.3.1) was installed successfully
- downloading role 'consul-template', owned by griggheo
- downloading role from https://github.com/griggheo/ansible-consul-template/archive/1.2.1.tar.gz
- extracting griggheo.consul-template to /etc/ansible/roles/griggheo.consul-template
- griggheo.consul-template (1.2.1) was installed successfully
- downloading role 'vault', owned by brianshumate
- downloading role from https://github.com/brianshumate/ansible-vault/archive/v2.1.0.tar.gz
- extracting brianshumate.vault to /etc/ansible/roles/brianshumate.vault
- brianshumate.vault (v2.1.0) was installed successfully
- downloading role 'nomad', owned by brianshumate
- downloading role from https://github.com/brianshumate/ansible-nomad/archive/v1.8.0.tar.gz
- extracting brianshumate.nomad to /etc/ansible/roles/brianshumate.nomad
- brianshumate.nomad (v1.8.0) was installed successfully
- downloading role 'traefik', owned by kibatic
- downloading role from https://github.com/kibatic/ansible-traefik/archive/1.7.6.tar.gz
- extracting kibatic.traefik to /etc/ansible/roles/kibatic.traefik
- kibatic.traefik (1.7.6) was installed successfully

/etc/ansible $ ls -l roles/
total 0
drwxr-xr-x   19 root     root           646 Jan 21 19:32 brianshumate.consul
drwxr-xr-x   19 root     root           646 Jan 21 19:32 brianshumate.nomad
drwxr-xr-x   19 root     root           646 Jan 21 19:32 brianshumate.vault
drwxr-xr-x   12 root     root           408 Jan 21 19:31 geerlingguy.docker
drwxr-xr-x   23 root     root           782 Jan 21 19:32 griggheo.consul-template
drwxr-xr-x   15 root     root           510 Jan 21 19:32 kibatic.traefik

With this step completed, we now have everything we need to finally start provisioning the HashiCorp stack.


Core Components

Core Component 1: Consul (+ dnsmasq)

The glue of the entire infrastructure is provided by Consul and its service discovery capabilities, so we are going to deploy it first. [Please refer to Part 1 for an in-depth description of Consul].

The hosts file of our inventory is the place where we can instruct Ansible on which service needs to be placed on which host. We are going to reflect the distribution defined in the “Logical Deployment” section, but please make sure to change the IP addresses to match the ones set in your Vagrantfile:

host:~/hashistack ❯ cat playbooks/inventory/hosts
server01.consul ansible_host=192.168.0.111
server02.consul ansible_host=192.168.0.112

[consul_instances]
server01.consul consul_node_role=bootstrap
server02.consul consul_node_role=client

[vault_instances]
server01.consul

[docker_instances]
server01.consul nomad_node_role=both
server02.consul nomad_node_role=client

[traefik_instances]
server01.consul

[nomad_deployer]
server02.consul
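
Before moving on, an optional sanity check from the Ansible worker confirms that both hosts are reachable with the inventory and credentials we just configured. Ansible’s built-in ping module should return “pong” for each node (you will be prompted for the vagrant user’s SSH password, as per our ansible.cfg):

/playbooks $ ansible all -m ping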

Now that we have defined the inventory, we can start deploying the different pieces of our stack. We are going to use a custom Ansible playbook for this (playbooks/hashistack.yml), starting with Consul, and adding components as we go:

host:~/hashistack ❯ cat playbooks/hashistack.yml
- hosts: consul_instances
  become: true
  become_user: root
  roles:
      - {role: brianshumate.consul, tags: ['consul']}
  vars:
      # VARIABLES TO EDIT
      consul_iface: eth1
      consul_datacenter: "lab"
      consul_acl_datacenter: "lab"
      consul_domain: "consul"
      consul_recursors: ['1.1.1.1', '1.0.0.1']
      # CONSTANTS
      consul_client_address: "0.0.0.0"
      consul_dnsmasq_enable: yes
      consul_acl_master_token_display: yes
  • Line 2: the role is going to be applied only to the hosts listed in the consul_instances group of the hosts file.
  • Line 6: we are using the Consul role downloaded from Ansible Galaxy.
  • Line 9: to be able to interact with the services directly from the host, you’ll need to specify the interface of your Vagrant boxes bridged to your local network.
  • Lines 10-11: this is the name of the datacenter. The usual default is dc1, but here I changed it to lab.
  • Line 12: the name of the Consul domain we are about to create. If you decide to change this variable, please ensure all variables ending in _domain are set to the same value.
  • Line 13: the DNS servers you want to use for external lookups. Here I set them to use the Cloudflare ones.

If you are using Vagrant, you shouldn’t need to change any variable, so we can use the Ansible worker to run the playbook against our hosts:

/playbooks $ ansible-playbook hashistack.yml
SSH password:

PLAY [consul_instances] ****************************************************************

TASK [Gathering Facts] *****************************************************************

[... omitted for brevity ...]

PLAY RECAP *****************************************************************************
server01.consul            : ok=38   changed=15   unreachable=0    failed=0
server02.consul            : ok=28   changed=11   unreachable=0    failed=0

If everything goes well, you should now have both Consul and Dnsmasq installed.
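
Under the hood, the consul_dnsmasq_enable flag makes the Consul role drop a small dnsmasq configuration that forwards every lookup for the consul domain to the local Consul agent’s DNS port (8600). The exact file name and path depend on the role version, but its content boils down to a single line like the following:

server=/consul/127.0.0.1#8600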

Now, the way the Ansible playbook spawns the Consul servers (in no particular order) means that sometimes the clients might have problems joining the cluster. To solve this, I created a playbook (playbooks/restart.yml) just to restart the services in the same order defined in the “Logical Deployment” section:

  1. Consul Servers
  2. Consul Clients
  3. Dnsmasq
  4. Vault
  5. Nomad Servers
  6. Nomad Clients
  7. Traefik

I’m not pasting its entire content here, but you can find it in the GitHub repository of this blogpost. Let’s run this playbook to restart the Consul servers/clients. You might get some errors, as we haven’t deployed either Vault or Nomad yet. You can ignore them for now:

/playbooks $ ansible-playbook restart.yml
SSH password:

PLAY [consul_instances] *******************************************

TASK [Gathering Facts] *******************************************
ok: [server01.consul]
ok: [server02.consul]

TASK [Restart Consul Servers] *******************************************
skipping: [server02.consul]
changed: [server01.consul]

TASK [Restart Consul Clients] *******************************************
skipping: [server01.consul]
changed: [server02.consul]

TASK [Restart dnsmasq] *******************************************
changed: [server01.consul]
changed: [server02.consul]

[... omitted for brevity ...]

PLAY RECAP *******************************************
server01.consul            : ok=7    changed=2    unreachable=0    failed=0
server02.consul            : ok=5    changed=2    unreachable=0    failed=0

You can now browse to http://192.168.0.111:8500 (or http://consul.service.lab.consul:8500) to access Consul’s web UI:

Nodes Composing the Consul Cluster.

At the same time, you can verify the setup by inspecting Consul’s logs and the DNS resolution:

host:~/hashistack ❯ vagrant ssh node-1
vagrant@node-1:~$ consul monitor
2019/01/22 18:55:13 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:299c7c71-ec50-4eec-b50a-09dfbff11914 Address:192.168.0.111:8300}]
2019/01/22 18:55:13 [INFO] serf: EventMemberJoin: server01.lab 192.168.0.111
2019/01/22 18:55:13 [INFO] serf: EventMemberJoin: server01 192.168.0.111
2019/01/22 18:55:13 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
2019/01/22 18:55:13 [INFO] raft: Node at 192.168.0.111:8300 [Follower] entering Follower state (Leader: "")
2019/01/22 18:55:13 [INFO] consul: Adding LAN server server01 (Addr: tcp/192.168.0.111:8300) (DC: lab)
2019/01/22 18:55:13 [INFO] consul: Handled member-join event for server "server01.lab" in area "wan"
2019/01/22 18:55:13 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
2019/01/22 18:55:13 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
2019/01/22 18:55:13 [INFO] agent: started state syncer
2019/01/22 18:55:13 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce k8s os packet scaleway softlayer triton vsphere
2019/01/22 18:55:13 [INFO] agent: Joining LAN cluster...
2019/01/22 18:55:13 [INFO] agent: (LAN) joining: [192.168.0.111]
2019/01/22 18:55:13 [INFO] agent: (LAN) joined: 1 Err: <nil>
2019/01/22 18:55:13 [INFO] agent: Join LAN completed. Synced with 1 initial agents
2019/01/22 18:55:14 [WARN] raft: Heartbeat timeout from "" reached, starting election
2019/01/22 18:55:14 [INFO] raft: Node at 192.168.0.111:8300 [Candidate] entering Candidate state in term 13
2019/01/22 18:55:14 [INFO] raft: Election won. Tally: 1
2019/01/22 18:55:14 [INFO] raft: Node at 192.168.0.111:8300 [Leader] entering Leader state
2019/01/22 18:55:14 [INFO] consul: cluster leadership acquired
2019/01/22 18:55:14 [INFO] consul: New leader elected: server01
2019/01/22 18:55:14 [INFO] consul: member 'server02' reaped, deregistering
2019/01/22 18:55:14 [INFO] agent: Synced node info
2019/01/22 18:55:18 [INFO] serf: EventMemberJoin: server02 192.168.0.112
2019/01/22 18:55:18 [INFO] consul: member 'server02' joined, marking health alive

root@node-1:/home/vagrant# dig +short server01.node.lab.consul
192.168.0.111
root@node-1:/home/vagrant# dig +short consul.service.lab.consul
192.168.0.111
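
Another quick way to confirm that both agents joined the cluster is to list its members, either with the Consul CLI on one of the nodes or through Consul’s HTTP catalog API from your host (both are standard Consul commands/endpoints, and both should list server01 and server02):

vagrant@node-1:~$ consul members
host:~/hashistack ❯ curl http://192.168.0.111:8500/v1/catalog/nodes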

Core Component 2: Vault

Next in our list is Vault. Vault is a tool that provides a unified interface for securely accessing secrets (e.g., API keys, passwords, or certificates), while providing tight access control and recording a detailed audit log.

We are going to update the main playbook (playbooks/hashistack.yml) to add this new component:

host:~/hashistack ❯ cat playbooks/hashistack.yml
- hosts: consul_instances
  become: true
  ...

- hosts: vault_instances
  become: true
  become_user: root
  roles:
      - {role: brianshumate.vault, tags: ['vault']}
  vars:
      # VARIABLES TO EDIT
      vault_datacenter: "lab"
      vault_domain: "consul"
      # CONSTANTS
      vault_ui: yes
      vault_cluster_disable: yes
  • Lines 13-14: ensure these variables have the same value as those specified for the consul_instances.

Run the playbook again to deploy Vault:

/playbooks $ ansible-playbook hashistack.yml
[... omitted for brevity ...]
TASK [brianshumate.vault : extract systemd version] ****************************************************************
skipping: [server01.consul]

TASK [brianshumate.vault : systemd unit] ***************************************************************************
skipping: [server01.consul]

TASK [brianshumate.vault : Start Vault] ****************************************************************************
changed: [server01.consul]

RUNNING HANDLER [brianshumate.vault : Restart vault] ***************************************************************
changed: [server01.consul]

TASK [brianshumate.vault : Vault API reachable?] *******************************************************************
FAILED - RETRYING: Vault API reachable? (30 retries left).
FAILED - RETRYING: Vault API reachable? (29 retries left).

At some point you will see the task failing while trying to reach Vault. This is happening because, at this point, Vault is still sealed. Head to http://192.168.0.111:8200/ (or http://active.vault.service.lab.consul:8200) to unseal it:

  • First, set up the master keys: for this deployment 1 is a “good enough” value for both “Key Shares” and “Key Threshold”, but in a later post we will deploy a hardened configuration with multiple shares. Once done, click on “Initialize”:
Unseal Vault: Setup Master Keys.
  • Next, download the master key and the root token:
Unseal Vault: Obtain Root Token.
  • You are going to obtain something like the below (remember to store these values!):
{
  "keys": [
    "9bba87f37cdac08e1fd393b35911fd5e40ee7b5f174dbf3d08c2c528e9951c04"
  ],
  "keys_base64": [
    "m7qH83zawI4f05OzWRH9XkDue18XTb89CMLFKOmVHAQ="
  ],
  "root_token": "s.4OH2DO2shJLNnG79pmpVwriA"
}
  • Click on “Continue to Unseal”, and enter your master key (keys in the snippet above) to finally unseal Vault:
Unseal Vault.
  • You will be redirected to the web UI login page, where you can use your root token (root_token in the snippet above) as your authentication token:
Vault Web UI Login.
Vault Web UI.
  • Once Vault is unsealed, the Ansible task will resume and complete successfully:
TASK [brianshumate.vault : Vault API reachable?] *******************************************************************
FAILED - RETRYING: Vault API reachable? (30 retries left).
FAILED - RETRYING: Vault API reachable? (29 retries left).
FAILED - RETRYING: Vault API reachable? (28 retries left).
FAILED - RETRYING: Vault API reachable? (27 retries left).
FAILED - RETRYING: Vault API reachable? (26 retries left).
FAILED - RETRYING: Vault API reachable? (25 retries left).
ok: [server01.consul]

PLAY RECAP *********************************************************************************************************
server01.consul            : ok=57   changed=30   unreachable=0    failed=0
server02.consul

Just remember you will have to unseal Vault every time you restart it.
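
For reference, the same initialise/unseal flow can also be driven from the command line instead of the web UI. The following is just a rough sketch of the equivalent Vault CLI calls, assuming you run them on node-1 with the vault binary on the PATH (replace the placeholders with your own values):

vagrant@node-1:~$ export VAULT_ADDR=http://127.0.0.1:8200
vagrant@node-1:~$ vault operator init -key-shares=1 -key-threshold=1   # prints the unseal key(s) and the root token
vagrant@node-1:~$ vault operator unseal <unseal_key>
vagrant@node-1:~$ vault login <root_token>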

Core Component 3: Nomad (+ docker)

The third core component we are going to deploy is Nomad. Nomad is a tool designed for managing a cluster of machines and running applications on them. It abstracts away machines and the location of applications, and instead enables users to declare what they want to run. Nomad will then handle where they should run and how to run them.

We are going to update playbooks/hashistack.yml once again:

host:~/hashistack ❯ cat playbooks/hashistack.yml
- hosts: consul_instances
  become: true
  ...

- hosts: vault_instances
  become: true
  ...

- hosts: docker_instances
  become: true
  become_user: root
  roles:
      - {role: geerlingguy.docker, tags: ['docker']}
      - {role: brianshumate.nomad, tags: ['docker', 'nomad']}
  vars:
      # VARIABLES TO EDIT
      nomad_datacenter: "lab"
      nomad_domain: "consul"
      nomad_iface: eth1
      nomad_network_interface: eth1
      nomad_servers: ['server01.node.lab.consul:4647']
      # CONSTANTS
      nomad_consul_address: http://127.0.0.1:8500
      nomad_vault_address: http://active.vault.service.lab.consul:8200
      nomad_bind_address: "0.0.0.0"
      nomad_use_consul: yes
      nomad_vault_enabled: yes
      nomad_options: {  'driver.raw_exec.enable': '1' }
      nomad_network_speed: 10
      nomad_bootstrap_expect: 1
      nomad_docker_enable: yes
      permanent: yes
      nomad_ports_http: 4646
      nomad_ports_rpc: 4647
      nomad_ports_serf: 4648
  • Lines 14-15: we are going to use both the Docker and Nomad roles downloaded from Ansible Galaxy.
  • Lines 18-19: ensure these variables have the same value as those specified for the consul_instances.
  • Lines 20-21: specify the interface of your Vagrant boxes bridged to your local network.
  • Line 22: a list of at least one Nomad server, used by the clients to join the cluster. Here we are using the hostname of server01 in the Consul domain.

This time we can’t run the playbook straight away, as we will first have to provide Nomad with a Vault token.

Nomad Vault Token Configuration

To use the Vault integration, Nomad servers must be provided a Vault token. This token can either be a root token or a periodic token with permissions to create from a token role. The root token is the easiest way to get started, so we are going to employ it here, but we are going to replace it with a role-based token in a following post.
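
Just to give an idea of where we are heading, a role-based setup would boil down to writing a policy for Nomad and then minting a periodic, orphan token from it; the policy name and file below are placeholders, and the proper configuration will be covered in the next post:

vagrant@node-1:~$ vault policy write nomad-server nomad-server-policy.hcl
vagrant@node-1:~$ vault token create -policy=nomad-server -period=72h -orphan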

We are therefore going to add the Vault root token by creating an Ansible-Vault file at playbooks/inventory/group_vars/docker_instances:

/playbooks $ mkdir -p /playbooks/inventory/group_vars
/playbooks $ ansible-vault create /playbooks/inventory/group_vars/docker_instances
New Vault password:
Confirm New Vault password:

After choosing a password to protect this file (note it down!), ansible-vault will spawn your preferred editor, where you can paste the following content:

---
nomad_vault_token: <Vault Root Token>

We now just need to tell Ansible how to fetch this password at runtime, so that it can decrypt the content of this file. Edit the ansible/ansible.cfg file and uncomment the line containing ask_vault_pass = True. Alternatively, if you are not in a production scenario, you could place this password in a text file and provide it to Ansible via the vault_password_file variable.

[defaults]
host_key_checking   = False
system_warnings     = False
retry_files_enabled = False
remote_user         = vagrant
roles_path          = /etc/ansible/roles
inventory           = /playbooks/inventory
ask_pass            = True
ask_vault_pass      = True
# vault_password_file = /playbooks/inventory/ansible_vault_pwd.txt
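
If you later need to inspect or rotate the token stored in the encrypted file, ansible-vault can re-open it at any time (you will be prompted for the same password chosen above):

/playbooks $ ansible-vault view /playbooks/inventory/group_vars/docker_instances
/playbooks $ ansible-vault edit /playbooks/inventory/group_vars/docker_instances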

Re-run Ansible against the server: Nomad will be configured with the appropriate token, and it will be able to authenticate against Vault and retrieve secrets:

/playbooks $ ansible-playbook hashistack.yml
SSH password:
Vault password:

PLAY [consul_instances] ****************************************************************
[...]

TASK [brianshumate.nomad : Create directories] *****************************************
changed: [server01.consul] => (item=/var/nomad)
changed: [server02.consul] => (item=/var/nomad)
changed: [server01.consul] => (item=/var/log/nomad)
changed: [server02.consul] => (item=/var/log/nomad)

TASK [brianshumate.nomad : Create config directory] ************************************
changed: [server01.consul]
changed: [server02.consul]

TASK [brianshumate.nomad : Base configuration] *****************************************
changed: [server01.consul]
changed: [server02.consul]

TASK [brianshumate.nomad : Server configuration] ***************************************
skipping: [server02.consul]
changed: [server01.consul]

TASK [brianshumate.nomad : Client configuration] ***************************************
changed: [server01.consul]
changed: [server02.consul]

TASK [brianshumate.nomad : Start Nomad] ************************************************
changed: [server02.consul]
changed: [server01.consul]

PLAY RECAP *****************************************************************************
server01.consul            : ok=33   changed=14   unreachable=0    failed=0
server02.consul            : ok=32   changed=14   unreachable=0    failed=0

If you didn’t run into any errors, you can head to http://192.168.0.111:4646 (or http://nomad-servers.service.lab.consul:4646) to access Nomad’s Web UI.

Nomad Web UI.

Launch Nomad Jobs

Let’s test our installation by running a job on Nomad while also registering it with Consul.

The playbooks/apps folder is where we are going to store Nomad job specifications, config files, and everything else that our applications might need. We are going to start with a simple Redis container deployed through the Docker driver for Nomad, as shown below. Going into the details of Nomad job specifications is out of scope for this blog post, but the HashiCorp website has good tutorials: 1, 2.

host:~/hashistack ❯ cat playbooks/apps/redis.nomad
job "example" {
  datacenters = ["lab"]
  type = "service"
  group "sampleservice" {
    count = 1
    restart {
      attempts = 2
      interval = "30m"
      delay = "15s"
      mode = "fail"
    }
    task "sampleservice-db" {
      driver = "docker"
      config {
        image = "redis:3.2"
        network_mode = "bridge"
        port_map {
          sampleservicedbport = 6379
        }
        labels {
          group = "label_db"
        }
        dns_servers = [
          "consul.service.lab.consul"
        ]
      }
      resources {
        network {
          mbits = 1
          port "sampleservicedb_port" {}
        }
      }
      service {
        name = "sampleservice-db-service"
        tags = ["sampleservice", "redis"]
        port = "sampleservicedb_port"
        check {
          name     = "alive"
          type     = "tcp"
          port     = "sampleservicedb_port"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}

To deploy it, we can copy this job file to one of our hosts, and then issue the nomad job run command:

vagrant@node-1:/vagrant$ nomad job run /vagrant/playbooks/apps/redis.nomad
==> Monitoring evaluation "f0e4f8b2"
    Evaluation triggered by job "example"
    Allocation "0d623665" created: node "332508ea", group "sampleservice"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "f0e4f8b2" finished with status "complete"
Sample Job Running.
Sample Job Registered in Consul.
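
You can also verify the deployment from the command line with Nomad’s own tooling; the job name and allocation ID below come from the output above, so yours will differ:

nomad job status example       # overall job status and list of allocations
nomad alloc status 0d623665    # details of a specific allocation (use your own ID)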

Core Component 4: Traefik

The final component is Traefik. Traefik is a reverse proxy/load balancer that natively integrates with every major cluster technology, Consul included. The idea is to use it as the public access point to the infrastructure and have it reverse proxy to the underlying hosts, so that we won’t have to keep remembering host/port combinations. In addition, it has full support for Consul as a service catalog, meaning it can detect new services without needing a restart.

First, we need to create two additional files:

  1. The Traefik config file (playbooks/apps/traefik/traefik.toml), where we enable both the Docker and Consul integrations of Traefik. I’m not pasting the full content here (a minimal sketch follows below), but you can find this file in the GitHub repository of this blogpost. [For a full explanation of the Traefik configuration file, please refer to the official documentation.]
  2. An init script (playbooks/apps/traefik/traefik_debianinit.j2) to replace the startup method (systemd) used in the kibatic.traefik role. While putting this post together I realised that the main version of Ubuntu supported by Vagrant (trusty) is too old to properly support systemd. I’m not pasting this file either (given its length), but it will basically allow us to restart the Traefik service even without systemd.
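
To give an idea of what goes into that config file, below is a minimal sketch of a Traefik 1.7-style traefik.toml with the Consul Catalog backend enabled. The ports and values here are illustrative (the dashboard entrypoint on 8081 in particular is an assumption), so please rely on the file in the repository for the actual configuration:

# Entrypoint for the proxied applications
[entryPoints]
  [entryPoints.http]
  address = ":80"
  # Entrypoint used by the API/dashboard (assumed to be 8081 in this setup)
  [entryPoints.traefik]
  address = ":8081"

# Enable the web UI / API
[api]
  entryPoint = "traefik"

# Use the Consul service catalog as a dynamic backend
[consulCatalog]
  endpoint = "consul.service.lab.consul:8500"
  domain = "lab.consul"
  prefix = "traefik"
  exposedByDefault = true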

One final update to the hashistack.yml playbook:

host:~/hashistack ❯ cat playbooks/hashistack.yml
- hosts: consul_instances
  become: true
  ...

- hosts: vault_instances
  become: true
  ...

- hosts: docker_instances
  become: true
  ...

- hosts: traefik_instances
  become: true
  become_user: root
  ignore_errors: True
  post_tasks:
    - name: Copy Traefik Config File
      template:
        src: /playbooks/apps/traefik/traefik.toml
        dest: /etc/traefik.toml
    - name: Copy Traefik init script
      template:
        src: /playbooks/apps/traefik/traefik_debianinit.j2
        dest: /etc/init.d/traefik
        owner: root
        group: root
        mode: 0755
  roles:
       - {role: kibatic.traefik, tags: ['docker', 'nomad', 'traefik']}
  vars:
      # VARIABLES TO EDIT
      traefik_bind_ip: "192.168.0.111"
      traefik_consul_master: "consul.service.lab.consul"
  • Line 17: since the kibatic.traefik role requires systemd to restart the Traefik service, the entire role would fail on our Vagrant boxes running Ubuntu Trusty. So, purely as a workaround, I added ignore_errors: True to make Ansible also perform the post_tasks. In the next post of this series we are going to move to a more recent version of Ubuntu in order to address this issue.
  • Line 19: in a first post_task, to be performed after the kibatic.traefik role (line 31), we are going to copy the playbooks/apps/traefik/traefik.toml template onto the host running Traefik.
  • Line 23: in the second post_task we are copying the init script onto the same host.
  • Line 34: this should be set to the public IP of your infrastructure (in this case I’m using the local IP of node-1).

Run the playbook one more time and then restart all services:

/playbooks $ ansible-playbook hashistack.yml
...
/playbooks $ ansible-playbook restart.yml
...

If you now browse to the Traefik Web UI at http://192.168.0.111:8081/, you will be able to see the Consul catalog in real time:

Consul Catalog as shown by Traefik.

Sample Application

With the stack fully deployed, let’s see how to run an application on Nomad, register it in the Consul catalog, and access it via Traefik. For this post we will limit ourselves to a simple webserver, whereas in a future post we are going to focus on real applications/services that can be used during a penetration test and/or red teaming engagement.

We first need to create a Nomad job description (playbooks/apps/simpleserver.nomad):

host:~/hashistack ❯ cat playbooks/apps/simpleserver.nomad
job "simple-server" {
  datacenters = ["lab"]
  type = "service"
  group "simpleserver" {
    count = 1
    restart {
      attempts = 10
      interval = "5m"
      delay = "25s"
      mode = "delay"
    }
    ephemeral_disk {
      size = 300
      sticky = true
      migrate = true
    }

    task "httpserver0" {
      driver = "docker"

      config {
        image = "python:3.6.4-alpine3.7"
        command = "python3"
        args = [
          "-m",
          "http.server",
          "8000"
        ]
        port_map {
          http = 8000
        }
        dns_servers = [
          "consul.service.lab.consul"
        ]
        work_dir = "/var/www/html/"
      }

      resources {
        network {
          mbits = 1
          port "http" {
            static = 8000
          }
        }
      }

      service {
        name = "httpserver0"
        tags = [
          "traefik.tags=service",
          "traefik.frontend.rule=PathPrefixStrip:/0/",
        ]
        port = "http"
        check {
          name     = "alive"
          type     = "tcp"
          port     = "http"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }

  }
}
  • Lines 20-28: we are going to use the docker driver to run a python:3.6.4-alpine3.7 container, which will start a webserver listening on port 8000 (python3 -m http.server 8000).
  • Lines 50-53: we are using tags to override the default frontend rule, specifying that this app can be reached at the http://<IP>/0/ URL (more on this below).

Now, we could manually copy this file over to a host and then manually run Nomad against it, but since I wanted to automate things as much as possible, I created a new playbook to simplify launching new applications on the infrastructure:

host:~/hashistack ❯ cat playbooks/apps.yml
- hosts: nomad_deployer
  become: true
  become_user: root
  # TASKS
  tasks:
  - name: Deploy Simple Server
    block:
      - name: Copy Nomad Job
        copy:
          src: /playbooks/apps/simpleserver.nomad
          dest: /tmp/simpleserver.nomad
      - name: Run Nomad Job
        command: nomad job run /tmp/simpleserver.nomad

This playbook will first copy the job description to the nomad_deployer host (defined in playbooks/inventory/hosts), and then schedule it on Nomad:

/playbooks $ ansible-playbook apps.yml
SSH password:
Vault password:

PLAY [nomad_deployer] ******************************************************************

TASK [Gathering Facts] *****************************************************************
ok: [server02.consul]

TASK [Copy Nomad Job] ******************************************************************
changed: [server02.consul]

TASK [Run Nomad Job] *******************************************************************
changed: [server02.consul]

PLAY RECAP *****************************************************************************
server02.consul            : ok=3    changed=2    unreachable=0    failed=0
New Frontend Added to Traefik.
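
Before testing through Traefik, you can also confirm that the new service made it into the Consul catalog by querying Consul’s DNS interface directly from your host (the service name matches the one defined in the job specification):

host:~/hashistack ❯ dig +short -p 8600 @192.168.0.111 httpserver0.service.lab.consul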

As a quick smoke test, we can use curl to query some of the frontends:

❯ curl http://192.168.0.111/0/
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Directory listing for /</title>
</head>
<body>
<h1>Directory listing for /</h1>
<hr>
<ul>
</ul>
<hr>
</body>
</html>

❯ curl -H Host:vault.service.consul http://192.168.0.111/ui/
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta http-equiv="cache-control" content="no-store" />
    <meta http-equiv="expires" content="0" />
    <meta http-equiv="pragma" content="no-cache" />
    <title>Vault</title>
    <meta name="viewport" content="width=device-width, initial-scale=1">
  </head>
  <body>

    <script src="/ui/assets/vendor-9e6bb7bde315fb2e3f4c03b643b80ad9.js" integrity="sha256-qY6bwmjLIRtXBYXqbjGa6A04QwXcy4ZqdNJCSXwUwvc= sha512-rC/1hBN/UP8hl3mrFS2NvdwrW0+f1rPtI6F+WbXtfoJnuNscqAzdTs47Ej3Gwn9LUUOhrKcbj+aAr7PEVlbFcQ==" ></script>
    <script src="/ui/assets/vault-6f5373908448768c1c4fcffcf835b5fd.js" integrity="sha256-pcQtWwBPivXuTQYaevxDsmXZ236qdCuMNfUHaAt00DM= sha512-4+rJ5+jQQctOCD2dBp2T9/b2RnVq910SnsnceA25INnXUTGx2aSWklpT2xkh0rDsz+qeQXLNDh/rKl6xK4mc2A==" ></script>

    <div id="ember-basic-dropdown-wormhole"></div>
  </body>
</html>

Conclusion

This post was Part 2 of the “Offensive Infrastructure with Modern Technologies” series, where I hope I provided enough information to enable you to automatically deploy the full HashiCorp stack with Ansible.

The full source code used in this article, together with a handy cheatsheet, can be found in the related GitHub repository: https://github.com/marco-lancini/offensive-infrastructure.

In Part 3 we will continue on this path by hardening the setup we deployed in this post.

I hope you found this post useful and interesting, and I’m keen to get feedback on it! So, if you find the information shared in this series useful, or if something is missing, or if you have ideas on how to improve it, please leave a comment below, or let me know on Twitter.