This post is Part 1 of the “Offensive Infrastructure with Modern Technologies” series, and is going to provide an introduction to the HashiCorp suite, and to Consul in particular.

We will pave the way for building an offensive infrastructure by explaining how Consul can be used as a service mesh, both in single-node and multi-node deployments, and in simple as well as hardened configurations.

The HashiCorp Stack

The HashiCorp stack is going to play a pivotal role in the setup of this infrastructure, as it provides consistent workflows to provision, secure, connect, and run any infrastructure for any application. Plus, it is even open source!

At a high level, the suite is composed of six main components, as shown in the image below:

  • Vagrant: a tool for building and managing virtual machines in a single workflow, by providing an easy way to configure reproducible and portable work environments.
  • Packer: a tool for creating identical machine images for multiple platforms from a single source configuration. It enables modern configuration management by using automated scripts to install and configure the software within Packer-made images. We will see how we can plug Ansible into Packer.
  • Terraform: a tool for building, changing, and versioning infrastructure. It is an open source tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
  • Vault: a tool for securely accessing secrets (e.g., API keys, passwords, or certificates). Vault provides a unified interface to any secret, while providing tight access control and recording a detailed audit log.
  • Nomad: a tool for managing a cluster of machines and running applications on them. Nomad abstracts away machines and the location of applications, and instead enables users to declare what they want to run. Nomad will then handle where they should run and how to run them.
  • Consul: a service mesh solution providing a fully featured control plane with service discovery, configuration, KV store, and segmentation functionality.
The HashiCorp suite. Image courtesy of Discoposse.

In addition, we are also going to rely on:

  • Ansible: an engine that automates provisioning, configuration management, application deployment, and intra-service orchestration.
  • Docker: now ubiquitous, we will use it to containerize all services/applications.

In this post we start by focusing on Consul, but we will go into details with Packer, Terraform, and Vault as we progress in the series.

Consul as a Service Mesh

Consul will be the building block for this setup, thanks to its main features:

  • Service Discovery (Connectivity): the service registry enables services to register and discover each other.
  • Service Segmentation (Security): secure service-to-service communication with automatic TLS encryption and identity-based authorization.
  • Service Configuration (Runtime Configuration): a key/value store to easily configure services.
  • Key/Value Store (Deployments): Consul can be used as a backend for both Vault and Terraform.

Although describing every single feature in detail would require an entire blog post on its own, I’m going to cover the majority of these aspects while building the infrastructure. In addition, the HashiCorp website has very good documentation, which is definitely worth a read.

What is important to keep in mind before we start is Consul’s high-level architecture:

Consul's architecture. Image courtesy of HashiCorp.

The Hardware Prerequisites

For this post I’m assuming you’ll have access to a hypervisor of some sort (preferably VMware) which you can use for network virtualization and resource allocation. A cloud-based setup is going to be covered in a later article in this series.

In the current setup, summarized in the figure below, an “Admin VPN” connection can be used by admins (i.e., myself) to connect to the vSphere Web Client and manage the hypervisor itself. In addition, I created a separate network (let’s call it “Production Network”) which we are going to assign to every new virtual machine we create. Members of the security team will have access to this network by connecting to a dedicated VPN.

The VMware Network Setup.

The same end goal can be achieved by using a single VPN connection (or by just connecting everything to the same local network), but here I opted for a more clearly defined segregation of roles.

For this blog post, and in order to get accustomed to Consul, I created two Ubuntu Server virtual machines and attached them to the “Production Network”:

  • TESTING-NODE1 with IP address 10.10.100.11
  • TESTING-NODE2 with IP address 10.10.100.12

Consul - Basic Configuration

Single Node Deployment

In its most basic configuration, we are going to have Consul, dnsmasq, and the containers hosting our applications all on the same host, as depicted in the image below.

High-Level Network Diagram: Single Node Configuration.

This section is going to focus on how to achieve this setup on TESTING-NODE1.

Configuring dnsmasq

By default, DNS is served from port 53 and, on most operating systems, this requires elevated privileges. As per HashiCorp’s documentation on DNS, instead of running Consul with an administrative or root account, it is possible to forward appropriate queries to Consul (running on an unprivileged port) from another DNS server or a port redirect.

First, we need to install dnsmasq:

user@TESTING-NODE1 ❯ sudo apt install dnsmasq

Once installed, dnsmasq is typically configured via a dnsmasq.conf file or a series of files in the /etc/dnsmasq.d directory. In dnsmasq’s configuration file (e.g., /etc/dnsmasq.d/consul.conf), add the following:

user@TESTING-NODE1 ❯ cat /etc/dnsmasq.d/consul.conf
# Enable forward lookup of the 'consul' domain:
server=/consul/10.10.100.11#8600

Remember that the IP 10.10.100.11 above is the IP of TESTING-NODE1 within the “Production Network”. Once that configuration is created, restart the dnsmasq service.

user@TESTING-NODE1 ❯ sudo /etc/init.d/dnsmasq restart
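
As an optional sanity check, once the Consul agent from the next section is up and listening on port 8600, you can verify that dnsmasq forwards the consul domain correctly by resolving a .consul name through it:

user@TESTING-NODE1 ❯ dig @127.0.0.1 +short consul.service.consul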

Configuring Consul

Each host in a Consul cluster runs the Consul agent, a long running daemon that can be started in client or server mode. Each cluster has at least 1 agent in server mode, and usually 3 or 5 for high availability. The server agents participate in a consensus protocol, maintain a centralized view of the cluster’s state, and respond to queries from other agents in the cluster. The rest of the agents in client mode participate in a gossip protocol to discover other agents and check them for failures, and they forward queries about the cluster to the server agents.

These concepts also apply when running Consul in Docker. Typically, we will need to run a single Consul agent container on each host, running alongside the Docker daemon. For this single node deployment, we are going to have one single Consul agent deployed in server mode.

Let’s start by creating two directories that we will use for the remainder of the series:

  • /docker/data/: will be used as a mount point for volumes passed to our Docker containers, so as to persist their data
  • /docker/services/: will be used to host the containers’ configurations (i.e., the various docker-compose.yml and other configuration files)
user@TESTING-NODE1 ❯ sudo mkdir /docker
user@TESTING-NODE1 ❯ sudo chown user:user /docker/
user@TESTING-NODE1 ❯ mkdir -p /docker/{data,services}
user@TESTING-NODE1 ❯ tree /docker/
/docker/
├── data
└── services

2 directories, 0 files

With the basic folders in place, it’s time to create an additional directory (/docker/services/consul_server/) that will host the specification for our first container:

user@TESTING-NODE1 ❯ tree /docker/services/consul_server/
/docker/services/consul_server/
├── config
│   └── local.json
└── docker-compose.yml

1 directory, 2 files

As you can see from the directory tree above, we are going to use docker-compose to spin up a Consul server. Here is the content of the docker-compose.yml file:

user@TESTING-NODE1:/docker/services/consul_server ❯ cat docker-compose.yml
version: '2'

services:
    # ------------------------------------------------------------------------------------
    # CONSUL SERVER
    # ------------------------------------------------------------------------------------
    consul_server:
        container_name: consul_server
        image: consul:latest
        restart: always
        network_mode: "host"
        volumes:
            - /docker/data/consul/:/consul/data/
            - ./config/:/consul/config/

Let’s analyze this file (almost) line by line:

  • Line 10: we are going to use the official Docker image of Consul.
  • Line 12: as suggested by HashiCorp, Consul should always be run with --net=host in Docker, because Consul’s consensus and gossip protocols are sensitive to delays and packet loss, so the extra layers involved with other networking types are usually undesirable and unnecessary.
  • Line 14: the Consul container exposes by default the volume /consul/data, a path where Consul will place its persisted state. Here we map this folder to /docker/data/consul/ on the host, so as to persist data across restarts and failures. Note that, for client agents, this directory stores some information about the cluster and the client’s health checks in case the container is restarted. For server agents, it stores the client information, plus snapshots and data related to the consensus algorithm, and other state like Consul’s key/value store and catalog.
  • Line 15: the container has a Consul configuration directory set up at /consul/config, here mapped to ./config/ on the host. The agent will load any configuration file placed there.

The other key component is the config/local.json file, which will provide a configuration to the Consul agent:

user@TESTING-NODE1:/docker/services/consul_server ❯ cat config/local.json
{
    "log_level": "INFO",
    "server": true,
    "ui": true,
    "bootstrap": true,
    "client_addr":"0.0.0.0",
    "bind_addr":"10.10.100.11"
}
  • Line 4: the server flag is used to control if an agent is in server or client mode. Here we start it in server mode.
  • Line 5: the ui flag enables the built-in web UI server and the required HTTP routes (more on this below).
  • Line 6: the bootstrap flag is used to control if a server is in “bootstrap” mode. Technically, a server in bootstrap mode is allowed to self-elect as the Raft leader. It is important that only a single node is in this mode; otherwise, consistency cannot be guaranteed as multiple nodes are able to self-elect themselves.
  • Line 7: the client_addr specifies the address to which Consul will bind client interfaces, including the HTTP and DNS servers. By default, this is 127.0.0.1, allowing only loopback connections. Here we set it to bind to all interfaces (i.e., 0.0.0.0). Since this makes the Consul REST API accessible to everyone, we will have to configure Consul’s ACLs to restrict access.
  • Line 8: bind_addr specifies the address that should be bound to for internal cluster communications. This is an IP address that should be reachable by all other nodes in the cluster. Here we are using 10.10.100.11 which is the IP of TESTING-NODE1 within the “Production Network”.

Starting Consul

We can now use docker-compose to create and start the Consul container. At startup, the agent will read the JSON config files from /consul/config, and data will be persisted in the /consul/data volume.

user@TESTING-NODE1:/docker/services/consul_server ❯ docker-compose up -d

When you run the Consul agent, it listens on six different ports, each serving a different function:

  • Server RPC (8300/TCP): used by servers to handle incoming requests from other agents.
  • Serf LAN (8301/TCP and UDP): used by all agents to handle gossip in the LAN.
  • Serf WAN (8302/TCP and UDP): used by servers to gossip over the WAN to other servers.
  • CLI RPC (8400/TCP): used by all agents to handle RPC from the CLI.
  • HTTP API (8500/TCP): used by clients to talk to the HTTP API, and by servers to handle HTTP API requests from clients and the web UI.
  • DNS Interface (8600/TCP and UDP): used to resolve DNS queries.

Once started, we can inspect the container’s logs to verify Consul started properly. From the output below you can see that TESTING-NODE1 elected itself as leader for the cluster:

user@TESTING-NODE1:/docker/services/consul_server ❯ docker logs consul_server
bootstrap = true: do not enable unless necessary
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v1.2.2'
           Node ID: 'f9e86344-9522-f14a-8d38-4f01924282d9'
         Node name: 'TESTING-NODE1'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: true)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 10.10.100.11 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2018/08/22 20:25:34 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:f9e86344-9522-f14a-8d38-4f01924282d9 Address:10.10.100.11:8300}]
    2018/08/22 20:25:34 [INFO] serf: EventMemberJoin: TESTING-NODE1.dc1 10.10.100.11
    2018/08/22 20:25:34 [INFO] raft: Node at 10.10.100.11:8300 [Follower] entering Follower state (Leader: "")
    2018/08/22 20:25:34 [WARN] serf: Failed to re-join any previously known node
    2018/08/22 20:25:34 [INFO] serf: EventMemberJoin: TESTING-NODE1 10.10.100.11
    2018/08/22 20:25:34 [INFO] consul: Adding LAN server TESTING-NODE1 (Addr: tcp/10.10.100.11:8300) (DC: dc1)
    2018/08/22 20:25:34 [WARN] serf: Failed to re-join any previously known node
    2018/08/22 20:25:34 [INFO] consul: Handled member-join event for server "TESTING-NODE1.dc1" in area "wan"
    2018/08/22 20:25:34 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
    2018/08/22 20:25:34 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
    2018/08/22 20:25:34 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
    2018/08/22 20:25:34 [INFO] agent: started state syncer
    2018/08/22 20:25:41 [ERR] agent: failed to sync remote state: No cluster leader
    2018/08/22 20:25:41 [WARN] raft: Heartbeat timeout from "" reached, starting election
    2018/08/22 20:25:41 [INFO] raft: Node at 10.10.100.11:8300 [Candidate] entering Candidate state in term 2
    2018/08/22 20:25:41 [INFO] raft: Election won. Tally: 1
    2018/08/22 20:25:41 [INFO] raft: Node at 10.10.100.11:8300 [Leader] entering Leader state
    2018/08/22 20:25:41 [INFO] consul: cluster leadership acquired
    2018/08/22 20:25:41 [INFO] consul: New leader elected: TESTING-NODE1
    2018/08/22 20:25:41 [INFO] consul: member 'TESTING-NODE1' joined, marking health alive
    2018/08/22 20:25:43 [INFO] agent: Synced node info
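
Besides reading the logs, we can also ask the agent directly whether a leader has been elected, and check that the DNS interface answers (two optional sanity checks against the ports listed above):

user@TESTING-NODE1:/docker/services/consul_server ❯ curl http://127.0.0.1:8500/v1/status/leader
"10.10.100.11:8300"
user@TESTING-NODE1:/docker/services/consul_server ❯ dig @127.0.0.1 -p 8600 +short consul.service.consul
10.10.100.11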

We can now also load the Web UI, available on port 8500 on TESTING-NODE1:

  • http://10.10.100.11:8500/ui/
The Consul Web UI.
The Nodes Composing the Cluster.

Registering Containers

There are several approaches you can use to register services running in containers with Consul.

The first one consists of manual configuration, where the containers use the local agent’s HTTP API to register and deregister themselves. I personally find this approach quite cumbersome, and not really viable, because it requires modifying every pre-existing container.
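
For reference, here is a minimal sketch of what that manual approach looks like against the local agent’s HTTP API (the service name and port are made up for illustration; deregistration works similarly via the /v1/agent/service/deregister/<service-id> endpoint):

user@TESTING-NODE1 ❯ curl --request PUT \
                  --data '{"Name": "webservice", "Port": 80}' \
                  http://127.0.0.1:8500/v1/agent/service/register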

A second approach is provided by running containers under HashiCorp’s Nomad scheduler, which has first class support for Consul. The Nomad agent runs on each host alongside the Consul agent. When jobs are scheduled on a given host, the Nomad agent automatically takes care of syncing the Consul agent with the service information. Although quite interesting, I decided to leave Nomad out of this setup (at least for now).

Finally, there are open source solutions like Registrator from Glider Labs. Registrator works by running a Registrator instance on each host, alongside the Consul agent. Registrator monitors the Docker daemon for container stop and start events, and handles service registration with Consul using the container names and exposed ports as the service information.

Including Registrator in our setup is fairly easy. First of all, we need to add a new service to our docker-compose.yml file:

user@TESTING-NODE1:/docker/services/consul_server ❯ cat docker-compose.yml
version: '2'

services:
    # ------------------------------------------------------------------------------------
    # CONSUL SERVER
    # ------------------------------------------------------------------------------------
    ... as before ...
    # ------------------------------------------------------------------------------------
    # REGISTRATOR
    # ------------------------------------------------------------------------------------
    registrator:
        container_name: registrator
        image: gliderlabs/registrator:latest
        restart: always
        network_mode: "host"
        env_file:
            - .env
        volumes:
            - /var/run/docker.sock:/tmp/docker.sock
        depends_on:
            - consul_server
        dns: ${LOCAL_IP}
        command: consul://${LOCAL_IP}:8500

We also need to create a new .env file, so as to keep the docker-compose.yml file tidy and free of hardcoded values (lines 23 and 24). In particular, this file defines the LOCAL_IP variable, which points to the IP address of TESTING-NODE1 (in this case 10.10.100.11).

user@TESTING-NODE1:/docker/services/consul_server ❯ cat .env
LOCAL_IP=10.10.100.11

In the end, the /docker/services/consul_server/ directory should look like this:

user@TESTING-NODE1 ❯ tree -a /docker/services/consul_server/
/docker/services/consul_server/
├── config
│   └── local.json
├── docker-compose.yml
└── .env

1 directory, 3 files

Finally, we can start the Registrator:

user@TESTING-NODE1:/docker/services/consul_server ❯ docker-compose down
user@TESTING-NODE1:/docker/services/consul_server ❯ docker-compose up -d

Linking Container Images

Now that we have Registrator in place, we can start publishing our services into the Consul catalog.

Applications running on a host communicate only with their local Consul agent, using its HTTP APIs or DNS interface. As suggested by the HashiCorp’s documentation, services on the host are also registered with the local Consul agent, which syncs the information with the Consul servers. Doing the most basic DNS-based service discovery using Consul, an application queries for foo.service.consul and gets a subset of all the hosts providing service “foo”. This allows applications to locate services and balance the load without any intermediate proxies. Several HTTP APIs are also available for applications doing a deeper integration with Consul’s service discovery capabilities, as well as its other features such as the key/value store.

The key to making everything work is to ensure that our containers point to the right address when resolving DNS queries or connecting to Consul’s HTTP API. To do so, when starting new containers, we have to configure them to use the host’s dnsmasq server as their resolver: the --dns switch points them at dnsmasq (listening on port 53), which in turn forwards .consul queries to Consul’s DNS server. Here’s an example:

user@TESTING-NODE1:/docker/services/consul_server ❯ docker run -d --name=webservice \
                  -e CONSUL_HTTP_ADDR=TESTING-NODE1.node.consul:8500 \
                  -e SERVICE_NAME=webservice \
                  --dns 10.10.100.11 \
                  -P nginx:latest

We can verify that the registration succeeded by querying the logs of the registrator container, where we will see that a new service (webservice) exposing port 80 has been added to the catalog:

user@TESTING-NODE1:/docker/services/consul_server ❯ docker logs registrator
2018/08/22 20:45:43 Starting registrator v7 ...
2018/08/22 20:45:43 Using consul adapter: consul://10.10.100.11:8500
2018/08/22 20:45:43 Connecting to backend (0/0)
2018/08/22 20:45:43 consul: current leader
2018/08/22 20:45:43 Listening for Docker events ...
2018/08/22 20:45:43 Syncing services on 1 containers
2018/08/22 20:45:43 ignored: 5b75b4bb53f4 no published ports
2018/08/22 20:45:43 ignored: 5b16fec85606 no published ports
2018/08/22 20:45:53 added: 2a4764ce1f91 TESTING-NODE1:webservice:80
The new Webservice listed in the Catalogue.

We can also query the service catalog to obtain metadata related to this new service:

user@TESTING-NODE1:/docker/services/consul_server ❯ curl http://TESTING-NODE1.node.consul:8500/v1/catalog/service/webservice?pretty
[
    {
        "ID": "3f66f3d4-1aa0-625c-5d81-a70c64439a0c",
        "Node": "TESTING-NODE1",
        "Address": "10.10.100.11",
        "Datacenter": "dc1",
        "TaggedAddresses": {
            "lan": "10.10.100.11",
            "wan": "10.10.100.11"
        },
        "NodeMeta": {
            "consul-network-segment": ""
        },
        "ServiceKind": "",
        "ServiceID": "TESTING-NODE1:webservice:80",
        "ServiceName": "webservice",
        "ServiceTags": [],
        "ServiceAddress": "",
        "ServiceMeta": {},
        "ServicePort": 32768,
        "ServiceEnableTagOverride": false,
        "ServiceProxyDestination": "",
        "ServiceConnect": {
            "Native": false,
            "Proxy": null
        },
        "CreateIndex": 35,
        "ModifyIndex": 35
    }
]

As we can see, we are using Consul’s DNS resolver for the *.consul domain:

user@TESTING-NODE1:/docker/services/consul_server ❯ dig +short TESTING-NODE1.node.consul
10.10.100.11
user@TESTING-NODE1:/docker/services/consul_server ❯ dig +short webservice.service.consul
10.10.100.11
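
Note that an A record only returns the node’s address. Since Registrator registered the service on a dynamically mapped port (32768 in the catalog output above), an SRV query is needed to discover the port as well, for example:

user@TESTING-NODE1:/docker/services/consul_server ❯ dig webservice.service.consul SRV

The answer section of the SRV response carries the service port together with the node name.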

For anyone wanting to keep things simple, this setup can be considered complete. You can now add new services by spinning up Docker containers and linking them to the local Consul agent, as shown for the webservice example.

For everyone else, we are now moving to a multi-node deployment. But first, let’s remove the container running the webservice:

user@TESTING-NODE1:/docker/services/consul_server ❯ docker stop webservice && docker rm webservice

Multi Node Deployment

In this configuration, we are going to have one host playing the role of the “master” by running dnsmasq and Consul in server mode. All the other hosts will play the role of clients hosting Docker containers (and Consul in client mode).

High-Level Network Diagram: Multi Node Configuration.

In the previous section we already configured TESTING-NODE1 as a master, so this section is going to focus on how to configure TESTING-NODE2 to play the role of a client.

Configuring Consul & Registrator

As before, we need to create our working directories, /docker/data/ and /docker/services/:

user@TESTING-NODE2 ❯ sudo mkdir /docker
user@TESTING-NODE2 ❯ sudo chown user:user /docker/
user@TESTING-NODE2 ❯ mkdir -p /docker/{data,services}
user@TESTING-NODE2 ❯ tree /docker/
/docker/
├── data
└── services

2 directories, 0 files

With the basic folders in place, it’s time to create an additional directory (/docker/services/consul_client/) that will host the specification for the Consul client.

user@TESTING-NODE2 ❯ tree -a /docker/services/consul_client/
/docker/services/consul_client/
├── config
│   └── local.json
├── docker-compose.yml
└── .env

1 directory, 3 files

Here is the content of the docker-compose.yml file:

user@TESTING-NODE2:/docker/services/consul_client ❯ cat docker-compose.yml
version: '2'

services:
    # ------------------------------------------------------------------------------------
    # CONSUL CLIENT
    # ------------------------------------------------------------------------------------
    consul_client:
        container_name: consul_client
        image: consul:latest
        restart: always
        network_mode: "host"
        dns: ${MASTER_IP}
        env_file:
            - .env
        volumes:
            - /docker/data/consul/:/consul/data/
            - ./config/:/consul/config/

    # ------------------------------------------------------------------------------------
    # REGISTRATOR
    # ------------------------------------------------------------------------------------
    registrator:
        container_name: registrator
        image: gliderlabs/registrator:latest
        restart: always
        network_mode: "host"
        env_file:
            - .env
        volumes:
            - /var/run/docker.sock:/tmp/docker.sock
        depends_on:
            - consul_client
        dns: ${MASTER_IP}
        command: consul://${LOCAL_IP}:8500

Overall, this looks pretty similar to the one we created for our master node. The main difference is in lines 13 and 34: to make DNS resolution transparently available to the containers, we need to force the use of TESTING-NODE1 (10.10.100.11) as nameserver. Here we avoid hardcoding the actual IP by using the MASTER_IP variable defined in the .env file (lines 14-15).

The more substantial differences are in the configuration provided to the Consul agent (config/local.json):

user@TESTING-NODE2:/docker/services/consul_client ❯ cat config/local.json
{
    "log_level": "INFO",
    "server": false,
    "client_addr":"0.0.0.0",
    "bind_addr":"10.10.100.12",
    "retry_join": ["TESTING-NODE1.node.consul"]
}
  • Line 4: the server flag is used to control if an agent is in server or client mode. Here we start it in client mode.
  • Line 6: bind_addr specifies the address that should be bound to for internal cluster communications. This is an IP address that should be reachable by all other nodes in the cluster. Here we are using 10.10.100.12 which is the IP of TESTING-NODE2 within the “Production Network”.
  • Line 7: retry_join specifies the address of another agent in the cluster to join at startup. Here we are pointing it to our master node (TESTING-NODE1).

As for the master node, we use a .env file to avoid hardcoding IP addresses in the main docker-compose.yml file. Here, MASTER_IP points to the IP address of our master node (10.10.100.11), whereas LOCAL_IP is simply the local IP of our client node (10.10.100.12).

user@TESTING-NODE2:/docker/services/consul_client ❯ cat .env
MASTER_IP=10.10.100.11
LOCAL_IP=10.10.100.12

Starting Consul

We can now start Consul and Registrator with docker-compose, and inspect the container’s logs to verify Consul started properly. From the output below you can see that TESTING-NODE2 started in client mode (Server: false) and then joined TESTING-NODE1.node.consul:

user@TESTING-NODE2:/docker/services/consul_client ❯ docker-compose up -d
user@TESTING-NODE2:/docker/services/consul_client ❯ docker logs consul_client
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v1.2.2'
           Node ID: '80f9fffd-4f7a-3bc5-8485-aab7502941f0'
         Node name: 'TESTING-NODE2'
        Datacenter: 'dc1' (Segment: '')
            Server: false (Bootstrap: false)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 10.10.100.12 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2018/08/28 20:12:15 [INFO] serf: EventMemberJoin: TESTING-NODE2 10.10.100.12
    2018/08/28 20:12:15 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
    2018/08/28 20:12:15 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
    2018/08/28 20:12:15 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
    2018/08/28 20:12:15 [INFO] agent: started state syncer
    2018/08/28 20:12:15 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce os packet scaleway softlayer triton vsphere
    2018/08/28 20:12:15 [INFO] agent: Joining LAN cluster...
    2018/08/28 20:12:15 [INFO] agent: (LAN) joining: [TESTING-NODE1.node.consul]
    2018/08/28 20:12:15 [WARN] manager: No servers available
    2018/08/28 20:12:15 [ERR] agent: failed to sync remote state: No known Consul servers
    2018/08/28 20:12:15 [INFO] serf: EventMemberJoin: TESTING-NODE1 10.10.100.11
    2018/08/28 20:12:15 [WARN] memberlist: Refuting a suspect message (from: TESTING-NODE2)
    2018/08/28 20:12:15 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2018/08/28 20:12:15 [INFO] agent: Join LAN completed. Synced with 1 initial agents
    2018/08/28 20:12:15 [INFO] consul: adding server TESTING-NODE1 (Addr: tcp/10.10.100.11:8300) (DC: dc1)
    2018/08/28 20:13:00 [INFO] agent: Synced node info
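
As an additional (optional) check, the consul members command, run inside either container, should now list both nodes, with TESTING-NODE1 of type server and TESTING-NODE2 of type client:

user@TESTING-NODE2:/docker/services/consul_client ❯ docker exec consul_client consul members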

The Web UI can give us additional confirmation of the correct setup of this new node:

The Nodes Composing the Cluster.

Starting Services

Now that we have a Consul agent running locally, and Registrator ready to pick up newly deployed containers, we are finally ready to start our services.

Here we are going to use a simple nginx container to test our setup, but in the next post of this series we are going to start deploying services useful for penetration testing activities:

user@TESTING-NODE2 ❯ docker run -d --name=webservice2 \
                  -e CONSUL_HTTP_ADDR=TESTING-NODE1.node.consul:8500 \
                  -e SERVICE_NAME=webservice2 \
                  --dns 10.10.100.11 \
                  -P nginx:latest

We can verify that the registration succeeded by querying the logs of the registrator container, where we will see that a new service (webservice2) exposing port 80 has been added to the catalog:

user@TESTING-NODE2 ❯ docker logs registrator
2018/08/28 20:32:20 Starting registrator v7 ...
2018/08/28 20:32:20 Using consul adapter: consul://10.10.100.12:8500
2018/08/28 20:32:20 Connecting to backend (0/0)
2018/08/28 20:32:20 consul: current leader  10.10.100.11:8300
2018/08/28 20:32:20 Listening for Docker events ...
2018/08/28 20:32:20 Syncing services on 1 containers
2018/08/28 20:32:20 ignored: 56f18e412f36 no published ports
2018/08/28 20:32:20 ignored: 5f1a68b03959 no published ports
2018/08/28 20:33:40 added: a6abc1317900 TESTING-NODE2:webservice2:80
The new Webservice2 listed in the Catalogue.
The new Webservice2 hosted on TESTING-NODE2.

You can now add new services by spinning up Docker containers, and even new “client” nodes. This setup should fit most purposes, but I think some hardening is needed. Therefore, the next section will focus on a hardened configuration of this setup.


Consul - Hardened Configuration

Running Consul as a Non-Privileged User

This is a quick win, as the official docker image of Consul already uses gosu to run Consul as the non-root user “consul” for improved security.

Configuring Access Control Lists

The next step towards a more robust deployment is a proper configuration of Access Control Lists (ACLs).

Indeed, Consul provides an optional ACL system which can be used to control access to data and APIs. The ACL system is capability-based and relies on tokens to which fine-grained rules can be applied. In many ways, its approach is similar to that of AWS IAM.

Tokens and Policies

Tokens are the crucial part of Consul’s ACL setup. Every token has an ID, name, type, and rule set:

  • The ID is a randomly generated UUID, making it infeasible to guess.
  • The name is opaque to Consul and human readable.
  • The type is either “client” (meaning the token cannot modify ACL rules) or “management” (meaning the token is allowed to perform all actions).
  • The rule set (or policy) controls which Consul resources the token has access to.

The token ID is passed along with each RPC request to the servers. If no token is provided, the rules associated with a special, configurable anonymous token are automatically applied.
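
In practice, a token is supplied to the HTTP API via the X-Consul-Token header, while tools such as Registrator read it from the CONSUL_HTTP_TOKEN environment variable (we will use both later on). As an illustrative example, with <token-id> being a placeholder for a real token ID:

user@TESTING-NODE1 ❯ curl --header "X-Consul-Token: <token-id>" http://10.10.100.11:8500/v1/catalog/services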

ACL policies are written in HCL (i.e., the HashiCorp Configuration Language), and can be created programmatically via the Consul API or via the Web UI. For now, we just have to remember that Consul provides three types of policies:

  • read: for reading data.
  • write: for reading and writing.
  • deny: to deny reading and writing privileges.

Policies can operate in either a whitelist or a blacklist mode, depending on the value of the acl_default_policy configuration parameter. By default, Consul uses an allow-all policy, which, as the name states, allows all actions. Since this is probably not the best approach, we will set the default policy to deny all actions, and then use token rules to whitelist only the specific actions we need.
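
As a toy illustration of the whitelist approach (not one of the policies we will actually create in this series), with acl_default_policy set to deny, a token whose rules contain only the following could read keys under app/config/ in the key/value store and nothing else:

key "app/config/" {
    policy = "read"
}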

Enabling ACLs on Consul Servers

To enable ACLs, we need to configure all servers in our cluster (in this case, only TESTING-NODE1), by adding a new configuration file, config/master-token.json.

First, we need to create both a “master token” and an “agent token”. We can use uuidgen for this:

user@TESTING-NODE1:/docker/services/consul_server ❯ uuidgen
D1A1A4BD-AAA9-4178-B517-5A5664DD7292
user@TESTING-NODE1:/docker/services/consul_server ❯ uuidgen
CD524DDA-1A52-2F09-4FD0-6C90674884A2

We can take this output and populate the content of config/master-token.json:

user@TESTING-NODE1:/docker/services/consul_server ❯ cat config/master-token.json
{
    "acl_master_token":"D1A1A4BD-AAA9-4178-B517-5A5664DD7292",
    "acl_agent_token": "CD524DDA-1A52-2F09-4FD0-6C90674884A2",
    "acl_datacenter":"dc1",
    "acl_default_policy":"deny",
    "acl_down_policy":"extend-cache"
}
  • Line 3: the acl_master_token holds the first ID we just generated with uuidgen.
  • Line 4: the acl_agent_token holds the second of the IDs just generated.
  • Line 6: we set the default policy to deny, so as to block everything and only allow the specific actions we whitelist.
  • Line 7: a down policy of extend-cache means that we will ignore token TTLs during an outage (not of much use for this post).

With this file in place, we need to restart all Consul Servers to make the ACLs active:

user@TESTING-NODE1:/docker/services/consul_server ❯ docker-compose down
user@TESTING-NODE1:/docker/services/consul_server ❯ docker-compose up -d

If you are monitoring the logs of the Consul server, you will see repeated errors related to missing ACLs start to appear. These errors show up because the agent token we referenced in the configuration has not been created yet, so the agent cannot perform even its most basic internal operations:

user@TESTING-NODE1:/docker/services/consul_server ❯ docker logs consul_server
...
    2018/09/06 19:01:35 [INFO] raft: Node at 10.10.100.11:8300 [Leader] entering Leader state
    2018/09/06 19:01:35 [INFO] consul: cluster leadership acquired
    2018/09/06 19:01:35 [INFO] consul: Created ACL master token from configuration
    2018/09/06 19:01:35 [INFO] consul: ACL bootstrap disabled, existing management tokens found
    2018/09/06 19:01:35 [INFO] consul: New leader elected: TESTING-NODE1
    2018/09/06 19:01:35 [INFO] connect: initialized CA with provider "consul"
    2018/09/06 19:01:35 [INFO] consul: member 'TESTING-NODE1' joined, marking health alive
    2018/09/06 19:01:35 [ERR] agent: failed to sync remote state: ACL not found
    2018/09/06 19:01:36 [ERR] agent: failed to sync remote state: ACL not found
    2018/09/06 19:02:01 [ERR] agent: failed to sync remote state: ACL not found
    2018/09/06 19:02:02 [ERR] agent: Coordinate update error: ACL not found

If you now open the Web UI of the Consul server (http://10.10.100.11:8500/ui/) and click on the “ACL” page (top bar), you can obtain an overview of all the tokens available in Consul:

Currently Active ACL Tokens.
  • Anonymous Token: with a type of “client” (as mentioned in the “Tokens and Policies” section above), this token allows seeing only the consul service, and nothing else. Due to the deny-all default policy, it can’t be used to create objects in the key/value store either. It will be used whenever no other token is provided during an HTTP/RPC call.
  • Master Token: of type “management”, it is allowed to perform any action.

As you can see, we do have a “master token” already configured, but we are missing the “agent token”. We are now going to provide Consul with the agent token we created with uuidgen, using the ACL API:

user@TESTING-NODE1:/docker/services/consul_server ❯ curl \
      --request PUT \
      --header "X-Consul-Token: D1A1A4BD-AAA9-4178-B517-5A5664DD7292" \
      --data \
         '{
         "Name": "Agent Token",
         "Type": "client",
         "Rules": "node \"\" { policy = \"write\" } service \"\" { policy = \"read\" }",
         "ID": "CD524DDA-1A52-2F09-4FD0-6C90674884A2"
         }' http://10.10.100.11:8500/v1/acl/create
{
    "ID": "CD524DDA-1A52-2F09-4FD0-6C90674884A2"
}
  • Line 3: we are passing the “master token” via the X-Consul-Token header, so to authorize this operation.
  • Line 6: we are naming this token as Agent Token.
  • Line 7: the type is client.
  • Line 8: this is the actual policy written in HCL. We are telling Consul to allow write and read (write) operations on resources of type node, and only read operations on services. Beautified, this policy looks like this:
node "" {
    policy = "write"
}
service "" {
    policy = "read"
}
  • Line 9: we also provide the ID to use, which is the same token we set in the acl_agent_token parameter of the config/master-token.json file.

If we refresh the “ACL” page on the Web UI, we can see that the new Agent Token has been created:

Currently Active ACL Tokens.

We can verify the server is working properly by double checking its logs, which will show the server started to sync node information successfully (agent: Synced node info):

user@TESTING-NODE1:/docker/services/consul_server ❯ docker logs consul_server
...
    2018/09/06 19:06:41 [INFO] consul: member 'TESTING-NODE1' joined, marking health alive
    2018/09/06 19:06:42 [ERR] agent: failed to sync remote state: ACL not found
    2018/09/06 19:06:45 [ERR] agent: failed to sync remote state: ACL not found
    2018/09/06 19:07:03 [ERR] agent: Coordinate update error: ACL not found
    2018/09/06 19:07:08 [INFO] agent: Synced node info

Now, there is a caveat: Consul currently doesn’t persist the acl_agent_token, and therefore the token must be set every time the agent restarts. There is also an issue open on GitHub to track this behavior. Hopefully HashiCorp will introduce this feature soon. In the meantime, let’s remember to re-perform the curl request above every time we restart the server.

Creating a Registrator Policy

Since we specified a deny-all default policy, Consul will now prevent Registrator from registering new services. Indeed, if we try to start a new container, we will see the following in the Registrator logs:

user@TESTING-NODE1:/docker/services/consul_server ❯ docker logs registrator
...
2018/09/06 19:42:20 register failed: &{TESTING-NODE1:webservice:80 webservice 32775
[] map[] 0 {32775  80 172.17.0.2 tcp e31e351309ee
e31e351309ee11c797d4d395de43f120476089abe2760dd1ccb888eb93cdd3a7  0xc2080ac000}}
Unexpected response code: 403 (Permission denied)

To fix this, we will first have to create an ACL specific to Registrator, and then update its configuration to reflect this new policy. Let’s start by generating a new ID:

user@TESTING-NODE1:/docker/services/consul_server ❯ uuidgen
A9955E0C-8C96-4F60-8974-716B41B4C55B

Now that we have a new ID, we have to introduce the CONSUL_HTTP_TOKEN environment variable with this ID as value, so that Registrator can pick it up at startup. We can simply append this variable to the .env file and then restart our services:

user@TESTING-NODE1:/docker/services/consul_server ❯ cat .env
LOCAL_IP=10.10.100.11
CONSUL_HTTP_TOKEN=A9955E0C-8C96-4F60-8974-716B41B4C55B

user@TESTING-NODE1:/docker/services/consul_server ❯ docker-compose down
user@TESTING-NODE1:/docker/services/consul_server ❯ docker-compose up -d

As mentioned above, Consul doesn’t persist tokens across restarts, so we will first have to recreate the Agent Token (the curl request shown in the previous section). Once recreated, we can proceed by creating the Registrator Token using the ACL API:

user@TESTING-NODE1:/docker/services/consul_server ❯ curl \
      --request PUT \
      --header "X-Consul-Token: D1A1A4BD-AAA9-4178-B517-5A5664DD7292" \
      --data \
         '{
         "Name": "Registrator",
         "Type": "client",
         "Rules": "service \"\" { policy = \"write\" }",
         "ID": "A9955E0C-8C96-4F60-8974-716B41B4C55B"
         }' http://10.10.100.11:8500/v1/acl/create
{
    "ID": "A9955E0C-8C96-4F60-8974-716B41B4C55B
}
  • Line 6: we name this token Registrator.
  • Line 7: the type is client.
  • Line 8: the policy grants write privileges over service objects, allowing Registrator to register new services. Beautified, it looks like this:
service "" {
  policy = "write"
}
  • Line 9: as ID we are providing the one we generated moments ago with uuidgen.
Currently Active ACL Tokens.

If, as a test, we start another nginx instance, the Registrator logs will show that the container has been added to the Consul catalog:

user@TESTING-NODE1:/docker/services/consul_server ❯ docker logs registrator
...
2018/09/06 20:04:10 added: 0399fb89d5a1 TESTING-NODE1:webservice:80

Configuring the Anonymous Policy for DNS Resolving

At this point, ACLs are bootstrapped with the “agent” and “Registrator” tokens we configured, but there are no other policies set up apart from those two. Even basic operations like DNS resolution will be restricted by the ACL default policy of deny.

user@TESTING-NODE1:/docker/services/consul_server ❯ dig @127.0.0.1 consul.service.consul

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @127.0.0.1 consul.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 57125
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;consul.service.consul.         IN      A

;; AUTHORITY SECTION:
consul.                 0       IN      SOA     ns.consul. hostmaster.consul. 1536325067 3600 600 86400 0

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; MSG SIZE  rcvd: 100

To solve this, we can use the policies associated with the special anonymous token to configure Consul’s behavior when no token is supplied. The anonymous token is managed like any other ACL token, except that the term “anonymous” is used for the ID.

Here we are going to give the anonymous token read privileges for all nodes and for the consul service. Like the others, this request will also need to be re-performed after every restart:

user@TESTING-NODE1:/docker/services/consul_server ❯ curl \
    --request PUT \
    --header "X-Consul-Token: D1A1A4BD-AAA9-4178-B517-5A5664DD7292" \
    --data \
        '{
        "ID": "anonymous",
        "Type": "client",
        "Rules": "node \"\" { policy = \"read\" } service \"consul\" { policy = \"read\" }"
        }' http://127.0.0.1:8500/v1/acl/update
{
    "ID": "anonymous"
}

The anonymous token is implicitly used if no token is supplied, so now we can run DNS lookups without supplying a token (also because there is no way to pass a token as part of a DNS request):

user@TESTING-NODE1:/docker/services/consul_server ❯ dig @127.0.0.1 consul.service.consul
; <<>> DiG 9.10.3-P4-Ubuntu <<>> @127.0.0.1 consul.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5238
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;consul.service.consul.         IN      A

;; ANSWER SECTION:
consul.service.consul.  0       IN      A       10.10.100.11

;; ADDITIONAL SECTION:
consul.service.consul.  0       IN      TXT     "consul-network-segment="

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; MSG SIZE  rcvd: 102

Persisting Tokens

As I was mentioning in the section above, at the moment Consul can’t persist tokens across restarts (see the related issue on GitHub), hence we are forced to manually re-perform the curl requests needed to reconfigure the “agent”, “registrator”, and “anonymous” tokens every time we need to restart our containers.

Since I am still building the infrastructure, I find myself having to restart the services frequently, as I incrementally add new features. To avoid wasting time by copy-pasting those curl requests over and over, I created a quick bash script that will first start the containers, and then recreate the tokens we need:

user@TESTING-NODE1:/docker/services/consul_server ❯ cat run.sh
#!/bin/bash

# Start (or restart) the containers
docker-compose down
docker-compose up -d

# Wait for the Consul HTTP API to be responsive
until echo | nc 127.0.0.1 8500 > /dev/null; do
  echo "Waiting for Consul to start..."
  sleep 1
done

# Recreate tokens
echo "[+] Recreating Agent Token..."
curl \
    --request PUT \
    --header "X-Consul-Token: D1A1A4BD-AAA9-4178-B517-5A5664DD7292" \
    --data \
        '{
        "Name": "Agent Token",
        "Type": "client",
        "Rules": "node \"\" { policy = \"write\" } service \"\" { policy = \"read\" }",
        "ID": "CD524DDA-1A52-2F09-4FD0-6C90674884A2"
        }' http://10.10.100.11:8500/v1/acl/create

echo "[+] Recreating Registrator Token..."
curl \
    --request PUT \
    --header "X-Consul-Token: D1A1A4BD-AAA9-4178-B517-5A5664DD7292" \
    --data \
        '{
        "Name": "Registrator",
        "Type": "client",
        "Rules": "service \"\" { policy = \"read\" }",
        "ID": "A9955E0C-8C96-4F60-8974-716B41B4C55B"
        }' http://10.10.100.11:8500/v1/acl/create

echo "[+] Recreating Anonymous Token..."
curl \
    --request PUT \
    --header "X-Consul-Token: D1A1A4BD-AAA9-4178-B517-5A5664DD7292" \
    --data \
        '{
        "ID": "anonymous",
        "Type": "client",
        "Rules": "node \"\" { policy = \"read\" } service \"consul\" { policy = \"read\" }"
        }' http://127.0.0.1:8500/v1/acl/update

Note that this is just a temporary solution while I’m still in dev mode, as I hardcoded all the tokens in this script. Now, whenever we have to restart our services, we can simply run ./run.sh (after making it executable with chmod +x run.sh).

Enabling ACLs on Consul Clients

We enabled ACLs on our server, and we configured the Registrator and anonymous tokens. The last thing left to properly configure ACLs in Consul is to set up our clients (in this case, only TESTING-NODE2).

Since ACL enforcement also occurs on the Consul clients, we need to also restart them with a configuration file (config/master-token.json) that enables ACLs:

user@TESTING-NODE2:/docker/services/consul_client ❯ cat config/master-token.json
{
    "acl_agent_token": "CD524DDA-1A52-2F09-4FD0-6C90674884A2",
    "acl_datacenter":"dc1",
    "acl_down_policy":"extend-cache"
}

We used the same ACL agent token that we created for the servers, which will work since it was not specific to any node or set of service prefixes. In a more locked-down environment it is recommended that each client get an ACL agent token with node write privileges for just its own node name prefix, and service read privileges for just the service prefixes expected to be registered on that client.
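
As a sketch of what such a locked-down agent token could look like for TESTING-NODE2 (using the same rule syntax as above; we are not applying it in this walkthrough), its policy would restrict node write access to that node's name:

node "TESTING-NODE2" {
    policy = "write"
}
service "" {
    policy = "read"
}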

Then, we have to specify the CONSUL_HTTP_TOKEN for Registrator to work on this host as well:

user@TESTING-NODE2:/docker/services/consul_client ❯ cat .env
MASTER_IP=10.10.100.11
LOCAL_IP=10.10.100.12
CONSUL_HTTP_TOKEN=A9955E0C-8C96-4F60-8974-716B41B4C55B

That’s it for the setup: after a restart, this host will join the cluster and start working as expected.

user@TESTING-NODE2:/docker/services/consul_client ❯ docker-compose down
user@TESTING-NODE2:/docker/services/consul_client ❯ docker-compose up -d

Enabling Gossip Encryption

The Consul agent supports encrypting all of its network traffic. There are two separate encryption systems, one for gossip traffic and one for RPC.

Enabling gossip encryption only requires setting an encryption key when starting the Consul agent, via the encrypt parameter. This parameter specifies the secret key (16 bytes, Base64-encoded) to use for encryption of Consul’s internal network traffic. We are therefore going to expand our configuration by adding a new file (config/encrypt.json) containing this encryption key.

All nodes within a cluster must share the same encryption key to communicate, so the same file should be placed on all hosts running Consul that will be part of the same cluster. The provided key is then automatically persisted to the data directory and loaded automatically whenever the agent is restarted.

The easiest way to create an encryption key is to use the keygen command provided by the Consul container:

user@TESTING-NODE1:/docker/services/consul_server ❯ docker run --rm --entrypoint consul consul:latest keygen
HM2a16fQ/u78/z7EPODJ/A==

We use the output of this command and place it in the config/encrypt.json file on both our nodes:

user@TESTING-NODE1:/docker/services/consul_server ❯ cat config/encrypt.json
{
    "encrypt": "HM2a16fQ/u78/z7EPODJ/A=="
}

user@TESTING-NODE2:/docker/services/consul_client ❯ cat config/encrypt.json
{
    "encrypt": "HM2a16fQ/u78/z7EPODJ/A=="
}

If we now restart our services, we will see that Consul enabled encryption for gossip traffic (Gossip: true on line 13 below):

user@TESTING-NODE1:/docker/services/consul_server ❯ docker-compose down
user@TESTING-NODE1:/docker/services/consul_server ❯ docker-compose up -d
user@TESTING-NODE1:/docker/services/consul_server ❯ docker logs consul_server
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v1.2.2'
           Node ID: 'd7d261e1-2324-c10c-21ee-c02bc100b493'
         Node name: 'TESTING-NODE1'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: true)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 10.10.100.11 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2018/09/04 13:21:56 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:d7d261e1-2324-c10c-21ee-c02bc100b493 Address:10.10.100.11:8300}]
    2018/09/02 13:21:56 [INFO] serf: EventMemberJoin: TESTING-NODE1.dc1 10.10.100.11
    2018/09/02 13:21:56 [INFO] raft: Node at 10.10.100.11:8300 [Follower] entering Follower state (Leader: "")
    2018/09/02 13:21:56 [INFO] serf: EventMemberJoin: TESTING-NODE1 10.10.100.11
    2018/09/02 13:21:56 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
    2018/09/02 13:21:56 [INFO] consul: Adding LAN server TESTING-NODE1 (Addr: tcp/10.10.100.11:8300) (DC: dc1)
    2018/09/02 13:21:56 [INFO] consul: Handled member-join event for server "TESTING-NODE1.dc1" in area "wan"
    2018/09/02 13:21:56 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
    2018/09/02 13:21:56 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
    2018/09/02 13:21:56 [INFO] agent: started state syncer
    2018/09/02 13:21:56 [WARN] raft: Heartbeat timeout from "" reached, starting election
    2018/09/02 13:21:56 [INFO] raft: Node at 10.10.100.11:8300 [Candidate] entering Candidate state in term 2
    2018/09/02 13:21:56 [INFO] raft: Election won. Tally: 1
    2018/09/02 13:21:56 [INFO] raft: Node at 10.10.100.11:8300 [Leader] entering Leader state
    2018/09/02 13:21:56 [INFO] consul: cluster leadership acquired
    2018/09/02 13:21:56 [INFO] consul: New leader elected: TESTING-NODE1
    2018/09/02 13:21:56 [INFO] connect: initialized CA with provider "consul"
    2018/09/02 13:21:56 [INFO] consul: member 'TESTING-NODE1' joined, marking health alive
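
As an extra confirmation that the key was installed, you can list the keys currently in use by the gossip keyring (optional; with ACLs enabled this may require passing the master token via the -token flag):

user@TESTING-NODE1:/docker/services/consul_server ❯ docker exec consul_server consul keyring -list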

Enabling RPC Encryption with TLS

Enabling RPC encryption is, instead, a bit more convoluted than gossip encryption. Here we are first going to go through the steps outlined in the “Creating Certificates” tutorial from the HashiCorp documentation to generate suitable certificates, and then we are going to update Consul’s configuration to enable encryption for both incoming and outgoing traffic.

One disclaimer before starting: I tried multiple alternative paths, but there doesn’t seem to be a way to make Registrator work when TLS is enabled. A pull request that should fix the issue has been open on GitHub since June 2017, but it still hasn’t been merged. Anyway, I will show you how to enable TLS encryption on Consul (for anyone who doesn’t need Registrator).

Creating a Certificate Authority

Before creating our certificates, we need to install Cloudflare’s PKI and TLS toolkit (cfssl):

user@TESTING-NODE1:~/ ❯ go get -u github.com/cloudflare/cfssl/cmd/cfssl
user@TESTING-NODE1:~/ ❯ go get -u github.com/cloudflare/cfssl/cmd/cfssljson
user@TESTING-NODE1:~/ ❯ mkdir ~/certs && cd ~/certs

Then, we can start our process:

  • Generate a default CSR:
user@TESTING-NODE1:~/certs ❯ cfssl print-defaults csr > ca-csr.json
user@TESTING-NODE1:~/certs ❯ cat ca-csr.json
{
    "CN": "example.net",
    "hosts": [
        "example.net",
        "www.example.net"
    ],
    "key": {
        "algo": "ecdsa",
        "size": 256
    },
    "names": [
        {
            "C": "US",
            "ST": "CA",
            "L": "San Francisco"
        }
    ]
}
  • Change the common name (CN) and hosts field to use node.consul, and the key field to use RSA with a size of 2048:
user@TESTING-NODE1:~/certs ❯ cat ca-csr.json
{
    "CN": "node.consul",
    "hosts": [
        "node.consul"
    ],
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "names": [
        {
            "C": "US",
            "ST": "CA",
            "L": "San Francisco"
        }
    ]
}
  • Increase the default certificate expiration time used by cfssl by creating a custom configuration file (cfssl.json) and tweaking the expiry field:
user@TESTING-NODE1:~/certs ❯ cat cfssl.json
{
    "signing": {
        "default": {
            "expiry": "87600h",
            "usages": [
                "signing",
                "key encipherment",
                "server auth",
                "client auth"
            ]
        }
    }
}
  • Generate the CA’s private key and certificate with cfssl:
user@TESTING-NODE1:~/certs ❯ cfssl gencert -initca ca-csr.json | cfssljson -bare consul-ca
2018/09/08 12:28:27 [INFO] generating a new CA key and certificate from CSR
2018/09/08 12:28:27 [INFO] generate received request
2018/09/08 12:28:27 [INFO] received CSR
2018/09/08 12:28:27 [INFO] generating key: rsa-2048
2018/09/08 12:28:28 [INFO] encoded CSR
2018/09/08 12:28:28 [INFO] signed certificate with serial number 258419411275665911315272384625033946569285048288

user@TESTING-NODE1:~/certs ❯ ls -l
total 16
-rw-rw-r-- 1 user user  286 Sep 08 12:27 ca-csr.json
-rw-r--r-- 1 user user 1041 Sep 08 12:28 consul-ca.csr
-rw------- 1 user user 1675 Sep 08 12:28 consul-ca-key.pem
-rw-rw-r-- 1 user user 1233 Sep 08 12:28 consul-ca.pem

The CA key (consul-ca-key.pem) will be used to sign certificates for Consul nodes and must be kept private. The CA certificate (consul-ca.pem) contains the public key necessary to validate Consul certificates and therefore must be distributed to every node that requires access.
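
If you want to double-check what was just generated (optional, assuming openssl is available on the node), you can inspect the CA certificate's subject and validity period:

user@TESTING-NODE1:~/certs ❯ openssl x509 -in consul-ca.pem -noout -subject -dates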

Generating Node Certificates

With a CA certificate and key ready, the next step involves the generation of the certificates that will be used by Consul:

# Generate a certificate for the Consul server
user@TESTING-NODE1:~/certs ❯ echo '{"key":{"algo":"rsa","size":2048}}' | cfssl gencert -ca=consul-ca.pem -ca-key=consul-ca-key.pem -config=cfssl.json -hostname="TESTING-NODE1.node.consul,localhost,127.0.0.1" - | cfssljson -bare server
2018/09/07 12:30:40 [INFO] generate received request
2018/09/07 12:30:40 [INFO] received CSR
2018/09/07 12:30:40 [INFO] generating key: rsa-2048
2018/09/07 12:30:40 [INFO] encoded CSR
2018/09/07 12:30:40 [INFO] signed certificate with serial number 308446887048565225486255344383253799433198211874

# Generate a certificate for the Consul client
user@TESTING-NODE1:~/certs ❯ echo '{"key":{"algo":"rsa","size":2048}}' | cfssl gencert -ca=consul-ca.pem -ca-key=consul-ca-key.pem -config=cfssl.json -hostname="TESTING-NODE2.node.consul,localhost,127.0.0.1" - | cfssljson -bare client
2018/09/07 12:31:26 [INFO] generate received request
2018/09/07 12:31:26 [INFO] received CSR
2018/09/07 12:31:26 [INFO] generating key: rsa-2048
2018/09/07 12:31:26 [INFO] encoded CSR
2018/09/07 12:31:26 [INFO] signed certificate with serial number 206649046776008381367782454303368151432046180141

# Generate a certificate for the CLI
user@TESTING-NODE1:~/certs ❯ echo '{"key":{"algo":"rsa","size":2048}}' | cfssl gencert -ca=consul-ca.pem -ca-key=consul-ca-key.pem -profile=client - | cfssljson -bare cli
2018/09/07 12:31:46 [INFO] generate received request
2018/09/07 12:31:46 [INFO] received CSR
2018/09/07 12:31:46 [INFO] generating key: rsa-2048
2018/09/07 12:31:47 [INFO] encoded CSR
2018/09/07 12:31:47 [INFO] signed certificate with serial number 85981427291350592492813472784964121470809784860
2018/09/07 12:31:47 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for
websites. For more information see the Baseline Requirements for the Issuance and Management
of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org);
specifically, section 10.2.3 ("Information Requirements").

The warning above is expected: the CLI certificate is meant for client authentication only, so it doesn’t need a hosts field. You should now have the following files:

user@TESTING-NODE1:~/certs ❯ ls -l
total 56
-rw-rw-r-- 1 user user  286 Sep 08 12:27 ca-csr.json
-rw-rw-r-- 1 user user  161 Sep 08 12:30 cfssl.json             # cfssl configuration
-rw-r--r-- 1 user user  863 Sep 08 12:32 cli.csr                # CLI certificate signing request
-rw-r--r-- 1 user user  960 Sep 08 12:32 client.csr             # Client node certificate signing request
-rw------- 1 user user 1679 Sep 08 12:32 client-key.pem         # Client node private key
-rw-rw-r-- 1 user user 1298 Sep 08 12:32 client.pem             # Client node public certificate
-rw------- 1 user user 1679 Sep 08 12:32 cli-key.pem            # CLI private key
-rw-rw-r-- 1 user user 1216 Sep 08 12:32 cli.pem                # CLI certificate
-rw-r--r-- 1 user user 1041 Sep 08 12:28 consul-ca.csr          # CA signing request
-rw------- 1 user user 1675 Sep 08 12:28 consul-ca-key.pem      # CA private key
-rw-rw-r-- 1 user user 1233 Sep 08 12:38 consul-ca.pem          # CA public certificate
-rw-r--r-- 1 user user  960 Sep 08 12:32 server.csr             # Server node certificate signing request
-rw------- 1 user user 1679 Sep 08 12:32 server-key.pem         # Server node private key
-rw-rw-r-- 1 user user 1298 Sep 08 12:32 server.pem             # Server node public certificate
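
Optionally, before distributing the files, we can double-check that each certificate chains back to our CA and that the server certificate carries the expected hostnames (a quick sketch using openssl, assuming it is available on the node):

user@TESTING-NODE1:~/certs ❯ openssl verify -CAfile consul-ca.pem server.pem client.pem cli.pem   # each file should be reported as OK
user@TESTING-NODE1:~/certs ❯ openssl x509 -in server.pem -noout -text | grep -A1 'Subject Alternative Name'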

Configuring Servers

Each Consul node should have the appropriate key (-key.pem) and certificate (.pem) file for its purpose. In addition, each node needs the CA’s public certificate (consul-ca.pem). Let’s move the required files into our working directory (/docker/services/consul_server/config/ssl):

user@TESTING-NODE1:/docker/services/consul_server ❯ mkdir ./config/ssl/
user@TESTING-NODE1:/docker/services/consul_server ❯ cp ~/certs/server-key.pem /docker/services/consul_server/config/ssl/
user@TESTING-NODE1:/docker/services/consul_server ❯ cp ~/certs/server.pem /docker/services/consul_server/config/ssl/
user@TESTING-NODE1:/docker/services/consul_server ❯ cp ~/certs/consul-ca.pem /docker/services/consul_server/config/ssl/
user@TESTING-NODE1:/docker/services/consul_server ❯ sudo chown user:user /docker/services/consul_server/config/ssl/*.pem

With the certificates in place, it’s time to update our configuration file (local.json) once again:

user@TESTING-NODE1:/docker/services/consul_server ❯ cat config/local.json
{
    "log_level": "INFO",
    "server": true,
    "ui": true,
    "bootstrap": true,
    "client_addr":"0.0.0.0",
    "bind_addr":"10.10.100.11",
    "disable_update_check": true,
    "ca_file": "/consul/config/ssl/consul-ca.pem",
    "cert_file": "/consul/config/ssl/server.pem",
    "key_file": "/consul/config/ssl/server-key.pem",
    "verify_outgoing": true,
    "verify_incoming": true,
    "ports": {
        "http": 8501,
        "https": 8500
    }
}
  • ca_file: the path to a certificate authority, used to check the authenticity of client and server connections.
  • cert_file: the path to a certificate, which is presented to clients or servers to verify the agent’s authenticity.
  • key_file: the path to the corresponding private key, used to verify the agent’s authenticity.
  • verify_outgoing: when set to true, Consul requires that all outgoing connections use TLS and that the server provides a certificate signed by a Certificate Authority from the ca_file.
  • verify_incoming: when set to true, Consul requires that all incoming connections use TLS and that the client provides a certificate signed by a Certificate Authority from the ca_file.
  • ports: with the configuration we previously had, port 8500 was used for plaintext HTTP. Here we move the HTTP listener to a different port (8501), so that the default port 8500 can serve HTTPS.
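
Before restarting, it can be handy to let Consul itself check the new configuration. A minimal sketch, assuming the local ./config directory is bind-mounted to /consul/config (as implied by the paths above) and the container is named consul_server as earlier:

user@TESTING-NODE1:/docker/services/consul_server ❯ docker exec consul_server consul validate /consul/config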

Since we modified the default ports Consul listens on, we also have to update the run.sh script that bootstraps our ACLs. We will keep using plaintext HTTP just for this step, since the script addresses the node by IP and the certificates are only valid for DNS names; we therefore point our curl requests at the HTTP listener on port 8501.

user@TESTING-NODE1:/docker/services/consul_server ❯ cat run.sh
# Start containers
docker-compose down
docker-compose up -d

# Wait for them to be responsive
until echo | nc 127.0.0.1 8501 > /dev/null; do
  echo "Waiting for Consul to start..."
  sleep 1
done

# Recreate tokens
echo "[+] Recreating Agent Token..."
curl \
    --request PUT \
    --header "X-Consul-Token: D1A1A4BD-AAA9-4178-B517-5A5664DD7292" \
    --data \
        '{
        "Name": "Agent Token",
        "Type": "client",
        "Rules": "node \"\" { policy = \"write\" } service \"\" { policy = \"read\" }",
        "ID": "CD524DDA-1A52-2F09-4FD0-6C90674884A2"
        }' http://10.10.100.11:8501/v1/acl/create

echo "[+] Recreating Registrator Token..."
curl \
    --request PUT \
    --header "X-Consul-Token: D1A1A4BD-AAA9-4178-B517-5A5664DD7292" \
    --data \
        '{
        "Name": "Registrator",
        "Type": "client",
        "Rules": "service \"\" { policy = \"write\" }",
        "ID": "A9955E0C-8C96-4F60-8974-716B41B4C55B"
        }' http://10.10.100.11:8501/v1/acl/create

echo "[+] Recreating Anonymous Token..."
curl \
    --request PUT \
    --header "X-Consul-Token: D1A1A4BD-AAA9-4178-B517-5A5664DD7292" \
    --data \
        '{
        "ID": "anonymous",
        "Type": "client",
        "Rules": "node \"\" { policy = \"read\" } service \"consul\" { policy = \"read\" }"
        }' http://127.0.0.1:8501/v1/acl/update

If we now restart Consul, we can see from the startup banner that HTTPS is served on port 8500 (Client Addr line) and that TLS is enabled for both outgoing and incoming connections (Encrypt line):

user@TESTING-NODE1:/docker/services/consul_server ❯ ./run.sh
user@TESTING-NODE1:/docker/services/consul_server ❯ docker logs consul_server
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v1.2.2'
           Node ID: 'cbca3127-d0a1-982f-bdf9-cf434c9845a9'
         Node name: 'TESTING-NODE1'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: true)
       Client Addr: [0.0.0.0] (HTTP: 8501, HTTPS: 8500, DNS: 8600)
      Cluster Addr: 10.10.100.11 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: true, TLS-Outgoing: true, TLS-Incoming: true

==> Log data will now stream in as it occurs:

    2018/09/08 17:13:12 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:cbca3127-d0a1-982f-bdf9-cf434c9845a9 Address:10.10.100.11:8300}]
    2018/09/08 17:13:12 [INFO] serf: EventMemberJoin: TESTING-NODE1.dc1 10.10.100.11
    2018/09/08 17:13:12 [INFO] serf: EventMemberJoin: TESTING-NODE1 10.10.100.11
    2018/09/08 17:13:12 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
    2018/09/08 17:13:12 [INFO] consul: Adding LAN server TESTING-NODE1 (Addr: tcp/10.10.100.11:8300) (DC: dc1)
    2018/09/08 17:13:12 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
    2018/09/08 17:13:12 [INFO] agent: Started HTTP server on [::]:8501 (tcp)
    2018/09/08 17:13:12 [INFO] agent: Started HTTPS server on [::]:8500 (tcp)
    2018/09/08 17:13:12 [INFO] agent: started state syncer
    2018/09/08 17:13:12 [INFO] raft: Node at 10.10.100.11:8300 [Follower] entering Follower state (Leader: "")
    2018/09/08 17:13:12 [INFO] consul: Handled member-join event for server "TESTING-NODE1.dc1" in area "wan"
    2018/09/08 17:13:12 [WARN] raft: Heartbeat timeout from "" reached, starting election
    2018/09/08 17:13:12 [INFO] raft: Node at 10.10.100.11:8300 [Candidate] entering Candidate state in term 2
    2018/09/08 17:13:12 [INFO] raft: Election won. Tally: 1
    2018/09/08 17:13:12 [INFO] raft: Node at 10.10.100.11:8300 [Leader] entering Leader state
    2018/09/08 17:13:12 [INFO] consul: cluster leadership acquired
    2018/09/08 17:13:12 [INFO] consul: New leader elected: TESTING-NODE1
    2018/09/08 17:13:12 [INFO] consul: Created ACL master token from configuration
    2018/09/08 17:13:12 [INFO] consul: ACL bootstrap disabled, existing management tokens found
    2018/09/08 17:13:12 [INFO] connect: initialized CA with provider "consul"
    2018/09/08 17:13:12 [INFO] consul:  member 'TESTING-NODE1' joined, marking health alive
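
Since the ACL bootstrap still goes over the plaintext HTTP listener on port 8501, we can also verify that the tokens recreated by run.sh are in place by listing them back with the master token (a quick check using the legacy ACL endpoints of Consul 1.2):

user@TESTING-NODE1:/docker/services/consul_server ❯ curl --header "X-Consul-Token: D1A1A4BD-AAA9-4178-B517-5A5664DD7292" http://127.0.0.1:8501/v1/acl/list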

We can also validate that the certificate is set up properly with a quick curl request:

user@TESTING-NODE1:/docker/services/consul_server ❯ curl -v -L --key /docker/services/consul_server/config/ssl/server-key.pem --cert /docker/services/consul_server/config/ssl/server.pem --cacert /docker/services/consul_server/config/ssl/consul-ca.pem https://TESTING-NODE1.node.consul:8500/v1/catalog/datacenters
*   Trying 10.10.100.11...
* Connected to TESTING-NODE1.node.consul (10.10.100.11) port 8500 (#0)
* found 1 certificates in /docker/services/consul_server/config/ssl/consul-ca.pem
* found 601 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*        server certificate verification OK
*        server certificate status verification SKIPPED
*        common name:  (matched)
*        server certificate expiration date OK
*        server certificate activation date OK
*        certificate public key: RSA
*        certificate version: #3
*        subject:
*        start date: Fri, 07 Sep 2018 17:26:00 GMT
*        expire date: Mon, 04 Sep 2028 17:26:00 GMT
*        issuer: C=US,ST=CA,L=San Francisco,CN=node.consul
*        compression: NULL
> GET /v1/catalog/datacenters HTTP/1.1
> Host: TESTING-NODE1.node.consul:8500
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Vary: Accept-Encoding
< Date: Sat, 08 Sep 2018 18:41:44 GMT
< Content-Length: 14
<
[
    "dc1"
]
* Connection #0 to host TESTING-NODE1.node.consul left intact
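
Conversely, a request that doesn’t present a client certificate should now be rejected during the TLS handshake, since verify_incoming is enabled (output omitted, but the call is expected to fail):

user@TESTING-NODE1:/docker/services/consul_server ❯ curl --cacert /docker/services/consul_server/config/ssl/consul-ca.pem https://TESTING-NODE1.node.consul:8500/v1/catalog/datacenters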

Configuring Clients

Configuring clients (i.e., TESTING-NODE2) to support TLS consists of copying over the relevant certificates and then updating config/local.json:

  • Move the required files into our working directory (/docker/services/consul_client/config/ssl):
user@TESTING-NODE2:/docker/services/consul_client ❯ mkdir ./config/ssl/
user@TESTING-NODE2:/docker/services/consul_client ❯ cp ~/certs/client-key.pem /docker/services/consul_client/config/ssl/
user@TESTING-NODE2:/docker/services/consul_client ❯ cp ~/certs/client.pem /docker/services/consul_client/config/ssl/
user@TESTING-NODE2:/docker/services/consul_client ❯ cp ~/certs/consul-ca.pem /docker/services/consul_client/config/ssl/
user@TESTING-NODE2:/docker/services/consul_client ❯ sudo chown user:user /docker/services/consul_client/config/ssl/*.pem
  • Update the configuration file (local.json): as for the server, we extend the configuration with the paths to the certificates, the verify_outgoing/verify_incoming options, and the new port assignments:
user@TESTING-NODE2:/docker/services/consul_client ❯ cat config/local.json
{
    "log_level": "INFO",
    "server": false,
    "client_addr":"0.0.0.0",
    "bind_addr":"10.10.100.12",
    "retry_join": ["TESTING-NODE1.node.consul"],
    "ca_file": "/consul/config/ssl/consul-ca.pem",
    "cert_file": "/consul/config/ssl/client.pem",
    "key_file": "/consul/config/ssl/client-key.pem",
    "verify_outgoing": true,
    "verify_incoming": true,
    "ports": {
        "http": 8501,
        "https": 8500
    }
}

We now just have to restart Consul to have TLS encryption enabled.
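
As a final check, we can query the catalog through the HTTPS listener and confirm that both nodes are visible. This is also a good excuse to use the cli certificate we generated earlier (a minimal sketch, run from the ~/certs directory on TESTING-NODE1 where those files live):

user@TESTING-NODE1:~/certs ❯ curl --cacert consul-ca.pem --cert cli.pem --key cli-key.pem https://TESTING-NODE1.node.consul:8500/v1/catalog/nodes   # should list both TESTING-NODE1 and TESTING-NODE2 once the client has joined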

Enabling TLS for Registrator

This is where issues started to arise. Although I tried several different approaches, there doesn’t seem to be a way to make Registrator work when TLS is enabled. Here you can find my attempts, for future reference.

Ideally, the TLS setup for Registrator would be pretty straightforward: configure three environment variables pointing to the location of the certificates, mount the directory containing those certificates so that Registrator has access to them, and, finally, use the consul-tls:// endpoint when starting Registrator.

Attempt 1: latest doesn’t support consul-tls://

With the modifications above, the Registrator section of the docker-compose.yml file would look like this (note the new environment variables pointing at the certificates, the read-only mount of the ssl directory, and the consul-tls:// endpoint in the command):

...
    # ------------------------------------------------------------------------------------
    # REGISTRATOR
    # ------------------------------------------------------------------------------------
    registrator:
        container_name: registrator
        image: gliderlabs/registrator:latest
        restart: always
        network_mode: "host"
        env_file:
            - .env
        environment:
            - CONSUL_CACERT=/consul/config/ssl/consul-ca.pem
            - CONSUL_TLSCERT=/consul/config/ssl/server.pem
            - CONSUL_TLSKEY=/consul/config/ssl/server-key.pem
        volumes:
            - /var/run/docker.sock:/tmp/docker.sock
            - ./config/ssl/:/consul/config/ssl/:ro
        depends_on:
            - consul_server
        dns: ${LOCAL_IP}
        command: consul-tls://TESTING-NODE1.node.consul:8500

As it turns out, the gliderlabs/registrator:latest image doesn’t recognize the consul-tls:// adapter:

user@TESTING-NODE1:/docker/services/consul_server ❯ docker logs registrator
    Starting registrator v7 ...
    unrecognized adapter: consul-tls://TESTING-NODE1.node.consul:8500

Attempt 2: master throws a runtime error

So, latest doesn’t support the TLS endpoints. A quick Google search pointed me to this GitHub issue:

This functionality is currently only available via master, not with the latest v7 release.

I then switched to gliderlabs/registrator:master, again with no luck:

user@TESTING-NODE1:/docker/services/consul_server ❯ docker logs registrator
    Starting registrator v7 ...
    Using consul-tls adapter: consul-tls://TESTING-NODE1.node.consul:8500
        panic: runtime error: invalid memory address or nil pointer dereference
        [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x56494b642879]
        goroutine 1 [running]:
        github.com/gliderlabs/registrator/consul.(*Factory).New(0x55f98d516d30, 0xc420010680, 0x4, 0xc42003d6e0)
                /go/src/github.com/gliderlabs/registrator/consul/consul.go:52 +0x469
        github.com/gliderlabs/registrator/bridge.New(0xc420134090, 0x7ffdea768e7d, 0x2b, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
                /go/src/github.com/gliderlabs/registrator/bridge/bridge.go:43 +0x265
        main.main()
                /go/src/github.com/gliderlabs/registrator/registrator.go:105 +0x45e

Attempt 3: dev can’t validate the certificate

Another search revealed a pull request that should fix these issues; it has been open on GitHub since June 2017, but it still hasn’t been merged.

In the meantime, the workaround suggested is to manually build the dev branch of Registrator:

user@TESTING-NODE1:~/ ❯ git clone https://github.com/gliderlabs/registrator.git
user@TESTING-NODE1:~/ ❯ cd registrator/
user@TESTING-NODE1:~/registrator ❯ git checkout vendor
user@TESTING-NODE1:~/registrator ❯ docker build -f Dockerfile.dev -t registrator:dev .

We can then use the newly built image as a base for a custom Dockerfile (which we can call Dockerfile-registrator):

user@TESTING-NODE1:/docker/services/consul_server ❯ cat Dockerfile-registrator
FROM registrator:dev
# Add the Consul CA (and the server certificate) to the system trust store,
# so that Registrator can validate Consul's HTTPS endpoint
RUN apk update && apk add ca-certificates && rm -rf /var/cache/apk/*
COPY ./config/ssl/consul-ca.pem /usr/local/share/ca-certificates/consul-ca.pem
COPY ./config/ssl/server.pem /usr/local/share/ca-certificates/server.pem
RUN update-ca-certificates

Finally, we need to update the docker-compose.yml file so that the Registrator container is built from Dockerfile-registrator (note the build section replacing the image directive):

...
    # ------------------------------------------------------------------------------------
    # REGISTRATOR
    # ------------------------------------------------------------------------------------
    registrator:
        container_name: registrator
        build:
            context: .
            dockerfile: Dockerfile-registrator
        restart: always
        network_mode: "host"
        env_file:
            - .env
        environment:
            - CONSUL_CACERT=/consul/config/ssl/consul-ca.pem
            - CONSUL_TLSCERT=/consul/config/ssl/server.pem
            - CONSUL_TLSKEY=/consul/config/ssl/server-key.pem
        volumes:
            - /var/run/docker.sock:/tmp/docker.sock
            - ./config/ssl/:/consul/config/ssl/:ro
        depends_on:
            - consul_server
        dns: ${LOCAL_IP}
        command: /bin/registrator consul-tls://TESTING-NODE1.node.consul:8500
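
With this in place, rebuilding and restarting the Registrator container boils down to (assuming the service is named registrator as above):

user@TESTING-NODE1:/docker/services/consul_server ❯ docker-compose up -d --build registrator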

But, again, I didn’t have much luck: the TLS handshake between Registrator and Consul still fails with a certificate error:

user@TESTING-NODE1:/docker/services/consul_server ❯ docker logs registrator
2018/09/09 18:53:15 Starting registrator dev ...
2018/09/09 18:53:15 Using consul-tls adapter: consul-tls://TESTING-NODE1.node.consul:8500
2018/09/09 18:53:15 Connecting to backend (0/0)
2018/09/09 18:53:16 Get https://TESTING-NODE1.node.consul:8500/v1/status/leader: remote error: tls: bad certificate

For the time being, I think the best option is to wait for pull request 568 to get merged. Until then, I will run Consul with no TLS encryption.


Conclusion

This post was Part 1 of the “Offensive Infrastructure with Modern Technologies” series, and I hope it provided a sufficiently thorough introduction to Consul and to how to configure it in both single and multi-node deployments. We started with a simple configuration, and then moved to a more secure setup with support for ACLs and encryption.

In Part 2 we will continue paving the way for our offensive infrastructure, by focusing on a step-by-step walkthrough that will allow you to automatically deploy the full HashiCorp stack with Ansible.

I hope you found this post useful and interesting, and I’m keen to get feedback on it. If you find the information shared in this series useful, if something is missing, or if you have ideas on how to improve it, please leave a comment below or let me know on Twitter.