Bees in a Beehive

In a previous post, I showed how I wrote a provisioning system for servers on Vultr. In this post, I’m going to expand upon that framework, adding support for firewalls, Docker, a VPN and everything needed to create a small and secure infrastructure for personal projects. Two servers will be provisioned: one as a web server running a Docker daemon with only ports 80 and 443 exposed, and a second acting as a VPN gateway used to connect securely to the Docker daemon on the web server.

Vultr API Recap

In my first iteration of Bee2, I was using the vultr.rb Ruby gem for API calls. I discovered and fixed a small bug, but then ran into issues with Vultr’s API rate limitations. Instead of limiting requests per minute, Vultr apparently has a rate limit of two requests per second1. This normally isn’t a problem, except when creating DNS records on a low latency Internet connection. I couldn’t really recover from rate limits with the v() and vv() vultr.rb wrapper functions I used previously, so I decided to remove the vultr.rb dependency and implement the API requests myself using the following:

  def request(method, path, args = {}, ok_lambda = nil, error_code = nil, err_lambda = nil)
    uri = URI.parse("https://api.vultr.com/v1/#{path}")
    https = Net::HTTP.new(uri.host, uri.port)
    https.use_ssl = true

    req = case method
    when 'POST'
      r = Net::HTTP::Post.new(uri.path, {'API-Key' => @api_key})
      r.set_form_data(args)
      r
    when 'GET'
      path = "#{uri.path}?".concat(args.collect { |k,v| "#{k}=#{CGI::escape(v.to_s)}" }.join('&'))
      Net::HTTP::Get.new(path, {'API-Key' => @api_key})
    end

    res = https.request(req)

    case res.code.to_i
      when 503
        @log.warn('Rate Limit Reached. Waiting...')
        sleep(2)
        request(method, path, args, ok_lambda, error_code, err_lambda)
      when 200
        if not ok_lambda.nil?
          ok_lambda.()
        else
          if res.body == ''
            ''
          else
            JSON.parse(res.body)
          end
        end
      else
        if not error_code.nil? and res.code.to_i == error_code
          err_lambda.()
        else
          @log.fatal('Error Executing Vultr Command. Aborting...')
          @log.fatal("#{res.code} :: #{res.body}")
          exit(2)
        end
    end
  end

The new request() takes an HTTP method, a path, request arguments, and optional lambdas to call on success or on a specific error code. Within the function, it checks for 503 (rate limit) responses and pauses for two seconds before retrying. The previous calls to vultr.rb looked like the following, using the vv() and v() wrappers:

vv(Vultr::ReservedIP.attach({'ip_address' => ip, 'attach_SUBID' => subid}), 412, -> {
  @log.info('IP Attached')
}, -> {
  @log.warn('Unable to attach IP. Rebooting VM')
  v(Vultr::Server.reboot({'SUBID' => subid}))
})

They can now be replaced with request() like so:

request('POST', 'reservedip/attach', {'ip_address' => ip, 'attach_SUBID' => subid}, -> {
  @log.info('IP Attached')
}, 412, ->{
  @log.warn('Unable to attach IP. Rebooting VM')
  request('POST', 'server/reboot', {'SUBID' => subid})
})

Infrastructure Overview

Simple Personal Application Infrastructure

In my setup, I plan to have one Ubuntu web application server which only has ports 80 and 443 exposed. It will be running a Docker daemon that I can connect to securely over the private/VPN network. OpenVPN will be running on a FreeBSD node which will only have SSH and OpenVPN ports exposed. Most of this configuration is achieved using Ansible playbooks. I’ve moved the configuration file into the conf directory, which is included in the .gitignore file. This directory will also contain both the OpenVPN keys/certs and Docker keys/certs generated via the Ansible playbooks. An example configuration can be found in examples/settings.yml.

Provisioning, rebuilding and configuring the servers can be run using the following:


# [-r] deletes servers if they exist
./bee2 -c conf/settings.yml -p -r

# The first time DNS is configured, you may need to wait
# before running this as Ansible uses the public DNS names
./bee2 -c conf/settings.yml -a public

The first time server configuration is run using -a, it must be run against the public inventory, which uses the servers’ Internet-facing IP addresses. The Ansible roles create a VPN server, but they also establish all the firewalls. Therefore, subsequent Ansible provisioning requires a running OpenVPN client, and bee2 must be run with the -a private argument.
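
Putting the two flags together, a later configuration run over the VPN looks like this:

# requires an established OpenVPN client connection (see below)
./bee2 -c conf/settings.yml -a private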

OpenVPN

OpenVPN is established via an Ansible role that generates the server Certificate Authority (CA), the server key and cert pair, and the client cert and key. The role has only been tested on FreeBSD and can be found in ansible/roles/vpn. It can be configured in the bee2 settings file using the following:

openvpn:
    hosts:
      gateway: 192.168.150.20
    server:
      subnet: 10.10.12.0
      routes:
        - 192.168.150.0 255.255.255.0
      netmask: 255.255.255.0
      cipher: AES-256-CBC
    clients:
      laptop:
        type: host

The gateway should be the private address of the machine, listed in the servers section of the configuration file, that is assigned the freebsd-vpn.yaml playbook. Bee2 will push a route for the VPN subnet to all the other servers listed under the servers section of the configuration. Client keys will be copied locally to conf/openvpn-clients.

There is an example OpenVPN client configuration located at examples/openvpn.conf. The setup for OpenVPN can vary per Linux distribution. I recommend installing OpenVPN using your package manager (apt-get, yum, zypper, emerge, etc.). Many distributions support multiple OpenVPN clients, although you may have to create new systemd targets or symbolic links within the init system. Typically, configuration files go in /etc/openvpn along with the keys and certificates found in conf/openvpn-clients after the bee2 Ansible provisioners have run.


# Establish the OpenVPN server and Firewalls

./bee2 -c conf/settings.yml -a public

# Copy keys, certificates and config

sudo cp conf/openvpn-clients/* /etc/openvpn/
sudo cp examples/openvpn.conf /etc/openvpn/openvpn.conf

# edit configuration for your setup
$EDITOR /etc/openvpn/openvpn.conf

# Start the service (systemd)

systemctl start openvpn.service

# or, start the server on sysv/openrc

/etc/init.d/openvpn start

Once OpenVPN has started, attempt to ping the private addresses or DNS names of your services. If that fails, check the OpenVPN logs to diagnose any potential issues.
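
For example, using the VPN gateway’s private address from the configuration shown earlier:

# the gateway address from the openvpn settings above
ping 192.168.150.20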

Firewall

Firewalls are one of the last pieces to be configured in the Ansible scripts. The Ubuntu firewall uses the ufw Ansible module and is fairly straightforward.


- name: Enable ufw
  ufw: state=enabled policy=allow

- name: Allow ssh internally
  ufw: rule=allow port=22 direction=in proto=tcp interface={{ private_eth }}

- name: Allow Docker internally
  ufw: rule=allow port=2376 direction=in proto=tcp interface={{ private_eth }}

- name: 80 is open
  ufw: rule=allow port=80 proto=tcp

- name: 443 is open
  ufw: rule=allow port=443 proto=tcp

- name: Disable default in
  ufw: direction=incoming policy=deny
  async: 0
  poll: 10
  ignore_errors: true

The final task disables incoming connections, so it’s run in async mode and ignores errors, essentially turning it into a fire-and-forget task. Ansible performs each task in its own SSH connection, so the firewall role needs to be the very last role that runs, as any subsequent tasks would fail.

FreeBSD has several firewall options. For the Ansible role in Bee2, I decided to use pf and configure the firewall via the /etc/pf.conf configuration file. Special thanks to Tom Trebick on ServerFault for help debugging my firewall configuration. The following is the pf.conf.j2 Ansible template.


# {{ ansible_managed }}

block all

# allow all from host itself
pass out inet all keep state
pass out inet6 all keep state

# allow all from private
pass in quick on {{ private_eth }} inet from any to any keep state

# openvpn
pass in quick proto udp to vtnet0 port openvpn keep state
pass in quick on tun0 inet from any to any keep state

# ssh
pass in quick proto tcp to vtnet0 port ssh flags S/SA keep state

This firewall configuration allows all traffic on the private network, all OpenVPN traffic (via the tun0 adapter), all outgoing traffic over IPv4/IPv6 and incoming traffic for only SSH and OpenVPN.

I attempted to use the same async and ignore_errors options that I had used on my Ubuntu/Docker VM to start the firewall on FreeBSD.

- name: Enable Firewall Service
  service: name=pf state=started enabled=yes
  # perform this without waiting for the response because PF will drop the
  # SSH connection if its service is not running
  async: 0
  poll: 10
  ignore_errors: true

Unfortunately, even with ignore_errors and async set, I’d still get hangs and timeouts on this particular task the first time it’s run. I eventually added the following to an ansible.cfg located in the base of the Bee2 project:

[ssh_connection]
ssh_args = -o ServerAliveInterval=10

The first time the firewall tasks are run, Ansible will still report an error, but the timeout occurs within a reasonable amount of time and the rest of the Ansible tasks continue. Subsequent runs complete without errors, but they require running against the private Ansible inventory with an OpenVPN connection established.

TASK [firewall : Enable Firewall Service] *********************************************************************
fatal: [bastion.sumdami.net]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Timeout, server bastion.sumdami.net not responding.\r\n", "unreachable": true}

SSH Keys

When using -r to rebuild a machine, new VMs will have different SSH host keys. To avoid warning messages on rebuilds, delete_server() calls the following function to remove both the IP and hostname keys from the ~/.ssh/known_hosts file:

private def remove_ssh_key(host_or_ip)
  @log.info("Removing SSH Key for #{host_or_ip}")
  Process.fork do
    exec('ssh-keygen', '-R', host_or_ip)
  end
  Process.wait
end

When configuring a server for the first time, Bee2 runs the ssh-hostkey-check.yml Ansible playbook. Based on a post by mazac on ServerFault, it automatically adds new SSH keys without prompting, but will return an error if a key exists and is incorrect.


---
- name: accept ssh fingerprint automatically for the first time
  hosts: all
  connection: local
  gather_facts: False

  tasks:
    - name: Check if known_hosts contains server's fingerprint
      command: ssh-keygen -F {{ inventory_hostname }}
      register: keygen
      failed_when: keygen.stderr != ''
      changed_when: False

    - name: Fetch remote SSH key
      command: ssh-keyscan -T5 {{ inventory_hostname }}
      register: keyscan
      failed_when: keyscan.rc != 0 or keyscan.stdout == ''
      changed_when: False
      when: keygen.rc == 1

    - name: Add ssh-key to local known_hosts
      lineinfile:
        name: ~/.ssh/known_hosts
        create: yes
        line: "{{ item }}"
      when: keygen.rc == 1
      with_items: '{{ keyscan.stdout_lines|default([]) }}'

This implementation avoids having to disable StrictHostKeyChecking in SSH, preserving host key verification. In theory, a man-in-the-middle attack could still occur between the provisioning and configuration phases, although it’s unlikely. If you’re truly paranoid or security conscious, you can connect to the VMs via the Vultr HTTP console and verify that the SSH host key fingerprints are correct.
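
As a rough sketch (assuming an ed25519 host key and the hypothetical hostname web1.example.net), compare the fingerprint shown on the console with the one recorded locally:

# on the server, via the Vultr HTTP console
ssh-keygen -lf /etc/ssh/ssh_host_ed25519_key.pub

# locally, show the fingerprint stored in ~/.ssh/known_hosts
ssh-keygen -l -F web1.example.net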

Passwords

By default, Vultr auto-generates a password for each new server, which is retrievable via its API. For security purposes, we should replace those passwords with generated ones that are saved and encrypted with a PGP key. If a security section with a valid PGP ID is added to the configuration YAML file, the root-password Ansible role will use pwgen to create a new password, set it as the root password on the VM, and encrypt and store it in ~/.password-store/bee2 so it can be accessed using the pass command.

security:
  pgp_id: ADFGTE59

$ pass
Password Store
└── bee2
    ├── web1
    └── vpn
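
The stored root password for a given server can then be retrieved with a single pass lookup, for example for the web1 entry shown above:

pass bee2/web1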

Remote Access to Docker

In most default installations, the Docker client talks to the Docker daemon via a local UNIX socket. Connecting to Docker remotely requires creating a local certificate authority, generating keys, and signing certificates, similar to configuring OpenVPN clients. The official Docker documentation has an excellent guide for protecting the Docker daemon socket2, which I converted into an Ansible role. The generated client keys and certs are placed in conf/docker.

Docker can listen on both remote and local sockets, but all of the configuration must be in one place; you cannot mix command line arguments with settings from the JSON configuration file. The -H fd:// switch therefore needs to be removed from the systemd service file using the following set of Ansible tasks:

- name: Disable Local Socket Access for Docker
  lineinfile:
    dest: /lib/systemd/system/docker.service
    regexp: '^ExecStart=.*\$DOCKER_OPTS'
    line: "ExecStart=/usr/bin/dockerd $DOCKER_OPTS"
  register: systemd_target_update

- name: Reload Systemd targets
  command: systemctl daemon-reload
  when: systemd_target_update.changed

Note that you cannot simply restart the Docker service after modifying the systemd unit file. systemctl daemon-reload must be called whenever unit files are modified, or else the changes will not be picked up. This is another systemd gotcha that doesn’t appear in sane initialization systems.

The Ansible role also configures DOCKER_OPTS in /etc/default/docker to use a JSON configuration file like so:


DOCKER_OPTS="--config-file {{ docker_daemon_conf }}"

The JSON configuration file specifies the TLS keys and certificates, and allows access both locally and from the private network accessible via OpenVPN.


{
	"tls": true,
	"tlsverify": true,
	"tlscacert": "{{ docker_ca }}",
	"tlscert": "{{ server_crt }}",
	"tlskey": "{{ server_key }}",
	"hosts": ["127.0.0.1:2376", "{{ private_ip }}:2376"]
}

After the Ansible playbooks have been run, all the files for remote Docker authentication should be in the conf/docker directory. Once the OpenVPN client, from the previous section, has been configured correctly, you should be able to connect to the Docker daemon using the following command:

docker --tlsverify --tlscacert=conf/docker/ca.crt  --tlscert=conf/docker/docker-client.crt  --tlskey=conf/docker/docker-client.pem  -H=web1.example.net:2376 version
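
Typing the TLS flags on every invocation gets tedious. The Docker client can also pick up the connection details from environment variables; the sketch below assumes the generated files have been copied into a hypothetical ~/.docker-bee2 directory using the names the client expects (ca.pem, cert.pem, key.pem):

# hypothetical directory containing ca.pem, cert.pem and key.pem
export DOCKER_CERT_PATH=$HOME/.docker-bee2
export DOCKER_TLS_VERIFY=1
export DOCKER_HOST=tcp://web1.example.net:2376

docker version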

Conclusions

In this iteration of the Bee2 project, I’ve expanded the Ansible configuration to set up an OpenVPN server, establish server firewalls and configure Docker to be remotely accessible. I now have everything I need to securely deploy Docker applications, over my VPN connection, to a publicly facing web server. The specific version of Bee2 used in this article has been tagged as pd-infra-blogpost, and the most current version of Bee2 can be found on GitHub. Future enhancements will include running Docker applications, as well as backing up and restoring application data.