Bee2: Wrestling with the Vultr API
No one enjoys changing hosting providers. I haven’t had to often, but when I have, it involved manual configuration and copying files. As I’m looking to deploy some new projects, I’m attempting to automate the provisioning process, using hosting providers with Application Programming Interfaces (APIs) to automatically create virtual machines and run Ansible playbooks on those machines. My first attempt involved installing DC/OS on DigitalOcean, which met with mixed results.
In this post, I’ll be examining Bee2, a simple framework I built in Ruby. Although the framework is designed to be expandable to different providers, initially I’ll be implementing a provisioner for Vultr, a new hosting provider that seems to be competing directly with DigitalOcean. While their prices and flexibility seem better than DigitalOcean’s, their API is a mess of missing functions, polling/waiting and interesting bugs.
Writing a Provisioning System
When working on the open source project BigSense, I created an environment configuration tool called vSense that set up the appropriate Vagrant files to be used both in development (using VirtualBox as the provider) and in production (using libvirt/KVM as the provider). Vagrant isn’t really intended for production provisioning. While newer versions of Vagrant remove the shared insecure SSH keys during the provisioning process, for vSense I had Ansible tasks that would ensure default keys were removed and new root passwords were auto-generated and encrypted.
Terraform is another open source tool from the makers of Vagrant. On its surface, it seems like a utility designed to provision servers at a variety of hosting companies. It supports quite a few providers, but the only Vultr plugin available at the time of writing is terraform-provider-vultr by rgl. The plugin is unmaintained, but there are several forks, at least one of which is attempting to make it into the official tree [1].
Rather than wrestle with an in-development Terraform plugin, I instead decided to write Bee2, my own Ruby provisioning scripts, using an unofficial Vultr Ruby API by Tolbkni. Based on some previous attempts at writing a provisioning system, I tried to keep everything as modular as possible so I could extend Bee2 to other hosting providers in the future. Once servers have been provisioned, it can also run Ansible playbooks to apply a configuration to each individual machine.
Vultr API Oddities
I ran into a couple of issues with the Vultr API, which I attempted to work around as best I could. There seem to be a lot of missing properties, as well as poorly engineered API call combinations required for basic system configuration. In this post, I’ll examine the following issues:
- The SUBID, used to uniquely identify all Vultr resources (except for those it doesn’t, such as SSHKEY).
- Vultr allows the provisioning of permanent static IPv4 addresses that can be attached to and detached from servers, but for IPv6 it only reserves an entire /64 subnet and assigns a seemingly random IP address from that subnet upon attaching to a server.
- Reserved IPv4 addresses can be set up when creating a server, but IPv6 addresses must be attached after a server is created with IPv6 enabled.
- Enabling IPv6 support on a server assigns it an auto-generated IPv6 address that cannot be removed via the API.
- Occasionally, attaching an IPv6 address requires a server reboot.
- Private IPs are automatically generated, but they are not auto-configured on the server itself. They are essentially a totally worthless part of the API.
- Duplicate SSH keys can be created (neither names nor keys seem to be unique).
Bee2: The Framework
I started with a basic Vultr provisioning class, with a provision method that completes all the basic provisioning tasks. In the following example, we see all the clearly labeled steps needed to provision a basic infrastructure: installing SSH keys, reserving and saving static IP addresses, deleting servers (if doing a full rebuild), creating servers, updating DNS records and writing an inventory file for Ansible.
class VultrProvisioner
  ...
  def provision(rebuild = false)
    ensure_ssh_keys
    reserve_ips
    populate_ips
    if rebuild
      @log.info('Rebuilding Servers')
      delete_servers
    end
    ensure_servers
    update_dns
    cleanup_dns
    write_inventory
  end
  ...
end
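The provisioner is driven from the bee2 command-line script described later on. As a rough sketch of the hand-off (the constructor arguments here are an assumption, since only the provision method is shown above):

require 'yaml'
require 'logger'

# Hypothetical wiring; the real bee2 script parses its -c/-p/-r flags first.
config = YAML.load_file('settings.yml')
provisioner = VultrProvisioner.new(config, Logger.new(STDOUT))  # constructor signature assumed
provisioner.provision(true)   # true => rebuild: delete and recreate servers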
The unofficial Vultr Ruby library I’m using is a very thin wrapper around the Vultr REST API. All of the Ruby library’s functions return a hash with :status and :result keys that contain the HTTP status code and JSON return payload respectively. There is a spelling mistake, as the Ruby library has a RevervedIP function for the Vultr ReservedIP call. The API key is global instead of a class variable, and all the functions are static, meaning only one Vultr account/API token can be used at a time.
Overall, the library seems simple enough that I probably should have just implemented it myself. Instead, I created two wrapper functions to use around all Vultr:: calls. The first, v(cmd), will either return the :result, or bail out and exit if the :status is anything other than 200. The second function, vv(cmd, error_code, ok_lambda, err_lambda), will either run the ok_lambda function or run the err_lambda if the specified error_code is returned. v() and vv() can be chained together to deal with creating resources and avoiding duplicate resources.
private def v(cmd)
  if cmd[:status] != 200
    @log.fatal('Error Executing Vultr Command. Aborting...')
    @log.fatal(cmd)
    exit(2)
  else
    return cmd[:result]
  end
end

private def vv(cmd, error_code, ok_lambda, err_lambda)
  case cmd[:status]
  when error_code
    err_lambda.()
  when 200
    ok_lambda.()
  else
    @log.fatal('Error Executing Vultr Command. Aborting...')
    @log.fatal(cmd)
    exit(2)
  end
end
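To make the difference concrete, here is a short illustrative sketch (the return values shown are made up, and subid stands in for a server SUBID looked up earlier) of a raw library call next to the wrapped versions:

# Raw library call: always a hash of :status and :result.
Vultr::Server.list
# => {:status=>200, :result=>{"576965"=>{"label"=>"web1", ...}}}   (illustrative)

# v() unwraps :result, or logs the response and exits on any non-200 status.
servers = v(Vultr::Server.list)

# vv() treats one specific status code as an expected, recoverable case.
vv(Vultr::Server.reboot({'SUBID' => subid}), 412,
   -> { @log.info('Reboot accepted') },
   -> { @log.warn('Reboot rejected with a 412') })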
In addition, many of the API calls are asynchronous and return immediately. Commands requiring resources to be available will not block and wait, but will outright fail. Therefore we need a wait function that polls to ensure previous commands have completed successfully. The following function is fairly robust: it can poll until a certain field reaches a specific value, or wait for a field to change away from a given value.
def wait_server(server, field, field_value, field_state = true)
  while true
    current_servers = v(Vultr::Server.list).map { |k,v|
      if v['label'] == server
        if (field_state and v[field] != field_value) or (!field_state and v[field] == field_value)
          verb = field_state ? 'have' : 'change from'
          @log.info("Waiting on #{server} to #{verb} #{field} #{field_value}. Current state: #{v[field]}")
          sleep(5)
        else
          @log.info("Complete. Server: #{server} / #{field} => #{field_value}")
          return true
        end
      end
    }
  end
end
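A couple of hypothetical calls show how it gets used; the field names follow the v1 server/list output, but treat the exact values as illustrative:

# Block until web1's status equals 'active'.
wait_server('web1', 'status', 'active')

# Block until web1's server_state changes away from 'locked' (field_state = false).
wait_server('web1', 'server_state', 'locked', false)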
Configuration
Configuration is done using a single YAML file. For now, the only provisioner supported is Vultr and it takes an API token, a region code, a state file (which will be generated if it doesn’t exist) and SSH keys (which do need to exist; they will not be auto-generated).
The inventory section indicates the names of the files which will be created for and used by Ansible for configuration management. One contains the publicly accessible IP addresses and the other contains the private IP addresses. The public inventory will be used to bootstrap the configuration process, establishing an OpenVPN server and setting up firewall rules to block off SSH ports on the public IP addresses. Once a VPN connection is established, further provisioning can be done via the private inventory.
Each server in the servers section requires a numerical plan ID and OS ID. A list of valid values can be retrieved using Vultr::Plans.list and Vultr::OS.list respectively. An IPv4 address and a /64 IPv6 subnet will be reserved and assigned to each server. DNS records will be automatically created for both the public and private IP addresses in their respective sections. Additionally, any DNS entries listed in web will have A/AAAA records created for both the domain name and the www subdomain of its respective base record. Finally, a playbook can be specified for configuration management via Ansible. All of the playbooks should exist in the ansible subdirectory.
provisioner:
  type: vultr
  token: InsertValidAPIKeyHere
  region: LAX
  state-file: vultr-state.yml
  ssh_key:
    public: vultr-key.pub
    private: vultr-key
inventory:
  public: vultr.pub.inv
  private: vultr.pri.inv
servers:
  web1:
    plan: 202 # 2048 MB RAM,40 GB SSD,2.00 TB BW
    os: 241 # Ubuntu 17.04 x64
    private_ip: 192.168.150.10
    dns:
      public:
        - web1.example.com
      private:
        - web1.example.net
      web:
        - penguindreams.org
        - khanism.org
    playbook: ubuntu-playbook.yml
  vpn:
    plan: 201 # 1024 MB RAM,25 GB SSD,1.00 TB BW
    os: 230 # FreeBSD 11 x64
    private_ip: 192.168.150.20
    dns:
      public:
        - vpn.example.com
      private:
        - vpn.example.net
    playbook: freebsd-playbook.yml
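The numerical plan and OS IDs used above can be listed with the same gem. A quick lookup sketch (assuming the gem's global Vultr.api_key setter; the name fields follow the v1 plans/list and os/list responses):

require 'vultr'

Vultr.api_key = 'InsertValidAPIKeyHere'

# Print ID => description pairs to pick values for 'plan' and 'os' in settings.yml.
Vultr::Plans.list[:result].each { |id, plan| puts "#{id}: #{plan['name']}" }
Vultr::OS.list[:result].each { |id, os| puts "#{id}: #{os['name']}" }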
The Vultr Provisioner
Within the Vultr API, everything has a SUBID. These are unique identifiers for servers, reserved IP subnets, block storage, backups and pretty much everything except SSH keys. Oftentimes the API requires a SUBID to attach one resource to another, sometimes requiring additional lookups. Some functions within the Vultr API have duplicate checking and will error out when trying to create duplicate resources. Other parts of the API require you to iterate over current resources to prevent creating duplicates. Some functions validate that all parameters have acceptable values, while others will fail silently.
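As a concrete illustration of that lookup dance, attaching a reserved address to a server means first resolving both resources by label from their respective list calls. A sketch, using the field names that appear elsewhere in this post (web1 is just the example label):

# Resolve the server's SUBID by its label from the server/list output.
server_subid = v(Vultr::Server.list).find { |k, s| s['label'] == 'web1' }.last['SUBID']

# Resolve the reserved subnet by label from the reservedip/list output.
reserved = v(Vultr::RevervedIP.list).find { |k, ip| ip['label'] == 'web1' }.last

# Only now can the two be attached to one another.
v(Vultr::RevervedIP.attach({'ip_address' => reserved['subnet'],
                            'attach_SUBID' => server_subid}))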
SSH Keys
There is no uniqueness checking within the Vultr API for SSH keys. You can create multiple keys with the same name, the same key or the same combination of the two. Within the Bee2 framework, I use the name as the unique identifier. SSH_KEY_ID is a constant defined to be b2-provisioner. The following function ensures this key is only created once.
def ensure_ssh_keys
  key_list = v(Vultr::SSHKey.list).find { |k,v| v['name'] == SSH_KEY_ID }
  if key_list.nil? or not key_list.any?
    @log.info("Adding SSH Key #{SSH_KEY_ID}")
    @state['ssh_key_id'] = v(Vultr::SSHKey.create({'name' => SSH_KEY_ID, 'ssh_key' => @ssh_key}))['SSHKEYID']
    save_state
  end
end
Private IPs
When creating a server instance, one of the things the Vultr API returns, if private networking is enabled, is a private IP address. I was puzzled as to why you couldn’t specify your own private IP address in the create method, until I realized this address is not actually assigned to your VM. It’s simply a randomly generated IP address within a private subnet, offered as a suggestion. The official API documentation indicates this address still has to be assigned manually to the internal network adapter. Originally I had the following code to save the generated private IP addresses:
# Save auto-generated private IP addresses
v(Vultr::Server.list).each { |k,v|
  if v['label'] == server
    @state['servers'][server]['private_ip'] = {}
    @state['servers'][server]['private_ip']['addr'] = v['internal_ip']
    @log.info("#{server}'s private IP is #{v['internal_ip']}")
  end
}
save_state
I removed this code and instead decided to specify the private IP addresses and subnet in settings.yml. It makes sense for the API to allow private networking to be enabled, which provides a second virtual network adapter inside the VM. However, randomly generating a private IP address seems worthless, and it moves something that should happen in the provisioning phase down into the configuration management layer.
Private IP via Ansible Configuration Management
For Private IPs in Bee2, I’ve created an Ansible role to support IP assignment for both Ubuntu and FreeBSD.
---
- set_fact: private_ip="{{ servers[ansible_hostname].private_ip }}"

- block:
    - set_fact: private_eth=ens7
    - include: ubuntu.yml
  when: ansible_distribution in [ 'Debian', 'Ubuntu' ]

- block:
    - set_fact: private_eth=vtnet1
    - include: freebsd.yml
  when: ansible_distribution == 'FreeBSD'
For Ubuntu, we rely on /etc/network/interfaces to configure the private interface. We’re relying on the fact that Vultr always creates the private interface as ens7, as defined in the facts above.
---
- blockinfile:
    path: /etc/network/interfaces
    block: |
      auto {{ private_eth }}
      iface {{ private_eth }} inet static
        address {{ private_ip }}
        netmask 255.255.255.0
  notify: restart networking
On FreeBSD, network adapters are set up in /etc/rc.conf, and Vultr always assigns the private adapter as vtnet1.
---
- name: Setup Private Network
  lineinfile: >
    dest=/etc/rc.conf
    state=present
    regexp='^ifconfig_{{ private_eth }}.*'
    line='ifconfig_{{ private_eth }}="inet {{ private_ip }} netmask 255.255.255.0"'
  notify: restart netif
We’ll need playbooks that reference this private-net Ansible role. Ubuntu 17 only comes with Python 3 by default and FreeBSD places the Python interpreter within /usr/local/, so we need to configure the interpreter for both operating systems. For Ubuntu machines, we’ll create ubuntu-playbook.yml, which is referenced in the configuration file.
---
- hosts: all
  vars:
    ansible_python_interpreter: /usr/bin/python3
  vars_files:
    - ../{{ config_file }}
  roles:
    - private-net
The following is the freebsd-playbook.yml for our FreeBSD instance:
---
- hosts: all
  vars_files:
    - ../{{ config_file }}
  vars:
    - ansible_python_interpreter: /usr/local/bin/python
  roles:
    - private-net
IPv6
The server/create function allows attaching a reserved IPv4 address to a virtual machine via the reserved_ip_v4 parameter. However, there is no reserved_ip_v6 parameter. When creating the machine, the enable_ipv6 parameter must be set to yes (not true, as I discovered the hard way, since the Vultr API doesn’t validate this parameter and will not return an error), and a random IPv6 address will then be assigned to the machine. I contacted support and learned this address cannot be deleted from the machine via the API. Furthermore, when attaching the reserved IPv6 subnet, the Vultr API will assign the entire IPv6 /64 subnet to the instance and give it a random IP address within that space.
# Attach our Reserved/Public IPv6 Address
ip = @state['servers'][server]['ipv6']['subnet']
@log.info("Attaching #{ip} to #{server}")
vv(Vultr::RevervedIP.attach({'ip_address' => ip, 'attach_SUBID' => subid}), 412,
  -> { @log.info('IP Attached') },
  -> {
    @log.warn('Unable to attach IP. Rebooting VM')
    v(Vultr::Server.reboot({'SUBID' => subid}))
  })
This means that every time a machine is rebuilt, it will have a different IPv6 address (although Bee2 will update the DNS records with that new address). I understand that assigning an entire /64 to a host is common practice for IPv6, and allows for several IPv6 features to work correctly. However, it would be convenient if the Vultr API could also provide guarantees for the final static reserved /128 address which is given to the server.
One possible workaround is to place the lower part of the IPv6 address in the settings.yml file, have Vultr assign the subnet, and then have Ansible replace the auto-assigned /128 address Vultr gives the server. This would ensure rebuilt servers always get the same IPv6 address (although it would not match the IP shown in the Vultr web interface). For now, Bee2 simply lets Vultr assign an IPv6 address from the reserved subnet and updates the DNS record. Those running Bee2 over IPv6 connections may have to flush their DNS cache or wait for older records to expire before running configuration management or SSHing into the remote servers.
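A sketch of what that workaround might look like (not something Bee2 does at the time of writing; the @config hash and the ipv6_host key are hypothetical names):

require 'ipaddr'

# Combine the reserved /64 from the state file with a host suffix from
# settings.yml to derive a stable /128 address for Ansible to configure.
subnet = @state['servers'][server]['ipv6']['subnet']    # e.g. "2001:db8:1000:abcd::"
host_suffix = @config['servers'][server]['ipv6_host']   # e.g. "::10" (hypothetical setting)

static_v6 = IPAddr.new(subnet).mask(64) | IPAddr.new(host_suffix)
@log.info("Stable IPv6 address for #{server}: #{static_v6}/128")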
Finally, when attaching a reserved IPv6 subnet to a machine, Vultr occasionally will return a 412, indicating that the machine must be rebooted. As shown in the previous code sample, this can be done via the API using the server/reboot function.
{:status=>412, :result=>"Unable to attach IP: Unable to attach subnet, please restart your server from the control panel"}
Deleting/Rebuilding Machines
Deleting a machine with a reserved IPv4 address doesn’t immediately release the IP address. The following function deletes all the servers we’ve defined in the configuration file, and then waits for existing reserved IP addresses to detach from the current VMs. Without the wait loop, a rebuild would immediately fail with an error message indicating the address referenced in reserved_ip_v4 is still in use.
def delete_servers
  current_servers = v(Vultr::Server.list).map { |k,v| v['label'] }
  delete_servers = @state['servers'].keys.reject { |server| not current_servers.include? server }
  delete_servers.each { |server|
    @log.info("Deleting #{server}")
    v(Vultr::Server.destroy('SUBID' => @state['servers'][server]['SUBID']))
    while v(Vultr::RevervedIP.list).find { |k,v| v['label'] == server }.last['attached_SUBID']
      @log.info("Waiting on Reserved IP to Detach from #{server}")
      sleep(5)
    end
  }
end
Another issue with developing against the Vultr API is that virtual machines cannot be deleted for five minutes after they’ve been created. Development can therefore become very time consuming, with lots of waiting around when working on anything involving server/create and server/destroy.
{:status=>412, :result=>"Unable to destroy server: Servers cannot be destroyed within 5 minutes of being created"}
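One way to soften this during development is to retry the destroy call while Vultr keeps answering with the 412; a sketch (not part of Bee2) built on the vv() wrapper from earlier:

# Keep retrying server/destroy until the five-minute window has passed.
def destroy_when_allowed(subid)
  loop do
    destroyed = false
    vv(Vultr::Server.destroy({'SUBID' => subid}), 412,
       -> { destroyed = true },
       -> { @log.info('Server too new to destroy, retrying in 30 seconds'); sleep(30) })
    break if destroyed
  end
end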
Putting it All Together
Using Bee2 is pretty straightforward. The command line arguments require a configuration file, and then allow for provisioning (-p) servers. Combining -p and -r will rebuild servers, destroying the existing servers if they exist. Finally, -a will run Ansible against either the public or private inventory IP addresses.
Usage: bee2 [-v] [-h|--help] [-c <config>] [-p [-r]]
    -c, --config CONFIG              Configuration File
    -p, --provision                  Provision Servers
    -v, --verbose                    Debug Logging Output Enabled
    -r, --rebuild                    Destroy and Rebuild Servers During Provisioning
    -a, --ansible INVENTORY          Run Ansible on Inventory (public|private)
    -h, --help                       Show this message
The provisioning, rebuilding and configuration management tasks can all be combined into a single command.
./bee2 -c settings.yml -p -r -a public
Conclusions
Overall, the Vultr API is usable, but it definitely has some design issues that can result in frustration. There were a few moments where I wasn’t sure if I had discovered some bugs. However, most of the issues I encountered either involved my own code, or not waiting for a service to be in the correct state before calling another action. The Vultr support staff were mostly helpful and responsive during the weekdays and standard business hours, with requests made on the weekend often having to wait until Monday.
Although I was able to successfully write a Bee2 provisioner for the Vultr API, it did require quite a bit of work. Their current API shows signs of underlying technical debt, and I’m curious whether issues with their current platform have driven some of the design decisions in the API. This is only the first version of their API, so hopefully we’ll see improvements in future versions that streamline some of the more complicated parts of the provisioning process.
This concludes our basic Vultr provisioner for Bee2. The specific version of Bee2 used in this article has been tagged as pd-vultr-blogpost, and the most current version can be found on GitHub. Future posts will cover further work with Ansible and Docker, establishing an OpenVPN server for our private network, securing the VMs and using Docker to run various services.
1. Vultr Provider Issue #2611. HashiCorp. GitHub. Retrieved 5 July, 2017.