Tuning Ansible For Maximum Performance

Author: gryzli

I hate stuff running slow and I love Ansible. 

In this guide, I will share how I achieved a more than 50x speedup when executing simple Ansible playbooks (which are meant to be fast, but were not).

 

Measuring Ansible Task Execution Time

Before optimizing anything, we need good baseline information about the current situation, and we need to be able to measure how timings change after each change we make.

I recently found a real treasure for measuring Ansible task execution time: callback plugins.

To measure timings, put the following line in the [defaults] section of your ansible.cfg file:

[defaults]

# Enable timing information
callback_whitelist = timer, profile_tasks

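After adding this line, you will start seeing timing information for each task. On Ansible 2.11 and newer, the same setting is called callbacks_enabled, and callback_whitelist is deprecated.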

Here is example output from one node_exporter playbook:

[Image: Ansible playbook timings]
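To compare a baseline run with a tuned one, you can save the profile_tasks recap of each run to a text file and total it up. Below is a minimal Python sketch; it assumes recap lines look like "Task name ------- 1.23s" (adjust the regular expression if your output differs), and parse_task_times and speedup are hypothetical helper names, not part of Ansible. Keep in mind that profile_tasks may list only the slowest tasks, so treat the totals as approximate.

```python
import re

# Assumed shape of a profile_tasks recap line: "<task name> ------- 1.23s"
RECAP_LINE = re.compile(r"^(?P<name>.*?) -+ (?P<secs>\d+(?:\.\d+)?)s$")


def parse_task_times(recap: str) -> dict:
    """Map each task name in a profile_tasks recap to its duration in seconds."""
    times = {}
    for line in recap.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match:
            name = match.group("name")
            # Sum repeated names (e.g. the same task name used in two plays).
            times[name] = times.get(name, 0.0) + float(match.group("secs"))
    return times


def speedup(before: str, after: str) -> float:
    """How many times faster the 'after' run is, based on total task time."""
    total_before = sum(parse_task_times(before).values())
    total_after = sum(parse_task_times(after).values())
    return total_before / total_after
```

Lines that do not match the assumed shape (timestamps, the ===== separator) are simply skipped.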

 

Ansible Performance Bottlenecks

Now that we have the tools to measure timing and performance, we can start optimizing.

The two most common performance killers in Ansible are:

1) SSH

2) Fact gathering

Here is how to fix them.

 

Optimizing Ansible SSH Performance 

 

Enable SSH Multiplexing

This can give you a huge performance benefit, especially if you execute a large number of tasks, run against a large number of hosts, or both.

The idea behind SSH multiplexing is that once an SSH connection to a host is established, it stays open in the background for a given period of time. Whenever you need to execute something on the same host again, the existing connection is reused (as long as you connect with the same user/host/port combination), so the expensive connection setup and authentication are skipped.

By default, Ansible uses ControlPersist=60, which means an idle master connection stays alive in the background for 60 seconds after it was last used. For me this is too short; I prefer connections to stay alive for multiple hours.
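To see why a longer ControlPersist helps, here is a toy Python model (not how OpenSSH is implemented) that counts how many full connection setups a single user/host/port pair pays for, given the times at which commands are sent and the idle timeout. The function name count_handshakes is hypothetical.

```python
def count_handshakes(command_times, persist_seconds):
    """Count full SSH connection setups for commands sent to ONE user/host/port.

    Toy model: every command is instantaneous, and the background master
    connection exits once it has been idle for `persist_seconds`.
    """
    handshakes = 0
    master_expires_at = None  # no master connection yet
    for t in sorted(command_times):
        if master_expires_at is None or t > master_expires_at:
            handshakes += 1  # master is gone: pay for a brand-new connection
        master_expires_at = t + persist_seconds  # each use resets the idle timer
    return handshakes


# One command every 2 minutes over 20 minutes, against the same host.
commands = list(range(0, 1200, 120))
print(count_handshakes(commands, 60))     # ControlPersist=60    -> 10
print(count_handshakes(commands, 18000))  # ControlPersist=18000 -> 1
```

With commands two minutes apart (for example, re-running a playbook while you work on it), a 60-second idle timeout means every command pays for a fresh connection, while a 5-hour timeout keeps reusing the first one.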

Add the following to the [ssh_connection] section of your ansible.cfg file (ControlPersist=18000 keeps idle connections alive for 5 hours):

[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=18000 -o PreferredAuthentications=publickey
control_path = %(directory)s/ansible-ssh-%%h-%%p-%%r
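The doubled %% in control_path is easy to get wrong. As far as I can tell, Ansible's ssh connection plugin first applies Python %-formatting to this value (filling in %(directory)s, which defaults to ~/.ansible/cp, and turning each %% into %), and OpenSSH then expands its own ControlPath tokens %h (host), %p (port) and %r (remote user). The hypothetical helper below approximates those two steps so you can see which socket path a given user/host/port ends up with; /home/me/.ansible/cp, web1, deploy and root are placeholders.

```python
import re


def expand_control_path(control_path, directory, host, port, user):
    """Approximate the two expansion steps applied to control_path from ansible.cfg."""
    # Step 1 (Ansible): Python %-formatting fills in %(directory)s and turns
    # every escaped "%%" into a single "%".
    for_ssh = control_path % {"directory": directory}
    # Step 2 (OpenSSH): ControlPath tokens %h = host, %p = port, %r = remote user.
    tokens = {"%h": host, "%p": str(port), "%r": user}
    return re.sub(r"%[hpr]", lambda m: tokens[m.group(0)], for_ssh)


template = "%(directory)s/ansible-ssh-%%h-%%p-%%r"
print(expand_control_path(template, "/home/me/.ansible/cp", "web1", 22, "deploy"))
# -> /home/me/.ansible/cp/ansible-ssh-web1-22-deploy
print(expand_control_path(template, "/home/me/.ansible/cp", "web1", 22, "root"))
# -> /home/me/.ansible/cp/ansible-ssh-web1-22-root
```

Because the user is part of the socket name, connecting to the same host as two different users creates two separate master connections; they are never shared.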

CAUTION

When using SSH multiplexing with a long ControlPersist time, there is a potential problem if you suspend your notebook/PC and wake it up again while mux connections exist. The underlying network connections will be broken, but the mux master processes will keep running, so SSH will keep trying to reuse them and you will lose SSH connectivity to the muxed hosts.

To fix this, kill the SSH processes marked with [mux], for example with something like this:

ps faux  | grep ssh | grep "\[mux\]"  | awk '{print $2}' | xargs kill 
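If you want a narrower filter than grepping every ssh process, for example to touch only the masters that Ansible created, here is a Python sketch. It assumes OpenSSH titles a master process "ssh: <control path> [mux]" and that the input comes from ps -eo pid=,args= (PID first, then the full command line); mux_master_pids is a hypothetical helper name.

```python
def mux_master_pids(ps_output, control_dir=""):
    """Return PIDs of SSH multiplexing masters found in `ps -eo pid=,args=` output.

    Assumes OpenSSH's process title for a master: "ssh: <control path> [mux]".
    Only masters whose control path starts with `control_dir` are returned.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # header or unexpected line
        pid, args = parts
        if args.startswith("ssh: ") and args.endswith(" [mux]"):
            socket_path = args[len("ssh: "):-len(" [mux]")]
            if socket_path.startswith(control_dir):
                pids.append(int(pid))
    return pids
```

To use it, feed it subprocess.run(["ps", "-eo", "pid=,args="], capture_output=True, text=True).stdout together with your Ansible control directory (for example ~/.ansible/cp expanded to a full path), and send each returned PID a SIGTERM with os.kill(pid, signal.SIGTERM).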

 

Optimizing SSH PreferredAuthentications

Another thing that affects speed is the set of authentication methods SSH tries while connecting.

If you only use public keys to connect to your hosts, you can add the option “PreferredAuthentications=publickey” to your ssh_args (ansible.cfg -> [ssh_connection]). This way SSH does not spend time first trying other methods, such as GSSAPI, which some distributions enable by default.

Now your config line should look like this:

# Adding PreferredAuthentications=publickey to the ssh_args line
ssh_args = -o ControlMaster=auto -o ControlPersist=18000 -o PreferredAuthentications=publickey
 
Enabling Pipelining 

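Another serious speed gain can be achieved by enabling pipelining. With pipelining, Ansible sends most modules to the remote interpreter over the existing SSH session instead of first copying them to a temporary file, which reduces the number of SSH operations needed per task.

To enable it, add the following option to the [ssh_connection] section of ansible.cfg: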

[ssh_connection]
pipelining = True
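One caveat: the Ansible documentation notes that pipelining conflicts with privilege escalation when sudo is configured with requiretty, so requiretty has to be disabled in /etc/sudoers on the managed hosts. Here is a rough Python sketch for checking a sudoers file's text for that setting; requiretty_enabled is a hypothetical helper, and it deliberately ignores per-user/per-host Defaults entries and included files, so treat a False result as a hint rather than proof.

```python
def requiretty_enabled(sudoers_text: str) -> bool:
    """Return True if a global 'Defaults' line leaves requiretty switched on.

    Simplified sketch: only plain 'Defaults' lines are considered; per-user,
    per-host or per-command variants (e.g. 'Defaults:deploy') and included
    files are ignored.
    """
    enabled = False  # sudo's built-in default is requiretty off
    for raw_line in sudoers_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, comment or #include directive
        parts = line.split(None, 1)
        if parts[0] != "Defaults" or len(parts) < 2:
            continue  # not a global Defaults entry
        for option in parts[1].split(","):
            option = option.strip()
            if option == "requiretty":
                enabled = True
            elif option == "!requiretty":
                enabled = False  # later entries override earlier ones
    return enabled
```

On a managed host you would pass it the contents of /etc/sudoers (reading that file normally requires root).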
Optimizing Facts Gathering Process

By default, Ansible gathers as many facts as possible about every host it connects to. This is a fairly heavy operation, and most of the time you don’t need most of the facts it collects.

There are several partial and full solutions to this problem, including:

  • Disable fact gathering
  • Gather only a subset of facts
  • Use fact caching

I will go through each solution and its performance contribution.

 

Fully disable fact gathering

This is the most ‘brute-force’ solution. It should give you a huge speed boost, but you won’t be able to rely on any facts about the hosts.

- hosts: whatever
  gather_facts: no

 

Enable only certain facts to be gathered

This is the solution I use in most of my playbooks. It is a trade-off between having no facts at all and having all of them: you gather just what you need.

Fact gathering can be tuned either in ansible.cfg (which affects all playbooks) or inside a playbook.

Inside a playbook, you can fine-tune fact gathering with something like this, which gathers only the “minimal” set of facts:

- hosts: all
  gather_facts: False
  pre_tasks:
   - setup:
      gather_subset:
       - '!all'
  roles:
   - some_role_here

If you need only the network and virtual fact groups (plus the minimal set of facts), you can try something like this:

- hosts: all
  gather_facts: False
  pre_tasks:
   - setup:
      gather_subset:
       - '!all'
       - '!any'
       - 'network'
       - 'virtual' 
  roles:
   - some_role_here

 

You can also control fact gathering globally in your ansible.cfg file.

For example, let’s exclude hardware facts (some of the slowest to collect) from being gathered by default.

Add the following to the [defaults] section of ansible.cfg:

[defaults]
gather_subset=!hardware

 

Enable fact caching

If you still need some of the fact groups, but gathering them is still too slow for you, you can try fact caching.

Fact caching lets Ansible store the facts gathered for each host in a backend, so they can be reused instead of being gathered again.

At the time of writing, the following cache backends are available:

  • memcached
  • redis
  • yaml
  • jsonfile
  • mongodb
  • memory

More information on cache plugins can be found here:

https://docs.ansible.com/ansible/latest/plugins/cache.html

 

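Here is an example configuration that caches facts in JSON files. With gathering = smart, Ansible gathers facts only for hosts that do not already have them in the cache: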

[defaults]
gathering = smart
fact_caching_connection = /tmp/facts_cache
fact_caching = jsonfile


# The timeout is defined in seconds 
# This is 2 hours 
fact_caching_timeout = 7200
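To check what the jsonfile cache actually holds, here is a small Python sketch that lists the hosts whose cached facts are still fresh. It assumes the jsonfile backend writes one file per host into fact_caching_connection, named after the host (plus fact_caching_prefix, if set), and that freshness is judged by the file's modification time against fact_caching_timeout, where 0 means the cache never expires; fresh_cached_hosts is a hypothetical helper name.

```python
import time
from pathlib import Path


def fresh_cached_hosts(cache_dir, timeout=7200, prefix="", now=None):
    """List hosts whose jsonfile-cached facts are still within `timeout` seconds.

    Assumes one cache file per host, named <prefix><hostname>, with freshness
    judged by the file's modification time; timeout=0 means "never expires".
    """
    now = time.time() if now is None else now
    fresh = []
    for entry in sorted(Path(cache_dir).iterdir()):
        if not entry.is_file() or not entry.name.startswith(prefix):
            continue
        age = now - entry.stat().st_mtime
        if timeout == 0 or age <= timeout:
            fresh.append(entry.name[len(prefix):])
    return fresh
```

For example, fresh_cached_hosts("/tmp/facts_cache") should list the hosts for which the next run with gathering = smart can skip fact gathering.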
Some additional optimizations 

Playing with the forks parameter

By default, Ansible has forks set to 5, which means that at any given point in time Ansible runs at most 5 parallel processes.

If you run a playbook against more than 5 hosts, it will only work on 5 of them at a time. This can easily become a bottleneck when working with tens, hundreds or thousands of hosts.

Luckily, this option can be set in your ansible.cfg file:

[defaults]
forks = 50

By adding the line above, you will increase the forks to 50. 
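To get a feel for the numbers, here is a simplified Python model of one task running over many hosts. It assumes every host takes the same time for the task, in which case a pool of forks worker processes amounts to working through the hosts in batches; real runs are less regular, and estimated_waves and estimated_task_seconds are hypothetical helpers.

```python
import math


def estimated_waves(num_hosts, forks):
    """Rough number of back-to-back batches needed to run one task on all hosts."""
    return math.ceil(num_hosts / forks)


def estimated_task_seconds(num_hosts, forks, seconds_per_host):
    """Toy estimate: with equal per-host times, each batch finishes together."""
    return estimated_waves(num_hosts, forks) * seconds_per_host


for forks in (5, 50):
    print(forks, estimated_waves(200, forks), estimated_task_seconds(200, forks, 3.0))
# -> 5 40 120.0
# -> 50 4 12.0
```

With 200 hosts and 3 seconds per host, this model gives 40 batches (120 s) with forks = 5 and 4 batches (12 s) with forks = 50.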

Keep in mind that more forks come at a cost. More forks can speed up a playbook run, but they also require more CPU (and memory) on your Ansible control machine. So... tune with care.

 

Some Additional Resources

Print All Variables in Ansible

Dump All Variables In Ansible

Ansible CheatSheet

How To Use Ansible For Centos 5 / RHEL 5
