Managing Juju in Production - Table of Contents


#1

A table of contents for articles that are useful when managing Juju controllers in production.

Some of these are questions/posts that would be good to turn into more formal tutorials/docs going forward, but for the moment let's at least cover the material.





Topics to add

  • Suggestions for sizing of Juju controllers
  • Preparing for failure - best practices for keeping controllers up
  • Best practices for user management

Please add additional requests for information that’d be good to have added/formalized.


Fantastic Settings and Where to Find Them
#2

Sizing for Production Controllers

For bare metal HA controllers, the following setup works for me :slight_smile:
64 GB RAM, 500 GB SSD, 2 × 10-core CPUs, 10 GbE networking

I’ve had long-standing controllers with far smaller specs that worked fine. It’s also partly about planning and what you expect your usage to look like over the course of a few years. With the controllers above, I’m provisioning for the load I expect to handle both now and 5 years down the line.

On another side of our datacenter I have a Juju controller with a 1 TB spinning disk, 192 GB RAM, and 2 × 6-core CPUs. It houses 100+ machines deployed across 20 models, with ~10 reasonably active users. It’s been up for over a year. Here are the stats:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             95G     0   95G   0% /dev
tmpfs            19G  1.9G   17G  11% /run
/dev/sda1       917G   18G  853G   3% /
tmpfs            95G     0   95G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            95G     0   95G   0% /sys/fs/cgroup
tmpfs            19G     0   19G   0% /run/user/1000

$ uptime
 21:48:20 up 96 days, 18:36,  1 user,  load average: 0.72, 0.58, 0.60

Nice to see it’s barely breaking a sweat!
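For a quick look at where the RAM actually goes, the two big consumers on a controller machine are jujud and mongod. A minimal sketch of summarising their resident memory; the `printf` line below stands in for real `ps` output, and the RSS values (in KiB) are made up for illustration:

```shell
# On a real controller you would feed this from:
#   ps -C jujud,mongod -o comm,rss --no-headers
# Here illustrative RSS values (KiB) stand in for that output.
printf 'jujud 4194304\nmongod 16777216\n' | \
  awk '{printf "%s %.1f GB\n", $1, $2 / 1024 / 1024}'
```

Run periodically (e.g. from cron), this gives a cheap trend line for controller memory growth without any extra tooling.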


#4

I should also note that just last week we upgraded our Prodstack, which runs approximately 1200 instances with about 4000 units. We went from Juju 2.5 to 2.6 and noticed a significant decrease in memory consumption. (Max Juju memory consumption went from 10-16GB down to 2.4-6.4GB, depending on how the models happened to be distributed across the controllers.)
You also need to account for MongoDB memory, which went from around 20GB down to around 16GB. (These are 32GB VMs that the controllers are running on.)
I think some of the memory consumption is just caching rather than must-have. But we can comfortably run the above with 32GB of RAM, and possibly could do so with just 16GB of RAM on Juju 2.6+.
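To make the headroom explicit, here is a quick back-of-envelope using the worst-case figures above (6.4GB for jujud, ~16GB for MongoDB, on a 32GB VM):

```shell
# Headroom left on a 32 GB controller VM at the worst-case
# consumption quoted in the post (numbers from the post, not measured here).
awk 'BEGIN {
  total = 32; jujud = 6.4; mongod = 16
  free = total - jujud - mongod
  printf "free: %.1f GB (%.0f%%)\n", free, free / total * 100
}'
```

Roughly 30% free at peak is what makes the "possibly 16GB" claim plausible only if much of MongoDB's usage really is reclaimable cache.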

We also see an average of 10-15Mbps across all 3 controllers (much of this is Mongo sync and Mongo queries). So 10G networking gives you plenty of headroom. :slight_smile: 10Mbps would be a bit light, but 100Mbps should easily be sufficient.
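Putting those figures next to common link speeds makes the headroom obvious. Taking 15Mbps as the observed peak:

```shell
# Link utilisation at the observed 15 Mbps peak, for a 100 Mbps
# and a 10 Gbps (10000 Mbps) link. Figures are from the post above.
awk 'BEGIN {
  observed = 15
  printf "of 100 Mbps: %.1f%%\n", observed / 100 * 100
  printf "of 10 Gbps:  %.2f%%\n", observed / 10000 * 100
}'
```

So even a 100Mbps link would sit at ~15% utilisation at peak, which is why 10G is comfortable overkill for most deployments of this size.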