Managing Juju in Production - ToC


A table of contents for articles that are useful for managing Juju controllers in production.

Some of these are questions or forum posts that would be good to turn into more formal tutorials or docs going forward, but for the moment let’s at least cover the material.

Topics to add

  • Suggestions for sizing of Juju controllers
  • Preparing for failure - best practices for keeping controllers up
  • Best practices for user management

Please add requests for any other information that would be good to have added or formalized.


Sizing for Production Controllers

For bare-metal HA controllers, the following setup works for me:
64 GB RAM, 500 GB SSD, 2 x 10-core CPUs, 10G networking

I’ve had long-standing controllers with far lower specs that worked fine. Sizing is also partly about planning for what you expect your usage to look like over the next few years. With the controllers above, I’m provisioning for the load I expect both now and five years down the line.

On the other side of our datacenter I have a Juju controller with a 1 TB spinning disk, 192 GB RAM and 2 x 6-core CPUs. It manages 100+ machines deployed across 20 models, with ~10 reasonably active users, and it’s been up for over a year. Here are the stats:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             95G     0   95G   0% /dev
tmpfs            19G  1.9G   17G  11% /run
/dev/sda1       917G   18G  853G   3% /
tmpfs            95G     0   95G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            95G     0   95G   0% /sys/fs/cgroup
tmpfs            19G     0   19G   0% /run/user/1000
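To keep an eye on disk headroom on a controller like this, a rough POSIX shell sketch can recompute df’s Use% figure and warn above a threshold. The function names and the threshold argument are hypothetical, not part of Juju. Use% is used / (used + available), rounded up, which is how GNU df reports it: with the rounded figures above, 18G used and 853G available is about 2.1%, shown as 3%.

```shell
#!/bin/sh
# used_pct USED AVAIL: percentage of usable space consumed, rounded up,
# mirroring how GNU df derives its Use% column: used / (used + avail).
used_pct() {
    total=$(( $1 + $2 ))
    # Guard against division by zero on an empty filesystem report.
    if [ "$total" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( ($1 * 100 + total - 1) / total ))
}

# root_fs_check THRESHOLD (hypothetical helper): warn if / on this host is
# above THRESHOLD percent full. Returns 1 when over the threshold.
root_fs_check() {
    threshold=$1
    # POSIX (-P) output in 1K blocks (-k): column 3 is Used, column 4 is Available.
    usage=$(df -Pk / | awk 'NR == 2 { print $3, $4 }')
    pct=$(used_pct "${usage% *}" "${usage#* }")
    if [ "$pct" -gt "$threshold" ]; then
        echo "WARNING: / is ${pct}% full (threshold ${threshold}%)"
        return 1
    fi
    echo "/ is ${pct}% full"
}
```

For example, `used_pct 18 853` prints 3, and `root_fs_check 80` run on a controller machine (from cron, say) would only complain once the root filesystem passes 80%.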

$ uptime
 21:48:20 up 96 days, 18:36,  1 user,  load average: 0.72, 0.58, 0.60

Nice to see it’s barely breaking a sweat!
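A load average is easier to judge once it is divided by the CPU count. The POSIX shell sketch below does that; the helper names are hypothetical, not Juju tooling, and the live variant assumes a Linux host (it reads /proc/loadavg and uses nproc). Counting the 2 x 6-core machine as 12 CPUs is also an assumption: with hyperthreading enabled, nproc would likely report 24 logical CPUs and the per-core figure would halve.

```shell
#!/bin/sh
# per_core_load LOAD CPUS: divide a load average by the number of CPUs.
# Values well below 1.0 suggest the CPUs are mostly idle (on Linux the load
# average also counts tasks blocked in uninterruptible I/O wait).
per_core_load() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.3f\n", load / cpus }'
}

# live_per_core_load (hypothetical helper): the same figure for this Linux
# host, using the 1-minute average from /proc/loadavg and nproc's CPU count.
live_per_core_load() {
    read -r one _ < /proc/loadavg
    per_core_load "$one" "$(nproc)"
}
```

For the 1-minute average above, `per_core_load 0.72 12` prints 0.060, i.e. roughly 0.06 runnable tasks per core.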