Juju stuck in creating machines on AWS


#1

My Juju setup is stuck creating machines on AWS. I have tried everything from ‘remove-machine --force --no-wait’ to manually creating machines.

I have searched all chats and threads for a few weeks now and cannot find a solution. I am quite new to Juju, so maybe I am missing some understanding of the Juju “command” queue or not finding the right logs. In juju debug-log I am not seeing any errors regarding the pending machines, nor on AWS, as they are not showing up there at all.
When I ran ‘remove-machine’ on an existing machine it was put into stopped status but not deleted. I then deleted the machine manually from AWS, but juju status is not updating.

It’s our production setup, so I am reluctant to delete everything and try to reinstall…

controller: JAAS with controller version 2.5.4
client: 2.6.3-bionic-amd64


#2

This is not enough information to go by.
Please file a bug and attach logs that show what is happening. Also, ‘juju status --format yaml’ might be more informative than the tabular output.
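
For anyone following along, here is a rough sketch of how that output could be gathered in one go before filing the bug. The function name and the file names are my own choices for illustration, not anything Juju prescribes; the `juju` subcommands and flags are the standard 2.x ones.

```shell
# Sketch: collect the output a bug report would need into one directory.
# The directory and file names are arbitrary, not Juju conventions.
collect_juju_report() {
    outdir="${1:-juju-report}"
    mkdir -p "$outdir" || return 1
    # YAML status, which may carry more detail than the tabular view.
    juju status --format yaml > "$outdir/status.yaml" 2>&1
    # --replay starts from the beginning of the log; --no-tail exits
    # at the end instead of following new entries.
    juju debug-log --replay --no-tail > "$outdir/debug-log.txt" 2>&1
    # Print the directory so the caller knows where the files went.
    echo "$outdir"
}
```

Running `collect_juju_report my-report` should leave `status.yaml` and `debug-log.txt` in `my-report/`. Check both files for secrets before attaching them anywhere public.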


#3

Sorry to hear about your trouble @akkujoniel. Is it possible that the AWS credentials (IAM permissions) have changed since the project started? Perhaps the controller is unable to add/remove machines via the API and Juju is failing to report the error correctly.

If this is the problem, then there is also a bug in Juju, because we have specific code to detect credential failures and it should have reported this one.
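
If you want to rule credentials out, one possible approach is to list what the client has stored for AWS and then push the current local copy to the controller again. This is only a sketch: `my-aws-cred` is a hypothetical credential name, and I am assuming the local copy holds keys that are still valid in IAM.

```shell
# Sketch only: pass the name of your own AWS credential,
# e.g. the hypothetical "my-aws-cred".
refresh_aws_credential() {
    cred="$1"
    if [ -z "$cred" ]; then
        echo "usage: refresh_aws_credential <credential-name>" >&2
        return 1
    fi
    # List the AWS credentials stored on this client.
    juju credentials aws || return 1
    # Upload the local copy of the credential to the controller,
    # replacing the copy the controller currently uses.
    juju update-credential aws "$cred"
}
```

Usage would be `refresh_aws_credential my-aws-cred`. Since this changes what the controller uses to talk to AWS, it is worth confirming the keys in the AWS console first on a production setup.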


#4

Hi @akkujoniel, you mention looking in debug-log output. Have you looked in ‘debug-log -m controller’ (or in /var/log/juju/machine-x.log on the controller machines)? One thing that isn’t obvious is that the Juju compute provisioner runs in the controller agent(s), so any errors might be logged in the controller model rather than in your own model.

From what you’re describing (machines not showing up in the AWS console) it does sound like it might be a credential problem. There have been some changes to display credential errors better in more recent versions of Juju - they might help you too.
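
As a rough illustration of the above, something like the following would replay the controller model's log and keep only lines that look related to provisioning. It assumes your user can read the controller model, and the grep pattern is just my guess at useful keywords, not an official filter.

```shell
# Sketch: replay the controller model's log and filter it.
# The keyword list is a heuristic; widen it if nothing shows up.
controller_provisioning_log() {
    juju debug-log -m controller --replay --no-tail \
        | grep -i -E 'provision|credential|quota'
}
```

If this prints nothing, try the unfiltered `juju debug-log -m controller --replay --no-tail` before concluding that the log is clean.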


#5

Another thing to look at is “juju show-machine 48”. Its output sometimes includes the details of why a machine is failing to start. (For example, this looks like you might be hitting AWS quota limits.)
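
A small sketch for pulling just the status lines out of that output. I am assuming the YAML contains `current:`, `message:` and `since:` keys under the machine's status sections; if your version lays it out differently, read the full output instead.

```shell
# Sketch: show only the status-related lines for one machine.
# The key names in the pattern are an assumption about the YAML layout.
machine_status_summary() {
    juju show-machine "$1" --format yaml \
        | grep -E 'current:|message:|since:'
}
```

For the machine above that would be `machine_status_summary 48`. A provisioning failure, if Juju recorded one, would normally show up in the `message:` line.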