Hi there! After several aborted early attempts at casually using Juju for deployment, where debugging quickly got complicated, I have dug in my heels and gotten further (actual successful deployments of some charms for the first time!), but I am snagging on a couple of Ceph cluster components.
Juju Controller deployed in container on my desktop on the same LAN as the target machines. Version 2.5-beta1-bionic-amd64 installed from snap --edge channel.
Manual cloud with 4 metal nodes running bionic manually added.
Ceph-mon and Ceph-osd deployed to the cloud mostly accepting defaults, relation added between Mon and OSD.
So far so good. Mon and OSD cluster are up and running!
Status looks like this:
Every 2.0s: juju status --color juju-metal: Mon Oct 15 14:09:12 2018
Model    Controller  Cloud/Region  Version    SLA          Timestamp
default  metal-ctrl  hl-metal      2.5-beta1  unsupported  14:09:12-07:00

App       Version  Status   Scale  Charm     Store       Rev  OS      Charm version  Notes
ceph-fs            waiting      0  ceph-fs   jujucharms   16  ubuntu
ceph-mon  12.2.7   active       3  ceph-mon  jujucharms   27  ubuntu
ceph-osd  12.2.7   active       4  ceph-osd  jujucharms  270  ubuntu

Unit         Workload  Agent  Machine  Public address  Ports  Message
ceph-mon/0   active    idle   0        celery                 Unit is ready and clustered
ceph-mon/1*  active    idle   1        lazarus                Unit is ready and clustered
ceph-mon/2   active    idle   2        inspiral               Unit is ready and clustered
ceph-osd/4   active    idle   2        inspiral               Unit is ready (1 OSD)
ceph-osd/5   active    idle   4        rombus                 Unit is ready (1 OSD)
ceph-osd/6*  active    idle   1        lazarus                Unit is ready (1 OSD)
ceph-osd/7   active    idle   0        celery                 Unit is ready (1 OSD)

Machine  State    DNS       Inst id          Series  AZ  Message
0        started  celery    manual:celery    bionic      Manually provisioned machine
1        started  lazarus   manual:lazarus   bionic      Manually provisioned machine
2        started  inspiral  manual:inspiral  bionic      Manually provisioned machine
4        started  rombus    manual:rombus    bionic      Manually provisioned machine
So I moved on to installing Ceph-fs: I added the application, related Mon to the Ceph-fs app, and started with 1 unit. The deployment gets stuck with the status “Installing btrfs-tools,ceph,ceph-mds,gdisk,ntp,python-ceph,python3-pyxattr,xfsprogs” and never advances. I’ll post the complete debug-log, but the most interesting-looking output seems to be the mon complaining about the ceph-fs node:
unit-ceph-mon-1: 14:15:46 DEBUG unit.ceph-mon/1.mds-relation-changed Error EINVAL: pool 'ceph-fs_data' (id '1') has a non-CephFS application enabled.
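In case anyone wants to pick lines like this out of their own debug-log, here is a rough Python sketch. The regexes are modelled only on the single line quoted above, and `find_pool_errors` is a made-up helper name rather than anything Juju or Ceph provides, so other log lines may well need a different pattern.

```python
import re

# Shape assumed from the one debug-log line quoted above:
#   <source>: <HH:MM:SS> <LEVEL> <logger> <message>
LOG_LINE = re.compile(
    r"^(?P<source>[\w./-]+): (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+) (?P<message>.*)$"
)
# Matches messages such as "Error EINVAL: pool 'ceph-fs_data' ..."
POOL_ERROR = re.compile(r"Error (?P<errno>E[A-Z]+): pool '(?P<pool>[^']+)'")


def find_pool_errors(lines):
    """Yield (unit, hook, errno, pool) for each line reporting a pool error."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue
        err = POOL_ERROR.search(m.group("message"))
        if not err:
            continue
        # The logger looks like "unit.ceph-mon/1.mds-relation-changed":
        # drop the leading "unit." and split the hook name off the end.
        _, _, rest = m.group("logger").partition(".")
        unit, _, hook = rest.rpartition(".")
        yield unit, hook, err.group("errno"), err.group("pool")
```

Feeding it the lines of a saved debug-log should list which unit and hook hit the error and which pool was named, which makes it easier to see whether only `ceph-fs_data` is affected.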
I’m combing through the Ceph docs right now to see if there’s some Mon or OSD flag I need set that currently isn’t, but any pointers folks have to help get to the bottom of this would be appreciated.
My second issue is that when I deploy Ceph with MDS I normally include a Ceph-mgr service, and it seems there’s no charm for that. Any insight here would be welcome.
EDIT: Ceph-fs deploy log is here.