Issues with ceph-fs


#1

Posting this here before filing a bug as I want to rule out user error before squawking any louder.

We have been battling ceph-fs units getting stuck in a ‘blocked’ workload state with a status of “No MDS detected using current configuration”. See http://paste.ubuntu.com/p/BBBJ5w9sj8/

Looking at ceph -w I can see that one unit is active, and I am able to create and use CephFS pools. The other two units, stuck in the blocked state, don't show up.

$ sudo ceph osd pool create cephfs_data 200
pool 'cephfs_data' created
ubuntu@ip-172-31-104-15:~$ sudo ceph osd pool create cephfs_metadata 200
pool 'cephfs_metadata' created
ubuntu@ip-172-31-104-15:~$ sudo ceph fs new cephfs cephfs_metadata cephfs_data
new fs with metadata pool 4 and data pool 3
ubuntu@ip-172-31-104-15:~$ sudo ceph -w
  cluster:
    id:     4f943010-3aac-11e9-89c5-0aeab1a19b3c
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ip-172-31-102-40,ip-172-31-104-15,ip-172-31-104-163
    mgr: ip-172-31-104-15(active), standbys: ip-172-31-102-40, ip-172-31-104-163
    mds: cephfs-1/1/1 up  {0=ip-172-31-102-75=up:active}
    osd: 9 osds: 9 up, 9 in
 
  data:
    pools:   4 pools, 464 pgs
    objects: 21  objects, 2.2 KiB
    usage:   1.1 GiB used, 449 GiB / 450 GiB avail
    pgs:     464 active+clean
 
  io:
    client:   2.6 KiB/s wr, 0 op/s rd, 5 op/s wr
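Only daemons that have actually registered with the mons appear on the "mds:" line of ceph -s, so counting up:active / up:standby entries there tells you how many MDS the cluster sees versus how many units Juju thinks it has. A minimal sketch of that check, parsing a captured status line rather than hitting a live cluster (on a real deployment you would feed it the output of `sudo ceph -s`):

```shell
# Count MDS daemons the mons report, from the "mds:" line of ceph -s.
# Here a status line captured above is used for illustration; only one
# MDS registered and no standbys, matching the two blocked units.
status_line='mds: cephfs-1/1/1 up  {0=ip-172-31-102-75=up:active}'
active=$(echo "$status_line" | grep -o 'up:active' | wc -l)
standby=$(echo "$status_line" | grep -o 'up:standby' | wc -l)
echo "active=$active standby=$standby"
```

With all three units healthy you would expect one up:active plus two up:standby entries on that line.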

Here is the log from a ceph-fs unit stuck in blocked http://paste.ubuntu.com/p/8TKXp3d3sN/

Here is the log from an active ceph-fs unit http://paste.ubuntu.com/p/WBrcZ6hg9p/

Sometimes we can spin up this same deployment and have all ceph-fs units come up and join the cluster correctly.

We have tested this on LXD, AWS, and bare metal. In the bare-metal deployment, our ceph-fs units use an MTU of 9000 and still exhibit the issue.

Here is a bundle similar to the one we are using which can be used to reproduce http://paste.ubuntu.com/p/TKqDfYrSnG/
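For anyone who doesn't want to pull the paste, the shape of the bundle is roughly the following. This is a hand-written sketch, not our actual bundle: the osd-devices value is a placeholder, and the endpoint names (ceph-osd:mon / ceph-mon:osd, ceph-fs:ceph-mds / ceph-mon:mds) are what I believe the charms expose, so double-check against the charm metadata before deploying.

```yaml
series: bionic
applications:
  ceph-mon:
    charm: cs:~openstack-charmers-next/ceph-mon
    num_units: 3
  ceph-osd:
    charm: cs:~openstack-charmers-next/ceph-osd
    num_units: 3
    options:
      osd-devices: /dev/vdb   # placeholder; set to your actual disks
  ceph-fs:
    charm: cs:~openstack-charmers-next/ceph-fs
    num_units: 3
relations:
  - [ ceph-osd:mon, ceph-mon:osd ]
  - [ ceph-fs:ceph-mds, ceph-mon:mds ]
```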

@rr-pdl has been debugging this issue aggressively for the last week; he may have more input here.

Thoughts? @chris.macnaughton ^?


#2

Funny timing on this! James Page has very recently landed a fix in the MDS interface (https://github.com/openstack/charm-interface-ceph-mds) that should resolve the issues with multiple MDS servers running, and that change should already be available in ~openstack-charmers-next/ceph-fs!


#3

I just hit the same issue, using ~openstack-charmers-next/ceph-fs. I posted to the OS Q&A but was shown the door, so here I am!
Post replicated here for completeness:

I’m deploying Ceph via Juju, using the latest version of the openstack-charmers-next osd, mon, and fs charms. The host OS is Bionic and the install source is cloud:bionic-rocky: 3 OSDs (metal), 3 mons (LXD), and 2 MDS (LXD).

Everything stands up great, and initially the MDS works (the cephfs_data and cephfs_metadata pools are created). However, as the install settles and the second and third mons come online, the MDS status changes to “No MDS detected using current configuration” and the fs pools go offline.

It looks like this:

Every 2.0s: juju status --color                                                                                                              

Model  Controller  Cloud/Region  Version  SLA          Timestamp
base4  homelab     homelab       2.5.1    unsupported  02:49:07-07:00

App       Version       Status   Scale  Charm     Store       Rev  OS      Notes
ceph-fs   13.2.4+dfsg1  blocked      2  ceph-fs   jujucharms   42  ubuntu
ceph-mon  13.2.4+dfsg1  active       3  ceph-mon  jujucharms  380  ubuntu
ceph-osd  13.2.4+dfsg1  active       3  ceph-osd  jujucharms  399  ubuntu

Unit         Workload  Agent  Machine  Public address  Ports  Message
ceph-fs/0    blocked   idle   1/lxd/0  10.0.10.94             No MDS detected using current configuration
ceph-fs/1*   blocked   idle   0/lxd/0  10.0.10.89             No MDS detected using current configuration
ceph-mon/0*  active    idle   0/lxd/1  10.0.10.90             Unit is ready and clustered
ceph-mon/1   active    idle   1/lxd/1  10.0.10.92             Unit is ready and clustered
ceph-mon/2   active    idle   2/lxd/0  10.0.10.95             Unit is ready and clustered
ceph-osd/0   active    idle   0        10.0.10.140            Unit is ready (1 OSD)
ceph-osd/1   active    idle   1        10.0.10.139            Unit is ready (1 OSD)
ceph-osd/2*  active    idle   2        10.0.10.153            Unit is ready (1 OSD)

Machine  State    DNS          Inst id              Series  AZ  Message
0        started  10.0.10.140  manual:10.0.10.140   bionic      Manually provisioned machine
0/lxd/0  started  10.0.10.89   juju-f14309-0-lxd-0  bionic      Container started
0/lxd/1  started  10.0.10.90   juju-f14309-0-lxd-1  bionic      Container started
1        started  10.0.10.139  manual:10.0.10.139   bionic      Manually provisioned machine
1/lxd/0  started  10.0.10.94   juju-f14309-1-lxd-0  bionic      Container started
1/lxd/1  started  10.0.10.92   juju-f14309-1-lxd-1  bionic      Container started
2        started  10.0.10.153  manual:10.0.10.153   bionic      Manually provisioned machine
2/lxd/0  started  10.0.10.95   juju-f14309-2-lxd-0  bionic      Container started

#4

Note that I see this issue with 1, 2, or more MDS units in the model, so I don’t think it is specifically related to multiple MDS services in the cluster. It almost looked like things worked better before the mons had scaled out: the initial MDS status was good until the second and third units of the mon service came up.


#5

We are trying out the -next charms right now too. Sad to hear this is persisting for you. Possibly the fix hasn’t fully landed yet?


#6

@seffyroff next charms are looking clean to me!

$ sudo ceph -w
  cluster:
    id:     a5d281d0-4bf4-11e9-ad95-ae08418d5bd1
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum d3-ceph-mon-00,d4-ceph-mon-01,d5-ceph-mon-02
    mgr: d5-ceph-mon-02(active), standbys: d4-ceph-mon-01, d3-ceph-mon-00
    mds: ceph-fs-1/1/1 up  {0=juju-2f8106-2-lxd-0=up:active}, 2 up:standby
    osd: 102 osds: 102 up, 102 in
 
  data:
    pools:   2 pools, 2048 pgs
    objects: 22  objects, 2.2 KiB
    usage:   168 GiB used, 556 TiB / 557 TiB avail
    pgs:     2048 active+clean
 
  io:
    client:   1.4 KiB/s wr, 0 op/s rd, 5 op/s wr

Model         Controller  Cloud/Region  Version  SLA          Timestamp
ceph-storage  pdl-maas    pdl-maas      2.5.1    unsupported  11:06:20-07:00

App       Version       Status  Scale  Charm     Store       Rev  OS      Notes
ceph-fs   13.2.4+dfsg1  active      3  ceph-fs   jujucharms   42  ubuntu
ceph-mon  13.2.4+dfsg1  active      3  ceph-mon  jujucharms  380  ubuntu
ceph-osd  13.2.4+dfsg1  active      3  ceph-osd  jujucharms  399  ubuntu

Unit         Workload  Agent  Machine  Public address  Ports  Message
ceph-fs/0*   active    idle   0/lxd/0  10.10.11.10            Unit is ready (1 MDS)
ceph-fs/1    active    idle   1/lxd/0  10.10.11.11            Unit is ready (1 MDS)
ceph-fs/2    active    idle   2/lxd/0  10.10.11.12            Unit is ready (1 MDS)
ceph-mon/0*  active    idle   0        10.10.11.4             Unit is ready and clustered
ceph-mon/1   active    idle   1        10.10.11.5             Unit is ready and clustered
ceph-mon/2   active    idle   2        10.10.11.6             Unit is ready and clustered
ceph-osd/0   active    idle   3        10.10.11.9             Unit is ready (34 OSD)
ceph-osd/1*  active    idle   4        10.10.11.7             Unit is ready (34 OSD)
ceph-osd/2   active    idle   5        10.10.11.8             Unit is ready (34 OSD)

Machine  State    DNS          Inst id              Series  AZ  Message
0        started  10.10.11.4   xxfdrw               bionic  d3  Deployed
0/lxd/0  started  10.10.11.10  juju-2f8106-0-lxd-0  bionic  d3  Container started
1        started  10.10.11.5   fnsq7w               bionic  d4  Deployed
1/lxd/0  started  10.10.11.11  juju-2f8106-1-lxd-0  bionic  d4  Container started
2        started  10.10.11.6   aq3nk3               bionic  d5  Deployed
2/lxd/0  started  10.10.11.12  juju-2f8106-2-lxd-0  bionic  d5  Container started
3        started  10.10.11.9   6g77q8               bionic  d3  Deployed
4        started  10.10.11.7   4ngbw4               bionic  d4  Deployed
5        started  10.10.11.8   k6mkfx               bionic  d5  Deployed



#7

I have a hypothesis that this is DNS related, with ceph-fs unable to resolve any mon other than the one running on the same LXD host. When I’m back at my desk I’ll try some static DNS mappings.

The reason I suspect DNS is that I’m deploying to a manual cloud rather than MAAS (to speed up iteration), so the units don’t get MAAS-managed name resolution.
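One way to test that hypothesis is to pin the mon hostnames in /etc/hosts inside each ceph-fs container. Using the addresses from the juju status output in #3, and assuming the mon hostnames match their container instance ids (which is how Juju names LXD containers), the entries would look something like:

```
# /etc/hosts additions on each ceph-fs container (addresses from the
# juju status in #3; hostnames assumed to match the instance ids)
10.0.10.90  juju-f14309-0-lxd-1
10.0.10.92  juju-f14309-1-lxd-1
10.0.10.95  juju-f14309-2-lxd-0
```

If the MDS units then register and stay active, that would strongly point at name resolution on the manual cloud rather than the charm itself.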