Cephs weakest leak is configuration.. once a cluster is deployed is incredibly durable and will survive most mistakes without punishment. However adding a monitor that is unreachable via all machines can yield a very broken cluster that cannot be managed.
For example, if you add a new monitor and the automatically detected ip (ansible or kolla) isn’t correct, possibly a loopback or other assigned ip, you will loose the ability to use the ceph tools on the cluster because of a broken monitor map config.
So heres what you need to know in a nut shell to fix it.
- Stop your monitors
- Export a monitor map from the last known good monitor
- Edit the monitor map to fix the broken entry
- Repeat this for all the monitors that were “working”.
- Inject the monitor maps on those monitors
- Start the monitors and check for them to forum a quorum.
ceph-mon -c /etc/ceph/cluster-name-ceph.conf -i MONITOR_NAME --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap
monmaptool --rm bad-host-entry /tmp/monmap
monmaptool --print /tmp/monmap
ceph-mon --c /etc/ceph/cluster-name-ceph.conf -i MONITOR_NAME --inject-monmap /tmp/monmap
chown ceph:ceph -R /var/lib/ceph/mon/cluster-monitor-name/
systemctl start ceph-mon.target