Sometimes you have failures that cannot be fixed, e.g. an EC 2+1 pool with two drives failing. (By the way, 2+1 was the recommended default EC profile in 14.x.) You should use 8+3 at minimum to prevent this!
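To make the arithmetic concrete: a k+m erasure-coded pool splits each object into k data chunks plus m coding chunks, and the data survives as long as no more than m chunks are lost. Here is a minimal sketch in plain shell; the helper names `ec_survives` and `ec_raw_pct` are made up for illustration and are not part of Ceph.

```shell
#!/bin/sh
# Hypothetical helpers (not Ceph commands) illustrating k+m arithmetic.

# usage: ec_survives K M FAILED
# Succeeds if a K+M profile can still recover data after FAILED lost chunks.
ec_survives() {
    [ "$3" -le "$2" ]
}

# usage: ec_raw_pct K M
# Raw space consumed as a percentage of the data stored (integer division).
ec_raw_pct() {
    echo $(( ($1 + $2) * 100 / $1 ))
}

# Compare the two profiles mentioned above against a double drive failure.
for profile in "2 1" "8 3"; do
    set -- $profile
    if ec_survives "$1" "$2" 2; then
        verdict="recoverable"
    else
        verdict="data loss"
    fi
    echo "EC $1+$2: raw usage $(ec_raw_pct "$1" "$2")%, two failed drives: $verdict"
done
```

With two failed drives the 2+1 profile is past its limit of one lost chunk, while 8+3 still has one failure to spare.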
Warning: everything below ensures data loss on the affected PG.
ceph pg PGID query | jq .acting
# Stop the OSD related to the PG and figure out the shard ID of the PG; generally it's .s0, .s1, or .s2, depending on your EC config.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid PGID.s1/2 --force --op remove
# Restart the OSD, wait for it to attempt to peer, stop it, then mark the PG complete.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid PGID.s1/2 --op mark-complete
# Tell the customer your mistake is acceptable...
ceph pg 13.df mark_unfound_lost delete
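On the "figure out the shard ID" step: in an erasure-coded pool, an OSD's position in the acting set is its shard number, so the first entry holds s0, the second s1, and so on. A small sketch of that mapping follows. The helper name `shards_for_acting` is hypothetical, and the OSD IDs are invented stand-ins for whatever `jq .acting` prints on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the shard number for each OSD ID,
# given the OSD IDs in acting-set order.
shards_for_acting() {
    shard=0
    for osd in "$@"; do
        # 2147483647 is the placeholder Ceph shows when a shard has no OSD.
        if [ "$osd" = 2147483647 ]; then
            echo "s${shard} (no OSD assigned)"
        else
            echo "s${shard} osd.${osd}"
        fi
        shard=$((shard + 1))
    done
}

# Example with an invented acting set of [3, 7, 0].
shards_for_acting 3 7 0
```

This only reads and prints; it is meant as a sanity check before pointing ceph-objectstore-tool at a particular OSD and shard.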
2+1 is the default profile; I wouldn’t say “recommended”.