Ceph – Delete erasure-coded PGs after data loss

Sometimes you have failures that cannot be fixed, e.g. an EC 2+1 pool with 2 drives failing. (By the way, 2+1 was the recommended default EC profile in 14.x.) Use 8+3 at minimum to prevent this!
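To put numbers on that: a k+m profile can lose at most m chunks of a PG before data is gone, and it stores (k+m)/k bytes of raw capacity per usable byte. A minimal sketch of the arithmetic follows; `ec_summary` is a hypothetical helper name, not a Ceph command.

```shell
# Hypothetical helper (not part of Ceph): summarize an erasure-code
# profile with k data chunks and m coding chunks.
ec_summary() {
    awk -v k="$1" -v m="$2" 'BEGIN {
        printf "%d+%d: survives %d lost OSD(s) per PG, raw/usable = %.3f\n", k, m, m, (k + m) / k
    }'
}

ec_summary 2 1   # the profile from the failure described above
ec_summary 8 3   # the suggested minimum
```

With 2+1 a second failed drive already exceeds m, which is exactly the situation this post deals with; 8+3 survives three failures at a lower raw/usable ratio (1.375 versus 1.500).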

Warning: everything below guarantees data loss on the affected PG.

# Find the OSDs in the acting set of the broken PG.
ceph pg PGID query | jq .acting
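The acting list is also where the shard number comes from. The sketch below assumes that, for an EC pool, an OSD's shard is its zero-based position in the acting set, and that the shard suffix is appended directly to the PG ID (13.dfs2 rather than 13.df.s2). Both are assumptions to verify: running `ceph-objectstore-tool --op list-pgs` against the stopped OSD prints the exact IDs it holds. `shard_pgid` is a hypothetical helper, and the acting set in the example is made up.

```shell
# Hypothetical helper: print the shard-qualified PG ID for one OSD.
# usage: shard_pgid PGID OSD_ID ACTING_OSD...
shard_pgid() {
    pgid=$1
    osd=$2
    shift 2
    shard=0
    # Walk the acting set in order; the index of the match is the shard.
    for member in "$@"; do
        if [ "$member" = "$osd" ]; then
            echo "${pgid}s${shard}"
            return 0
        fi
        shard=$((shard + 1))
    done
    echo "osd.$osd is not in the acting set of $pgid" >&2
    return 1
}

# Example acting set [0, 2147483647, 7]: 2147483647 marks a shard that
# currently has no OSD, but it still occupies its position.
shard_pgid 13.df 7 0 2147483647 7
```

The example prints 13.dfs2, because osd.7 sits at index 2 of the acting set.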

# Stop the OSD that holds the PG and figure out the PG's shard ID on it: generally .s0, .s1 or .s2, depending on your EC config.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid PGID.s1/2 --force --op remove

# Restart the OSD, wait for it to attempt to peer, stop it again, then mark the PG complete.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid PGID.s1/2 --op mark-complete

# Tell the customer your mistake is acceptable, then delete the unfound objects (13.df is the PGID in this example).
ceph pg 13.df mark_unfound_lost delete
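The whole sequence can be rehearsed as a dry run before touching anything. This is a sketch under stated assumptions, not a tested recovery tool: the `ceph-osd@N` systemd unit name applies to package-based installs and differs on containerized deployments, the final OSD start is implied rather than stated in the steps above, the default `SHARD_PGID` notation should be checked against `--op list-pgs`, and `run`, `pause` and `destroy_pg_shard` are hypothetical helper names.

```shell
#!/bin/sh
# Dry-run sketch of the steps above. With DRY_RUN=1 (the default) the
# commands are only printed; nothing is executed.
OSD_ID=${OSD_ID:-0}
PGID=${PGID:-13.df}
SHARD_PGID=${SHARD_PGID:-13.dfs1}   # assumed notation, verify with --op list-pgs
DATA_PATH=/var/lib/ceph/osd/ceph-$OSD_ID/
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Peering cannot be awaited blindly, so a real run stops for confirmation.
pause() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "# $*"
    else
        printf '%s, then press Enter: ' "$*"
        read -r answer
    fi
}

destroy_pg_shard() {
    run systemctl stop "ceph-osd@$OSD_ID"    # unit name is an assumption
    run ceph-objectstore-tool --data-path "$DATA_PATH" --pgid "$SHARD_PGID" --force --op remove
    run systemctl start "ceph-osd@$OSD_ID"
    pause "wait until PG $PGID has attempted to peer"
    run systemctl stop "ceph-osd@$OSD_ID"
    run ceph-objectstore-tool --data-path "$DATA_PATH" --pgid "$SHARD_PGID" --op mark-complete
    run systemctl start "ceph-osd@$OSD_ID"   # implied by the steps, not stated
    run ceph pg "$PGID" mark_unfound_lost delete
}

destroy_pg_shard
```

Read the printed commands first; setting `DRY_RUN=0` makes every one of them real, and the warning above about guaranteed data loss applies in full.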

1 Response

  1. Anthony August 10, 2023 / 10:59 am

    2,1 is the default profile, I wouldn’t say “recommended”.
