RPM Database Corruption Issues

Today I noticed that I was having issues with Yum and RPM on my work machine. It's running a fresh install of Fedora 12, so most likely something out of the ordinary happened. It probably had to do with yumex hanging and me then killing it by hand. Fixing it was simple for me. I realized that the problem was most likely corruption; one Google search later, I knew which files to delete (after backing them up first) and what to do to rebuild the RPM database. Woot, everything back to normal.

This is a big fail whale. It has nothing to do with the coding skills of any of the RPM, Yum, or Yumex maintainers, and I'm pretty sure that between them and the PackageKit guys, they've gotten more than a lifetime's share of flames and trolls already. This is a failure because if I were the average user, say my dad, after smacking the keyboard once or twice to get yumex to continue working, I would have restarted the machine. Then I would be just as stuck.

From where I see it, telling someone to rebuild their RPM database on the command line is error prone and could just make things worse. Fortunately, the process itself is pretty simple. You back up some files, delete them, and run rpm --rebuilddb. The entire process should just work, so long as there aren't bigger failures. From a sysadmin's perspective, I know that if the RPM database is broken on a server, then chances are other bits, like package headers, could be missing or corrupted too. Running such an operation as a knee-jerk reaction would be wrong. On a desktop that risk still exists, but it's more likely that the database got corrupted by something like a power outage, or a well placed boot up the computer's sphincter. Such a process, including said backup, would be relatively non-destructive if presented in a 'recovery toolkit' of sorts for the end user. Especially, perhaps, if there was a way to verify that the package headers were intact from the last known good configuration.
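For reference, the back up, delete, rebuild sequence could be sketched roughly like this. This is only a sketch under assumptions: it assumes the stale files are the Berkeley DB index files named __db.* in /var/lib/rpm (the usual layout on Fedora of this era), the function name and the backup location are made up for illustration, and it has to run as root with no other rpm or yum process active.

```shell
#!/bin/sh
# Sketch only: back up and clear the RPM index files before a rebuild.
# Assumptions: index files are named __db.* and live in /var/lib/rpm;
# the function name and default backup directory are hypothetical.

backup_and_clear_rpm_indexes() {
    dbpath="${1:-/var/lib/rpm}"
    backup="${2:-/var/tmp/rpmdb-backup.$$}"

    mkdir -p "$backup" || return 1

    for f in "$dbpath"/__db.*; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        # Copy first; only delete once the copy succeeded.
        cp -p "$f" "$backup"/ || return 1
        rm -f "$f"
    done

    # Print where the copies went so they can be restored by hand.
    echo "$backup"
}

# Typical use, as root:
#   backup_and_clear_rpm_indexes && rpm --rebuilddb
```

The actual package data (the Packages file) is never touched here; only the index files are moved aside, which is why the operation is fairly safe to offer to an end user.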

So in all seriousness, when these things go wrong, how can we offer an option to the user to try and recover the system?

4 flames:

skvidal said

grep rpm /etc/rc.d/rc.sysinit

Yankee said

Why does RPM need that database then?

skvidal said

The files it is removing are temporary/cache/index files. When you remove them and run rpm --rebuilddb, the database is cleaned up. In some cases you just need to remove the files, not run the rebuilddb.

Yankee said

Gotcha. From a user interface perspective, when something crashes and leaves those cached files corrupted, how can we communicate this, and then how can we provide an easy way for the user to clear said cache without needing to reboot?