Recovering from Bad MySQL Corruption

Every now and then (hopefully only once a year) one comes across a nasty problem. Well, that happened to us a couple of days ago: corruption of a MySQL database right down to the FRM and MYI files. Even worse, it seemed that MySQL’s own data schema was getting progressively more corrupt.

Symptoms

Databases appearing empty

Databases with corrupt tables

“MySQL server has gone away” message when opening a table via the command line or phpMyAdmin

The next database in the list mysteriously becomes empty. Removing the offending database fixes that one again.

The first thing to check was phpMyAdmin. Logging into the database showed that something was seriously wrong. I could load any non-corrupted DBs, but when I clicked on the corrupt database phpMyAdmin either said “MySQL server has gone away” or sent me back to the login screen. Odd!

Next step: SSH into the Linux server, load the mysql console and try to repair the offending table:

# mysql -u username -p databasename

Hmm, something odd here.

mysql> use databasename

-> MySQL server has gone away.

WHAAAT?! OK, there is something seriously, seriously wrong. Restart MySQL:

# /etc/init.d/mysql restart

-> mysql failed to load

Uh-oh, should have expected that: now my other DBs are down as well. Checking the MySQL logs shows something going wrong while the databases load.

Right, getting closer. Go to the data files:

# cd /var/lib/mysql/databasename

List all of the files. OK, all the .frm, .MYD and .MYI files are there. Check them with myisamchk (safe here, since the server is not running):

# myisamchk *.MYI

Loads of errors. There are errors in some frm files and some MYI files.
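Running myisamchk *.MYI by hand produces a lot of output. A minimal sketch that prints only the tables failing the check; it assumes mysqld is stopped (as it was here) and that myisamchk exits with a non-zero status when it finds problems. The function name list_bad_tables and the MYISAMCHK override are illustrative, not part of MySQL.

```shell
# Sketch (POSIX sh): print the name of every MyISAM table in one database
# directory that fails "myisamchk --check".
# Assumptions: mysqld is NOT running, and myisamchk exits non-zero when it
# finds errors. Set MYISAMCHK to use a different binary or a full path.
list_bad_tables() {
    datadir=$1
    for myi in "$datadir"/*.MYI; do
        # If nothing matched, the glob stays literal: skip it.
        [ -e "$myi" ] || continue
        table=$(basename "$myi" .MYI)
        # --check only reports problems; it does not modify the files.
        if ! "${MYISAMCHK:-myisamchk}" --check "$myi" >/dev/null 2>&1; then
            echo "$table"
        fi
    done
}
```

Run it from inside the database directory, e.g. `list_bad_tables .`, to get a short list of the tables that need attention.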

Restore backups. Same problem, huh? The corruption had started some time ago and was only noticed now, as it spread to some critical tables.

OK, .frm files only store table structure, so I can copy them from another database running the same system (this was a Joomla 1.5.23 installation). After copying the files over I restarted MySQL successfully, but of course there were still problems.
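The .frm swap can be scripted so the damaged files are kept aside rather than overwritten. A sketch, assuming mysqld is stopped and the donor database runs the same Joomla version with the same table prefix, so every table name and structure matches one to one. copy_frm is a hypothetical helper name; after copying, the files must still be readable by the MySQL server user (typically via chown mysql:mysql).

```shell
# Sketch (POSIX sh): copy_frm DONOR_DIR TARGET_DIR TABLE...
# Copies TABLE.frm from the donor database directory into the damaged one,
# keeping the old file as TABLE.frm.bad. Stops at the first missing table.
# Assumption: donor and target tables have identical names and structure.
copy_frm() {
    donor_dir=$1; target_dir=$2; shift 2
    for table in "$@"; do
        frm_src="$donor_dir/$table.frm"
        frm_dst="$target_dir/$table.frm"
        if [ ! -f "$frm_src" ]; then
            echo "no $table.frm in donor directory" >&2
            return 1
        fi
        # Keep the damaged definition aside instead of destroying it.
        if [ -f "$frm_dst" ]; then
            cp -p "$frm_dst" "$frm_dst.bad" || return 1
        fi
        # -p keeps mode and timestamps (and ownership when run as root).
        cp -p "$frm_src" "$frm_dst" || return 1
    done
}
```

From inside the damaged database directory, something like `copy_frm ../otherdb . jos_content jos_users` would do it (otherdb and the table names are placeholders).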

Load the database again in the mysql console:

mysql> REPAIR TABLE tablename USE_FRM;

Woohoo! Success. OK, now to do the same with the remaining tables that had this problem. Success for each.
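With more than a handful of tables, typing each REPAIR TABLE gets tedious. A small sketch that prints the statements so they can be piped into the mysql client; repair_sql is a hypothetical helper, and it assumes plain table names without backticks in them. USE_FRM tells MySQL not to trust the .MYI header and to re-create it from the table definition, so it is meant only for tables whose index header is damaged or missing.

```shell
# Sketch (POSIX sh): repair_sql [--use-frm] TABLE...
# Prints one REPAIR TABLE statement per table name. With --use-frm, each
# statement gets the USE_FRM option. Assumes names contain no backticks.
repair_sql() {
    suffix=''
    if [ "${1:-}" = '--use-frm' ]; then
        suffix=' USE_FRM'
        shift
    fi
    for table in "$@"; do
        printf 'REPAIR TABLE `%s`%s;\n' "$table" "$suffix"
    done
}
```

For example, `repair_sql --use-frm jos_content jos_users | mysql -u username -p databasename` (table names are placeholders). The stock mysqlcheck tool offers similar behaviour through its --repair and --use-frm options.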

Checking the database again showed problems with the data in some tables (as expected), mostly related to the record count of the table. Again, just run:

mysql> REPAIR TABLE tablename;

Success.
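To find which tables still need this second repair, CHECK TABLE output can be filtered instead of read by eye. A sketch, assuming the statement is run through the mysql client in batch mode without column names (-N -B), which prints one tab-separated row per message with the columns Table, Op, Msg_type and Msg_text. A table is reported when none of its rows is status OK, so a warning followed by OK does not count. tables_needing_repair is a hypothetical helper name.

```shell
# Sketch (POSIX sh + awk): read CHECK TABLE output in mysql batch format
# (Table<TAB>Op<TAB>Msg_type<TAB>Msg_text) on stdin and print each table
# that never reported "status OK", in the order first seen.
tables_needing_repair() {
    awk -F '\t' '
        # Remember each table once, in first-seen order.
        !($1 in seen) { seen[$1] = 1; order[++n] = $1 }
        # Any "status OK" row marks the table as healthy.
        $3 == "status" && $4 == "OK" { ok[$1] = 1 }
        END {
            for (i = 1; i <= n; i++)
                if (!(order[i] in ok)) print order[i]
        }
    '
}
```

Usage might look like `mysql -N -B -u username -p -e 'CHECK TABLE jos_content, jos_session' databasename | tables_needing_repair` (table names are placeholders). The names come out as databasename.table, which REPAIR TABLE accepts as-is.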

Database back in action. Immediate backup taken.
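One way to take that immediate backup is a logical dump with mysqldump, which gets the data out of the suspect binary files and into plain SQL. A minimal sketch; backup_db and the file naming scheme are hypothetical, and MYSQLDUMP can point at another binary. It writes to a file and checks mysqldump's exit status rather than piping, so a failed dump is not left behind looking complete.

```shell
# Sketch (POSIX sh): backup_db USER DB [DIR]
# Dumps DB to DIR/DB-YYYYmmdd-HHMMSS.sql and prints the file name.
# On failure the partial file is removed and the function returns 1.
backup_db() {
    user=$1; db=$2; dir=${3:-.}
    file="$dir/$db-$(date +%Y%m%d-%H%M%S).sql"
    # -p with no value makes mysqldump prompt for the password.
    if "${MYSQLDUMP:-mysqldump}" -u "$user" -p "$db" > "$file"; then
        echo "$file"
    else
        rm -f "$file"   # do not leave a truncated dump that looks valid
        return 1
    fi
}
```

For example, `backup_db username databasename /some/backup/dir` (the directory is a placeholder).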

So what happened? Well, it seems there is a faulty RAM chip in the machine, and these files were getting hopelessly corrupted. So this is a stopgap solution at best; we are currently moving all the data over to a sister server.

Phew!
