I did some more research:
I ran "xfs_check" on the partition and it didn't report any errors at
all, so I'm guessing that the hard drives are OK.
I also recompiled MySQL with "--with-debug".
How would I get some debug output about the crashes now?
Thanks,
Samy
Samuel Vogel wrote:
> Hey guys,
>
> First of all: I tried to post this issue on forums.mysql.com, but
> the topic doesn't appear in the forum. When I try to post again, it
> tells me that it's a duplicate?!?!
>
> Now the real problem:
> I have MySQL set up on two servers with 7000 users each and about the
> same number of databases.
> For the past two days, many databases have been getting corrupted,
> and MySQL crashes every other hour or so.
> Last night I updated from 5.0.32 to 5.0.45, but the problem is still
> there.
>
> Here is what I see in syslog:
> Oct 7 11:02:53 h1314631 mysqld[32490]: 071007 11:02:53 [ERROR]
> /usr/sbin/mysqld: Table './10temulti@1-12/dzcp_counter_ips' is marked
> as crashed and should be repaired
> Oct 7 11:02:53 h1314631 mysqld[32490]: 071007 11:02:53 [ERROR]
> /usr/sbin/mysqld: Table './10temulti@1-12/dzcp_counter_ips' is marked
> as crashed and should be repaired
> Oct 7 11:03:23 h1314631 mysqld_safe[32724]: Number of processes
> running now: 0
> Oct 7 11:03:23 h1314631 mysqld_safe[32729]: restarted
> Oct 7 11:03:23 h1314631 mysqld[32734]: 071007 11:03:23 InnoDB:
> Database was not shut down normally!
> Oct 7 11:03:23 h1314631 mysqld[32734]: InnoDB: Starting crash recovery.
> Oct 7 11:03:23 h1314631 mysqld[32734]: InnoDB: Reading tablespace
> information from the .ibd files...
> Oct 7 11:08:36 h1314631 mysqld[32734]: InnoDB: Restoring possible
> half-written data pages from the doublewrite
> Oct 7 11:08:36 h1314631 mysqld[32734]: InnoDB: buffer...
> Oct 7 11:08:36 h1314631 mysqld[32734]: 071007 11:08:36 InnoDB:
> Starting log scan based on checkpoint at
> Oct 7 11:08:36 h1314631 mysqld[32734]: InnoDB: log sequence number 0
> 1346871925.
> Oct 7 11:08:36 h1314631 mysqld[32734]: InnoDB: Doing recovery:
> scanned up to log sequence number 0 1346871925
> Oct 7 11:08:36 h1314631 mysqld[32734]: 071007 11:08:36 InnoDB:
> Started; log sequence number 0 1346871925
> Oct 7 11:08:38 h1314631 mysqld[32734]: 071007 11:08:38 [Note]
> /usr/sbin/mysqld: ready for connections.
> Oct 7 11:08:38 h1314631 mysqld[32734]: Version:
> '5.0.45-Debian_1~bpo.1' socket: '/var/run/mysqld/mysqld.sock' port:
> 3306 Debian etch distribution
>
> As far as I understand, this means that the MySQL server crashed and
> mysqld_safe noticed that and restarted it.
> I also see a lot of database corruption, but I've run into a chicken
> & egg problem here: I don't know whether the database corruption
> appeared first and led to the crashes, or whether the crashes led to
> the corruption.
>
> How can I investigate the problem further? I don't think that a
> particular query is crashing the system, since all of our users just
> run well-known apps like phpBB etc.
>
> To clarify my situation, I have just started a "myisamchk --silent
> --force --update-state --recover" for all tables on my system.
> Among errors it could repair, it gives me two error messages which I
> couldn't find much about with Google or in the MySQL docs:
>
> myisamchk: error: 138 when opening MyISAM-table
> '/data/mysql/.../transcache.MYI'
>
> and
>
> myisamchk: Unknown error 126
> myisamchk: error: '/data/mysql/.../smf_membergroups.MYI' doesn't have
> a correct index definition. You need to recreate it before you can do
> a repair
>
>
> What also puzzles me is that the database corruption is happening
> on both servers, but the MySQL crashes only appear on one of them.
>
> I'm running Debian Etch and the MySQL data dir is on an XFS partition.
> I have mounted the partition with "noatime".
> How would I investigate a potential hard drive error?
>
> Can anybody shed some light on my situation?
>
> Regards,
> Samy
>
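
To tell whether the corruption or the crashes come first, it may help to
tally which tables get flagged and how often mysqld_safe restarts the
server. Below is a minimal sketch in Python, assuming the syslog lines look
like the ones quoted above (unwrapped, one entry per line). The names
summarize and SAMPLE and the two regular expressions are made up for
illustration; they are not part of any MySQL tool.

```python
import re
from collections import Counter

# Pattern guessed from the "[ERROR] ... is marked as crashed" lines above.
CRASHED_RE = re.compile(r"Table '([^']+)' is marked as crashed")
# Pattern guessed from the "mysqld_safe[PID]: restarted" line above.
RESTART_RE = re.compile(r"mysqld_safe\[\d+\]: restarted")


def summarize(lines):
    """Return (Counter of tables marked as crashed, number of restarts)."""
    crashed = Counter()
    restarts = 0
    for line in lines:
        match = CRASHED_RE.search(line)
        if match:
            crashed[match.group(1)] += 1
        if RESTART_RE.search(line):
            restarts += 1
    return crashed, restarts


# Sample input taken from the syslog excerpt, with the mail wrapping undone.
SAMPLE = """\
Oct 7 11:02:53 h1314631 mysqld[32490]: 071007 11:02:53 [ERROR] /usr/sbin/mysqld: Table './10temulti@1-12/dzcp_counter_ips' is marked as crashed and should be repaired
Oct 7 11:02:53 h1314631 mysqld[32490]: 071007 11:02:53 [ERROR] /usr/sbin/mysqld: Table './10temulti@1-12/dzcp_counter_ips' is marked as crashed and should be repaired
Oct 7 11:03:23 h1314631 mysqld_safe[32724]: Number of processes running now: 0
Oct 7 11:03:23 h1314631 mysqld_safe[32729]: restarted
"""

crashed, restarts = summarize(SAMPLE.splitlines())
for table, count in crashed.most_common():
    print(f"{count:4d}  {table}")
print(f"mysqld_safe restarts: {restarts}")
```

For a real run, the function could be fed an open file instead of the
sample, for example summarize(open("/var/log/syslog")); that path is the
usual Debian default and is an assumption here.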