下午正在开周会,然后收到短信,说是X.X.X.X的机器ping不通了,一轮测试过后,发现是某台数据库服务器挂了,先不急着重启,问下tencent客服。。。
乖乖的好家伙,母机的主板坏了。。。。一个小时候,母机起来了,看下数据库起来么。。
[root@VM_145_57_tlinux ~]# mysql -uroot -p1234 ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2) [root@VM_145_57_tlinux ~]# mysql -uroot -p Enter password: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2) [root@VM_145_57_tlinux ~]# ps -ef |grep mysql root 2016 1 0 17:50 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql mysql 2177 2016 98 17:50 ? 00:11:21 /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=3306 root 13643 13364 0 18:02 pts/0 00:00:00 grep mysql [root@VM_145_57_tlinux ~]# mysql -uroot -p1234 -S /tmp/mysql.sock ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2) [root@VM_145_57_tlinux ~]# ls /tmp/mys^C [root@VM_145_57_tlinux ~]# /etc/init.d/mysqld restart Stopping mysqld: [ OK ] Starting mysqld: [ OK ] [root@VM_145_57_tlinux ~]# ps -ef | msyql -bash: msyql: command not found [root@VM_145_57_tlinux ~]# ps -ef |grep mysql root 2016 1 0 17:50 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql mysql 2177 2016 99 17:50 ? 00:13:45 /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=3306 root 13942 1 0 18:03 pts/0 00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql mysql 14100 13942 0 18:03 pts/0 00:00:00 /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=3306 root 14908 13364 0 18:04 pts/0 00:00:00 grep mysql
这下亮了,出现了double进程,没发解释啊。。。【下面这步觉得error】
[root@VM_145_57_tlinux ~]# kill -9 2016 [root@VM_145_57_tlinux ~]# kill -9 2177 [root@VM_145_57_tlinux ~]# ps -ef |grep mysql root 13942 1 0 18:03 pts/0 00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql mysql 14100 13942 0 18:03 pts/0 00:00:00 /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=3306 root 15240 13364 0 18:05 pts/0 00:00:00 grep mysql
之前起来的进程被我kill -9之后,发现能连上数据库了,可是innodb存储引擎没起来。
mysql> show engines; +------------+---------+-----------------------------------------------------------+--------------+------+------------+ | Engine | Support | Comment | Transactions | XA | Savepoints | +------------+---------+-----------------------------------------------------------+--------------+------+------------+ | MRG_MYISAM | YES | Collection of identical MyISAM tables | NO | NO | NO | | CSV | YES | CSV storage engine | NO | NO | NO | | MyISAM | DEFAULT | Default engine as of MySQL 3.23 with great performance | NO | NO | NO | | MEMORY | YES | Hash based, stored in memory, useful for temporary tables | NO | NO | NO | +------------+---------+-----------------------------------------------------------+--------------+------+------------+ 4 rows in set (0.00 sec)
似乎能看出问题了,进程起来了,socket没有建立,存储引擎没有启动-->innodb正在后台线程操作!!查下errorlog一探究竟:
150126 17:45:52 mysqld_safe Starting mysqld daemon with databases from /data/mysql/var 150126 17:45:53 InnoDB: Initializing buffer pool, size = 28.0G 150126 17:45:55 InnoDB: Completed initialization of buffer pool InnoDB: Log scan progressed past the checkpoint lsn 1808 1368235506 150126 17:45:55 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... InnoDB: Doing recovery: scanned up to log sequence number 1808 1373477888
事实上,那个还没创建socket的进程是正在执行double write的回滚工作,继续往下翻页:
InnoDB: Doing recovery: scanned up to log sequence number 1808 3523058688 150126 18:03:02 mysqld_safe Starting mysqld daemon with databases from /data/mysql/var 150126 18:03:03 InnoDB: Initializing buffer pool, size = 28.0G 150126 18:03:03 InnoDB: Error: cannot allocate 30064787456 bytes of InnoDB: memory with malloc! Total allocated memory InnoDB: by InnoDB 37927312 bytes. Operating system errno: 12 InnoDB: Check if you should increase the swap file or InnoDB: ulimits of your operating system. InnoDB: On FreeBSD check you have compiled the OS with InnoDB: a big enough maximum process size. InnoDB: Note that in most 32-bit computers the process InnoDB: memory space is limited to 2 GB or 4 GB. InnoDB: We keep retrying the allocation for 60 seconds... InnoDB: Doing recovery: scanned up to log sequence number 1808 3528301568
这个是我在innodb引擎正在执行recovery的时候强行启动mysql的报错,提示内存不足。。继续往下翻页:
InnoDB: Doing recovery: scanned up to log sequence number 1808 3606944768 150126 18:04:03InnoDB: Fatal error: cannot allocate the memory for the buffer pool 150126 18:04:03 [ERROR] Plugin 'InnoDB' init function returned error. 150126 18:04:03 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 150126 18:04:03 [Note] Event Scheduler: Loaded 0 events 150126 18:04:03 [Note] /usr/libexec/mysqld: ready for connections. Version: '5.1.61' socket: '/tmp/mysql.sock' port: 3306 Source distribution InnoDB: Doing recovery: scanned up to log sequence number 1808 3612187648
[此处省略若干行] InnoDB: Doing recovery: scanned up to log sequence number 1808 3711802368 150126 18:16:28 [Note] /usr/libexec/mysqld: Normal shutdown 150126 18:16:28 [Note] Event Scheduler: Purging the queue. 0 events 150126 18:16:30 [Note] /usr/libexec/mysqld: Shutdown complete 150126 18:16:30 mysqld_safe mysqld from pid file /data/mysql/mysqld/mysqld.pid ended 150126 18:16:30 mysqld_safe Starting mysqld daemon with databases from /data/mysql/var 150126 18:16:30 InnoDB: Initializing buffer pool, size = 28.0G 150126 18:16:32 InnoDB: Completed initialization of buffer pool InnoDB: Log scan progressed past the checkpoint lsn 1808 1368235506 [kill -9的结果] 150126 18:16:32 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... InnoDB: Doing recovery: scanned up to log sequence number 1808 1373477888
看到上面的错误,才恍然大悟,前面的操作是有多危险,要不是mysql的recovery不那么强悍的话,恐怕数据就被我这样弄没了。。。好悬
InnoDB: Doing recovery: scanned up to log sequence number 1808 4183661568 InnoDB: Doing recovery: scanned up to log sequence number 1808 4185407155 150126 18:38:51 InnoDB: Starting an apply batch of log records to the database... InnoDB: Progress in percents: 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 InnoDB: Apply batch completed 150126 18:40:59 InnoDB: Started; log sequence number 1808 4185407155 150126 18:40:59 [Note] Event Scheduler: Loaded 0 events 150126 18:40:59 [Note] /usr/libexec/mysqld: ready for connections. Version: '5.1.61' socket: '/tmp/mysql.sock' port: 3306 Source distribution
回滚完成,happy启动服务。
下面来讲下innodb_log_file_size在innodb异常时回滚机制:
下面我们参考 http://www.cnblogs.com/zuoxingyu/archive/2012/10/25/2738864.html 大牛的博客,算一下innodb_log_file_size到底多大为最合适:【作为一个粗略的规则,你可以让这个日志足够大到能容纳最多一小时左右的日志】
mysql> pager grep sequence PAGER set to 'grep sequence' mysql> show engine innodb statusG select sleep(60); show engine innodb statusG Log sequence number 1818 411580730 1 row in set (0.00 sec) 1 row in set (1 min 0.00 sec) Log sequence number 1818 446445181 1 row in set (0.00 sec)
(411580730-446445181)/1024/1024/60=1999M,向上取整2G,由于默认有2个日志文件,所以按照当前高峰期计算,设为1G为最佳值。