背景:
今天从库有延迟,进数据库看到他一直在执行这类更新语句:
UPDATE `A` SET `isD` = '1', `modTime` = '2012-10-10 10:45:56' WHERE (userid) /* userid是一个varchar的字符串 */
对于这个sql语句,按照以往的经验,认为是:userid 非NULL的话就会被更新,要是NULL的话,就不更新。
分析:
zhoujinyi@localhost : test 02:35:34>select count(*) from A where userid; +----------+ | count(*) | +----------+ | 2627 | +----------+ 1 row in set (0.00 sec) zhoujinyi@localhost : test 03:36:01>select count(*) from A; +----------+ | count(*) | +----------+ | 86181 | +----------+ 1 row in set (0.00 sec) zhoujinyi@localhost : test 03:36:34>select count(*) from A where userid is null; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec)
上面的可以看出:认为是:userid 非NULL的话就会被更新,要是NULL的话,就不更新。 这样的说法是错误的。
从上面得出的userid都有一个规律:数字开头。
zhoujinyi@localhost : test 03:36:41>select userid from A where userid limit 10; +-----------------+ | userid | +-----------------+ | 000726 | | 0008875 | | 0010040 | | 00110222 | | 00123456789 | | 001400 | | 0015238 | | 001577 | | 0019myfavourite | | 001catherine | +-----------------+ 10 rows in set (0.00 sec)
验证:
zhoujinyi@localhost : test 03:54:26>select count(*) from (select userid from A where userid) ab where ab.userid regexp '^[0-9]'; +----------+ | count(*) | +----------+ | 2627 | +----------+ 1 row in set (0.14 sec)
外面套一层子查询并且过滤数字开头的,确实和单独执行的结果一样:2627。
那整张表数字开头的是否也是2627呢?
zhoujinyi@localhost : test 03:54:42>select count(*) from A where userid regexp '^[0-9]'; +----------+ | count(*) | +----------+ | 2648 | +----------+ 1 row in set (0.19 sec)
结果显示多了21条,怎么回事?那这21条是怎么样的规律。
/* 把整张表所有的数字开头的userid去出来 */ zhoujinyi@localhost : test 03:57:49>create table tb2 select userid from A where userid regexp '^[0-9]'; /* 把原始sql得出的所有数字开头的userid取出来 */ zhoujinyi@localhost : test 03:59:36>create table tb1 select userid from (select userid from A where userid) ab where ab.userid regexp '^[0-9]';
zhoujinyi@localhost : test 04:03:04>select a.userid from tb2 a left join tb1 b on a.userid=b.userid where b.userid is null; +-------------------+ | userid | +-------------------+ | 000000he | | 0000liujie | | 000yanzi | | 00benson | | 00bxy | | 00lugenkuzhu | | 00wenjin | | 00的颠覆 | | 00逗 | | 0atta0 | | 0dishuizhiyuan | | 0min | | 0osissio0 | | 0v132 | | 0wanghuan0 | | 0云淡风轻0 | | 0吟风0 | | 0梧桐0 | | 0求知 | | 0白大褂在线0 | | 0超级野蛮0 | +-------------------+ 21 rows in set (1.17 sec)
结果显示确实都是0开头,把这些都转换成整数的话,刚好全部都是0。
zhoujinyi@localhost : test 04:04:48>select a.userid+0 from tb2 a left join tb1 b on a.userid=b.userid where b.userid is null; +------------+ | a.userid+0 | +------------+ | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | +------------+ 21 rows in set (1.46 sec)
那tb1,tb2的userid转换了之后是否都大于0呢?是否相等呢?
zhoujinyi@localhost : test 04:08:02>select count(*) from tb1; +----------+ | count(*) | +----------+ | 2627 | +----------+ 1 row in set (0.00 sec) zhoujinyi@localhost : test 04:08:09>select count(*) from tb1 where userid+0 <>0; +----------+ | count(*) | +----------+ | 2627 | +----------+ 1 row in set (0.00 sec) zhoujinyi@localhost : test 04:08:11>select count(*) from tb2 where userid+0 <>0; +----------+ | count(*) | +----------+ | 2627 | +----------+ 1 row in set (0.00 sec)
结果和猜想的一样,不等于0之后的都是相同的(True)。这样就清楚为什么有21条不一样的数据了(都是数字开头并且等于0的数据)。
结论:
zhoujinyi@localhost : test 04:08:18>select count(*) from A where userid; +----------+ | count(*) | +----------+ | 2627 | +----------+ 1 row in set (0.00 sec) zhoujinyi@localhost : test 04:15:32>select count(*) from A where userid+0 != 0; +----------+ | count(*) | +----------+ | 2627 | +----------+ 1 row in set (0.01 sec)
类似 select * from tb where col 这类查询,只要col转换成不为0的整型值的话 (!=),则都会被检索出来,包括负数。要是col里面没有数字,只有字母和汉字等字符串的话,则就检索不出结果来。和col是否为NULL 没有关系。
看到一个情况和我上面说的差不多:http://www.pnigos.com/?p=19