  • MySQL: getting the median

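    What is the simplest (and hopefully not too slow) way to calculate the median with MySQL? I have used AVG(x) to find the mean, but I am having a hard time finding a simple way to calculate the median. For now, I return all the rows to PHP, sort them, and pick the middle row, but surely there must be a simple way to do it in a single MySQL query. Example data: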

    id | val
    --------
     1 |  4
     2 |  7
     3 |  2
     4 |  2
     5 |  9
     6 |  8
     7 |  3

    Sorting on val gives 2 2 3 4 7 8 9, so the median should be 4, whereas SELECT AVG(val) == 5.
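As a quick check of those numbers, here is a small Python sketch (Python is used only for illustration and is not part of the question) that sorts the sample values and compares the middle value with the mean. The name `sample_vals` is made up for this example.

```python
from statistics import mean, median

# val column from the example data in the question (ids 1..7)
sample_vals = [4, 7, 2, 2, 9, 8, 3]

sorted_vals = sorted(sample_vals)
# With an odd number of rows the median is the single middle element.
middle = sorted_vals[len(sorted_vals) // 2]

print(sorted_vals)
print("median:", middle, "mean:", mean(sample_vals))
```

This is the "fetch everything, sort, pick the middle row" approach that the question wants to push into a single query.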
    -------------------------------------------------------------------------------------------------------------------------
    1. I just found another answer online, for medians in almost any SQL:

    SELECT x.val from data x, data y
    GROUP BY x.val
    HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2

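    Make sure your columns are well indexed and that the index is used for filtering and sorting. Verify with the explain plans.

    select count(*) from table -- find the number of rows

    Calculate the "median" row number, for example median_row = floor(count / 2). Then pick it out of the list:

    select val from table order by val asc limit median_row,1

    This should return one row with just the value you want. Jacob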
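The count-then-offset idea can be tried out with a short Python sketch. It assumes SQLite (through the standard `sqlite3` module) as a stand-in for MySQL, a table named `data` as in the question, and hypothetical helper names `make_conn` and `median_by_offset`; `LIMIT 1 OFFSET ?` plays the role of `limit median_row,1`.

```python
import sqlite3

def make_conn(vals):
    """Build a throwaway in-memory table named data, like the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, val INTEGER)")
    conn.executemany("INSERT INTO data (val) VALUES (?)", [(v,) for v in vals])
    return conn

def median_by_offset(conn):
    """Two queries: count the rows, then fetch the row at offset floor(count / 2)."""
    (count,) = conn.execute("SELECT COUNT(*) FROM data").fetchone()
    median_row = count // 2  # 0-based offset of the middle row
    row = conn.execute(
        "SELECT val FROM data ORDER BY val ASC LIMIT 1 OFFSET ?", (median_row,)
    ).fetchone()
    return row[0] if row else None

print(median_by_offset(make_conn([4, 7, 2, 2, 9, 8, 3])))
```

With an even number of rows this picks the upper of the two middle values rather than their average, which is the gap the later answers address.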
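    2. The problem with the proposed solution (TheJacobTaylor's) is that joining the table to itself is slow as molasses for large datasets. My suggested alternative runs in MySQL and uses an explicit ORDER BY, so you don't have to hope that your indexes ordered the rows properly to get a correct result, and the query is easy to unroll for debugging.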

    SELECT avg(t1.val) as median_val FROM (
    SELECT @rownum:=@rownum+1 as `row_number`, d.val
     FROM data d, (SELECT @rownum:=0) r
     WHERE 1
     -- put some where clause here
     ORDER BY d.val
    ) as t1, 
    (
     SELECT count(*) as total_rows
     FROM data d
     WHERE 1
     -- put same where clause here
    ) as t2
    WHERE 1
    AND t1.row_number in ( floor((total_rows+1)/2), floor((total_rows+2)/2) );

    [Edit] Added AVG() around t1.val and row_number IN (...) so that the median comes out correctly when there is an even number of records. Reasoning:

    SELECT floor((3+1)/2),floor((3+2)/2);#total_rows is 3, so avg row_numbers 2 and 2
    SELECT floor((4+1)/2),floor((4+2)/2);#total_rows is 4, so avg row_numbers 2 and 3
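The same reasoning can be checked outside the database. This is a minimal Python sketch of the two floor expressions; the function names are invented for the example, and the list passed in is assumed to be sorted already.

```python
def middle_positions(total_rows):
    """1-based row numbers matched by IN (floor((n+1)/2), floor((n+2)/2))."""
    return (total_rows + 1) // 2, (total_rows + 2) // 2

def median_from_sorted(sorted_vals):
    """Average the values at the two positions (they coincide for odd counts)."""
    lo, hi = middle_positions(len(sorted_vals))
    # Row numbers are 1-based, Python list indexes are 0-based.
    return (sorted_vals[lo - 1] + sorted_vals[hi - 1]) / 2

for n in (3, 4, 7, 8):
    print(n, middle_positions(n))
```

For an odd count both expressions give the same row, so AVG() averages a single value; for an even count they give the two neighbouring middle rows.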


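    3. I found that the accepted solution did not work on my MySQL installation, returning an empty set, but this query worked for me in every situation I tested it on: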

    SELECT x.val from data x, data y
    GROUP BY x.val
    HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5
    LIMIT 1
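To see what the HAVING clause is counting, here is a rough Python emulation. It is a sketch under one stated assumption: the groups are visited in ascending order of val, which the SQL above does not guarantee because it has LIMIT 1 without an ORDER BY. The function name is hypothetical.

```python
def median_by_rank_ratio(vals):
    """Return the first value x for which more than half of the rows are <= x."""
    total = len(vals)
    for x in sorted(set(vals)):
        # SIGN(1 - SIGN(y - x)) is 1 when y <= x and 0 when y > x,
        # so the SUM counts the rows whose value does not exceed x.
        not_greater = sum(1 for y in vals if y <= x)
        if not_greater / total > 0.5:
            return x
    return None

print(median_by_rank_ratio([4, 7, 2, 2, 9, 8, 3]))
```

Duplicated values of x scale both the SUM and COUNT(*) by the same factor in the grouped self-join, so the ratio is unaffected. With an even number of rows this test returns the upper of the two middle values, not their average.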


    4. A comment on this page of the MySQL documentation has the following suggestion:

    -- (mostly) High Performance scaling MEDIAN function per group
    -- Median defined in  CodeGo.net 
    --
    -- by Peter Hlavac
    -- 06.11.2008
    --
    -- Example Table:
    DROP table if exists table_median;
    CREATE TABLE table_median (id INTEGER(11),val INTEGER(11));
    COMMIT;
    
    INSERT INTO table_median (id, val) VALUES
    (1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
    (2, 4),
    (3, 5), (3, 2),
    (4, 5), (4, 12), (4, 1), (4, 7);
    -- Calculating the MEDIAN
    SELECT @a := 0;
    SELECT
    id,
    AVG(val) AS MEDIAN
    FROM (
    SELECT
    id,
    val
    FROM (
    SELECT
    -- Create an index n for every id
    @a := (@a + 1) mod o.c AS shifted_n,
    IF(@a mod o.c=0, o.c, @a) AS n,
    o.id,
    o.val,
    -- the number of elements for every id
    o.c
    FROM (
    SELECT
    t_o.id,
    val,
    c
    FROM
    table_median t_o INNER JOIN
    (SELECT
    id,
    COUNT(1) AS c
    FROM
    table_median
    GROUP BY
    id
    ) t2
    ON (t2.id = t_o.id)
    ORDER BY
    t_o.id,val
    ) o
    ) a
    WHERE
    IF(
    -- if there is an even number of elements
    -- take the lower and the upper median
    -- and use AVG(lower,upper)
    c MOD 2 = 0,
    n = c DIV 2 OR n = (c DIV 2)+1,
    -- if its an odd number of elements
    -- take the first if its only one element
    -- or take the one in the middle
    IF(
    c = 1,
    n = 1,
    n = c DIV 2 + 1
    )
    )
    ) a
    GROUP BY
    id;
    -- Explanation:
    -- The Statement creates a helper table like
    --
    -- n id val count
    -- ----------------
    -- 1, 1, 1, 7
    -- 2, 1, 3, 7
    -- 3, 1, 4, 7
    -- 4, 1, 5, 7
    -- 5, 1, 6, 7
    -- 6, 1, 7, 7
    -- 7, 1, 8, 7
    --
    -- 1, 2, 4, 1
    -- 1, 3, 2, 2
    -- 2, 3, 5, 2
    --
    -- 1, 4, 1, 4
    -- 2, 4, 5, 4
    -- 3, 4, 7, 4
    -- 4, 4, 12, 4
    
    -- from there we can select the n-th element on the position: count div 2 + 1
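For comparison, the per-group selection rule in the WHERE clause above can be written as a short Python sketch over the same example rows. The function name `median_per_group` is invented here; the rule itself (rows c DIV 2 and c DIV 2 + 1 for even counts, row c DIV 2 + 1 for odd counts) is taken from the query.

```python
from collections import defaultdict

# Same (id, val) pairs as the INSERT into table_median above
rows = [(1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
        (2, 4),
        (3, 5), (3, 2),
        (4, 5), (4, 12), (4, 1), (4, 7)]

def median_per_group(rows):
    groups = defaultdict(list)
    for group_id, val in rows:
        groups[group_id].append(val)
    result = {}
    for group_id, vals in groups.items():
        vals.sort()
        c = len(vals)
        if c % 2 == 0:
            wanted = (c // 2, c // 2 + 1)   # lower and upper median rows
        else:
            wanted = (c // 2 + 1,)          # the single middle row
        picked = [vals[n - 1] for n in wanted]  # n is 1-based, as in the helper table
        result[group_id] = sum(picked) / len(picked)
    return result

print(median_per_group(rows))
```

The special case for c = 1 in the SQL is covered by the odd branch, since 1 DIV 2 + 1 is 1.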


    5. You could use the function found here: CodeGo.net.
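    6. I propose a faster way. Get the row count:

    SELECT CEIL(COUNT(*)/2) FROM data;

    Then take the middle value from a sorted subquery, where @middlevalue is the result of the first query:

    SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue) x;

    I tested this with a dataset of 5x10e6 random numbers, and it finds the median in under 10 seconds.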
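A sketch of answer 6 in Python, again assuming SQLite via `sqlite3` in place of MySQL and a table named `data`. The helper names are hypothetical, and the row limit is passed as a bound parameter instead of a user variable.

```python
import sqlite3

def make_conn(vals):
    """Build a throwaway in-memory table named data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, val INTEGER)")
    conn.executemany("INSERT INTO data (val) VALUES (?)", [(v,) for v in vals])
    return conn

def median_by_max_of_lower_half(conn):
    """Sort, keep the first CEIL(count / 2) rows, and return the largest of them."""
    (count,) = conn.execute("SELECT COUNT(*) FROM data").fetchone()
    middle = (count + 1) // 2  # integer form of CEIL(COUNT(*) / 2)
    (val,) = conn.execute(
        "SELECT MAX(val) FROM (SELECT val FROM data ORDER BY val LIMIT ?) x",
        (middle,),
    ).fetchone()
    return val

print(median_by_max_of_lower_half(make_conn([4, 7, 2, 2, 9, 8, 3])))
```

For an even number of rows this returns the lower of the two middle values, so it matches the averaged definition of the median only for odd counts.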
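    7. Building on velcro's answer, for those of you who have to compute a median over something that is grouped by another field:

    SELECT grp_field, t1.val FROM (
     SELECT grp_field,
      -- restart the numbering at 1 for each group so that floor(total_rows/2)+1 is the middle row
      @rownum:=IF(@s = grp_field, @rownum + 1, 1) AS `row_number`,
      @s:=IF(@s = grp_field, @s, grp_field) AS sec, d.val
     FROM data d, (SELECT @rownum:=0, @s:=0) r
     ORDER BY grp_field, d.val
    ) as t1 JOIN (
     SELECT grp_field, count(*) as total_rows
     FROM data d
     GROUP BY grp_field
    ) as t2
    ON t1.grp_field = t2.grp_field
    WHERE t1.row_number = floor(total_rows / 2)+1;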
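    8. Unfortunately, neither TheJacobTaylor's nor velcro's answer returns accurate results on current versions of MySQL. Velcro's answer above is close, but it does not calculate correctly for result sets with an even number of rows. The median is defined as either 1) the middle number of an odd-numbered set, or 2) the average of the two middle numbers of an even-numbered set. So here is velcro's solution, patched to handle both odd and even sets: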

    SELECT AVG(middle_values) AS 'median' FROM (
     SELECT t1.median_column AS 'middle_values' FROM
     (
      SELECT @row:=@row+1 as `row`, x.median_column
      FROM median_table AS x, (SELECT @row:=0) AS r
      WHERE 1
      -- put some where clause here
      ORDER BY x.median_column
     ) AS t1,
     (
      SELECT COUNT(*) as 'count'
      FROM median_table x
      WHERE 1
      -- put same where clause here
     ) AS t2
     -- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
     WHERE t1.row >= t2.count/2 and t1.row <= ((t2.count/2) +1)) AS t3;

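    To use it, follow these 3 easy steps:
    1) Replace "median_table" (2 occurrences) in the above code with the name of your table.
    2) Replace "median_column" (3 occurrences) with the name of the column you want the median of.
    3) If you have a WHERE condition, replace "WHERE 1" (2 occurrences) with your where condition.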
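The range condition in answer 8 and the floor-based IN list in answer 2 can be compared with a small Python sketch. The function names are made up; the sketch assumes that `/` yields a fractional result for odd counts (as MySQL's `/` operator does), which is why Python's true division is used.

```python
import math

def rows_kept_by_range(count):
    """1-based row numbers with row >= count/2 AND row <= count/2 + 1."""
    return [r for r in range(1, count + 1) if count / 2 <= r <= count / 2 + 1]

def rows_kept_by_floor(count):
    """Row numbers from floor((count+1)/2) and floor((count+2)/2), deduplicated."""
    return sorted({math.floor((count + 1) / 2), math.floor((count + 2) / 2)})

for n in (1, 2, 7, 8):
    print(n, rows_kept_by_range(n), rows_kept_by_floor(n))
```

For every count from 1 upward the two conditions keep the same rows: one row for odd counts and two for even counts.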
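    9. Most of the solutions above work for only one field of the table, but you may need the median (50th percentile) for many fields in one query. I use this: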

    SELECT CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
     GROUP_CONCAT(field_name ORDER BY field_name SEPARATOR ','),
     ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `Median`
    FROM table_name;

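    You can replace the "50" in the example above with any percentile, and it is very efficient. Just make sure you have enough memory for the GROUP_CONCAT; you can change the limit with: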

    SET group_concat_max_len = 10485760; #10MB max length

    10. This one takes care of an even value count: in that case it gives the average of the two middle values.

    SELECT AVG(val) FROM
     ( SELECT x.id, x.val from data x, data y
      GROUP BY x.id, x.val
      HAVING SUM(SIGN(1-SIGN(IF(y.val-x.val=0 AND x.id != y.id, SIGN(x.id-y.id), y.val-x.val)))) IN (ROUND((COUNT(*))/2), ROUND((COUNT(*)+1)/2))
     ) sq


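    11. A two-query approach: the first query gets the count, min, max and avg; the second one (a prepared statement with "LIMIT @count/2, 1" and "ORDER BY ..") gets the median value. These are wrapped in a function definition, so all the values can be returned from one call. If your ranges are static and your data does not change often, it may be more efficient to store these values and use the stored values instead of querying from scratch every time.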
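    12. The simple answer: take the value at the following index: COUNT(*) / 2, rounded. Whether COUNT(*) / 2 is rounded up or down depends on your definition; alternatively, you can write an IF for the even case and average the two middle numbers (the value at COUNT(*) / 2 rounded down and the value at COUNT(*) / 2 rounded up).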
    13. If MySQL has ROW_NUMBER, then the median is (inspired by this SQL Server query):

    WITH Numbered AS 
    (
    SELECT *, COUNT(*) OVER () AS Cnt,
     ROW_NUMBER() OVER (ORDER BY val) AS RowNum
    FROM yourtable
    )
    SELECT id, val
    FROM Numbered
    WHERE RowNum IN ((Cnt+1)/2, (Cnt+2)/2)
    ;

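    The IN is there in case you have an even number of entries, when it matches the two middle rows. If you want the median per group, just PARTITION BY the group in your OVER clauses. Rob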
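The window-function query can be exercised with a Python sketch that runs it against SQLite, assuming a bundled SQLite of version 3.25 or later (the first with window functions). The variant below averages the matched rows instead of returning id and val; `MEDIAN_SQL` and `window_median` are names made up for the example.

```python
import sqlite3

MEDIAN_SQL = """
WITH Numbered AS (
    SELECT val,
           COUNT(*) OVER () AS Cnt,
           ROW_NUMBER() OVER (ORDER BY val) AS RowNum
    FROM yourtable
)
SELECT AVG(val)
FROM Numbered
WHERE RowNum IN ((Cnt + 1) / 2, (Cnt + 2) / 2)
"""

def window_median(vals):
    """Load vals into a scratch table and run the ROW_NUMBER-based median query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, val INTEGER)")
    conn.executemany("INSERT INTO yourtable (val) VALUES (?)", [(v,) for v in vals])
    (median,) = conn.execute(MEDIAN_SQL).fetchone()
    conn.close()
    return median

print(window_median([4, 7, 2, 2, 9, 8, 3]))
```

The IN list relies on integer division, which is what `/` does on integers in SQLite and SQL Server. In MySQL `/` produces a decimal result, so a port would probably need DIV or FLOOR() around the two expressions.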
    14. My code, efficient, without extra tables or additional variables:

    SELECT
    ((SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', floor(1+((count(val)-1) / 2))), ',', -1))
    +
    (SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', ceiling(1+((count(val)-1) / 2))), ',', -1)))/2
    as median
    FROM table;
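The nested SUBSTRING_INDEX trick is easier to follow when replayed step by step. In this Python sketch, `substring_index` is a rough stand-in for the MySQL function of the same name (written for this example, covering only the positive and negative counts used here), and the joined string plays the role of group_concat(val order by val).

```python
import math

def substring_index(s, delim, count):
    """Rough Python stand-in for MySQL's SUBSTRING_INDEX(s, delim, count)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything before the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything after the count-th delimiter from the right
    return ""

def median_via_concat(vals):
    joined = ",".join(str(v) for v in sorted(vals))  # like group_concat(val order by val)
    n = len(vals)
    lower_pos = math.floor(1 + (n - 1) / 2)
    upper_pos = math.ceil(1 + (n - 1) / 2)
    # The inner call keeps the first k items, the outer call keeps the last of those.
    lower = substring_index(substring_index(joined, ",", lower_pos), ",", -1)
    upper = substring_index(substring_index(joined, ",", upper_pos), ",", -1)
    return (float(lower) + float(upper)) / 2

print(median_via_concat([4, 7, 2, 2, 9, 8, 3]))
```

In MySQL the concatenated string is cut off at group_concat_max_len, so on large tables that limit would need to be raised for the middle items to be present at all.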


    15. Alternatively, you could also do this in a stored procedure:

    DROP PROCEDURE IF EXISTS median;
    DELIMITER //
    CREATE PROCEDURE median (table_name VARCHAR(255), column_name VARCHAR(255), where_clause VARCHAR(255))
    BEGIN
     -- Set default parameters
     IF where_clause IS NULL OR where_clause = '' THEN
     SET where_clause = 1;
     END IF;
     -- Prepare statement
     SET @sql = CONCAT(
     "SELECT AVG(middle_values) AS 'median' FROM (
      SELECT t1.", column_name, " AS 'middle_values' FROM
      (
       SELECT @row:=@row+1 as `row`, x.", column_name, "
       FROM ", table_name," AS x, (SELECT @row:=0) AS r
       WHERE ", where_clause, " ORDER BY x.", column_name, "
      ) AS t1,
      (
       SELECT COUNT(*) as 'count'
       FROM ", table_name, " x
       WHERE ", where_clause, "
      ) AS t2
      -- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
      WHERE t1.row >= t2.count/2
       AND t1.row <= ((t2.count/2)+1)) AS t3
     ");
     -- Execute statement
     PREPARE stmt FROM @sql;
     EXECUTE stmt;
    END//
    DELIMITER ;
    
    -- Sample usage:
    -- median(table_name, column_name, where_condition);
    CALL median('products', 'price', NULL);
  • Original article: https://www.cnblogs.com/07byte/p/5823654.html