zoukankan      html  css  js  c++  java
  • 求程序员工资的中位数 附两种众数求法

    除了平均数外,众数和中位数也是衡量集合的标尺之一,中位数是将集合升序排列后恰好位于正中间的元素,如果集合总数为偶数,则取中间两个元素的平均值作为中位数,下文将就用SQL去求中位数展开讨论。

    首先建表:

    create table tb_coder(
        id number(4,0) not null primary key,
        name nvarchar2(20) not null,
        salary integer not null)

    然后充值:

    insert into tb_coder
    select rownum,dbms_random.string('*',dbms_random.value(2,20)),dbms_random.value(3000,20000)
    from dual
    connect by level<11
    order by dbms_random.random;

    然后我们看看表中数据:

    SQL> select * from tb_coder order by salary;
    
            ID NAME                                         SALARY
    ---------- ---------------------------------------- ----------
             9 DTVMMQSOAJA                                    4947
            10 CSIZK                                          5424
             2 TYLWH                                          5676
             7 EPUIILOYENSD                                  10316
             3 MSKHEGSWNIFZWB                                15153
             4 OFZKCXNWZWSJR                                 15326
             5 HOJLJWIOYNDBB                                 16101
             8 XJQTKPDXAIOOWJ                                16843
             6 RWRGMEIRGZN                                   18599
             1 OZKFBBFYWGEERAUGSNH                           19710
    
    已选择10行。

    现在可以观察得知,总个数为10,是偶数,中位数就该是处于中间两个数15153和15326的平均值:15329.5

    经过思考,我发现解决问题的关键是得知中间在哪,为此我添加了两列以帮助思考:

    SQL> select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary;
    
            ID     SALARY        SEQ     REVSEQ
    ---------- ---------- ---------- ----------
             9       4947          1         10
            10       5424          2          9
             2       5676          3          8
             7      10316          4          7
             3      15153          5          6
             4      15326          6          5
             5      16101          7          4
             8      16843          8          3
             6      18599          9          2
             1      19710         10          1
    
    已选择10行。

    进过观察可以知道,用seq减去revseq再求绝对值,值大的一定靠两边,值小的一定靠中间。于是再查:

    SQL> select a.*,abs(a.seq-a.revseq) as diff from
      2  (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a;
    
            ID     SALARY        SEQ     REVSEQ       DIFF
    ---------- ---------- ---------- ---------- ----------
             9       4947          1         10          9
            10       5424          2          9          7
             2       5676          3          8          5
             7      10316          4          7          3
             3      15153          5          6          1
             4      15326          6          5          1
             5      16101          7          4          3
             8      16843          8          3          5
             6      18599          9          2          7
             1      19710         10          1          9
    
    已选择10行。

    从上面已经明显看出,diff值最小的就是我们要找的15153和15326两条记录。

    如果我取diff值最小的记录,求平均值不就是中位数了吗,于是有了下面的SQL:

    SQL> select avg(b.salary) from
      2  (select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) b
      3  where b.diff=(select min(c.diff) from
      4  (select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) c);
    
    AVG(B.SALARY)
    -------------
          15239.5

    结果等于预期,这时偶数个情况,那么奇数个呢?

    让我们先删除最后一行:

    SQL> delete from tb_coder where id=10;
    
    已删除 1 行。
    
    SQL> commit;
    
    提交完成。

    再看结果集:

    SQL> select * from tb_coder order by salary;
    
            ID NAME                                         SALARY
    ---------- ---------------------------------------- ----------
             9 DTVMMQSOAJA                                    4947
             2 TYLWH                                          5676
             7 EPUIILOYENSD                                  10316
             3 MSKHEGSWNIFZWB                                15153
             4 OFZKCXNWZWSJR                                 15326
             5 HOJLJWIOYNDBB                                 16101
             8 XJQTKPDXAIOOWJ                                16843
             6 RWRGMEIRGZN                                   18599
             1 OZKFBBFYWGEERAUGSNH                           19710
    
    已选择9行。

    只剩下九条,那么处于中间的15326应该是中位数。

    看看是不是:

    SQL> select avg(b.salary) from
      2  (select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) b
      3  where b.diff=(select min(c.diff) from
      4  (select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) c);
    
    AVG(B.SALARY)
    -------------
            15326

    结果等于预期!

    附:另一种中位数求法,这种方法比较简洁,但是是别人的方法《SQL进阶教程》P61-P62,不是我自己想出来的。

    select avg(distinct salary)
    from 
    (
        select t1.salary
        from tb_coder t1,tb_coder t2
        group by t1.salary
        having sum(case when t1.salary>=t2.salary then 1 else 0 end)>=count(*)/2
            and  sum(case when t1.salary<=t2.salary then 1 else 0 end)>=count(*)/2
    )

    另外众数也是衡量集合属性的标尺之一,下面列出了两种求众数的方法供大家参考(一千元分一个档):

    --求众数(使用谓词)
    select a.sal,count(*) as cnt from
    (select round(salary/1000)*1000 as sal from tb_coder) a
    group by a.sal
    having count(*) >= all(select count(*) from (select round(salary/1000)*1000 as sal from tb_coder) b group by b.sal)
    
    --求众数(使用rownum)
    select b.sal,b.cnt from
    ( select a.sal,count(*) as cnt from
    (select round(salary/1000)*1000 as sal from tb_coder) a
    group by a.sal
    order by cnt desc ) b
    where rownum=1

    --2020年4月5日--

    以上用到的全部SQL:

    create table tb_coder(
        id number(4,0) not null primary key,
        name nvarchar2(20) not null,
        salary integer not null)
        
    insert into tb_coder
    select rownum,dbms_random.string('*',dbms_random.value(2,20)),dbms_random.value(3000,20000)
    from dual
    connect by level<11
    order by dbms_random.random;
    
    select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary
    
    
    select a.*,abs(a.seq-a.revseq) as diff from
    (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a
    
    select b.* from
    (select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) b
    order by b.diff
    
    select b.* from
    (select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) b
    where b.diff=(select min(c.diff) from
    (select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) c)
    
    select avg(b.salary) from
    (select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) b
    where b.diff=(select min(c.diff) from
    (select a.*,abs(a.seq-a.revseq) as diff from (select id,salary,rank() over (order by salary asc) as seq,rank() over (order by salary desc) as revseq from tb_coder order by salary) a ) c)
  • 相关阅读:
    javascript 创建字典
    IE显示PNG
    透明PNG背景图片 For IE 6.0 Firefox Opera
    IE FireFox对CSS的不同解释收集
    javascript中replace()
    netstat 指令返回结果中state含义
    FIREFOX层的自适应高度
    select options的操作
    事件冒泡
    C++第三讲笔记
  • 原文地址:https://www.cnblogs.com/heyang78/p/12637767.html
Copyright © 2011-2022 走看看