  • Data Cleaning: Dimension Cleaning

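    Today our teacher had us practice dimension cleaning, one part of data cleaning. I imported the data and cleaned it successfully. Below are the requirements and the steps I followed.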

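    Requirements:

    Test tasks:

    1. Data import:

    Import the AA_GXJSQYDC2019 data from the sample table file into the Hive data warehouse.

    Import each of the four standard dimension tables into the data warehouse.

    2. Data cleaning:

    Using the standard dimension tables, clean the four dimension fields: national economic industry, region, high-tech field, and enterprise field.

    3. Data visualization:

    Try to implement a drill-down display along one dimension (for example, the region dimension, displayed at two levels by city).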
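
    The drill-down in task 3 could be backed by queries like the following. This is only a minimal sketch: it assumes that column f of test6 (the table created in the steps below) holds a 6-digit region code whose first four digits identify the city. The source does not describe the code format, so treat that layout, and the sample prefix '1301', as hypothetical.

    ```sql
    -- Level 1: number of enterprises per city.
    -- Assumption: f is a 6-digit region code and its first 4 digits identify the city.
    select substr(f, 1, 4) as city_code, count(*) as enterprise_count
    from test6
    group by substr(f, 1, 4);

    -- Level 2: drill into one city and count by full region code.
    -- '1301' is a hypothetical city prefix used only for illustration.
    select f as region_code, count(*) as enterprise_count
    from test6
    where substr(f, 1, 4) = '1301'
    group by f;
    ```

    A chart front end would run the first query for the top-level view and the second when the user clicks on a city.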


    My steps:

    First, create the tables. The CREATE TABLE statements are all covered in my earlier blog posts.

    create table test6 (a String,b String,c String,d String,e String,f String,g String,h String,i String,j String,k String,l String,m String,n String,o String,p String,q String,r String,s String,t String,u String,v String,w String,x String,y String,z String,aa String,ab String,ac String) ROW format delimited fields terminated by ',' STORED AS TEXTFILE;

    create table test7 (aaa String,bbb String) ROW format delimited fields terminated by ',' STORED AS TEXTFILE;

    Set the encoding of test7 to GBK:

    alter table test7 set SERDEPROPERTIES('serialization.encoding'='GBK');

    Load the data:

    load data local inpath '/opt/software/qq.csv' into table test7;
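
    After loading, a quick sanity check on the dimension table can catch problems before the join. The queries below are a sketch rather than part of the original steps; they use only the test7 columns defined above.

    ```sql
    -- Peek at a few rows; garbled Chinese text here usually points to an encoding mismatch.
    select aaa, bbb from test7 limit 5;

    -- Codes that appear more than once would multiply rows in a later join.
    select aaa, count(*) as cnt
    from test7
    group by aaa
    having count(*) > 1;
    ```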

    Set the encoding of test6 to GBK as well:
    alter table test6 set SERDEPROPERTIES('serialization.encoding'='GBK');

     

    The most important step: data cleaning

    -- Hive matches INSERT ... SELECT columns by position, not by alias,
    -- so the select list follows test6's column order (..., z, aa, ab, ac).
    insert overwrite table test6
    select qiye.a as a,qiye.b as b,qiye.c as c,
    qiye.d as d,qiye.e as e,qiye.f as f,qiye.g as g,qiye.h as h,qiye.i as i,qiye.j as j,
    qiye.k as k,qiye.l as l,qiye.m as m,qiye.n as n,qiye.o as o,qiye.p as p,qiye.q as q,
    qiye.r as r,qiye.s as s,qiye.t as t,qiye.u as u,qiye.v as v,qiye.w as w,qiye.x as x,qiye.y as y,
    qiye.z as z,qiye.aa as aa,concat(qiye.f,xinzhen.bbb) as ab,qiye.ac as ac
    from test6 qiye join test7 xinzhen on xinzhen.aaa = qiye.f;

    The cleaning succeeded.
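
    One caveat: the cleaning statement uses an inner join and overwrites test6, so any row whose f value has no match in test7 is dropped. A check along these lines, run before the overwrite, would list such codes. It is a sketch that was not part of the original steps; the output column names are illustrative.

    ```sql
    -- Region codes in test6 that have no matching entry in the dimension table test7.
    select qiye.f as unmatched_code, count(*) as row_count
    from test6 qiye
    left join test7 xinzhen on xinzhen.aaa = qiye.f
    where xinzhen.aaa is null
    group by qiye.f;
    ```

    If this returns rows, the codes can be corrected first, or the cleaning statement can use a left join so that those rows are kept.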

  • Original article: https://www.cnblogs.com/092e/p/15534661.html