Using Common Tools in Big Data Testing

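I. Data Processing Architecture

As the figure shows, data moves along two main paths, a real-time computation pipeline and an offline computation pipeline:
- Real-time: events (Hive table) ----(sent with dw-event-to-collector.sh)----> collector (ingestion tool) --------> Flume distribution --------> Kafka buffering --------> Flink computation --------> HBase --------> Elasticsearch
- Offline: events (Hive table) ----(the Hive table is read directly)----> HDFS --------> Flink computation --------> HBase --------> Elasticsearch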
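For test planning it helps to see where the two flows converge. The sketch below writes each flow as an ordered list of stages (the stage names come from the flows above; the list representation and the `shared_tail` helper are only illustrative) and computes the stages both flows end with, which are the ones whose checks apply to either pipeline.

```python
# Stage names follow the two flows described above; representing them as
# lists is an illustration only, not something the platform provides.
REALTIME = ["hive", "collector", "flume", "kafka", "flink", "hbase", "elasticsearch"]
OFFLINE = ["hive", "hdfs", "flink", "hbase", "elasticsearch"]


def shared_tail(a, b):
    """Return the longest run of stages that both pipelines end with."""
    tail = []
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        tail.append(x)
    return list(reversed(tail))


print(shared_tail(REALTIME, OFFLINE))  # ['flink', 'hbase', 'elasticsearch']
```

Since both flows end in Flink, HBase, and Elasticsearch, the HBase and Elasticsearch checks in section II can be reused when verifying the offline flow.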
II. Tools Used in the Real-Time Pipeline

1.hive
- Open the Hive CLI: hive
- List the databases: show databases;
- Switch to the cdp database: use cdp;
- Create a table (the export-event configuration page in the SMH front end generates this statement automatically):
  CREATE TABLE IF NOT EXISTS tablename (
    uid string,
    event_time bigint,
    touch_point_id string
  )
  PARTITIONED BY (process_date string)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
  STORED AS TEXTFILE;
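- Show the statement that created a table: show create table c8_shopping;
- List the tables in the current database: show tables;
- Show a table's columns: desc tablename;
- Load events into the corresponding Hive table: load data local inpath "/home/hadoop/shopping.txt" into table tablename partition(process_date="2019-07-22");
- Query data in a table: select * from tablename where process_date = '2019-04-26' limit 10;
- Print column names together with the data (run this before the query): set hive.cli.print.header=true;
- Delete all data in a table: truncate table tablename;
- Drop a table: drop table tablename;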
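The file passed to `load data local inpath` has to match the table's column order and delimiters. As a minimal sketch, the helpers below serialize test events in the column order of the table above (uid, event_time, touch_point_id) and build the matching load statement. The tab delimiter is an assumption, so pass whatever your table's `FIELDS TERMINATED BY` clause uses; the function names and sample values are made up for illustration.

```python
def events_to_text(rows, delimiter="\t"):
    """Serialize (uid, event_time, touch_point_id) rows, one event per line.

    The default tab delimiter is an assumption about the target table.
    """
    lines = []
    for uid, event_time, touch_point_id in rows:
        fields = [str(uid), str(int(event_time)), str(touch_point_id)]
        for field in fields:
            # A plain text table has no quoting, so a stray delimiter or
            # newline inside a value would shift or split the columns.
            if delimiter in field or "\n" in field:
                raise ValueError(f"field contains a delimiter: {field!r}")
        lines.append(delimiter.join(fields))
    return "".join(line + "\n" for line in lines)


def load_statement(local_path, table, process_date):
    """Build the LOAD DATA statement for one process_date partition."""
    return (
        f'load data local inpath "{local_path}" '
        f'into table {table} partition(process_date="{process_date}");'
    )


if __name__ == "__main__":
    sample = [("uid-1", 1563782400000, "tp-1")]  # made-up event
    print(events_to_text(sample), end="")
    print(load_statement("/home/hadoop/shopping.txt", "tablename", "2019-07-22"))
```

Write the returned text to a local file, then run the printed statement in the Hive CLI.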
2.kafka

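To inspect the messages on a Kafka topic, work from /home/hadoop/kafka_2.11-0.10.2.0/bin

Command: sh kafka-console-consumer.sh --topic event_c8 --from-beginning --bootstrap-server 172.00.0.000:9092 > event_c8

This reads the event_c8 topic from the beginning and redirects the output to a local file named event_c8.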
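Once the consumer output has been redirected to a file, a quick check is to count the dumped messages and see how many mention a given uid. The sketch below assumes the console consumer wrote one message per line (its default when messages contain no newlines) and treats each message as opaque text, since the event format is not described here; the function names are hypothetical.

```python
def matching_messages(lines, needle):
    """Return the dumped messages that contain the given substring."""
    return [line.rstrip("\n") for line in lines if needle in line]


def summarize(path, needle):
    """Return (total messages, messages containing `needle`) for a dump file."""
    with open(path, encoding="utf-8") as dump:
        lines = dump.readlines()
    return len(lines), len(matching_messages(lines, needle))
```

For example, `summarize("event_c8", "uid-1")` returns the total number of dumped messages and the number that contain the text `uid-1`.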

3.flink
- Flink jobs are restarted from /home/hadoop/cdp-etl-jobs/bin/job/realtime
- Stop a Flink job: yarn application -kill <application id>
- Start the Flink jobs (two separate commands): sh indexing-trait.sh and sh calculate-trait.sh
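The kill command needs the YARN application id, which `yarn application -list` prints. A minimal sketch for pulling the id out of that listing follows; it assumes the listing's columns are tab-separated with the id first and the application name second, and that the job's YARN name contains a recognizable substring (for instance the script name, which is a guess about this deployment).

```python
def find_application_ids(listing, name_part):
    """Return ids of listed applications whose name contains `name_part`.

    `listing` is the captured text of `yarn application -list`; columns are
    assumed to be tab-separated, with the id first and the name second.
    """
    ids = []
    for line in listing.splitlines():
        columns = [column.strip() for column in line.split("\t")]
        if (
            len(columns) >= 2
            and columns[0].startswith("application_")
            and name_part in columns[1]
        ):
            ids.append(columns[0])
    return ids
```

Each returned id can then be passed to `yarn application -kill` before the start scripts are run again.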
4.hbase
- Open the HBase shell: hbase shell
- List the existing tables: list
- Read the values of a trait: scan 'trait_c8',{COLUMNS=>['d:t1425','d:uid']}
- Find rows whose delete status is true: scan 'trait_c8', {COLUMNS => 'd:delete_status',FILTER => "ValueFilter(=,'substring:true')"}
- Look up a single uid: get 'trait_c8','fff144eb653e7348f051307cde7db169'
- Delete all data in a table: truncate 'tablename'; flush 'tablename'
- Drop a table: disable 'tablename'; drop 'tablename'
- Full sync from HBase to Elasticsearch: cdp/cdp-etl-jobs/bin/job/batch/trait-crowd-calc.sh -calcType sync (for an incremental sync, use incr in place of sync)
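When several traits have to be checked, typing the column list by hand gets error-prone. The sketch below only builds the text of a scan command in the same shape as the one above, to be pasted into the HBase shell; the helper name is hypothetical, and the column family `d` is taken from the examples above.

```python
def scan_command(table, qualifiers, family="d"):
    """Build an HBase shell scan command limited to the given columns."""
    columns = ",".join(f"'{family}:{qualifier}'" for qualifier in qualifiers)
    return f"scan '{table}',{{COLUMNS=>[{columns}]}}"


print(scan_command("trait_c8", ["t1425", "uid"]))
# scan 'trait_c8',{COLUMNS=>['d:t1425','d:uid']}
```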
5.elasticsearch

Queries can be run from Kibana or the elasticsearch-head plugin. Common requests:
- Query a trait:
  GET /trait_c39/trait_c39/_search?size=1000
  {
    "query": {
      "match_all": {}
    },
    "_source": ["t596"]
  }
- Query a crowd:
  GET /trait_c39/trait_c39/_search?size=1000
  {
    "query": {
      "match_all": {}
    },
    "post_filter": {
      "term": {
        "crowds_code": "cr197"
      }
    }
  }
- Look up a single uid:
  GET /trait_c33/trait_c33/uid-1
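When the same two searches are repeated for many traits and crowds, the request bodies can be generated instead of edited by hand. This is a minimal sketch using only the standard library; the function names are hypothetical, and the bodies mirror the requests above (index and type share one name there, so `search_path` does the same).

```python
import json


def trait_query(*trait_fields):
    """Match every document but return only the listed trait fields."""
    return {"query": {"match_all": {}}, "_source": list(trait_fields)}


def crowd_query(crowd_code):
    """Match every document, then keep the hits that belong to one crowd."""
    return {
        "query": {"match_all": {}},
        "post_filter": {"term": {"crowds_code": crowd_code}},
    }


def search_path(index, size=1000):
    """Build the search path, with the type named the same as the index."""
    return f"/{index}/{index}/_search?size={size}"


if __name__ == "__main__":
    print("GET", search_path("trait_c39"))
    print(json.dumps(crowd_query("cr197"), indent=2))
```

The printed path and body can be pasted into the Kibana console as they are.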
III. Tools Used in the Offline Pipeline

1.hdfs

Web UI for browsing files: http://172.23.x.xxx:50070/explorer.html#/cdp/warehouse

List a directory: hadoop fs -ls /cdp/warehouse/c8/offline/

Print a file: hadoop fs -cat /cdp/warehouse/c8/offline/shopping.txt

Download data: hadoop fs -get /cdp/warehouse/c8/offline/

Delete a file: hadoop fs -rm -r /cdp/warehouse/c8/offline/shopping.txt
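In a scripted test these commands are usually run through `subprocess`. The sketch below is one hedged way to do that: `hadoop_fs` only assembles the argument list, and `run` executes it, which requires the `hadoop` client on the PATH of the machine running the test. Both helper names are hypothetical.

```python
import subprocess

OFFLINE_DIR = "/cdp/warehouse/c8/offline/"


def hadoop_fs(action, *args):
    """Build the argument list for `hadoop fs -<action> <args...>`."""
    return ["hadoop", "fs", f"-{action}", *args]


def run(argv):
    """Run a command and return its standard output; raises on a non-zero exit."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `run(hadoop_fs("ls", OFFLINE_DIR))` returns the directory listing as text, which a test can then search for the expected file name.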

2.azkaban

Original post: https://www.cnblogs.com/fanshudada/p/11278422.html