A First Look at HBase

string hbaseCluster = "https://charju.azurehdinsight.net";
string hadoopUsername = "<your user name>";
string hadoopPassword = "<your password>";
ClusterCredentials creds = new ClusterCredentials(new Uri(hbaseCluster), hadoopUsername, hadoopPassword);
var hbaseClient = new HBaseClient(creds);

// In WinForms, GetVersion() never gets a response.
var version = hbaseClient.GetVersion();
Console.WriteLine(Convert.ToString(version));
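Code first, and it hides a really nasty trap: this code does not run in a WinForms application!!! It works fine in a console application!!! (That cost me several days...)
Debugging the WinForms version with WinDbg showed that during GetVersion the main thread starts a Task and then waits for that Task to complete. Early in the Task's run (roughly the first minute) a second thread sits waiting on a WaitHandle; after a while that thread disappears. The main thread then starts its retry calls, and after that... nothing.
Anyway, in a console application the code is OK.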
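The symptoms above (a UI thread blocked on a Task that never completes) resemble the well-known sync-over-async deadlock, although nothing here confirms that this is the actual cause. Assuming it is, a common workaround is to push the blocking call onto a thread-pool thread and await it from the UI. The sketch below uses a hypothetical stand-in method, `GetVersionBlocking`, in place of `hbaseClient.GetVersion()` so that it runs without the HBase SDK.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingCallDemo
{
    // Hypothetical stand-in for a blocking SDK call such as hbaseClient.GetVersion().
    public static string GetVersionBlocking()
    {
        Thread.Sleep(200); // simulate network latency
        return "stand-in version";
    }

    // Runs the blocking call on a thread-pool thread, which has no UI
    // SynchronizationContext, and returns a Task the caller can await.
    public static Task<string> GetVersionAsync()
    {
        return Task.Run(() => GetVersionBlocking());
    }

    public static void Main()
    {
        // A console app can simply block here; a WinForms event handler
        // would instead be declared async and use "await GetVersionAsync()".
        string version = GetVersionAsync().GetAwaiter().GetResult();
        Console.WriteLine(version);
    }
}
```

In a WinForms handler the call would look like `var version = await BlockingCallDemo.GetVersionAsync();` inside an `async void` click handler, so the UI thread is never blocked while the request is in flight.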
My example uses the Sina API to fetch stock quotes, for example http://hq.sinajs.cn/list=sz000977,sh600718. I call it once per second and push the returned data into HBase.
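Before the data can be stored, each response line has to be split into fields. The sketch below assumes the response is a line of the form `var hq_str_<code>="field1,field2,...";`, which is not shown in this post, so treat the format and the sample string as assumptions. `SinaQuoteParser` and `ExtractFields` are hypothetical names.

```csharp
using System;

public static class SinaQuoteParser
{
    // Returns the comma-separated fields found between the first and last
    // double quote of one response line, or an empty array if none are found.
    public static string[] ExtractFields(string line)
    {
        int start = line.IndexOf('"');
        int end = line.LastIndexOf('"');
        if (start < 0 || end <= start)
        {
            return new string[0];
        }
        return line.Substring(start + 1, end - start - 1).Split(',');
    }

    public static void Main()
    {
        // Made-up sample cut down to four fields; a real response carries many more.
        string sample = "var hq_str_sz000977=\"DemoName,10.50,10.40,10.62\";";
        string[] fields = ExtractFields(sample);
        Console.WriteLine(fields.Length); // prints 4
        Console.WriteLine(fields[0]);     // prints DemoName
    }
}
```

Each field would then be converted with `double.Parse` or `int.Parse` (ideally with `CultureInfo.InvariantCulture`) and assigned to the matching property of the entity class.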
The stock entity class definition

public class StockEntity
{
    public string Name { get; set; }
    public double TodayOpeningPrice { get; set; }
    public double YesterdayClosingPrice { get; set; }
    public double CurrentPrice { get; set; }
    public double TodayMaxPrice { get; set; }
    public double TodayMinPrice { get; set; }
    public double BidPriceBuy { get; set; }
    public double BidPriceSell { get; set; }
    public int FixtureNumber { get; set; }
    public double FixtureAmount { get; set; }
    public int Buy1Number { get; set; }
    public double Buy1Price { get; set; }
    public int Buy2Number { get; set; }
    public double Buy2Price { get; set; }
    public int Buy3Number { get; set; }
    public double Buy3Price { get; set; }
    public int Buy4Number { get; set; }
    public double Buy4Price { get; set; }
    public int Buy5Number { get; set; }
    public double Buy5Price { get; set; }
    public int Sell1Number { get; set; }
    public double Sell1Price { get; set; }
    public int Sell2Number { get; set; }
    public double Sell2Price { get; set; }
    public int Sell3Number { get; set; }
    public double Sell3Price { get; set; }
    public int Sell4Number { get; set; }
    public double Sell4Price { get; set; }
    public int Sell5Number { get; set; }
    public double Sell5Price { get; set; }
    public DateTime TransactionTime { get; set; }
}
Once the data has been pulled down, I hand it to a thread-pool thread that writes it to HBase.
ThreadPool.QueueUserWorkItem(new WaitCallback(SaveStockDataToHbase), se);
The code that does the actual work:
 1  private void SaveStockDataToHbase(object state)
 2  {
 3      StockEntity se = state as StockEntity;
 4
 5      // Insert data into the HBase table.
 6      string rowKey = Guid.NewGuid().ToString();
 7
 8      CellSet cellSet = new CellSet();
 9      CellSet.Row cellSetRow = new CellSet.Row { key = Encoding.UTF8.GetBytes(rowKey) };
10      cellSet.rows.Add(cellSetRow);
11
12
13      Type t = typeof(StockEntity);
14
15      foreach (string colname in stockEntityColumns)
16      {
17          var pi = t.GetProperty(colname);
18          object val = pi.GetValue(se);
19
20          Cell value = new Cell { column = Encoding.UTF8.GetBytes("charju:" + colname), data = Encoding.UTF8.GetBytes(Convert.ToString(val)) };
21          cellSetRow.values.Add(value);
22      }
23
24      try
25      {
26          hbaseClient.StoreCells(hbaseStockTableName, cellSet);
27      }
28      catch (Exception ex)
29      {
30          Console.WriteLine(ex.Message);
31      }
32  }
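Lines 6 to 10 create a new row. Lines 17 to 20 use reflection on each property of the entity class to fetch its value and wrap it in a cell (otherwise I would have to write a pile of repetitive code). Line 21 adds that column's data to the row.
Line 26 is where the row is actually stored in HBase.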
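The listing relies on a `stockEntityColumns` collection whose definition is not shown in this post. One way it could be built, sketched below, is to read the public property names from the type by reflection. `DemoQuote` is a hypothetical three-property stand-in for `StockEntity`, and `ReflectionColumns` with its methods are illustrative names; the string pairs stand in for the SDK's `Cell` objects so the example runs without the HBase client.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Reflection;

// Hypothetical cut-down stand-in for StockEntity.
public class DemoQuote
{
    public string Name { get; set; }
    public double CurrentPrice { get; set; }
    public int Buy1Number { get; set; }
}

public static class ReflectionColumns
{
    // Lists the public instance property names of a type.
    public static string[] GetColumnNames(Type t)
    {
        return t.GetProperties(BindingFlags.Public | BindingFlags.Instance)
                .Select(p => p.Name)
                .ToArray();
    }

    // Maps "family:PropertyName" to the property's value rendered as a string.
    public static Dictionary<string, string> ToColumnValues(object entity, string family)
    {
        var result = new Dictionary<string, string>();
        PropertyInfo[] props = entity.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance);
        foreach (PropertyInfo pi in props)
        {
            object val = pi.GetValue(entity);
            // Invariant culture keeps the decimal separator stable across machines.
            result[family + ":" + pi.Name] = Convert.ToString(val, CultureInfo.InvariantCulture);
        }
        return result;
    }

    public static void Main()
    {
        var quote = new DemoQuote { Name = "Demo", CurrentPrice = 10.5, Buy1Number = 9800 };
        foreach (KeyValuePair<string, string> kv in ToColumnValues(quote, "charju"))
        {
            Console.WriteLine(kv.Key + " = " + kv.Value);
        }
    }
}
```

Each key/value pair corresponds to one `Cell` in the listing above: the key becomes the `column` bytes and the value becomes the `data` bytes.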
In line 20 above you may have noticed "charju": that is the name of my column family. Going back a step, here is how the table is created in HBase.
string hbaseCluster = "https://charju.azurehdinsight.net";
string hadoopUsername = "<your name>";
string hadoopPassword = "<your password>";
string hbaseStockTableName = "StockInformation";
HBaseClient hbaseClient;

public void CreateHbaseTable()
{
    // Create a new HBase table. - StockInformation
    TableSchema stockTableSchema = new TableSchema();
    stockTableSchema.name = hbaseStockTableName;
    stockTableSchema.columns.Add(new ColumnSchema() { name = "charju" });
    hbaseClient.CreateTable(stockTableSchema);
}
And hbaseClient is instantiated here:
ClusterCredentials creds = new ClusterCredentials(new Uri(hbaseCluster), hadoopUsername, hadoopPassword);
hbaseClient = new HBaseClient(creds);
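Once the data has been written, there are a couple of ways to check it. The first is to enable RDP on the HBase cluster, remote in, and run hbase shell commands; unfortunately RDP kept failing from my virtual machine and I don't know why. The second is to query it through Hive.
After connecting to the HBase cluster's website, go to the Hive editor page and first create the corresponding table: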
CREATE EXTERNAL TABLE StockInformation(
    rowkey STRING,
    TodayOpeningPrice STRING, YesterdayClosingPrice STRING, CurrentPrice STRING,
    TodayMaxPrice STRING, TodayMinPrice STRING,
    BidPriceBuy STRING, BidPriceSell STRING,
    FixtureNumber STRING, FixtureAmount STRING,
    Buy1Number STRING, Buy1Price STRING, Buy2Number STRING, Buy2Price STRING,
    Buy3Number STRING, Buy3Price STRING, Buy4Number STRING, Buy4Price STRING,
    Buy5Number STRING, Buy5Price STRING,
    Sell1Number STRING, Sell1Price STRING, Sell2Number STRING, Sell2Price STRING,
    Sell3Number STRING, Sell3Price STRING, Sell4Number STRING, Sell4Price STRING,
    Sell5Number STRING, Sell5Price STRING,
    TransactionTime STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,charju:TodayOpeningPrice,charju:YesterdayClosingPrice,charju:CurrentPrice,charju:TodayMaxPrice,charju:TodayMinPrice,charju:BidPriceBuy,charju:BidPriceSell,charju:FixtureNumber,charju:FixtureAmount,charju:Buy1Number,charju:Buy1Price,charju:Buy2Number,charju:Buy2Price,charju:Buy3Number,charju:Buy3Price,charju:Buy4Number,charju:Buy4Price,charju:Buy5Number,charju:Buy5Price,charju:Sell1Number,charju:Sell1Price,charju:Sell2Number,charju:Sell2Price,charju:Sell3Number,charju:Sell3Price,charju:Sell4Number,charju:Sell4Price,charju:Sell5Number,charju:Sell5Price,charju:TransactionTime')
TBLPROPERTIES ('hbase.table.name' = 'StockInformation');
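The `hbase.columns.mapping` value is long and easy to mistype by hand, and as far as I understand the Hive HBase integration, whitespace between entries is treated as part of the column name. A small helper can generate the string from a list of qualifiers instead. This is only a sketch: `HiveMappingBuilder` and `BuildColumnsMapping` are hypothetical names, and the qualifier list must be supplied in the same order as the Hive column list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HiveMappingBuilder
{
    // Builds ":key,family:q1,family:q2,..." with no whitespace between entries.
    public static string BuildColumnsMapping(string family, IEnumerable<string> qualifiers)
    {
        IEnumerable<string> entries = qualifiers.Select(q => family + ":" + q.Trim());
        return ":key," + string.Join(",", entries);
    }

    public static void Main()
    {
        // Shortened list for illustration; the real table has 30 qualifiers.
        string[] qualifiers = { "TodayOpeningPrice", "CurrentPrice", "TransactionTime" };
        Console.WriteLine(BuildColumnsMapping("charju", qualifiers));
        // prints :key,charju:TodayOpeningPrice,charju:CurrentPrice,charju:TransactionTime
    }
}
```

The qualifiers are passed in explicitly rather than read by reflection because the mapping is positional, and the order in which reflection returns properties is not something to rely on.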
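Once the table has been created, you can run SQL against it, for example:
select * from StockInformation where buy1number = 9800 order by transactiontime
That finds today's largest single buy order for 小浪 (my nickname for the stock). Queries like select count(0) work just as well, of course.
Useful links: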
https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hbase-tutorial-get-started/