  • ClickHouse: Russia's awesome new open-source database

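    ClickHouse is a database management system recently open-sourced by the Russian company Yandex. It can generate analytical reports from your data in real time, and its performance is seriously impressive!

    It is queried with SQL.

    It gives you many more ways to slice and dice your data.

    ClickHouse outperforms the comparable column-oriented DBMSs currently on the market.

    ClickHouse uses the full potential of all available hardware to process each query as fast as possible.

    ClickHouse is a column-oriented DBMS for OLAP... There is plenty more background like this in the official documentation.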

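    Testing usually means downloading some data, looking at the raw data format, converting it, and then importing it into PostgreSQL for preprocessing.

    Many of the tests in the documentation are inconvenient to reproduce: the data sets are large, and downloading them takes a long time.

    This post walks through a test demo. If your English isn't great, or you have only just dived in, you can follow along and try it yourself.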

     1. First, download the data:

    # One archive per month, January 1987 through December 2017
    for s in $(seq 1987 2017)
    do
        for m in $(seq 1 12)
        do
            wget "http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_${s}_${m}.zip"
        done
    done
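The nested loop requests 31 x 12 = 372 archives. A minimal sketch follows, assuming bash and GNU wget. The helper `ontime_urls` is hypothetical and not from the original post. It prints the same URLs one per line, so you can inspect or trim the list before downloading. You can then pipe the list to wget, which reads URLs from standard input with `-i -` and resumes interrupted downloads with `-c`. Some of the generated months may not exist on the server (for example early 1987, or 2017 months not yet published), so expect a few failed requests.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one BTS on-time archive URL per line,
# for every month from 1987 through 2017 (same URLs as the loop above).
ontime_urls() {
    local s m
    for s in $(seq 1987 2017); do
        for m in $(seq 1 12); do
            echo "http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_${s}_${m}.zip"
        done
    done
}
```

For example, `ontime_urls | wget -c -i -` downloads everything. `ontime_urls | grep _2007_ | wget -c -i -` fetches a single year, which is a much smaller download for a first test.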



    2. Next, connect to the database:
    clickhouse-client
    and create the table:

    CREATE TABLE `ontime` (
      `Year` UInt16,
      `Quarter` UInt8,
      `Month` UInt8,
      `DayofMonth` UInt8,
      `DayOfWeek` UInt8,
      `FlightDate` Date,
      `UniqueCarrier` FixedString(7),
      `AirlineID` Int32,
      `Carrier` FixedString(2),
      `TailNum` String,
      `FlightNum` String,
      `OriginAirportID` Int32,
      `OriginAirportSeqID` Int32,
      `OriginCityMarketID` Int32,
      `Origin` FixedString(5),
      `OriginCityName` String,
      `OriginState` FixedString(2),
      `OriginStateFips` String,
      `OriginStateName` String,
      `OriginWac` Int32,
      `DestAirportID` Int32,
      `DestAirportSeqID` Int32,
      `DestCityMarketID` Int32,
      `Dest` FixedString(5),
      `DestCityName` String,
      `DestState` FixedString(2),
      `DestStateFips` String,
      `DestStateName` String,
      `DestWac` Int32,
      `CRSDepTime` Int32,
      `DepTime` Int32,
      `DepDelay` Int32,
      `DepDelayMinutes` Int32,
      `DepDel15` Int32,
      `DepartureDelayGroups` String,
      `DepTimeBlk` String,
      `TaxiOut` Int32,
      `WheelsOff` Int32,
      `WheelsOn` Int32,
      `TaxiIn` Int32,
      `CRSArrTime` Int32,
      `ArrTime` Int32,
      `ArrDelay` Int32,
      `ArrDelayMinutes` Int32,
      `ArrDel15` Int32,
      `ArrivalDelayGroups` Int32,
      `ArrTimeBlk` String,
      `Cancelled` UInt8,
      `CancellationCode` FixedString(1),
      `Diverted` UInt8,
      `CRSElapsedTime` Int32,
      `ActualElapsedTime` Int32,
      `AirTime` Int32,
      `Flights` Int32,
      `Distance` Int32,
      `DistanceGroup` UInt8,
      `CarrierDelay` Int32,
      `WeatherDelay` Int32,
      `NASDelay` Int32,
      `SecurityDelay` Int32,
      `LateAircraftDelay` Int32,
      `FirstDepTime` String,
      `TotalAddGTime` String,
      `LongestAddGTime` String,
      `DivAirportLandings` String,
      `DivReachedDest` String,
      `DivActualElapsedTime` String,
      `DivArrDelay` String,
      `DivDistance` String,
      `Div1Airport` String,
      `Div1AirportID` Int32,
      `Div1AirportSeqID` Int32,
      `Div1WheelsOn` String,
      `Div1TotalGTime` String,
      `Div1LongestGTime` String,
      `Div1WheelsOff` String,
      `Div1TailNum` String,
      `Div2Airport` String,
      `Div2AirportID` Int32,
      `Div2AirportSeqID` Int32,
      `Div2WheelsOn` String,
      `Div2TotalGTime` String,
      `Div2LongestGTime` String,
      `Div2WheelsOff` String,
      `Div2TailNum` String,
      `Div3Airport` String,
      `Div3AirportID` Int32,
      `Div3AirportSeqID` Int32,
      `Div3WheelsOn` String,
      `Div3TotalGTime` String,
      `Div3LongestGTime` String,
      `Div3WheelsOff` String,
      `Div3TailNum` String,
      `Div4Airport` String,
      `Div4AirportID` Int32,
      `Div4AirportSeqID` Int32,
      `Div4WheelsOn` String,
      `Div4TotalGTime` String,
      `Div4LongestGTime` String,
      `Div4WheelsOff` String,
      `Div4TailNum` String,
      `Div5Airport` String,
      `Div5AirportID` Int32,
      `Div5AirportSeqID` Int32,
      `Div5WheelsOn` String,
      `Div5TotalGTime` String,
      `Div5LongestGTime` String,
      `Div5WheelsOff` String,
      `Div5TailNum` String
    ) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192)
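The last line uses ClickHouse's legacy MergeTree syntax, which takes three arguments:

- a date column, with partitions created per month of that column;
- the primary key tuple;
- the index granularity.

Newer ClickHouse releases may reject this form by default. The sketch below assumes you save the statement above to a file; `ontime.sql` is a hypothetical name. The hypothetical `modernize_engine` filter rewrites the three-argument legacy clause into the equivalent current syntax and passes every other line through unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical filter: rewrite a legacy three-argument engine clause
#   MergeTree(date_col, (key, ...), granularity)
# into the current syntax with explicit PARTITION BY / ORDER BY / SETTINGS.
# Legacy tables are partitioned by month of date_col, hence toYYYYMM().
modernize_engine() {
    sed 's/MergeTree(\([^,]*\), \(([^)]*)\), \([0-9]*\))/MergeTree() PARTITION BY toYYYYMM(\1) ORDER BY \2 SETTINGS index_granularity = \3/'
}
```

Usage: `modernize_engine < ontime.sql | clickhouse-client`. The last line then becomes `) ENGINE = MergeTree() PARTITION BY toYYYYMM(FlightDate) ORDER BY (Year, FlightDate) SETTINGS index_granularity = 8192`. The filter only handles the plain three-argument form, not the legacy variant with a sampling expression.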



    3. Load the data:

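    # Replace example-perftest01j with your own server; sed 's/\.00//g' turns values like 12.00 into 12 for the integer columns
    for i in *.zip; do echo "$i"; unzip -cq "$i" '*.csv' | sed 's/\.00//g' | clickhouse-client --host=example-perftest01j --query="INSERT INTO ontime FORMAT CSVWithNames"; done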
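The backslash in `sed 's/\.00//g'` matters. In a sed regular expression an unescaped `.` matches any character, so `s/.00//g` also deletes digit runs such as the "200" in "2000". Below is a minimal check, assuming bash, on a made-up sample line. The function name `strip_zero_decimals` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical filter used in the load pipeline: delete every literal ".00",
# so values such as "-14.00" become "-14" and fit Int32 columns.
strip_zero_decimals() {
    sed 's/\.00//g'
}

# Compare the escaped and unescaped patterns on a sample CSV fragment.
sample='2000,"AA",-14.00,12.00'
echo "$sample" | strip_zero_decimals   # escaped dot:   2000,"AA",-14,12
echo "$sample" | sed 's/.00//g'        # unescaped dot: 0,"AA",-14,12
```

The filter is still blunt. Like the integer column types, it assumes the numeric fields only ever carry a .00 fraction; a value such as 1.005 would silently become 15.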
       



    4. Now for the queries:
    Q0. Average number of flights per month:
    select avg(c1) from (select Year, Month, count(*) as c1 from ontime group by Year, Month);

    Q1. Count of flights per day, for the years 2000-2008

    Q2. Count of flights delayed by more than 10 minutes, per day of week, for the years 2000-2008

    Q3. Count of delays per airport, for the years 2000-2008

    Q4. Count of delays per carrier, for 2007

    Q5. Percentage of delays for each carrier, for 2007
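The SQL for Q1 to Q5 did not survive in this copy of the post. Below is a hedged sketch of one way to write each query against the ontime table above; these are not necessarily the author's original queries. It makes three assumptions:

- "delayed" means DepDelay > 10 minutes, the threshold named in Q2;
- "per day" in Q1 means per day of week;
- Q3 keeps only the top 10 airports.

The `run_query` function and the `CH_CLIENT` variable are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one of the sample queries Q1-Q5 against the ontime table.
# Extra arguments (e.g. --host=...) are passed through to the client.
# CH_CLIENT overrides the client command (default: clickhouse-client).
# "Delayed" is assumed to mean DepDelay > 10 minutes; in Q5, avg() over the
# 0/1 result of that comparison gives the share of delayed flights.
run_query() {
    local name="${1:-}" sql
    [ $# -gt 0 ] && shift
    case "$name" in
        q1) sql="SELECT DayOfWeek, count(*) AS c FROM ontime
                 WHERE Year >= 2000 AND Year <= 2008
                 GROUP BY DayOfWeek ORDER BY c DESC" ;;
        q2) sql="SELECT DayOfWeek, count(*) AS c FROM ontime
                 WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008
                 GROUP BY DayOfWeek ORDER BY c DESC" ;;
        q3) sql="SELECT Origin, count(*) AS c FROM ontime
                 WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008
                 GROUP BY Origin ORDER BY c DESC LIMIT 10" ;;
        q4) sql="SELECT Carrier, count(*) AS c FROM ontime
                 WHERE DepDelay > 10 AND Year = 2007
                 GROUP BY Carrier ORDER BY c DESC" ;;
        q5) sql="SELECT Carrier, avg(DepDelay > 10) * 100 AS pct_delayed FROM ontime
                 WHERE Year = 2007
                 GROUP BY Carrier ORDER BY pct_delayed DESC" ;;
        *)  echo "usage: run_query q1|q2|q3|q4|q5 [client options]" >&2
            return 1 ;;
    esac
    "${CH_CLIENT:-clickhouse-client}" "$@" --query="$sql"
}
```

For example, `run_query q3 --host=example-perftest01j` lists the ten origin airports with the most departures delayed by more than 10 minutes, under the assumed threshold.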

      










  • Original post: https://www.cnblogs.com/toov5/p/7346391.html