Big Data Era:
1.More and more data becoming available on Hadoop
2.Limitations in existing Business Intelligence (BI) Tools
    Limited support for Hadoop
    Data size growing exponentially
    High latency of interactive queries
    Scale-Up architecture
3.Challenges in adopting Hadoop as an interactive analysis system
    Most analyst groups are SQL-savvy
    No mature SQL interface on Hadoop
    OLAP capability in the Hadoop ecosystem is not ready yet
Business Needs for Big Data Analysis
1.Sub-second query latency on billions of rows
2.ANSI SQL for both analysts and engineers
3.Full OLAP capability to offer advanced functionality
4.Seamless Integration with BI Tools
5.Support for high cardinality and many dimensions
6.High concurrency – thousands of end users
7.Distributed, scale-out architecture for large data volumes
Kylin is designed to accelerate the performance of 80+% of analytics queries on Hadoop
Technical Challenges:
1.Huge data volume
    Table scans
2.Big table joins
    Data shuffling
3.Analysis at different granularities
    Expensive runtime aggregation
4.MapReduce jobs
    Batch processing
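
To make the first three challenges concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables `dim_site` and `fact_sales` and the helper `total_by` are hypothetical and exist only for illustration. Each call has to join, scan, and aggregate the fact table again at query time, once per requested granularity.

```python
import sqlite3

# Hypothetical star schema: one fact table and one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_site (site_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE fact_sales (site_id INTEGER, category TEXT, price REAL);
    INSERT INTO dim_site VALUES (1, 'US'), (2, 'DE');
    INSERT INTO fact_sales VALUES
        (1, 'Books', 10.0), (1, 'Toys', 5.0),
        (2, 'Books', 7.0),  (2, 'Books', 3.0);
""")

def total_by(*columns):
    """Join, scan, and aggregate the fact table at query time."""
    cols = ", ".join(columns)
    sql = (
        f"SELECT {cols}, SUM(price) FROM fact_sales f "
        f"JOIN dim_site d ON f.site_id = d.site_id "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()

# Two granularities, two full join-and-aggregate passes over the fact table.
by_country = total_by("country")
by_country_category = total_by("country", "category")
```

On four rows the cost is invisible. On billions of rows, every such pass means a full table scan plus a shuffle for the join.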
OLAP Cube – Balance between Space and Time
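
The space-versus-time trade-off can be sketched in a few lines of Python. This is a toy illustration, not Kylin's implementation: the rows, the dimension names, and the `build_cube` helper are all assumptions. It pre-aggregates SUM(price) for every subset of N dimensions, giving 2^N cuboids, so that a query becomes a dictionary lookup instead of a scan.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (country, category, year, price)
DIMENSIONS = ("country", "category", "year")
ROWS = [
    ("US", "Books", 2014, 10.0),
    ("US", "Toys", 2014, 5.0),
    ("DE", "Books", 2014, 7.0),
    ("DE", "Books", 2015, 3.0),
]

def build_cube(rows, dimensions):
    """Pre-aggregate SUM(price) for every subset of dimensions (2^N cuboids)."""
    cube = {}
    indexes = range(len(dimensions))
    for size in range(len(dimensions) + 1):
        for combo in combinations(indexes, size):
            cuboid = defaultdict(float)
            for row in rows:
                key = tuple(row[i] for i in combo)
                cuboid[key] += row[-1]  # last column is the measure
            cube[tuple(dimensions[i] for i in combo)] = dict(cuboid)
    return cube

cube = build_cube(ROWS, DIMENSIONS)
cuboid_count = len(cube)                            # one cuboid per dimension subset
stored_cells = sum(len(c) for c in cube.values())   # space paid up front
us_total = cube[("country",)][("US",)]              # query time: a lookup, no scan
```

With three dimensions the cube holds 8 cuboids, and the 4 raw rows expand to 20 stored cells. That extra storage is what buys the constant-time lookup.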
How Does Kylin Utilize Hadoop Components
1.Hive
    Input source
    Pre-join star schema during cube building
2.MapReduce
    Pre-aggregate metrics during cube building
3.HDFS
    Store intermediate files during cube building
4.HBase
    Store the data cube
    Serve queries on the data cube
    Coprocessors are used for query processing
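
The division of labor above can be mimicked in plain Python. This is a hedged sketch only: it calls no Hive, MapReduce, or HBase API, and the data, the `aggregate` and `scan` helpers, and the "cuboid|dimension values" key layout are simplifications invented for illustration. Coprocessor-side processing is not modeled.

```python
from bisect import bisect_left

# Stage 1 (Hive's role): pre-join the fact table with its dimension table.
DIM_SITE = {1: "US", 2: "DE"}  # site_id -> country
FACT = [(1, "Books", 10.0), (1, "Toys", 5.0), (2, "Books", 7.0), (2, "Books", 3.0)]
flat = [(DIM_SITE[site_id], category, price) for site_id, category, price in FACT]

# Stage 2 (MapReduce's role): map each row to a key, then reduce by summing.
def aggregate(rows, key_fn):
    totals = {}
    for row in rows:
        key = key_fn(row)
        totals[key] = totals.get(key, 0.0) + row[-1]
    return totals

# Stage 3 (HBase's role): store cells under sorted row keys so that each
# cuboid occupies a contiguous key range. The key layout is an assumption.
cells = {}
for (country, category), total in aggregate(flat, lambda r: (r[0], r[1])).items():
    cells[f"country,category|{country}|{category}"] = total
for (country,), total in aggregate(flat, lambda r: (r[0],)).items():
    cells[f"country|{country}"] = total
sorted_keys = sorted(cells)

def scan(prefix):
    """Return all (key, value) pairs whose row key starts with prefix."""
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        result.append((key, cells[key]))
    return result

# Serving a query is a short range scan over pre-aggregated cells.
us_by_category = scan("country,category|US|")
us_total = scan("country|US")
```

The join and the aggregation happen once, at build time. A query touches only the few pre-computed cells in its key range.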
Cube Designer
Job Management
Query and Visualization
Tableau Integration