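Original URL: https://my.oschina.net/guol/blog/891156
Introduction
Presto is an open-source distributed SQL query engine for interactive analytic queries over data volumes ranging from gigabytes to petabytes. It was designed and written from the ground up to solve the interactive-analytics and processing-speed problems of commercial data warehouses at the scale of Facebook's. Presto queries data where it lives, including Hive, Cassandra, relational databases such as MySQL, and proprietary data stores. Systems such as Redis, MongoDB, and Kafka can also be queried with SQL. A single Presto query can combine data from multiple sources, allowing analysis across an entire organization.
I first came across Presto at version 0.150; it has since reached 0.174, so development is clearly active and the community is in good shape.
Dependencies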
Mac OS X or Linux
Java 8 Update 92 or higher (8u92+), 64-bit
Maven 3.3.9+ (for building)
Python 2.4+ (for running with the launcher script)
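Architecture
coordinator
The coordinator controls the worker nodes and is usually called the scheduling node.
worker
Workers are the nodes that do the actual work: queries are executed on the worker nodes.
Deployment
Download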
cd /opt/programs
wget 'https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.174/presto-server-0.174.tar.gz'
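Coordinator deployment
#Enter the download directory
cd /opt/programs
#Extract (the target directory must exist; --strip-components=1 drops the archive's top-level directory)
mkdir -p presto_174
tar xzvf presto-server-0.174.tar.gz -C presto_174 --strip-components=1
#Enter the installation directory
cd presto_174
#Create the configuration directory
mkdir etc
#Base configuration files under etc; each one must be created by hand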
##jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
##log.properties
com.facebook.presto=INFO
##node.properties
node.environment=production
node.id=109
node.data-dir=/tmp/presto/data
##config.properties
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=9999
query.max-memory=20GB
query.max-memory-per-node=4GB
discovery-server.enabled=true
discovery.uri=http://192.168.1.109:9999
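Worker deployment
#Enter the download directory
cd /opt/programs
#Extract (the target directory must exist; --strip-components=1 drops the archive's top-level directory)
mkdir -p presto_174
tar xzvf presto-server-0.174.tar.gz -C presto_174 --strip-components=1
#Enter the installation directory
cd presto_174
#Create the configuration directory
mkdir etc
#Base configuration files under etc; each one must be created by hand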
##jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
##log.properties
com.facebook.presto=INFO
##node.properties
node.environment=production
node.id=135
node.data-dir=/tmp/presto/data
##config.properties
coordinator=false
http-server.http.port=9999
query.max-memory=20GB
query.max-memory-per-node=4GB
discovery.uri=http://192.168.1.109:9999
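The coordinator and worker listings differ in only a few keys: node.id, the coordinator flag, and the scheduler and discovery-server settings that appear only on the coordinator. As a rough sketch, the two role-specific files could be generated as follows. The function name write_presto_etc and its arguments are hypothetical (not part of Presto); the values are copied from the listings above.

```shell
#!/bin/bash
# Hypothetical helper: writes node.properties and config.properties for one node.
# usage: write_presto_etc <etc_dir> <coordinator|worker> <node_id> <discovery_uri>
write_presto_etc(){
    local etc_dir="$1" role="$2" node_id="$3" discovery_uri="$4"
    mkdir -p "$etc_dir" || return 1

    # node.id must be unique per node; the other two values are shared.
    cat > "$etc_dir/node.properties" <<EOF
node.environment=production
node.id=$node_id
node.data-dir=/tmp/presto/data
EOF

    # Settings common to both roles.
    cat > "$etc_dir/config.properties" <<EOF
http-server.http.port=9999
query.max-memory=20GB
query.max-memory-per-node=4GB
discovery.uri=$discovery_uri
EOF

    # Role-specific settings.
    if [ "$role" = "coordinator" ]; then
        cat >> "$etc_dir/config.properties" <<EOF
coordinator=true
node-scheduler.include-coordinator=false
discovery-server.enabled=true
EOF
    else
        echo "coordinator=false" >> "$etc_dir/config.properties"
    fi
}
```

For example, write_presto_etc /opt/programs/presto_174/etc worker 135 http://192.168.1.109:9999 would reproduce the worker files. jvm.config and log.properties are identical on every node and still have to be created separately.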
Presto CLI deployment
#Enter bin
cd /opt/programs/presto_174/bin
#Download
wget 'https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.174/presto-cli-0.174-executable.jar'
mv presto-cli-0.174-executable.jar presto-cli
chmod 755 presto-cli
Presto UI
http://192.168.1.109:9999
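Besides the web UI, cluster membership can also be checked from the command line. The sketch below assumes the CLI is installed under the bin directory used in this post and queries the built-in system.runtime.nodes table. The function name list_presto_nodes and the PRESTO_CLI and PRESTO_SERVER variables are illustrative only.

```shell
#!/bin/bash
# Illustrative defaults taken from the paths and addresses used in this post.
PRESTO_CLI="${PRESTO_CLI:-/opt/programs/presto_174/bin/presto-cli}"
PRESTO_SERVER="${PRESTO_SERVER:-192.168.1.109:9999}"

# Print one CSV row per node known to the coordinator.
list_presto_nodes(){
    "$PRESTO_CLI" --server "$PRESTO_SERVER" --output-format CSV \
        --execute "SELECT * FROM system.runtime.nodes"
}
```

Calling list_presto_nodes after both nodes are started is a quick way to see whether the worker has registered with the coordinator's discovery service.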
Presto init.d
#!/bin/bash
#
#Author: dalu, Date: 2016/12/8
#
###################################
# chkconfig read settings
# chkconfig: - 99 50
# description: presto start script
# processname: presto server
###################################
export PATH=/opt/programs/jdk1.8.0_111/bin/:$PATH
#java home
JAVA_HOME="/opt/programs/jdk1.8.0_111"
#app home
APP_HOME='/opt/programs/presto_174/'
#app name
APP_NAME='PrestoServer'
#app log dir
APP_LOG="$APP_HOME/var/logs/"
#java main function
APP_MAINCLASS="PrestoServer"
#start cmd
CMD="/opt/programs/presto_174/bin/launcher"
###################################
#check PrestoServer app is running
###################################
psid=0
checkpid(){
    javaps=`$JAVA_HOME/bin/jps -l | grep $APP_MAINCLASS`
    if [ -n "$javaps" ];then
        psid=`echo $javaps | awk '{print $1}'`
    else
        psid=0
    fi
}
###################################
# start PrestoServer app
##################################
start(){
    checkpid
    if [ $psid -ne 0 ];then
        echo "================================"
        echo "warn: $APP_NAME already started! (pid=$psid)"
        echo "================================"
    else
        echo -n "Starting $APP_NAME ..."
        $CMD --pid-file=$APP_LOG/launcher.pid --launcher-log-file=$APP_LOG/launcher.log --server-log-file=$APP_LOG/server.log start 2>&1
        checkpid
        if [ $psid -ne 0 ];then
            echo "(pid=$psid) [OK]"
        else
            echo "[Failed]"
        fi
    fi
}
##################################
# stop PrestoServer app
##################################
stop(){
    checkpid
    if [ $psid -ne 0 ];then
        echo -n "Stopping $APP_NAME ...(pid=$psid) "
        presto_pid=`$JAVA_HOME/bin/jps -l | grep $APP_MAINCLASS | awk '{print $1}'`
        kill $presto_pid
        if [ $? -eq 0 ];then
            echo "[OK]"
        else
            kill -9 $presto_pid
            sleep 5
            checkpid
            if [ $psid -ne 0 ];then
                echo -n "[Failed]"
            fi
        fi
    else
        echo "================================"
        echo "warn: $APP_NAME is not running"
        echo "================================"
    fi
}
case "$1" in
    'start')
        start
        ;;
    'stop')
        stop
        ;;
    'restart')
        stop
        start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
esac
exit 0
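The script above only handles start, stop and restart. A status action could be sketched by delegating to the launcher's own status command, as below. This is an assumption about how one might extend the script, not part of the original: the status function is new, and a matching 'status') entry would have to be added to the case statement before it can be called as an init.d action.

```shell
#!/bin/bash
# Defaults mirror the values used by the init script above.
CMD="${CMD:-/opt/programs/presto_174/bin/launcher}"
APP_LOG="${APP_LOG:-/opt/programs/presto_174/var/logs}"

# Hypothetical addition: report whether the server is running.
status(){
    # Point the launcher at the same pid file that start() writes,
    # and pass its exit status back to the caller.
    "$CMD" --pid-file="$APP_LOG/launcher.pid" status
}
```

Reusing the pid file keeps the answer consistent with what start() recorded, instead of relying on a second jps lookup.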
Redis access test
Configuration
#Create the catalog directory
mkdir -p /opt/programs/presto_174/etc/catalog
#Enter the catalog directory
cd /opt/programs/presto_174/etc/catalog
#Create redis.properties
connector.name=redis
redis.table-names=antnest
redis.nodes=192.168.1.109:6379
redis.password=dalu
redis.default-schema=redis
redis.database-index=0
redis.table-description-dir=etc/redis
redis.hide-internal-columns=false
Create the Redis mapping
#Create the Redis mapping directory
mkdir -p /opt/programs/presto_174/etc/redis
#Enter the mapping directory
cd /opt/programs/presto_174/etc/redis
#Create the mapping file redis.json
{
    "tableName": "antnest",
    "schemaName": "redis",
    "key": {
        "dataFormat": "raw",
        "fields": [
            {
                "name": "key",
                "type": "VARCHAR"
            }
        ]
    },
    "value": {
        "dataFormat": "raw",
        "fields": [
            {
                "name": "value",
                "type": "VARCHAR"
            }
        ]
    }
}
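Note: all of the configuration files above must be copied to both the coordinator and the workers; after that, restart the Presto process on the coordinator and on every worker.
Test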
cd /opt/programs/presto_174/bin
./presto-cli --server 192.168.1.109:9999 --catalog redis
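The command above opens an interactive session. For a repeatable check, the same statements can be run non-interactively with --execute. The following is a sketch only: it assumes redis-cli is available on the machine running it, seeds one throwaway key (the key name presto:test and its value are made up), and uses hypothetical names for the function and the two variables.

```shell
#!/bin/bash
PRESTO_CLI="${PRESTO_CLI:-/opt/programs/presto_174/bin/presto-cli}"
REDIS_CLI="${REDIS_CLI:-redis-cli}"

redis_smoke_test(){
    # Seed a throwaway string key (hypothetical name) in database 0.
    "$REDIS_CLI" -h 192.168.1.109 -p 6379 -a dalu -n 0 SET presto:test hello || return 1

    # The table name comes from redis.table-names and redis.json.
    "$PRESTO_CLI" --server 192.168.1.109:9999 --catalog redis --schema redis \
        --execute "SHOW TABLES" || return 1
    "$PRESTO_CLI" --server 192.168.1.109:9999 --catalog redis --schema redis \
        --execute "SELECT * FROM antnest LIMIT 10"
}
```

Because redis.hide-internal-columns is set to false in redis.properties, the SELECT output also includes the connector's internal columns alongside the mapped key and value fields.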
MongoDB access test
Configuration: several catalog files can be created, e.g. mongoUser.properties, xxxx.properties
#Create the MongoDB configuration mongodb.properties in the catalog directory
connector.name=mongodb
mongodb.seeds=192.168.1.109:12001
mongodb.credentials=admin:dalu@admin
mongodb.socket-keep-alive=true
mongodb.schema-collection=admin
Test
Kafka access test
Configuration
#Create the Kafka configuration in the catalog directory
connector.name=kafka
kafka.table-names=json_data
kafka.nodes=192.168.1.109:9092
kafka.hide-internal-columns=false
kafka.table-description-dir=etc/kafka
kafka.default-schema=kafka
Configure the mapping
#Create the Kafka mapping directory
mkdir -p /opt/programs/presto_174/etc/kafka
#Create the Kafka mapping file json_data.json
{
    "tableName": "json_data",
    "schemaName": "kafka",
    "topicName": "json_data",
    "key": {
        "dataFormat": "raw",
        "fields": [
            {
                "name": "kafka_key",
                "type": "BIGINT",
                "dataFormat": "LONG",
                "hidden": "false"
            }
        ]
    },
    "message": {
        "dataFormat": "json",
        "fields": [
            {
                "name": "name",
                "mapping": "name",
                "type": "VARCHAR"
            },
            {
                "name": "phone",
                "mapping": "phone",
                "type": "VARCHAR"
            }
        ]
    }
}
Test
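To exercise the mapping, the json_data topic needs messages whose JSON fields match the name and phone entries in json_data.json. The helper below (make_message is a made-up name) prints such a message; the sample values are invented. Publishing with Kafka's console producer and querying through the CLI are shown as comments only, because the tool locations depend on the local installation.

```shell
#!/bin/bash
# Hypothetical helper: print one JSON message with the two mapped fields.
# Values are assumed to contain no quotes or backslashes (no escaping is done).
make_message(){
    printf '{"name":"%s","phone":"%s"}\n' "$1" "$2"
}

# Publishing and querying (assumed tool names and locations):
#   make_message alice 13800000000 | kafka-console-producer.sh \
#       --broker-list 192.168.1.109:9092 --topic json_data
#   ./presto-cli --server 192.168.1.109:9999 --catalog kafka --schema kafka \
#       --execute "SELECT name, phone FROM json_data LIMIT 10"
```

Messages published this way carry no key, so the kafka_key column defined in the mapping would be expected to come back empty for them.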
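Third-party web UIs
airpal
airpal is not recommended: it has not been updated for a long time and its version is still stuck at 0.1. Its advantages are built-in authentication (based on Apache Shiro) and a decent UI.
yanagishima
In the 1.x days yanagishima was still fairly rough, but the author is diligent and keeps updating it. It has now reached version 3.0, and its interface looks better than airpal's.
Download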
cd /opt/programs
wget 'https://bintray.com/artifact/download/wyukawa/generic/yanagishima-3.0.zip'
unzip yanagishima-3.0.zip
Configuration
cd /opt/programs/yanagishima-3.0/conf
#Configuration file yanagishima.properties
jetty.port=8088
presto.query.max-run-time-seconds=1800
presto.max-result-file-byte-size=1073741824
presto.datasources=presto1
presto.coordinator.server.presto1=http://192.168.1.109:9999
catalog.presto1=mongodb
schema.presto1=webspider
select.limit=500
Access
Home page
Selection
Results