  Linux System Administration: Hadoop, Hive, and Flume Data Processing

    Environment

    Hostname        IP              Notes
    Hadoop-Data01   192.168.0.194   Hadoop-Master / Hive / MySQL / Flume-Agent
    Hadoop-Data02   192.168.0.195   Hadoop-Slave
    Software versions:
    CentOS release 6.6 (Final)
    jdk-8u131-linux-x64
    Hadoop-2.7.3
    Hive-2.1.1
    Apache-flume-1.7.0-bin
    Download JDK, Hadoop, Hive, and Flume:
    [root@Hadoop-Data01 soft]# wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz
    [root@Hadoop-Data01 soft]# wget http://apache.fayea.com/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
    [root@Hadoop-Data01 soft]# wget http://apache.fayea.com/hive/hive-2.1.1/apache-hive-2.1.1-bin.tar.gz

    Hadoop Deployment

    Set the hostnames and edit /etc/hosts so that every host resolves the others correctly:
    [root@Hadoop-Data01 ~]# cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.0.194   Hadoop-Data01
    192.168.0.195   Hadoop-Data02
    192.168.0.196   Hadoop-Data03
    Note: the slave servers use the same /etc/hosts content.

      Configure passwordless (key-based) SSH login between the Hadoop-Master and Hadoop-Slave hosts:

    [root@Hadoop-Data01 ~]# vim /etc/ssh/sshd_config
    RSAAuthentication yes
    PubkeyAuthentication yes
    Note: the two lines can also be uncommented with sed: sed -i '47,48s/^#//g' /etc/ssh/sshd_config
    [root@Hadoop-Data01 ~]# ssh-keygen -t rsa
    [root@Hadoop-Data01 .ssh]# cat authorized_keys 
    ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA2JGjCEwc+H3/5Y939DHSkhHYAO7qPjO86gyaqvlN2j1ZMUhdKhXUmTH0pBBwXIqp9jooTXxtIu55cuBvOeBD6eUKN5mH9rydRIXm8HEvb9nQzOvVghP1E9lBTGsGXkUWDo0KPkFYOhb2NguYibzVUgpUpAt0NY5iqdenXNqvDOWGhWqDsg/C6VnUzsxskiT9x2EROhddWQnYsObXxjOasgdGPngzZsJZPchRboS+HfvVF0uSyUjljtKsQqYOX2Nt0plO4t6VlcnZXvjDXKezJCNwGToFvvoiIHnjVu/akgtv/bpd8HZp1dZEj7cYnSFkqN5xdodg7TmtjAjobutU5Q== root@Hadoop-Data01
    ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAvQ3JZOtdfFvrsM/m6YwQQuGkOCpNt0+tw87tS4p1gB98ZAn+zaUnFMw5Gvo0i1KvHVaxmb0s1gqDjGDNVLQM5MB60emyVFHLs6DZBI5f4c0BiA17KfDRzlsfuTmuLdymmoj54OhPbEcH+mwo/N1UK9V0gqxAB9abC6UFT00MXXXJN1+qBkV9mUuFbXhn4m5/DCoEbIxvMlWghAsSrDtMaMtJYRumRvd7MLwwefdCYyQd8dZASE1Z8VP0K/BDRntWXCeKGCVMb4uJAnSdhN6ZcRme/Qlx0YCkPpQir3jgcblVW5RODNUyaIc+vUMp9UYagvK7nKKfWAGa/MPdyfu2nw== root@Hadoop-Data02
    ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA1pC5Py1aqbojVetakak3WmxJf4DgmTe1ci60tn9Hyq84kdAhw7z1lAQN544uPDDvl4XPki36Y13Hjl0P+S3g11iOi42FRugkBDmokqADZrUfp5tqWX8K9QvYMePoyiuQlnrGAyCpOiMmEAykBR6lVkNHgPAWThjU9eggt6dalMPiy/dDKZNemlWGHy8wdS5PyjVsIuDGgTtNLADn6OOaYcO/UWq78gqc1Nkq4mNxKSTYorh7taki9SKw4cq0NeggDFz7cZEewtgJdRla0W2ZKz8bgfuUSSntbN55/uCVUSgK+kurqRmklQ3sA3c9687BH1Lse5luDFJRaYo2wa5nlQ== root@Hadoop-Data03
    Note: merge the /root/.ssh/id_rsa.pub files from all three servers into authorized_keys.
    [root@Hadoop-Data01 .ssh]# scp authorized_keys root@192.168.0.195:/root/.ssh/
    [root@Hadoop-Data01 .ssh]# scp authorized_keys root@192.168.0.196:/root/.ssh/
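    Instead of merging authorized_keys by hand and copying it around, ssh-copy-id (from the openssh-clients package on CentOS 6) gives the same result. A minimal sketch, assuming password logins are still allowed at this point:
    # Run on each of the three hosts: generate a key (skip if one already exists) and push it to every node.
    ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
    for ip in 192.168.0.194 192.168.0.195 192.168.0.196; do
        ssh-copy-id -i /root/.ssh/id_rsa.pub root@$ip
    done
    # Verify passwordless login from the master:
    ssh root@192.168.0.195 hostname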

      Install the JDK on every host

    [root@Hadoop-Data01 soft]# tar -xf jdk-8u131-linux-x64.tar.gz
    [root@Hadoop-Data01 soft]# cp -r jdk1.8.0_131 /usr/local/
    [root@Hadoop-Data01 soft]# cd /usr/local/
    [root@Hadoop-Data01 local]# ln -s jdk1.8.0_131 jdk
    [root@Hadoop-Data01 ~]# vim /etc/profile
    >>>>>
    ulimit -n 10240
    export JAVA_HOME=/usr/local/jdk
    export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
    export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
    [root@Hadoop-Data01 ~]# source /etc/profile
    [root@Hadoop-Data03 ~]# java -version
    java version "1.8.0_131"
    Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
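    Since the JDK has to be present on every node, a quick check over SSH saves trouble later; a small sketch using the hostnames from /etc/hosts and the passwordless SSH set up above:
    # Print the Java version on each node.
    for h in Hadoop-Data01 Hadoop-Data02 Hadoop-Data03; do
        echo "== $h =="
        ssh "$h" 'source /etc/profile; java -version' 2>&1 | head -3
    done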

     Install Hadoop

    The /usr/local/hadoop/etc/hadoop/core-site.xml configuration file:
    [root@Hadoop-Data01 soft]# tar -xf hadoop-2.7.3.tar.gz
    [root@Hadoop-Data01 soft]# mv hadoop-2.7.3 /usr/local/
    [root@Hadoop-Data01 soft]# cd /usr/local/
    [root@Hadoop-Data01 local]# ln -s hadoop-2.7.3 hadoop
    [root@Hadoop-Data01 hadoop]# vim core-site.xml
    >>>>>
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://192.168.0.194:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:/usr/local/hadoop/tmp</value>
        </property>
        <property>
            <name>io.file.buffer.size</name>
            <value>1024</value>
        </property>
    </configuration>
    Notes:
    <fs.defaultFS>: the name of the default filesystem, given as a URI. The URI scheme selects the filesystem implementation class (fs.SCHEME.impl), and the authority part carries the host, port, and so on; the default is the local filesystem. With HA you would put the service name here. Here it is hdfs://192.168.0.194:9000, and HDFS clients need this parameter to reach HDFS.
    <hadoop.tmp.dir>: Hadoop's temporary directory, a local path; many other directories default to locations under it. Only one value may be set, and it should point somewhere with enough space rather than the default /tmp. It is a server-side parameter; changing it requires a restart.
    <io.file.buffer.size>: the buffer size used when reading and writing files. It should be a multiple of the memory page size; around 1 MB is commonly recommended.
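    Once the cluster is running (see the start-up steps further down), the effective value of fs.defaultFS and basic HDFS access can be checked from any node; a small sketch:
    # Print the default filesystem the client resolves, then list the HDFS root.
    hdfs getconf -confKey fs.defaultFS
    hdfs dfs -ls /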
    
    ----------
    
    The /usr/local/hadoop/etc/hadoop/hdfs-site.xml configuration file:
    [root@Hadoop-Data01 hadoop]# vim hdfs-site.xml
    >>>>>
    <configuration>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/usr/local/hadoop/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/usr/local/hadoop/dfs/data</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>192.168.0.194:9001</value>
        </property>
        <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
        </property>
    </configuration>
    Notes:
    <dfs.namenode.name.dir>: local directory where the NameNode stores the fsimage. It can be a comma-separated list, and the fsimage is written to every listed directory for redundancy; ideally the directories sit on different disks, and if one disk fails it is simply skipped without taking the system down. With HA a single directory is usually enough; set two if you want extra safety.
    <dfs.datanode.data.dir>: local directories where the DataNode stores HDFS blocks. It can be a comma-separated list (typically one directory per disk); the directories are used in rotation, one block to the first directory, the next block to the next, and so on. Each block is stored only once per machine. Directories that do not exist are ignored, so they must be created beforehand or they are treated as missing.
    <dfs.replication>: the block replication factor. It can be set when a file is created and changed later from the client or the command line; different files may have different replication counts, and this value is the default used when none is specified.
    <dfs.namenode.secondary.http-address>: the HTTP address of the SecondaryNameNode. If the port is 0, the server picks a free port. Once HA is in use, the SecondaryNameNode is no longer needed.
    <dfs.webhdfs.enabled>: enable WebHDFS (the REST API) on the NameNode and DataNodes.
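    The tmp, name, and data directories referenced above are local paths, and as noted, directories that do not exist are skipped, so creating them up front avoids surprises at format/start-up time. A sketch assuming the paths configured above (the scp -r of the Hadoop directory further down carries the empty directories to the slave as well):
    # Create the local directories referenced by core-site.xml and hdfs-site.xml.
    mkdir -p /usr/local/hadoop/tmp /usr/local/hadoop/dfs/name /usr/local/hadoop/dfs/data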
    
    ----------
    
    The /usr/local/hadoop/etc/hadoop/mapred-site.xml configuration file:
    [root@Hadoop-Data01 hadoop]# cp mapred-site.xml.template mapred-site.xml
    [root@Hadoop-Data01 hadoop]# vim mapred-site.xml
    >>>>>
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>192.168.0.194:10020</value>
        </property>
        <property>
           <name>mapreduce.jobhistory.webapp.address</name>
            <value>192.168.0.194:19888</value>
        </property>
    </configuration>
    
    The /usr/local/hadoop/etc/hadoop/yarn-site.xml configuration file:
    [root@Hadoop-Data01 hadoop]# vim yarn-site.xml
    >>>>>
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>192.168.0.194:8032</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>192.168.0.194:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>192.168.0.194:8031</value>
        </property>
        <property>
            <name>yarn.resourcemanager.admin.address</name>
            <value>192.168.0.194:8033</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>192.168.0.194:8088</value>
        </property>
        <property>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>8192</value>
        </property>
    </configuration>
    Notes:
    <mapreduce.framework.name>: depending on job size and configuration, MapReduce offers two execution modes. ① Local mode (LocalJobRunner): with mapreduce.framework.name set to local, jobs do not request resources from the YARN cluster and run on the local node, so they cannot exploit the cluster and do not appear in the web UI. ② YARN mode (YARNRunner): with mapreduce.framework.name set to yarn, the client talks to the server through YARNRunner, which in turn uses ClientRMProtocol to interact with the ResourceManager for submitting applications, querying status, and so on.
    <mapreduce.jobhistory.address> and <mapreduce.jobhistory.webapp.address>: Hadoop ships with a history server through which you can inspect completed MapReduce jobs: how many map and reduce tasks were used, submission time, start time, completion time, and so on.
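    Note that start-all.sh does not start the history server; for the jobhistory addresses configured above to answer, it has to be launched separately. A sketch using the script shipped in Hadoop 2.7.3's sbin directory, to be run after the cluster is up:
    # Start the MapReduce JobHistory server and confirm its web port is listening.
    /usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
    netstat -lntp | grep 19888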
    
    ----------
    
    Configure the Hadoop environment variables:
    [root@Hadoop-Data01 hadoop]# vim /etc/profile
    >>>>>
    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$HADOOP_HOME/bin:$PATH
    
    [root@Hadoop-Data01 hadoop]# vim hadoop-env.sh
    >>>>>
    export JAVA_HOME=/usr/local/jdk
    
    Add the slave node IP to the slaves file:
    [root@Hadoop-Data01 hadoop]# echo 192.168.0.195 > slaves
    
    Copy the Hadoop installation directory to the slave host:
    [root@Hadoop-Data01 local]# scp -r hadoop-2.7.3 root@192.168.0.195:/usr/local/
    
    Go to the Hadoop directory and start the services on the Hadoop-Master host:
    ① Format the NameNode:
    [root@Hadoop-Data01 bin]# sh /usr/local/hadoop/bin/hdfs namenode -format
    ② Start the services:
    [root@Hadoop-Data01 sbin]# sh /usr/local/hadoop/sbin/start-all.sh
    ③ Stop the services:
    [root@Hadoop-Data01 sbin]# sh /usr/local/hadoop/sbin/stop-all.sh
    ④ Check the running components:
    [root@Hadoop-Data01 sbin]# jps
    6517 SecondaryNameNode
    6326 NameNode
    6682 ResourceManager
    6958 Jps
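    The slave should be running a DataNode and a NodeManager, and dfsadmin gives a cluster-wide view; a quick check from the master (IP and paths as configured above):
    # Processes on the slave (expect DataNode and NodeManager):
    ssh root@192.168.0.195 /usr/local/jdk/bin/jps
    # Live DataNodes and capacity as seen by the NameNode:
    hdfs dfsadmin -report | head -20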

     Verify web access

    Open http://192.168.0.194:8088/ in a browser (YARN ResourceManager web UI)
    

    Open http://192.168.0.194:50070/ in a browser (HDFS NameNode web UI)
    

     

    Hive Deployment

     Unpack, deploy, and configure environment variables:

    [root@Hadoop-Data01 soft]# tar -xf  apache-hive-2.1.1-bin.tar.gz
    [root@Hadoop-Data01 soft]# mv apache-hive-2.1.1-bin /usr/local/
    [root@Hadoop-Data01 soft]# cd /usr/local/
    [root@Hadoop-Data01 local]# ln -s apache-hive-2.1.1-bin hive
    [root@Hadoop-Data01 conf]# cp hive-env.sh.template hive-env.sh
    [root@Hadoop-Data01 conf]# vim hive-env.sh
    >>>>>
    HADOOP_HOME=/usr/local/hadoop
    export HIVE_CONF_DIR=/usr/local/hive/conf
    export HIVE_AUX_JARS_PATH=/usr/local/hive/lib
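    The schematool and hive commands are invoked later without absolute paths, and the PATH shown in the schematool output below includes /usr/local/hive/bin, so Hive's variables are evidently in /etc/profile as well. A sketch of the assumed entries (not shown in the original):
    # Append Hive variables to /etc/profile and reload it.
    echo 'export HIVE_HOME=/usr/local/hive' >> /etc/profile
    echo 'export PATH=$HIVE_HOME/bin:$HIVE_HOME/conf:$PATH' >> /etc/profile
    source /etc/profile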

     Install and configure MySQL

    [root@Hadoop-Data01 conf]# yum install httpd php mysql mysql-server php-mysql -y
    [root@Hadoop-Data01 conf]# /usr/bin/mysqladmin -u root password 'hadoopmysql'
    [root@Hadoop-Data01 conf]# /usr/bin/mysqladmin -u root -h192.168.0.194 password 'hadoopmysql'
    [root@Hadoop-Data01 conf]# mysql -uroot -phadoopmysql
    mysql> create user 'hive' identified by 'hive';
    mysql> grant all privileges on *.* to 'hive'@'localhost' identified by 'hive';
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> grant all privileges on *.* to 'hive'@'%' identified by 'hiveycfw';
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> flush privileges;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> create database hive;
    Query OK, 1 row affected (0.00 sec)
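    On CentOS 6 the mysqld service must be running before mysqladmin/mysql can connect, and it is worth confirming that the hive account created above can actually reach the new database. A short sketch:
    # Make sure MySQL is running and comes up on boot (CentOS 6 service names).
    service mysqld start
    chkconfig mysqld on
    # Verify the hive account (password as granted above) can see the hive database.
    mysql -uhive -phive -e 'show databases;' | grep hive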

     Edit the Hive configuration file:

    [root@Hadoop-Data01 conf]# vim hive-site.xml
    Line 44: >>>>>
    <name>hive.exec.local.scratchdir</name>
    <value>/usr/local/hive/iotmp</value>
    Bulk replace in vim (use # as the delimiter, since the replacement contains slashes): :%s#${system:java.io.tmpdir}#/usr/local/hive/iotmp#g
    Line 486: >>>>>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    Line 501: >>>>>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    Line 686: >>>>>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    Line 933: >>>>>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    Line 957: >>>>>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    
    Copy the MySQL JDBC driver into Hive's lib directory:
    [root@Hadoop-Data01 mysql-connector-java-5.1.42]# cp mysql-connector-java-5.1.42-bin.jar  /usr/local/hive/lib/
    
    A minimal hive-site.xml:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <!-- Database connection URL -->
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
        </property>
        <property>
            <!-- JDBC driver class -->
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
        </property>
        <property>
            <!-- Database user -->
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>hive</value>
        </property>
        <property>
            <!-- Database password -->
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>hive</value>
        </property>
        <property>
            <!-- Hive warehouse directory; defaults to /user/hive/warehouse on HDFS -->
            <name>hive.metastore.warehouse.dir</name>
            <value>/user/hive/warehouse</value>
        </property>
        <property>
            <!-- Hive scratch (temporary) directory; defaults to /tmp/hive on HDFS -->
            <name>hive.exec.scratchdir</name>
            <value>/tmp/hive</value>
        </property>
    </configuration>

     Initialize the metastore schema in MySQL

    [root@Hadoop-Data01 bin]# schematool -initSchema -dbType mysql      # after initialization, the metastore tables are created in the MySQL hive database
    which: no hbase in (/usr/local/hive/bin:/usr/local/hive/conf:/usr/local/hadoop/bin:/usr/local/jdk//bin:/usr/local/jdk//jre/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Metastore connection URL:    jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true
    Metastore Connection Driver :    com.mysql.jdbc.Driver
    Metastore connection User:   hive
    Starting metastore schema initialization to 2.1.0
    Initialization script hive-schema-2.1.0.mysql.sql
    Initialization script completed
    schemaTool completed
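    If initialization succeeded, the metastore tables now exist in the MySQL hive database; a quick check:
    # List some of the metastore tables created by schematool (expect names such as DBS, TBLS, SDS, PARTITIONS).
    mysql -uhive -phive hive -e 'show tables;' | head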

     Start Hive

    [root@Hadoop-Data01 bin]# ./hive
    Logging initialized using configuration in jar:file:/usr/local/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
    hive>
    hive> show functions;   # list Hive's built-in functions
    hive> desc function day;    # show details of the day function
    OK
    day(param) - Returns the day of the month of date/timestamp, or day component of interval
    Time taken: 0.039 seconds, Fetched: 1 row(s)

    Flume Deployment

    I. Overview

    1. Flume is a distributed log collection system that ships the data it gathers to a destination.
    2. The core concept in Flume is the agent: a Java process that runs on the log collection node.
    3. An agent contains three core components: source, channel, and sink. The source collects the logs and can handle many types and formats, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, legacy, and custom sources. Data collected by the source is staged in a channel. The channel is the agent's temporary store and can be backed by memory, jdbc, file, or a custom implementation; data is removed from the channel only after the sink has delivered it successfully. The sink sends data on to its destination, which can be hdfs, logger, avro, thrift, ipc, file, null, hbase, solr, or a custom sink. (A minimal agent illustrating this flow is sketched right after this list.)
    4. What flows through the pipeline is the event; transactional guarantees are made at the event level.
    5. Flume agents can be chained across multiple tiers and support fan-in and fan-out.
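    Before wiring up the real Windows-to-Linux topology below, the source → channel → sink flow can be tried in isolation with a throwaway netcat-to-logger agent. This is purely an illustrative sketch (file name and port are arbitrary) and assumes the Flume install from the next section:
    # Write a throwaway netcat -> memory channel -> logger agent config.
    printf '%s\n' \
        'a1.sources = r1' \
        'a1.channels = c1' \
        'a1.sinks = k1' \
        'a1.sources.r1.type = netcat' \
        'a1.sources.r1.bind = 127.0.0.1' \
        'a1.sources.r1.port = 44444' \
        'a1.sources.r1.channels = c1' \
        'a1.channels.c1.type = memory' \
        'a1.sinks.k1.type = logger' \
        'a1.sinks.k1.channel = c1' > /tmp/netcat-logger.conf
    # Run it in the foreground:
    /usr/local/flume/bin/flume-ng agent -c /usr/local/flume/conf -f /tmp/netcat-logger.conf \
        -n a1 -Dflume.root.logger=INFO,console
    # In a second terminal, send a test event:  echo hello | nc 127.0.0.1 44444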

    II. Installation

     Unpack the Flume archive and copy it to /usr/local/ (installed on the Hadoop server):

    [root@Hadoop-Data01 soft]# cp -r apache-flume-1.7.0-bin /usr/local/
    [root@Hadoop-Data01 soft]# cd /usr/local/
    [root@Hadoop-Data01 local]# ln -s apache-flume-1.7.0-bin flume
    [root@Hadoop-Data01 conf]# cp flume-env.sh.template flume-env.sh
    [root@Hadoop-Data01 conf]# vim flume-env.sh
    >>>>>
    export JAVA_HOME=/usr/local/jdk
    [root@Hadoop-Data01 bin]# ./flume-ng version
    Flume 1.7.0
    Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
    Revision: 511d868555dd4d16e6ce4fedc72c2d1454546707
    Compiled by bessbd on Wed Oct 12 20:51:10 CEST 2016
    From source with checksum 0d21b3ffdc55a07e1d08875872c00523

     Download Flume onto the server whose logs are to be collected (a Windows machine in this case), then configure conf/flume-conf.properties:

    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    a1.sources.r1.type = spooldir
    a1.sources.r1.channels = c1
    # Collect files from this directory (forward slashes avoid Java properties backslash-escape issues on Windows)
    a1.sources.r1.spoolDir = D:/flume/log
    a1.sources.r1.fileHeader = true
    a1.sources.r1.basenameHeader = true
    a1.sources.r1.basenameHeaderKey = fileName
    a1.sources.r1.ignorePattern = ^(.)*\.tmp$
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = timestamp
    
    a1.sinks.k1.type = avro
    # Hostname and port of the receiving (Linux) agent
    a1.sinks.k1.hostname = 192.168.0.194
    a1.sinks.k1.port = 19949
    
    # Use a channel which buffers events in memory
    a1.channels.c1.type=memory  
    a1.channels.c1.capacity=10000  
    a1.channels.c1.transactionCapacity=1000  
    a1.channels.c1.keep-alive=30  
    
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

     Start the collection-side service on Windows:

    D:\apache-flume-1.7.0-bin\bin>flume-ng.cmd agent --conf ..\conf --conf-file ..\conf\flume-conf.properties --name a1

     Configure the Linux side; the agent configuration in conf/flume-conf.properties:

    tier1.sources=source1
    tier1.channels=channel1
    tier1.sinks=sink1
    
    tier1.sources.source1.type=avro
    # Address the Flume receiver (this agent) binds to
    tier1.sources.source1.bind=192.168.0.194
    tier1.sources.source1.port=19949
    tier1.sources.source1.channels=channel1
    
    
    tier1.channels.channel1.type=memory
    tier1.channels.channel1.capacity=10000
    tier1.channels.channel1.transactionCapacity=1000
    tier1.channels.channel1.keep-alive=30
    
    tier1.sinks.sink1.channel=channel1
    
    tier1.sources.source1.interceptors=e1 e2
    tier1.sources.source1.interceptors.e1.type=com.huawei.flume.InterceptorsCommons$Builder
    tier1.sources.source1.interceptors.e2.type=com.huawei.flume.InterceptorsFlows$Builder
    
    tier1.sinks.sink1.type = hdfs
    # Write under the Hive warehouse: %{table_name} is set by the interceptors, inputdate= is the partition directory
    tier1.sinks.sink1.hdfs.path=hdfs://192.168.0.194:9000/user/hive/warehouse/%{table_name}/inputdate=%Y-%m-%d
    tier1.sinks.sink1.hdfs.writeFormat = Text
    tier1.sinks.sink1.hdfs.fileType = DataStream
    tier1.sinks.sink1.hdfs.fileSuffix = .log
    tier1.sinks.sink1.hdfs.rollInterval = 0
    tier1.sinks.sink1.hdfs.rollSize = 0
    tier1.sinks.sink1.hdfs.rollCount = 0
    tier1.sinks.sink1.hdfs.useLocalTimeStamp = true
    tier1.sinks.sink1.hdfs.idleTimeout = 60
    tier1.sinks.sink1.hdfs.rollSize = 125829120
    tier1.sinks.sink1.hdfs.minBlockReplicas = 1
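    The sink above writes under the Hive warehouse using a %{table_name} header (set by the custom interceptors) plus an inputdate= partition directory, so Hive only sees the data once a matching partitioned table and partition exist. A hedged sketch with a hypothetical table name app_log and a single string column (the real column layout depends on the log format):
    # Hypothetical table matching /user/hive/warehouse/app_log/inputdate=YYYY-MM-DD/ as written by the sink.
    hive -e "CREATE TABLE IF NOT EXISTS app_log (line STRING) PARTITIONED BY (inputdate STRING) STORED AS TEXTFILE"
    # Register a partition for a given day after Flume has written it (date is an example):
    hive -e "ALTER TABLE app_log ADD IF NOT EXISTS PARTITION (inputdate='2017-05-23')"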

     Start the agent service on the Linux side:

    [root@Hadoop-Data01 conf]# flume-ng agent -c /usr/local/flume/conf/ -f /usr/local/flume/conf/flume-conf.properties -n tier1 -Dflume.root.logger=DEBUG,console
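    Once both agents are running and files are dropped into the Windows spool directory, the events should land under the warehouse path; a quick check (table name hypothetical, as in the sketch above):
    # List today's partition directory written by the HDFS sink.
    hdfs dfs -ls /user/hive/warehouse/app_log/inputdate=$(date +%F)/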
     

    This article is from cnblogs (博客园), by 白日梦想家Zz. Please credit the original link when reposting: https://www.cnblogs.com/zzlain/p/6895346.html
