zoukankan      html  css  js  c++  java
  • Flume学习 & Kafka & Storm 等 & Log4J 配置

    正在学习这篇文章:

    http://blog.csdn.net/ymh198816/article/details/51998085

    和工作中接触的电商、订单、分析,可以结合起来。

    开宗明义,这幅图片:

    Strom是一个非常快的实时计算框架,至于快到什么程度呢?

    官网首页给出的数据是每一个Storm集群上的节点每一秒能处理一百万条数据。
    相比Hadoop的“Mapreduce”计算框架,Storm使用的是"Topology";
    Mapreduce程序在计算完成后最终会停下来,而Topology则是会永远运行下去除非你显式地使用“kill -9 XXX”命令停掉它。

    准备实际写一个实时分析系统。不然纸上得来终觉浅。

    首先需要让Java程序在Linux环境上运行。用leetcode的java程序来做实验,把leetcode工程的output目录拷贝到安装了java的机器(m42n05.gzns)。

    $ pwd
    /home/work/data/code/out/production/leetcode
    
    $ java com.company.Main
    Hello!
    ret:3
    
    注意,只能在这个目录,运行完成的package。如果换到子目录,就不行:
    
    $ pwd
    /home/work/data/code/out/production/leetcode/com
    
    $ java company.Main
    Error: Could not find or load main class company.Main

    用Idea创建了一个Maven项目“LogGenerator”,项目的主要代码如下:

    package com.comany.log.generator;
    
    /**
     * Created by baidu on 16/11/7.
     */
    
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.Random;
    
    // Import log4j classes.
    import org.apache.log4j.LogManager;
    import org.apache.log4j.Logger;
    
    
    public class LogGenerator {
    
        public enum paymentWays {
            Wechat,Alipay,Paypal
        }
        public enum merchantNames {
            优衣库,天猫,淘宝,咕噜大大,快乐宝贝,守望先峰,哈毒妇,Storm,Oracle,Java,CSDN,跑男,路易斯威登,
            暴雪公司,Apple,Sumsam,Nissan,Benz,BMW,Maserati
        }
    
        public enum productNames {
            黑色连衣裙, 灰色连衣裙, 棕色衬衫, 性感牛仔裤, 圆脚牛仔裤,塑身牛仔裤, 朋克卫衣,高腰阔腿休闲裤,人字拖鞋,
            沙滩拖鞋
        }
    
        float[] skuPriceGroup = {299,399,699,899,1000,2000};
        float[] discountGroup = {10,20,50,100};
        float totalPrice = 0;
        float discount = 0;
        float paymentPrice = 0;
    
        private static final Logger logger = LogManager.getLogger(LogGenerator.class);
        private int logsNumber = 10;
    
        public void generate() {
    
            for(int i = 0; i <= logsNumber; i++) {
                logger.info(randomOrderInfo());
            }
        }
    
        public String randomOrderInfo() {
    
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            Date date = new Date();
    
            String orderNumber = randomNumbers(5) + date.getTime();
    
            String orderDate = sdf.format(date);
    
            String paymentNumber = randomPaymentWays() + "-" + randomNumbers(8);
    
            String paymentDate = sdf.format(date);
    
            String merchantName = randomMerchantNames();
    
            String skuInfo = randomSkus();
    
            String priceInfo = calculateOrderPrice();
    
            return "orderNumber: " + orderNumber + " | orderDate: " + orderDate + " | paymentNumber: " +
                    paymentNumber + " | paymentDate: " + paymentDate + " | merchantName: " + merchantName +
                    " | sku: " + skuInfo + " | price: " + priceInfo;
        }
    
        private String randomPaymentWays() {
    
            paymentWays[] paymentWayGroup = paymentWays.values();
            Random random = new Random();
            return paymentWayGroup[random.nextInt(paymentWayGroup.length)].name();
        }
    
        private String randomMerchantNames() {
    
            merchantNames[] merchantNameGroup = merchantNames.values();
            Random random = new Random();
            return merchantNameGroup[random.nextInt(merchantNameGroup.length)].name();
        }
    
        private String randomProductNames() {
    
            productNames[] productNameGroup = productNames.values();
            Random random = new Random();
            return productNameGroup[random.nextInt(productNameGroup.length)].name();
        }
    
    
        private String randomSkus() {
    
            Random random = new Random();
            int skuCategoryNum = random.nextInt(3);
    
            String skuInfo ="[";
    
            totalPrice = 0;
            for(int i = 1; i <= 3; i++) {
    
                int skuNum = random.nextInt(3)+1;
                float skuPrice = skuPriceGroup[random.nextInt(skuPriceGroup.length)];
                float totalSkuPrice = skuPrice * skuNum;
                String skuName = randomProductNames();
                String skuCode = randomCharactersAndNumbers(10);
                skuInfo += " skuName: " + skuName + " skuNum: " + skuNum + " skuCode: " + skuCode
                        + " skuPrice: " + skuPrice + " totalSkuPrice: " + totalSkuPrice + ";";
                totalPrice += totalSkuPrice;
            }
    
    
            skuInfo += " ]";
    
            return skuInfo;
        }
    
        private String calculateOrderPrice() {
    
            Random random = new Random();
            discount = discountGroup[random.nextInt(discountGroup.length)];
            paymentPrice = totalPrice - discount;
    
            String priceInfo = "[ totalPrice: " + totalPrice + " discount: " + discount + " paymentPrice: " + paymentPrice +" ]";
    
            return priceInfo;
        }
    
        private String randomCharactersAndNumbers(int length) {
    
            String characters = "abcdefghijklmnopqrstuvwxyz0123456789";
            String randomCharacters = "";
            Random random = new Random();
            for (int i = 0; i < length; i++) {
                randomCharacters += characters.charAt(random.nextInt(characters.length()));
            }
            return randomCharacters;
        }
    
        private String randomNumbers(int length) {
    
            String characters = "0123456789";
            String randomNumbers = "";
            Random random = new Random();
            for (int i = 0; i < length; i++) {
                randomNumbers += characters.charAt(random.nextInt(characters.length()));
            }
            return randomNumbers;
        }
    
        public static void main(String[] args) {
    
            LogGenerator generator = new LogGenerator();
            generator.generate();
        }
    }

    运行的时候报错:

    log4j:WARN No appenders could be found for logger (com.comany.log.generator.LogGenerator).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

    所以增加一个 log4j.properties

    log4j.rootLogger=INFO,Console,File
    
    #控制台日志
    log4j.appender.Console=org.apache.log4j.ConsoleAppender
    log4j.appender.Console.Target=System.out
    log4j.appender.Console.layout=org.apache.log4j.PatternLayout
    log4j.appender.Console.layout.ConversionPattern=[%p][%t][%d{yyyy-MM-dd HH:mm:ss}][%C] - %m%n
    
    #普通文件日志
    log4j.appender.File=org.apache.log4j.RollingFileAppender
    log4j.appender.File.File=logs/generator.log
    log4j.appender.File.MaxFileSize=10MB
    #输出日志,如果换成DEBUG表示输出DEBUG以上级别日志
    log4j.appender.File.Threshold=ALL
    log4j.appender.File.layout=org.apache.log4j.PatternLayout
    log4j.appender.File.layout.ConversionPattern=[%p][%t][%d{yyyy-MM-dd HH:mm:ss}][%C] - %m%n

    然后在外层目录能够看到有Log目录生成:

    直接把target目录拷贝到Linux机器上,发现找不到依赖包 log4j。

    需要用 Intellij 进行打包。在File -> Project Structure里面。

    然后应该会自动生成Jar包(也可以Build->Build Artifacts) LogGenerator.jar ,拷贝到Linux机器上。

    但是开始不能运行。提示找不到Manifest.mf。搜索之后,发现,要选第二个选项“copy to ...”,而且必须把Manifest的目录从java改到Resources才行。

    这样选项之后,生成的目录里面有两个jar。目录拷贝到Linux,然后运行:

    $ java -jar LogGenerator.jar 
    
    能够看到logs目录下有新的日志生成:
    $ ll logs/*
    -rw-rw-r--  1 work work 12328 Nov  7 20:12 logs/generator.log

    但是Linux的log文件都是乱码。试了改SecureCRT配置什么的,都没有用。

    查看到文件编码是 ASCII TEXT

    只好用  (printf "357273277";cat generator.log) > File2 来修改文件编码格式。

    还是乱码。在log4j的配置文件里面加上

    log4j.appender.file.encoding=UTF-8

    还是乱码。加上上面哪个 BOM符号,还是乱码。

    这时候把mac上面的日志拷贝到Linux上,发现是正常的。那么还是log4j打印的地方出了问题。

    再仔细检查log4j的地方,发现上面log4j配置里面的file应该大写才行,要与上下文一致:

    log4j.appender.File.encoding=UTF-8

    修改之后,重新生成artifact,拷贝,运行。能看到中文啦:

    [INFO][main][2016-11-07 23:50:51][com.comany.log.generator.LogGenerator] - orderNumber: 494341478533851064 | orderDate: 2016-11-07 23:50:51 | paymentNumber: Alipay-46983228 | paymentDate: 2016-11-07 23:50:51 | merchantName: 守望先峰 | sku: [ skuName: 黑色连衣裙 skuNum: 3 skuCode: 06vteu0ewx skuPrice: 899.0 totalSkuPrice: 2697.0; skuName: 塑身牛仔裤 skuNum: 3 skuCode: oz13bdht0w skuPrice: 2000.0 totalSkuPrice: 6000.0; skuName: 圆脚牛仔裤 skuNum: 3 skuCode: geuum757jk skuPrice: 399.0 totalSkuPrice: 1197.0; ] | price: [ totalPrice: 9894.0 discount: 100.0 paymentPrice: 9794.0 ]
    [INFO][main][2016-11-07 23:50:51][com.comany.log.generator.LogGenerator] - orderNumber: 623011478533851071 | orderDate: 2016-11-07 23:50:51 | paymentNumber: Alipay-58335677 | paymentDate: 2016-11-07 23:50:51 | merchantName: 暴雪公司 | sku: [ skuName: 黑色连衣裙 skuNum: 1 skuCode: mp9fbajaj9 skuPrice: 699.0 totalSkuPrice: 699.0; skuName: 黑色连衣裙 skuNum: 1 skuCode: oaiwi0xj3z skuPrice: 2000.0 totalSkuPrice: 2000.0; skuName: 黑色连衣裙 skuNum: 1 skuCode: tkbd5a91iq skuPrice: 399.0 totalSkuPrice: 399.0; ] | price: [ totalPrice: 3098.0 discount: 100.0 paymentPrice: 2998.0 ]

    安装Flume

    参考 http://www.aboutyun.com/thread-8917-1-1.html

    http://www.cnblogs.com/smartloli/p/4468708.html

    下载了apache-flume-1.7.0-bin.tar.gz , 拷贝到 m42n06:/home/work/data/installed 

    解压后在conf目录 cp flume-env.sh.template flume-env.sh

    把JAVA_HOME改对:

    export JAVA_HOME=/home/work/.jumbo/opt/sun-java8/

    运行命令,报错:

    $ bin/flume-ng version
    bin/flume-ng: line 82: syntax error in conditional expression: unexpected token `('
    bin/flume-ng: line 82: syntax error near `^java.library.path=(.'
    bin/flume-ng: line 82: `      if [[ $line =~ ^java.library.path=(.*)$ ]]; then'

    怀疑又是shell脚本的原因。。

    用jumbo 装了一个 zsh,但是执行的时候貌似还是有点问题。再用jumbo装一个coreutils试试吧。

    装好coreutils之后 bash版本没有变化,试了一下还是不行,只好还是用zsh。

    把flume-ng脚本的第一行改成 #!/home/work/.jumbo/bin/zsh-5.0.2

    运行报错:

    $ bin/flume-ng version
    bin/flume-ng version
    bin/flume-ng:cd:409: too many arguments
    run_flume:13: no such file or directory: /home/work/data/installed/apache-flume-1.7.0-bin/bin/java

    发现原来是要指定配置文件,而且不能指定文件名,只能指定目录,如下:

    $ bin/flume-ng version -c conf
    bin/flume-ng version -c conf
    Flume 1.7.0
    Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
    Revision: 511d868555dd4d16e6ce4fedc72c2d1454546707
    Compiled by bessbd on Wed Oct 12 20:51:10 CEST 2016
    From source with checksum 0d21b3ffdc55a07e1d08875872c00523

    准备配一个flume,sink输出到hdfs上面,注意我们能够通过两种命令方式访问hdfs上面的文件:

    $ ../hadoop-2.7.3/bin/hdfs dfs -cat /output/part-00000
    ../hadoop-2.7.3/bin/hdfs dfs -cat /output/part-00000
          5       5      15
    
    $ ../hadoop-2.7.3/bin/hadoop fs -cat /output/part-00000
    ../hadoop-2.7.3/bin/hadoop fs -cat /output/part-00000
          5       5      15

    配置成hdfs的方式:

    agent1.sources = origin
    agent1.channels = memoryChannel
    agent1.sinks = hsink
    
    # For each one of the sources, the type is defined
    agent1.sources.origin.type = exec
    agent1.sources.origin.command = tail -f /home/work/data/LogGenerator_jar/logs/generator.log
    # The channel can be defined as follows.
    agent1.sources.origin.channels = memoryChannel
    
    # Each sink's type must be defined
    agent1.sinks.hsink.type = hdfs
    agent1.sinks.hsink.hdfs.path = /output/logOut
    agent1.sinks.hsink.hdfs.fileType = DataStream
    agent1.sinks.hsink.hdfs.writeFormat=TEXT
    agent1.sinks.hsink.hdfs.rollInterval=1
    agent1.sinks.hsink.hdfs.filePrefix=%Y-%m-%d
    #Specify the channel the sink should use
    agent1.sinks.hsink.channel = memoryChannel
    
    # Each channel's type is defined.
    agent1.channels.memoryChannel.type = memory
    # Other config values specific to each type of channel(sink or source)
    # can be defined as well
    # In this case, it specifies the capacity of the memory channel
    agent1.channels.memoryChannel.capacity = 100

    然后运行命令,还是报错:

    $ bin/flume-ng agent -n agent -f conf/flume-conf.properties  -c conf
    Info: Sourcing environment configuration script /home/work/data/installed/apache-flume-1.7.0-bin/conf/flume-env.sh
    Info: Including Hive libraries found via () for Hive access
    +run_flume:13> /home/work/.jumbo/opt/sun-java8//bin/java -Xmx20m -cp '/home/work/data/installed/apache-flume-1.7.0-bin/conf:/home/work/data/installed/apache-flume-1.7.0-bin/lib/*:/lib/*' '-Djava.library.path=' org.apache.flume.node.Application ' -n agent1 -f flume-conf.properties'

    日志 logs/flums.log显示:

    08 Nov 2016 17:17:01,783 ERROR [main] (org.apache.flume.node.Application.main:348)  - A fatal error occurred while running. Exception follows.
    org.apache.commons.cli.MissingOptionException: Missing required option: n
            at org.apache.commons.cli.Parser.checkRequiredOptions(Parser.java:299)
            at org.apache.commons.cli.Parser.parse(Parser.java:231)
            at org.apache.commons.cli.Parser.parse(Parser.java:85)
            at org.apache.flume.node.Application.main(Application.java:263)

    无语。。

    又安装了flume 1.5试了下,还是不行。

    又仔细检查了一下,很可能是java版本太高导致的。只能再装下java1.7试试看。发现还是不行。

    最后发现是运行的java命令里面貌似有点问题:

    /home/work/.jumbo/opt/sun-java8//bin/java -Xmx20m -cp '/home/work/data/installed/apache-flume-1.7.0-bin/conf:/home/work/data/installed/apache-flume-1.5.0-bin/lib/*' '-Djava.library.path=' org.apache.flume.node.Application ' -n agent -f flume-conf.properties'

    需要把最后的引号去掉。

    然后改成向日志输出。配置文件flume-conf.properties如下:

    agent.sources = origin
    agent.channels = memoryChannel
    agent.sinks = loggerSink
    
    # For each one of the sources, the type is defined
    agent.sources.origin.type =  exec
    agent.sources.origin.command = tail -f /home/work/data/LogGenerator_jar/logs/generator.log
    
    # The channel can be defined as follows.
    agent.sources.origin.channels = memoryChannel
    
    # Each sink's type must be defined
    agent.sinks.loggerSink.type = logger
    
    #Specify the channel the sink should use
    agent.sinks.loggerSink.channel = memoryChannel
    
    # Each channel's type is defined.
    agent.channels.memoryChannel.type = memory
    
    # Other config values specific to each type of channel(sink or source)
    # can be defined as well
    # In this case, it specifies the capacity of the memory channel
    agent.channels.memoryChannel.capacity = 100

    Java命令如下:

    /home/work/.jumbo/opt/sun-java8//bin/java -Xmx20m -cp '/home/work/data/installed/apache-flume-1.7.0-bin/conf:/home/work/data/installed/apache-flume-1.5.0-bin/lib/*' '-Dflume.root.logger=INFO,console' org.apache.flume.node.Application   -n agent -f conf/flume-conf.properties 

    然后貌似能够监听到日志了。。。

    其他的地方,还要再处理。。。

    不知道为什么。登出重新登录之后,居然就好了。。。

    用了HDFS作为sink的新的配置文件内容:

    agent.sources = origin
    agent.channels = memoryChannel
    agent.sinks = hdfsSink
    
    # For each one of the sources, the type is defined
    agent.sources.origin.type =  exec
    agent.sources.origin.command = tail -f /home/work/data/LogGenerator_jar/logs/generator.log
    # The channel can be defined as follows.
    agent.sources.origin.channels = memoryChannel
    
    # Each sink's type must be defined
    agent.sinks.hdfsSink.type = hdfs
    agent.sinks.hdfsSink.hdfs.path = /output/Logger
    agent.sinks.hdfsSink.hdfs.fileType = DataStream
    agent.sinks.hdfsSink.hdfs.writeFormati = TEXT
    agent.sinks.hdfsSink.hdfs.rollInterval = 1
    agent.sinks.hdfsSink.hdfs.filePrefix=%Y-%m-%d
    
    # 不加下面这一行会一直报错:Expected timestamp in the Flume event headers
    agent.sinks.hdfsSink.hdfs.useLocalTimeStamp
    = true #Specify the channel the sink should use agent.sinks.hdfsSink.channel = memoryChannel # Each channel's type is defined. agent.channels.memoryChannel.type = memory # Other config values specific to each type of channel(sink or source) # can be defined as well # In this case, it specifies the capacity of the memory channel agent.channels.memoryChannel.capacity = 100

    然后运行命令:

    bin/flume-ng agent -n agent -f conf/flume-conf.properties  -c conf

    然后查看日志 logs/flume.log,貌似成功了(因为直接tail -f 是有内容的)

    08 Nov 2016 23:19:03,709 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:231)  - Creating /output/Logger/2016-11-08.1478618341857.tmp
    08 Nov 2016 23:19:04,818 INFO  [hdfs-hdfsSink-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:357)  - Closing /output/Logger/2016-11-08.1478618341857.tmp
    08 Nov 2016 23:19:04,828 INFO  [hdfs-hdfsSink-call-runner-9] (org.apache.flume.sink.hdfs.BucketWriter$8.call:618)  - Renaming /output/Logger/2016-11-08.1478618341857.tmp to /output/Logger/2016-11-08.1478618341857
    08 Nov 2016 23:19:04,835 INFO  [hdfs-hdfsSink-roll-timer-0] (org.apache.flume.sink.hdfs.HDFSEventSink$1.run:382)  - Writer callback called.

    在另一台机器05上面看下HDFS是否生成。

    $ ~/data/installed/hadoop-2.7.3/bin/hadoop fs -ls /output/Logger
    Found 5 items
    -rw-r--r--   3 work supergroup       1187 2016-11-08 23:19 /output/Logger/2016-11-08.1478618341853
    -rw-r--r--   3 work supergroup       1182 2016-11-08 23:19 /output/Logger/2016-11-08.1478618341854
    -rw-r--r--   3 work supergroup       1188 2016-11-08 23:19 /output/Logger/2016-11-08.1478618341855
    -rw-r--r--   3 work supergroup       1196 2016-11-08 23:19 /output/Logger/2016-11-08.1478618341856
    -rw-r--r--   3 work supergroup       1176 2016-11-08 23:19 /output/Logger/2016-11-08.1478618341857
    
    发现生成的是一个目录,然后里面有按照时间戳生成的文件

    然后打开一个文件看看:

    $ ~/data/installed/hadoop-2.7.3/bin/hadoop fs -cat /output/Logger/2016-11-08.1478618810929
    [INFO][main][2016-11-08 23:26:50][com.comany.log.generator.LogGenerator] - orderNumber: 966811478618810487 | orderDate: 2016-11-08 23:26:50 | paymentNumber: Wechat-92539661 | paymentDate: 2016-11-08 23:26:50 | merchantName: Sumsam | sku: [ skuName: 人字拖鞋 skuNum: 1 skuCode: 1q9vaq3484 skuPrice: 2000.0 totalSkuPrice: 2000.0; skuName: 朋克卫衣 skuNum: 3 skuCode: 7vwna4abw2 skuPrice: 2000.0 totalSkuPrice: 6000.0; skuName: 塑身牛仔裤 skuNum: 3 skuCode: 50qem6vlid skuPrice: 699.0 totalSkuPrice: 2097.0; ] | price: [ totalPrice: 10097.0 discount: 100.0 paymentPrice: 9997.0 ]

    采用HDFS输出成功。

    然后安装 Kafka。

    Kafka安装新起一篇:http://www.cnblogs.com/charlesblc/p/6046023.html 

  • 相关阅读:
    函数重载和函数指针在一起
    Uva
    Uva
    Uva
    Uva
    Uva
    CCPC-Wannafly-day5
    CCPC-Wannafly-day3
    CCPC-Wannafly-day2
    CCPC-Wannafly-Winter 2020.01.12总结
  • 原文地址:https://www.cnblogs.com/charlesblc/p/6038112.html
Copyright © 2011-2022 走看看