  • Linux operations, Hadoop, and shell scripts: a short summary

    I have been busy with project work lately, so today I am writing a short summary of what I have done so far. Tomorrow is National Day.

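    1  A script runs fine when started by hand, but under crontab it can never find its paths?

    #!/bin/bash
    . /etc/profile
    . /home/sms/.bash_profile

    Source the system-wide environment settings and the user's environment settings at the top of the script, because cron does not load them for you.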
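
    The effect can be reproduced without waiting for cron. The sketch below uses env -i to start bash with an empty environment, which approximates what a cron job sees, and then shows that sourcing a profile file brings the variable back. The temporary profile file and the APP_HOME variable are hypothetical stand-ins for /etc/profile, ~/.bash_profile and whatever they export on the real host.

```shell
#!/bin/bash
# Simulate cron's sparse environment with `env -i`, then show that sourcing a
# profile file restores the variables the job needs.
# The temp file and APP_HOME are hypothetical stand-ins for the real profiles.

profile=$(mktemp)
echo 'export APP_HOME=/opt/demo' > "$profile"

# Without sourcing: the variable is unset, as it would be under cron.
without=$(env -i /bin/bash -c 'echo "${APP_HOME:-unset}"')

# With sourcing: the first command of the job loads the profile explicitly.
with=$(env -i /bin/bash -c ". '$profile'; echo \"\${APP_HOME:-unset}\"")

echo "without profile: $without"
echo "with profile:    $with"
rm -f "$profile"
```

    The same check is a quick way to debug a real job: run it once under env -i and see which commands or variables go missing.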

    2 config.ini keeps coming out garbled, so the values read from it make no sense?

    Save config.ini in ANSI encoding; UTF-8 is not necessarily the better choice.
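
    The post does not say what exactly was garbled. One possibility worth ruling out (an assumption, not something the author confirmed) is a UTF-8 byte order mark at the start of the file: some editors write one, and a line that starts with a BOM no longer matches a grep pattern anchored with ^. The helper names has_bom and strip_bom below are hypothetical, and strip_bom relies on GNU sed's \xHH escapes.

```shell
#!/bin/bash
# has_bom FILE: succeed if FILE starts with the UTF-8 byte order mark EF BB BF.
has_bom() {
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# strip_bom FILE: print FILE without a leading BOM (uses GNU sed \xHH escapes).
strip_bom() {
  LC_ALL=C sed '1s/^\xEF\xBB\xBF//' "$1"
}
```

    A typical use would be: if has_bom config.ini; then strip_bom config.ini > config.clean; fi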

    3 The logback.xml configuration has no effect

    Jars pulled in by the pom file may be shadowing one another. Try moving the logback dependencies earlier in the pom.
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
        <appender name="FILE-THREAD" class="ch.qos.logback.classic.sift.SiftingAppender">
            <discriminator>
                <key>logname</key>
                <defaultValue>rdjklog</defaultValue>
            </discriminator>
            <sift>
                <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
                    <file>${logname}.log</file>
    
                    <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
                        <fileNamePattern>${logname}.log.%i</fileNamePattern>
                        <minIndex>1</minIndex>
                        <maxIndex>10</maxIndex>
                    </rollingPolicy>
    
                    <triggeringPolicy
                            class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
                        <maxFileSize>10MB</maxFileSize>
                    </triggeringPolicy>
    
                    <!-- encoders are assigned the type
                 ch.qos.logback.classic.encoder.PatternLayoutEncoder by default -->
                    <encoder>
                        <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
                    </encoder>
                </appender>
    
            </sift>
        </appender>
    
        <root level="debug">
            <appender-ref ref="FILE-THREAD"/>
        </root>
    </configuration>
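    logback file: sample

    4 SVN commits keep conflicting: did you forget to update?

    Run update on the source tree before making changes, so that you are working on the latest revision.

    5 Is running scripts from crontab a good choice?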

    #!/bin/bash
    
    # KpiAggregator run script(mainly for crontab)
    # author: Alfred
    # created: 2015/08/31
    # history:
    #   2015/08/31 - add sourcing /etc/profile and ~/.bash_profile to solve crontab env problem
    
    . /etc/profile
    . /home/sms/.bash_profile
    
    v_dir=$(dirname $0)
    v_basename=$(basename $0 .sh)
    v_logname=${v_dir}/${v_basename}
    
    # parameters
    v_topo_index=$(echo $v_basename | awk -F'_' '{print $3}')
    v_redis_url=$(grep "^${v_topo_index}.redis.url" $v_dir/config.ini | awk -F '=' '{print $2}' | head -1)
    v_db_driver=$(grep "^${v_topo_index}.db.driver" $v_dir/config.ini | awk -F '=' '{print $2}' | head -1)
    v_db_url=$(grep "^${v_topo_index}.db.url" $v_dir/config.ini | awk -F '=' '{print $2}' | head -1)
    v_db_user=$(grep "^${v_topo_index}.db.user" $v_dir/config.ini | awk -F '=' '{print $2}' | head -1)
    v_db_password=$(grep "^${v_topo_index}.db.password" $v_dir/config.ini | awk -F '=' '{print $2}' | head -1)
    v_monitor_time=$(grep "^${v_topo_index}.monitor.time" $v_dir/config.ini | awk -F '=' '{print $2}' | head -1)
    v_monitor_timedis=$(grep "^${v_topo_index}.monitor.timedis" $v_dir/config.ini | awk -F '=' '{print $2}' | head -1)
    v_monitor_timeDistrict=$(grep "^${v_topo_index}.monitor.timeDistrict" $v_dir/config.ini | awk -F '=' '{print $2}' | head -1)
    v_monitor_sender=$(grep "^${v_topo_index}.monitor.sender" $v_dir/config.ini | awk -F '=' '{print $2}' | head -1)
    v_monitor_smtpHost=$(grep "^${v_topo_index}.monitor.smtpHost" $v_dir/config.ini | awk -F '=' '{print $2}' | head -1)
    v_monitor_user=$(grep "^${v_topo_index}.monitor.user" $v_dir/config.ini | awk -F '=' '{print $2}' | head -1)
    v_monitor_password=$(grep "^${v_topo_index}.monitor.password" $v_dir/config.ini | awk -F '=' '{print $2}' | head -1)
    v_monitor_mailtitle=$(grep "^${v_topo_index}.monitor.mailtitle" $v_dir/config.ini | awk -F '=' '{print $2}' | head -1)
    v_monitor_dirfielCount=$(grep "^${v_topo_index}.monitor.dirfielCount" $v_dir/config.ini | awk -F '=' '{print $2}' | head -1)
    v_kloader_indir=$(grep "^${v_topo_index}.kloader.indir" $v_dir/config.ini | awk -F '=' '{print $2}' | head -1)
    echo $
    nohup java -cp $v_dir/../lib/log4j-1.2.9.jar:$(ls $v_dir/../jar/rta-*-with-dependencies.jar) DataMonitor redis.url=${v_redis_url} topo.index=${v_topo_index} db.driver=${v_db_driver} db.url=${v_db_url} db.user=${v_db_user} db.password=${v_db_password} monitor.time=${v_monitor_time} monitor.timedis=${v_monitor_timedis} monitor.timeDistrict=${v_monitor_timeDistrict} monitor.sender=${v_monitor_sender} monitor.smtpHost=${v_monitor_smtpHost} monitor.user=${v_monitor_user} monitor.password=${v_monitor_password} monitor.mailtitle=${v_monitor_mailtitle} monitor.dirfielCount=${v_monitor_dirfielCount} kloader.indir=${v_kloader_indir} logname=${v_logname} &
    Example 1
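
    The repeated grep | awk | head pipelines in Example 1 have two weak spots: the pattern for monitor.time also matches monitor.timedis and monitor.timeDistrict, so the result depends on line order in config.ini, and splitting on "=" cuts off any value that itself contains "=". A minimal sketch of a helper that avoids both, assuming plain KEY=VALUE lines with no spaces around "="; the name get_cfg is hypothetical.

```shell
#!/bin/bash
# get_cfg FILE KEY: print the value of the first line of the form KEY=VALUE.
# The key is compared as a literal string, so "a.monitor.time" does not also
# match "a.monitor.timedis", and "." is not treated as a regex wildcard.
get_cfg() {
  awk -v key="$2" '
    {
      pos = index($0, "=")
      if (pos > 0 && substr($0, 1, pos - 1) == key) {
        print substr($0, pos + 1)   # keep everything after the first "="
        exit
      }
    }' "$1"
}
```

    In Example 1 a call would look like v_redis_url=$(get_cfg "$v_dir/config.ini" "${v_topo_index}.redis.url").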
    #!/bin/bash
    
    . ~/.bash_profile
    
    # External JAR directory
    
    MY_LIB_PATH=/home/utxt/software/zdgh/SmsApplication_GG/lib
    
    # Number of channels
    number='9'
    
    # Working directory of the program
    work_path=/home/utxt/software/zdgh/SmsApplication_GG/bin
    
    
    CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib:$MY_LIB_PATH/msutil.jar:$MY_LIB_PATH/msbase.jar:$MY_LIB_PATH/ojdbc14.jar:$MY_LIB_PATH/commons-pool-1.5.4.jar:$MY_LIB_PATH/commons-dbcp-1.2.2.jar:$MY_LIB_PATH/c3p0-0.9.1.jar:$MY_LIB_PATH/spring-2.5.5.jar:$MY_LIB_PATH/commons-logging.jar:$MY_LIB_PATH/commons-io-1.3.1.jar:$MY_LIB_PATH/commons-lang-2.2.jar:$MY_LIB_PATH/log4j-1.2.13.jar:$MY_LIB_PATH/commons-collections-2.1.1.jar:$MY_LIB_PATH/cglib-src-2.2.jar
    
    export LANG="zh_CN.GBK"
    
    PROC_DESC="CMPP网关短信下发程序"   # "CMPP gateway SMS delivery program"
    
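    # Returns 1 if channel $1 is running and 0 if it is not
    # (the reverse of the usual shell convention; the callers test for 1).
    is_proc_run(){
      result=`ps -ef| grep -P "CmppStart $1 " | grep -v grep | wc -l`
      if [ "$result" -eq "0" ]; then
        return 0
      else 
        return 1
      fi
    }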
    
    start(){
      for ((i=0;i<number;i=i+1))
      do
        is_proc_run $i
        #echo "$?"
        if [ "$?" -eq "1" ]; then
          echo "The Process is Exists"
          echo "$PROC_DESC 通道【$i】---->已运行" 
        else
          
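          cd $work_path
          # NOTE: everything after the class name is passed to main() as program
          # arguments, so the -X/-XX flags below are not applied as JVM options.
          # Moving them in front of the class name would also change the command
          # line that the grep patterns in this script match on.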
          nohup java com.witsky.sms.cmpp.app.proc.CmppStart $i  -server -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xmx512m -Xms512m -Xmn512m -XX:PermSize=128m -Xss256k -XX:MaxTenuringThreshold=31 -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC  -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:LargePageSizeInBytes=256m  -XX:+UseFastAccessorMethods  >/dev/null 2>&1 &
          sleep 1
      
          cmppid=`ps -ef|grep " com.witsky.sms.cmpp.app.proc.CmppStart $i " | grep -v grep | awk '{print $2}'`
          if [ -n "$cmppid" ]; then
            echo "$i process spid is $cmppid"
            echo "$PROC_DESC [$i]通道进程号[$cmppid]"
          else
            echo "$i Process spid is not exists"
            echo "$PROC_DESC 通道【$i】---->不存在 "
          fi
        fi 
      done
    
    }
    
    stop(){
      for ((i=0;i<number;i=i+1))
      do
        is_proc_run $i
        if [ "$?" -eq "1" ]; then
          echo "Kill the Process"
          cmppid=`ps -ef|grep " com.witsky.sms.cmpp.app.proc.CmppStart $i " |grep -v grep|awk '{print $2 }'|wc -l`
          if [ "$cmppid" -eq "0" ]; then
            echo " The Process is not Exists"
            echo "$PROC_DESC 通道【$i】---->不存在 "  
            
          else
            cmppid=`ps -ef|grep " com.witsky.sms.cmpp.app.proc.CmppStart $i " | grep -v grep | awk '{print $2}'`
            ps -ef |grep " com.witsky.sms.cmpp.app.proc.CmppStart $i " |grep -v grep|awk '{print $2 }'| xargs kill -9
            echo "$PROC_DESC 通道【$i】进程号[$cmppid]----->已杀死 "
          fi
        else
          echo "The Process is not Exists"
          echo "$PROC_DESC 通道【$i】---->不存在"
        fi
    
      done
    
    }
    
    status(){
      for ((i=0;i<number;i=i+1))
      do
        is_proc_run $i
        if [ "$?" -eq "1" ]; then
          cmppid=`ps -ef |grep " com.witsky.sms.cmpp.app.proc.CmppStart $i " |grep -v grep|awk '{print $2 }'`
          if [ -n "$cmppid" ]; then
            echo "$i process spid is $cmppid"
            echo "$PROC_DESC 通道【$i】---->已运行"
          else
            echo "The Process is not Exists"
            echo "$PROC_DESC 通道【$i】---->不存在"
          fi
        else
          echo "The Process is not Exists"
          echo "$PROC_DESC 通道【$i】---->不存在"
        fi
      done
    
    }
    
    
    
    usage(){
            # PROC_NAME was never defined in this script; PROC_DESC is used instead
            echo ${PROC_DESC} usage:
            echo -e "`basename $0` <start|stop|status|restart>"
            echo -e "\tstart   - start   ${PROC_DESC}"
            echo -e "\tstop    - stop    ${PROC_DESC}"
            echo -e "\tstatus  - list    ${PROC_DESC}"
            echo -e "\trestart - restart ${PROC_DESC}"
    }
    
    #=======================================================================
    # Main sender program: start, status, stop, restart  2012-7-12  hzg
    #=======================================================================
    case $1 in
            start) 
          #          stop
                    start
                    ;;
            status)
                    status
                   
                    ;;
            stop)
                    stop
                    ;;
            restart)
                    stop
                    start
                    ;;
            *)
                    usage
    esac
    Example 2

    These scripts can be embedded in a Storm deployment or used for real-time computation.
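
    Example 2 finds its processes by grepping ps output, which depends on the exact spacing of the command line. A different technique, shown here only as a sketch and not taken from the original scripts, is to record each PID in a file at start time and probe it with kill -0. PID_DIR and the function names are hypothetical, and any command can stand in for the real java invocation. PID files have their own caveat: after a crash or reboot a stale file can point at an unrelated process.

```shell
#!/bin/bash
# PID-file based start/status/stop, as an alternative to ps | grep chains.
# PID_DIR and the function names are hypothetical.
PID_DIR=${PID_DIR:-/tmp/demo_pids}
mkdir -p "$PID_DIR"

# is_running N: succeed if channel N has a PID file pointing at a live process.
is_running() {
  local pid_file="$PID_DIR/channel_$1.pid"
  [ -f "$pid_file" ] && kill -0 "$(cat "$pid_file")" 2>/dev/null
}

# start_channel N CMD...: launch CMD in the background unless already running.
start_channel() {
  local n=$1; shift
  if is_running "$n"; then return 0; fi
  nohup "$@" >/dev/null 2>&1 &
  echo $! > "$PID_DIR/channel_$n.pid"
}

# stop_channel N: send SIGTERM to the recorded PID and remove the PID file.
stop_channel() {
  local pid_file="$PID_DIR/channel_$1.pid"
  if is_running "$1"; then kill "$(cat "$pid_file")"; fi
  rm -f "$pid_file"
}
```

    For instance, start_channel 0 sleep 30 followed by is_running 0 succeeds until stop_channel 0 is called.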

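    6 How should Hadoop deal with small files?

    The basic approach is to merge the many small files into one large file and then run MapReduce on that. In my tests this was indeed much faster.
    With the default input format every file becomes at least one map task, so the job spends most of its time starting tasks.
    For example, with the original 2000 small files of 500 KB each (about 1 GB in total), MapReduce in pseudo-distributed mode took 4 to 5 hours.
    After merging them into a single large file it took about 1 minute. Merging is simple; the following code can serve as a reference.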
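    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public static void putMergeFunc(String LocalDir, String fsFile)
                throws IOException {
    
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf); // fs is the HDFS file system
            FileSystem local = FileSystem.getLocal(conf); // local file system
            FileStatus[] status = local.listStatus(new Path(LocalDir)); // list the input directory
            FSDataOutputStream out = fs.create(new Path(fsFile)); // create the output file on HDFS
    
            for (FileStatus st : status) {
                Path temp = st.getPath();
                FSDataInputStream in = local.open(temp);
                IOUtils.copyBytes(in, out, 4096, false); // append the contents of in to out, leaving both open
                in.close(); // done with this file, close its input stream
            }
            out.close();
        }
    Merging local small files and uploading the result to HDFS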
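
    When the small files are still on local disk, the merge step can also be done in the shell before anything is uploaded. This is a minimal sketch of that alternative, not the author's method; the function name merge_small_files is hypothetical, it relies on GNU find and sort, and the output file should live outside the input directory.

```shell
#!/bin/bash
# merge_small_files DIR OUT: concatenate every regular file directly under DIR
# into OUT, in sorted name order. OUT should not be located inside DIR.
merge_small_files() {
  local dir=$1 out=$2
  : > "$out"                       # create or truncate the output file
  find "$dir" -maxdepth 1 -type f -print0 | sort -z |
    while IFS= read -r -d '' f; do
      cat "$f" >> "$out"
    done
}
```

    The merged file can then be uploaded in one step with hadoop fs -put, for example hadoop fs -put merged.dat /input/merged.dat, where both paths are placeholders.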

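    7 How can a Hadoop program be run with java -cp?

    In a real project the required jars are usually bundled together, and the program is usually not launched with hadoop jar.
    Only two things need attention: the Configuration must be given the cluster's XML files and the file system implementations (fs.hdfs.impl and fs.file.impl), otherwise HDFS cannot be reached.
    Any other function can then call the createFS method to read from and write to HDFS.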
    import org.apache.commons.httpclient.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    
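     /**
         * Client-side XML configuration files; the paths still need to be changed.
         * Note: addResource(String) looks the name up on the classpath; use
         * addResource(Path) for a file on the local file system.
         * @return the Configuration used to reach HDFS
         */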
        public static Configuration getConf(){
            Configuration conf = new Configuration();
            conf.addResource("../../hadoop/core-site.xml");
            conf.addResource("../../hadoop/hdfs-site.xml");
            conf.addResource("../../hadoop/mapred-site.xml");
            conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
            conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
            return conf;
        }
    
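        /**
         * Returns the FileSystem for the HDFS address taken from ConArgs.hdfsInPathDir.
         * @return the FileSystem; the JVM exits if it cannot be created
         */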
        public static FileSystem createFS(){
            Configuration conf = getConf();
            try{
                URI uri = new URI(ConArgs.hdfsInPathDir, false);
                String hdfs = uri.getScheme() + "://" + uri.getHost() + ":" + uri.getPort();
                FileSystem fs = FileSystem.get(java.net.URI.create(hdfs), conf);
    
                return fs;
    
            }catch(Exception e){
    
                logger.error("createFS() wrong, please check hdfsInPathDir.");
                System.exit(-1);
            }
    
            return null;
        }
    getHDFS
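
    Because addResource(String) resolves names against the classpath, one way to make the cluster's XML files visible to a java -cp launch is to put their directory on the classpath next to the jars. The sketch below only assembles such a classpath string; the function name build_classpath and the directory layout are hypothetical.

```shell
#!/bin/bash
# build_classpath CONF_DIR JAR_DIR: print CONF_DIR followed by every *.jar in
# JAR_DIR, joined with ":". With CONF_DIR on the classpath, files such as
# core-site.xml and hdfs-site.xml can be loaded by name as classpath resources.
build_classpath() {
  local cp=$1 jar
  for jar in "$2"/*.jar; do
    [ -e "$jar" ] && cp="$cp:$jar"   # skip the literal pattern if no jar matched
  done
  echo "$cp"
}
```

    A launch line would then look like java -cp "$(build_classpath /path/to/hadoop/conf ./lib)" com.example.Main, where the paths and the class name are placeholders.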

    Command line for running Java with a specified amount of memory

    Adjusting the Java startup parameters
    nohup java -cp $v_dir/../cfg:$v_dir/../lib/icp.jar:$(ls $v_dir/../jar/bsssa-*-with-dependencies.jar) -server -Xms2g -Xmx2g -XX:MaxPermSize=256m -Xloggc:/utxt/soft/bss/proglog/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Diname=witskybss cn.witsky.smb.bss.Main &
    Specifying memory

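    8 How are critical paths, association rules, and frequent itemsets related?

    9 Speed calculation, congestion index calculation, Bayesian statistics?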

  • Original article: https://www.cnblogs.com/hdu-2010/p/4848969.html