  • A few shell commands for working with files: deleting the last column, deleting the first line, diff, and more

    Delete the first line of a file: sed '1d' filename (the result goes to stdout; the file itself is not modified)
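
A quick way to check what sed '1d' does is to run it on a throwaway file. The sketch below assumes a temporary file with made-up contents; it also shows tail -n +2, which gives the same result, and confirms that the input file is left unchanged.

```shell
# Demo file; the contents are made up for illustration.
tmp=$(mktemp)
printf 'header\nrow1\nrow2\n' > "$tmp"

# sed '1d' writes everything except line 1 to stdout.
without_header=$(sed '1d' "$tmp")
echo "$without_header"

# tail -n +2 starts output at line 2, which is equivalent here.
via_tail=$(tail -n +2 "$tmp")

# The input file still has all three lines.
line_count=$(wc -l < "$tmp" | tr -d ' ')
echo "lines still in file: $line_count"

rm -f "$tmp"
```

To change the file itself, redirect the output to a new file and move it into place, as the script further down does with its intermediate files.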

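    Print only the last column of a file: awk '{print $NF}' filename (note that this extracts the last column rather than deleting it; the script below uses it to pull the path out of each listing line)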
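
Since printing the last column and deleting it are easy to mix up, here is a small sketch of both. The function names last_column and drop_last_column are hypothetical helpers, not part of the original post, and the second one assumes whitespace-separated fields that can be re-joined with single spaces.

```shell
# Hypothetical helper names, used only for this illustration.

# Print only the last whitespace-separated field of each input line.
last_column() {
  awk '{ print $NF }'
}

# Print every field except the last one, re-joined with single spaces.
drop_last_column() {
  awk '{
    line = ""
    for (i = 1; i < NF; i++)
      line = line (i > 1 ? " " : "") $i
    print line
  }'
}

printf 'a b c\nd e f\n' | last_column
printf 'a b c\nd e f\n' | drop_last_column
```

The explicit loop avoids relying on assignments to NF, which not every awk implementation handles the same way.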

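    Remove duplicate lines with awk: awk '{if (!seen[$0]++) {print $0;}}' filename (keeps the first occurrence of each line and preserves the original order, with no sorting needed)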
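
The same idiom is often written in a shorter form. A minimal sketch, wrapped in a hypothetical function named dedupe_keep_order that reads from stdin:

```shell
# Hypothetical helper name for illustration.
dedupe_keep_order() {
  # seen[$0]++ evaluates to 0 the first time a line appears, so !seen[$0]++
  # is true only for first occurrences. A true pattern with no action
  # prints the current line.
  awk '!seen[$0]++'
}

printf 'b\na\nb\nc\na\n' | dedupe_keep_order
```

Unlike sort -u, this does not reorder the input, at the cost of holding every distinct line in memory.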

    Two ways to compare files:

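    1) comm -3 --nocheck-order file1 file2: prints the lines that appear in only one of the two files (lines unique to file2 are indented with a tab). comm expects sorted input; --nocheck-order is a GNU option that only suppresses the ordering check, so the result on unsorted files can be wrong.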

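    2) grep -v -f file1 file2: prints the lines of file2 that match none of the lines in file1. Each line of file1 is treated as a regular expression and may match as a substring; add -F -x to compare whole lines literally.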

    And of course there is also diff file1 file2.
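
The difference between these approaches shows up when one line is a prefix of another. The sketch below uses invented paths in a temporary directory; it compares plain grep -v -f, the stricter grep -v -x -F -f, and comm -13 on sorted input (plain comm without the GNU-only --nocheck-order).

```shell
# Demo data; the paths are invented for illustration.
tmpdir=$(mktemp -d)
printf '/data/part-1\n' > "$tmpdir/done"
printf '/data/part-1\n/data/part-10\n/data/part-2\n' > "$tmpdir/all"

# Regex/substring matching: the pattern "/data/part-1" also matches
# "/data/part-10", so that line is dropped as well.
loose=$(grep -v -f "$tmpdir/done" "$tmpdir/all")

# -F: treat patterns as fixed strings; -x: a pattern must match the whole line.
strict=$(grep -v -x -F -f "$tmpdir/done" "$tmpdir/all")

# comm -13 keeps only column 2 (lines unique to the second file).
# Both inputs are already sorted in the C locale.
only_in_all=$(LC_ALL=C comm -13 "$tmpdir/done" "$tmpdir/all")

printf 'loose:\n%s\nstrict:\n%s\ncomm:\n%s\n' "$loose" "$strict" "$only_in_all"
rm -rf "$tmpdir"
```

Here the loose variant silently loses /data/part-10, while the strict grep and comm agree with each other.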

    Here is part of a shell script I wrote yesterday:

    #!/bin/bash
    date_time=$(date +'%H_%M_%S')
    yesterday=$(date -d "-1 day" +'%Y_%m_%d')
    today=$(date +'%Y_%m_%d')
    date_day_time=$(date +'%Y_%m_%d_%H_%M_%S')
    
    # -p: do not fail if the directory already exists (e.g. on a second run the same day)
    mkdir -p /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/same_similiar_log/$today
    
    # Collect the input files that have not been processed yet
    today_input=/home/crawler/petabyte/crawllog/news_data/$today
    yesterday_input=/home/crawler/petabyte/crawllog/news_data/$yesterday
    
    /opt/hadoop/program/bin/hadoop fs -ls $yesterday_input/ > /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_get
    /opt/hadoop/program/bin/hadoop fs -ls $today_input/ >> /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_get
    
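    # all_get holds two concatenated listings, so sed '1d' would strip only the first
    # "Found N items" header; delete every header line instead
    # (assumes the usual header format printed by hadoop fs -ls)
    sed '/^Found [0-9][0-9]* items$/d' /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_get > /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_get_without_first_line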
    
    awk '{print $NF}' /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_get_without_first_line > /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_input
    
    #comm -3 --nocheck-order /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_input /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/input_done > /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/today_diff
    
    # -x -F: compare whole lines literally, so one path cannot match another as a regex or substring
    grep -v -x -F -f /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/input_done /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_input > /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/today_diff
    
    awk '{print $NF}' /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/today_diff > /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/today_new_input
    
    mv /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_input /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/input_done
    
    
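    # Begin computing same/similar news
    # Join all paths in input_done into one comma-separated string.
    # Note: the result starts with a leading comma (",path1,path2,...").
    inputfile1=""
    while IFS= read -r line
    do
      inputfile1=$inputfile1,${line}
    done < /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/input_done
    echo "$inputfile1"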
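
The loop at the end of the script joins the lines of a file into one comma-separated string. As an alternative sketch (not what the script above does), paste can perform the same join in one command and does not produce a leading comma. The function name join_lines_with_commas and the demo paths are made up for illustration.

```shell
# Hypothetical helper name; joins the lines of a file with commas.
join_lines_with_commas() {
  # -s: merge all lines of the file into a single line
  # -d,: use a comma as the delimiter between them
  paste -s -d, "$1"
}

# Demo file with invented paths.
list=$(mktemp)
printf '/data/a\n/data/b\n/data/c\n' > "$list"
joined=$(join_lines_with_commas "$list")
echo "$joined"
rm -f "$list"
```

Whether the leading comma matters depends on how the string is consumed later in the script, which is not shown in this excerpt.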
  • Original post: https://www.cnblogs.com/changxiaoxiao/p/3161279.html