  • A few shell commands for working with files: delete the last column, delete the first line, diff, etc.

    Delete the first line of a file: sed '1d' filename

    Print the last column of a file: awk '{print $NF}' filename (to actually delete the last column, use awk '{$NF=""; print $0}' filename instead)

    Remove duplicate lines with awk (keeping the first occurrence, in original order): awk '{if (!seen[$0]++) {print $0;}}' filename
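    The three one-liners above can be checked against a small throwaway file (the /tmp paths are just for the demo):

    ```shell
    # A sample file with a header line and one duplicate line
    printf 'header\na 1\nb 2\na 1\n' > /tmp/sample.txt

    # Delete the first line
    sed '1d' /tmp/sample.txt                          # a 1 / b 2 / a 1

    # Print the last column of each line
    awk '{print $NF}' /tmp/sample.txt                 # header / 1 / 2 / 1

    # Remove duplicate lines, keeping the original order
    awk '{if (!seen[$0]++) {print $0;}}' /tmp/sample.txt   # header / a 1 / b 2
    ```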

    Two ways to compare files:

    1) comm -3 --nocheck-order file1 file2 (suppresses the lines common to both files, leaving only the lines unique to each)

    2) grep -v -f file1 file2: prints the lines of file2 that do not appear in file1

    And of course there is always diff file1 file2
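    A quick comparison of the three on two small sorted files. Note that grep -f treats each pattern line as a regex and matches substrings; grep -Fx (fixed strings, whole-line match) is the stricter form:

    ```shell
    # Two sorted sample files
    printf 'apple\nbanana\ncherry\n' > /tmp/file1
    printf 'banana\ncherry\ndate\n'  > /tmp/file2

    # Lines unique to file1 (column 1) and to file2 (column 2, tab-indented)
    comm -3 /tmp/file1 /tmp/file2     # apple / <TAB>date

    # Lines of file2 that are not in file1
    grep -v -f /tmp/file1 /tmp/file2  # date

    # Unified comparison; diff exits non-zero when the files differ
    diff /tmp/file1 /tmp/file2 || true
    ```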

    Here is a shell script I wrote yesterday that uses these commands:

    #!/bin/bash
    date_time=$(date +'%H_%M_%S')
    yesterday=$(date -d '-1 day' +'%Y_%m_%d')
    today=$(date +'%Y_%m_%d')
    date_day_time=$(date +'%Y_%m_%d_%H_%M_%S')
    
    # base directories, factored out of the long absolute paths
    work_dir=/home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining
    mid=$work_dir/mid_files
    
    mkdir -p $work_dir/same_similiar_log/$today
    
    # begin to get input files which haven't been dealt with
    today_input=/home/crawler/petabyte/crawllog/news_data/$today
    yesterday_input=/home/crawler/petabyte/crawllog/news_data/$yesterday
    
    /opt/hadoop/program/bin/hadoop fs -ls $yesterday_input/ > $mid/all_get
    /opt/hadoop/program/bin/hadoop fs -ls $today_input/ >> $mid/all_get
    
    # drop the "Found N items" header that each fs -ls invocation emits
    # (there are two of them here, so a plain sed '1d' is not enough)
    sed '/^Found/d' $mid/all_get > $mid/all_get_without_first_line
    
    # keep only the last column, i.e. the file path
    awk '{print $NF}' $mid/all_get_without_first_line > $mid/all_input
    
    #comm -3 --nocheck-order $mid/all_input $mid/input_done > $mid/today_diff
    
    # lines in all_input that are absent from input_done: files not yet processed
    grep -v -f $mid/input_done $mid/all_input > $mid/today_diff
    
    awk '{print $NF}' $mid/today_diff > $mid/today_new_input
    
    mv $mid/all_input $mid/input_done
    
    
    # begin to compute same_similar_news:
    # build a comma-separated list of all input paths (no leading comma)
    inputfile1=""
    while read -r line
    do
      inputfile1=${inputfile1:+$inputfile1,}$line
    done < $mid/input_done
    echo $inputfile1
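    The read loop at the end joins the lines of a file with commas; paste -sd, does the same in a single step (the /tmp path and part-* names are just for the demo):

    ```shell
    # Sample list of input files, one per line
    printf 'part-00000\npart-00001\npart-00002\n' > /tmp/inputs

    # -s serializes all lines into one, -d, uses a comma as the separator
    inputfile1=$(paste -sd, /tmp/inputs)
    echo "$inputfile1"   # part-00000,part-00001,part-00002
    ```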
  • Original post: https://www.cnblogs.com/changxiaoxiao/p/3161279.html