  • A few shell commands for working with files: extracting the last column, deleting the first line, diff, and more

    Delete the first line of a file: sed '1d' filename
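As a quick check, the sketch below builds a throwaway file (the sample content and the `demo_file` variable are made up for illustration) and shows that `sed '1d'` writes everything except the first line to standard output while leaving the file itself unchanged. `tail -n +2` should give the same result.

```shell
# Create a scratch file with a header line (illustrative data only).
demo_file=$(mktemp)
printf 'Found 2 items\nfile_a\nfile_b\n' > "$demo_file"

# Drop line 1; the result goes to stdout and the file is left untouched.
without_first=$(sed '1d' "$demo_file")

# tail -n +2 starts printing at line 2, which gives the same lines.
without_first_tail=$(tail -n +2 "$demo_file")

# The original file still has all three lines.
line_count=$(wc -l < "$demo_file" | tr -d ' ')

printf '%s\n' "$without_first"
rm -f "$demo_file"
```

To overwrite the file, redirect the output to a new file and move it over the original.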

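    Print the last column of a file: awk '{print $NF}' filename (this keeps only the last field of each line; it does not delete that column)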
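Note that `awk '{print $NF}'` keeps only the last field of each line. The sketch below contrasts that with actually dropping the last field; the sample data and variable names are illustrative, and the loop-based variant is just one portable way to do it.

```shell
# Two whitespace-separated sample lines (illustrative data only).
sample='a b c
d e f'

# Keep only the last field of every line.
last_col=$(printf '%s\n' "$sample" | awk '{ print $NF }')

# Drop the last field by rebuilding each line from fields 1..NF-1.
without_last=$(printf '%s\n' "$sample" | awk '{
    out = ""
    for (i = 1; i < NF; i++) out = out (i > 1 ? OFS : "") $i
    print out
}')

printf '%s\n' "$last_col"
printf '%s\n' "$without_last"
```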

    Remove duplicate lines with awk: awk '{if (!seen[$0]++) {print $0;}}' filename
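The `seen` array counts how often each line has appeared, so a line is printed only on its first occurrence and the original order is kept. A small sketch with made-up data, using the common shorthand `awk '!seen[$0]++'` and comparing it with `sort -u`, which also removes duplicates but reorders the lines:

```shell
# Sample input with repeated lines (illustrative data only).
sample='b
a
b
c
a'

# Print a line only the first time it is seen; input order is preserved.
deduped=$(printf '%s\n' "$sample" | awk '!seen[$0]++')

# sort -u also removes duplicates, but the output is sorted.
sorted_unique=$(printf '%s\n' "$sample" | sort -u)

printf '%s\n' "$deduped"
```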

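    Two ways to compare files:

    1) comm -3 --nocheck-order file1 file2: prints the lines that appear in only one of the two files (comm expects sorted input; --nocheck-order is a GNU option that only suppresses the ordering check)

    2) grep -v -f file1 file2: prints the lines of file2 that match none of the lines of file1, which are read as patterns (add -F -x to compare whole lines literally)

    And of course there is also diff file1 file2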
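The comparison commands above behave differently on lines that merely contain another line as a substring. The following sketch uses two small scratch files (names and contents are made up for illustration) to show plain `grep -v -f`, the stricter `grep -v -x -F -f`, and `comm -13` on sorted input.

```shell
# Two small sorted scratch files (illustrative data only).
tmp_dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp_dir/file1"
printf 'ab\nb\nc\nd\n' > "$tmp_dir/file2"

# Lines of file1 are treated as regex patterns: "ab" contains "a",
# so it is filtered out even though file1 has no line "ab".
regex_only=$(grep -v -f "$tmp_dir/file1" "$tmp_dir/file2")

# -F: fixed strings, -x: whole-line match, so only exact duplicates are dropped.
exact_only=$(grep -v -x -F -f "$tmp_dir/file1" "$tmp_dir/file2")

# comm -13 hides lines unique to file1 and lines common to both,
# leaving the lines that occur only in file2 (both inputs must be sorted).
comm_only=$(comm -13 "$tmp_dir/file1" "$tmp_dir/file2")

printf '%s\n' "$exact_only"
rm -rf "$tmp_dir"
```

For lists of file paths, the whole-line variants are usually the safer choice, because one path is often a prefix of another.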

    Here is a shell script I wrote yesterday~

    #!/bin/bash
    date_time=$(date +'%H_%M_%S')
    yesterday=$(date -d '-1 day' +'%Y_%m_%d')
    today=$(date +'%Y_%m_%d')
    date_day_time=$(date +'%Y_%m_%d_%H_%M_%S')
    
    # Shared paths, factored out of the long literals used throughout
    hadoop=/opt/hadoop/program/bin/hadoop
    work_dir=/home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining
    mid_dir=$work_dir/mid_files
    
    # -p: do not fail if today's log directory already exists
    mkdir -p "$work_dir/same_similiar_log/$today"
    
    # Collect the input files that have not been processed yet
    today_input=/home/crawler/petabyte/crawllog/news_data/$today
    yesterday_input=/home/crawler/petabyte/crawllog/news_data/$yesterday
    
    # Each listing starts with its own "Found N items" header, so the first
    # line is stripped from each listing separately before they are combined
    "$hadoop" fs -ls "$yesterday_input/" | sed '1d' > "$mid_dir/all_get_without_first_line"
    "$hadoop" fs -ls "$today_input/" | sed '1d' >> "$mid_dir/all_get_without_first_line"
    
    # Keep only the last column of each entry, i.e. the file path
    awk '{print $NF}' "$mid_dir/all_get_without_first_line" > "$mid_dir/all_input"
    
    # Alternative comparison, kept for reference:
    #comm -3 --nocheck-order "$mid_dir/all_input" "$mid_dir/input_done" > "$mid_dir/today_diff"
    
    # Paths in all_input that match no line of input_done are new
    grep -v -f "$mid_dir/input_done" "$mid_dir/all_input" > "$mid_dir/today_diff"
    
    awk '{print $NF}' "$mid_dir/today_diff" > "$mid_dir/today_new_input"
    
    # The full listing becomes the "done" list for the next run
    mv "$mid_dir/all_input" "$mid_dir/input_done"
    
    
    # Build the comma-separated input list for the same/similar news computation
    # (every entry is prefixed with a comma, so the result starts with ",")
    inputfile1=""
    while IFS= read -r line
    do
      inputfile1=$inputfile1,${line}
    done < "$mid_dir/input_done"
    echo "$inputfile1"
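The loop at the end of the script joins the lines of a file with commas, and because each entry is prefixed with a comma the result starts with one. As a hedged sketch (the scratch file and the `/data/...` paths are made up for illustration), the same join can be done with `paste -s`, which adds no leading delimiter:

```shell
# Scratch file with one path per line (illustrative data only).
list_file=$(mktemp)
printf '/data/part1\n/data/part2\n/data/part3\n' > "$list_file"

# Loop as in the script: every line is prefixed with a comma.
joined=""
while IFS= read -r line; do
    joined=$joined,$line
done < "$list_file"

# paste -s joins all lines of the file using the given delimiter.
joined_paste=$(paste -s -d, "$list_file")

# Stripping the leading comma from the loop result yields the same string.
joined_trimmed=${joined#,}

echo "$joined_paste"
rm -f "$list_file"
```

Whether the leading comma matters depends on how the list is consumed later; if it does, `${joined#,}` removes it without changing the loop.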
  • Original article: https://www.cnblogs.com/changxiaoxiao/p/3161279.html