  • regular expression, grep (python, linux)

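    re.match(pattern, string, flags=0)  Try to match the pattern at the start of the string
    re.search(pattern, string, flags=0)  Scan the whole string and return the first successful match
    re.sub(pattern, repl, string, count=0, flags=0)  Replace matches in the string
    re.findall(pattern, string, flags=0)  Find all substrings that match the pattern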
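
As a minimal sketch of how the four calls differ, the snippet below runs each of them on a short made-up string (`text` and the variable names are illustrative, not from the original notes):

```python
import re

text = "cat bat cat"  # made-up sample string

# re.match only succeeds if the pattern matches at position 0.
at_start = re.match(r"bat", text)            # None: the string starts with "cat"

# re.search scans forward and returns the first match anywhere.
first = re.search(r"bat", text)

# re.sub replaces matches; count limits how many are replaced (0 means all).
replaced_once = re.sub(r"cat", "dog", text, count=1)

# re.findall returns every non-overlapping match as a list of strings.
every = re.findall(r"[cb]at", text)

print(at_start, first.span(), replaced_once, every)
```

Note that `re.match` and `re.search` return a match object or `None`, while `re.sub` returns a string and `re.findall` returns a list.
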
    >>> import re
    >>> s = '112.90.239.137 112.90.239.137 1526446118 [26/Nov/2015:00:00:47 +0800] 23 "GET /ag/coord/convert?_appName=jiakaobaodianxingui&_appUser=632e76c53b4f3c9ffe90b8c4c61bd5b0&_cityCode=330300&_cityName=%E6%B8%A9%E5%B7%9E&_device=iPhone&_firstTime=2015-10-28%2018%3A49%3A05&_gpsType=baidu&_idfa=D0DD23E5-B407-449B-B005-0B52C6C2CBF3&_idfv=85A23658-2DD4-490D-B8AE-767842401821&_imei=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_j=1.0&_jail=false&_latitude=27.610026844605&_launch=45&_longitude=120.56419068644&_network=wifi&_openUuid=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_pkgName=cn.mucang.ios.jiakaobaodianPromise&_platform=iphone&_product=%E9%A9%BE%E8%80%83%E5%AE%9D%E5%85%B8-%E9%A9%BE%E7%85%A7%E8%80%83%E8%AF%95&_productCategory=jiakaobaodian&_renyuan=mucang&_screenDip=2&_screenHeight=1136&_screenWidth=640&_system=iPhone%20OS&_systemVersion=9.0.2&_vendor=appstore&_version=5.9.0&from=0&to=4&x=120.5576965508963&y=27.61254659188421 HTTP/1.1" "api.map.baidu.com" 200 76 gzip:116pct. "-" "BAIDUID=C328D2934E2C6EDF8E185FAC44EB168D:FG=1" "jiakaobaodianPromise/5.9.0 (iPhone; iOS 9.0.2; Scale/2.00)" map apimap 16555290153476373216 10.46.234.22 "9904758605881922946"'
    >>> res = re.compile(r'(.*) (.*) (.*) \[(.*)\] (.*) "(.*)" "(.*)" (.*) (.*) (.*) "(.*)" "(.*)" "(.*)" (.*) (.*) (.*) (.*) "(.*)"')
    >>> res is None
    False
    >>> res.search(s).groups()
    ('112.90.239.137', '112.90.239.137', '1526446118', '26/Nov/2015:00:00:47 +0800', '23', 'GET /ag/coord/convert?_appName=jiakaobaodianxingui&_appUser=632e76c53b4f3c9ffe90b8c4c61bd5b0&_cityCode=330300&_cityName=%E6%B8%A9%E5%B7%9E&_device=iPhone&_firstTime=2015-10-28%2018%3A49%3A05&_gpsType=baidu&_idfa=D0DD23E5-B407-449B-B005-0B52C6C2CBF3&_idfv=85A23658-2DD4-490D-B8AE-767842401821&_imei=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_j=1.0&_jail=false&_latitude=27.610026844605&_launch=45&_longitude=120.56419068644&_network=wifi&_openUuid=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_pkgName=cn.mucang.ios.jiakaobaodianPromise&_platform=iphone&_product=%E9%A9%BE%E8%80%83%E5%AE%9D%E5%85%B8-%E9%A9%BE%E7%85%A7%E8%80%83%E8%AF%95&_productCategory=jiakaobaodian&_renyuan=mucang&_screenDip=2&_screenHeight=1136&_screenWidth=640&_system=iPhone%20OS&_systemVersion=9.0.2&_vendor=appstore&_version=5.9.0&from=0&to=4&x=120.5576965508963&y=27.61254659188421 HTTP/1.1', 'api.map.baidu.com', '200', '76', 'gzip:116pct.', '-', 'BAIDUID=C328D2934E2C6EDF8E185FAC44EB168D:FG=1', 'jiakaobaodianPromise/5.9.0 (iPhone; iOS 9.0.2; Scale/2.00)', 'map', 'apimap', '16555290153476373216', '10.46.234.22', '9904758605881922946')
    
    >>> re.sub('(<b>)|(</b>)', '', s)
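
A pattern built from greedy `(.*)` groups relies on the number of spaces, brackets and quotes in the line to pin down a single match. One possible alternative, sketched here on a shortened made-up line in the same layout, uses narrower character classes and named groups. The group names (`client`, `peer`, `req_id`, `cost`, and so on) are guesses for illustration only; the original notes do not say what each field means.

```python
import re

# Shortened, made-up line that follows the layout of the log entry above.
line = (
    '10.0.0.1 10.0.0.2 1526446118 [26/Nov/2015:00:00:47 +0800] 23 '
    '"GET /ag/coord/convert?from=0&to=4 HTTP/1.1" "api.map.baidu.com" 200 76'
)

# Group names are hypothetical labels, not documented field names.
LOG_RE = re.compile(
    r'(?P<client>\S+) (?P<peer>\S+) (?P<req_id>\S+) '   # three space-free tokens
    r'\[(?P<time>[^\]]*)\] (?P<cost>\S+) '              # [timestamp] and a number
    r'"(?P<request>[^"]*)" "(?P<host>[^"]*)" '          # two quoted fields
    r'(?P<status>\d{3}) (?P<size>\d+)'                  # status code and size
)

m = LOG_RE.match(line)
fields = m.groupdict() if m else {}
print(fields.get("time"), fields.get("status"))
```

Because `\S+` and `[^"]*` cannot run past a space or a quote, each group stops at its own delimiter instead of backtracking from the end of the line.
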
    grep:
      -v, --invert-match        select non-matching lines
      -i, --ignore-case         ignore case distinctions
      -f, --file=FILE           obtain PATTERN from FILE
      -w, --word-regexp         force PATTERN to match only whole words
      -o, --only-matching       show only the part of a line matching PATTERN
     
      -P, --perl-regexp         PATTERN is a Perl regular expression
      -n, --line-number         print line number with output lines
      -H, --with-filename       print the file name for each match
      -B, --before-context=NUM  print NUM lines of leading context
      -A, --after-context=NUM   print NUM lines of trailing context
      -C, --context=NUM         print NUM lines of output context
      -a, --text                equivalent to --binary-files=text
      -s, --no-messages         suppress error messages
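
To make a few of these options concrete, here is a rough Python analogue of `-i`, `-v`, `-w`, `-o` and `-n`. It is only a sketch of the observable behaviour, not how grep is implemented; the `grep` function, its parameter names and the sample lines are all made up for illustration, and `\b` only approximates grep's whole-word rule.

```python
import re

# Made-up sample input.
lines = ["Error: disk full", "all good", "error code 42", "terror alert"]

def grep(pattern, lines, ignore_case=False, invert=False,
         whole_word=False, only_matching=False, line_number=False):
    """Rough analogue of grep -i / -v / -w / -o / -n over a list of lines."""
    if whole_word:
        pattern = r"\b(?:%s)\b" % pattern            # approximates -w
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    out = []
    for number, line in enumerate(lines, start=1):
        hits = [m.group(0) for m in rx.finditer(line)]
        if bool(hits) == invert:                     # -v keeps non-matching lines
            continue
        # -o prints each matched part on its own; otherwise print the line.
        pieces = hits if only_matching and not invert else [line]
        for piece in pieces:
            out.append(f"{number}:{piece}" if line_number else piece)
    return out

print(grep("error", lines, ignore_case=True, whole_word=True))
print(grep(r"\d+", lines, only_matching=True, line_number=True))
```
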
     
    regexp:
    • . (dot) - a single character.
    • ? - the preceding character matches 0 or 1 times only.
    • * - the preceding character matches 0 or more times.
    • + - the preceding character matches 1 or more times.
    • {n} - the preceding character matches exactly n times.
    • {n,m} - the preceding character matches at least n times and not more than m times.
    • [agd] - the character is one of those included within the square brackets.
    • [^agd] - the character is not one of those included within the square brackets.
    • [c-f] - the dash within the square brackets operates as a range. In this case it means either the letters c, d, e or f.
    • () - allows us to group several characters to behave as one.
    • | (pipe symbol) - the logical OR operation.
    • ^ - matches the beginning of the line.
    • $ - matches the end of the line.
    • \s - matches anything which is considered whitespace. This could be a space, tab, line break etc.
    • \S - matches the opposite of \s, that is anything which is not considered whitespace.
    • \d - matches anything which is considered a digit, i.e. 0 - 9 (it is effectively a shortcut for [0-9]).
    • \D - matches the opposite of \d, that is anything which is not considered a digit.
    • \w - matches anything which is considered a word character, that is [A-Za-z0-9_]. Note the inclusion of the underscore character '_'. This is because in programming and other areas we regularly use the underscore as part of, say, a variable or function name.
    • \W - matches the opposite of \w, that is anything which is not considered a word character.
    • Tab - represented in regular expressions as \t
    • Carriage return - represented in regular expressions as \r
    • Line feed (or newline) - represented in regular expressions as \n
    • Windows - uses the sequence \r\n (in that order)
    • Mac OS (version 9 and below) - uses the sequence \r
    • Unix/Linux and OSX - uses the sequence \n
    • \< - represents the beginning of a word.
    • \> - represents the end of a word.
    • \b - represents either the beginning or end of a word.
    • ( ) - groups part of the regular expression.
    • \1 \2 etc. - refer to something matched by a previous grouping.
    • | - matches what is on either the left or right of the pipe symbol.
    • (?=x) - positive lookahead.
    • (?!x) - negative lookahead.
    • (?<=x) - positive lookbehind.
    • (?<!x) - negative lookbehind.
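
The snippet below is a small sketch that exercises most of the syntax listed above with Python's `re` module; all sample strings and variable names are made up. One caveat: Python's `re` does not support the `\<` and `\>` word anchors (they come from tools such as GNU grep), so the sketch uses `\b` for word boundaries instead.

```python
import re

# Quantifiers and character classes
optional_u = bool(re.fullmatch(r"colou?r", "color"))       # ? : 0 or 1 times
two_or_three = re.findall(r"\d{2,3}", "7 42 123 4567")     # {n,m}
in_range = re.findall(r"[c-f]+", "abcdefg")                # range inside []
negated = re.findall(r"[^agd]", "badge")                   # negated class

# Shorthand classes
tokens = re.split(r"\s+", "a  b\tc\nd")                    # \s : whitespace
words = re.findall(r"\w+", "foo_bar baz-9")                # \w includes '_'

# Word boundaries (\b stands in for \< and \>)
whole_cat = re.findall(r"\bcat\b", "cat concat cats cat.")

# Grouping and backreference: find a doubled word
doubled = re.search(r"\b(\w+) \1\b", "it is is fine").group(1)

# Lookarounds match a position without consuming characters
before_usd = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR, 30 USD")
not_foobar = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]
after_dollar = re.findall(r"(?<=\$)\d+", "$5 and 7")
no_dollar = re.findall(r"(?<!\$)\b\d+", "$5 and 7")

# Line endings: try \r\n first so Windows endings are not split twice
rows = re.split(r"\r\n|\r|\n", "a\r\nb\rc\nd")

print(two_or_three, whole_cat, doubled, before_usd, rows)
```
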
  • Original article: https://www.cnblogs.com/yaoyaohust/p/10363200.html