zoukankan      html  css  js  c++  java
  • r 中sub() gsub()等匹配与替换函数

    Description

    grep, grepl, regexpr, gregexpr and regexec search for matches to pattern within each element of a character vector: they differ in the format of and amount of detail in the results.

    sub and gsub perform replacement of the first and all matches respectively.

    Usage

    pattern目标字符,replacement替换字符,x对象

    
    sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
        fixed = FALSE, useBytes = FALSE)#替换第一个匹配
    
    gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
         fixed = FALSE, useBytes = FALSE)#全部替换
    
    grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
         fixed = FALSE, useBytes = FALSE, invert = FALSE)
    
    grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
          fixed = FALSE, useBytes = FALSE)
    
    regexpr(pattern, text, ignore.case = FALSE, perl = FALSE,
            fixed = FALSE, useBytes = FALSE)
    
    gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE,
             fixed = FALSE, useBytes = FALSE)
    
    regexec(pattern, text, ignore.case = FALSE, perl = FALSE,
            fixed = FALSE, useBytes = FALSE)
    

    Arguments

    pattern

    character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. Coerced by as.character to a character string if possible. If a character vector of length 2 or more is supplied, the first element is used with a warning. Missing values are allowed except for regexpr, gregexpr and regexec.

    x, text

    a character vector where matches are sought, or an object which can be coerced by as.character to a character vector. Long vectors are supported.

    ignore.case

    if FALSE, the pattern matching is case sensitive and if TRUE, case is ignored during matching.表示是否忽视大小写

    perl

    logical. Should Perl-compatible regexps be used perl规则

    value

    if FALSE, a vector containing the (integer) indices of the matches determined by grep is returned, and if TRUE, a vector containing the matching elements themselves is returned.

    fixed

    logical. If TRUE, pattern is a string to be matched as is. Overrides all conflicting arguments.

    useBytes

    logical. If TRUE the matching is done byte-by-byte rather than character-by-character. See ‘Details’.

    invert

    logical. If TRUE return indices or values for elements that do not match.

    replacement

    a replacement for matched pattern in sub and gsub. Coerced to character if possible. For fixed = FALSE this can include backreferences "\1" to "\9" to parenthesized subexpressions of pattern. For perl = TRUE only, it can also contain "\U" or "\L" to convert the rest of the replacement to upper or lower case and "\E" to end case conversion. If a character vector of length 2 or more is supplied, the first element is used with a warning. If NA, all elements in the result corresponding to matches will be set to NA.

    Details

    Arguments which should be character strings or character vectors are coerced to character if possible.

    Each of these functions operates in one of three modes:

    fixed = TRUE: use exact matching.

    perl = TRUE: use Perl-style regular expressions.

    fixed = FALSE, perl = FALSE: use POSIX 1003.2 extended regular expressions (the default).

    See the help pages on regular expression for details of the different types of regular expressions.

    The two *sub functions differ only in that sub replaces only the first occurrence of a pattern whereas gsub replaces all occurrences. If replacement contains backreferences which are not defined in pattern the result is undefined (but most often the backreference is taken to be "").

    For regexpr, gregexpr and regexec it is an error for pattern to be NA, otherwise NA is permitted and gives an NA match.

    Both grep and grepl take missing values in x as not matching a non-missing pattern.

    The main effect of useBytes = TRUE is to avoid errors/warnings about invalid inputs and spurious matches in multibyte locales, but for regexpr it changes the interpretation of the output. It inhibits the conversion of inputs with marked encodings, and is forced if any input is found which is marked as "bytes" (see Encoding).

    Caseless matching does not make much sense for bytes in a multibyte locale, and you should expect it only to work for ASCII characters if useBytes = TRUE.

    regexpr and gregexpr with perl = TRUE allow Python-style named captures, but not for long vector inputs.

    Invalid inputs in the current locale are warned about up to 5 times.

    Caseless matching with perl = TRUE for non-ASCII characters depends on the PCRE library being compiled with ‘Unicode property support’, which PCRE2 is by default.

  • 相关阅读:
    MyEclipse中的Tomcat跑大项目时内存溢出:permgen space
    MyEclipse新建工作空间后的配置详细步骤
    解决eclipse复制粘贴js代码卡死的问题
    eclipse复制工作空间配置
    maven项目检出后报错(包括编译报错和运行报错)的常见检查处理方式
    MyEclipse中引用的maven配置文件只访问私服的配置
    图标网站
    afasf
    一个权限管理模块的设计(转载)
    奇艺下载
  • 原文地址:https://www.cnblogs.com/impw/p/13029395.html
Copyright © 2011-2022 走看看