String Handling Functions

I: Searching

1: The strcspn function

This function is part of the standard library and is declared in the header <string.h>. Its prototype is:

size_t strcspn(const char *s1, const char *s2);

It returns the length of the initial segment of s1 that contains no character from s2. For example:

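#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 3)
        return 1;

    char *s1 = argv[1];
    char *s2 = argv[2];

    /* strcspn returns size_t, so print it with %zu */
    printf("the length is %zu\n", strcspn(s1, s2));
    return 0;
}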

$./1 abcdefg hij

the length is 7

$./1 defgabc hijad

the length is 0

$./1 defgabc heija

the length is 1
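
A common use of this behaviour is trimming the line ending from a buffer that was read from input. The sketch below is only an illustration; the helper name chomp is hypothetical and not part of any library.

```c
#include <string.h>

/* Hypothetical helper: cut `line` at the first '\r' or '\n'.
 * strcspn returns the index of the first character that belongs to the
 * reject set, or strlen(line) when none is found, so the write below
 * always lands inside the string (at worst on its terminator). */
static char *chomp(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
    return line;
}
```

Because strcspn falls back to the string length when nothing matches, no separate check for a missing newline is needed.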

2: The strspn function

strspn is the counterpart of strcspn, with the opposite meaning. It is also a standard library function declared in <string.h>. Its prototype is:

       size_t strspn(const char *s1, const char *s2);

It returns the length of the initial segment of s1 that consists only of characters from s2. For example:

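#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 3)
        return 1;

    char *s1 = argv[1];
    char *s2 = argv[2];

    /* strspn returns size_t, so print it with %zu */
    printf("the length is %zu\n", strspn(s1, s2));
    return 0;
}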

$./1 abcde fgh

the length is 0

$./1 abcde fghaxzbhc

the length is 3

$./1 abcde edcbaheh

the length is 5
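
Two typical uses of strspn are validating that a string is made only of allowed characters and skipping a leading run of characters. The following is a minimal sketch; both helper names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when `s` is non-empty and made only of digits. */
static bool is_all_digits(const char *s)
{
    /* strspn counts the leading characters found in the accept set; if
     * that count reaches the terminator, every character was a digit. */
    return *s != '\0' && s[strspn(s, "0123456789")] == '\0';
}

/* Hypothetical helper: pointer to the first character of `s` that is
 * neither a space nor a tab. */
static const char *skip_blanks(const char *s)
{
    return s + strspn(s, " \t");
}
```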

3: The index function

       #include <strings.h>
       char * index(const char *string, int c);

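This function is equivalent to strchr. It returns a pointer to the first occurrence of the character c in string, or NULL if c does not occur in string. index is a legacy BSD interface, so strchr is the more portable choice. For example: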

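#include <stdio.h>
#include <string.h>
#include <strings.h>

int main(int argc, char **argv)
{
    if (argc != 3)
        return 1;

    char *s1 = argv[1];
    int c = argv[2][0];

    char *res1 = index(s1, c);
    char *res2 = strchr(s1, c);

    /* %p expects a void pointer */
    printf("res1 is %p\n", (void *)res1);
    printf("res2 is %p\n", (void *)res2);
    return 0;
}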

$./1 abcdef h

res1 is 0x0

res2 is 0x0

$./1 abcdef a

res1 is 0x612f2d44

res2 is 0x612f2d44

$./1 abcdef c

res1 is 0x612f2d46

res2 is 0x612f2d46
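
Raw addresses are hard to read; subtracting the start of the string from the returned pointer gives the position of the character instead. In the output above, the addresses for 'a' and 'c' differ by 2, which is exactly that offset. A minimal sketch, with the hypothetical helper name first_index_of:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first `c` in `s`, or -1 if absent. */
static ptrdiff_t first_index_of(const char *s, int c)
{
    const char *p = strchr(s, c);
    return p ? p - s : -1;
}
```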

4: The strpbrk function

strpbrk is a standard library function declared in <string.h>. Its prototype is:

       char *strpbrk(const char *s, const char *accept);

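The function searches s for the first occurrence of any character in accept. It returns a pointer to that character if one is found, and NULL otherwise.

The function is thread-safe. For example: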

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if(argc != 3)
    {
        printf("argument error\n");
        return -1;
    }

    char *str = argv[1];
    char *accept = argv[2];

    /* res points at the first character of str that occurs in accept */
    char *res = strpbrk(str, accept);
    if(res == NULL)
    {
        printf("NULL\n");
    }
    else
    {
        printf("res is %s\n", res);
    }
}

$ ./1 "//abc/def/hehe/abc:/abc:def//" ""
NULL

$./1 "//abc/def/hehe/abc:/abc:def//" ":/"

res is //abc/def/hehe/abc:/abc:def//

$./1 "//abc/def/hehe/abc:/abc:def//" ":"

res is :/abc:def//

$./1 "//abc/def/hehe/abc:/abc:def//" "a"

res is abc/def/hehe/abc:/abc:def//

$./1 "//abc/def/hehe/abc:/abc:def//" "h"

res is hehe/abc:/abc:def//

$./1 "//abc/def/hehe/abc:/abc:def//" "z"

NULL
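
Since strpbrk returns a pointer into the original string, calling it again from one past the previous hit visits every matching character in turn. The sketch below counts the matches this way; the helper name count_any is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count how many characters of `s` belong to `accept`. */
static size_t count_any(const char *s, const char *accept)
{
    size_t n = 0;
    /* Resume one past each hit until strpbrk reports no more matches. */
    for (const char *p = strpbrk(s, accept); p != NULL;
         p = strpbrk(p + 1, accept))
        n++;
    return n;
}
```

For the sample string used above, this counts eight '/' characters and two ':' characters.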

II: Splitting

1: The strtok and strtok_r functions

strtok is a standard library function declared in <string.h>. Its prototype is:

       char *strtok(char *str, const char *delim);

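strtok splits a string into a sequence of substrings (tokens). Retrieving all tokens takes repeated calls, each of which returns the address of one token. On the first call, str must point to the string to be split; on subsequent calls, pass NULL as str.

The delim argument points to a string containing the delimiter characters. A different set of delimiters may be passed in delim on each call while splitting the same string.

strtok returns a pointer to the token, which is terminated by '\0' and never contains a delimiter. When no tokens remain, it returns NULL.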

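Internally, strtok keeps a static pointer, olds, that records where the next search starts. On the first call the search starts at the first character of the original string.

The token returned by a call begins at the next non-delimiter character found from that position. If such a character is found, it is the first character of the token; if not, strtok returns NULL.

The end of the token is the next delimiter character or '\0'. If a delimiter is found, it is overwritten with '\0' and olds is set to point to the character after it.

It follows that two or more consecutive delimiters are treated as a single delimiter, and that delimiters at the start or end of the original string are ignored. A token returned by strtok is therefore never an empty string. In glibc 2.22, the latest version at the time of writing, strtok is implemented as follows: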

static char *olds;

char *strtok (char *s, const char *delim)
{
    char *token;
    if (s == NULL)
        s = olds;

    /* Scan leading delimiters.  */
    s += strspn (s, delim);
    if (*s == '\0')
    {
      olds = s;
      return NULL;
    }

    /* Find the end of the token.  */
    token = s;
    s = strpbrk (token, delim);
    if (s == NULL)
        /* This token finishes the string.  */
        olds = __rawmemchr (token, '\0');
    else
    {
        /* Terminate the token and make OLDS point past it.  */
        *s = '\0';
        olds = s + 1;
    }
    return token;
}

Because strtok uses a static pointer internally, it is not thread-safe. The thread-safe alternative is strtok_r, the reentrant version of strtok. Its prototype is:

       char *strtok_r(char *str, const char *delim, char **saveptr);

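The saveptr argument plays the role of the olds pointer in strtok. The code of strtok_r is essentially the same as that of strtok, except that every olds is replaced with *saveptr.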
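
To make that substitution concrete, here is a minimal sketch of a reentrant tokenizer written along the lines of the strtok source above. It is not the glibc code: the name my_strtok_r is hypothetical, and strlen stands in for the internal __rawmemchr call.

```c
#include <string.h>

/* Sketch of a reentrant tokenizer; my_strtok_r is a hypothetical name,
 * not the glibc implementation. The caller-supplied *saveptr replaces
 * the static pointer used by strtok. */
static char *my_strtok_r(char *s, const char *delim, char **saveptr)
{
    if (s == NULL)
        s = *saveptr;

    /* Skip leading delimiters. */
    s += strspn(s, delim);
    if (*s == '\0') {
        *saveptr = s;
        return NULL;
    }

    /* Find the end of the token. */
    char *token = s;
    s = strpbrk(token, delim);
    if (s == NULL) {
        /* This token finishes the string. */
        *saveptr = token + strlen(token);
    } else {
        /* Terminate the token and remember the position past it. */
        *s = '\0';
        *saveptr = s + 1;
    }
    return token;
}
```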

Note: these functions modify the original string, so the first argument must not be read-only. For example:

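#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    if(argc != 3)
    {
        printf("argument error\n");
        return -1;
    }
    
    char *delim = argv[2];

    /* strtok modifies its input, so work on a writable copy */
    int strsize = strlen(argv[1])+1;
    char *str = calloc(strsize, sizeof(char));
    if(str == NULL)
    {
        return -1;
    }
    memcpy(str, argv[1], strsize-1);
    char *token = NULL;
    char *tmpstr = str;

    while((token = strtok(tmpstr, delim)) != NULL)
    {
        printf("\t%s\n", token);
        tmpstr = NULL;
    }

    /* dump the buffer to show which delimiters were overwritten with '\0' */
    int i = 0;
    for(i = 0; i < strsize; i++)
    {
        printf("%c ", str[i]);
    }
    printf("\n");
    free(str);
}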

$ ./1 "//abc/def/hehe/abc:/abc:def//" ""
        //abc/def/hehe/abc:/abc:def//
/ / a b c / d e f / h e h e / a b c : / a b c : d e f / /

$ ./1  "//abc/def/hehe/abc:/abc:def//" ":/"
        abc
        def
        hehe
        abc
        abc
        def
/ / a b c  d e f  h e h e  a b c  / a b c  d e f  /
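
The separate save pointer of strtok_r matters not only across threads but also when two tokenizations are interleaved, for example when each token of an outer split is itself split again. With strtok the inner loop would overwrite the single static pointer and the outer loop would lose its place. The following is a minimal sketch of the nested case; the helper name count_fields and the ';' and ',' separators are assumptions chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r under strict -std modes */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `text` into records on ';' and each record
 * into fields on ','; returns the total number of fields.
 * `text` is modified in place. */
static size_t count_fields(char *text)
{
    size_t total = 0;
    char *outer_save, *inner_save;

    for (char *rec = strtok_r(text, ";", &outer_save); rec != NULL;
         rec = strtok_r(NULL, ";", &outer_save)) {
        /* The inner loop has its own save pointer, so it does not
         * disturb the position of the outer loop. */
        for (char *fld = strtok_r(rec, ",", &inner_save); fld != NULL;
             fld = strtok_r(NULL, ",", &inner_save))
            total++;
    }
    return total;
}
```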

2: The strsep function

strsep is declared in <string.h>. Its prototype is:

char *strsep(char **stringp, const char *delim);

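Like strtok, strsep splits a string into tokens, but the two differ in several notable ways:

a: If *stringp is NULL, the function does nothing and returns NULL.

b: strsep always starts searching at the position *stringp points to. When it finds any delimiter character, it overwrites it with '\0' and makes *stringp point to the next character. If no delimiter is found, it sets *stringp to NULL. In both cases it returns the original value of *stringp.

c: strsep uses no static pointer internally, so it is thread-safe.

d: A token returned by strsep may be an empty string. In fact, strsep was introduced precisely because strtok cannot return empty tokens. strtok, however, conforms to C89/C99 and is therefore more portable; strsep is not part of the C standard.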
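
The difference described in point d is easiest to see by counting the tokens each function produces for the same input. A minimal sketch, with hypothetical helper names:

```c
#define _DEFAULT_SOURCE   /* exposes strsep on glibc under strict -std modes */
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers: count the tokens each function yields.
 * Both modify `s` in place. */
static size_t count_with_strtok(char *s, const char *delim)
{
    size_t n = 0;
    for (char *t = strtok(s, delim); t != NULL; t = strtok(NULL, delim))
        n++;
    return n;
}

static size_t count_with_strsep(char *s, const char *delim)
{
    size_t n = 0;
    /* strsep advances `s` itself and sets it to NULL after the last token. */
    while (strsep(&s, delim) != NULL)
        n++;
    return n;
}
```

For the input "a,,b" with "," as the delimiter, strtok yields two tokens while strsep yields three, the middle one being the empty string between the two commas.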

In glibc 2.22, the latest version at the time of writing, strsep is implemented as follows:

char *strsep (char **stringp, const char *delim)
{
    char *begin, *end;

    begin = *stringp;
    if (begin == NULL)
        return NULL;

    /* A frequent case is when the delimiter string contains only one
     character.  Here we don't need to call the expensive `strpbrk'
     function and instead work using `strchr'.  */
    if (delim[0] == '\0' || delim[1] == '\0')
    {
        char ch = delim[0];

        if (ch == '\0')
            end = NULL;
        else
        {
            if (*begin == ch)
                end = begin;
            else if (*begin == '\0')
                end = NULL;
            else
                end = strchr (begin + 1, ch);
        }
    }
    else
        /* Find the end of the token.  */
        end = strpbrk (begin, delim);

    if (end)
    {
        /* Terminate the token and set *STRINGP past NUL character.  */
        *end++ = '\0';
        *stringp = end;
    }
    else
        /* No more delimiters; this is the last token.  */
        *stringp = NULL;

    return begin;
}

For example:

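#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    if(argc != 3)
    {
        printf("argument error\n");
        return -1;
    }
    
    char *delim = argv[2];

    /* strsep modifies its input, so work on a writable copy */
    int strsize = strlen(argv[1])+1;
    char *str = calloc(strsize, sizeof(char));
    if(str == NULL)
    {
        return -1;
    }
    memcpy(str, argv[1], strsize-1);
    char *token = NULL;
    char *tmpstr = str;

    /* strsep advances tmpstr itself and sets it to NULL at the end */
    while((token = strsep(&tmpstr, delim)) != NULL)
    {
        printf("token is:  %s\n", token);
    }

    /* dump the buffer to show which delimiters were overwritten with '\0' */
    int i = 0;
    for(i = 0; i < strsize; i++)
    {
        printf("%c ", str[i]);
    }
    printf("\n");
    free(str);
}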

$ ./1 "//abc/def/hehe/abc:/abc:def//" ""
token is:  //abc/def/hehe/abc:/abc:def//
/ / a b c / d e f / h e h e / a b c : / a b c : d e f / /

$ ./1 "//abc/def/hehe/abc:/abc:def//" "/"
token is:
token is:
token is:  abc
token is:  def
token is:  hehe
token is:  abc:
token is:  abc:def
token is:
token is:
  a b c  d e f  h e h e  a b c :  a b c : d e f

$ ./1 "//abc/def/hehe/abc:/abc:def//" ":/"
token is:
token is:
token is:  abc
token is:  def
token is:  hehe
token is:  abc
token is:
token is:  abc
token is:  def
token is:
token is:
  a b c  d e f  h e h e  a b c   a b c  d e f
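
Keeping empty tokens is useful for record formats in which the position of a field carries meaning, such as colon-separated lines where a field may be left blank. The sketch below collects the fields into an array; the helper name split_fields and the sample record are assumptions for illustration only.

```c
#define _DEFAULT_SOURCE   /* exposes strsep on glibc under strict -std modes */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store up to `max` field pointers from `line` in
 * `out` and return how many were stored. Empty fields are kept, and
 * `line` is modified in place. */
static size_t split_fields(char *line, const char *delim,
                           char **out, size_t max)
{
    size_t n = 0;
    char *field;

    /* The bound is checked first, so strsep is not called once `out` is full. */
    while (n < max && (field = strsep(&line, delim)) != NULL)
        out[n++] = field;
    return n;
}
```

Splitting "root:x:0:0::/root:/bin/sh" on ":" this way gives seven fields, with the fifth one empty; strtok would silently merge the two adjacent colons and shift every later field by one position.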


Original article: https://www.cnblogs.com/gqtcgq/p/7247120.html