zoukankan      html  css  js  c++  java
  • IOS 用正则表达式解析HTML等文件,得到所有文本

     获得网页内容

    NSURL *url=[NSURL URLWithString:@"http://121.199.34.52/wordpress/?json=core.get_post_content&post_id=8764&post_type=post"];
         NSDictionary * dic=[NSJSONSerialization JSONObjectWithData:[NSData dataWithContentsOfURL:url] options:0 error:Nil];
     
      NSString *content=[dic objectForKey:@"content"];

    正则表达式

       NSRegularExpression *regularExpretion=[NSRegularExpression regularExpressionWithPattern:@"<[^>]*>| "
                                                                                        options:0
                                                                                          error:nil];
        
        content=[regularExpretion stringByReplacingMatchesInString:content options:NSMatchingReportProgress range:NSMakeRange(0, content.length) withTemplate:@"-"];//替换所有html和换行匹配元素为"-"
        
        regularExpretion=[NSRegularExpression regularExpressionWithPattern:@"-{1,}" options:0 error:nil] ;
         content=[regularExpretion stringByReplacingMatchesInString:content options:NSMatchingReportProgress range:NSMakeRange(0, content.length) withTemplate:@"-"];//把多个"-"匹配为一个"-"
        
        //根据"-"分割到数组
         NSArray *arr=[NSArray array];
        content=[NSString stringWithString:content];
         arr =  [content componentsSeparatedByString:@"-"];
        NSMutableArray *marr=[NSMutableArray arrayWithArray:arr];
        [marr removeObject:@""];
        for (NSString *str in marr) {
               NSLog(@"呵呵-------------%@",str);
            
        }

    去除字符串中所有得空格及控制字符:

    str = [str stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet ]];

  • 相关阅读:
    笔试题 输出金字塔 面试经典
    C++ 函数, 虚函数, 纯虚函数
    EJB 根据beanName引用EJB
    【J2EE性能分析篇】JVM参数对J2EE性能优化的影响【转】
    C++ 引用和指针作为函数参数的例子。请不要拍砖
    lucene 总结
    二维数组按列序号排序 面试经典
    http://www.linuxidc.com/Linux/201004/25494.htm
    银行取款费用
    PHP 生成 csv 文件时乱码解决
  • 原文地址:https://www.cnblogs.com/huntaiji/p/3513172.html
Copyright © 2011-2022 走看看