iOS

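I. Preface

I recently needed the client to parse content from a novel website, so I looked into HTML parsing on iOS and found that material on it is fairly scarce.

Almost everything is built on libxml2, which is itself thinly documented, so most people use HPPLE, an Objective-C wrapper around libxml2: https://github.com/topfunky/hpple

Here is a simple integration.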

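II. Integration

1. Import the files

Add the files from the class directory of the downloaded project to your own project.

2. Configure the project

2.1. Add libxml2.2.dylib to the target's linked frameworks and libraries.

2.2. In PROJECT, under Search Paths > Header Search Paths, add /usr/include/libxml2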
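
On recent Xcode versions the library may be listed as libxml2.tbd rather than libxml2.2.dylib, and /usr/include may not exist on the Mac itself, so the same two settings are often written relative to the SDK. A minimal sketch as an .xcconfig fragment, assuming the project uses an xcconfig file (otherwise the same values can be entered by hand in Build Settings):

```
// Look for the <libxml/...> headers inside the active SDK
HEADER_SEARCH_PATHS = $(inherited) $(SDKROOT)/usr/include/libxml2
// Link against libxml2
OTHER_LDFLAGS = $(inherited) -lxml2
```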

3. Wrap it in a utility class and use it

Here is the code.

#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

@interface HppleManger : NSObject

+ (instancetype)sharedHppleMark;

/// Parses an HTML string and logs the first node matching the XPath query.
/// @param htmlStr the HTML source to parse
/// @param subStr the XPath query to run
- (void)getListWithHTML:(NSString *)htmlStr andSubscri:(NSString *)subStr;

@end

NS_ASSUME_NONNULL_END

#import "HppleManger.h"
#import "TFHpple.h"

@implementation HppleManger

static HppleManger *_showWaterMark = nil;

+ (instancetype)sharedHppleMark{
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        _showWaterMark = [[super allocWithZone:NULL] init];
    });
    
    return _showWaterMark;
}

// Route alloc and copy to the shared instance so only one object ever exists.
+ (id)allocWithZone:(struct _NSZone *)zone {
    return [HppleManger sharedHppleMark];
}

- (id)copyWithZone:(struct _NSZone *)zone {
    return [HppleManger sharedHppleMark];
}

- (void)getListWithHTML:(NSString *)htmlStr andSubscri:(NSString *)subStr {
    // htmlStr holds HTML source, not a file path, so encode it directly;
    // dataWithContentsOfFile: would return nil for it.
    NSData  * data      = [htmlStr dataUsingEncoding:NSUTF8StringEncoding];
    
    TFHpple * doc       = [[TFHpple alloc] initWithHTMLData:data];
    // Example query: //a[@class='sponsor']
    NSArray * elements  = [doc searchWithXPathQuery:subStr];
    if (elements.count == 0) {
        return;
    }

    // Accessors available on TFHppleElement (return values are discarded here)
    TFHppleElement *element = [elements objectAtIndex:0];
    [element text];                       // The text inside the HTML element (the content of the first text node)
    [element tagName];                    // "a"
    [element attributes];                 // NSDictionary of href, class, id, etc.
    [element objectForKey:@"href"];       // Easy access to single attribute
    [element firstChildWithTagName:@"b"]; // The first "b" child node
    NSLog(@"text: %@", [element text]);
    NSLog(@"tagName: %@", [element tagName]);
    NSLog(@"content: %@", [element content]);
}

@end

Usage:

NSString *htmlString  = @"<br>&nbsp;&nbsp;&nbsp;&nbsp;全本小说网 最快更新你的灵兽看起来很好吃最新章节！<br /><br />";
NSString *htmlSub = @"//br";
[[HppleManger sharedHppleMark] getListWithHTML:htmlString andSubscri:htmlSub];
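
One likely reason this sample yields nothing useful even once the HTML is parsed: br is an empty element, so a //br match has no text child and both text and content come back nil. The sentence itself sits in text nodes next to the br tags. A minimal sketch of querying those text nodes instead, assuming the hpple version in use returns text nodes for a //text() query and exposes their string through content (worth checking against your copy of the library); LogTextNodes is a hypothetical helper name, not part of hpple:

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: logs every non-blank text node found in an HTML string.
static void LogTextNodes(NSString *html) {
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *doc = [[TFHpple alloc] initWithHTMLData:data];

    // //text() selects text nodes rather than elements such as br.
    NSArray *nodes = [doc searchWithXPathQuery:@"//text()"];
    NSCharacterSet *blank = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    for (TFHppleElement *node in nodes) {
        // Assumption: for a text node, content holds the node's string.
        NSString *trimmed = [[node content] stringByTrimmingCharactersInSet:blank];
        if (trimmed.length > 0) {
            NSLog(@"text node: %@", trimmed);
        }
    }
}
```

Calling LogTextNodes(htmlString) with the sample string above should then log the sentence between the br tags, if the assumptions hold.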

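III. Summary

I still don't fully understand the rule string (the XPath query) that the parser expects, and the failure rate is high: queries often return nothing. Speed seems acceptable, though I haven't tested against complex pages.

Next I'd like to pass in a URL directly and see how well that parses.

I'll pick this up next time. I'm tired and going to bed.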

Original article: https://www.cnblogs.com/qiyiyifan/p/12008133.html