  • LTP Word Segmentation in Practice

    Reference links:

    https://github.com/HIT-SCIR/ltp/blob/master/doc/install.rst

    http://www.xfyun.cn/index.php/services/ltp/detail?&app_id=NTZmYzg5ZWE=

    http://www.ltp-cloud.com/document/#api_rest_format_json

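    Reference links for other word segmentation tools:

    NLPIR: http://www.nlpir.org/    http://www.datatang.com/data/13483

    BosonNLP: http://bosonnlp.com/

    The rest of this post walks through using LTP word segmentation in practice.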

    1. Log in to the official site and obtain an authorization code (api_key).

    2. Call the word segmentation API

    API parameters:

     StringBuilder sb = new StringBuilder();
                sb.Append(" 本报讯 (记者 王少勇)3月28日,国土资源部部长、党组书记、国家土地总督察姜大明主持召开第10次部党组会议,传达学习习xx总书记在北京市考察工作时的重要讲话精神。会议提出,要深刻理解习xx总书记重要讲话精神,充分发挥国土资源部门的服务和保障作用,推进京津冀协同发展。  xx平总书记高度重视北京发展和京津冀协同发展,今年2月下旬专程到北京市调研考察,并发表重要讲话,从做好北京发展和管理工作、推动京津冀协同发展两个方面进行了深刻阐述。会议指出,习xx总书记的重要讲话,对于实现京津冀优势互补、促进环渤海经济区发展、带动北方腹地发展,意义重大、影响深远,要认真学习,深刻领会。 会议提出,国土资源部门要进一步解放思想,加大改革创新力度,");            
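                // LTP cloud analysis endpoint
                string url = "http://ltpapi.voicecloud.cn/analysis/";
                // pattern=ws requests word segmentation; format=xml selects XML output.
                // The text is percent-encoded so characters such as & or + cannot corrupt the form body.
                string data = "api_key=xxx&text=" + Uri.EscapeDataString(sb.ToString()) + "&pattern=ws&format=xml";
                string returnValue = HttpHelper.HttpPost(url, data);
                Console.WriteLine(returnValue);
                Console.ReadKey();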
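            // Member of the HttpHelper class. Requires: using System.Net; using System.Text;
            /// <summary>
            /// Sends a POST request using WebClient.
            /// </summary>
            /// <param name="url">Target URL.</param>
            /// <param name="postString">The parameters to send. You can find them by capturing traffic with a tool or by inspecting the form yourself; every named field in the form must be included.</param>
            /// <returns>The response body decoded as UTF-8.</returns>
            public static string HttpPost(string url, string postString)
            {
                // Encode the body; for Chinese text in particular, check the encoding the target page expects first.
                byte[] postData = Encoding.UTF8.GetBytes(postString);
                using (WebClient webClient = new WebClient())
                {
                    // Header required for POST; remove this line if you switch to GET.
                    webClient.Headers.Add("Content-Type", "application/x-www-form-urlencoded");
                    byte[] responseData = webClient.UploadData(url, "POST", postData); // raw response bytes
                    return Encoding.UTF8.GetString(responseData); // decode
                }
            }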
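
    The request body in step 2 is a plain application/x-www-form-urlencoded string, so any &, =, + or non-ASCII character in the text has to be percent-encoded before it is sent. The following is a minimal sketch of one way to assemble such a body from name/value pairs. FormBodySketch and BuildFormBody are hypothetical names introduced for illustration; they are not part of the original post or of any LTP SDK. Note that Uri.EscapeDataString encodes a space as %20 rather than +, which form decoders generally accept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormBodySketch
{
    // Hypothetical helper (not from the original post): joins name/value pairs
    // into a form-urlencoded body, percent-encoding every name and value.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    public static void Main()
    {
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("api_key", "xxx"), // placeholder key, as in the post
            new KeyValuePair<string, string>("text", "推进京津冀协同发展 a&b"),
            new KeyValuePair<string, string>("pattern", "ws"),
            new KeyValuePair<string, string>("format", "xml"),
        };
        // The resulting string can be passed as postString to an HTTP POST helper.
        Console.WriteLine(BuildFormBody(fields));
    }
}
```

    Building the body from pairs keeps the encoding in one place, so adding or changing a parameter cannot accidentally leave a value unescaped.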

    3. Test results

    <?xml version="1.0" encoding="utf-8" ?>
    <xml4nlp>
        <note sent="y" word="y" pos="n" ne="n" parser="n" wsd="n" srl="n" />
        <doc>
            <para id="0">
                <sent id="0" cont="本报讯 (记者 王少勇)3月28日,国土资源部部长、党组书记、国家土地总督察姜大明主持召开第10次部党组会议,传达学习习xx总书记在北京市考察工作时的重要讲话精神。">
                    <word id="0" cont="本报" />
                    <word id="1" cont="" />
                    <word id="2" cont="" />
                    <word id="3" cont="记者" />
                    <word id="4" cont="王少勇" />
                    <word id="5" cont="" />
                    <word id="6" cont="3月" />
                    <word id="7" cont="28日" />
                    <word id="8" cont="" />
                    <word id="9" cont="国土" />
                    <word id="10" cont="资源部" />
                    <word id="11" cont="部长" />
                    <word id="12" cont="" />
                    <word id="13" cont="党组" />
                    <word id="14" cont="书记" />
                    <word id="15" cont="" />
                    <word id="16" cont="国家" />
                    <word id="17" cont="土地" />
                    <word id="18" cont="总督" />
                    <word id="19" cont="察姜" />
                    <word id="20" cont="大明" />
                    <word id="21" cont="主持" />
                    <word id="22" cont="召开" />
                    <word id="23" cont="第10" />
                    <word id="24" cont="" />
                    <word id="25" cont="部党组" />
                    <word id="26" cont="会议" />
                    <word id="27" cont="" />
                    <word id="28" cont="传达" />
                    <word id="29" cont="学习" />
                    <word id="30" cont="" />
                    <word id="31" cont="近平" />
                    <word id="32" cont="总书记" />
                    <word id="33" cont="" />
                    <word id="34" cont="北京市" />
                    <word id="35" cont="考察" />
                    <word id="36" cont="工作" />
                    <word id="37" cont="" />
                    <word id="38" cont="" />
                    <word id="39" cont="重要" />
                    <word id="40" cont="讲话" />
                    <word id="41" cont="精神" />
                    <word id="42" cont="" />
                </sent>
                <sent id="1" cont="会议提出,要深刻理解习xx总书记重要讲话精神,充分发挥国土资源部门的服务和保障作用,推进京津冀协同发展。">
                    <word id="0" cont="会议" />
                    <word id="1" cont="提出" />
                    <word id="2" cont="" />
                    <word id="3" cont="" />
                    <word id="4" cont="深刻" />
                    <word id="5" cont="理解" />
                    <word id="6" cont="" />
                    <word id="7" cont="近平" />
                    <word id="8" cont="总书记" />
                    <word id="9" cont="重要" />
                    <word id="10" cont="讲话" />
                    <word id="11" cont="精神" />
                    <word id="12" cont="" />
                    <word id="13" cont="充分" />
                    <word id="14" cont="发挥" />
                    <word id="15" cont="国土" />
                    <word id="16" cont="资源" />
                    <word id="17" cont="部门" />
                    <word id="18" cont="" />
                    <word id="19" cont="服务" />
                    <word id="20" cont="" />
                    <word id="21" cont="保障" />
                    <word id="22" cont="作用" />
                    <word id="23" cont="" />
                    <word id="24" cont="推进" />
                    <word id="25" cont="" />
                    <word id="26" cont="" />
                    <word id="27" cont="" />
                    <word id="28" cont="协同" />
                    <word id="29" cont="发展" />
                    <word id="30" cont="" />
                </sent>
                <sent id="2" cont="习xx总书记高度重视北京发展和京津冀协同发展,今年2月下旬专程到北京市调研考察,并发表重要讲话,从做好北京发展和管理工作、推动京津冀协同发展两个方面进行了深刻阐述。">
                    <word id="0" cont="" />
                    <word id="1" cont="近平" />
                    <word id="2" cont="总书记" />
                    <word id="3" cont="高度" />
                    <word id="4" cont="重视" />
                    <word id="5" cont="北京" />
                    <word id="6" cont="发展" />
                    <word id="7" cont="" />
                    <word id="8" cont="" />
                    <word id="9" cont="" />
                    <word id="10" cont="" />
                    <word id="11" cont="协同" />
                    <word id="12" cont="发展" />
                    <word id="13" cont="" />
                    <word id="14" cont="今年" />
                    <word id="15" cont="2月" />
                    <word id="16" cont="下旬" />
                    <word id="17" cont="专程" />
                    <word id="18" cont="" />
                    <word id="19" cont="北京市" />
                    <word id="20" cont="调研" />
                    <word id="21" cont="考察" />
                    <word id="22" cont="" />
                    <word id="23" cont="" />
                    <word id="24" cont="发表" />
                    <word id="25" cont="重要" />
                    <word id="26" cont="讲话" />
                    <word id="27" cont="" />
                    <word id="28" cont="" />
                    <word id="29" cont="做好" />
                    <word id="30" cont="北京" />
                    <word id="31" cont="发展" />
                    <word id="32" cont="" />
                    <word id="33" cont="管理" />
                    <word id="34" cont="工作" />
                    <word id="35" cont="" />
                    <word id="36" cont="推动" />
                    <word id="37" cont="" />
                    <word id="38" cont="" />
                    <word id="39" cont="" />
                    <word id="40" cont="协同" />
                    <word id="41" cont="发展" />
                    <word id="42" cont="" />
                    <word id="43" cont="" />
                    <word id="44" cont="方面" />
                    <word id="45" cont="进行" />
                    <word id="46" cont="" />
                    <word id="47" cont="深刻" />
                    <word id="48" cont="阐述" />
                    <word id="49" cont="" />
                </sent>
                <sent id="3" cont="会议指出,习xx总书记的重要讲话,对于实现京津冀优势互补、促进环渤海经济区发展、带动北方腹地发展,意义重大、影响深远,要认真学习,深刻领会。">
                    <word id="0" cont="会议" />
                    <word id="1" cont="指出" />
                    <word id="2" cont="" />
                    <word id="3" cont="习xx" />
                    <word id="4" cont="总书记" />
                    <word id="5" cont="" />
                    <word id="6" cont="重要" />
                    <word id="7" cont="讲话" />
                    <word id="8" cont="" />
                    <word id="9" cont="对于" />
                    <word id="10" cont="实现" />
                    <word id="11" cont="" />
                    <word id="12" cont="" />
                    <word id="13" cont="" />
                    <word id="14" cont="优势" />
                    <word id="15" cont="互补" />
                    <word id="16" cont="" />
                    <word id="17" cont="促进" />
                    <word id="18" cont="" />
                    <word id="19" cont="渤海" />
                    <word id="20" cont="经济区" />
                    <word id="21" cont="发展" />
                    <word id="22" cont="" />
                    <word id="23" cont="带动" />
                    <word id="24" cont="北方" />
                    <word id="25" cont="腹地" />
                    <word id="26" cont="发展" />
                    <word id="27" cont="" />
                    <word id="28" cont="意义" />
                    <word id="29" cont="重大" />
                    <word id="30" cont="" />
                    <word id="31" cont="影响" />
                    <word id="32" cont="深远" />
                    <word id="33" cont="" />
                    <word id="34" cont="" />
                    <word id="35" cont="认真" />
                    <word id="36" cont="学习" />
                    <word id="37" cont="" />
                    <word id="38" cont="深刻" />
                    <word id="39" cont="领会" />
                    <word id="40" cont="" />
                </sent>
                <sent id="4" cont="会议提出,国土资源部门要进一步解放思想,加大改革创新力度,">
                    <word id="0" cont="会议" />
                    <word id="1" cont="提出" />
                    <word id="2" cont="" />
                    <word id="3" cont="国土" />
                    <word id="4" cont="资源" />
                    <word id="5" cont="部门" />
                    <word id="6" cont="" />
                    <word id="7" cont="进一步" />
                    <word id="8" cont="解放思想" />
                    <word id="9" cont="" />
                    <word id="10" cont="加大" />
                    <word id="11" cont="改革" />
                    <word id="12" cont="创新" />
                    <word id="13" cont="力度" />
                    <word id="14" cont="" />
                </sent>
            </para>
        </doc>
    </xml4nlp>
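
    To use the segmentation result in code rather than read it off the console, the XML can be parsed and the cont attribute of each word element collected per sentence. The following is a rough sketch using System.Xml.Linq, assuming the response has the xml4nlp / doc / para / sent / word shape shown above. LtpXmlSketch and ExtractWords are hypothetical names, and skipping words whose cont is empty is a choice made for this example, not something the API prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LtpXmlSketch
{
    // Hypothetical helper: returns one list of non-empty word strings per <sent> element.
    public static List<List<string>> ExtractWords(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("sent")
            .Select(sent => sent.Elements("word")
                .Select(w => (string)w.Attribute("cont") ?? "") // a missing attribute is treated as empty
                .Where(c => c.Length > 0)                       // drop empty tokens
                .ToList())
            .ToList();
    }

    public static void Main()
    {
        // Shortened sample in the same shape as the response above.
        string xml = @"<?xml version=""1.0"" encoding=""utf-8"" ?>
<xml4nlp>
  <doc>
    <para id=""0"">
      <sent id=""0"" cont=""会议提出"">
        <word id=""0"" cont=""会议"" />
        <word id=""1"" cont=""提出"" />
        <word id=""2"" cont="""" />
      </sent>
    </para>
  </doc>
</xml4nlp>";

        foreach (List<string> sentence in ExtractWords(xml))
        {
            Console.WriteLine(string.Join(" / ", sentence));
        }
    }
}
```

    In a real call, the string returned by the POST request would be passed to ExtractWords in place of the inline sample.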
  • Original article: https://www.cnblogs.com/meiCode/p/LTP.html