  • Installing newspaper

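    Target system: CentOS 6.5, 32-bit

    Python environment: Python 2.7

    Installing newspaper directly with pip fails at the lxml step, so lxml is installed from source first.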

    1. Install lxml

    http://mdba.cn/?p=86

    # Quote the globs so the shell passes them to yum unexpanded
    yum install -y 'libxml*'
    yum install -y 'libxslt*'
     
    wget http://lxml.de/files/lxml-3.1.2.tgz
    tar xzvf lxml-3.1.2.tgz
    cd lxml-3.1.2
    python setup.py build
    python setup.py install
     
    # Verify that the installation succeeded
    shell > python
    >>> import lxml
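
A bare `import lxml` only shows that the package directory can be found; the compiled parser lives in `lxml.etree`. A minimal sketch of a slightly stricter check follows. The helper name `module_version` is hypothetical (it is not part of lxml), and the script reports "missing" instead of raising when an import fails.

```python
import importlib


def module_version(name):
    """Return the module's __version__, 'unknown' if it has none, or None if the import fails."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    # lxml.etree is the C extension built against libxml2/libxslt,
    # so importing it exercises the compiled part of the install.
    for name in ("lxml", "lxml.etree"):
        version = module_version(name)
        status = "missing" if version is None else "ok (version %s)" % version
        print("%s: %s" % (name, status))
```

If `lxml` is reported as ok but `lxml.etree` as missing, the build step or the libxml2/libxslt libraries are the likely place to look.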

    2. Install newspaper
    https://github.com/codelucas/newspaper

    pip install newspaper
    # -L follows the redirect from raw.github.com
    curl -L https://raw.github.com/codelucas/newspaper/master/download_corpora.py | python2.7

    3. Test

    from newspaper import Article

    url = 'http://edition.cnn.com/2014/08/14/world/meast/gaza-couple-wedding-at-unrwa-shelter/index.html?hpt=hp_c2'
    a = Article(url, language='en')  # English article
    a.download()  # fetch the raw HTML
    a.parse()     # extract the title, text, etc.
    print(a.title)

    url = 'http://www.tuicool.com/articles/fYneUz'
    a = Article(url, language='zh')  # Chinese article
    a.download()
    a.parse()
    print(a.title)

    url = 'http://www.tuicool.com/articles/AJJ7nu3'
    a = Article(url, language='zh')  # Chinese article
    a.download()
    a.parse()
    print(a.title)
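
The test repeats the same download, parse, and print steps for each URL. A minimal sketch of wrapping them in one function follows. `fetch_title`, `print_titles`, `SAMPLES`, and the `article_cls` parameter are hypothetical names introduced here for illustration. The broad `except` is an assumption, since the exception raised on a failed download or parse depends on the newspaper version. The CNN page is in English, so it is paired with `'en'` here.

```python
def fetch_title(url, language="en", article_cls=None):
    """Download and parse one article; return its title, or None on failure."""
    if article_cls is None:
        # Imported lazily so the helper can be tried with a stand-in class
        # on a machine where newspaper is not installed.
        from newspaper import Article as article_cls
    article = article_cls(url, language=language)
    try:
        article.download()
        article.parse()
    except Exception as exc:
        # Assumption: catch broadly, because the exception type raised for a
        # failed download or parse differs between newspaper versions.
        print("failed to fetch %s: %s" % (url, exc))
        return None
    return article.title


# (url, language) pairs taken from the test in step 3.
SAMPLES = [
    ("http://edition.cnn.com/2014/08/14/world/meast/"
     "gaza-couple-wedding-at-unrwa-shelter/index.html?hpt=hp_c2", "en"),
    ("http://www.tuicool.com/articles/fYneUz", "zh"),
    ("http://www.tuicool.com/articles/AJJ7nu3", "zh"),
]


def print_titles(samples, article_cls=None):
    """Print one title per sample and return the list of titles."""
    titles = [fetch_title(url, language, article_cls) for url, language in samples]
    for title in titles:
        print(title)
    return titles
```

With newspaper installed and network access available, `print_titles(SAMPLES)` runs the same three fetches as the test; a URL that can no longer be fetched or parsed is expected to show up as `None` rather than stop the run.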
    

      



  • Original article: https://www.cnblogs.com/huiwq1990/p/3913851.html