  • spider

    from lxml import etree
    import requests

    def getHtml(url):
        # Download the page and parse the raw bytes into an lxml element tree
        novelcontent = requests.get(url, timeout=10).content
        return etree.HTML(novelcontent)

    source = getHtml("http://www.cabintu.com")

    # Top-level menu sections in the left column
    SECTION_XPATH = '//div[@class="mainleft"]/ul[@class="sg_menu"]/li[@class="section"]'

    # Section titles: the text of each section's own <a>
    # (iterating the result directly avoids the old range(len - 1) loop, which skipped the last item)
    for fname in source.xpath(SECTION_XPATH + '/a/text()'):
        print(fname)

    # Airline entries from every section's sub-menu, as one flat list
    for typelist in source.xpath(SECTION_XPATH + '//ul[@class="subnav_a"]/li[@class="airline"]/a/text()'):
        print(typelist)

    # Unfinished: list airlines per section (direct-child sub-menu only)
    # ftypelist = source.xpath('//div[@class="mainleft"]/ul[@class="sg_menu"]/li[@class="section"]/ul[@class="subnav_a"]/li[@class="airline"]/a/text()')[i]
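The commented-out `ftypelist` line points at the next step: printing each section's airlines under its own title instead of one flat list. Below is a minimal sketch of that, assuming the page nests each `ul.subnav_a` inside its `li.section`, as the XPaths above imply. The `SAMPLE` markup, its placeholder names, and the `group_airlines` helper are hypothetical, written so the logic can be checked offline without fetching cabintu.com.

```python
from lxml import etree

# Hypothetical markup mirroring the class names in the XPaths above;
# the real page structure and names may differ.
SAMPLE = """
<html><body>
<div class="mainleft">
  <ul class="sg_menu">
    <li class="section"><a href="#">Domestic</a>
      <ul class="subnav_a">
        <li class="airline"><a href="#">Airline A</a></li>
        <li class="airline"><a href="#">Airline B</a></li>
      </ul>
    </li>
    <li class="section"><a href="#">International</a>
      <ul class="subnav_a">
        <li class="airline"><a href="#">Airline C</a></li>
      </ul>
    </li>
  </ul>
</div>
</body></html>
"""


def group_airlines(tree):
    """Map each section title to the airline names listed under it (hypothetical helper)."""
    groups = {}
    sections = tree.xpath('//div[@class="mainleft"]/ul[@class="sg_menu"]/li[@class="section"]')
    for section in sections:
        # Relative XPaths ("./" and ".//") search only inside this <li>,
        # so titles and airlines stay paired without a shared index.
        title = section.xpath('./a/text()')
        airlines = section.xpath('.//ul[@class="subnav_a"]/li[@class="airline"]/a/text()')
        groups[title[0].strip() if title else ""] = [a.strip() for a in airlines]
    return groups


if __name__ == "__main__":
    tree = etree.HTML(SAMPLE)
    for name, airlines in group_airlines(tree).items():
        print(name, airlines)
```

Against the live site, `etree.HTML(SAMPLE)` would be replaced by the tree returned from `getHtml`. Scoping each query to its own `section` element avoids lining up two independently built lists by position, which is what made `[i]`-style lookups fragile.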
  • Original post: https://www.cnblogs.com/cutepython/p/6102824.html