  • Crawler 10: scraping the Piaohua movie site with lxml

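    import requests
    from lxml import etree

    url = "https://www.piaohua.com/"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"
    }
    # 1. Request the page (the timeout keeps a stalled connection from hanging the script)
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    content = response.content.decode("utf-8")
    # 2. Parse the HTML into an element tree that supports XPath
    html = etree.HTML(content)
    # 3. Select nodes with XPath
    ul = html.xpath("//ul[@class='ul-imgtxt1 row']")[0]
    lis = ul.xpath("./li")
    # Debug: uncomment to print each <li> and check that it was selected correctly
    # for li in lis:
    #     print(etree.tostring(li, encoding="utf-8").decode("utf-8"))
    movies = []
    for li in lis:
        texts = li.xpath(".//h3//text()")  # all text nodes under <h3>
        if len(texts) < 2:
            continue  # skip items that lack either of the two expected text nodes
        playbill = li.xpath(".//img/@src")  # "@" selects an attribute value; xpath() returns a list
        movie = {
            "title": texts[0],
            "clear": texts[1],  # second text node, presumably the video quality label
            "playbill": playbill[0] if playbill else None,
        }
        movies.append(movie)
    print(movies)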
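The same XPath steps can be tried offline, without hitting the site. The sketch below is only an illustration: `SAMPLE_HTML` is made-up markup that imitates the structure the XPath expressions above imply, and `parse_movies` is a hypothetical helper name, neither comes from the original post. It wraps the selection logic in a function and falls back to `None` when a text node or poster is missing, so one malformed `<li>` does not raise an `IndexError`.

```python
from lxml import etree

# Made-up sample markup (an assumption); the real page's structure may differ.
SAMPLE_HTML = """
<html><body>
<ul class="ul-imgtxt1 row">
  <li><a href="/a.html"><img src="/img/a.jpg"></a>
      <h3><a href="/a.html">Movie A</a><span>HD</span></h3></li>
  <li><a href="/b.html"><img src="/img/b.jpg"></a>
      <h3><a href="/b.html">Movie B</a><span>BD</span></h3></li>
  <li><h3><a href="/c.html">Movie C</a></h3></li>
</ul>
</body></html>
"""


def parse_movies(page_source):
    """Hypothetical helper: extract title, quality label and poster from each <li>."""
    html = etree.HTML(page_source)
    uls = html.xpath("//ul[@class='ul-imgtxt1 row']")
    if not uls:
        return []  # the list container was not found
    movies = []
    for li in uls[0].xpath("./li"):
        # Drop whitespace-only text nodes before indexing into the result.
        texts = [t.strip() for t in li.xpath(".//h3//text()") if t.strip()]
        posters = li.xpath(".//img/@src")  # list of attribute values, possibly empty
        movies.append({
            "title": texts[0] if texts else None,
            "clear": texts[1] if len(texts) > 1 else None,
            "playbill": posters[0] if posters else None,
        })
    return movies


movies = parse_movies(SAMPLE_HTML)
for movie in movies:
    print(movie)
```

Keeping the parsing in a function that takes the page source as a string makes it easy to check the XPath expressions against a saved copy of the page before running the live request.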
    

      

  • Original article: https://www.cnblogs.com/wcyMiracle/p/12466647.html