  • A simple crawler with requests-html

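    from requests_html import HTMLSession

    url = 'http://www.xiaohuar.com/hua/'

    # Extra Chromium flags for the browser that render() launches via pyppeteer.
    # Every entry needs a trailing comma: without it Python silently joins
    # adjacent string literals into a single argument.
    # (Stock requests-html always starts Chromium headless; HTMLSession has no headless option.)
    session = HTMLSession(browser_args=[
        '--no-sandbox',
        '--disable-infobars',
        '--user-agent=Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
    ])
    res = session.get(url)

    # JavaScript evaluated in the page by render(); it must return a valid object literal.
    script = """
        () => {
            return {
                width: document.documentElement.clientWidth,
                height: document.documentElement.clientHeight,
                deviceScaleFactor: window.devicePixelRatio,
            }
        }
    """
    try:
        # render() returns the script's result; keep_page=True keeps the
        # pyppeteer page open afterwards as res.html.page.
        viewport = res.html.render(script=script, keep_page=True)
        print(viewport)

        async def main():
            page = res.html.page
            await page.waitFor(1000)  # wait 1000 ms
            await page.setViewport({'width': 1366, 'height': 768})
            # Every <a> that is a direct child of <div class="img">
            anchors = await page.xpath('//div[@class="img"]/a')
            for anchor in anchors:
                # The DOM href property gives the absolute URL
                href = await (await anchor.getProperty('href')).jsonValue()
                print(href)

        # Run on the same event loop the session started the browser on.
        session.loop.run_until_complete(main())
    except Exception as e:
        print(e)
    finally:
        session.close()  # also closes the browser started by render()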
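
If the links under `<div class="img">` are already present in the HTML the server sends (rather than being added later by JavaScript), a browser may not be needed at all. The following is a hedged alternative, not the author's method: a minimal sketch using only Python's standard `html.parser` that collects the `href` of every `<a>` that is a direct child of `<div class="img">`, mirroring the XPath `//div[@class="img"]/a` (exact class match, as in XPath). It resolves relative links with `urljoin`, the way the DOM `href` property does. The class name `ImgLinkParser` and the sample markup are hypothetical, and whether this particular site needs JavaScript rendering is not known.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never have a closing tag, so they are never pushed on the stack.
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'}


class ImgLinkParser(HTMLParser):
    """Hypothetical helper: collect hrefs matching //div[@class="img"]/a,
    i.e. <a> elements that are direct children of <div class="img">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.stack = []   # currently open elements as (tag, is_img_div)
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent_is_img_div = bool(self.stack) and self.stack[-1][1]
        if tag == 'a' and parent_is_img_div and attrs.get('href'):
            # Resolve relative URLs, as the DOM `href` property does in a browser.
            self.links.append(urljoin(self.base_url, attrs['href']))
        if tag not in VOID_TAGS:
            is_img_div = tag == 'div' and attrs.get('class') == 'img'
            self.stack.append((tag, is_img_div))

    def handle_endtag(self, tag):
        # Close the most recent matching element (and anything left open inside it).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


if __name__ == '__main__':
    sample = '''
    <div class="img"><a href="/p/1.html"><img src="a.jpg"></a></div>
    <div class="img"><img src="b.jpg"><a href="http://example.com/p/2.html">x</a></div>
    <div class="title"><a href="/skip.html">not under div.img</a></div>
    <div class="img"><span><a href="/nested.html">not a direct child</a></span></div>
    '''
    parser = ImgLinkParser('http://www.xiaohuar.com/hua/')
    parser.feed(sample)
    for link in parser.links:
        print(link)
```

On a live page you would feed it the downloaded text, e.g. `parser.feed(res.text)` with `res` from the script above. Keeping a stack of open elements is what makes the "direct child" test possible; void tags such as `<img>` are skipped so they do not hide the parent `<div>`.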
    
    
  • Original article: https://www.cnblogs.com/ruhai/p/11318347.html