  • Step by step through "Web Scraping with Python" by Richard Lawson, part 1/n

    Source code, book: "Web Scraping with Python"

    1. Trying the first function, I keep running into errors. Let me figure out how to fix it.

    1.1  code:

    import urllib2  # Python 2 module; this import fails on Python 3
    from urllib.parse import urlparse
    
    
    def download1(url):
        """Simple downloader"""
        return urllib2.urlopen(url).read()
        #return urllib.urlopen(url).read()
        #return urllib.urlopen()
    
    download1('http://example.webscraping.com')
    

      I tried the original code using urllib2, but it failed. Then I tried urllib3, but that didn't work either.

    Next I tried to install urllib2 with "pip3 install urllib2". That failed as well, with the error message below:

    >>> import urllib2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named 'urllib2'
    >>> 
    cor@debian:~/zorktoolkit/usr/local/factory$ pip3 install urllib2
    Collecting urllib2
    Exception:
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/pip/basecommand.py", line 215, in main
        status = self.run(options, args)
      File "/usr/lib/python3/dist-packages/pip/commands/install.py", line 353, in run
        wb.build(autobuilding=True)
      File "/usr/lib/python3/dist-packages/pip/wheel.py", line 749, in build
        self.requirement_set.prepare_files(self.finder)
      File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 380, in prepare_files
        ignore_dependencies=self.ignore_dependencies))
      File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 554, in _prepare_file
        require_hashes
      File "/usr/lib/python3/dist-packages/pip/req/req_install.py", line 278, in populate_link
        self.link = finder.find_requirement(self, upgrade)
      File "/usr/lib/python3/dist-packages/pip/index.py", line 465, in find_requirement
        all_candidates = self.find_all_candidates(req.name)
      File "/usr/lib/python3/dist-packages/pip/index.py", line 423, in find_all_candidates
        for page in self._get_pages(url_locations, project_name):
      File "/usr/lib/python3/dist-packages/pip/index.py", line 568, in _get_pages
        page = self._get_page(location)
      File "/usr/lib/python3/dist-packages/pip/index.py", line 683, in _get_page
        return HTMLPage.get_page(link, session=self.session)
      File "/usr/lib/python3/dist-packages/pip/index.py", line 795, in get_page
        resp.raise_for_status()
      File "/usr/share/python-wheels/requests-2.12.4-py2.py3-none-any.whl/requests/models.py", line 893, in raise_for_status
        raise HTTPError(http_error_msg, response=self)
    requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/urllib2/
    

    Does this mean I can't install urllib2 with pip3? I have no idea. Let me try apt-get install instead. No luck there either.

    cor@debian:~$ sudo apt-get install urllib2
    Reading package lists... Done
    Building dependency tree       
    Reading state information... Done
    E: Unable to locate package urllib2
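
    Neither pip nor apt can find a package called urllib2, which suggests it is not distributed as a separate package for this interpreter. Before trying to install anything, it can help to ask Python itself whether a module name is importable. A minimal sketch, where the helper name module_available is made up for this illustration:

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the running interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised for dotted names whose parent package does not exist.
        return False


print("Python", sys.version_info.major)
for candidate in ("urllib2", "urllib.request"):
    print(candidate, "available:", module_available(candidate))
```

    On a Python 3 interpreter this should report urllib2 as unavailable and urllib.request as available.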
    

      

    2. Let's google it

    WARNING: Security researchers have found several poisoned packages on PyPI, including a package named urllib, which will 'phone home' when installed. If you used pip install urllib some time after June 2017, remove that package as soon as possible.

    You can't, and you don't need to.

    urllib2 is the name of the library included in Python 2. You can use the urllib.request library included with Python 3, instead. The urllib.request library works the same way urllib2 works in Python 2. Because it is already included you don't need to install it.

    If you are following a tutorial that tells you to use urllib2 then you'll find you'll run into more issues. Your tutorial was written for Python 2, not Python 3. Find a different tutorial, or install Python 2.7 and continue your tutorial on that version. You'll find urllib2 comes with that version.

    Alternatively, install the requests library for a higher-level and easier to use API. It'll work on both Python 2 and 3.
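
    Since the answer says urllib.request in Python 3 plays the role of urllib2 in Python 2, one way to keep following the book on either version is to try the old import first and fall back to the new one. This is only a sketch under that assumption; the function name download is mine, not the book's:

```python
try:
    # Python 2: urlopen lives in urllib2
    from urllib2 import urlopen
except ImportError:
    # Python 3: urlopen lives in urllib.request
    from urllib.request import urlopen


def download(url):
    """Fetch `url` and return the response body as bytes."""
    response = urlopen(url)
    try:
        return response.read()
    finally:
        # Close the connection even if read() raises.
        response.close()
```

    Calling download('http://example.webscraping.com') would then go through whichever urlopen was imported, without any change to the function body.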

      

    Finally, the code runs like this. But why doesn't "import urllib" followed by "return urllib.request.urlopen(url)" work?

    # -*- coding: utf-8 -*-
    
    from urllib import request
    
    
    def download1(url):
        """Simple downloader"""
        # Before:
        #return urllib.urlopen(url).read()
        # After, using urllib.request instead:
        return request.urlopen(url).read()
    
    download1('http://example.webscraping.com')
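
    As for the question above: as far as I understand it, urllib is a package in Python 3, and a plain "import urllib" only loads the package itself, not its submodules. The name urllib.request exists only once something has imported that submodule, so urllib.request.urlopen can fail with an AttributeError. A sketch of the variant that keeps the fully qualified name, assuming the submodule is imported explicitly:

```python
import urllib.request  # imports the package and its request submodule


def download1(url):
    """Simple downloader: return the page body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()
```

    Both "import urllib.request" and "from urllib import request" load the submodule; they differ only in the name you use afterwards.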
    

      

      

  • Original article: https://www.cnblogs.com/winditsway/p/12557624.html