Source code notes, book: "Web scraping with Python"
1. Trying the first function, but running into errors every time; let me figure out how to fix it.
1.1 code:

import urllib2
from urllib.parse import urlparse

def download1(url):
    """Simple downloader"""
    return urllib2.urlopen(url).read()
    #return urllib.urlopen(url).read()
    #return urllib.urlopen()

download1('http://example.webscraping.com')
I tried the original code using urllib2, but it failed; then I tried urllib3, but that didn't work either.
Then I tried to reinstall "urllib2" with "pip3 install urllib2", which failed again, with the error message below:
>>> import urllib2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'urllib2'
>>>
cor@debian:~/zorktoolkit/usr/local/factory$ pip3 install urllib2
Collecting urllib2
Exception:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/lib/python3/dist-packages/pip/commands/install.py", line 353, in run
    wb.build(autobuilding=True)
  File "/usr/lib/python3/dist-packages/pip/wheel.py", line 749, in build
    self.requirement_set.prepare_files(self.finder)
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 554, in _prepare_file
    require_hashes
  File "/usr/lib/python3/dist-packages/pip/req/req_install.py", line 278, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 465, in find_requirement
    all_candidates = self.find_all_candidates(req.name)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 423, in find_all_candidates
    for page in self._get_pages(url_locations, project_name):
  File "/usr/lib/python3/dist-packages/pip/index.py", line 568, in _get_pages
    page = self._get_page(location)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 683, in _get_page
    return HTMLPage.get_page(link, session=self.session)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 795, in get_page
    resp.raise_for_status()
  File "/usr/share/python-wheels/requests-2.12.4-py2.py3-none-any.whl/requests/models.py", line 893, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/urllib2/
Does this mean I can't install urllib2 with pip3? I have no idea. Let me try apt install as well, but no luck either.
cor@debian:~$ sudo apt-get install urllib2
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package urllib2
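Neither installer can find urllib2, and a quick check from Python itself shows why: the module simply does not exist in this interpreter's standard library. This check is my own sketch, not from the book:

```python
# Check whether a module exists in the current interpreter, without importing it.
import importlib.util
import sys

print(sys.version_info.major)                      # 3 on this machine

# find_spec returns None when no such module can be found anywhere.
print(importlib.util.find_spec('urllib2'))         # None: urllib2 is absent

# The Python 3 replacement, urllib.request, is built in.
print(importlib.util.find_spec('urllib.request'))  # a ModuleSpec object
```

Since `find_spec` only searches for the module instead of importing it, this check works even for modules that would fail to import.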
2. Let's google it.
WARNING: Security researchers have found several poisoned packages on PyPI, including a package named urllib, which will 'phone home' when installed. If you used pip install urllib some time after June 2017, remove that package as soon as possible.
You can't install urllib2, and you don't need to: urllib2 is the name of the library included in Python 2.
You can use the urllib.request library included with Python 3 instead.
The urllib.request library works the same way urllib2 works in Python 2. Because it is already included, you don't need to install it. If you are following a tutorial that tells you to use urllib2, you'll run into more issues: your tutorial was written for Python 2,
not Python 3. Find a different tutorial, or install Python 2.7 and continue your tutorial on that version; urllib2 comes with it. Alternatively, install the requests library for a higher-level, easier-to-use API. It works on both Python 2 and 3.
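Following that advice, the book's downloader translates almost line for line to urllib.request. This is my own sketch of the port; the timeout and error handling are additions of mine, not from the book:

```python
# The book's download1, ported to Python 3's urllib.request (a sketch).
from urllib.request import urlopen
from urllib.error import URLError

def download1(url):
    """Simple downloader: return the page bytes, or None on failure."""
    try:
        # timeout and error handling are my additions, not in the original
        return urlopen(url, timeout=10).read()
    except URLError as e:
        print('Download error:', e.reason)
        return None
```

For example, `download1('http://nonexistent.invalid/')` prints a download error and returns None, because the reserved `.invalid` domain never resolves.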
Finally, the code passes like this. But why doesn't "import urllib" together with "return urllib.request.urlopen(url)" work?
# -*- coding: utf-8 -*-
from urllib import request

def download1(url):
    """Simple downloader"""
    # before
    #return urllib.urlopen(url).read()
    # after, using urllib.request instead; .read() returns the page bytes
    return request.urlopen(url).read()

download1('http://example.webscraping.com')
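To answer the question above: urllib is a package, and a bare `import urllib` only loads the top-level package; in a fresh interpreter its `request` submodule is not imported automatically, so `urllib.request.urlopen` raises an AttributeError. Either import the submodule explicitly or bind it with `from ... import`. A short sketch of the two working patterns:

```python
# Two equivalent ways to make urlopen reachable in Python 3.
import urllib.request        # loads the submodule; urllib.request.urlopen now works
from urllib import request   # binds the same submodule under the name "request"

# Both names refer to the same module object, so either spelling works.
assert urllib.request is request
assert urllib.request.urlopen is request.urlopen
```

So the final code works because `from urllib import request` actually imports the submodule, whereas `import urllib` alone does not.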