Source code from the book "Web Scraping with Python"
1. Trying the first function, but running into errors all the time; let me figure out how to fix it.
1.1 code:
import urllib2
from urllib.parse import urlparse

def download1(url):
    """Simple downloader"""
    return urllib2.urlopen(url).read()
    #return urllib.urlopen(url).read()
    #return urllib.urlopen()

download1('http://example.webscraping.com')
I think I tried the original code using urllib2, but it failed; then I tried urllib3, but that doesn't work either.
Then I tried to reinstall "urllib2" with "pip3 install urllib2", which failed again with the error message below:
>>> import urllib2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'urllib2'
>>>
cor@debian:~/zorktoolkit/usr/local/factory$ pip3 install urllib2
Collecting urllib2
Exception:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/lib/python3/dist-packages/pip/commands/install.py", line 353, in run
    wb.build(autobuilding=True)
  File "/usr/lib/python3/dist-packages/pip/wheel.py", line 749, in build
    self.requirement_set.prepare_files(self.finder)
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 554, in _prepare_file
    require_hashes
  File "/usr/lib/python3/dist-packages/pip/req/req_install.py", line 278, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 465, in find_requirement
    all_candidates = self.find_all_candidates(req.name)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 423, in find_all_candidates
    for page in self._get_pages(url_locations, project_name):
  File "/usr/lib/python3/dist-packages/pip/index.py", line 568, in _get_pages
    page = self._get_page(location)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 683, in _get_page
    return HTMLPage.get_page(link, session=self.session)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 795, in get_page
    resp.raise_for_status()
  File "/usr/share/python-wheels/requests-2.12.4-py2.py3-none-any.whl/requests/models.py", line 893, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/urllib2/
Does this mean I can't install urllib2 with pip3? I have no idea. Let me try apt-get install as well, but no luck:
cor@debian:~$ sudo apt-get install urllib2
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package urllib2
2. Let's google it.
WARNING: Security researchers have found several poisoned packages on PyPI, including a package named urllib, which will 'phone home' when installed. If you ran "pip install urllib" some time after June 2017, remove that package as soon as possible.
You can't install urllib2, and you don't need to. urllib2 is the name of the library included in Python 2. You can use the urllib.request library included with Python 3 instead. The urllib.request library works the same way urllib2 works in Python 2, and because it is already included you don't need to install it.
If you are following a tutorial that tells you to use urllib2, you'll run into more issues: that tutorial was written for Python 2, not Python 3. Find a different tutorial, or install Python 2.7 and continue your tutorial on that version; urllib2 comes with it. Alternatively, install the requests library for a higher-level and easier-to-use API. It works on both Python 2 and 3.
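To see what that answer means in practice, here is a minimal sketch of the book's downloader ported to Python 3's bundled urllib.request module, with basic error handling added; the example URL is the book's demo site, which may no longer be online:

```python
# Python 3 port of the urllib2-based downloader: urllib.request ships
# with Python 3, so nothing needs to be installed.
import urllib.request
import urllib.error

def download(url):
    """Simple downloader: return the page body as bytes, or None on error."""
    try:
        return urllib.request.urlopen(url).read()
    except urllib.error.URLError as e:
        # Covers HTTP errors, DNS failures, refused connections, etc.
        print('Download error:', e)
        return None
```

urllib.request.urlopen here plays the role urllib2.urlopen played in Python 2, so code from the book usually only needs this one substitution.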
Finally, the code passes like this. But why don't "import urllib" and "return urllib.request.urlopen(url)" work together?
# -*- coding: utf-8 -*-
from urllib import request

def download1(url):
    """Simple downloader"""
    # before (Python 2):
    #return urllib.urlopen(url).read()
    # after: use urllib.request, and read() the response body
    return request.urlopen(url).read()

download1('http://example.webscraping.com')
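As for the question above: in Python 3, urllib is a package, and importing a package does not automatically import its submodules, so after a bare `import urllib` the name `urllib.request` is typically not defined until something imports it explicitly. A small sketch that checks this in fresh interpreter processes (assuming a standard CPython install):

```python
# Demonstrate that `import urllib` alone does not make urllib.request
# available; the submodule must be imported explicitly.
import subprocess
import sys

# In a fresh interpreter, accessing urllib.request after only
# `import urllib` typically raises AttributeError.
bad = subprocess.run(
    [sys.executable, "-c", "import urllib; urllib.request"],
    capture_output=True, text=True)
print("bare 'import urllib' failed as expected:", bad.returncode != 0)

# Importing the submodule explicitly works.
good = subprocess.run(
    [sys.executable, "-c", "import urllib.request; urllib.request.urlopen"],
    capture_output=True, text=True)
print("explicit 'import urllib.request' succeeded:", good.returncode == 0)
```

This is why `from urllib import request` (or `import urllib.request`) works while `import urllib` by itself does not.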