Source code from the book "Web Scraping with Python"
1. Trying the first function, but running into errors all the time; let me figure out how to fix it.
1.1 code:
import urllib2
from urllib.parse import urlparse

def download1(url):
    """Simple downloader"""
    return urllib2.urlopen(url).read()
    #return urllib.urlopen(url).read()
    #return urllib.urlopen()

download1('http://example.webscraping.com')
I think I tried the original code using urllib2, but it failed; then I tried urllib3, but that doesn't work either.
Then I tried to reinstall "urllib2" with "pip3 install urllib2", which failed again with the error message below:
>>> import urllib2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'urllib2'
>>>
cor@debian:~/zorktoolkit/usr/local/factory$ pip3 install urllib2
Collecting urllib2
Exception:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/lib/python3/dist-packages/pip/commands/install.py", line 353, in run
    wb.build(autobuilding=True)
  File "/usr/lib/python3/dist-packages/pip/wheel.py", line 749, in build
    self.requirement_set.prepare_files(self.finder)
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 554, in _prepare_file
    require_hashes
  File "/usr/lib/python3/dist-packages/pip/req/req_install.py", line 278, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 465, in find_requirement
    all_candidates = self.find_all_candidates(req.name)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 423, in find_all_candidates
    for page in self._get_pages(url_locations, project_name):
  File "/usr/lib/python3/dist-packages/pip/index.py", line 568, in _get_pages
    page = self._get_page(location)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 683, in _get_page
    return HTMLPage.get_page(link, session=self.session)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 795, in get_page
    resp.raise_for_status()
  File "/usr/share/python-wheels/requests-2.12.4-py2.py3-none-any.whl/requests/models.py", line 893, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/urllib2/
Does this mean I can't install urllib2 with pip3? I have no idea. Let me try apt-get install as well, but no luck:
cor@debian:~$ sudo apt-get install urllib2
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package urllib2
2. Let's google it.
WARNING: Security researchers have found several poisoned packages on PyPI, including a package named urllib, which will 'phone home' when installed. If you ran "pip install urllib" some time after June 2017, remove that package as soon as possible.
You can't install urllib2, and you don't need to. urllib2 is the name of the library included in Python 2. You can use the urllib.request library included with Python 3 instead. The urllib.request library works the same way urllib2 works in Python 2, and because it is already included you don't need to install it.
If you are following a tutorial that tells you to use urllib2, you'll run into more issues: that tutorial was written for Python 2, not Python 3. Find a different tutorial, or install Python 2.7 and continue your tutorial on that version; urllib2 comes with it. Alternatively, install the requests library for a higher-level and easier-to-use API. It works on both Python 2 and 3.
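To see what that answer means in practice, here is a minimal sketch of the book's downloader ported to Python 3's bundled urllib.request module, with basic error handling added; the example URL is the book's demo site, which may no longer be online:

```python
# Python 3 port of the urllib2-based downloader: urllib.request ships
# with Python 3, so nothing needs to be installed.
import urllib.request
import urllib.error

def download(url):
    """Simple downloader: return the page body as bytes, or None on error."""
    try:
        return urllib.request.urlopen(url).read()
    except urllib.error.URLError as e:
        # Covers HTTP errors, DNS failures, refused connections, etc.
        print('Download error:', e)
        return None
```

urllib.request.urlopen here plays the role urllib2.urlopen played in Python 2, so code from the book usually only needs this one substitution.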
Finally, the code passes like this. But why don't "import urllib" and "return urllib.request.urlopen(url)" work together?
# -*- coding: utf-8 -*-
from urllib import request

def download1(url):
    """Simple downloader"""
    # before (Python 2):
    #return urllib.urlopen(url).read()
    # after: use urllib.request, and read() the response body
    return request.urlopen(url).read()

download1('http://example.webscraping.com')
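As for the question above: in Python 3, urllib is a package, and importing a package does not automatically import its submodules, so after a bare `import urllib` the name `urllib.request` is typically not defined until something imports it explicitly. A small sketch that checks this in fresh interpreter processes (assuming a standard CPython install):

```python
# Demonstrate that `import urllib` alone does not make urllib.request
# available; the submodule must be imported explicitly.
import subprocess
import sys

# In a fresh interpreter, accessing urllib.request after only
# `import urllib` typically raises AttributeError.
bad = subprocess.run(
    [sys.executable, "-c", "import urllib; urllib.request"],
    capture_output=True, text=True)
print("bare 'import urllib' failed as expected:", bad.returncode != 0)

# Importing the submodule explicitly works.
good = subprocess.run(
    [sys.executable, "-c", "import urllib.request; urllib.request.urlopen"],
    capture_output=True, text=True)
print("explicit 'import urllib.request' succeeded:", good.returncode == 0)
```

This is why `from urllib import request` (or `import urllib.request`) works while `import urllib` by itself does not.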