  • Pitfalls when getting started with Python web scraping

    1. Environment

    - Python
      The Python that ships preinstalled with macOS

    $ python -V  
    Python 2.7.10
    $ where python
    /usr/bin/python
    $ ls /System/Library/Frameworks/Python.framework/Versions
    2.3     2.5     2.6     2.7     Current
    $ ls /Library/Frameworks/Python.framework/Versions  # directory for user-installed versions
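
The shell checks above can also be done from inside Python, which is useful when an IDE such as PyCharm is configured with a different interpreter than the one on the shell PATH. This is a minimal sketch; `describe_interpreter` is a hypothetical helper name, not something from the original post.

```python
import sys


def describe_interpreter():
    """Return (executable path, 'major.minor.micro') for the running Python."""
    info = sys.version_info
    return sys.executable, "{}.{}.{}".format(info[0], info[1], info[2])


path, version = describe_interpreter()
print("python :", path)
print("version:", version)
```

Running this once from the terminal and once from the IDE shows whether both are using the same installation.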

    - IDE
      PyCharm
    - Tools
      Install pip

    sudo easy_install pip

    - Python libraries

    sudo pip install requests       # installs requests 2.13.0 by default
    sudo pip install BeautifulSoup  # installs BeautifulSoup 3.2.1 by default
    sudo pip install lxml           # installs lxml 3.7.3 by default
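
Since the version that pip actually installed matters for the problems below, it can help to print it. The sketch below assumes Python 3.8 or newer (for `importlib.metadata`), not the Python 2.7 used in the post; `installed_version` is a hypothetical helper. On PyPI, the distribution named `BeautifulSoup` is the old 3.x line, while 4.x is published as `beautifulsoup4`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names as used on PyPI; both BeautifulSoup lines are listed.
for name in ("requests", "BeautifulSoup", "beautifulsoup4", "lxml"):
    print(name, installed_version(name) or "not installed")
```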

    2. Problems

    - Problem 1

    Code:
    soup = BeautifulSoup(html, 'lxml')
    Error:
    Traceback (most recent call last):
      File "/Users/cuizhenyu/Documents/Codes/Python/DownloadMeitu/LibBeautifulSoupTest.py", line 15, in <module>
        soup = BeautifulSoup(html)  # soup = BeautifulSoup(html, 'lxml') raises an error
    TypeError: 'module' object is not callable
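    Fix:
    The name BeautifulSoup was bound to the module (as happens after "import BeautifulSoup"), not to the class inside it. Import the class itself:
    from BeautifulSoup import BeautifulSoup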
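
The same mistake can be reproduced without BeautifulSoup installed. The sketch below uses the standard library's `datetime` as a stand-in, because it also has a module and a class with the same name; the alias `datetime_module` and the helper `call_module_by_mistake` are illustrative names only.

```python
import datetime as datetime_module  # binds the MODULE
from datetime import datetime       # binds the CLASS of the same name


def call_module_by_mistake():
    """Call the module as if it were the class and return the error text."""
    try:
        datetime_module(2020, 1, 1)
    except TypeError as exc:
        return str(exc)
    return None


message = call_module_by_mistake()
print(message)           # the text contains: 'module' object is not callable

moment = datetime(2020, 1, 1)  # calling the class works
print(moment.year)
```

Whenever a package and its main class share a name, "import X" gives the module and "from X import X" gives the class.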

    - Problem 2

    Code:
    soup = BeautifulSoup(html, 'lxml')
    Error:
    Traceback (most recent call last):
      File "/Users/cuizhenyu/Documents/Codes/Python/DownloadMeitu/LibBeautifulSoupTest.py", line 15, in <module>
        soup = BeautifulSoup(html, 'lxml')  # soup = BeautifulSoup(html, 'lxml') raises an error
      File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1522, in __init__
        BeautifulStoneSoup.__init__(self, *args, **kwargs)
      File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1147, in __init__
        self._feed(isHTML=isHTML)
      File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1189, in _feed
        SGMLParser.feed(self, markup)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 104, in feed
        self.goahead(0)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 138, in goahead
        k = self.parse_starttag(i)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 296, in parse_starttag
        self.finish_starttag(tag, attrs)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 338, in finish_starttag
        self.unknown_starttag(tag, attrs)
      File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1338, in unknown_starttag
        self.endData()
      File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1251, in endData
        (not self.parseOnlyThese.text or
    AttributeError: 'str' object has no attribute 'text'
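    Fix:
    The installed BeautifulSoup is version 3, which does not support lxml or other pluggable parsers. In version 3 the second positional argument is parseOnlyThese, so the string 'lxml' ends up there, which is why the traceback fails on parseOnlyThese.text. Switch to version 4 (pip install beautifulsoup4, then from bs4 import BeautifulSoup).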
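
A minimal sketch of the same call under BeautifulSoup 4, assuming Python 3 and the `beautifulsoup4` package. The helpers `choose_parser` and `extract_links` and the sample HTML are illustrative, not from the original post. The parser falls back to the standard library's `html.parser` when lxml is not installed.

```python
import importlib.util


def choose_parser():
    """Prefer lxml when it is installed; otherwise use the stdlib parser."""
    return "lxml" if importlib.util.find_spec("lxml") else "html.parser"


def extract_links(html):
    """Parse html with BeautifulSoup 4 and return every href found on <a> tags."""
    # Imported here so a missing package gives a hint instead of a bare ImportError.
    try:
        from bs4 import BeautifulSoup  # version 4 lives in the bs4 package
    except ImportError:
        raise RuntimeError("BeautifulSoup 4 is missing; try: pip install beautifulsoup4")
    soup = BeautifulSoup(html, choose_parser())
    return [a["href"] for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    sample = '<p><a href="a.jpg">first</a><a>no link</a></p>'
    print(extract_links(sample))
```

In version 4 the second argument is the parser name, so passing 'lxml' there is valid, unlike in version 3.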

     

  • Original article: https://www.cnblogs.com/mulisheng/p/6665350.html