  • Adding the BeautifulSoup4 module to Qemu aarch32

    Environment

    Qemu: 2.8.0
    Board: vexpress-ca9
     

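    Overview

    The previous post got our board to ping Baidu successfully. Python is said to have strong networking support, and Beautiful Soup is a Python library whose main purpose is extracting data from web pages. It is not part of the standard library, so it has to be installed separately.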
     

    Main text

    I. First, try Python's built-in urllib library
    net.py3: the Python 3 version
    #!/usr/bin/env python3

    from urllib.request import urlopen
    html = urlopen("http://www.pythonscraping.com/pages/page1.html");
    print(html.read())
    net.py2: the Python 2 version
    #!/usr/bin/env python2

    from urllib2 import urlopen
    html = urlopen("http://www.pythonscraping.com/pages/page1.html");
    print(html.read())
    Let's run them and look at the results:
    [root@vexpress ~]# ./net.py3 
    b'<html>
    <head>
    <title>A Useful Page</title>
    </head>
    <body>
    <h1>An Interesting Title</h1>
    <div>
    Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
    </div>
    </body>
    </html>
    '
    [root@vexpress ~]# 
    [root@vexpress ~]# ./net.py2
    <html>
    <head>
    <title>A Useful Page</title>
    </head>
    <body>
    <h1>An Interesting Title</h1>
    <div>
    Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
    </div>
    </body>
    </html>
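
    The Python 3 run prints a b'...' literal because read() returns bytes there, while Python 2 returns a plain str. The following is a minimal sketch, assuming Python 3 and a UTF-8 page, of fetching the page as text and surviving a network failure. The function name fetch_text and its opener parameter are hypothetical, introduced only for illustration:

```python
from urllib.request import urlopen


def fetch_text(url, opener=urlopen, timeout=10, encoding="utf-8"):
    """Return the page body as str, or None if the request fails."""
    try:
        # urlopen() returns a response object usable as a context manager.
        with opener(url, timeout=timeout) as response:
            raw = response.read()  # bytes in Python 3
    except OSError as err:
        # URLError and socket timeouts are both OSError subclasses.
        print("request failed:", err)
        return None
    # Decode bytes to text; undecodable bytes are replaced instead of raising.
    return raw.decode(encoding, errors="replace")
```

    Calling fetch_text("http://www.pythonscraping.com/pages/page1.html") on the board should then give the page as text rather than as a bytes literal.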

    Python ships a tool, 2to3, that converts Python 2 code to Python 3. Let's try it on the board.

    Running it, however, reports that 2to3 cannot be found:
    [root@vexpress ~]# 2to3 net.py2 
    -/bin/sh: 2to3: not found
    Yet the which command shows that 2to3 does exist:
    [root@vexpress ~]# which 2to3
    /usr/bin/2to3
    Opening /usr/bin/2to3 reveals the problem:
    #!/home/pengdonglin/src/qemu/python_cross_compile/Python2/aarch32/bin/python2.7

    import sys
    from lib2to3.main import main
    sys.exit(main("lib2to3.fixes"))

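    The problem is in the first line: the shebang points at the interpreter path from the cross-compilation host, which does not exist on the board, so the shell reports "not found". Change it as follows:

    #!/usr/bin/env python2

    import sys
    from lib2to3.main import main
    sys.exit(main("lib2to3.fixes"))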
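
    Other scripts installed by the cross-compiled Python may carry the same stale shebang. The following is a minimal sketch, assuming Python 3 is available on the board, that lists the scripts in a directory whose shebang interpreter does not exist. The function name stale_shebangs is hypothetical:

```python
import os


def stale_shebangs(directory):
    """Return (script, interpreter) pairs whose '#!' interpreter is missing."""
    stale = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path, "rb") as handle:
                first = handle.readline()
        except OSError:
            continue  # unreadable file, skip it
        if not first.startswith(b"#!"):
            continue  # no shebang on the first line
        parts = first[2:].decode("utf-8", errors="replace").split()
        # parts[0] is the interpreter path, e.g. /usr/bin/env
        if parts and not os.path.exists(parts[0]):
            stale.append((path, parts[0]))
    return stale
```

    Before the fix, stale_shebangs("/usr/bin") would be expected to include /usr/bin/2to3 in its result.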
    With the shebang fixed, run 2to3 again:
    [root@vexpress ~]# 2to3 net.py2 
    RefactoringTool: Skipping optional fixer: buffer
    RefactoringTool: Skipping optional fixer: idioms
    RefactoringTool: Skipping optional fixer: set_literal
    RefactoringTool: Skipping optional fixer: ws_comma
    RefactoringTool: Refactored net.py2
    --- net.py2    (original)
    +++ net.py2    (refactored)
    @@ -1,7 +1,7 @@
     #!/usr/bin/env python2
     
    -from urllib2 import urlopen
    +from urllib.request import urlopen
     
     html = urlopen("http://www.pythonscraping.com/pages/page1.html");
     
    -print(html.read())
    +print((html.read()))
    RefactoringTool: Files that need to be modified:
    RefactoringTool: net.py2

    Lines starting with + are the Python 3 versions. The following command writes the converted file out automatically (-w writes the changes, -n skips the backup file, and -o sets the output directory):

    [root@vexpress ~]# 2to3 net.py2 -w -n -o /tmp/
    lib2to3.main: Output in '/tmp/' will mirror the input directory '' layout.
    RefactoringTool: Skipping optional fixer: buffer
    RefactoringTool: Skipping optional fixer: idioms
    RefactoringTool: Skipping optional fixer: set_literal
    RefactoringTool: Skipping optional fixer: ws_comma
    RefactoringTool: Refactored net.py2
    --- net.py2    (original)
    +++ net.py2    (refactored)
    @@ -1,6 +1,6 @@
     #!/usr/bin/env python2
     
    -from urllib2 import urlopen
    +from urllib.request import urlopen
     
     html = urlopen("http://www.pythonscraping.com/pages/page1.html");
     
    RefactoringTool: Writing converted net.py2 to /tmp/net.py2.
    RefactoringTool: Files that were modified:
    RefactoringTool: net.py2

    /tmp/net.py2 now holds the Python 3 version:

    [root@vexpress ~]# cat /tmp/net.py2 

    #!/usr/bin/env python2
    from urllib.request import urlopen
    html = urlopen("http://www.pythonscraping.com/pages/page1.html");
    print((html.read()))
    Of course, the first line still has to be changed by hand.
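
    That manual edit of the first line can also be scripted. The following is a small sketch, not part of 2to3 itself, that rewrites the shebang of a converted source string. The function name retarget_shebang and the default interpreter name are assumptions made for illustration:

```python
def retarget_shebang(source, interpreter="python3"):
    """Return source with its first-line '#!' replaced to use `interpreter`."""
    lines = source.splitlines(keepends=True)
    new_first = "#!/usr/bin/env {}\n".format(interpreter)
    if lines and lines[0].startswith("#!"):
        lines[0] = new_first      # replace the existing shebang
    else:
        lines.insert(0, new_first)  # no shebang yet, add one
    return "".join(lines)
```

    One possible use is to read /tmp/net.py2, pass its contents through retarget_shebang, and write the result to a new file such as net.py3.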
     
    II. Adding the BeautifulSoup4 module
    Because this module is implemented in pure Python, it can be installed on the PC first and then copied to the board; pure Python code does not depend on the target platform.
    1. Install BeautifulSoup4 on the PC
    sudo apt-get install python-pip
    sudo apt-get install python3-pip
    sudo apt-get install python-bs4
    sudo pip install beautifulsoup4
    sudo pip3 install beautifulsoup4
    The bs4 directory then shows up under dist-packages of the corresponding Python version:
    $ls /usr/lib/python2.7/dist-packages/bs4 
    builder/  dammit.py  dammit.pyc  diagnose.py  diagnose.pyc  element.py  element.pyc  __init__.py  __init__.pyc  testing.py  testing.pyc  tests/
    $ls /usr/lib/python3/dist-packages/bs4/
    builder/  dammit.py  diagnose.py  element.py  __init__.py  __pycache__/  testing.py  tests/ 
    Sometimes it is installed under site-packages instead. Copy the two bs4 directories to the shared directory:
    $cp /usr/lib/python2.7/dist-packages/bs4 /nfsroot/bs4_python2 -raf
    $cp /usr/lib/python3/dist-packages/bs4 /nfsroot/bs4_python3 -raf
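
    Since the install location varies between dist-packages and site-packages, it can help to ask the interpreter where it would load the package from. The following is a minimal sketch for Python 3 only (importlib.util.find_spec is not available in Python 2). The helper name package_dir is hypothetical:

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None  # not installed for this interpreter
    return os.path.dirname(spec.origin)


# Prints None if bs4 is not installed for this interpreter.
print(package_dir("bs4"))
```

    The printed directory is the one to copy to the shared directory for that Python version.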
    If you run into problems, you can also install from source. Download the latest BeautifulSoup4 release (I downloaded https://www.crummy.com/software/BeautifulSoup/bs4/download/4.5/beautifulsoup4-4.5.3.tar.gz), then unpack it:
    $tar -xf beautifulsoup4-4.5.3.tar.gz 
    $ls beautifulsoup4-4.5.3
    AUTHORS.txt  beautifulsoup4.egg-info/  bs4/  convert-py3k*  COPYING.txt  doc/  doc.zh/  MANIFEST.in  NEWS.txt  PKG-INFO  README.txt  scripts/  setup.cfg  setup.py  test-all-versions*  TODO.txt
    The bs4 directory at the top level is for Python 2; the convert-py3k tool then generates the Python 3 version:
    cd beautifulsoup4-4.5.3/
    ./convert-py3k

    The bs4 under the py3k directory is for Python 3. Copy the two bs4 directories to the shared directory:

    $cp -raf bs4 /nfsroot/bs4_python2
    $cp -raf py3k/bs4 /nfsroot/bs4_python3
    A copy should also be placed on the PC:
    sudo cp -raf bs4 /usr/local/lib/python2.7/site-packages/
    sudo cp -raf py3k/bs4 /usr/local/lib/python3.6/site-packages/
    2. Put the matching bs4 version on the board
    [root@vexpress ~]# mount -t nfs -o nolock 192.168.1.100:/nfsroot /mnt
    [root@vexpress ~]# cp -raf /mnt/bs4_python2 /usr/lib/python2.7/site-packages/bs4
    [root@vexpress ~]# cp -raf /mnt/bs4_python3/ /usr/lib/python3.6/site-packages/bs4
    To verify that it works, run import bs4:
    [root@vexpress ~]# python2
    Python 2.7.13 (default, Mar 24 2017, 17:04:57) 
    [GCC 4.8.3 20140320 (prerelease)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import bs4
    >>> 
    [root@vexpress ~]# python3
    Python 3.6.0 (default, Mar 24 2017, 17:02:49) 
    [GCC 4.8.3 20140320 (prerelease)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import bs4
    >>> 

    If the import raises no error, everything is working.
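
    If the import does fail, one likely cause is that bs4 was copied to a directory the interpreter does not search. The following is a small sketch that lists the site-packages and dist-packages entries on sys.path, so the copy destination can be checked against them. The function name package_search_dirs is hypothetical:

```python
import sys


def package_search_dirs(paths=None):
    """Return sys.path entries that are site-packages or dist-packages dirs."""
    if paths is None:
        paths = sys.path
    wanted = ("site-packages", "dist-packages")
    # str.endswith accepts a tuple of suffixes; strip a trailing slash first.
    return [p for p in paths if p.rstrip("/").endswith(wanted)]


for directory in package_search_dirs():
    print(directory)
```

    On the board, the directory used in the cp command (for example /usr/lib/python3.6/site-packages) should appear in this list for the matching interpreter.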

    3. Write a test program
    bs4.py3: the Python 3 version
    #!/usr/bin/env python3

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    html = urlopen("http://www.pythonscraping.com/pages/page1.html")
    bsObj = BeautifulSoup(html.read(), "html.parser")
    print(bsObj.h1)
    bs4.py2: the Python 2 version
    #!/usr/bin/env python2

    from urllib2 import urlopen
    from bs4 import BeautifulSoup
    html = urlopen("http://www.pythonscraping.com/pages/page1.html")
    bsObj = BeautifulSoup(html.read(), "html.parser")
    print(bsObj.h1)
    Run them:
    [root@vexpress ~]# ./bs4.py3
    <h1>An Interesting Title</h1>
    [root@vexpress ~]# ./bs4.py2
    <h1>An Interesting Title</h1>
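
    The "html.parser" argument in the test programs selects the parser built on Python's standard html.parser module. For comparison, the following sketch extracts the same heading using only that standard-library module, without BeautifulSoup, from an inline HTML string instead of a network request. The class name H1Extractor and the sample page string are made up for illustration:

```python
from html.parser import HTMLParser


class H1Extractor(HTMLParser):
    """Collect the text of every <h1> element."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
            self.titles.append("")  # start a new heading

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.titles[-1] += data  # accumulate text inside <h1>


page = "<html><body><h1>An Interesting Title</h1><div>text</div></body></html>"
parser = H1Extractor()
parser.feed(page)
print(parser.titles)  # ['An Interesting Title']
```

    This takes noticeably more code than bsObj.h1, which is the convenience BeautifulSoup adds on top of the parser.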
     
    End.
  • Original article: https://www.cnblogs.com/pengdonglin137/p/6812620.html