Python Basics Tutorial: Study Notes, Part 15


Python and the World Wide Web

1 Screen scraping

Extracting information with urllib and re

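from urllib.request import urlopen
import re

# Capture the link target and the link text of each job heading
p = re.compile('<h3><a .*?><a .*? href="(.*?)">(.*?)</a>')
# urlopen() returns bytes in Python 3, so decode before matching
text = urlopen('http://python.org/community/jobs').read().decode('utf-8')
for url, name in p.findall(text):
    print('%s (%s)' % (name, url))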
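The same pattern can be tried without a network connection. The following is a minimal sketch: the SAMPLE markup and the extract_jobs helper are made up for illustration, and the layout of the real page may differ.

```python
import re

# Hypothetical sample markup; the real page layout may differ.
SAMPLE = (
    '<h3><a name="job1"></a><a class="ref" href="http://example.com/job1">'
    'Python Developer</a></h3>\n'
    '<h3><a name="job2"></a><a class="ref" href="http://example.com/job2">'
    'Data Engineer</a></h3>\n'
)

# Same pattern as in the notes: group 1 is the URL, group 2 the link text.
pattern = re.compile('<h3><a .*?><a .*? href="(.*?)">(.*?)</a>')


def extract_jobs(html):
    """Return a list of (name, url) pairs found in the markup."""
    return [(name, url) for url, name in pattern.findall(html)]


for name, url in extract_jobs(SAMPLE):
    print('%s (%s)' % (name, url))
```

Because the pattern depends on the exact tag order, even a small change in the markup makes it miss entries, which is the motivation for the parser-based approaches that follow.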

Tidy and XHTML parsing

Tidy is a tool for fixing ill-formed, sloppy HTML.

# Use tidy to fix up HTML
from subprocess import Popen, PIPE

# Read and write bytes, since the pipes are binary in Python 3
text = open('messy.html', 'rb').read()
tidy = Popen('tidy', stdin=PIPE, stdout=PIPE, stderr=PIPE)
tidy.stdin.write(text)
tidy.stdin.close()
print(tidy.stdout.read().decode())
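Writing to stdin and then reading stdout by hand can block if tidy produces a lot of output on either pipe. A possible variant, sketched here, uses communicate() instead. The tidy_html helper name is hypothetical, and the sketch assumes the tidy executable may or may not be installed.

```python
import shutil
from subprocess import Popen, PIPE


def tidy_html(html_bytes):
    """Run tidy on html_bytes; return its output, or None if tidy is missing."""
    if shutil.which('tidy') is None:
        return None
    proc = Popen(['tidy'], stdin=PIPE, stdout=PIPE, stderr=PIPE)
    # communicate() writes the input, closes stdin, and drains both pipes.
    out, _err = proc.communicate(html_bytes)
    return out


result = tidy_html(b'<h1>Pet Shop<h2>Complaints</h3>')
if result is None:
    print('tidy is not installed')
else:
    print(result.decode('utf-8', errors='replace'))
```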

Parsing HTML with HTMLParser

# A screen-scraping program using the html.parser module
from urllib.request import urlopen
from html.parser import HTMLParser

class Scraper(HTMLParser):
    in_h3 = False
    in_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'h3':
            self.in_h3 = True
        if tag == 'a' and 'href' in attrs:
            self.in_link = True
            self.chunks = []
            self.url = attrs['href']

    def handle_data(self, data):
        if self.in_link:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == 'h3':
            self.in_h3 = False
        if tag == 'a':
            if self.in_h3 and self.in_link:
                print('%s (%s)' % (''.join(self.chunks), self.url))
            self.in_link = False

# urlopen() returns bytes in Python 3, so decode before feeding the parser
text = urlopen('http://python.org/community/jobs').read().decode('utf-8')
parser = Scraper()
parser.feed(text)
parser.close()
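To check the parser logic without fetching a page, one option is a variant that collects its results in a list instead of printing them. This is only a sketch: the CollectingScraper class, the results attribute, and the SAMPLE markup are illustrative names that do not appear in the notes.

```python
from html.parser import HTMLParser


class CollectingScraper(HTMLParser):
    """Collect (text, url) pairs for links that appear inside <h3> tags."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.in_link = False
        self.chunks = []
        self.url = None
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'h3':
            self.in_h3 = True
        if tag == 'a' and 'href' in attrs:
            self.in_link = True
            self.chunks = []
            self.url = attrs['href']

    def handle_data(self, data):
        # Text may arrive in several pieces, so gather it until </a>.
        if self.in_link:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == 'h3':
            self.in_h3 = False
        if tag == 'a':
            if self.in_h3 and self.in_link:
                self.results.append((''.join(self.chunks), self.url))
            self.in_link = False


# Hypothetical markup: one link inside a heading, one outside.
SAMPLE = (
    '<h3><a name="j1"></a><a href="http://example.com/j1">'
    'Python <b>Dev</b></a></h3>'
    '<p><a href="/x">skip</a></p>'
)

parser = CollectingScraper()
parser.feed(SAMPLE)
parser.close()
print(parser.results)
```

Only the link inside the heading is collected, and nested markup such as the bold tag does not break the link text, which is harder to achieve with a regular expression.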

Beautiful Soup is used to parse and inspect ill-formed HTML.

2 Creating dynamic web pages with CGI

CGI stands for Common Gateway Interface.

A Prepare the web server

B Add the pound-bang (shebang) line

Linux:

#!/usr/bin/env python

or

#!/usr/bin/python

Windows:

#!c:\python32\python.exe

C Set the file permissions

On Linux the permissions must be set, for example:

chmod 755 someScript.cgi
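The same permission bits can also be set from Python, which may be convenient in a deployment script. The sketch below uses a temporary file as a stand-in for the real script. The make_executable helper is a hypothetical name, and the resulting mode is only meaningful on POSIX systems such as Linux.

```python
import os
import stat
import tempfile


def make_executable(path):
    """Apply mode 755 (rwxr-xr-x) and return the resulting permission bits."""
    os.chmod(path, 0o755)
    return stat.S_IMODE(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the real CGI script.
    script = os.path.join(tmp, 'someScript.cgi')
    with open(script, 'w') as f:
        f.write('#!/usr/bin/env python\n')
    mode = make_executable(script)
    print(oct(mode))
```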

A simple CGI script

#!D:\Python32\python.exe
print('Content-type: text/html')
print()  # print a blank line to end the headers
print('hello world!')
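What matters in the script above is the shape of its output: a header line, an empty line, and then the body. A small sketch makes that structure explicit. The build_cgi_response helper is a hypothetical name used only for illustration.

```python
def build_cgi_response(body, content_type='text/html'):
    """Return the text a CGI script writes to standard output."""
    header = 'Content-type: %s' % content_type
    # An empty line separates the headers from the body.
    return header + '\n\n' + body + '\n'


print(build_cgi_response('hello world!'), end='')
```

If the empty line is left out, the server has no way to tell where the headers end, and it will typically report an error instead of showing the page.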

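This program was tested under Tomcat, where CGI support must first be enabled with the following changes.

Configuration:

Edit conf/web.xml and uncomment the following two blocks: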



    <servlet>
        <servlet-name>cgi</servlet-name>
        <servlet-class>org.apache.catalina.servlets.CGIServlet</servlet-class>
        <init-param>
          <param-name>debug</param-name>
          <param-value>0</param-value>
        </init-param>
        <init-param>
          <param-name>cgiPathPrefix</param-name>
          <param-value>WEB-INF/cgi</param-value>
        </init-param>
        <load-on-startup>5</load-on-startup>
    </servlet>

    <servlet-mapping>
        <servlet-name>cgi</servlet-name>
        <url-pattern>/cgi-bin/*</url-pattern>
    </servlet-mapping>

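Edit conf/context.xml and add the privileged attribute:

<Context privileged="true">...</Context>

Place the CGI program in the WEB-INF/cgi directory.

On Linux, make sure the CGI program has execute permission.

Restart the Tomcat server.

Access the program at http://localhost:8089/cgi-bin/somescript.cgi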

Debugging with cgitb

#!D:\Python32\python.exe
# Use cgitb for debugging; turn it off once development is finished
import cgitb
cgitb.enable()
print('Content-type: text/html')
print()  # print a blank line to end the headers
print(1/0)
print('hello world!')

Instead of the greeting, the page displays a formatted traceback for the ZeroDivisionError raised by 1/0.

Using the cgi module

HTML forms pass key-value pairs to a CGI script; the cgi module's FieldStorage class retrieves these fields inside the script:

form=cgi.FieldStorage()

name=form['name'].value

Example code:

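#!D:\Python32\python.exe
# Use cgitb for debugging; turn it off once development is finished
import cgi
import cgitb
cgitb.enable()
# Get the form value, falling back to 'world' when no name is supplied
form = cgi.FieldStorage()
name = form.getvalue('name', 'world')
print('Content-type: text/html')
print()  # print a blank line to end the headers
#print(1/0)
print('hello, %s!' % name)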

You can test it directly with a GET request:

http://localhost:8089/cgi-bin/somescript.cgi?name=retacn
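For a GET request like this one, the server passes the part of the URL after the question mark to the script in the QUERY_STRING environment variable. The sketch below shows roughly what looking up a field with a default amounts to, using urllib.parse. The get_form_value helper is a hypothetical name, and FieldStorage itself handles more cases, such as POST bodies and file uploads. Note also that the cgi and cgitb modules were removed from the standard library in Python 3.13, so on recent interpreters something along these lines is needed anyway.

```python
from urllib.parse import parse_qs


def get_form_value(query_string, key, default=None):
    """Return the first value for key in a URL query string, or default."""
    fields = parse_qs(query_string)  # e.g. {'name': ['retacn']}
    values = fields.get(key)
    return values[0] if values else default


# In a real CGI script the string would come from os.environ['QUERY_STRING'].
print(get_form_value('name=retacn', 'name', 'world'))
print(get_form_value('', 'name', 'world'))
```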

A simple form

Example code:

#!D:\Python32\python.exe
# A greeting page with a form
import cgi
form = cgi.FieldStorage()
name = form.getvalue('name', 'world')
# The blank line after the header line ends the headers
print("""Content-type: text/html

<html>
<head>
<title>Greeting Page</title>
</head>
<body>
<h1>Hello,%s!</h1>
<form action='formTest.cgi'>
Change name<input type='text' name='name'>
<input type='submit'>
</form>
</body>
</html>
""" % name)

Running it displays a greeting page with a form for changing the name.
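Since the name comes straight from the request and is placed into the page unchanged, it is worth escaping it first. The following is a sketch of that precaution using html.escape from the standard library. The render_greeting helper and the shortened PAGE template are illustrative and not part of the original script.

```python
import html

# Shortened version of the page template, without the form.
PAGE = """<html>
<head>
<title>Greeting Page</title>
</head>
<body>
<h1>Hello,%s!</h1>
</body>
</html>
"""


def render_greeting(name):
    """Fill the template, escaping characters that are special in HTML."""
    return PAGE % html.escape(name)


print(render_greeting('<b>retacn</b>'))
```

With the escaping in place, markup typed into the form is shown as text rather than interpreted by the browser.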

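mod_python

mod_python is an extension to the Apache web server that makes the Python interpreter part of Apache.

With mod_python you can reach into Apache's internals.

Built-in web handlers:

CGI handler

PSP handler

Publisher handler

Installing mod_python

CGI handler

PSP

Publisher

Web application frameworks

Web services: scraping done right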


Original source: https://www.cnblogs.com/retacn-yue/p/6194196.html