  • A small Python example: fetching the Sina homepage

    Reference

    Liao Xuefeng's Python tutorial: http://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000/001386832653051fd44e44e4f9e4ed08f3e5a5ab550358d000

    Code:

    #!/usr/bin/env python3

    import socket

    # Create a TCP socket
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Connect to the Sina web server on the HTTP port
    s.connect(('www.sina.com.cn', 80))
    # Send the request: each header line ends with \r\n,
    # and an empty line marks the end of the request
    s.sendall(b'GET / HTTP/1.1\r\nHost: www.sina.com.cn\r\nConnection: close\r\n\r\n')
    # Receive data until the server closes the connection
    buffer = []
    while True:
        # Read at most 1 KB per call
        d = s.recv(1024)
        if d:
            buffer.append(d)
        else:
            break
    data = b''.join(buffer)
    # Close the socket
    s.close()
    # Split the response headers from the body at the first blank line
    header, html = data.split(b'\r\n\r\n', 1)
    print(header.decode('utf-8'))
    # Write the received body to a file
    with open('sina.html', 'wb') as f:
        f.write(html)

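    The script imitates a browser visiting a web server: it sends an HTTP request over a raw TCP socket and retrieves the response the server returns.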
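
The returned data can be inspected a little further than splitting it in two. The following is a minimal sketch, not part of the original post, of how the raw bytes might be broken into a status code, a header dictionary, and a body. The function name `parse_response` and the sample response are hypothetical, chosen only for illustration, and the sketch assumes an un-chunked HTTP/1.x response.

```python
def parse_response(raw):
    """Split a raw HTTP/1.x response into (status_code, headers, body).

    Hypothetical helper for illustration. It does not decode chunked
    transfer encoding, and a repeated header name keeps only the last value.
    """
    # Headers and body are separated by the first empty line
    head, _, body = raw.partition(b'\r\n\r\n')
    lines = head.decode('iso-8859-1').split('\r\n')

    # The status line looks like: HTTP/1.1 200 OK
    version, status, *reason = lines[0].split(' ', 2)

    headers = {}
    for line in lines[1:]:
        # Split at the first colon only, since values may contain colons
        name, _, value = line.partition(':')
        headers[name.strip().lower()] = value.strip()

    return int(status), headers, body


if __name__ == '__main__':
    # Made-up response used in place of real network data
    sample = (b'HTTP/1.1 200 OK\r\n'
              b'Content-Type: text/html\r\n'
              b'Connection: close\r\n'
              b'\r\n'
              b'<html>hello</html>')
    status, headers, body = parse_response(sample)
    print(status, headers['content-type'], body)
```

In the script above, `data` could be passed to such a function in place of the plain `split` call. If the server answers with `Transfer-Encoding: chunked`, the body still contains chunk-size lines and would need further decoding before being saved.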

  • Original article: https://www.cnblogs.com/lit10050528/p/4070003.html