  • A simple program for scraping posts from Qiushibaike

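    I have been reading Qiushibaike since 2008, and once I bought a smartphone I switched to reading it on my phone. The Qiushibaike site has ads at both the top and the bottom of every page. I only want to read the posts, not the ads, and skipping them also saves some mobile data. So I wondered whether I could write a program that fetches the posts and strips out everything else, and ended up with the code below. I hope the Qiushibaike folks won't hold it against me; I was only experimenting.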

    Front-end file:

    <%@ Page Language="C#" AutoEventWireup="true" CodeBehind="Default.aspx.cs" Inherits="WebTest._Default" EnableViewState="false" %>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head runat="server">
      <meta name="viewport" content="width=device-width, initial-scale=1.0" />
      <title>糗事百科</title>
      <style type="text/css">
        body{margin:5px;font:12px arial,simsun;background:#fff;}
        img{border:none;}
        a{text-decoration:none;}
        .qiushi{margin:5px 0;padding:10px;border-bottom:1px solid #ece5d8;}
      </style>
    </head>
    <body><form id="bodyForm" runat="server"></form></body></html>

    Code-behind:

    protected void Page_Load(object sender, EventArgs e)
    {
        // Base address of the Qiushibaike WAP site; the optional "param" query value is appended to it as the path.
        string URI = "http://wap3.qiushibaike.com";
        string pageInfo = (Request.QueryString["param"] ?? string.Empty).Trim();
        URI = URI + pageInfo;

        bodyForm.InnerHtml = Server.HtmlDecode(getQiushi(URI));
    }

    The getQiushi method:

    // Requires: using System.IO; using System.Net; using System.Text; using System.Text.RegularExpressions;
    private string getQiushi(string URI)
    {
        // Download the raw HTML; the using blocks release the response and the reader even if reading fails.
        string resultstring;
        WebRequest request = WebRequest.Create(URI);
        using (WebResponse result = request.GetResponse())
        using (StreamReader sr = new StreamReader(result.GetResponseStream()))
        {
            resultstring = sr.ReadToEnd();
        }
        StringBuilder responseString = new StringBuilder();

        Regex regContent = new Regex("<div class=\"qiushi\">(?<content>[\\s\\S]+?)</div>");   // matches one post
        Regex regComment = new Regex("<p class=\"vote\">(?<content>[\\s\\S]+?)</p>");         // matches the comment block
        Regex regUserInfo = new Regex("<p class=\"user\">(?<content>[\\s\\S]+?)</p>");        // matches the poster info
        Regex regLinks = new Regex("(href=\")(/[^\"\\s]*)(\")");                              // matches site-relative links
        Regex regPrevPage = new Regex("<a href=\"([^\"]*)\">上一页</a>");                      // matches the "previous page" link
        Regex regNextPage = new Regex("<a href=\"([^\"]*)\">下一页</a>");                      // matches the "next page" link
        Regex regBlankLine = new Regex(@"[\r\n]");                                            // matches line breaks
        MatchCollection mcContent = regContent.Matches(resultstring);
        Match mcPrevPage = regPrevPage.Match(resultstring);
        Match mcNextPage = regNextPage.Match(resultstring);
        // Point the paging links back at this page through the "param" query value;
        // emit nothing when the source page has no such link (for example, on the first page).
        string prevPage = mcPrevPage.Success ? "<a href=\"?param=" + mcPrevPage.Groups[1].Value + "\">上一页</a>&nbsp;&nbsp;" : string.Empty;
        string nextPage = mcNextPage.Success ? "<a href=\"?param=" + mcNextPage.Groups[1].Value + "\">下一页</a>" : string.Empty;

        foreach (Match m in mcContent)
        {
            string content = m.ToString();
            content = Regex.Replace(content, regComment.ToString(), "", RegexOptions.IgnoreCase);    // drop the comment block
            content = Regex.Replace(content, regUserInfo.ToString(), "", RegexOptions.IgnoreCase);   // drop the poster info
            content = Regex.Replace(content, regLinks.ToString(), "href=\"?param=$2\"", RegexOptions.IgnoreCase);   // route links through this page
            content = Regex.Replace(content, regBlankLine.ToString(), "", RegexOptions.IgnoreCase);  // collapse line breaks

            responseString.Append(content);
        }

        responseString.Append("<div style=\"text-align:center\">" + prevPage);
        responseString.Append(nextPage + "</div>");

        return responseString.ToString();
    }
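
The cleanup steps inside getQiushi can be tried without touching the network. The following is only a minimal sketch: the class name `QiushiExtractor` and the method `ExtractPosts` are hypothetical, and the patterns simply mirror the ones in the listing above, so they assume the site still wraps each post in `<div class="qiushi">` with `vote` and `user` paragraphs inside.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: applies the same regex-based cleanup as getQiushi to an in-memory HTML string.
public static class QiushiExtractor
{
    // One post: everything from the opening qiushi div to the first closing </div>.
    private static readonly Regex Post = new Regex("<div class=\"qiushi\">[\\s\\S]+?</div>");
    // Blocks to strip from each post.
    private static readonly Regex Vote = new Regex("<p class=\"vote\">[\\s\\S]+?</p>", RegexOptions.IgnoreCase);
    private static readonly Regex User = new Regex("<p class=\"user\">[\\s\\S]+?</p>", RegexOptions.IgnoreCase);
    // Carriage returns and line feeds.
    private static readonly Regex LineBreak = new Regex("[\\r\\n]");

    // Returns each post with the vote block, the user block, and all line breaks removed.
    public static List<string> ExtractPosts(string html)
    {
        var posts = new List<string>();
        foreach (Match m in Post.Matches(html))
        {
            string content = m.Value;
            content = Vote.Replace(content, "");
            content = User.Replace(content, "");
            content = LineBreak.Replace(content, "");
            posts.Add(content);
        }
        return posts;
    }
}
```

Feeding it a small hand-written page (one `qiushi` div plus an unrelated ad div, say) should return a single cleaned post, which makes it easy to check the patterns before pointing the page at the live site. Because the match stops at the first `</div>`, a post that contains nested divs would be cut short; an HTML parser would be the more robust choice if that matters.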

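     The param parameter in Page_Load is mainly there so that the previous-page, next-page, and tag links keep working: each rewritten link carries the original site-relative path in param, and Page_Load appends that path to the base address before fetching. The basic features all work now and the ads are gone, but comments cannot be viewed yet.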
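
The round trip of the param value can be sketched in isolation. This is an illustration under stated assumptions, not the post's exact code: the class `ParamRoundTrip` and its method names are hypothetical, the base address is the one used in Page_Load, and the path `/hot/page/2` used below is an invented example rather than a known Qiushibaike URL.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper showing how a link travels out through "param" and back in.
public static class ParamRoundTrip
{
    // Base address taken from the post's Page_Load.
    private const string BaseUri = "http://wap3.qiushibaike.com";

    // Outbound: rewrite site-relative links (href="/...") so they point back at the
    // current page and carry the original path in the "param" query value.
    public static string RewriteLinks(string html)
    {
        return Regex.Replace(html, "(href=\")(/[^\"\\s]*)(\")", "href=\"?param=$2\"", RegexOptions.IgnoreCase);
    }

    // Inbound: rebuild the upstream address from the "param" value, as Page_Load does.
    // A missing value falls back to the site's home page.
    public static string BuildUpstreamUri(string param)
    {
        return BaseUri + (param ?? string.Empty).Trim();
    }
}
```

With these two helpers, `RewriteLinks("<a href=\"/hot/page/2\">下一页</a>")` produces a link whose href is `?param=/hot/page/2`, and `BuildUpstreamUri("/hot/page/2")` turns that value back into `http://wap3.qiushibaike.com/hot/page/2`. Since the value is concatenated onto the base address as-is, it may be worth rejecting any param that does not start with `/` before making the request.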

  • Original article: https://www.cnblogs.com/cnsnet/p/2518058.html