  • curl redirect problem

    Today, while fetching a website with curl, I ran into a strange problem. Here is the output:

    lxg@lxg-X240:~$ curl -L http://www.yngs.gov.cn/ -v
    * Hostname was NOT found in DNS cache
    * Trying 116.52.12.163...
    * Connected to www.yngs.gov.cn (116.52.12.163) port 80 (#0)
    > GET / HTTP/1.1
    > User-Agent: curl/7.38.0
    > Host: www.yngs.gov.cn
    > Accept: */*

    < HTTP/1.1 302 Moved Temporarily
    < Date: Wed, 04 Nov 2015 14:08:49 GMT
    < Transfer-Encoding: chunked
    < Location: http://www.yngs.gov.cn/newWeb/template/index.jsp
    < Content-Type: text/html; charset=UTF-8
    < Set-Cookie: JSESSIONID=SLyTW6RR3R7zPNkkvzvpj12Q1snzzvNFQjYPDbDhYbvgTXWhSnff!-995202664; path=/; HttpOnly
    < X-Powered-By: *********
    < Set-Cookie: SANGFOR_AD=20111157; path=/
    <
    * Ignoring the response-body
    * Connection #0 to host www.yngs.gov.cn left intact
    * Issue another request to this URL: 'http://www.yngs.gov.cn/newWeb/template/index.jsp'
    * Found bundle for host www.yngs.gov.cn: 0xb89840c0
    * Re-using existing connection! (#0) with host www.yngs.gov.cn
    * Connected to www.yngs.gov.cn (116.52.12.163) port 80 (#0)
    > GET /newWeb/template/index.jsp HTTP/1.1
    > User-Agent: curl/7.38.0
    > Host: www.yngs.gov.cn
    > Accept: */*
    ... // the output above keeps repeating
    * Ignoring the response-body
    * Connection #0 to host www.yngs.gov.cn left intact
    * Maximum (50) redirects followed
    curl: (47) Maximum (50) redirects followed

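    The final error says that curl's maximum of 50 redirects was exceeded.
    From the output, requesting http://www.yngs.gov.cn/ returns a 302 redirect to http://www.yngs.gov.cn/newWeb/template/index.jsp. Requesting http://www.yngs.gov.cn/newWeb/template/index.jsp, however, returns the same 302 again, and its Location is that very URL. curl therefore keeps redirecting to http://www.yngs.gov.cn/newWeb/template/index.jsp over and over, and once it exceeds its default limit of 50 redirects it aborts with an error.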
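    The loop can be reproduced offline with a toy model. The Python sketch below is an assumption built only from the transcript (the site's real logic is unknown): fake_server is a hypothetical stand-in that sends / to index.jsp and keeps redirecting index.jsp to itself unless a JSESSIONID cookie is present, and follow is a hypothetical client that, like curl -L without a cookie jar, never sends cookies and gives up after max_redirs redirects (compare curl's error 47).

```python
from typing import Dict, Optional, Tuple

INDEX = "/newWeb/template/index.jsp"
REDIRECT_CODES = {301, 302, 303, 307, 308}


class TooManyRedirects(Exception):
    """Raised when the redirect limit is reached (cf. curl error 47)."""


def fake_server(path: str, cookies: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Hypothetical model of the site, inferred from the transcript only.

    Returns (status, Location). "/" always redirects to index.jsp, and
    index.jsp redirects to itself unless a JSESSIONID cookie is present.
    """
    if path == INDEX and "JSESSIONID" in cookies:
        return 200, None
    return 302, INDEX


def follow(path: str, max_redirs: int = 50) -> int:
    """Follow redirects like `curl -L` with the cookie engine off.

    Every request is sent without cookies, so the self-redirect never ends
    and the function stops once max_redirs redirects have been followed.
    """
    redirects = 0
    while True:
        status, location = fake_server(path, cookies={})  # no cookie jar
        if status not in REDIRECT_CODES:
            return status
        if redirects >= max_redirs:
            raise TooManyRedirects(f"Maximum ({max_redirs}) redirects followed")
        path = location
        redirects += 1


if __name__ == "__main__":
    try:
        follow("/")
    except TooManyRedirects as err:
        print(err)  # Maximum (50) redirects followed
```

    Any finite limit runs out the same way: follow("/", max_redirs=20) fails after 20 redirects (20 is also wget's default limit), because the cookie the server is waiting for never arrives.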
    Since curl fails, let's try wget and see whether it runs into the same error:

    lxg@lxg-X240:~$ wget http://www.yngs.gov.cn/ --debug

    ---request begin---
    GET / HTTP/1.1
    User-Agent: Wget/1.16.1 (linux-gnu)
    Accept: */*
    Accept-Encoding: identity
    Host: www.yngs.gov.cn
    Connection: Keep-Alive
    ---request end---

    ---response begin---
    HTTP/1.1 302 Moved Temporarily
    Date: Wed, 04 Nov 2015 14:18:51 GMT
    Transfer-Encoding: chunked
    Location: http://www.yngs.gov.cn/newWeb/template/index.jsp
    Content-Type: text/html; charset=UTF-8
    Set-Cookie: JSESSIONID=7JJTW6TLpKRF0vyNXtRpQrnZffkgDfB0vh6vDzQ9jhGNvRsmZxyv!-1122044597; path=/; HttpOnly
    X-Powered-By: *********
    Set-Cookie: SANGFOR_AD=20111151; path=/
    ---response end---

    302 Moved Temporarily

    Stored cookie www.yngs.gov.cn -1 (ANY) / <session> <insecure> [expiry none] JSESSIONID 7JJTW6TLpKRF0vyNXtRpQrnZffkgDfB0vh6vDzQ9jhGNvRsmZxyv!-1122044597

    Stored cookie www.yngs.gov.cn -1 (ANY) / <session> <insecure> [expiry none] SANGFOR_AD 20111151
    Registered socket 3 for persistent reuse.
    URI content encoding = "UTF-8"
    Location: http://www.yngs.gov.cn/newWeb/template/index.jsp [following]

    URI content encoding = None
    --2015-11-04 22:23:09-- http://www.yngs.gov.cn/newWeb/template/index.jsp
    Reusing existing connection to www.yngs.gov.cn:80.
    Reusing fd 3.

    ---request begin---
    GET /newWeb/template/index.jsp HTTP/1.1
    User-Agent: Wget/1.16.1 (linux-gnu)
    Accept: */*
    Accept-Encoding: identity
    Host: www.yngs.gov.cn
    Connection: Keep-Alive
    Cookie: JSESSIONID=7JJTW6TLpKRF0vyNXtRpQrnZffkgDfB0vh6vDzQ9jhGNvRsmZxyv!-1122044597; SANGFOR_AD=20111151
    ---request end---

    ---response begin---
    HTTP/1.1 200 OK
    Date: Wed, 04 Nov 2015 14:18:51 GMT
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=UTF-8
    X-Powered-By: *********
    ---response end---
    200 OK

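    Above is wget's output (with some irrelevant lines removed). wget fetches http://www.yngs.gov.cn/ normally and does not get stuck in the redirect loop that curl hit. So the site itself works; our request is probably just missing something.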
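    Next I compared the requests and responses of curl and wget. For the first request to http://www.yngs.gov.cn/, the requests and responses are almost identical; the only real difference is the User-Agent. The requests for the redirect URL http://www.yngs.gov.cn/newWeb/template/index.jsp returned by the 302, however, differ: wget sends back the cookies (JSESSIONID and SANGFOR_AD) set by the first response, while curl's second request carries no Cookie header at all. (curl's "Ignoring the response-body" message is unrelated: it only means curl skips the body of the 302 response, whereas cookies travel in the Set-Cookie response headers.)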
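    What wget does here can be sketched with Python's standard http.cookies module: store the Set-Cookie headers of the first 302 and turn them into the Cookie header of the follow-up request. This is a simplified illustration, not wget's implementation: build_cookie_header is a hypothetical helper, and it ignores the domain, path and expiry matching that a real cookie jar (wget's or curl's) performs. The header values are copied from the wget transcript above.

```python
from http.cookies import SimpleCookie
from typing import Iterable

# Set-Cookie values copied from the first 302 response in the wget run above.
SET_COOKIE_HEADERS = [
    "JSESSIONID=7JJTW6TLpKRF0vyNXtRpQrnZffkgDfB0vh6vDzQ9jhGNvRsmZxyv!-1122044597; path=/; HttpOnly",
    "SANGFOR_AD=20111151; path=/",
]


def build_cookie_header(set_cookie_headers: Iterable[str]) -> str:
    """Store Set-Cookie values and render the Cookie header for the next request.

    Simplified: domain, path and expiry matching are ignored here.
    """
    jar = SimpleCookie()
    for header in set_cookie_headers:
        jar.load(header)  # parses name=value plus attributes such as path/HttpOnly
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


if __name__ == "__main__":
    print("Cookie: " + build_cookie_header(SET_COOKIE_HEADERS))
```

    The printed header matches the Cookie line wget sent with its second request.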
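    At this point it looks very likely that the loop is caused by the missing cookies: curl's cookie engine is off by default, so curl neither stores the Set-Cookie headers nor sends them when following the redirect. How can we verify this?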
    The first method is to disable cookies in wget and see whether it can still fetch the page:

    lxg@lxg-X240:~$ wget http://www.yngs.gov.cn/ --debug --no-cookies
    Setting --cookies (cookies) to 0
    ---request begin---
    GET / HTTP/1.1
    User-Agent: Wget/1.16.1 (linux-gnu)
    Accept: */*
    Accept-Encoding: identity
    Host: www.yngs.gov.cn
    Connection: Keep-Alive
    ---request end---

    ---response begin---
    HTTP/1.1 302 Moved Temporarily
    Date: Wed, 04 Nov 2015 14:43:41 GMT
    Transfer-Encoding: chunked
    Location: http://www.yngs.gov.cn/newWeb/template/index.jsp
    Content-Type: text/html; charset=UTF-8
    Set-Cookie: JSESSIONID=SDLtW6ZdvQwPpqGR5mBf2N1TxChNlySvTN8lhDBTQpyP3KvDdr0R!-170174379; path=/; HttpOnly
    X-Powered-By: *********
    Set-Cookie: SANGFOR_AD=20111158; path=/

    ---response end---
    --2015-11-04 22:46:33-- http://www.yngs.gov.cn/newWeb/template/index.jsp
    Reusing existing connection to www.yngs.gov.cn:80.
    Reusing fd 3.

    ---request begin---
    GET /newWeb/template/index.jsp HTTP/1.1
    User-Agent: Wget/1.16.1 (linux-gnu)
    Accept: */*
    Accept-Encoding: identity
    Host: www.yngs.gov.cn
    Connection: Keep-Alive
    ---request end---

    ---response begin---
    HTTP/1.1 302 Moved Temporarily
    Date: Wed, 04 Nov 2015 14:42:16 GMT
    Transfer-Encoding: chunked
    Location: http://www.yngs.gov.cn/newWeb/template/index.jsp
    Content-Type: text/html; charset=UTF-8
    Set-Cookie: JSESSIONID=YQJTW6ZLpWhm7pfr3LzL6lkQdQ1XbnBMCHQhjn7vZ2yptMJvsJvW!-1122044597; path=/; HttpOnly
    X-Powered-By: *********
    Set-Cookie: SANGFOR_AD=20111151; path=/
    ---response end---

    ...
    URI content encoding = None
    20 redirections exceeded.

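    This time wget fails as well, giving up after 20 redirects.
    The second method is to enable cookies in curl; passing a cookie file with -b turns on curl's cookie engine: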

    lxg@lxg-X240:~$ curl -L -v -b /tmp/curl.cookies http://www.yngs.gov.cn/
    * Hostname was NOT found in DNS cache
    * Trying 116.52.12.163...
    * Connected to www.yngs.gov.cn (116.52.12.163) port 80 (#0)
    > GET / HTTP/1.1
    > User-Agent: curl/7.38.0
    > Host: www.yngs.gov.cn
    > Accept: */*

    < HTTP/1.1 302 Moved Temporarily
    < Date: Wed, 04 Nov 2015 14:55:53 GMT
    < Transfer-Encoding: chunked
    < Location: http://www.yngs.gov.cn/newWeb/template/index.jsp
    < Content-Type: text/html; charset=UTF-8
    * Added cookie JSESSIONID="lswQW6cZzRtvyGkkJm0hL8RscHT98bcC3YD4f4V1RCJvLLwb2ZMJ!-1122044597" for domain www.yngs.gov.cn, path /, expire 0
    < Set-Cookie: JSESSIONID=lswQW6cZzRtvyGkkJm0hL8RscHT98bcC3YD4f4V1RCJvLLwb2ZMJ!-1122044597; path=/; HttpOnly
    < X-Powered-By: *********
    * Added cookie SANGFOR_AD="20111151" for domain www.yngs.gov.cn, path /, expire 0
    < Set-Cookie: SANGFOR_AD=20111151; path=/
    <
    * Ignoring the response-body
    * Connection #0 to host www.yngs.gov.cn left intact
    * Issue another request to this URL: 'http://www.yngs.gov.cn/newWeb/template/index.jsp'
    * Found bundle for host www.yngs.gov.cn: 0xb8b74108
    * Re-using existing connection! (#0) with host www.yngs.gov.cn
    * Connected to www.yngs.gov.cn (116.52.12.163) port 80 (#0)
    > GET /newWeb/template/index.jsp HTTP/1.1
    > User-Agent: curl/7.38.0
    > Host: www.yngs.gov.cn
    > Accept: */*
    > Cookie: JSESSIONID=lswQW6cZzRtvyGkkJm0hL8RscHT98bcC3YD4f4V1RCJvLLwb2ZMJ!-1122044597; SANGFOR_AD=20111151

    < HTTP/1.1 200 OK
    < Date: Wed, 04 Nov 2015 14:55:53 GMT
    < Transfer-Encoding: chunked
    < Content-Type: text/html; charset=UTF-8
    < X-Powered-By: *********
    <
    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    ...

    * Connection #0 to host www.yngs.gov.cn left intact
    </html>

    This time curl retrieves the page successfully.
    This confirms the hypothesis above.
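    The whole mechanism can also be checked end to end without depending on the real site. The self-contained Python sketch below runs a local server on 127.0.0.1; everything about that server is an assumption: LoopingHandler is a hypothetical imitation of the observed behaviour (302 from / to index.jsp, and index.jsp redirecting to itself until a JSESSIONID cookie comes back). The client uses the standard urllib.request module. Note that urllib detects the loop itself and raises HTTPError after the same URL repeats a few times, rather than counting to 50 like curl. Adding HTTPCookieProcessor, which plays the role of curl's cookie engine, makes the same request end in 200.

```python
import threading
import urllib.error
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer

INDEX = "/newWeb/template/index.jsp"


class LoopingHandler(BaseHTTPRequestHandler):
    """Hypothetical imitation of the site: index.jsp needs a JSESSIONID cookie."""

    def do_GET(self):
        has_session = "JSESSIONID=" in self.headers.get("Cookie", "")
        if self.path == INDEX and has_session:
            body = b"<html>ok</html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=UTF-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
            return
        # "/" and cookie-less requests for index.jsp: 302 to index.jsp plus a session cookie.
        self.send_response(302)
        self.send_header("Location", INDEX)
        self.send_header("Set-Cookie", "JSESSIONID=demo-session; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch(base_url: str, use_cookies: bool) -> int:
    """GET base_url + "/" following redirects; return the final status code."""
    handlers = [urllib.request.ProxyHandler({})]  # ignore any proxy env vars
    if use_cookies:
        handlers.append(urllib.request.HTTPCookieProcessor(CookieJar()))
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(base_url + "/", timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib aborts the redirect loop with an HTTPError carrying the last 302.
        return err.code


def run_demo() -> tuple:
    """Start the local server, fetch without and with cookies, then stop it."""
    server = HTTPServer(("127.0.0.1", 0), LoopingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        return fetch(base, use_cookies=False), fetch(base, use_cookies=True)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())  # (302, 200)
```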

    For reference, these are the redirect-related options listed by curl -h:

    curl -h
    ...
    -L/--location      Follow Location: hints (H)
        --location-trusted Follow Location: and send authentication even
                        to other hostnames (H)
    -m/--max-time <seconds> Maximum time allowed for the transfer
        --max-redirs <num> Maximum number of redirects allowed (H)
        --max-filesize <bytes> Maximum file size to download (H/F)
    ...

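    When -L is used, curl follows at most 50 redirects by default; --max-redirs changes that limit (-1 makes it unlimited), for example:

    curl -L --max-redirs 100000 URL

    Raising the limit does not help with a self-redirect loop like the one above, though: curl just follows the same 302 more times before giving up. For this site the fix is to enable the cookie engine with -b (or -c).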
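    If the request has to be scripted, the cookie-engine fix can be wrapped as follows. This is a sketch assuming curl is installed and on PATH; curl_args and fetch_with_curl are hypothetical helper names. The cookie jar is a throwaway temporary file, and -b and -c point at the same file so that cookies are both read and recorded (a missing -b file is fine; it still turns the cookie engine on, as the /tmp/curl.cookies run above relies on).

```python
import os
import subprocess
import tempfile
from typing import List, Optional


def curl_args(url: str, cookie_jar: str, max_redirs: Optional[int] = None) -> List[str]:
    """Build a curl command that follows redirects with the cookie engine on.

    -b reads cookies from cookie_jar (a missing file is fine) and turns the
    cookie engine on; -c writes the received cookies back to the same file.
    """
    args = ["curl", "-L", "-b", cookie_jar, "-c", cookie_jar]
    if max_redirs is not None:
        args += ["--max-redirs", str(max_redirs)]
    args.append(url)
    return args


def fetch_with_curl(url: str, max_redirs: Optional[int] = None) -> bytes:
    """Run curl with a throwaway cookie jar; raises CalledProcessError on failure."""
    with tempfile.TemporaryDirectory() as tmp:
        jar = os.path.join(tmp, "cookies.txt")
        result = subprocess.run(
            curl_args(url, jar, max_redirs), capture_output=True, check=True
        )
        return result.stdout
```

    A non-zero curl exit status, such as 47 when the redirect limit is hit, surfaces as subprocess.CalledProcessError because of check=True.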

  • Original article: https://www.cnblogs.com/doseoer/p/5203215.html