  • URL Encoding

    This article designs a C++ class that performs URL encoding.

    In one of my earlier projects, I needed to POST data from a VC++ 6.0 application, and that data had to be URL-encoded.

    I searched MSDN for a class or API that would produce the URL encoding of a given string, but found none, so I had to write my own URLEncode C++ class.

    URLEncoder.exe is an MFC dialog-based program that uses the URLEncode class.

    How to Handle It

    Some special characters are tricky to transmit over the Internet. URL encoding treats them specially so that every character can be transmitted safely.

    For example, the carriage return character has ASCII value 13; when FORM data is sent, it is interpreted as the end of a line of data.

    Applications normally use the HTTP or HTTPS protocol to transfer data between client and server. A server receives data from a client in two basic ways:

    1. In the HTTP request itself: in the headers (as COOKIES) or in the body (as POSTed FORM data).

    2. In the query part of the URL.

    When data is included in the URL, it must be encoded according to URL syntax.

    On the web server, the data is decoded automatically. Consider the following URL, in which the data is passed as a query parameter:

    For example: http://WebSite/ResourceName?Data=Data

    WebSite is the name of the web site (the host)

    ResourceName can be the name of an ASP page or a Servlet

    Data is the data to be sent. If the MIME type is Content-Type: application/x-www-form-urlencoded, the data must be encoded.
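    The two directions described above, encoding on the client and automatic decoding on the server, can be sketched as a pair of functions. This is a minimal illustration under stated assumptions: encodeComponent, decodeComponent and hexValue are hypothetical names; the encoder treats only letters and digits as safe (following the conservative tables in this article) and writes a space as %20; the decoder also accepts '+' as a space, which is how HTML forms traditionally submit spaces.

```cpp
#include <cstddef>
#include <string>

// Hypothetical client-side helper: every byte that is not an ASCII letter
// or digit becomes %XX, following the conservative tables in this article.
std::string encodeComponent(const std::string& s) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        bool safe = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
                    (c >= 'a' && c <= 'z');
        if (safe) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];    // high nibble
            out += hex[c & 0x0F];  // low nibble
        }
    }
    return out;
}

// Value of one hex digit, or -1 if c is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical server-side helper: each %XX becomes the byte 0xXX and '+'
// becomes a space (form-data convention). Malformed escapes are copied as-is.
std::string decodeComponent(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size()) {
            int hi = hexValue(s[i + 1]);
            int lo = hexValue(s[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out += (s[i] == '+') ? ' ' : s[i];
    }
    return out;
}
```

    Appending encodeComponent(value) to http://WebSite/ResourceName?Data= yields a URL whose query part the server can decode back to the original value.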

    RFC 1738

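    RFC 1738 specifies that the characters in Uniform Resource Locators (URLs) must come from a subset of the US-ASCII character set. HTML, on the other hand, allows the entire ISO-8859-1 (ISO-Latin) character set to be used in documents. This means that any character outside that subset in data POSTed from an HTML FORM (or sent as part of a query string) must be encoded.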

    ISO-8859-1 (ISO-Latin) Character Set

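    The following table covers the complete ISO-8859-1 (ISO-Latin) character set. For each range of characters (in decimal) it gives the type, the characters in the range, and whether they are safe.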

    Character range (decimal)  Type                          Values                               Safe/Unsafe
    0-31                       ASCII Control Characters      Not printable                        Unsafe
    32-47                      Reserved Characters           space ! " # $ % & ' ( ) * + , - . /  Unsafe
    48-57                      ASCII Characters and Numbers  0-9                                  Safe
    58-64                      Reserved Characters           : ; < = > ? @                        Unsafe
    65-90                      ASCII Characters              A-Z                                  Safe
    91-96                      Reserved Characters           [ \ ] ^ _ `                          Unsafe
    97-122                     ASCII Characters              a-z                                  Safe
    123-126                    Reserved Characters           { | } ~                              Unsafe
    127                        Control Characters            DEL (not printable)                  Unsafe
    128-255                    Non-ASCII Characters          Upper half of ISO-8859-1             Unsafe
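    The classification in this table can be written as a simple lookup on the byte value. This is a minimal sketch; CharClass and classify are hypothetical names and are not part of the article's class.

```cpp
#include <string>

// One row of the ISO-8859-1 table: the type label and whether it is safe.
struct CharClass {
    std::string type;
    bool safe;
};

// Hypothetical classify(): maps a byte value (0..255) to its table row.
// Rows are checked in ascending order, so each test is an upper bound.
CharClass classify(unsigned char code) {
    if (code <= 31)  return {"ASCII Control Characters", false};
    if (code <= 47)  return {"Reserved Characters", false};
    if (code <= 57)  return {"ASCII Characters and Numbers", true};
    if (code <= 64)  return {"Reserved Characters", false};
    if (code <= 90)  return {"ASCII Characters", true};
    if (code <= 96)  return {"Reserved Characters", false};
    if (code <= 122) return {"ASCII Characters", true};
    if (code <= 126) return {"Reserved Characters", false};
    if (code == 127) return {"Control Characters", false};
    return {"Non-ASCII Characters", false};
}
```

    Only the three alphanumeric ranges come back as safe; every other byte must be percent-encoded.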

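    All unsafe characters must be encoded: the control characters (0-31 and 127), the non-ASCII characters (128-255), and the reserved ranges 32-47, 58-64, 91-96 and 123-126.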
    The following table explains why individual characters are unsafe.

    Character  Encoding  Why it is unsafe
    <          %3C       Delimiters around URLs in free text
    >          %3E       Delimiters around URLs in free text
    "          %22       Delimits URLs in some systems
    #          %23       Used in the World Wide Web and in other systems to separate a URL from a fragment/anchor identifier that may follow it
    {          %7B       Gateways and other transport agents are known to sometimes modify such characters
    }          %7D       Gateways and other transport agents are known to sometimes modify such characters
    |          %7C       Gateways and other transport agents are known to sometimes modify such characters
    \          %5C       Gateways and other transport agents are known to sometimes modify such characters
    ^          %5E       Gateways and other transport agents are known to sometimes modify such characters
    ~          %7E       Gateways and other transport agents are known to sometimes modify such characters
    [          %5B       Gateways and other transport agents are known to sometimes modify such characters
    ]          %5D       Gateways and other transport agents are known to sometimes modify such characters
    `          %60       Gateways and other transport agents are known to sometimes modify such characters
    +          %2B       Indicates a space in form data (spaces cannot appear literally in a URL)
    /          %2F       Separates directories and subdirectories
    ?          %3F       Separates the actual URL from the parameters
    &          %26       Separates the parameters specified in the URL

    How to Implement It

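    To URL-encode a character, take its 8-bit value, write it as two hexadecimal digits, and prefix it with '%'. For example, the space character in US-ASCII is 32 in decimal, or 20 in hexadecimal, so its URL encoding is %20.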
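    A minimal sketch of that conversion for a single byte; percentEncodeByte is a hypothetical name. It takes an unsigned char so that bytes 128-255, which a signed char stores as negative values, still map to %80-%FF.

```cpp
#include <string>

// Hypothetical percentEncodeByte(): '%' followed by the two-digit,
// upper-case hexadecimal value of the byte.
std::string percentEncodeByte(unsigned char b) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out = "%";
    out += hex[b / 16];  // high nibble: 32 / 16 = 2
    out += hex[b % 16];  // low nibble:  32 % 16 = 0
    return out;
}
```

    For the space character, 32 / 16 = 2 and 32 % 16 = 0, giving %20; a carriage return (13) becomes %0D. The decToHex function in the snippet below performs the same division with radix 16.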

    URLEncode: CURLEncode is a C++ class that URL-encodes strings. It contains the following functions:

    isUnsafe

    decToHex

    convert

    URLEncode

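    The URLEncode() function performs the encoding. It examines each character in turn: a safe character is appended to the result unchanged, while an unsafe one is converted to '%' followed by its hexadecimal value and appended instead.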
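    The loop that URLEncode() performs can be sketched in portable C++ with std::string instead of MFC's CString. This is a sketch, not the article's code: urlEncode and kUnsafe are hypothetical names, and the contents of kUnsafe are an assumption drawn from the tables above, since the article does not show how csUnsafeString is initialized.

```cpp
#include <string>

// ASSUMED list of unsafe printable characters (the article's own
// csUnsafeString is not shown). '%' is included so that a literal
// percent sign is escaped too.
static const std::string kUnsafe = "\"<>%\\^[]`+$,@:;/!#?=&";

// Mirrors isUnsafe(): a listed character, or anything outside 33..122
// (space, control characters, {|}~, DEL and bytes above 127).
bool isUnsafe(unsigned char c) {
    return kUnsafe.find(static_cast<char>(c)) != std::string::npos ||
           c <= 32 || c >= 123;
}

// Hypothetical urlEncode(): copies safe characters and replaces unsafe
// ones with '%' plus their two-digit hexadecimal value.
std::string urlEncode(const std::string& data) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : data) {
        if (isUnsafe(c)) {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

    With this list, '-', '.', '(', ')', '*', '\'' and '_' are copied unchanged even though the first table marks their ranges unsafe; add them to kUnsafe to follow that table strictly.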

    Code snippet:

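    // Assumes an ANSI (non-Unicode) MFC build, as in VC++ 6.0,
    // so that CString holds char data.
    #include <afx.h>   // CString (MFC)

    class CURLEncode
    {
    private:
        static CString csUnsafeString;   // characters that must always be encoded
        static const char hexVals[];     // hex digit lookup table
        CString decToHex(char num, int radix);
        bool isUnsafe(char compareChar);
        CString convert(char val);

    public:
        CURLEncode() { }
        virtual ~CURLEncode() { }
        CString URLEncode(CString vData);
    };

    // The article does not show these definitions; the values below are
    // assumed, based on the tables above. '%' is included so that a
    // literal percent sign is not mistaken for an escape sequence.
    CString CURLEncode::csUnsafeString = "\"<>%\\^[]`+$,@:;/!#?=&";
    const char CURLEncode::hexVals[] = "0123456789ABCDEF";

    bool CURLEncode::isUnsafe(char compareChar)
    {
        // Explicitly listed unsafe character?
        if (csUnsafeString.Find(compareChar) != -1)
            return true;

        // Space, control characters, {|}~, DEL and bytes above 127
        // lie outside 33..122 and are unsafe as well.
        int char_ascii_value = (unsigned char) compareChar;
        return !(char_ascii_value > 32 && char_ascii_value < 123);
    }

    CString CURLEncode::decToHex(char num, int radix)
    {
        // Treat the char as an unsigned byte (0..255)
        int num_char = (unsigned char) num;
        CString csTmp;

        // Collect digits, least significant first
        while (num_char >= radix)
        {
            csTmp += hexVals[num_char % radix];
            num_char /= radix;
        }
        csTmp += hexVals[num_char];

        // Pad to two digits; the pad becomes the leading zero after reversal
        if (csTmp.GetLength() < 2)
            csTmp += '0';

        // Digits were collected in reverse order
        csTmp.MakeReverse();
        return csTmp;
    }

    CString CURLEncode::convert(char val)
    {
        // '%' followed by the two-digit hex value, e.g. ' ' -> "%20"
        CString csRet = "%";
        csRet += decToHex(val, 16);
        return csRet;
    }

    CString CURLEncode::URLEncode(CString vData)
    {
        // Copy safe characters; replace unsafe ones with their %XX escape
        CString csEncoded;
        for (int i = 0; i < vData.GetLength(); i++)
        {
            char ch = vData.GetAt(i);
            if (isUnsafe(ch))
                csEncoded += convert(ch);
            else
                csEncoded += ch;
        }
        return csEncoded;
    }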



  • Original article: https://www.cnblogs.com/gccbuaa/p/6747392.html