  • RFC1738

    Original: http://www.ietf.org/rfc/rfc1738.txt

    Network Working Group                                     T. Berners-Lee
    Request for Comments: 1738                                          CERN
    Category: Standards Track                                    L. Masinter
                                                          Xerox Corporation
                                                                 M. McCahill
                                                     University of Minnesota
                                                                     Editors
                                                               December 1994

                        Uniform Resource Locators (URL)

    Status of this Memo

       This document specifies an Internet standards track protocol for the
       Internet community, and requests discussion and suggestions for
       improvements.  Please refer to the current edition of the "Internet
       Official Protocol Standards" (STD 1) for the standardization state
       and status of this protocol.  Distribution of this memo is unlimited.

    Abstract

       This document specifies a Uniform Resource Locator (URL), the syntax
       and semantics of formalized information for location and access of
       resources via the Internet.

    1. Introduction

       This document describes the syntax and semantics for a compact string
       representation for a resource available via the Internet. These
       strings are called "Uniform Resource Locators" (URLs).

       The specification is derived from concepts introduced by the World-
       Wide Web global information initiative, whose use of such objects
       dates from 1990 and is described in "Universal Resource Identifiers
       in WWW", RFC 1630. The specification of URLs is designed to meet the
       requirements laid out in "Functional Requirements for Internet
       Resource Locators" [12].

       This document was written by the URI working group of the Internet
       Engineering Task Force.  Comments may be addressed to the editors, or
       to the URI-WG <uri@bunyip.com>. Discussions of the group are archived
       at <URL:http://www.acl.lanl.gov/URI/archive/uri-archive.index.html>

    Berners-Lee, Masinter & McCahill                                [Page 1]

    RFC 1738            Uniform Resource Locators (URL)        December 1994

    2. General URL Syntax

       Just as there are many different methods of access to resources,
       there are several schemes for describing the location of such
       resources.

       The generic syntax for URLs provides a framework for new schemes to
       be established using protocols other than those defined in this
       document.

       URLs are used to `locate' resources, by providing an abstract
       identification of the resource location.  Having located a resource,
       a system may perform a variety of operations on the resource, as
       might be characterized by such words as `access', `update',
       `replace', `find attributes'. In general, only the `access' method
       needs to be specified for any URL scheme.

    2.1. The main parts of URLs

       A full BNF description of the URL syntax is given in Section 5.

       In general, URLs are written as follows:

           <scheme>:<scheme-specific-part>

       A URL contains the name of the scheme being used (<scheme>) followed
       by a colon and then a string (the <scheme-specific-part>) whose
       interpretation depends on the scheme.

       Scheme names consist of a sequence of characters. The lower case
       letters "a"--"z", digits, and the characters plus ("+"), period
       ("."), and hyphen ("-") are allowed. For resiliency, programs
       interpreting URLs should treat upper case letters as equivalent to
       lower case in scheme names (e.g., allow "HTTP" as well as "http").
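
       As an illustration only (this code is not part of the
       specification), the split at the first colon and the case rule
       for scheme names might be sketched in Python as follows. The
       function name split_scheme and the host www.example.com are
       hypothetical.

```python
import re

# Characters allowed in a scheme name per Section 2.1:
# lower case letters, digits, "+", "." and "-".
_SCHEME_RE = re.compile(r"[a-z0-9+.-]+")

def split_scheme(url):
    """Split a URL into (scheme, scheme_specific_part)."""
    scheme, colon, rest = url.partition(":")
    if not colon:
        raise ValueError("no ':' found, so no scheme is present")
    scheme = scheme.lower()  # treat "HTTP" as equivalent to "http"
    if not _SCHEME_RE.fullmatch(scheme):
        raise ValueError("invalid scheme name: %r" % scheme)
    return scheme, rest

print(split_scheme("HTTP://www.example.com/"))
```

       Only the first colon is significant here; any later colon (for
       example before a port number) belongs to the scheme-specific part.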

    2.2. URL Character Encoding Issues

       URLs are sequences of characters, i.e., letters, digits, and special
       characters. A URL may be represented in a variety of ways: e.g., ink
       on paper, or a sequence of octets in a coded character set. The
       interpretation of a URL depends only on the identity of the
       characters used.

       In most URL schemes, the sequences of characters in different parts
       of a URL are used to represent sequences of octets used in Internet
       protocols. For example, in the ftp scheme, the host name, directory
       name and file names are such sequences of octets, represented by
       parts of the URL.  Within those parts, an octet may be represented by
       the character which has that octet as its code within the US-ASCII
       [20] coded character set.

       In addition, octets may be encoded by a character triplet consisting
       of the character "%" followed by the two hexadecimal digits (from
       "0123456789ABCDEF") which form the hexadecimal value of the octet.
       (The characters "abcdef" may also be used in hexadecimal encodings.)

       Octets must be encoded if they have no corresponding graphic
       character within the US-ASCII coded character set, if the use of the
       corresponding character is unsafe, or if the corresponding character
       is reserved for some other interpretation within the particular URL
       scheme.

       No corresponding graphic US-ASCII:

       URLs are written only with the graphic printable characters of the
       US-ASCII coded character set. The octets 80-FF hexadecimal are not
       used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
       control characters; these must be encoded.

       Unsafe:

       Characters can be unsafe for a number of reasons.  The space
       character is unsafe because significant spaces may disappear and
       insignificant spaces may be introduced when URLs are transcribed or
       typeset or subjected to the treatment of word-processing programs.
       The characters "<" and ">" are unsafe because they are used as the
       delimiters around URLs in free text; the quote mark (""") is used to
       delimit URLs in some systems.  The character "#" is unsafe and should
       always be encoded because it is used in World Wide Web and in other
       systems to delimit a URL from a fragment/anchor identifier that might
       follow it.  The character "%" is unsafe because it is used for
       encodings of other characters.  Other characters are unsafe because
       gateways and other transport agents are known to sometimes modify
       such characters. These characters are "{", "}", "|", "\", "^", "~",
       "[", "]", and "`".

       All unsafe characters must always be encoded within a URL. For
       example, the character "#" must be encoded within URLs even in
       systems that do not normally deal with fragment or anchor
       identifiers, so that if the URL is copied into another system that
       does use them, it will not be necessary to change the URL encoding.

       Reserved:

       Many URL schemes reserve certain characters for a special meaning:
       their appearance in the scheme-specific part of the URL has a
       designated semantics. If the character corresponding to an octet is
       reserved in a scheme, the octet must be encoded.  The characters ";",
       "/", "?", ":", "@", "=" and "&" are the characters which may be
       reserved for special meaning within a scheme. No other characters may
       be reserved within a scheme.

       Usually a URL has the same interpretation when an octet is
       represented by a character and when it is encoded. However, this is
       not true for reserved characters: encoding a character reserved for a
       particular scheme may change the semantics of a URL.

       Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
       reserved characters used for their reserved purposes may be used
       unencoded within a URL.

       On the other hand, characters that are not required to be encoded
       (including alphanumerics) may be encoded within the scheme-specific
       part of a URL, as long as they are not being used for a reserved
       purpose.
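
       The encoding rules of this section can be sketched as a pair of
       helper functions. This is a minimal illustration, not a normative
       implementation; the names ALWAYS_SAFE, encode_octets, decode_octets
       and the "keep" parameter (reserved characters that are being used
       for their reserved purpose) are assumptions made for the example.

```python
import string

# Section 2.2: alphanumerics and $-_.+!*'(), may always appear unencoded.
ALWAYS_SAFE = frozenset(string.ascii_letters + string.digits + "$-_.+!*'(),")

def encode_octets(data, keep=""):
    """Percent-encode bytes; `keep` lists reserved characters to leave as-is."""
    allowed = ALWAYS_SAFE | set(keep)
    return "".join(
        chr(octet) if chr(octet) in allowed else "%%%02X" % octet
        for octet in data
    )

def decode_octets(text):
    """Turn each "%XX" triplet back into the octet it stands for."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            pair = text[i + 1:i + 3]
            if len(pair) != 2:
                raise ValueError("truncated %XX triplet")
            out.append(int(pair, 16))  # accepts "abcdef" as well as "ABCDEF"
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)

print(encode_octets(b"a b#c"))             # space and "#" are unsafe
print(encode_octets(b"/etc/motd", "/"))    # "/" kept for its reserved use
```

       Whether a reserved character is passed in "keep" is exactly the
       decision described above: left unencoded it carries its reserved
       meaning, encoded it is plain data.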

    2.3 Hierarchical schemes and relative links

       In some cases, URLs are used to locate resources that contain
       pointers to other resources. In some cases, those pointers are
       represented as relative links where the expression of the location of
       the second resource is in terms of "in the same place as this one
       except with the following relative path". Relative links are not
       described in this document. However, the use of relative links
       depends on the original URL containing a hierarchical structure
       against which the relative link is based.

       Some URL schemes (such as the ftp, http, and file schemes) contain
       names that can be considered hierarchical; the components of the
       hierarchy are separated by "/".

    3. Specific Schemes

       The mapping for some existing standard and experimental protocols is
       outlined in the BNF syntax definition.  Notes on particular protocols
       follow. The schemes covered are:

       ftp                     File Transfer protocol
       http                    Hypertext Transfer Protocol
       gopher                  The Gopher protocol
       mailto                  Electronic mail address
       news                    USENET news
       nntp                    USENET news using NNTP access
       telnet                  Reference to interactive sessions
       wais                    Wide Area Information Servers
       file                    Host-specific file names
       prospero                Prospero Directory Service

       Other schemes may be specified by future specifications. Section 4 of
       this document describes how new schemes may be registered, and lists
       some scheme names that are under development.

    3.1. Common Internet Scheme Syntax

       While the syntax for the rest of the URL may vary depending on the
       particular scheme selected, URL schemes that involve the direct use
       of an IP-based protocol to a specified host on the Internet use a
       common syntax for the scheme-specific data:

           //<user>:<password>@<host>:<port>/<url-path>

       Some or all of the parts "<user>:<password>@", ":<password>",
       ":<port>", and "/<url-path>" may be excluded.  The scheme specific
       data start with a double slash "//" to indicate that it complies with
       the common Internet scheme syntax. The different components obey the
       following rules:

        user
            An optional user name. Some schemes (e.g., ftp) allow the
            specification of a user name.

        password
            An optional password. If present, it follows the user
            name separated from it by a colon.

       The user name (and password), if present, are followed by a
       commercial at-sign "@". Within the user and password field, any ":",
       "@", or "/" must be encoded.

       Note that an empty user name or password is different than no user
       name or password; there is no way to specify a password without
       specifying a user name. E.g., <URL:ftp://@host.com/> has an empty
       user name and no password, <URL:ftp://host.com/> has no user name,
       while <URL:ftp://foo:@host.com/> has a user name of "foo" and an
       empty password.

        host
            The fully qualified domain name of a network host, or its IP
            address as a set of four decimal digit groups separated by
            ".". Fully qualified domain names take the form as described
            in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123
            [5]: a sequence of domain labels separated by ".", each domain
            label starting and ending with an alphanumerical character and
            possibly also containing "-" characters. The rightmost domain
            label will never start with a digit, though, which
            syntactically distinguishes all domain names from the IP
            addresses.

        port
            The port number to connect to. Most schemes designate
            protocols that have a default port number. Another port number
            may optionally be supplied, in decimal, separated from the
            host by a colon. If the port is omitted, the colon is as well.

        url-path
            The rest of the locator consists of data specific to the
            scheme, and is known as the "url-path". It supplies the
            details of how the specified resource can be accessed. Note
            that the "/" between the host (or port) and the url-path is
            NOT part of the url-path.

       The url-path syntax depends on the scheme being used, as does the
       manner in which it is interpreted.
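
       The component rules above can be sketched as a small parser. This
       is an illustrative reading of the common syntax, not a reference
       implementation; the names Login and parse_common and the
       default_port parameter are hypothetical. Missing components are
       returned as None so that an empty user name or password remains
       distinguishable from an absent one, and all fields are returned
       still encoded.

```python
from collections import namedtuple

Login = namedtuple("Login", "user password host port url_path")

def parse_common(scheme_specific, default_port=None):
    """Parse "//<user>:<password>@<host>:<port>/<url-path>"."""
    if not scheme_specific.startswith("//"):
        raise ValueError("not in common Internet scheme syntax")
    # The first "/" after the "//" separates host/port from the url-path
    # and is not itself part of the url-path.
    login, slash, url_path = scheme_specific[2:].partition("/")
    user = password = None
    if "@" in login:
        # ":", "@" and "/" inside user or password must be encoded,
        # so the first "@" always ends the user information.
        userinfo, _, login = login.partition("@")
        user, colon, pw = userinfo.partition(":")
        password = pw if colon else None
    host, colon, port = login.partition(":")
    return Login(user, password, host,
                 int(port) if colon else default_port,
                 url_path if slash else None)

print(parse_common("//@host.com/"))      # empty user name, no password
print(parse_common("//host.com/"))       # no user name
print(parse_common("//foo:@host.com/"))  # user "foo", empty password
```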

    3.2. FTP

       The FTP URL scheme is used to designate files and directories on
       Internet hosts accessible using the FTP protocol (RFC 959).

       An FTP URL follows the syntax described in Section 3.1.  If :<port>
       is omitted, the port defaults to 21.

    3.2.1. FTP Name and Password

       A user name and password may be supplied; they are used in the ftp
       "USER" and "PASS" commands after first making the connection to the
       FTP server.  If no user name or password is supplied and one is
       requested by the FTP server, the conventions for "anonymous" FTP are
       to be used, as follows:

            The user name "anonymous" is supplied.

            The password is supplied as the Internet e-mail address
            of the end user accessing the resource.

       If the URL supplies a user name but no password, and the remote
       server requests a password, the program interpreting the FTP URL
       should request one from the user.

    3.2.2. FTP url-path

       The url-path of an FTP URL has the following syntax:

           <cwd1>/<cwd2>/.../<cwdN>/<name>;type=<typecode>

       Where <cwd1> through <cwdN> and <name> are (possibly encoded) strings
       and <typecode> is one of the characters "a", "i", or "d".  The part
       ";type=<typecode>" may be omitted. The <cwdx> and <name> parts may be
       empty. The whole url-path may be omitted, including the "/"
       delimiting it from the prefix containing user, password, host, and
       port.

       The url-path is interpreted as a series of FTP commands as follows:

          Each of the <cwd> elements is to be supplied, sequentially, as the
          argument to a CWD (change working directory) command.

          If the typecode is "d", perform a NLST (name list) command with
          <name> as the argument, and interpret the results as a file
          directory listing.

          Otherwise, perform a TYPE command with <typecode> as the argument,
          and then access the file whose name is <name> (for example, using
          the RETR command.)

       Within a name or CWD component, the characters "/" and ";" are
       reserved and must be encoded. The components are decoded prior to
       their use in the FTP protocol.  In particular, if the appropriate FTP
       sequence to access a particular file requires supplying a string
       containing a "/" as an argument to a CWD or RETR command, it is
       necessary to encode each "/".

       For example, the URL <URL:ftp://myname@host.dom/%2Fetc/motd> is
       interpreted by FTP-ing to "host.dom", logging in as "myname"
       (prompting for a password if it is asked for), and then executing
       "CWD /etc" and then "RETR motd". This has a different meaning from
       <URL:ftp://myname@host.dom/etc/motd> which would "CWD etc" and then
       "RETR motd"; the initial "CWD" might be executed relative to the
       default directory for "myname". On the other hand,
       <URL:ftp://myname@host.dom//etc/motd>, would "CWD " with a null
       argument, then "CWD etc", and then "RETR motd".
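
       The three examples above can be reproduced with a short sketch
       that turns a url-path into the command sequence described in this
       section. It is illustrative only: the function name ftp_commands
       is hypothetical, urllib.parse.unquote decodes to text (assuming
       UTF-8) rather than raw octets, and when no typecode is given the
       sketch simply omits the TYPE command instead of guessing a mode.

```python
from urllib.parse import unquote

def ftp_commands(url_path):
    """Translate an FTP url-path into a list of FTP commands.

    `url_path` excludes the "/" that separates it from the host.
    """
    path, sep, typecode = url_path.partition(";type=")
    # Split on the literal "/" first, then decode, so an encoded "%2F"
    # stays inside a single CWD or RETR argument.
    *cwds, name = [unquote(part) for part in path.split("/")]
    commands = ["CWD " + cwd for cwd in cwds]
    if typecode == "d":
        commands.append("NLST " + name)
    else:
        if sep:
            commands.append("TYPE " + typecode)
        commands.append("RETR " + name)
    return commands

print(ftp_commands("%2Fetc/motd"))
print(ftp_commands("etc/motd"))
print(ftp_commands("/etc/motd"))
```

       The order of operations matters: decoding before splitting would
       turn "%2Fetc/motd" into two CWD steps and lose the distinction the
       examples draw.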

       FTP URLs may also be used for other operations; for example, it is
       possible to update a file on a remote file server, or infer
       information about it from the directory listings. The mechanism for
       doing so is not spelled out here.

    3.2.3. FTP Typecode is Optional

       The entire ;type=<typecode> part of an FTP URL is optional. If it is
       omitted, the client program interpreting the URL must guess the
       appropriate mode to use. In general, the data content type of a file
       can only be guessed from the name, e.g., from the suffix of the name;
       the appropriate type code to be used for transfer of the file can
       then be deduced from the data content of the file.

    3.2.4 Hierarchy

       For some file systems, the "/" used to denote the hierarchical
       structure of the URL corresponds to the delimiter used to construct a
       file name hierarchy, and thus, the filename will look similar to the
       URL path. This does NOT mean that the URL is a Unix filename.

    3.2.5. Optimization

       Clients accessing resources via FTP may employ additional heuristics
       to optimize the interaction. For some FTP servers, for example, it
       may be reasonable to keep the control connection open while accessing
       multiple URLs from the same server. However, there is no common
       hierarchical model to the FTP protocol, so if a directory change
       command has been given, it is impossible in general to deduce what
       sequence should be given to navigate to another directory for a
       second retrieval, if the paths are different.  The only reliable
       algorithm is to disconnect and reestablish the control connection.

    3.3. HTTP

       The HTTP URL scheme is used to designate Internet resources
       accessible using HTTP (HyperText Transfer Protocol).

       The HTTP protocol is specified elsewhere. This specification only
       describes the syntax of HTTP URLs.

       An HTTP URL takes the form:

         http://<host>:<port>/<path>?<searchpart>

       where <host> and <port> are as described in Section 3.1. If :<port>
       is omitted, the port defaults to 80.  No user name or password is
       allowed.  <path> is an HTTP selector, and <searchpart> is a query
       string. The <path> is optional, as is the <searchpart> and its
       preceding "?". If neither <path> nor <searchpart> is present, the "/"
       may also be omitted.

       Within the <path> and <searchpart> components, "/", ";", "?" are
       reserved.  The "/" character may be used within HTTP to designate a
       hierarchical structure.
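
       A minimal sketch of the HTTP form follows, assuming a hypothetical
       function parse_http and the placeholder host www.example.com. It
       applies the default port of 80, rejects user information, and
       reports an omitted <path> or <searchpart> as None.

```python
def parse_http(url):
    """Split an HTTP URL into (host, port, path, searchpart)."""
    prefix = "http://"
    if url[:len(prefix)].lower() != prefix:
        raise ValueError("not an http URL")
    hostport, slash, rest = url[len(prefix):].partition("/")
    if "@" in hostport:
        raise ValueError("no user name or password is allowed")
    host, colon, port = hostport.partition(":")
    # "?" is reserved: the first one ends the path and starts the search part.
    path, question, search = rest.partition("?")
    return (host,
            int(port) if colon else 80,   # port defaults to 80
            path if slash else None,
            search if question else None)

print(parse_http("http://www.example.com"))
print(parse_http("http://www.example.com:8080/a/b?q=1"))
```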

    3.4. GOPHER

       The Gopher URL scheme is used to designate Internet resources
       accessible using the Gopher protocol.

       The base Gopher protocol is described in RFC 1436 and supports items
       and collections of items (directories). The Gopher+ protocol is a set
       of upward compatible extensions to the base Gopher protocol and is
       described in [2]. Gopher+ supports associating arbitrary sets of
       attributes and alternate data representations with Gopher items.
       Gopher URLs accommodate both Gopher and Gopher+ items and item
       attributes.

    3.4.1. Gopher URL syntax

       A Gopher URL takes the form:

         gopher://<host>:<port>/<gopher-path>

       where <gopher-path> is one of

          <gophertype><selector>
          <gophertype><selector>%09<search>
          <gophertype><selector>%09<search>%09<gopher+_string>

       If :<port> is omitted, the port defaults to 70. <gophertype> is a
       single-character field to denote the Gopher type of the resource to
       which the URL refers. The entire <gopher-path> may also be empty, in
       which case the delimiting "/" is also optional and the <gophertype>
       defaults to "1".

       <selector> is the Gopher selector string.  In the Gopher protocol,
       Gopher selector strings are a sequence of octets which may contain
       any octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal
       (US-ASCII character LF), and 0D (US-ASCII character CR).

       Gopher clients specify which item to retrieve by sending the Gopher
       selector string to a Gopher server.

       Within the <gopher-path>, no characters are reserved.

       Note that some Gopher <selector> strings begin with a copy of the
       <gophertype> character, in which case that character will occur twice
       consecutively. The Gopher selector string may be an empty string;
       this is how Gopher clients refer to the top-level directory on a
       Gopher server.

    3.4.2 Specifying URLs for Gopher Search Engines

       If the URL refers to a search to be submitted to a Gopher search
       engine, the selector is followed by an encoded tab (%09) and the
       search string. To submit a search to a Gopher search engine, the
       Gopher client sends the <selector> string (after decoding), a tab,
       and the search string to the Gopher server.

    3.4.3 URL syntax for Gopher+ items

       URLs for Gopher+ items have a second encoded tab (%09) and a Gopher+
       string. Note that in this case, the %09<search> string must be
       supplied, although the <search> element may be the empty string.

       The <gopher+_string> is used to represent information required for
       retrieval of the Gopher+ item. Gopher+ items may have alternate
       views, arbitrary sets of attributes, and may have electronic forms
       associated with them.

       To retrieve the data associated with a Gopher+ URL, a client will
       connect to the server and send the Gopher selector, followed by a tab
       and the search string (which may be empty), followed by a tab and the
       Gopher+ commands.
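
       Because the tabs that separate selector, search string and Gopher+
       string are carried in the URL as %09, decoding the <gopher-path>
       yields the octets a client sends. The following sketch illustrates
       this; the function name gopher_request and the sample paths are
       hypothetical, the <gophertype> is assumed to be a single unencoded
       character, and the terminating CR LF comes from the Gopher
       protocol (RFC 1436) rather than from this document.

```python
from urllib.parse import unquote_to_bytes

def gopher_request(gopher_path):
    """Return (gophertype, octets to send) for a <gopher-path>."""
    if not gopher_path:
        gopher_path = "1"  # empty path: type defaults to "1", empty selector
    gophertype, rest = gopher_path[0], gopher_path[1:]
    # Decoding turns each %09 into the tab that separates the selector,
    # the search string and the Gopher+ string on the wire.
    return gophertype, unquote_to_bytes(rest) + b"\r\n"

print(gopher_request("7/search%09cats"))   # search: selector, tab, search string
print(gopher_request("0/doc%09%09+"))      # Gopher+ item with an empty search
```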

    3.4.4 Default Gopher+ data representation

       When a Gopher server returns a directory listing to a client, the
       Gopher+ items are tagged with either a "+" (denoting Gopher+ items)
       or a "?" (denoting Gopher+ items which have a +ASK form associated
       with them). A Gopher URL with a Gopher+ string consisting of only a
       "+" refers to the default view (data representation) of the item
       while a Gopher+ string containing only a "?" refers to an item with a
       Gopher electronic form associated with it.

    3.4.5 Gopher+ items with electronic forms

       Gopher+ items which have a +ASK associated with them (i.e. Gopher+
       items tagged with a "?") require the client to fetch the item's +ASK
       attribute to get the form definition, and then ask the user to fill
       out the form and return the user's responses along with the selector
       string to retrieve the item.  Gopher+ clients know how to do this but
       depend on the "?" tag in the Gopher+ item description to know when to
       handle this case. The "?" is used in the Gopher+ string to be
       consistent with Gopher+ protocol's use of this symbol.

    3.4.6 Gopher+ item attribute collections

       To refer to the Gopher+ attributes of an item, the Gopher URL's
       Gopher+ string consists of "!" or "$". "!" refers to all of a
       Gopher+ item's attributes. "$" refers to all the item attributes for
       all items in a Gopher directory.

    3.4.7 Referring to specific Gopher+ attributes

       To refer to specific attributes, the URL's gopher+_string is
       "!<attribute_name>" or "$<attribute_name>". For example, to refer to
       the attribute containing the abstract of an item, the gopher+_string
       would be "!+ABSTRACT".

       To refer to several attributes, the gopher+_string consists of the
       attribute names separated by coded spaces. For example,
       "!+ABSTRACT%20+SMELL" refers to the +ABSTRACT and +SMELL attributes
       of an item.
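
       As a small illustration, a gopher+_string for one or more
       attributes could be assembled as below. The function name
       attribute_string and its whole_directory flag are hypothetical,
       and attribute names are assumed to need no further encoding.

```python
def attribute_string(names, whole_directory=False):
    """Build a gopher+_string naming Gopher+ attributes.

    "!" refers to attributes of one item, "$" to those of all items
    in a directory; several names are joined by coded spaces (%20).
    """
    prefix = "$" if whole_directory else "!"
    return prefix + "%20".join(names)

print(attribute_string(["+ABSTRACT", "+SMELL"]))
```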

    3.4.8 URL syntax for Gopher+ alternate views

       Gopher+ allows for optional alternate data representations (alternate
       views) of items. To retrieve a Gopher+ alternate view, a Gopher+
       client sends the appropriate view and language identifier (found in
       the item's +VIEW attribute). To refer to a specific Gopher+ alternate
       view, the URL's Gopher+ string would be in the form:

           +<view_name>%20<language_name>

       For example, a Gopher+ string of "+application/postscript%20Es_ES"
       refers to the Spanish language postscript alternate view of a Gopher+
       item.

    3.4.9 URL syntax for Gopher+ electronic forms

       The gopher+_string for a URL that refers to an item referenced by a
       Gopher+ electronic form (an ASK block) filled out with specific
       values is a coded version of what the client sends to the server.
       The gopher+_string is of the form:

    +%091%0D%0A+-1%0D%0A<ask_item1_value>%0D%0A<ask_item2_value>%0D%0A.%0D%0A

       To retrieve this item, the Gopher client sends:

          <a_gopher_selector><tab>+<tab>1<cr><lf>
          +-1<cr><lf>
          <ask_item1_value><cr><lf>
          <ask_item2_value><cr><lf>
          .<cr><lf>

       to the Gopher server.

    3.5. MAILTO

       The mailto URL scheme is used to designate the Internet mailing
       address of an individual or service. No additional information other
       than an Internet mailing address is present or implied.

       A mailto URL takes the form:

           mailto:<rfc822-addr-spec>

       where <rfc822-addr-spec> is (the encoding of an) addr-spec, as
       specified in RFC 822 [6]. Within mailto URLs, there are no reserved
       characters.

       Note that the percent sign ("%") is commonly used within RFC 822
       addresses and must be encoded.

       Unlike many URLs, the mailto scheme does not represent a data object
       to be accessed directly; there is no sense in which it designates an
       object. It has a different use than the message/external-body type in
       MIME.

    3.6. NEWS

       The news URL scheme is used to refer to either news groups or
       individual articles of USENET news, as specified in RFC 1036.

       A news URL takes one of two forms:

         news:<newsgroup-name>
         news:<message-id>

       A <newsgroup-name> is a period-delimited hierarchical name, such as
       "comp.infosystems.www.misc". A <message-id> corresponds to the
       Message-ID of section 2.1.5 of RFC 1036, without the enclosing "<"
       and ">"; it takes the form <unique>@<full_domain_name>.  A message
       identifier may be distinguished from a news group name by the
       presence of the commercial at "@" character. No additional characters
       are reserved within the components of a news URL.

       If <newsgroup-name> is "*" (as in <URL:news:*>), it is used to refer
       to "all available news groups".

       The news URLs are unusual in that by themselves, they do not contain
       sufficient information to locate a single resource, but, rather, are
       location-independent.
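
       The "@" test that separates the two forms can be sketched as
       follows. The function name classify_news, the labels it returns
       and the sample message identifier are hypothetical.

```python
def classify_news(url):
    """Return (kind, value) for a news URL."""
    scheme, colon, rest = url.partition(":")
    if not colon or scheme.lower() != "news":
        raise ValueError("not a news URL")
    if "@" in rest:
        return ("message-id", rest)   # <unique>@<full_domain_name>
    if rest == "*":
        return ("all-groups", rest)   # all available news groups
    return ("newsgroup", rest)

print(classify_news("news:comp.infosystems.www.misc"))
print(classify_news("news:abc123@example.com"))
```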

    3.7. NNTP

       The nntp URL scheme is an alternative method of referencing news
       articles, useful for specifying news articles from NNTP servers (RFC
       977).

       An nntp URL takes the form:

         nntp://<host>:<port>/<newsgroup-name>/<article-number>

       where <host> and <port> are as described in Section 3.1. If :<port>
       is omitted, the port defaults to 119.

       The <newsgroup-name> is the name of the group, while the
       <article-number> is the numeric id of the article within that
       newsgroup.

       Note that while nntp: URLs specify a unique location for the article
       resource, most NNTP servers currently on the Internet today are
       configured only to allow access from local clients, and thus nntp
       URLs do not designate globally accessible resources. Thus, the news:
       form of URL is preferred as a way of identifying news articles.

    3.8. TELNET

       The Telnet URL scheme is used to designate interactive services that
       may be accessed by the Telnet protocol.

       A telnet URL takes the form:

          telnet://<user>:<password>@<host>:<port>/

       as specified in Section 3.1. The final "/" character may be omitted.
       If :<port> is omitted, the port defaults to 23.  The :<password> can
       be omitted, as well as the whole <user>:<password> part.

       This URL does not designate a data object, but rather an interactive
       service. Remote interactive services vary widely in the means by
       which they allow remote logins; in practice, the <user> and
       <password> supplied are advisory only: clients accessing a telnet URL
       merely advise the user of the suggested username and password.

    3.9.  WAIS

       The WAIS URL scheme is used to designate WAIS databases, searches, or
       individual documents available from a WAIS database. WAIS is
       described in [7]. The WAIS protocol is described in RFC 1625 [17];
       although the WAIS protocol is based on Z39.50-1988, the WAIS URL
       scheme is not intended for use with arbitrary Z39.50 services.

       A WAIS URL takes one of the following forms:

        wais://<host>:<port>/<database>
        wais://<host>:<port>/<database>?<search>
        wais://<host>:<port>/<database>/<wtype>/<wpath>

       where <host> and <port> are as described in Section 3.1. If :<port>
       is omitted, the port defaults to 210.  The first form designates a
       WAIS database that is available for searching. The second form
       designates a particular search.  <database> is the name of the WAIS
       database being queried.

       The third form designates a particular document within a WAIS
       database to be retrieved. In this form <wtype> is the WAIS
       designation of the type of the object. Many WAIS implementations
       require that a client know the "type" of an object prior to
       retrieval, the type being returned along with the internal object
       identifier in the search response.  The <wtype> is included in the
       URL in order to allow the client interpreting the URL adequate
       information to actually retrieve the document.

       The <wpath> of a WAIS URL consists of the WAIS document-id, encoded

       as necessary using the method described in Section 2.2. The WAIS

       document-id should be treated opaquely; it may only be decomposed by

       the server that issued it.
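
       As a rough illustration of the three forms, the Python sketch below
       classifies a WAIS URL and applies the default port. The function
       name parse_wais_url, the returned dictionary layout, and the host
       and database names in the usage note are assumptions made for this
       example; the <wpath> is deliberately left encoded because the
       document-id is to be treated opaquely.

```python
import re
from urllib.parse import unquote

WAIS_DEFAULT_PORT = 210  # default given in the text above

_WAIS_RE = re.compile(
    r"^wais://(?P<host>[^:/?@]+)(?::(?P<port>[0-9]+))?/(?P<rest>.*)$"
)


def parse_wais_url(url):
    """Classify a WAIS URL as a database, search or document reference."""
    match = _WAIS_RE.match(url)
    if match is None:
        raise ValueError("not a WAIS URL: %r" % (url,))
    port = match.group("port")
    result = {
        "host": match.group("host"),
        "port": int(port) if port is not None else WAIS_DEFAULT_PORT,
    }
    rest = match.group("rest")
    if "?" in rest:
        # Second form: <database>?<search>
        database, search = rest.split("?", 1)
        result.update(form="search", database=unquote(database), search=search)
    elif "/" in rest:
        # Third form: <database>/<wtype>/<wpath>
        parts = rest.split("/", 2)
        if len(parts) != 3:
            raise ValueError("expected <database>/<wtype>/<wpath>: %r" % (url,))
        database, wtype, wpath = parts
        # The <wpath> is left encoded: the document-id is opaque to clients.
        result.update(form="document", database=unquote(database),
                      wtype=unquote(wtype), wpath=wpath)
    else:
        # First form: <database>
        result.update(form="database", database=unquote(rest))
    return result
```

       For instance, parse_wais_url("wais://wais.example.org/mydb?term")
       reports the "search" form on port 210.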

    3.10 FILES

       The file URL scheme is used to designate files accessible on a

       particular host computer. This scheme, unlike most other URL schemes,

       does not designate a resource that is universally accessible over the

       Internet.

       A file URL takes the form:

          file://<host>/<path>

       where <host> is the fully qualified domain name of the system on

       which the <path> is accessible, and <path> is a hierarchical

       directory path of the form <directory>/<directory>/.../<name>.

       For example, a VMS file

        DISK$USER:[MY.NOTES]NOTE123456.TXT

       might become

        <URL:file://vms.host.edu/disk$user/my/notes/note12345.txt>

       As a special case, <host> can be the string "localhost" or the empty

       string; this is interpreted as `the machine from which the URL is

       being interpreted'.

       The file URL scheme is unusual in that it does not specify an

       Internet protocol or access method for such files; as such, its

       utility in network protocols between hosts is limited.
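
       The handling of <host> and <path> might be sketched in Python as
       below. This is a sketch under stated assumptions, not a definitive
       implementation: the name parse_file_url and the returned fields are
       invented for the example, and mapping the path segments back onto a
       native file name (such as the VMS form above) is left to the caller.

```python
import re
from urllib.parse import unquote

# file://<host>/<path>, where <host> may be empty or "localhost"
_FILE_RE = re.compile(r"^file://(?P<host>[^/]*)/(?P<path>.*)$")


def parse_file_url(url):
    """Split a file URL into its host and hierarchical path segments."""
    match = _FILE_RE.match(url)
    if match is None:
        raise ValueError("not a file URL: %r" % (url,))
    host = match.group("host").lower()
    # "localhost" and the empty string both mean the interpreting machine.
    is_local = host in ("", "localhost")
    # Decode each <directory> or <name> segment only after splitting on "/".
    segments = [unquote(part) for part in match.group("path").split("/")]
    return {
        "host": None if is_local else host,
        "is_local": is_local,
        "segments": segments,
    }
```

       Applied to the VMS example above, the segments come out as
       "disk$user", "my", "notes" and "note12345.txt".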

    3.11 PROSPERO

       The Prospero URL scheme is used to designate resources that are

       accessed via the Prospero Directory Service. The Prospero protocol is

       described elsewhere [14].

       A Prospero URL takes the form:

         prospero://<host>:<port>/<hsoname>;<field>=<value>

       where <host> and <port> are as described in Section 3.1. If :<port>

       is omitted, the port defaults to 1525. No username or password is

       allowed.

       The <hsoname> is the host-specific object name in the Prospero

       protocol, suitably encoded.  This name is opaque and interpreted by

       the Prospero server.  The semicolon ";" is reserved and may not

       appear without quoting in the <hsoname>.

       Prospero URLs are interpreted by contacting a Prospero directory

       server on the specified host and port to determine appropriate access

       methods for a resource, which might themselves be represented as

       different URLs. External Prospero links are represented as URLs of

       the underlying access method and are not represented as Prospero

       URLs.

       Note that a slash "/" may appear in the <hsoname> without quoting and

       no significance may be assumed by the application.  Though slashes

       may indicate hierarchical structure on the server, such structure is

       not guaranteed. Note that many <hsoname>s begin with a slash, in

       which case the host or port will be followed by a double slash: the

       slash from the URL syntax, followed by the initial slash from the

       <hsoname>. (E.g., <URL:prospero://host.dom//pros/name> designates a

       <hsoname> of "/pros/name".)

       In addition, after the <hsoname>, optional fields and values

       associated with a Prospero link may be specified as part of the URL.

       When present, each field/value pair is separated from each other and

       from the rest of the URL by a ";" (semicolon).  The name of the field

       and its value are separated by a "=" (equal sign). If present, these

       fields serve to identify the target of the URL.  For example, the

       OBJECT-VERSION field can be specified to identify a specific version

       of an object.
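
       The splitting rules for Prospero URLs can be illustrated with a
       short Python sketch. The function name parse_prospero_url, the
       dictionary layout, and the OBJECT-VERSION value used in the usage
       note are assumptions made for the example. The sketch splits on the
       reserved ";" before decoding, so that an encoded semicolon inside
       the <hsoname> is not mistaken for a field separator.

```python
import re
from urllib.parse import unquote

PROSPERO_DEFAULT_PORT = 1525  # default given in the text above

# No "@" is accepted in the host part: user names and passwords are
# not allowed in Prospero URLs.
_PROSPERO_RE = re.compile(
    r"^prospero://(?P<host>[^:/@;]+)(?::(?P<port>[0-9]+))?/(?P<rest>.*)$"
)


def parse_prospero_url(url):
    """Split a Prospero URL into host, port, <hsoname> and fields."""
    match = _PROSPERO_RE.match(url)
    if match is None:
        raise ValueError("not a Prospero URL: %r" % (url,))
    port = match.group("port")
    # Everything up to the first unencoded ";" is the <hsoname>.
    hsoname_raw, *field_parts = match.group("rest").split(";")
    fields = {}
    for part in field_parts:
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError("field without '=': %r" % (part,))
        fields[unquote(name)] = unquote(value)
    return {
        "host": match.group("host"),
        "port": int(port) if port is not None else PROSPERO_DEFAULT_PORT,
        "hsoname": unquote(hsoname_raw),
        "fields": fields,
    }
```

       For instance, "prospero://host.dom//pros/name;OBJECT-VERSION=3"
       yields an <hsoname> of "/pros/name" and a single OBJECT-VERSION
       field.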

    4. REGISTRATION OF NEW SCHEMES

       A new scheme may be introduced by defining a mapping onto a

       conforming URL syntax, using a new prefix. URLs for experimental

       schemes may be used by mutual agreement between parties. Scheme names

       starting with the characters "x-" are reserved for experimental

       purposes.

       The Internet Assigned Numbers Authority (IANA) will maintain a

       registry of URL schemes. Any submission of a new URL scheme must

       include a definition of an algorithm for accessing of resources

       within that scheme and the syntax for representing such a scheme.

       URL schemes must have demonstrable utility and operability. One way

       to provide such a demonstration is via a gateway which provides

       objects in the new scheme for clients using an existing protocol. If

       the new scheme does not locate resources that are data objects, the

       properties of names in the new space must be clearly defined.

       New schemes should try to follow the same syntactic conventions of

       existing schemes, where appropriate.  It is likewise recommended

       that, where a protocol allows for retrieval by URL, that the client

       software have provision for being configured to use specific gateway

       locators for indirect access through new naming schemes.

       The following schemes have been proposed at various times, but this

       document does not define their syntax or use at this time. It is

       suggested that IANA reserve their scheme names for future definition:

       afs              Andrew File System global filenames.

       mid              Message identifiers for electronic mail.

       cid              Content identifiers for MIME body parts.

       nfs              Network File System (NFS) filenames.

       tn3270           Interactive 3270 emulation sessions.

       mailserver       Access to data available from mailservers.

       z39.50           Access to ANSI Z39.50 services.

    5. BNF for specific URL schemes

       This is a BNF-like description of the Uniform Resource Locator

       syntax, using the conventions of RFC822, except that "|" is used to

       designate alternatives, and brackets [] are used around optional or

       repeated elements. Briefly, literals are quoted with "", optional

       elements are enclosed in [brackets], and elements may be preceded

       with <n>* to designate n or more repetitions of the following

       element; n defaults to 0.

    ; The generic form of a URL is:

    genericurl     = scheme ":" schemepart

    ; Specific predefined schemes are defined here; new schemes

    ; may be registered with IANA

    url            = httpurl | ftpurl | newsurl |

                     nntpurl | telneturl | gopherurl |

                     waisurl | mailtourl | fileurl |

                     prosperourl | otherurl

    ; new schemes follow the general syntax

    otherurl       = genericurl

    ; the scheme is in lower case; interpreters should use case-ignore

    scheme         = 1*[ lowalpha | digit | "+" | "-" | "." ]

    schemepart     = *xchar | ip-schemepart

    ; URL schemeparts for ip based protocols:

    ip-schemepart  = "//" login [ "/" urlpath ]

    login          = [ user [ ":" password ] "@" ] hostport

    hostport       = host [ ":" port ]

    host           = hostname | hostnumber

    hostname       = *[ domainlabel "." ] toplabel

    domainlabel    = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit

    toplabel       = alpha | alpha *[ alphadigit | "-" ] alphadigit

    alphadigit     = alpha | digit

    hostnumber     = digits "." digits "." digits "." digits

    port           = digits

    user           = *[ uchar | ";" | "?" | "&" | "=" ]

    password       = *[ uchar | ";" | "?" | "&" | "=" ]

    urlpath        = *xchar    ; depends on protocol see section 3.1

    ; The predefined schemes:

    ; FTP (see also RFC959)

    ftpurl         = "ftp://" login [ "/" fpath [ ";type=" ftptype ]]

    fpath          = fsegment *[ "/" fsegment ]

    fsegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]

    ftptype        = "A" | "I" | "D" | "a" | "i" | "d"

    ; FILE

    fileurl        = "file://" [ host | "localhost" ] "/" fpath

    ; HTTP

    httpurl        = "http://" hostport [ "/" hpath [ "?" search ]]

    hpath          = hsegment *[ "/" hsegment ]

    hsegment       = *[ uchar | ";" | ":" | "@" | "&" | "=" ]

    search         = *[ uchar | ";" | ":" | "@" | "&" | "=" ]

    ; GOPHER (see also RFC1436)

    gopherurl      = "gopher://" hostport [ / [ gtype [ selector

                     [ "%09" search [ "%09" gopher+_string ] ] ] ] ]

    gtype          = xchar

    selector       = *xchar

    gopher+_string = *xchar

    ; MAILTO (see also RFC822)

    mailtourl      = "mailto:" encoded822addr

    encoded822addr = 1*xchar               ; further defined in RFC822

    ; NEWS (see also RFC1036)

    newsurl        = "news:" grouppart

    grouppart      = "*" | group | article

    group          = alpha *[ alpha | digit | "-" | "." | "+" | "_" ]

    article        = 1*[ uchar | ";" | "/" | "?" | ":" | "&" | "=" ] "@" host

    ; NNTP (see also RFC977)

    nntpurl        = "nntp://" hostport "/" group [ "/" digits ]

    ; TELNET

    telneturl      = "telnet://" login [ "/" ]

    ; WAIS (see also RFC1625)

    waisurl        = waisdatabase | waisindex | waisdoc

    waisdatabase   = "wais://" hostport "/" database

    waisindex      = "wais://" hostport "/" database "?" search

    waisdoc        = "wais://" hostport "/" database "/" wtype "/" wpath

    database       = *uchar

    wtype          = *uchar

    wpath          = *uchar

    ; PROSPERO

    prosperourl    = "prospero://" hostport "/" ppath *[ fieldspec ]

    ppath          = psegment *[ "/" psegment ]

    psegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]

    fieldspec      = ";" fieldname "=" fieldvalue

    fieldname      = *[ uchar | "?" | ":" | "@" | "&" ]

    fieldvalue     = *[ uchar | "?" | ":" | "@" | "&" ]

    ; Miscellaneous definitions

    lowalpha       = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |

                     "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |

                     "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |

                     "y" | "z"

    hialpha        = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |

                     "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |

                     "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

    alpha          = lowalpha | hialpha

    digit          = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |

                     "8" | "9"

    safe           = "$" | "-" | "_" | "." | "+"

    extra          = "!" | "*" | "'" | "(" | ")" | ","

    national       = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"

    punctuation    = "<" | ">" | "#" | "%" | <">

    reserved       = ";" | "/" | "?" | ":" | "@" | "&" | "="

    hex            = digit | "A" | "B" | "C" | "D" | "E" | "F" |

                     "a" | "b" | "c" | "d" | "e" | "f"

    escape         = "%" hex hex

    unreserved     = alpha | digit | safe | extra

    uchar          = unreserved | escape

    xchar          = unreserved | reserved | escape

    digits         = 1*digit
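
       A few of the productions above translate fairly directly into
       regular expressions. The Python sketch below covers only scheme,
       hostport and uchar; it is one possible transcription, offered as an
       illustration rather than as a reference implementation, and the
       constant and function names are assumptions made for the example.

```python
import re

# Character sets, written as the inside of a regex [...] class.
ALPHA = "a-zA-Z"
DIGIT = "0-9"
SAFE = r"$\-_.+"
EXTRA = r"!*'(),"

ESCAPE = r"%[0-9A-Fa-f]{2}"                      # escape = "%" hex hex
UNRESERVED = "[" + ALPHA + DIGIT + SAFE + EXTRA + "]"
UCHAR = "(?:" + UNRESERVED + "|" + ESCAPE + ")"  # uchar = unreserved | escape

# scheme = 1*[ lowalpha | digit | "+" | "-" | "." ]
# The grammar is lower case only; callers are expected to fold case first.
SCHEME = r"[a-z0-9+\-.]+"

ALPHADIGIT = "[" + ALPHA + DIGIT + "]"
# Optional remainder of a label: *[ alphadigit | "-" ] alphadigit
LABEL_TAIL = "(?:[" + ALPHA + DIGIT + r"\-]*" + ALPHADIGIT + ")?"
DOMAINLABEL = ALPHADIGIT + LABEL_TAIL
TOPLABEL = "[" + ALPHA + "]" + LABEL_TAIL
HOSTNAME = "(?:" + DOMAINLABEL + r"\.)*" + TOPLABEL
HOSTNUMBER = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"
# hostport = host [ ":" port ]
HOSTPORT = "(?:" + HOSTNAME + "|" + HOSTNUMBER + ")(?::[0-9]+)?"


def is_scheme(text):
    """True if text matches the scheme production."""
    return re.fullmatch(SCHEME, text) is not None


def is_hostport(text):
    """True if text matches the hostport production."""
    return re.fullmatch(HOSTPORT, text) is not None


def is_uchar_string(text):
    """True if text consists only of uchar elements (possibly none)."""
    return re.fullmatch(UCHAR + "*", text) is not None
```

       Note that is_scheme("HTTP") is false under this transcription,
       since the grammar itself is lower case; an interpreter following
       the case-ignore advice would lower-case the scheme before checking.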

    6. Security Considerations

       The URL scheme does not in itself pose a security threat. Users

       should beware that there is no general guarantee that a URL which at

       one time points to a given object continues to do so, and does not

       even at some later time point to a different object due to the

       movement of objects on servers.

       A URL-related security threat is that it is sometimes possible to

       construct a URL such that an attempt to perform a harmless idempotent

       operation such as the retrieval of the object will in fact cause a

       possibly damaging remote operation to occur.  The unsafe URL is

       typically constructed by specifying a port number other than that

       reserved for the network protocol in question.  The client

       unwittingly contacts a server which is in fact running a different

       protocol.  The content of the URL contains instructions which when

       interpreted according to this other protocol cause an unexpected

       operation. An example has been the use of gopher URLs to cause a rude

       message to be sent via a SMTP server.  Caution should be used when

       using any URL which specifies a port number other than the default

       for the protocol, especially when it is a number within the reserved

       space.
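
       One way a client might act on this caution is sketched below in
       Python. The default ports are those given for each scheme in this
       document; the function name check_port, the three result labels,
       and the reading of "the reserved space" as ports below 1024 are
       assumptions made for the example, not requirements of this
       specification.

```python
# Default ports as stated in the scheme descriptions of this document.
DEFAULT_PORTS = {
    "ftp": 21,
    "telnet": 23,
    "gopher": 70,
    "http": 80,
    "nntp": 119,
    "wais": 210,
    "prospero": 1525,
}

# Assumption: "the reserved space" is taken to mean ports below 1024.
RESERVED_PORT_LIMIT = 1024


def check_port(scheme, port):
    """Classify the port of a URL relative to its scheme's default.

    port is None when the URL omits :<port>.
    """
    default = DEFAULT_PORTS.get(scheme.lower())
    if port is None or port == default:
        return "default"
    if port < RESERVED_PORT_LIMIT:
        # E.g. a gopher URL aimed at port 25, where a mail server listens.
        return "non-default-reserved"
    return "non-default"
```

       A client could, for example, ask the user for confirmation before
       following any URL for which check_port does not return "default".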

       Care should be taken when URLs contain embedded encoded delimiters

       for a given protocol (for example, CR and LF characters for telnet

       protocols) that these are not unencoded before transmission.  This

       would violate the protocol but could be used to simulate an extra

       operation or parameter, again causing an unexpected and possibly

       harmful remote operation to be performed.
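
       A client wishing to notice such URLs before acting on them could
       use a check along the following lines. This Python sketch looks
       only for encoded CR (%0D) and LF (%0A); the function name and the
       decision about what to do with a flagged URL are assumptions left
       to the implementation.

```python
import re

# %0D is an encoded CR and %0A an encoded LF; hex digits may be
# written in either case.
_ENCODED_CR_LF = re.compile(r"%0[aAdD]")


def has_encoded_line_delimiters(url):
    """True if the URL carries an encoded CR or LF character."""
    return _ENCODED_CR_LF.search(url) is not None
```

       The check operates on the URL as written, so the delimiters are
       detected without ever being decoded.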

       The use of URLs containing passwords that should be secret is clearly

       unwise.

    7. Acknowledgements

       This paper builds on the basic WWW design (RFC 1630) and much

       discussion of these issues by many people on the network. The

       discussion was particularly stimulated by articles by Clifford Lynch,

       Brewster Kahle [10] and Wengyik Yeong [18]. Contributions from John

       Curran, Clifford Neuman, Ed Vielmetti and later the IETF URL BOF and

       URI working group were incorporated.

       Most recently, careful readings and comments by Dan Connolly, Ned

       Freed, Roy Fielding, Guido van Rossum, Michael Dolan, Bert Bos, John

       Kunze, Olle Jarnefors, Peter Svanberg and many others have helped

       refine this RFC.

    APPENDIX: Recommendations for URLs in Context

       URIs, including URLs, are intended to be transmitted through

       protocols which provide a context for their interpretation.

       In some cases, it will be necessary to distinguish URLs from other

       possible data structures in a syntactic structure. In this case, it is

       recommended that URLs be preceded with a prefix consisting of the

       characters "URL:". For example, this prefix may be used to

       distinguish URLs from other kinds of URIs.

       In addition, there are many occasions when URLs are included in other

       kinds of text; examples include electronic mail, USENET news

       messages, or printed on paper. In such cases, it is convenient to

       have a separate syntactic wrapper that delimits the URL and separates

       it from the rest of the text, and in particular from punctuation

       marks that might be mistaken for part of the URL. For this purpose,

       it is recommended that angle brackets ("<" and ">"), along with the

       prefix "URL:", be used to delimit the boundaries of the URL. This

       wrapper does not form part of the URL and should not be used in

       contexts in which delimiters are already specified.

       In the case where a fragment/anchor identifier is associated with a

       URL (following a "#"), the identifier would be placed within the

       brackets as well.

       In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may

       need to be added to break long URLs across lines.  The whitespace

       should be ignored when extracting the URL.

       No whitespace should be introduced after a hyphen ("-") character.

       Because some typesetters and printers may (erroneously) introduce a

       hyphen at the end of line when breaking a line, the interpreter of a

       URL containing a line break immediately after a hyphen should ignore

       all unencoded whitespace around the line break, and should be aware

       that the hyphen may or may not actually be part of the URL.

       Examples:

          Yes, Jim, I found it under <URL:ftp://info.cern.ch/pub/www/doc;

          type=d> but you can probably pick it up from <URL:ftp://ds.in

          ternic.net/rfc>.  Note the warning in <URL:http://ds.internic.

          net/instructions/overview.html#WARNING>.
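
       Extraction of URLs wrapped in this way might be sketched in Python
       as follows. The function name extract_wrapped_urls is an assumption
       made for the example. The sketch removes all whitespace inside the
       wrapper, as recommended above, but makes no attempt to resolve the
       ambiguity of a hyphen at the end of a line.

```python
import re

# A wrapped URL: "<URL:" ... ">", possibly broken across lines.
_WRAPPED_URL = re.compile(r"<URL:([^<>]*)>")


def extract_wrapped_urls(text):
    """Return the URLs found inside <URL:...> wrappers in text."""
    urls = []
    for match in _WRAPPED_URL.finditer(text):
        # Whitespace added to break a long URL is not part of the URL.
        urls.append(re.sub(r"\s+", "", match.group(1)))
    return urls
```

       Applied to the example text above, this yields the three URLs with
       their line breaks removed, including the fragment identifier of the
       last one.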

    References

       [1] Anklesaria, F., McCahill, M., Lindner, P., Johnson, D.,

           Torrey, D., and B. Alberti, "The Internet Gopher Protocol

           (a distributed document search and retrieval protocol)",

           RFC 1436, University of Minnesota, March 1993.

           <URL:ftp://ds.internic.net/rfc/rfc1436.txt;type=a>

       [2] Anklesaria, F., Lindner, P., McCahill, M., Torrey, D.,

           Johnson, D., and B. Alberti, "Gopher+: Upward compatible

           enhancements to the Internet Gopher protocol",

           University of Minnesota, July 1993.

           <URL:ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol

           /Gopher+/Gopher+.txt>

       [3] Berners-Lee, T., "Universal Resource Identifiers in WWW: A

           Unifying Syntax for the Expression of Names and Addresses of

           Objects on the Network as used in the World-Wide Web", RFC

           1630, CERN, June 1994.

           <URL:ftp://ds.internic.net/rfc/rfc1630.txt>

       [4] Berners-Lee, T., "Hypertext Transfer Protocol (HTTP)",

           CERN, November 1993.

           <URL:ftp://info.cern.ch/pub/www/doc/http-spec.txt.Z>

       [5] Braden, R., Editor, "Requirements for Internet Hosts --

           Application and Support", STD 3, RFC 1123, IETF, October 1989.

           <URL:ftp://ds.internic.net/rfc/rfc1123.txt>

       [6] Crocker, D., "Standard for the Format of ARPA Internet Text

           Messages", STD 11, RFC 822, UDEL, April 1982.

           <URL:ftp://ds.internic.net/rfc/rfc822.txt>

       [7] Davis, F., Kahle, B., Morris, H., Salem, J., Shen, T., Wang, R.,

           Sui, J., and M. Grinbaum, "WAIS Interface Protocol Prototype

           Functional Specification", (v1.5), Thinking Machines

           Corporation, April 1990.

           <URL:ftp://quake.think.com/pub/wais/doc/protspec.txt>

       [8] Horton, M. and R. Adams, "Standard For Interchange of USENET

           Messages", RFC 1036, AT&T Bell Laboratories, Center for Seismic

           Studies, December 1987.

           <URL:ftp://ds.internic.net/rfc/rfc1036.txt>

       [9] Huitema, C., "Naming: Strategies and Techniques", Computer

           Networks and ISDN Systems 23 (1991) 107-110.

      [10] Kahle, B., "Document Identifiers, or International Standard

           Book Numbers for the Electronic Age", 1991.

           <URL:ftp://quake.think.com/pub/wais/doc/doc-ids.txt>

      [11] Kantor, B. and P. Lapsley, "Network News Transfer Protocol:

           A Proposed Standard for the Stream-Based Transmission of News",

           RFC 977, UC San Diego & UC Berkeley, February 1986.

           <URL:ftp://ds.internic.net/rfc/rfc977.txt>

      [12] Kunze, J., "Functional Requirements for Internet Resource

           Locators", Work in Progress, December 1994.

           <URL:ftp://ds.internic.net/internet-drafts

           /draft-ietf-uri-irl-fun-req-02.txt>

      [13] Mockapetris, P., "Domain Names - Concepts and Facilities",

           STD 13, RFC 1034, USC/Information Sciences Institute,

           November 1987.

           <URL:ftp://ds.internic.net/rfc/rfc1034.txt>

      [14] Neuman, B., and S. Augart, "The Prospero Protocol",

           USC/Information Sciences Institute, June 1993.

           <URL:ftp://prospero.isi.edu/pub/prospero/doc

           /prospero-protocol.PS.Z>

      [15] Postel, J. and J. Reynolds, "File Transfer Protocol (FTP)",

           STD 9, RFC 959, USC/Information Sciences Institute,

           October 1985.

           <URL:ftp://ds.internic.net/rfc/rfc959.txt>

      [16] Sollins, K. and L. Masinter, "Functional Requirements for

           Uniform Resource Names", RFC 1737, MIT/LCS, Xerox Corporation,

           December 1994.

           <URL:ftp://ds.internic.net/rfc/rfc1737.txt>

      [17] St. Pierre, M., Fullton, J., Gamiel, K., Goldman, J., Kahle, B.,

           Kunze, J., Morris, H., and F. Schiettecatte, "WAIS over

           Z39.50-1988", RFC 1625, WAIS, Inc., CNIDR, Thinking Machines

           Corp., UC Berkeley, FS Consulting, June 1994.

           <URL:ftp://ds.internic.net/rfc/rfc1625.txt>

      [18] Yeong, W., "Towards Networked Information Retrieval", Technical

           report 91-06-25-01, Performance Systems International, Inc.

           <URL:ftp://uu.psi.com/wp/nir.txt>, June 1991.

      [19] Yeong, W., "Representing Public Archives in the Directory",

           Work in Progress, November 1991.

      [20] "Coded Character Set -- 7-bit American Standard Code for

           Information Interchange", ANSI X3.4-1986.

    Editors' Addresses

    Tim Berners-Lee

    World-Wide Web project

    CERN,

    1211 Geneva 23,

    Switzerland

    Phone: +41 (22)767 3755

    Fax: +41 (22)767 7155

    EMail: timbl@info.cern.ch

    Larry Masinter

    Xerox PARC

    3333 Coyote Hill Road

    Palo Alto, CA 94034

    Phone: (415) 812-4365

    Fax: (415) 812-4333

    EMail: masinter@parc.xerox.com

    Mark McCahill

    Computer and Information Services,

    University of Minnesota

    Room 152 Shepherd Labs

    100 Union Street SE

    Minneapolis, MN 55455

    Phone: (612) 625 1300

    EMail: mpm@boombox.micro.umn.edu
