What is XML
XML is not itself a language; it is a set of rules for building markup languages that contain and manage information.
-
XML does not have any pre-defined markup.
-
XML lets you define your own markup (elements and attributes).
<?xml version="1.0"?>
<time-o-gram pri="important">
<to>Sarah</to>
<subject>Reminder</subject>
<message>Don't forget to recharge K-9
<emphasis>twice a day</emphasis>.
Also, I think we should have his
bearings checked out. See you soon
(or late). I have a date with
some <villain>Daleks</villain>...
</message>
<from>The Doctor</from>
</time-o-gram>
-
XML sets strict rules so that documents in your customized markup language can be checked as well-formed and valid.
XML markup language components
As a markup language, XML is built from a few basic components. Some are the same as in HTML; some are its own.
-
element----the building blocks of XML: containers, inline elements, empty elements
The difference from HTML: you can define your own element names, and whitespace (spaces, tabs, newlines) in content is preserved and passed to the application.
-
attribute----conveys more information about an element: a unique identifier, a property description
An element can have any number of attributes, as long as each has a unique name.
<kiosk music="bagpipes" color="red" id="page-81527">
Attribute value must be quoted in single/double quotes.
<choice test='msg="hi"'/>
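An attribute can hold several values as a space-separated list. To keep the values apart, use differently named attributes; repeating the same attribute name in one element is an error.
Good: <team persons="sue joe jane">
Good: <team person1="sue" person2="joe" person3="jane">
Bad: <team person="sue" person="joe" person="jane">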
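Reserved attributes: xml:lang, xml:space, xml:link, xml:attribute. Names starting with xml are reserved. xml:lang and xml:space are defined by XML 1.0 itself; xml:link and xml:attribute are associated with early XLink drafts rather than XML 1.0.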
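A minimal sketch of how a parser exposes attributes, assuming Python's standard xml.etree.ElementTree module (the choice of Python and the helper name read_team are illustrative assumptions, not part of these notes). It suggests that a space-separated value arrives as one string the application must split, and that a repeated attribute name is rejected.

```python
import xml.etree.ElementTree as ET

def read_team(xml_text):
    """Return the names held in the space-separated 'persons' attribute."""
    team = ET.fromstring(xml_text)
    # The parser hands back one string; splitting it is the application's job.
    return team.get("persons", "").split()

print(read_team('<team persons="sue joe jane"/>'))

# Single quotes outside allow double quotes inside the value.
choice = ET.fromstring("<choice test='msg=\"hi\"'/>")
print(choice.get("test"))

# Repeating an attribute name in one element is not well-formed.
try:
    ET.fromstring('<team person="sue" person="joe" person="jane"/>')
    duplicate_accepted = True
except ET.ParseError as err:
    duplicate_accepted = False
    print("rejected:", err)
```

Splitting on whitespace is a design choice of this sketch; XML itself only guarantees that the attribute value is delivered as a single string.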
-
namespace
Originally used to distinguish elements that come from different document types, to avoid name clashes.
More generally, namespaces help XML processors sort out different groups of elements for different treatment. For example, the transformation language XSLT relies on namespaces to distinguish between XML objects that are data and those that are instructions for processing the data. The instructional elements and attributes have an xsl: namespace prefix. Anything without a namespace prefix is treated as data in the transformation process.
A namespace must be declared before you can use it. It is generally declared as an xmlns attribute on the root element, or on the element where the prefix is first used.
<?xml version="1.0"?>
<myns:journal xmlns:myns="http://www.psycholabs.org/mynamespace/">
<myns:experiment>
<myns:date>March 4, 2001</myns:date>
<myns:subject>Effects of Caffeine on Psychokinetic
Ability</myns:subject>
<myns:abstract>The experiment consists of a subject, a can of
caffeinated soda, and a goldfish tank. The ability to make a
goldfish turn in a circle through the power of a human's mental
control is given by the well-known equation:
<eq:formula xmlns:eq="http://www.mathstuff.org/">
<eq:variable>P</eq:variable> =
<eq:variable>m</eq:variable>
<eq:variable>M</eq:variable> /
<eq:variable>d</eq:variable>
</eq:formula>
where P is the probability it will turn in a given time interval,
m is the mental acuity of the fish, M is the mental acuity of
the subject, and d is the distance between
fish and subject.</myns:abstract>
...
</myns:experiment>
</myns:journal>
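A short sketch of how a namespace-aware parser treats the prefixes above, assuming Python's standard xml.etree.ElementTree. The document is a trimmed copy of the journal example, and the prefixes j and m in the search mapping are arbitrary names chosen for this illustration; only the URIs have to match.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<myns:journal xmlns:myns="http://www.psycholabs.org/mynamespace/">
  <myns:experiment>
    <myns:date>March 4, 2001</myns:date>
    <eq:formula xmlns:eq="http://www.mathstuff.org/">
      <eq:variable>P</eq:variable>
    </eq:formula>
  </myns:experiment>
</myns:journal>"""

root = ET.fromstring(DOC)
# ElementTree replaces each prefix with its URI, in {uri}localname form.
print(root.tag)

# Prefixes in this mapping are local to the search; the URIs identify the namespace.
ns = {"j": "http://www.psycholabs.org/mynamespace/",
      "m": "http://www.mathstuff.org/"}
date = root.find("j:experiment/j:date", ns).text
variables = [v.text for v in root.iterfind(".//m:variable", ns)]
print(date, variables)
```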
-
entities
a. character entity:
Name Value
&amp; &
&apos; '
&gt; >
&lt; <
&quot; "
b. other entity
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "/xmlstuff/dtds/message.dtd"
[
<!ENTITY client "Mr. Rufus Xavier Sasperilla">
<!ENTITY agent "Ms. Sally Tashuns">
<!ENTITY phone "<number>617-555-1299</number>">
]>
<message>
<opening>Dear &client;</opening>
<body>We have an exciting opportunity for you! A set of
ocean-front cliff dwellings in Piñata, Mexico have been
renovated as time-share vacation homes. They're going fast! To
reserve a place for your holiday, call &agent; at &phone;.
Hurry, &client;. Time is running out!</body>
</message>
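A minimal sketch of entity expansion, assuming Python's standard xml.etree.ElementTree. The external DTD reference from the example is left out here because that file is not available, so only entities declared in the internal subset are used; the body text is shortened for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE message [
  <!ENTITY client "Mr. Rufus Xavier Sasperilla">
  <!ENTITY agent "Ms. Sally Tashuns">
]>
<message>
  <opening>Dear &client;</opening>
  <body>Call &agent; today &amp; save.</body>
</message>"""

message = ET.fromstring(DOC)
# The parser substitutes each entity reference before the application sees the text.
opening = message.find("opening").text
body = message.find("body").text
print(opening)
print(body)
```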
-
misc markup
comments: <!-- -->
Bad: <!------------------------------------------------------------>
Bad: <!-- -- Don't do this! -- -->
CDATA sections: <![CDATA[]]>
<![CDATA[if (&x < &y)]]>
Processing instructions: <?name data ?>
<?xyz stop: the presses?>
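A small sketch of how a parser handles CDATA sections and comments, assuming Python's standard xml.etree.ElementTree. The helper name parses and the wrapper elements code and doc are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, & and < are plain characters, not markup.
code = ET.fromstring("<code><![CDATA[if (&x < &y)]]></code>")
print(code.text)

def parses(xml_text):
    """Return True if the text parses, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# A double hyphen inside a comment is not allowed.
print(parses("<doc><!-- a normal comment --></doc>"))
print(parses("<doc><!-- -- Don't do this! -- --></doc>"))
```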
-
XML prolog
XML declaration: version, encoding, standalone
<?xml version='1.0' encoding='US-ASCII' standalone='yes'?>
Document type declaration.
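A minimal sketch of reading the three fields of the XML declaration, assuming Python's standard xml.parsers.expat module and its XmlDeclHandler callback. The function name read_declaration and the empty doc element are hypothetical, added only for this illustration.

```python
import xml.parsers.expat

def read_declaration(data):
    """Return (version, encoding, standalone) from the XML declaration, or None."""
    seen = {}

    def on_decl(version, encoding, standalone):
        # standalone is 1 for 'yes', 0 for 'no', -1 when the clause is omitted.
        seen["decl"] = (version, encoding, standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.Parse(data, True)
    return seen.get("decl")

print(read_declaration(
    b"<?xml version='1.0' encoding='US-ASCII' standalone='yes'?><doc/>"))
```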
-
tree view
Based on the relationships between elements, XML treats a document as a tree. Each element, attribute, and run of plain text is called a node. The top element is the root node, and the elements at the bottom are leaves. Relationships between nodes are described as parent, child, ancestor, and sibling.
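The tree view can be sketched with Python's standard xml.etree.ElementTree (an assumption of this example, as are the helper leaves and the parent_of map). ElementTree keeps child links only, so the parent relationship is rebuilt by hand.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<list><listitem>soupcan</listitem>"
    "<listitem>alligator</listitem>"
    "<listitem>tree</listitem></list>")

def leaves(node):
    """Return the text of every element that has no child elements."""
    if len(node) == 0:
        return [node.text]
    found = []
    for child in node:  # children are visited in document order
        found.extend(leaves(child))
    return found

# ElementTree does not store parent links, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

first = root[0]
print(root.tag, [child.tag for child in root])
print(leaves(root))
print(parent_of[first].tag)
```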
well-formed XML
Start with the XML declaration, typically with standalone="yes", and obey the rules below.
-
Each element must have a start tag and an end tag.
Good:
<list>
<listitem>soupcan</listitem>
<listitem>alligator</listitem>
<listitem>tree</listitem>
</list>
Bad:
<list>
<listitem>soupcan
<listitem>alligator
<listitem>tree
</list>
-
An empty element must have a slash before the closing bracket.
Good: <graphic filename="icon.png"/>
Bad: <graphic filename="icon.png">
-
All attribute values must be in single or double quotes.
Good: <figure filename="icon.png"/>
Bad: <figure filename=icon.png/>
-
Elements must not overlap.
Good:
<a>A good <b>nesting</b>
example.</a>
Bad:
<a>This is <b>a poor</a>
nesting scheme.</b>
-
Isolated markup characters may not appear in parsed content; use the character entities instead.
Good: <equation>5 &lt; 2</equation>
Bad: <equation>5 < 2</equation>
-
Element names must start with a letter or an underscore, and may contain only letters, digits, hyphens, periods, and underscores. Colons are allowed only for namespace prefixes.
Good: <example-one>
Good: <_example2>
Good: <Example.Three>
Bad: <bad*characters>
Bad: <illegal space>
Bad: <99number-start>
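The rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree as the parser; the function name is_well_formed and the sample labels are made up for this illustration, and the samples are shortened versions of the examples in these notes.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = {
    "closed tags": "<list><listitem>tree</listitem></list>",
    "missing end tag": "<list><listitem>tree</list>",
    "empty element": '<graphic filename="icon.png"/>',
    "unquoted attribute": "<figure filename=icon.png/>",
    "overlap": "<a>This is <b>a poor</a> nesting scheme.</b>",
    "escaped <": "<equation>5 &lt; 2</equation>",
    "bare <": "<equation>5 < 2</equation>",
    "bad name": "<99number-start/>",
}
results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

A check like this covers well-formedness only; it says nothing about whether the document is valid against a DTD or schema.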
Valid XML
Well-formed, and validated against a DTD or schema.
The XML declaration usually has standalone="no", since the document relies on external declarations.
good habits for designing XML
A good markup language has a thoughtful design, makes good use of containers and attributes, names objects clearly, and has a logical hierarchical structure.
Element:
Names should represent the element's purpose in the document and should be readable by humans as well as machines. Follow the convention of all-lowercase letters and avoid alternating cases.
The position of an element inside another element is important. The order of elements is always preserved.
Attribute:
Use attributes sparingly, because they clutter up the markup. Attributes convey specific information about an element; they should not hold content.
Use an element when:
• The content is more than a few words long. Some XML parsers may have an upper limit to how many characters an attribute can contain, and long attribute values are hard to read.
• Order matters. Attribute order in an element is ignored, but the order of elements is significant.
• The information is part of the content of the document, not just a parameter to adjust the behavior of the element.
Use an attribute when:
• The information modifies the element in a subtle way that would affect processing, but is not part of the content. For example, you may want to specify a particular kind of bullet for a bulleted list: <bulletlist bullettype="filledcircle">
• You want to restrict the value. Using a DTD, you can ensure that an attribute is a member of a set of predefined values.
• The information is a unique identifier or a reference to an identifier in another element. XML provides special mechanisms for testing identifiers in attributes to ensure that links are not broken. See Section
Advantages of XML
-
application-specific markup language
Make up your own markup language to express your information in the best way possible. Or, if you like, you can use an existing set of tags that someone else has made.
-
unambiguous structure
XML takes a hard line when it comes to structure. A document should be marked up in such a way that there are no two ways to interpret the names, order, and hierarchy of the elements.
In addition to the basic syntax check, you can create your own rules for how a document should look.
-
presentation stored elsewhere
-
keep it simple
-
max error checking