  • Html Agility Pack

    Html Agility Pack - API
    Parser
    Selectors
    Manipulation
    Traversing
    Writer
    Utilities
    Attributes

    HTML Parser

    The HTML parser lets you parse HTML and returns an HtmlDocument.

    Html Parser
    Name          Description
    From File     Loads an HTML document from a file.
    From String   Loads the HTML document from the specified string.
    From Web      Gets an HTML document from an Internet resource.
    From Browser  Gets an HTML document from a WebBrowser.
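
    As a rough sketch of the From File and From Web entries, the snippet below writes a small page to a temporary file and loads it with HtmlDocument.Load. The file name and markup are illustrative assumptions, and the code assumes the HtmlAgilityPack NuGet package is referenced. The web call is left commented out because it needs network access and the URL is only a placeholder.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

// Write a small page to a temporary file so the example is self-contained.
// The file name and markup are made up for illustration.
string path = Path.Combine(Path.GetTempPath(), "hap-sample.html");
File.WriteAllText(path, "<html><body><h1>Hello</h1></body></html>");

// From File: HtmlDocument.Load reads and parses the file.
var fileDoc = new HtmlDocument();
fileDoc.Load(path);
string heading = fileDoc.DocumentNode.SelectSingleNode("//h1").InnerText;
Console.WriteLine(heading);

// From Web: HtmlWeb.Load downloads and parses a page (requires network access).
// The URL below is a placeholder, so the call stays commented out.
// var web = new HtmlWeb();
// HtmlDocument webDoc = web.Load("https://example.com/");
```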

    Load Html From String

    The HtmlDocument.LoadHtml method loads an HTML document from the specified string.

    Example

    The following example loads HTML from a string and prints the body element.

    var html = @"<!DOCTYPE html>
    <html>
    <body>
    <h1>This is <b>bold</b> heading</h1>
    <p>This is <u>underlined</u> paragraph</p>
    <h2>This is <i>italic</i> heading</h2>
    </body>
    </html> ";

    var htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html);

    var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");

    Console.WriteLine(htmlBody.OuterHtml);



    HTML Selectors

    Selectors let you select HTML nodes from an HtmlDocument.

    Methods
    Name                      Description
    SelectNodes(String)       Selects a list of nodes matching the XPath expression.
    SelectSingleNode(String)  Selects the first HtmlNode that matches the XPath expression.
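
    A minimal sketch of SelectNodes follows. With the default options, SelectNodes returns null rather than an empty collection when nothing matches, so the result is checked before iterating. The markup and variable names are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul><li>one</li><li>two</li></ul>");

// Collect the text of every <li>; guard against a null result first.
var items = new List<string>();
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//li");
if (nodes != null)
{
    foreach (HtmlNode li in nodes)
    {
        items.Add(li.InnerText);
    }
}
Console.WriteLine(string.Join(", ", items));

// No <table> exists in this document, so there is nothing to iterate.
HtmlNodeCollection missing = doc.DocumentNode.SelectNodes("//table");
Console.WriteLine(missing == null ? "no <table> nodes" : "found " + missing.Count + " tables");
```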

    HTML SelectSingleNode

    SelectSingleNode Method

    Selects the first HtmlNode matching the XPath expression.

    Parameters:

    xpath: The XPath expression. May not be null.

    Returns:

    The first HtmlAgilityPack.HtmlNode that matches the XPath query or a null reference if no matching node was found.

    Examples

    The following example selects the first node matching the XPath expression with the SelectSingleNode method and reads its value attribute.

    var htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html);

    string name = htmlDoc.DocumentNode
    .SelectSingleNode("//td/input")
    .Attributes["value"].Value;
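
    Because SelectSingleNode returns null when nothing matches, chaining straight into Attributes throws a NullReferenceException on a miss. The sketch below shows one defensive variant using GetAttributeValue with a default; the markup, the "(none)" fallback, and the variable names are assumptions made for illustration.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<table><tr><td><input name='user' value='alice'></td></tr></table>");

// Check the node before touching its attributes.
HtmlNode input = doc.DocumentNode.SelectSingleNode("//td/input");
// GetAttributeValue returns the supplied default when the attribute is absent.
string name = input != null ? input.GetAttributeValue("value", "") : "";
Console.WriteLine(name);

// No <select> exists here, so the null-conditional operator yields the fallback.
HtmlNode absent = doc.DocumentNode.SelectSingleNode("//td/select");
string fallback = absent?.GetAttributeValue("value", "") ?? "(none)";
Console.WriteLine(fallback);
```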

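    // Note: a query such as child.SelectSingleNode("//*[@class='titlelnk']").InnerText
    // always searches the whole document, because an XPath that starts with "//" begins at the root.
    // That is rarely what you want; the search should be relative to the current child node.
    // Prefixing the expression with "." (".//...") scopes it to the node it is called on, as below.

    Write(sw, String.Format("Recommendations: {0}", hn.SelectSingleNode(".//*[@class='diggnum']").InnerText));
    Write(sw, String.Format("Title: {0}", hn.SelectSingleNode(".//*[@class='titlelnk']").InnerText));
    Write(sw, String.Format("Summary: {0}", hn.SelectSingleNode(".//*[@class='post_item_summary']").InnerText));
    Write(sw, String.Format("Info: {0}", hn.SelectSingleNode(".//*[@class='post_item_foot']").InnerText));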
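
    The difference between a document-wide and a node-relative XPath can be seen in a small sketch. The markup below only imitates the class names used above and is an assumption; the point is that "//" starts at the document root even when called on a child node, while ".//" starts at the node itself.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(@"
<div class='post_item'><a class='titlelnk'>First post</a></div>
<div class='post_item'><a class='titlelnk'>Second post</a></div>");

// Take the second post as the context node.
HtmlNode second = doc.DocumentNode.SelectNodes("//div[@class='post_item']")[1];

// "//" starts at the document root, so this finds the first match in the whole document.
string absolute = second.SelectSingleNode("//*[@class='titlelnk']").InnerText;

// ".//" starts at the context node, so this only searches inside `second`.
string relative = second.SelectSingleNode(".//*[@class='titlelnk']").InnerText;

Console.WriteLine(absolute);
Console.WriteLine(relative);
```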

    HTML Manipulation

    Manipulation members let you read and modify HTML nodes.

    Properties
    Name        Description
    InnerHtml   Gets or sets the HTML between the start and end tags of the object.
    InnerText   Gets the text between the start and end tags of the object.
    OuterHtml   Gets the object and its content in HTML.
    ParentNode  Gets the parent of this node (for nodes that can have parents).

    Methods
    Name                            Description
    AppendChild()                   Adds the specified node to the end of the list of children of this node.
    AppendChildren()                Adds the specified node list to the end of the list of children of this node.
    Clone()                         Creates a duplicate of the node.
    CloneNode(Boolean)              Creates a duplicate of the node.
    CloneNode(String)               Creates a duplicate of the node and changes its name at the same time.
    CloneNode(String, Boolean)      Creates a duplicate of the node and changes its name at the same time.
    CopyFrom(HtmlNode)              Creates a duplicate of the node and the subtree under it.
    CopyFrom(HtmlNode, Boolean)     Creates a duplicate of the node.
    CreateNode()                    Creates an HTML node from a string representing literal HTML.
    InsertAfter()                   Inserts the specified node immediately after the specified reference node.
    InsertBefore()                  Inserts the specified node immediately before the specified reference node.
    PrependChild()                  Adds the specified node to the beginning of the list of children of this node.
    PrependChildren()               Adds the specified node list to the beginning of the list of children of this node.
    Remove()                        Removes the node from its parent collection.
    RemoveAll()                     Removes all the children and/or attributes of the current node.
    RemoveAllChildren()             Removes all the children of the current node.
    RemoveChild(HtmlNode)           Removes the specified child node.
    RemoveChild(HtmlNode, Boolean)  Removes the specified child node.
    ReplaceChild()                  Replaces the child node oldChild with the newChild node.
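
    A few of these methods can be combined as in the sketch below, which creates nodes from literal HTML, adds them at both ends of a parent, and removes an existing child. The markup and variable names are assumptions chosen for illustration.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div id='box'><em>old</em></div>");
HtmlNode box = doc.DocumentNode.SelectSingleNode("//div[@id='box']");

// CreateNode builds a detached node from literal HTML.
HtmlNode added = HtmlNode.CreateNode("<span>new</span>");
box.AppendChild(added);                                   // becomes the last child
box.PrependChild(HtmlNode.CreateNode("<h2>Title</h2>"));  // becomes the first child

// RemoveChild detaches the given child from this node.
HtmlNode old = box.Element("em");
box.RemoveChild(old);

Console.WriteLine(box.OuterHtml);
```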


    HTML Traversing

    Traversing lets you walk through the HTML nodes of a document.

    Properties
    Name         Description
    ChildNodes   Gets all the children of the node.
    FirstChild   Gets the first child of the node.
    LastChild    Gets the last child of the node.
    NextSibling  Gets the HTML node immediately following this element.
    ParentNode   Gets the parent of this node (for nodes that can have parents).

    Methods
    Name                        Description
    Ancestors()                 Gets all the ancestors of the node.
    Ancestors(String)           Gets the ancestors with a matching name.
    AncestorsAndSelf()          Gets all ancestor nodes and the current node.
    AncestorsAndSelf(String)    Gets all ancestor nodes and the current node with a matching name.
    DescendantNodes()           Gets all descendant nodes of this node and of each of its child nodes.
    DescendantNodesAndSelf()    Returns a collection of all descendant nodes of this element, including the element itself, in document order.
    Descendants()               Gets all descendant nodes as an enumerable list.
    Descendants(String)         Gets all descendant nodes with a matching name.
    DescendantsAndSelf()        Returns a collection of all descendant nodes of this element, including the element itself, in document order.
    DescendantsAndSelf(String)  Gets all descendant nodes with a matching name, including this node.
    Element()                   Gets the first direct child node with a matching name.
    Elements()                  Gets the direct child nodes with a matching name.
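
    The contrast between whole-subtree and direct-child traversal is sketched below using Descendants, Elements, and Ancestors. The nested markup and variable names are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div><section><span>a</span><span>b</span></section><span>c</span></div>");

// Descendants(name) walks the whole subtree in document order.
List<string> allSpans = doc.DocumentNode.Descendants("span").Select(s => s.InnerText).ToList();

// Element(name) and Elements(name) only look at direct children.
HtmlNode div = doc.DocumentNode.Element("div");
int directSpans = div.Elements("span").Count();

// Ancestors() climbs from a node toward the document root, nearest ancestor first.
HtmlNode firstSpan = doc.DocumentNode.Descendants("span").First();
List<string> ancestorNames = firstSpan.Ancestors().Select(n => n.Name).ToList();

Console.WriteLine(string.Join(",", allSpans));
Console.WriteLine(directSpans);
Console.WriteLine(string.Join(" > ", ancestorNames));
```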

    HTML Writer

    Saving an HtmlDocument and writing an HtmlNode.

    HtmlDocument - Methods
    Name                    Description
    Save(Stream)            Saves the HTML document to the specified stream.
    Save(StreamWriter)      Saves the HTML document to the specified StreamWriter.
    Save(TextWriter)        Saves the HTML document to the specified TextWriter.
    Save(String)            Saves the mixed document to the specified file.
    Save(XmlWriter)         Saves the HTML document to the specified XmlWriter.
    Save(Stream, Encoding)  Saves the HTML document to the specified stream.
    Save(String, Encoding)  Saves the mixed document to the specified file.

    HtmlNode - Methods
    Name                        Description
    WriteContentTo()            Saves all the children of the node to a string.
    WriteContentTo(TextWriter)  Saves all the children of the node to the specified TextWriter.
    WriteTo()                   Saves the current node to a string.
    WriteTo(TextWriter)         Saves the current node to the specified TextWriter.
    WriteTo(XmlWriter)          Saves the current node to the specified XmlWriter.
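
    The sketch below saves a document to an in-memory StringWriter with Save(TextWriter) and compares WriteTo() with WriteContentTo() on a single node. The markup and variable names are illustrative assumptions; a real program would more likely save to a file or stream.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div><span>Hello</span></div>");

// Save(TextWriter) serializes the whole document; a StringWriter keeps it in memory.
var writer = new StringWriter();
doc.Save(writer);
string saved = writer.ToString();

HtmlNode div = doc.DocumentNode.SelectSingleNode("//div");
string whole = div.WriteTo();         // the node itself plus its children
string inner = div.WriteContentTo();  // only the children

Console.WriteLine(saved);
Console.WriteLine(whole);
Console.WriteLine(inner);
```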


    HTML Utilities

    HtmlDocument Utilities

    HtmlDocument Methods
    Name                                    Description
    DetectEncoding(Stream)                  Detects the encoding of an HTML stream.
    DetectEncoding(TextReader)              Detects the encoding of an HTML text provided on a TextReader.
    DetectEncoding(String)                  Detects the encoding of an HTML file.
    DetectEncodingAndLoad(String)           Detects the encoding of an HTML document from a file first, and then loads the file.
    DetectEncodingAndLoad(String, Boolean)  Detects the encoding of an HTML document from a file first, and then loads the file.
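
    A hedged sketch of the file-based overloads follows. It writes a UTF-8 page that declares its charset in a meta tag, loads it with DetectEncodingAndLoad, and separately asks DetectEncoding for the declared encoding. The file name and markup are assumptions, and the detection result is printed rather than assumed, since it may be null when no encoding is declared.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

// Create a sample file; the name and content are made up for illustration.
string path = Path.Combine(Path.GetTempPath(), "hap-encoding-sample.html");
string markup = "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"></head>"
              + "<body><h1>café</h1></body></html>";
File.WriteAllText(path, markup, new UTF8Encoding(false));

// Detect the encoding first, then parse the file with it.
var doc = new HtmlDocument();
doc.DetectEncodingAndLoad(path);
string text = doc.DocumentNode.SelectSingleNode("//h1").InnerText;
Console.WriteLine(text);

// DetectEncoding(String) only inspects the file and does not keep the parsed content.
Encoding detected = new HtmlDocument().DetectEncoding(path);
Console.WriteLine(detected?.WebName ?? "(not detected)");
```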


    HTML Attributes

    These methods let you add, remove, and set the attributes of an HTML node.

    Methods
    Name                    Description
    Add(HtmlAttribute)      Adds the supplied item to the collection.
    Add(String, String)     Adds a new attribute to the collection with the given values.
    Append(String)          Creates and inserts a new attribute as the last attribute in the collection.
    Append(HtmlAttribute)   Inserts the specified attribute as the last attribute in the collection.
    Append(String, String)  Creates and inserts a new attribute as the last attribute in the collection.
    Remove()                Removes all attributes from the collection.
    Remove(String)          Removes an attribute from the list, using its name. If more than one attribute has this name, all of them are removed.
    Remove(HtmlAttribute)   Removes a given attribute from the list.
    RemoveAll()             Removes all attributes in the list.
    RemoveAt()              Removes the attribute at the specified index.
    SetAttributeValue()     Helper method to set the value of an attribute of this node. If the attribute is not found, it is created automatically.
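
    The attribute methods can be exercised as in the sketch below, which adds, updates, and removes attributes on a link. The markup, attribute values, and variable names are assumptions for illustration; GetAttributeValue and Contains are used only to read the result back.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<a href='/home' class='nav'>Home</a>");
HtmlNode link = doc.DocumentNode.SelectSingleNode("//a");

link.Attributes.Add("title", "Go home");   // add a new attribute
link.SetAttributeValue("href", "/index");  // update the attribute, creating it if missing
link.Attributes.Remove("class");           // remove by name

// Read the results back.
string href = link.GetAttributeValue("href", "");
string title = link.GetAttributeValue("title", "");
bool hasClass = link.Attributes.Contains("class");

Console.WriteLine(link.OuterHtml);
```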

  • Original article: https://www.cnblogs.com/micro-chen/p/8085024.html