  • HTML Parsing


    HTML Agility Pack

    HTML Agility Pack is a free, open source .NET library for reading and writing HTML documents. It builds a Document Object Model (DOM) view of the HTML document being parsed. Developers can move from any node to its children and back up to its parent, and can select nodes with XPath expressions. Each node also reports its own location in the document through its XPath property.

    Each node (an HtmlNode) has properties for traversing the DOM, including:

    1. ParentNode
    2. ChildNodes
    3. NextSibling
    4. PreviousSibling
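
    The traversal properties above can be sketched as follows. This is a minimal illustration, not code from the library's documentation: the TraversalDemo class, the Describe helper, and the sample markup are hypothetical names made up for this example, and it assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack;

static class TraversalDemo
{
    // Hypothetical helper: summarises the surroundings of the first node that
    // matches xpath as "parentName|parentChildCount|previousText|nextText".
    public static string Describe(string html, string xpath)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        HtmlNode node = document.DocumentNode.SelectSingleNode(xpath);
        if (node == null) return "(no match)";

        HtmlNode parent = node.ParentNode;
        // A first or last child has no sibling on that side, so guard against null.
        string previous = node.PreviousSibling?.InnerText ?? "(none)";
        string next = node.NextSibling?.InnerText ?? "(none)";

        return $"{parent.Name}|{parent.ChildNodes.Count}|{previous}|{next}";
    }

    static void Main()
    {
        // Sample markup invented for illustration only.
        const string html = "<div><span>one</span><b>two</b><i>three</i></div>";
        Console.WriteLine(Describe(html, "//b"));
    }
}
```

    Note that whitespace between tags is parsed as text nodes, so in pretty-printed HTML the NextSibling of an element is often a text node rather than the next element.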

    It also has properties that describe the node itself:

    1. Name - gets or sets the node's name. For HTML elements this property returns (or assigns) the name of the tag: "body" for a <body> tag, "p" for a <p> tag, and so on.
    2. Attributes - returns the collection of attributes for this element, if any.

    3. InnerHtml - gets or sets the HTML content within the node.
    4. InnerText - returns the text within the node.
    5. NodeType - indicates the type of the node. Can be Document, Element, Comment, or Text.
    6. SelectNodes - a method rather than a property. It returns every node that matches an XPath expression, or null when nothing matches. For example, to get all label nodes in the document:

      // "document" is an HtmlAgilityPack.HtmlDocument that has already been loaded.
      var labelNodes = document.DocumentNode.SelectNodes("//label");
      

      labelNodes now holds every label element in the HTML document. We can iterate over the nodes with a foreach loop.

      if (labelNodes != null)   // null means no label matched
      {
         foreach (var node in labelNodes)
         {
            // Skip labels that are missing either attribute.
            if (node.Attributes["name"] != null && node.Attributes["content"] != null)
            {
               Console.WriteLine(node.Attributes["name"].Value + ": " + node.Attributes["content"].Value);
            }
         }
      }
      

      The best part of using HTML Agility Pack as an HTML parser is that we don't need to wrestle with regular expressions or manual string operations.
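
      For readers who want something they can paste and run, here is one possible end-to-end sketch that loads markup, selects the label nodes, and reads the node properties described earlier. The LabelReader class, the DescribeLabels helper, and the sample markup are hypothetical, and the example reads the "for" attribute purely as an illustration. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class LabelReader
{
    // Hypothetical helper: returns one summary line per <label> element.
    public static List<string> DescribeLabels(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var lines = new List<string>();
        var labelNodes = document.DocumentNode.SelectNodes("//label");
        if (labelNodes == null) return lines;   // nothing matched

        foreach (HtmlNode node in labelNodes)
        {
            // GetAttributeValue returns the supplied default when the attribute is absent.
            string target = node.GetAttributeValue("for", "(none)");
            lines.Add($"{node.Name} [{node.NodeType}] for={target} text={node.InnerText.Trim()}");
        }
        return lines;
    }

    static void Main()
    {
        // Sample markup invented for illustration only.
        const string html =
            "<div><label for='user'>User name</label><input id='user'><label>Notes</label></div>";

        foreach (string line in DescribeLabels(html))
            Console.WriteLine(line);
    }
}
```

      Using GetAttributeValue with a default is one way to avoid the explicit null checks on Attributes; whether it reads better than the checks shown earlier is a matter of taste.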
