HTML Agility Pack
HTML Agility Pack is a free, open source .NET library for reading and writing HTML documents.
It parses an HTML document into a Document Object Model (DOM) view.
Developers can easily move from a node down to its children and back up to its parent.
Nodes can be selected with XPath expressions, and each node also exposes its own XPath.
Each node contains properties for traversing the DOM, including:
- ParentNode
- ChildNodes
- NextSibling
- PreviousSibling
It also contains properties that describe the node itself:
- Name - gets or sets the node's name. For HTML elements this property returns (or assigns) the name of the tag: "body" for the <body> tag, "p" for a <p> tag, and so on.
- Attributes - returns the collection of attributes for this element, if any.
- InnerHtml - gets or sets the HTML content within the node.
- InnerText - returns the text within the node.
- NodeType - indicates the type of the node. Can be Document, Element, Comment, or Text.
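The properties listed above can be tried out on a small document. The following is only a minimal sketch: the sample HTML, the class name NodePropertiesDemo and the helper FirstParagraph are made up for illustration, and the HtmlAgilityPack NuGet package is assumed to be referenced.

```csharp
using System;
using HtmlAgilityPack;

public static class NodePropertiesDemo
{
    // Hypothetical helper: parses a small sample page and returns its first <p> node.
    public static HtmlNode FirstParagraph()
    {
        var document = new HtmlDocument();
        document.LoadHtml(
            "<html><body><p id=\"intro\">Hello <b>world</b></p><p>Second</p></body></html>");
        return document.DocumentNode.SelectSingleNode("//p");
    }

    public static void Main()
    {
        HtmlNode p = FirstParagraph();

        // Properties that describe the node itself.
        Console.WriteLine(p.Name);                   // tag name of the element
        Console.WriteLine(p.NodeType);               // kind of node (Element here)
        Console.WriteLine(p.Attributes["id"].Value); // value of the id attribute
        Console.WriteLine(p.InnerHtml);              // markup inside the element
        Console.WriteLine(p.InnerText);              // text only, tags stripped
        Console.WriteLine(p.XPath);                  // XPath that locates this node

        // Properties for traversing the DOM.
        Console.WriteLine(p.ParentNode.Name);        // the enclosing element
        Console.WriteLine(p.NextSibling.Name);       // the node right after this one
        Console.WriteLine(p.PreviousSibling == null); // true when there is no earlier sibling
        foreach (HtmlNode child in p.ChildNodes)
        {
            // Child nodes include text nodes as well as elements.
            Console.WriteLine(child.NodeType + ": " + child.Name);
        }
    }
}
```

Note that ChildNodes returns text nodes alongside elements, so code that only cares about tags usually filters on NodeType.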
We can get a list of all nodes that match an XPath expression. For example, if we need all label nodes in the document:
var labelNodes = document.DocumentNode.SelectNodes("//label");
Now labelNodes contains the list of label elements in the HTML document.
SelectNodes returns null when nothing matches, so we check for null before iterating through the nodes with a foreach loop.
if (labelNodes != null)
{
    foreach (var node in labelNodes)
    {
        // Only handle labels that carry both attributes.
        if (node.Attributes["name"] != null && node.Attributes["content"] != null)
        {
            // ... output node.Attributes["name"].Value and node.Attributes["content"].Value ...
        }
    }
}
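For readers who want to run the loop end to end, here is a self-contained sketch of the same idea. The class LabelReader, the method ReadLabels and the sample HTML are hypothetical names chosen for this example, and collecting the values into a list is just one possible way to "output" them. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LabelReader
{
    // Hypothetical helper: returns the name/content attribute pairs of every
    // <label> element that has both attributes.
    public static List<(string Name, string Content)> ReadLabels(string html)
    {
        var result = new List<(string Name, string Content)>();

        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null (not an empty list) when nothing matches.
        var labelNodes = document.DocumentNode.SelectNodes("//label");
        if (labelNodes == null)
        {
            return result;
        }

        foreach (var node in labelNodes)
        {
            // The Attributes indexer returns null for a missing attribute.
            var name = node.Attributes["name"];
            var content = node.Attributes["content"];
            if (name != null && content != null)
            {
                result.Add((name.Value, content.Value));
            }
        }
        return result;
    }

    public static void Main()
    {
        // Sample markup invented for this example.
        string html =
            "<div><label name=\"user\" content=\"User name\">User</label>" +
            "<label>No attributes</label></div>";

        foreach (var label in ReadLabels(html))
        {
            Console.WriteLine(label.Name + " = " + label.Content);
        }
    }
}
```

Returning an empty list instead of null keeps the null check in one place, so callers can iterate over the result directly.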
The best part of using this as an HTML parser is that we don't need to mess with regular expressions or any string operations.