What is the HTML DOM? – C# Parsing

Aspose.HTML for .NET is a .NET library that allows you to access and manipulate the HTML DOM in C# and other .NET languages. It provides classes and methods that enable you to load and parse HTML documents, navigate the DOM tree, and access and modify document elements, attributes, and content.

Please visit the Editing an HTML Document article for basic information on how to read or modify the Document Object Model (DOM). There you’ll learn how to create an HTML element and work with it using the Aspose.HTML for .NET API.

Document Object Model

The Document Object Model, or DOM for short, is a standard cross-platform programming API that helps programmers access and modify parts of a document. The DOM represents a document as a tree, a hierarchy of nodes in which each node stands for one part of the document, such as an element, a piece of text, or a comment; attributes are attached to the element nodes they belong to. For example, an image or a run of text on a page is a “node.” The DOM tree is the structure in which a document is represented in memory. In other words, the Document Object Model creates a logical document structure and defines objects, properties, events, and methods to access and modify them.
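To make the idea of nodes concrete, here is a small C# sketch using Aspose.HTML for .NET. It assumes the Aspose.HTML NuGet package is referenced and a C# 9+ project with top-level statements; the markup string is an arbitrary sample. It parses a paragraph that contains text, a comment, and a child element, then lists the NodeName of each child node of <p>, showing that text and comments are nodes of their own next to the element.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Html;

// Sample markup (illustrative): a paragraph holding text, a comment, and an element
const string markup = "<p>Hello <!-- a comment --><b>DOM</b></p>";

// Parse the string; "." serves as a placeholder base URI because no file is involved
using var document = new HTMLDocument(markup, ".");

// The parser places <p> inside an implied <body>
var paragraph = document.GetElementsByTagName("p")[0];

// Collect the name of every direct child node of <p>
var childNames = new List<string>();
for (var child = paragraph.FirstChild; child != null; child = child.NextSibling)
{
    // Text nodes report "#text", comments "#comment", elements their tag name
    childNames.Add(child.NodeName);
}

Console.WriteLine(string.Join(", ", childNames));
```

In the standard DOM, the NodeName of an HTML element is its tag name, which HTML documents may report in upper case, so compare it case-insensitively.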

HTML DOM

The HTML DOM is an API for representing an HTML document that allows you to access and manipulate its content. It represents the document as a tree in which each element is a node; a node can have child nodes of its own, and every node is an object with its own properties and methods. The HTML DOM is standardized by the World Wide Web Consortium (W3C) and is supported by all modern web browsers. It provides a consistent, standardized way to access and manipulate HTML elements, which makes it an effective tool for creating dynamic and interactive web pages.

In a browser, the HTML DOM is the model of the loaded document: it represents the document as a node tree in which each node is one part of the document, such as an element, a text string, or a comment. Nodes in the tree are described the same way as members of a family tree: there are ancestors, descendants, parents, children, and siblings. For example, an HTML document with the following structure (fig. 1) is represented by a DOM tree with a document object at the top, a child node for the <html> element, and, under it, child nodes for the <head> and <body> elements, and so on.

<html>
    <head>
        <title>HTML document tree</title>
    </head>
    <body>
        <h1>HTML DOM</h1>
        <p>HTML DOM is a programming interface for HTML documents.</p>
    </body>
</html>

Fig. 1: HTML document tree
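The parent, child, and sibling relationships shown in fig. 1 can be inspected in code. The following sketch assumes the Aspose.HTML for .NET NuGet package is referenced and a C# 9+ project with top-level statements. It parses the fig. 1 markup from a string, walks the tree depth-first through FirstChild and NextSibling, and records one indented line per node. The helper name Walk is hypothetical, introduced only for this example. The markup is written on one line because the indentation in fig. 1 would otherwise produce extra whitespace-only text nodes.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Html;
using Aspose.Html.Dom;

// The markup from fig. 1, written without indentation so that
// no whitespace-only text nodes appear between the elements
const string markup =
    "<html><head><title>HTML document tree</title></head>" +
    "<body><h1>HTML DOM</h1>" +
    "<p>HTML DOM is a programming interface for HTML documents.</p></body></html>";

using var document = new HTMLDocument(markup, ".");

// Walk the tree depth-first, starting at the document node
var lines = new List<string>();
Walk(document, 0);
foreach (var line in lines)
{
    Console.WriteLine(line);
}

// Family-tree relationships of the <h1> element
var h1 = document.GetElementsByTagName("h1")[0];
Console.WriteLine($"parent of h1: {h1.ParentNode.NodeName}");
Console.WriteLine($"next sibling of h1: {h1.NextSibling.NodeName}");

// Record one line per node, indented by its depth, then visit its children
void Walk(Node node, int depth)
{
    var label = node is Text ? $"#text \"{node.TextContent}\"" : node.NodeName;
    lines.Add(new string(' ', depth * 2) + label);
    for (var child = node.FirstChild; child != null; child = child.NextSibling)
    {
        Walk(child, depth + 1);
    }
}
```

The recursion mirrors the family-tree vocabulary: FirstChild moves down one generation, NextSibling moves across to the next child of the same parent.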

Why is the DOM needed?

The HTML DOM is needed for several reasons. It is a standard for how to get, change, add, or delete HTML elements. It also allows web pages to be dynamic and interactive, while making it possible for search engines and accessibility tools to understand and interact with them.
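A minimal sketch of these four operations (get, change, add, delete) with Aspose.HTML for .NET might look like the following, assuming the Aspose.HTML NuGet package is referenced and top-level statements are enabled. The list markup is a made-up sample. QuerySelector() finds an element with a CSS selector, the TextContent property changes its text, and AppendChild() and RemoveChild() add and delete nodes.

```csharp
using System;
using Aspose.Html;

// Sample markup (illustrative only)
using var document = new HTMLDocument("<ul><li>first</li><li>second</li></ul>", ".");
var list = document.QuerySelector("ul");

// Get: the first <li> matched by a CSS selector
var first = document.QuerySelector("li");

// Change: replace its text
first.TextContent = "changed";

// Add: create a new <li> and append it to the end of the list
var item = document.CreateElement("li");
item.TextContent = "third";
list.AppendChild(item);

// Delete: remove the <li> that directly follows the first one
var second = first.NextSibling;
list.RemoveChild(second);

Console.WriteLine(list.OuterHTML);
```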

Accessing the HTML DOM using C#

The HTML DOM defines HTML elements as objects and provides a set of properties and methods that you can use to access and manage them. Each element in an HTML document is represented by a node in the DOM tree, and each node has its own set of properties and methods. Because the DOM is an object-oriented representation of a web page, it can be modified with the Aspose.HTML C# library.
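As a sketch of what “elements as objects” means in practice, the snippet below reads properties of an <a> element and calls its attribute methods. It assumes the Aspose.HTML NuGet package is referenced and top-level statements are enabled; the link markup and the example.com URL are placeholders chosen for illustration.

```csharp
using System;
using Aspose.Html;

// Placeholder markup with a single link
using var document = new HTMLDocument("<a href=\"https://example.com\">Example</a>", ".");
var link = document.GetElementsByTagName("a")[0];

// Properties describe the element
Console.WriteLine(link.TagName);      // the tag name of the element
Console.WriteLine(link.TextContent);  // the text inside the element

// Methods act on the element: read one attribute, add another
var href = link.GetAttribute("href");
link.SetAttribute("target", "_blank");

Console.WriteLine(href);
Console.WriteLine(link.OuterHTML);
```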

How does the HTML DOM define HTML elements as objects?

Aspose.HTML for .NET provides a set of classes and methods that allow you to access and manipulate the HTML DOM in C#. You can use the HTMLDocument class to load and parse an HTML document. For example, you can use the following code to load an HTML file and access the <body> element of the document:

using Aspose.Html;
...

    // Load and parse the HTML file located at documentPath
    using var document = new HTMLDocument(documentPath);

    // Get the <body> element of the loaded document
    var body = document.Body;
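HTMLDocument can also parse markup held in a string instead of a file, which is convenient for quick experiments. The sketch below assumes the Aspose.HTML NuGet package is referenced, top-level statements are enabled, and the HTMLDocument(content, baseUri) constructor is used with "." as a placeholder base URI. It reads the Title and Body shortcuts that HTMLDocument exposes; the markup itself is a made-up sample.

```csharp
using System;
using Aspose.Html;

const string html = "<html><head><title>Demo</title></head><body><p>Hello, World!</p></body></html>";

// Parse the string; the second argument is the base URI used to resolve relative links
using var document = new HTMLDocument(html, ".");

// Shortcut properties exposed by HTMLDocument
var title = document.Title;
var body = document.Body;

Console.WriteLine(title);
Console.WriteLine(body.InnerHTML);
```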

DOM properties

Let’s look at a C# example of how to use the HTMLDocument class to access the DOM and modify the content of an HTML file. In the following example, the document.Body property gives access to the <body> element, and its InnerHTML property represents the element’s content. You can use InnerHTML to get or set the markup inside the element.

using Aspose.Html;
using System;
using System.IO;
...

    // Prepare a path to a source HTML file
    string documentPath = Path.Combine(DataDir, "document.html");

    // Prepare a path for edited file saving
    string savePath = Path.Combine(OutputDir, "document-edited.html");

    // Initialize an HTML document from the file
    using var document = new HTMLDocument(documentPath);

    // Write the content of the HTML document into the console output
    Console.WriteLine(document.DocumentElement.OuterHTML); // output: <html><head></head><body>Hello, World!</body></html>

    // Edit the content of the body element
    document.Body.InnerHTML = "<p>HTML is the standard markup language for Web pages.</p>";

    // Write the content of the HTML document into the console output
    Console.WriteLine(document.DocumentElement.OuterHTML); // output: <html><head></head><body><p>HTML is the standard markup language for Web pages.</p></body></html>

    // Save the edited HTML file
    document.Save(savePath);

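In the C# example above, we take the following steps:

1. Prepare the path to the source HTML file and the path for saving the edited file.
2. Load the HTML document from the file with the HTMLDocument(documentPath) constructor.
3. Write the document’s markup, obtained from the DocumentElement.OuterHTML property, to the console.
4. Replace the content of the <body> element by assigning new markup to document.Body.InnerHTML.
5. Write the updated markup to the console.
6. Save the edited document with the Save() method.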
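InnerHTML treats the assigned string as markup, so tags in it become new element nodes. For comparison, the sketch below assigns the same string through the TextContent property, which stores it as plain text instead. It assumes the Aspose.HTML NuGet package is referenced and top-level statements are enabled; the markup is a made-up sample.

```csharp
using System;
using Aspose.Html;

using var document = new HTMLDocument("<p>Hello, World!</p>", ".");
var body = document.Body;

// InnerHTML parses the string as markup: <b> becomes a real element
body.InnerHTML = "<b>bold</b>";
var boldCount = body.GetElementsByTagName("b").Length;

// TextContent stores the string as plain text: no element is created
body.TextContent = "<b>bold</b>";
var boldCountAfterText = body.GetElementsByTagName("b").Length;

Console.WriteLine($"after InnerHTML: {boldCount} <b> element(s)");
Console.WriteLine($"after TextContent: {boldCountAfterText} <b> element(s)");
Console.WriteLine(body.InnerHTML);
```

Use InnerHTML when you intend to insert markup, and TextContent when the string should appear literally, for example when it comes from user input.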

DOM methods

The HTML DOM defines a set of methods that can be used to access and control all HTML elements. You can use these methods to create, modify, and delete elements and to manage their properties and events. Commonly used methods include CreateElement(), CreateTextNode(), and AppendChild(), all of which appear in the next example.

Let’s look at a C# example that demonstrates how to use the HTMLDocument class to create new elements and text nodes, and how to use the AppendChild() method to add them to an HTML document.

using Aspose.Html;
using System.IO;
...

    // Prepare a path for edited file saving
    string savePath = Path.Combine(OutputDir, "dom.html");

    // Initialize an empty HTML document
    using var document = new HTMLDocument();

    // Declare a variable body that references the <body> element
    var body = document.Body;

    // Create an <h1> element with text content
    var h1 = document.CreateElement("h1");
    var text1 = document.CreateTextNode("HTML DOM");
    h1.AppendChild(text1);

    // Create a <p> element with text content
    var p = document.CreateElement("p");
    var text2 = document.CreateTextNode("HTML Document Object Model is a programming interface for HTML documents.");
    p.AppendChild(text2);

    // Add new elements into <body>
    body.AppendChild(h1);
    body.AppendChild(p);

    // Save the document to a file
    document.Save(savePath);

The HTMLDocument class is the main entry point for working with the DOM: it allows you to load and parse HTML documents and access the nodes of the DOM tree. In the example, we used the HTMLDocument class to create a new, empty HTML document, and its CreateElement() and CreateTextNode() methods to create new elements and text nodes. The AppendChild() method then attaches each text node to its element and each element to <body>.
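AppendChild() always adds a node at the end of the parent’s list of children. When a new element must go in front of an existing one, the standard DOM method InsertBefore() can be used instead. Here is a short sketch, assuming the Aspose.HTML NuGet package is referenced and top-level statements are enabled; the paragraph markup is a placeholder.

```csharp
using System;
using Aspose.Html;

// Placeholder markup: a body that already contains one paragraph
using var document = new HTMLDocument("<p>HTML DOM is a programming interface.</p>", ".");
var body = document.Body;
var paragraph = document.GetElementsByTagName("p")[0];

// Build an <h1> element with a text node, as in the example above
var h1 = document.CreateElement("h1");
h1.AppendChild(document.CreateTextNode("HTML DOM"));

// Insert the heading in front of the existing paragraph
body.InsertBefore(h1, paragraph);

Console.WriteLine(body.InnerHTML);
```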

