Edit HTML Document

Aspose.HTML for Java provides a robust API to create, modify, and manage HTML documents programmatically using the Document Object Model (DOM). This article demonstrates how to edit HTML documents, including node manipulation, setting styles, and working with inline and internal CSS.

Document Object Model

The Document Object Model, or DOM for short, is a standard cross-platform programming API that lets programmers access and modify the parts of a document. The DOM represents a document in memory as a tree. This is a hierarchy of nodes in which each node stands for one part of the document, such as an element (an image or a paragraph, for example), an attribute, or a piece of text. In other words, the DOM creates a logical document structure and defines the objects, properties, events, and methods used to access and modify it. The DOM is widely used in web development for tasks such as managing web page elements and responding to user interactions.
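
To make the tree structure concrete, here is a minimal sketch that prints a DOM tree with one line per node, indented by depth. It uses the JDK's built-in `org.w3c.dom` parser rather than Aspose.HTML, so it only accepts well-formed, XHTML-style markup. The class and method names (`DomTreeSketch`, `parse`, `describe`) are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomTreeSketch {

    // Parses well-formed (XHTML-style) markup into a DOM tree with the JDK's XML parser.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
    }

    // Depth-first walk: appends one line per node, indented two spaces per tree level.
    static void describe(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append("Element <").append(node.getNodeName()).append('>');
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            out.append("Text \"").append(node.getNodeValue()).append('"');
        } else {
            out.append(node.getNodeName());
        }
        out.append('\n');
        // Children are reached through the first-child / next-sibling links of the tree
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            describe(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><head><title>Demo</title></head><body><p>Hello</p></body></html>");
        StringBuilder out = new StringBuilder();
        describe(doc.getDocumentElement(), 0, out);
        System.out.print(out);
    }
}
```

Running `main` prints `<html>` at the top, `<head>` and `<body>` one level down, and each piece of text as a leaf under the element that contains it.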

Edit HTML with Java

HTML DOM defines HTML elements as objects, providing a set of properties and methods that you can use to access and manage them. Each element in an HTML document is represented by a node in the DOM tree, and each node has its own set of properties and methods.

As mentioned in the article Create HTML Document, the implementation of HTMLDocument, like the whole DOM, is based on the WHATWG DOM standard. A basic knowledge of HTML and JavaScript is therefore enough to start using Aspose.HTML. The DOM package provides the following fundamental data types:

| Class | Description |
|---|---|
| Document | The Document class represents the entire HTML, XML, or SVG document. Conceptually, it is the root of the document tree and provides the primary access to the document’s data. |
| EventTarget | The EventTarget class is implemented by all Nodes in an implementation that supports the DOM Event Model. |
| Node | The Node class is the primary data type for the entire Document Object Model. It represents a single node in the document tree. |
| Element | The Element class is based on Node and is the base class for elements in HTML, XML, and SVG documents. |
| Attr | The Attr class represents an attribute of an Element object. Typically, the allowable values for the attribute are defined in a schema associated with the document. |
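
The sketch below shows how these types relate in practice. The document is the root, elements and text are nodes in the tree, and an attribute is an `Attr` attached to its element rather than a child of it. It uses the JDK's `org.w3c.dom` interfaces, which come from the older W3C DOM specification that the WHATWG DOM grew out of. It does not use Aspose.HTML's own classes. `DomTypesSketch` and `buildParagraph` are illustrative names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class DomTypesSketch {

    // Creates an empty document using the JDK's default DOM implementation.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <html><p class="note">Hi</p></html> and returns the <p> element.
    static Element buildParagraph(Document doc) {
        Element html = doc.createElement("html");   // Element is a subtype of Node
        doc.appendChild(html);                       // the Document is the root of the tree
        Element p = doc.createElement("p");
        p.setAttribute("class", "note");             // stored as an Attr on the element
        Text text = doc.createTextNode("Hi");        // text is a node of its own
        p.appendChild(text);
        html.appendChild(p);
        return p;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element p = buildParagraph(doc);
        Attr attr = p.getAttributeNode("class");

        System.out.println(doc.getDocumentElement().getNodeName()); // html
        System.out.println(attr.getName() + "=" + attr.getValue()); // class=note
        System.out.println(attr.getOwnerElement() == p);            // true
        System.out.println(attr.getParentNode() == null);           // true: not a child node
        System.out.println(p.getChildNodes().getLength());          // 1: only the text node
    }
}
```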

DOM Methods

HTML DOM defines a set of methods that can be used to access and control all HTML elements. You can use these methods to create, modify, and delete elements and to manage their properties and events. The following is a brief list of useful API methods provided by the core data types:

| Method | Description |
|---|---|
| Document.getElementById(elementId) | Returns the first element whose ID is elementId, or null if there is no such element. |
| Document.getElementsByTagName(name) | Returns the list of elements with the given tag name. |
| Document.createElement(localName) | Creates the HTML element specified by localName, or an HTMLUnknownElement if localName isn’t recognized. |
| Node.appendChild(node) | Adds a node to the end of the list of children of the parent node it is called on. |
| Element.setAttribute(name, value) | Sets the value of an attribute on the element. |
| Element.getAttribute(name) | Returns the value of the specified attribute on the element. |
| Element.innerHTML | Represents the fragment of markup contained within the element. In Java, it is accessed through getter and setter methods such as setInnerHTML(), which is used later in this article. |
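
Here is a short sketch of these methods in use. It runs on the JDK's `org.w3c.dom` API rather than Aspose.HTML, so two JDK details differ from the WHATWG behavior:

- `getElementById()` only finds attributes that are declared as IDs, which is why the sketch calls `setIdAttribute()`.
- `getAttribute()` returns an empty string for a missing attribute, whereas the WHATWG DOM returns null.

`DomMethodsSketch` and `buildPage` are illustrative names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomMethodsSketch {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <html><body> with three paragraphs whose ids are p1, p2, and p3.
    static Document buildPage() {
        Document doc = newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        doc.appendChild(html);
        html.appendChild(body);
        for (int i = 1; i <= 3; i++) {
            Element p = doc.createElement("p");
            p.setAttribute("id", "p" + i);
            // JDK-specific: without a DTD, "id" is only treated as an ID once declared
            p.setIdAttribute("id", true);
            p.setTextContent("Paragraph " + i);
            body.appendChild(p);
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = buildPage();

        NodeList paragraphs = doc.getElementsByTagName("p");
        System.out.println(paragraphs.getLength());                 // 3

        Element second = doc.getElementById("p2");
        System.out.println(second.getTextContent());                // Paragraph 2

        second.setAttribute("title", "second");
        System.out.println(second.getAttribute("title"));           // second
        System.out.println(doc.getElementById("missing") == null);  // true
    }
}
```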

There are many ways to edit HTML with the library: you can insert new nodes, remove existing ones, or edit their content. To create a new node, use one of the following methods:

| Method | Description |
|---|---|
| Document.createCDATASection(data) | Creates a CDATASection node whose value is the specified string. |
| Document.createComment(data) | Creates a Comment node given the specified string. |
| Document.createDocumentFragment() | Creates a new empty DocumentFragment into which DOM nodes can be added to build an offscreen DOM tree. |
| Document.createElement(localName) | Creates an element of the specified type. |
| Document.createEntityReference(name) | Creates an EntityReference object. |
| Document.createProcessingInstruction(target, data) | Creates a ProcessingInstruction with the specified target and data. |
| Document.createTextNode(data) | Creates a Text node given the specified string. |
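
The sketch below creates one node of several kinds and collects them in an offscreen `DocumentFragment`. It then attaches them all to the tree with a single `appendChild()` call. It uses the JDK's XML DOM rather than Aspose.HTML. The WHATWG DOM specifies that `createCDATASection()` throws a NotSupportedError on HTML documents, so CDATA sections are mainly relevant to XML and SVG documents. `NodeFactorySketch`, `buildBody`, and the processing-instruction target `render` are illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class NodeFactorySketch {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates <html><body>, fills a fragment with new nodes, and moves them into <body>.
    static Element buildBody(Document doc) {
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        doc.appendChild(html);
        html.appendChild(body);

        DocumentFragment fragment = doc.createDocumentFragment(); // offscreen container
        fragment.appendChild(doc.createComment("generated"));
        fragment.appendChild(doc.createTextNode("plain text"));
        fragment.appendChild(doc.createCDATASection("a < b"));
        fragment.appendChild(doc.createProcessingInstruction("render", "mode=fast"));

        // Appending a fragment moves its children into the tree and leaves it empty
        body.appendChild(fragment);
        return body;
    }

    public static void main(String[] args) {
        Element body = buildBody(newDocument());
        NodeList children = body.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            System.out.println(child.getNodeName() + ": " + child.getNodeValue());
        }
        // #comment: generated
        // #text: plain text
        // #cdata-section: a < b
        // render: mode=fast
    }
}
```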

Once new nodes are created, several DOM methods help you insert them into the document tree. The most common ones are:

| Method | Description |
|---|---|
| Node.insertBefore(node, child) | Inserts the node before the reference child node. |
| Node.appendChild(node) | Adds the node to the end of the list of children of the current node. |

To remove a node from the DOM tree, call the Node.removeChild(child) method on its parent.
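
The following sketch shows how the order of children changes as nodes are inserted and removed. It uses the JDK's `org.w3c.dom` API, whose method names match those listed above. It also shows two details worth knowing:

- `insertBefore()` with a null reference child appends the node.
- `appendChild()` on a node that is already in the tree moves it rather than copying it.

`InsertRemoveSketch` and its helpers are illustrative names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class InsertRemoveSketch {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates an <li> element with the given text.
    static Element item(Document doc, String text) {
        Element li = doc.createElement("li");
        li.setTextContent(text);
        return li;
    }

    // Joins the text of the parent's children, for example "A,B,C".
    static String childTexts(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(c.getTextContent());
        }
        return sb.toString();
    }

    // Returns snapshots of the list after inserting, removing, and moving nodes.
    static String[] demo() {
        Document doc = newDocument();
        Element list = doc.createElement("ul");
        doc.appendChild(list);

        Element b = item(doc, "B");
        list.appendChild(b);                                    // B
        list.appendChild(item(doc, "D"));                       // B,D
        list.insertBefore(item(doc, "A"), b);                   // A,B,D
        list.insertBefore(item(doc, "C"), b.getNextSibling());  // A,B,C,D
        list.insertBefore(item(doc, "E"), null);                // null reference child: appends
        String inserted = childTexts(list);

        Node removed = list.removeChild(b);                     // detached and returned
        String afterRemove = childTexts(list) + " (removed " + removed.getTextContent() + ")";

        list.appendChild(list.getFirstChild());                 // moves A to the end
        String afterMove = childTexts(list);

        return new String[] {inserted, afterRemove, afterMove};
    }

    public static void main(String[] args) {
        for (String snapshot : demo()) {
            System.out.println(snapshot);
        }
        // A,B,C,D,E
        // A,C,D,E (removed B)
        // C,D,E,A
    }
}
```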

For a complete list of the interfaces and methods in the DOM package, please visit the API Reference Source.

Edit HTML Document Tree

Let’s look at how to edit an HTML document through its DOM tree using the methods described above. The following steps and Java code create an HTML document from scratch, add a styled text paragraph, save the result, and render it to PDF:

  1. Create an instance of an HTML document using the HTMLDocument() constructor.
  2. Create a <style> element using the createElement("style") method.
  3. Call the setTextContent() method to set the text content of the style element. The text content .gr { color: green } is a CSS rule: it targets elements with the class name "gr" and sets their color to green.
  4. Use the getElementsByTagName(name) method to find the <head> element and append the style element to it as a child.
  5. Create a paragraph element with the class name "gr" using the createElement("p") and setClassName("gr") methods.
  6. Create a text node and add it as a child of the <p> element using the createTextNode() and appendChild() methods.
  7. Add the paragraph to the document body.
  8. Save the HTML document to a file using the save() method.
  9. Render the document to PDF by passing a PdfDevice to the renderTo() method.

```java
import com.aspose.html.HTMLDocument;
import com.aspose.html.HTMLParagraphElement;
import com.aspose.html.dom.Element;
import com.aspose.html.dom.Text;
import com.aspose.html.rendering.pdf.PdfDevice;

// Create an instance of the HTMLDocument class
HTMLDocument document = new HTMLDocument();

// Create a style element and assign the green color to all elements with the class name "gr"
Element style = document.createElement("style");
style.setTextContent(".gr { color: green }");

// Find the document's head element and append the style element to it
Element head = document.getElementsByTagName("head").get_Item(0);
head.appendChild(style);

// Create a paragraph element with the class name "gr"
HTMLParagraphElement p = (HTMLParagraphElement) document.createElement("p");
p.setClassName("gr");

// Create a text node
Text text = document.createTextNode("Hello, World!!");

// Append the text node to the paragraph
p.appendChild(text);

// Append the paragraph to the document body element
document.getBody().appendChild(p);

// Save the HTML document to a file
document.save("using-dom.html");

// Create a PDF output device that writes to "using-dom.pdf"
PdfDevice device = new PdfDevice("using-dom.pdf");

// Render HTML to PDF
document.renderTo(device);
```

The resulting HTML file looks like this:

```html
<html>
	<head>
		<style>.gr { color: green }</style>
	</head>
	<body>
		<p class="gr">Hello, World!!</p>
	</body>
</html>
```

Using setInnerHTML() and getOuterHTML() methods

DOM objects give you a powerful way to manipulate an HTML document. Sometimes, however, it is simpler to work with markup as a plain String. The following example creates an HTML document with the Aspose.HTML Java library, sets the content of the body element, and prints the document to the console. It does this with the setInnerHTML() and getOuterHTML() methods of the Element class:

  1. Create an instance of the HTMLDocument class using the HTMLDocument() constructor. It creates an empty HTML document.
  2. To output the original content of the HTML document to the console, use the getOuterHTML() method. The output will be <html><head></head><body></body></html> since the document is initially empty.
  3. Use the setInnerHTML() method to set the content of the <body> element: add an HTML <p> element with the text content to the body element.
  4. Print the updated content of the HTML document to the console using the getOuterHTML() method.

```java
import com.aspose.html.HTMLDocument;

// Create an instance of the HTMLDocument class
HTMLDocument document = new HTMLDocument();

// Write the content of the HTML document to the console
System.out.println(document.getDocumentElement().getOuterHTML());
// output: <html><head></head><body></body></html>

// Set the content of the <body> element
document.getBody().setInnerHTML("<p>HTML is the standard markup language for Web pages.</p>");

// Write the updated content of the HTML document to the console
System.out.println(document.getDocumentElement().getOuterHTML());
// output: <html><head></head><body><p>HTML is the standard markup language for Web pages.</p></body></html>
```
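
The JDK's own `org.w3c.dom` API has no innerHTML or outerHTML. To compare the example above with the standard library, a rough analogue has two parts:

- Parse a fragment and import its nodes in place of the element's children.
- Serialize a subtree with an identity `Transformer`.

This sketch has clear limits. It accepts only well-formed, XML-compatible markup, and it produces XML serialization, so an empty element may come out as `<head/>`. It is not a substitute for an HTML serializer. `MarkupSketch`, `setInnerMarkup`, and `outerMarkup` are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class MarkupSketch {

    // Parses well-formed markup with the JDK's XML parser.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
    }

    // Rough analogue of setInnerHTML(): replaces the element's children with parsed nodes.
    static void setInnerMarkup(Element element, String fragment) {
        while (element.getFirstChild() != null) {
            element.removeChild(element.getFirstChild());
        }
        // Wrap the fragment so that several sibling elements still parse as one document
        Element wrapper = parse("<wrapper>" + fragment + "</wrapper>").getDocumentElement();
        Document owner = element.getOwnerDocument();
        for (Node n = wrapper.getFirstChild(); n != null; n = n.getNextSibling()) {
            // importNode copies the node into the target document; the source stays intact
            element.appendChild(owner.importNode(n, true));
        }
    }

    // Rough analogue of getOuterHTML(): serializes the node and its subtree as XML.
    static String outerMarkup(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><head></head><body></body></html>");
        Element body = (Element) doc.getElementsByTagName("body").item(0);
        setInnerMarkup(body, "<p>HTML is the standard markup language for Web pages.</p>");
        System.out.println(outerMarkup(doc.getDocumentElement()));
    }
}
```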

Working with Styles

Inline CSS

Cascading Style Sheets (CSS) is a style sheet language used to describe how web pages look in the browser. Aspose.HTML not only supports CSS out of the box but also gives you tools to manipulate document styles on the fly before converting the HTML document to other formats.

When CSS is written in the style attribute of an HTML tag, it is called “inline CSS”. Inline CSS lets you apply an individual style to one HTML element at a time: you add a style attribute to the element and define CSS properties within it. The following code snippet shows how to specify CSS style properties for an HTML <p> element:

```java
import com.aspose.html.HTMLDocument;
import com.aspose.html.HTMLElement;
import com.aspose.html.rendering.pdf.PdfDevice;

// Create an instance of an HTML document with the specified content
String content = "<p> Inline CSS </p>";
HTMLDocument document = new HTMLDocument(content, ".");

// Find the paragraph element to set a style attribute on
HTMLElement paragraph = (HTMLElement) document.getElementsByTagName("p").get_Item(0);

// Set the style attribute
paragraph.setAttribute("style", "font-size: 250%; font-family: verdana; color: #cd66aa");

// Save the HTML document to a file
document.save("edit-inline-css.html");

// Create a PDF output device that writes to "edit-inline-css.pdf" and render the document into it
PdfDevice device = new PdfDevice("edit-inline-css.pdf");
document.renderTo(device);
```

In this example, the color, font-size, and font-family properties apply to the <p> element. A fragment of the rendered PDF page looks like this:

Figure: the text “Inline CSS” in the rendered PDF
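
Because `setAttribute("style", value)` replaces the attribute's entire value, changing one property on an element that already has inline styles means reading the current value first. The plain-Java helper below is a naive sketch of that read-modify-write step. It splits declarations on semicolons, so it does not handle semicolons inside values (for example in `data:` URLs) or CSS comments. `InlineStyleSketch` and `withProperty` are hypothetical names, not part of Aspose.HTML.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class InlineStyleSketch {

    // Naively parses "name: value; name: value" into an ordered map.
    // Limitation: splits on every ';', so values containing ';' are not supported.
    static Map<String, String> parseDeclarations(String style) {
        Map<String, String> props = new LinkedHashMap<>();
        if (style == null) {
            return props;
        }
        for (String declaration : style.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) {
                continue; // skip empty or malformed pieces
            }
            String name = declaration.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = declaration.substring(colon + 1).trim();
            if (!name.isEmpty()) {
                props.put(name, value);
            }
        }
        return props;
    }

    // Formats the map back into "name: value; name: value".
    static String formatDeclarations(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(e.getKey()).append(": ").append(e.getValue());
        }
        return sb.toString();
    }

    // Adds or replaces one property while keeping the others in their original order.
    static String withProperty(String style, String name, String value) {
        Map<String, String> props = parseDeclarations(style);
        props.put(name.toLowerCase(Locale.ROOT), value);
        return formatDeclarations(props);
    }

    public static void main(String[] args) {
        String style = "font-size: 250%; font-family: verdana; color: #cd66aa";
        System.out.println(withProperty(style, "color", "#434343"));
        // font-size: 250%; font-family: verdana; color: #434343
        System.out.println(withProperty(null, "text-align", "center"));
        // text-align: center
    }
}
```

Applied to the snippet above, you would write the result back with `paragraph.setAttribute("style", withProperty(paragraph.getAttribute("style"), "color", "#434343"))`. The helper treats a null input as an empty style.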

Internal CSS

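Internal CSS is placed in a <style> element in the document’s <head> and applies to the whole page. The following example adds such a style element, assigns its classes to the first and last paragraphs, and then adjusts individual properties through getStyle(). Values set through getStyle() go into each element's inline style, so they override the matching class rules: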

```java
import com.aspose.html.HTMLDocument;
import com.aspose.html.HTMLElement;
import com.aspose.html.dom.Element;
import com.aspose.html.rendering.pdf.PdfDevice;

// Create an instance of an HTML document with the specified content
String content = "<div><p>Internal CSS</p><p>An internal CSS is used to define a style for a single HTML page</p></div>";
HTMLDocument document = new HTMLDocument(content, ".");

// Create a style element with two class rules, .frame1 and .frame2
Element style = document.createElement("style");
style.setTextContent(".frame1 { margin-top:50px; margin-left:50px; padding:20px; width:360px; height:90px; background-color:#a52a2a; font-family:verdana; color:#FFF5EE;} \r\n" +
        ".frame2 { margin-top:-90px; margin-left:160px; text-align:center; padding:20px; width:360px; height:100px; background-color:#ADD8E6;}");

// Find the document's head element and append the style element to it
Element head = document.getElementsByTagName("head").get_Item(0);
head.appendChild(style);

// Find the first paragraph and assign it the frame1 class
HTMLElement paragraph = (HTMLElement) document.getElementsByTagName("p").get_Item(0);
paragraph.setClassName("frame1");

// Find the last paragraph and assign it the frame2 class
HTMLElement lastParagraph = (HTMLElement) document.getElementsByTagName("p").get_Item(document.getElementsByTagName("p").getLength() - 1);
lastParagraph.setClassName("frame2");

// Set the font size and text alignment of the first paragraph
paragraph.getStyle().setFontSize("250%");
paragraph.getStyle().setTextAlign("center");

// Set the color, font size, and font family of the last paragraph
lastParagraph.getStyle().setColor("#434343");
lastParagraph.getStyle().setFontSize("150%");
lastParagraph.getStyle().setFontFamily("verdana");

// Save the HTML document to a file
document.save("edit-internal-css.html");

// Create a PDF output device that writes to "edit-internal-css.pdf"
PdfDevice device = new PdfDevice("edit-internal-css.pdf");

// Render HTML to PDF
document.renderTo(device);
```

The figure shows a fragment of the rendered “edit-internal-css.pdf” file:

Figure: the text “Internal CSS” in the rendered PDF
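
In the example above, setClassName() replaces the element's whole class attribute. That is fine for freshly parsed paragraphs, but if an element may already carry classes, appending a class token is safer. The sketch below does this with the JDK's `org.w3c.dom` API rather than Aspose.HTML. It also guards the first/last lookup against a document with no `<p>` elements, where `get_Item(getLength() - 1)` would have no valid index. `ClassTokenSketch`, `addClass`, and `styleFirstAndLast` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ClassTokenSketch {

    // Parses well-formed markup with the JDK's XML parser.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
    }

    // Adds a class token unless it is already present; existing classes are kept.
    static void addClass(Element element, String token) {
        String current = element.getAttribute("class").trim(); // JDK returns "" when absent
        for (String existing : current.split("\\s+")) {
            if (existing.equals(token)) {
                return;
            }
        }
        element.setAttribute("class", current.isEmpty() ? token : current + " " + token);
    }

    // Gives the first <p> one class and the last <p> another; does nothing without paragraphs.
    static void styleFirstAndLast(Document doc, String firstClass, String lastClass) {
        NodeList paragraphs = doc.getElementsByTagName("p");
        if (paragraphs.getLength() == 0) {
            return;
        }
        addClass((Element) paragraphs.item(0), firstClass);
        addClass((Element) paragraphs.item(paragraphs.getLength() - 1), lastClass);
    }

    public static void main(String[] args) {
        Document doc = parse("<div><p class=\"intro\">Internal CSS</p>"
                + "<p>An internal CSS is used to define a style for a single HTML page</p></div>");
        styleFirstAndLast(doc, "frame1", "frame2");
        NodeList paragraphs = doc.getElementsByTagName("p");
        System.out.println(((Element) paragraphs.item(0)).getAttribute("class")); // intro frame1
        System.out.println(((Element) paragraphs.item(1)).getAttribute("class")); // frame2
    }
}
```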

Conclusion

Aspose.HTML for Java offers a powerful and flexible API for editing HTML documents. By leveraging the DOM, you can create, manipulate, and render web content programmatically. Because it adheres to modern standards, Aspose.HTML for Java streamlines complex web development tasks and lets you manage and customize HTML content for your specific needs.

You can download the complete examples and data files from GitHub.
