Create HTML Document – Create, Load HTML in Java
HTML Document
The HTMLDocument class is the starting point for the Aspose.HTML class library. You can load an HTML page into the Document Object Model (DOM) using the HTMLDocument class, then programmatically read, modify, or remove HTML in the document. The HTMLDocument class has several overloaded constructors that let you create a blank document or load HTML from a file, URL, or stream.
The HTMLDocument provides an in-memory representation of an HTML DOM and is entirely based on the W3C DOM and WHATWG DOM specifications supported by many modern browsers. If you are familiar with the WHATWG DOM, WHATWG HTML, and JavaScript standards, you will find the Aspose.HTML library easy to use. Otherwise, you can visit www.w3schools.com, which offers many examples and tutorials on working with HTML documents.
Create an Empty HTML Document
Once the document object is created, it can be filled with HTML elements later. The following code snippet shows how to use the default HTMLDocument() constructor to create an empty HTML document and save it to a file.
// Initialize an empty HTML Document
com.aspose.html.HTMLDocument document = new com.aspose.html.HTMLDocument();
try {
    // Save the HTML document to a disk
    document.save("create-empty-document.html");
} finally {
    if (document != null) {
        document.dispose();
    }
}
After the code runs, the file create-empty-document.html appears with the initial document structure: the empty document includes elements such as <html>, <head>, and <body>. More details about saving HTML files are in the Save HTML Document article.
Create a New HTML Document
To generate a document programmatically from scratch, use the HTMLDocument() constructor with no parameters, as shown in the code snippet above. You can then populate the document with content, for example by creating a text node and adding it to the body of the document:
// Initialize an empty HTML Document
com.aspose.html.HTMLDocument document = new com.aspose.html.HTMLDocument();

// Prepare an output path for saving the document
String documentPath = "create-new-document.html";

try {
    // Create a text node and add it to the body of the document
    com.aspose.html.dom.Text text = document.createTextNode("How to Create an HTML Document?");
    document.getBody().appendChild(text);

    // Save the document to a disk
    document.save(documentPath);
} finally {
    if (document != null) {
        document.dispose();
    }
}
Load HTML from a File
The following code snippet shows how to load an HTMLDocument from an existing file:
// Prepare a 'load-from-file.html' file
try (java.io.FileWriter fileWriter = new java.io.FileWriter("load-from-file.html")) {
    fileWriter.write("Load HTML from a File!");
}

// Load from a 'load-from-file.html' file
com.aspose.html.HTMLDocument document = new com.aspose.html.HTMLDocument("load-from-file.html");
try {
    // Write the document content to the output stream
    System.out.println(document.getDocumentElement().getOuterHTML());
} finally {
    document.dispose();
}
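java.io.FileWriter(String) encodes text with the JVM's default charset, which was platform-dependent before Java 18 (JEP 400 made UTF-8 the default). If the file you prepare may contain non-ASCII text, the following plain-JDK sketch (Java 11+, no Aspose.HTML types) writes and reads it with an explicit UTF-8 encoding instead. The class and method names are illustrative only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class PrepareHtmlFile {

    // Writes the HTML text with an explicit UTF-8 encoding,
    // independent of the JVM's default charset.
    static Path writeHtml(Path file, String html) throws IOException {
        return Files.writeString(file, html, StandardCharsets.UTF_8);
    }

    // Reads the file back with the same encoding, so non-ASCII characters survive the round trip.
    static String readHtml(Path file) throws IOException {
        return Files.readString(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of("load-from-file.html");
        // "\u00fc\u00df" are the non-ASCII letters ü and ß
        writeHtml(file, "<p>Load HTML from a File! Gr\u00fc\u00dfe</p>");
        System.out.println(readHtml(file));
    }
}
```

The resulting file can then be passed to the HTMLDocument(String) constructor exactly as in the snippet above.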
Load HTML from a URL
The next code snippet shows how to load a web page into an HTMLDocument.
If you pass a URL that can't be reached at the moment, the library throws a DOMException with the specialized code ‘NetworkError’ to inform you that the requested resource cannot be found.
// Load a document from the 'https://docs.aspose.com/html/net/creating-a-document/document.html' web page
com.aspose.html.HTMLDocument document = new com.aspose.html.HTMLDocument("https://docs.aspose.com/html/net/creating-a-document/document.html");
try {
    // Write the document content to the output stream
    System.out.println(document.getDocumentElement().getOuterHTML());
} finally {
    document.dispose();
}
Load HTML from HTML Code
If your HTML code is already in memory as a String or an InputStream object, you don't need to save it to a file; simply pass it to the corresponding constructor. The following Java code snippet demonstrates how to create an HTML document from a string with the Aspose.HTML library and save it to a file:
// Prepare HTML code
String htmlCode = "<p>Load HTML from HTML Code</p>";

// Initialize a document from the string variable
com.aspose.html.HTMLDocument document = new com.aspose.html.HTMLDocument(htmlCode, ".");
try {
    // Save the document to a disk
    document.save("create-from-string.html");
} finally {
    document.dispose();
}
If your HTML code has linked resources (styles, scripts, images, etc.), you need to pass a valid baseUrl parameter to the document constructor. It is used to resolve the locations of those resources while the document is loading.
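To illustrate what "resolving the location of a resource" against a base URL means, here is a plain-JDK sketch using java.net.URI. It is only a stand-in: java.net.URI follows RFC 2396/3986 resolution rules, while HTML parsers use the WHATWG URL Standard, which agrees on ordinary cases like these but can differ in edge cases. The base URL example.com/site/pages/index.html is hypothetical.

```java
import java.net.URI;

class BaseUrlResolution {

    // Resolves a resource reference (as found in an href or src attribute) against a base URL.
    static String resolve(String baseUrl, String reference) {
        return URI.create(baseUrl).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "https://example.com/site/pages/index.html";
        // A relative path is resolved against the base URL's directory
        System.out.println(resolve(base, "styles/main.css")); // https://example.com/site/pages/styles/main.css
        // ".." segments are normalized away
        System.out.println(resolve(base, "../img/logo.png")); // https://example.com/site/img/logo.png
        // A root-relative path keeps only the scheme and host of the base
        System.out.println(resolve(base, "/scripts/app.js")); // https://example.com/scripts/app.js
    }
}
```

Without a suitable base URL, relative references such as styles/main.css have nothing to be resolved against, which is why the baseUrl parameter matters for documents with linked resources.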
Load HTML from a Stream
To create an HTML document from a stream, you can use the HTMLDocument(InputStream, String) constructor:
// Prepare HTML code and wrap it in an in-memory stream
String code = "<p>Hello, World! I can load HTML from a stream!</p>";
java.io.InputStream inputStream = new java.io.ByteArrayInputStream(code.getBytes(java.nio.charset.StandardCharsets.UTF_8));

// HTMLDocument starts reading exactly from the current position within the stream;
// a newly created ByteArrayInputStream is already positioned at the beginning
com.aspose.html.HTMLDocument document = new com.aspose.html.HTMLDocument(inputStream, ".");
try {
    // Save the document to a disk
    document.save("load-from-stream.html");
} finally {
    document.dispose();
}
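The stream example relies on the fact that HTMLDocument reads from the stream's current position. This plain-JDK sketch (no Aspose.HTML types) shows what that means in practice: a stream that has already been partly consumed yields only the remainder, unless it is rewound first, here with mark() and reset() on a ByteArrayInputStream. The readRemaining helper is illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamPosition {

    // Reads everything from the stream's current position to its end.
    static String readRemaining(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<p>Hello, World!</p>".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(html);

        in.mark(html.length);                  // remember the start position
        in.skip(3);                            // simulate an earlier partial read of "<p>"
        System.out.println(readRemaining(in)); // Hello, World!</p>

        in.reset();                            // rewind to the marked start
        System.out.println(readRemaining(in)); // <p>Hello, World!</p>
    }
}
```

Passing the stream in the first state would hand a consumer truncated markup, which is why the position should be at the beginning before the document is created.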
SVG Document
Since Scalable Vector Graphics (SVG) is part of the W3C standards and can be embedded into an HTMLDocument, we implemented SVGDocument with all its functionality. Our implementation is based on the official SVG 2 specification, so you can load, read, and manipulate SVG documents as described there.
Since SVGDocument and HTMLDocument are based on the same WHATWG DOM standard, all operations such as loading, reading, editing, converting, and saving are similar for both documents. So all the examples that manipulate an HTMLDocument apply to SVGDocument as well.
The example below shows how to load an SVG document from an in-memory String variable:
// Initialize an SVG document from a string object
com.aspose.html.dom.svg.SVGDocument document = new com.aspose.html.dom.svg.SVGDocument("<svg xmlns='http://www.w3.org/2000/svg'><circle cx='60' cy='60' r='40'/></svg>", ".");
try {
    // Write the document content to the output stream
    System.out.println(document.getDocumentElement().getOuterHTML());
} finally {
    document.dispose();
}
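Because SVG markup is navigated through the same DOM model, DOM-style calls look familiar even outside Aspose.HTML. As a point of comparison only, here is a sketch that uses the JDK's built-in XML DOM (javax.xml.parsers and org.w3c.dom) on the same SVG string. It parses SVG as plain namespaced XML and implements none of SVG 2's rendering or geometry interfaces; the helper names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class SvgDomSketch {

    static final String SVG_NS = "http://www.w3.org/2000/svg";

    // Parses SVG markup into a namespace-aware W3C DOM Document.
    static Document parse(String svg) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // SVG elements live in the SVG namespace
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(svg)));
    }

    // Returns the 'r' attribute of the first <circle> element in the SVG namespace.
    static String circleRadius(Document doc) {
        Element circle = (Element) doc.getElementsByTagNameNS(SVG_NS, "circle").item(0);
        return circle.getAttribute("r");
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<svg xmlns='http://www.w3.org/2000/svg'><circle cx='60' cy='60' r='40'/></svg>");
        System.out.println(doc.getDocumentElement().getLocalName()); // svg
        System.out.println(circleRadius(doc));                       // 40
    }
}
```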
MHTML Document
MHTML stands for MIME encapsulation of aggregate HTML documents. It is a specialized format for creating web page archives. The Aspose.HTML library supports this format, but with some limitations: we support only rendering operations from MHTML to the supported output formats. For more details, please read the Converting Between Formats article.
EPUB Document
EPUB, an electronic publication format, has the same limitation as MHTML: we support only rendering operations from EPUB to the supported output formats. For more details, please read the Converting Between Formats article.
Asynchronous Operations
Loading a document can be a resource-intensive operation, since it requires loading not only the document itself but also all linked resources, and processing all scripts. The following code snippets show how to use asynchronous operations to load an HTMLDocument without blocking the main thread:
// Initialize a latch that is released once the document has loaded
java.util.concurrent.CountDownLatch loaded = new java.util.concurrent.CountDownLatch(1);

// Create an instance of an HTML document
com.aspose.html.HTMLDocument document = new com.aspose.html.HTMLDocument();

// Create a StringBuilder for reading the OuterHTML property
StringBuilder outerHTML = new StringBuilder();

// Subscribe to the 'ReadyStateChange' event
// This event is fired during the document loading process
document.OnReadyStateChange.add(new com.aspose.html.dom.events.DOMEventHandler() {
    @Override
    public void invoke(Object sender, com.aspose.html.dom.events.Event e) {
        // Check the value of the 'ReadyState' property
        // This property represents the status of the document. For details, see https://www.w3schools.com/jsref/prop_doc_readystate.asp
        if (document.getReadyState().equals("complete")) {
            // Fill the outerHTML variable with the content of the loaded document
            outerHTML.append(document.getDocumentElement().getOuterHTML());
            loaded.countDown();
        }
    }
});

// Navigate asynchronously to the specified URL
document.navigate("https://docs.aspose.com/html/net/creating-a-document/document.html");

// Wait up to 5 seconds for the document to load (await throws InterruptedException)
if (!loaded.await(5000, java.util.concurrent.TimeUnit.MILLISECONDS)) {
    System.out.println("Loading took longer than 5000 ms");
}
System.out.println("outerHTML = " + outerHTML);
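The ReadyStateChange pattern (a handler that reacts only to the "complete" state and then signals a waiting thread, which gives up after a timeout) can be tried without Aspose.HTML or network access. In this sketch, loadAsync is a hypothetical stand-in for the document loader that reports readyState changes from a background thread, and java.util.concurrent.CountDownLatch serves as the one-shot completion signal.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class WaitForReadyState {

    // Hypothetical loader: a document starts out in the "loading" state and then reports
    // each readyState change ("interactive", then "complete") from a background thread.
    static void loadAsync(Consumer<String> onReadyStateChange) {
        new Thread(() -> {
            for (String state : new String[] {"interactive", "complete"}) {
                onReadyStateChange.accept(state);
            }
        }, "loader").start();
    }

    // Starts loading and blocks until "complete" is reported or the timeout expires.
    // Returns true if loading completed in time.
    static boolean awaitComplete(long timeoutMillis) throws InterruptedException {
        CountDownLatch completed = new CountDownLatch(1);
        loadAsync(state -> {
            // Like the ReadyStateChange handler, react only to the final state
            if ("complete".equals(state)) {
                completed.countDown();
            }
        });
        return completed.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(awaitComplete(5000) ? "loaded" : "timed out");
    }
}
```

Unlike a fixed sleep, await() returns as soon as the latch is released, and the timeout only bounds the worst case.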
ReadyStateChange is not the only event that can be used to handle an asynchronous loading operation; you can also subscribe to the Load event, as follows:
// Initialize a latch that is released by the 'OnLoad' handler
java.util.concurrent.CountDownLatch loaded = new java.util.concurrent.CountDownLatch(1);

// Initialize an HTML document
com.aspose.html.HTMLDocument document = new com.aspose.html.HTMLDocument();
java.util.concurrent.atomic.AtomicBoolean isLoaded = new java.util.concurrent.atomic.AtomicBoolean(false);

// Subscribe to the 'OnLoad' event
// This event is fired once the document is fully loaded
document.OnLoad.add(new com.aspose.html.dom.events.DOMEventHandler() {
    @Override
    public void invoke(Object sender, com.aspose.html.dom.events.Event event) {
        isLoaded.set(true);
        loaded.countDown();
    }
});

// Navigate asynchronously to the specified URL
document.navigate("https://docs.aspose.com/html/net/creating-a-document/document.html");

// Here the document is not loaded yet

// Wait up to 5 seconds for the document to load (await throws InterruptedException)
if (!loaded.await(5000, java.util.concurrent.TimeUnit.MILLISECONDS)) {
    System.out.println("Loading took longer than 5000 ms");
}

// Here the document is loaded, unless the wait timed out
System.out.println("isLoaded = " + isLoaded.get());
System.out.println("outerHTML = " + document.getDocumentElement().getOuterHTML());
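The OnLoad handler above sets a flag and signals a waiting thread. An alternative way to bridge such a callback is to let it complete a java.util.concurrent.CompletableFuture, so the waiting side receives the result and the completion signal in one object. In this JDK-only sketch, loadAsync is a hypothetical stand-in for a document's OnLoad notification, and the returned markup is made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

class LoadFuture {

    // Hypothetical asynchronous operation that hands its result to a callback
    // on a background thread, standing in for a document's OnLoad handler.
    static void loadAsync(Consumer<String> onLoad) {
        new Thread(() -> onLoad.accept("<html><head></head><body></body></html>"), "loader").start();
    }

    // Bridges the callback to a CompletableFuture and waits at most timeoutMillis for it.
    static String loadAndWait(long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        CompletableFuture<String> result = new CompletableFuture<>();
        loadAsync(result::complete); // the callback completes the future
        return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        try {
            System.out.println("outerHTML = " + loadAndWait(5000));
        } catch (TimeoutException e) {
            System.out.println("Loading took longer than 5000 ms");
        }
    }
}
```

Because get() throws TimeoutException instead of returning a flag, the timed-out case cannot be mistaken for a loaded document, and no separate AtomicBoolean is needed.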
The following Java code example defines an HTMLDocumentWaiter class for working with HTML documents asynchronously in the Aspose.HTML for Java library. HTMLDocumentWaiter implements Runnable: its constructor starts the example's loading operation by calling html.execute(), and its run() method, executed in a separate thread, waits until either the loading is finished (html.getMsg() returns a value) or the thread is interrupted. Let's see what the code does:
package com.aspose.html.examples.java;

public class HTMLDocumentWaiter implements Runnable {

    private Examples_Java_WorkingWithDocuments_CreatingADocument_HTMLDocumentAsynchronouslyOnLoad html;

    public HTMLDocumentWaiter(Examples_Java_WorkingWithDocuments_CreatingADocument_HTMLDocumentAsynchronouslyOnLoad html) {
        this.html = html;
        try {
            // Start the example's loading operation
            this.html.execute();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Override
    public void run() {
        System.out.println("Current Thread: " + Thread.currentThread().getName() + "; " + Thread.currentThread().getId());
        try {
            // Poll until a result is available or this thread is interrupted
            while (!Thread.currentThread().isInterrupted() && html.getMsg() == null) {
                // Thread.sleep is static and always pauses the current thread
                Thread.sleep(60000);
            }
        } catch (InterruptedException ex) {
            // Restore the interrupt status
            Thread.currentThread().interrupt();
        }
    }
}
The following code snippet describes the SimpleWait class, whose main() method serves as the entry point for a Java application. Inside main(), an html instance of the Examples_Java_WorkingWithDocuments_CreatingADocument_HTMLDocumentAsynchronouslyOnLoad class is created; this class is responsible for loading the HTML document asynchronously. The html instance is then passed to a new HTMLDocumentWaiter object that waits for the loading to complete. Finally, main() starts a new thread to execute the waiting process:
package com.aspose.html.examples.java;

public class SimpleWait {

    public static void main(String... args) {

        var html =
                new Examples_Java_WorkingWithDocuments_CreatingADocument_HTMLDocumentAsynchronouslyOnLoad();
        var htmlDocumentWaiter = new HTMLDocumentWaiter(html);
        new Thread(htmlDocumentWaiter, "html").start();

    }
}
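SimpleWait starts the waiter thread but does not bound how long it may run. One way to do that, sketched below with the JDK only, is to join the thread with a timeout and interrupt it if it is still alive; a loop shaped like HTMLDocumentWaiter.run() then exits through its interrupt path. The BooleanSupplier condition is a hypothetical stand-in for html.getMsg() != null, and the helper names are illustrative.

```java
import java.util.function.BooleanSupplier;

class InterruptibleWaiter {

    // Starts a thread shaped like HTMLDocumentWaiter.run(): it polls a condition
    // and exits when the condition holds or the thread is interrupted.
    static Thread startWaiter(BooleanSupplier done, long pollMillis) {
        Thread waiter = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted() && !done.getAsBoolean()) {
                    Thread.sleep(pollMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
        }, "html");
        waiter.start();
        return waiter;
    }

    // Gives the waiter up to timeoutMillis to finish on its own; otherwise interrupts it.
    // Returns true if the waiter finished before the timeout.
    static boolean waitOrInterrupt(Thread waiter, long timeoutMillis) throws InterruptedException {
        waiter.join(timeoutMillis);
        if (waiter.isAlive()) {
            waiter.interrupt(); // wakes the waiter from Thread.sleep
            waiter.join();
            return false;
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // A condition that never becomes true, so the waiter has to be interrupted
        Thread stuck = startWaiter(() -> false, 100);
        System.out.println(waitOrInterrupt(stuck, 500)); // false

        // A condition that is already true, so the waiter exits on its own
        Thread quick = startWaiter(() -> true, 100);
        System.out.println(waitOrInterrupt(quick, 500)); // true
    }
}
```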
You can download the complete examples and data files from GitHub.