Convert HTML to Markdown – Aspose.HTML for Python via .NET
Markdown is a plain-text formatting syntax commonly used for documentation and readme files because it is easy to read and write. It was initially designed to convert only to HTML, but it can now be converted to many other formats. The Aspose.HTML for Python via .NET library lets you reverse this process by converting HTML back to Markdown, so you can access, edit, and create Markdown files from any device using any text editor.
In this article, you will learn how to convert HTML to Markdown using the convert_html() methods of the Converter class and how to apply MarkdownSaveOptions.
To follow this tutorial, install and configure Aspose.HTML for Python via .NET in your Python project. The code examples below show how to convert HTML to Markdown and generate Markdown files using the Python library.
Online HTML Converter
You can convert HTML to Markdown with Aspose.HTML for Python via .NET API in real time. First, load an HTML file from your local drive or URL and run the example. In the code example, the save options are set by default. You will immediately receive the result of the HTML to Markdown conversion as a separate Markdown file.
HTML to Markdown – Python Code Example
The methods of the Converter class are the easiest way to convert HTML code into various formats. To convert HTML to Markdown, follow these steps:
- Load, open, or read an HTML document. In the following example, we write an HTML code string to a file and pass the file path to the converter.
- Create a new MarkdownSaveOptions object and specify the required properties, or use a predefined set such as MarkdownSaveOptions.git.
- Call the convert_html() method of the Converter class, passing it the HTML source (an HTMLDocument or, as in the example, a file path), the MarkdownSaveOptions, and the path to the output file.
To convert HTML to Markdown, you can define your own set of rules or use the predefined templates. For instance, you can use the template based on GitLab Flavored Markdown syntax. The following code snippet shows how to convert HTML to Markdown using the git property of MarkdownSaveOptions:
from aspose.html import *
from aspose.html.converters import *
from aspose.html.saving import *

# Prepare an HTML code and save it to a file
code = "<h1>Header 1</h1>" \
       "<h2>Header 2</h2>" \
       "<p>Hello World!!</p>"
# The with block closes the file automatically, so no explicit close() is needed
with open("document.html", "w", encoding="utf-8") as f:
    f.write(code)

# Call the convert_html() method once the file is closed to convert HTML to Markdown
Converter.convert_html("document.html", MarkdownSaveOptions.git, "output.md")
Save Options – MarkdownSaveOptions class
You can tailor Markdown output to your needs with save options. The MarkdownSaveOptions class has a number of properties that give you control over the conversion process:
Property | Description |
---|---|
default | This property returns a set of options that are compatible with default Markdown documentation. |
features | This property is a flag set that controls which HTML elements are converted to Markdown. For example, you can choose to convert only links and paragraphs or include a broader range of elements such as headers, lists, and tables. |
formatter | This property gets or sets the Markdown formatting style. Options available: GIT and DEFAULT. |
git | This property returns a set of options that are compatible with GitLab Flavored Markdown, which is a popular extension of Markdown used by GitLab. |
resource_handling_options | This property provides access to a ResourceHandlingOptions object which is used to configure how resources are handled during the conversion process. |
Convert HTML to Markdown – Using features Property
The most important option is features. It lets you enable or disable the conversion of particular elements. The following example shows how to process only links and paragraphs; other HTML elements remain as is:
import os
from aspose.html.converters import *
from aspose.html.saving import *

# Prepare directories and paths
output_dir = "output/"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

save_path = os.path.join(output_dir, "options-output.md")

# Prepare HTML code and save it to a file
# (the last literal uses single quotes so the href value can keep its double quotes)
code = "<h1>Header 1</h1>" \
       "<h2>Header 2</h2>" \
       "<p>Hello, World!!</p>" \
       '<a href="https://docs.aspose.com/">aspose</a>'
with open(os.path.join(output_dir, "options.html"), "w") as file:
    file.write(code)

# Create an instance of SaveOptions and set up the rule:
# only <a> and <p> elements will be converted to Markdown
options = MarkdownSaveOptions()
options.features = MarkdownFeatures.LINK | MarkdownFeatures.AUTOMATIC_PARAGRAPH

# Call the convert_html() method to convert HTML to Markdown
Converter.convert_html(os.path.join(output_dir, "options.html"), options, save_path)
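The features value in the example above is built by combining flags with the bitwise OR operator. The sketch below illustrates that general pattern with Python's standard enum.Flag. DemoFeatures is a hypothetical stand-in defined only for this illustration; it is not the Aspose MarkdownFeatures type, and its member list is an assumption.

```python
from enum import Flag, auto


# Stand-in enum for illustration only; it is NOT the Aspose MarkdownFeatures type.
class DemoFeatures(Flag):
    HEADER = auto()
    LINK = auto()
    AUTOMATIC_PARAGRAPH = auto()
    LIST = auto()


# Combine individual flags with bitwise OR, as the example above does.
selected = DemoFeatures.LINK | DemoFeatures.AUTOMATIC_PARAGRAPH

# Membership tests show which features the combined value carries.
print(DemoFeatures.LINK in selected)    # → True
print(DemoFeatures.HEADER in selected)  # → False
```

Any flag left out of the combined value is simply absent from the set, which matches the behavior described above: elements whose feature is not selected are not converted.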
Features nesting
Markdown supports a lot of features, but not all of them can be used together. For example, list elements inside table elements are not converted. The following table shows which features can be nested. Each feature is a member of the MarkdownFeatures enumeration.
Parent feature | Features which can be processed inside |
---|---|
Header | Link, Emphasis, Strong, InlineCode, Image, Strikethrough, Video |
Blockquote | Any |
List | AutomaticParagraph, Link, Emphasis, Strong, InlineCode, Image, LineBreak, Strikethrough, Video, TaskList, List |
Link | Emphasis, Strong, InlineCode, Image, LineBreak, Strikethrough |
AutomaticParagraph | Link, Emphasis, Strong, InlineCode, Image, LineBreak, Strikethrough |
Strikethrough | Link, Emphasis, Strong, InlineCode, Image, LineBreak |
Table | Video, Strikethrough, Image, InlineCode, Emphasis, Strong, Link |
Emphasis | Link, InlineCode, Image, LineBreak, Strikethrough, Video |
Strong | Link, InlineCode, Image, LineBreak, Strikethrough, Video |
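The nesting table can be transcribed into a small lookup for quick checks while planning a conversion. This is a minimal sketch, not part of the Aspose.HTML API: NESTING and can_nest() are hypothetical names, feature names are plain strings copied from the table, and "Any" is modelled as None.

```python
# Nesting rules transcribed from the table above.
# None stands for "Any" (every feature may be processed inside the parent).
NESTING = {
    "Header": {"Link", "Emphasis", "Strong", "InlineCode", "Image",
               "Strikethrough", "Video"},
    "Blockquote": None,
    "List": {"AutomaticParagraph", "Link", "Emphasis", "Strong", "InlineCode",
             "Image", "LineBreak", "Strikethrough", "Video", "TaskList", "List"},
    "Link": {"Emphasis", "Strong", "InlineCode", "Image", "LineBreak",
             "Strikethrough"},
    "AutomaticParagraph": {"Link", "Emphasis", "Strong", "InlineCode", "Image",
                           "LineBreak", "Strikethrough"},
    "Strikethrough": {"Link", "Emphasis", "Strong", "InlineCode", "Image",
                      "LineBreak"},
    "Table": {"Video", "Strikethrough", "Image", "InlineCode", "Emphasis",
              "Strong", "Link"},
    "Emphasis": {"Link", "InlineCode", "Image", "LineBreak", "Strikethrough",
                 "Video"},
    "Strong": {"Link", "InlineCode", "Image", "LineBreak", "Strikethrough",
               "Video"},
}


def can_nest(parent: str, child: str) -> bool:
    """Return True if the table lists `child` as processable inside `parent`.

    Raises KeyError for a parent that the table does not cover.
    """
    allowed = NESTING[parent]
    return allowed is None or child in allowed


print(can_nest("Table", "List"))        # → False
print(can_nest("Blockquote", "Table"))  # → True
```

The first call reproduces the case mentioned in the text: a list inside a table is not converted.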
Limitation
Markdown is a lightweight and easy-to-use syntax. However, not all HTML elements can be converted to Markdown, because some have no equivalent in Markdown syntax. Elements such as STYLE, SCRIPT, LINK, EMBED, etc. will be discarded during conversion.
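Before converting, it can help to check whether a page contains elements that will be dropped. The sketch below uses only Python's standard html.parser module and does not call Aspose.HTML. The DISCARDED set contains just the four elements named above, so it is incomplete by design, and find_discarded() is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Only the elements named in the text above; the full list is longer.
DISCARDED = {"style", "script", "link", "embed"}


class DiscardedElementFinder(HTMLParser):
    """Collects the names of elements that appear in DISCARDED."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag in DISCARDED:
            self.found.append(tag)


def find_discarded(html: str) -> list:
    """Return the discarded element names in document order."""
    finder = DiscardedElementFinder()
    finder.feed(html)
    finder.close()
    return finder.found


sample = "<style>p{color:red}</style><p>Hello</p><script>alert(1)</script>"
print(find_discarded(sample))  # → ['style', 'script']
```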
Inline HTML
Markdown allows you to include pure HTML code, which will be rendered as is. The feature that allows this behavior is called “Inline HTML”. To use it, place one of the elements supported by this feature at the beginning of a new line. Alternatively, you can mark such an element as “Inline HTML” by adding the markdown attribute with the value inline to it. Here is a small example that demonstrates how to use this attribute:
import os
from aspose.html.converters import *
from aspose.html.saving import *

# Prepare directories and paths
output_dir = "output/"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

save_path = os.path.join(output_dir, "inline-html.md")

# Prepare HTML code and save it to a file
# (single quotes outside so the markdown="inline" attribute can keep its double quotes)
code = 'text<div markdown="inline"><code>text</code></div>'
with open(os.path.join(output_dir, "inline.html"), "w") as file:
    file.write(code)

# Call the convert_html() method for HTML to Markdown conversion
Converter.convert_html(os.path.join(output_dir, "inline.html"), MarkdownSaveOptions(), save_path)

# Output file will contain: text\r\n<div markdown="inline"><code>text</code></div>
As you can see, the content of the <div> element is not converted to Markdown and is treated by the Markdown processor as is. The list of elements that support this feature is different for every Markdown processor.
The original Markdown specification supports these tags: BLOCKQUOTE, H1, H2, H3, H4, H5, H6, P, PRE, OL, UL, DL, DIV, INS, DEL, IFRAME, FIELDSET, NOSCRIPT, FORM, MATH.
GitLab Flavored Markdown extends this list with the following tags: ARTICLE, FOOTER, NAV, ASIDE, HEADER, ADDRESS, HR, DD, FIGURE, FIGCAPTION, ABBR, VIDEO, AUDIO, OUTPUT, CANVAS, SECTION, DETAILS, HGROUP, SUMMARY.
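The two tag lists can be captured in a small lookup to check whether an element qualifies for Inline HTML under each flavor. This is a minimal sketch based only on the lists above. The names ORIGINAL_MARKDOWN_TAGS, GITLAB_EXTRA_TAGS, and supports_inline_html() are hypothetical and not part of the Aspose.HTML API.

```python
# Tags listed above for the original Markdown specification.
ORIGINAL_MARKDOWN_TAGS = {
    "BLOCKQUOTE", "H1", "H2", "H3", "H4", "H5", "H6", "P", "PRE", "OL", "UL",
    "DL", "DIV", "INS", "DEL", "IFRAME", "FIELDSET", "NOSCRIPT", "FORM", "MATH",
}

# Additional tags listed above for GitLab Flavored Markdown.
GITLAB_EXTRA_TAGS = {
    "ARTICLE", "FOOTER", "NAV", "ASIDE", "HEADER", "ADDRESS", "HR", "DD",
    "FIGURE", "FIGCAPTION", "ABBR", "VIDEO", "AUDIO", "OUTPUT", "CANVAS",
    "SECTION", "DETAILS", "HGROUP", "SUMMARY",
}


def supports_inline_html(tag: str, gitlab: bool = False) -> bool:
    """Return True if `tag` is in the list for the chosen flavor.

    With gitlab=True the GitLab additions are checked as well.
    """
    name = tag.upper()
    if name in ORIGINAL_MARKDOWN_TAGS:
        return True
    return gitlab and name in GITLAB_EXTRA_TAGS


print(supports_inline_html("div"))                   # → True
print(supports_inline_html("section"))               # → False
print(supports_inline_html("section", gitlab=True))  # → True
```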
The Markdown Syntax – Basic Tutorial article provides information on the Markdown markup language’s main elements and the Markdown syntax details.
Download our Aspose.HTML for Python via .NET library to successfully, quickly, and easily convert your HTML, MHTML, EPUB, SVG, and Markdown documents to the most popular formats.
Aspose.HTML offers a free online HTML to MD Converter that converts HTML to Markdown quickly, easily, and with high quality. Just upload your files, convert them, and get the results in a few seconds!