Convert HTML to Markdown – C# Examples and Online Converter
Markdown is a markup language with a plain-text formatting syntax. It is often used for documentation and readme files because it is easy to read and easy to write. It is popular with technical writers for its simplicity, gentle learning curve, and broad support. Markdown can be converted to many output formats, although it was originally designed to convert only to HTML. The Aspose.HTML class library provides the reverse conversion, from HTML to Markdown. You can open and edit Markdown files, or create new content, on any device in any text editor.
In this article, you will find information on how to convert HTML to MD using the ConvertHTML() methods of the Converter class and how to apply MarkdownSaveOptions. The code examples show how to convert HTML to Markdown using the C# library.
Online HTML Converter
You can convert HTML to Markdown with the Aspose.HTML for .NET API in real time. First, load an HTML file from your local drive and then run the example. In this example, the default save options are used. You will immediately receive the result of the HTML to Markdown conversion as a separate Markdown file.
HTML to Markdown in a few lines of code
You can convert HTML to Markdown using C# and other .NET programming languages. The static methods of the Converter class are the easiest way to convert HTML code into various formats. The following code snippet shows how to convert HTML to Markdown in just a few lines of code:
using System.IO;
using Aspose.Html.Converters;
using Aspose.Html.Saving;

// Prepare HTML code and save it to a file
var code = "<h1>Header 1</h1>" +
           "<h2>Header 2</h2>" +
           "<p>Convert HTML to Markdown</p>";
File.WriteAllText("convert.html", code);

// Call ConvertHTML() method to convert HTML to Markdown
// (OutputDir is assumed to hold the path of an existing output folder)
Converter.ConvertHTML("convert.html", new MarkdownSaveOptions(), Path.Combine(OutputDir, "convert.md"));
Save Options
Save options let you tailor the Markdown output to your needs. The MarkdownSaveOptions class has a number of properties that give you control over the conversion process. The most important one is MarkdownSaveOptions.Features, which lets you enable or disable the conversion of particular elements.
Property | Description |
---|---|
Default | This property returns a set of options that are compatible with default Markdown documentation. |
Features | A flag set that controls which HTML elements are converted to Markdown. |
Formatter | This property gets or sets the Markdown formatting style. |
Git | This property returns a set of options that are compatible with GitLab Flavored Markdown. |
ResourceHandlingOptions | Gets a ResourceHandlingOptions object which is used for configuration of resources handling. |
To learn more about MarkdownSaveOptions, please read the Fine-Tuning Converters article.
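As a rough sketch of how the Features flag set might be adjusted, the snippet below copies the flags of the Git preset into a new options object and switches one feature off with ordinary bitwise operators. It assumes that MarkdownFeatures is a flags enumeration in the Aspose.Html.Saving namespace with a single-bit Table member; the outputDir folder and the file names are placeholders chosen for illustration, not part of the library.

```csharp
using System;
using System.IO;
using Aspose.Html.Converters;
using Aspose.Html.Saving;

// Placeholder output folder (an assumption for this sketch).
string outputDir = Path.Combine(Directory.GetCurrentDirectory(), "output");
Directory.CreateDirectory(outputDir);

string htmlPath = Path.Combine(outputDir, "features.html");
string savePath = Path.Combine(outputDir, "features.md");

// Sample input with a header, a paragraph, and a table.
File.WriteAllText(htmlPath,
    "<h1>Header 1</h1>" +
    "<p>Hello, <strong>World</strong>!</p>" +
    "<table><tr><td>cell</td></tr></table>");

// Copy the flags of the Git preset into a fresh options object,
// then clear the Table flag with a bitwise AND-NOT.
var options = new MarkdownSaveOptions();
options.Features = MarkdownSaveOptions.Git.Features & ~MarkdownFeatures.Table;

// HasFlag() reports whether a given feature is still enabled.
bool tablesEnabled = options.Features.HasFlag(MarkdownFeatures.Table);
Console.WriteLine("Table conversion enabled: " + tablesEnabled);

// Convert with the adjusted feature set.
Converter.ConvertHTML(htmlPath, options, savePath);
```

Starting from a preset and removing flags keeps the rule set close to a known flavor, whereas assigning a handful of flags directly enables only those features.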
Convert HTML to Markdown using MarkdownSaveOptions
To convert HTML to Markdown with specific MarkdownSaveOptions, follow these steps:
- Load an HTML file using one of the HTMLDocument() constructors of the HTMLDocument class.
- Create a new MarkdownSaveOptions object.
- Use the ConvertHTML() method of the Converter class to save HTML as a Markdown file. You need to pass the HTMLDocument, MarkdownSaveOptions, and output file path to the ConvertHTML() method to convert HTML to Markdown.
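The steps above can be sketched as follows. This is a minimal illustration rather than the article's own sample: it loads the markup from a string with "." as the base URI instead of reading a file, keeps the default options, and writes to a placeholder outputDir folder introduced only for this example.

```csharp
using System;
using System.IO;
using Aspose.Html;
using Aspose.Html.Converters;
using Aspose.Html.Saving;

// Placeholder output folder (an assumption for this sketch).
string outputDir = Path.Combine(Directory.GetCurrentDirectory(), "output");
Directory.CreateDirectory(outputDir);
string savePath = Path.Combine(outputDir, "steps-output.md");

// Step 1: load the HTML into an HTMLDocument.
// Here the markup comes from a string, with "." as the base URI.
using (var document = new HTMLDocument("<h1>Header 1</h1><p>Hello, World!!</p>", "."))
{
    // Step 2: create the save options (defaults are kept in this sketch).
    var options = new MarkdownSaveOptions();

    // Step 3: pass the document, the options, and the output path to ConvertHTML().
    Converter.ConvertHTML(document, options, savePath);
}

Console.WriteLine(File.ReadAllText(savePath));
```

Passing an HTMLDocument rather than a file path is convenient when the document has already been loaded or modified in memory before conversion.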
The following example shows how to process only links and paragraphs; other HTML elements remain as is:
// Prepare a path for converted file saving
string savePath = Path.Combine(OutputDir, "options-output.md");

// Prepare HTML code and save it to the file
var code = "<h1>Header 1</h1>" +
           "<h2>Header 2</h2>" +
           "<p>Hello, World!!</p>" +
           "<a href='aspose.com'>aspose</a>";
File.WriteAllText(Path.Combine(OutputDir, "options.html"), code);

// Create an instance of SaveOptions and set up the rule:
// - only <a> and <p> elements will be converted to Markdown
var options = new MarkdownSaveOptions();
options.Features = MarkdownFeatures.Link | MarkdownFeatures.AutomaticParagraph;

// Call the ConvertHTML() method to convert the HTML to Markdown
Converter.ConvertHTML(Path.Combine(OutputDir, "options.html"), options, savePath);
To convert HTML to Markdown, you can define your own set of rules or use one of the predefined templates. For instance, you can use the template based on GitLab Flavored Markdown syntax:
// Prepare a path for converted file saving
string savePath = Path.Combine(OutputDir, "output-git.md");

// Prepare HTML code and save it to the file
var code = "<h1>Header 1</h1>" +
           "<h2>Header 2</h2>" +
           "<p>Hello, World!!</p>";
File.WriteAllText(Path.Combine(OutputDir, "document.html"), code);

// Call ConvertHTML() method to convert HTML to Markdown
Converter.ConvertHTML(Path.Combine(OutputDir, "document.html"), MarkdownSaveOptions.Git, savePath);
Limitation
Markdown is a lightweight and easy-to-use syntax. However, not all HTML elements can be converted to Markdown, because some have no equivalent in Markdown syntax. Elements such as STYLE, SCRIPT, LINK, EMBED, etc. will be discarded during conversion.
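One way to observe this limitation is to convert a small document that contains STYLE and SCRIPT elements and inspect the result. The sketch below assumes default save options; the outputDir folder, the file names, and the sample markup are placeholders chosen for illustration. It prints the converted text and reports whether the script content survived, rather than asserting a particular outcome.

```csharp
using System;
using System.IO;
using Aspose.Html.Converters;
using Aspose.Html.Saving;

// Placeholder output folder (an assumption for this sketch).
string outputDir = Path.Combine(Directory.GetCurrentDirectory(), "output");
Directory.CreateDirectory(outputDir);

string htmlPath = Path.Combine(outputDir, "limitation.html");
string savePath = Path.Combine(outputDir, "limitation.md");

// Sample input mixing elements without a Markdown equivalent with a plain paragraph.
File.WriteAllText(htmlPath,
    "<style>p { color: red; }</style>" +
    "<script>console.log('hi');</script>" +
    "<p>Visible paragraph</p>");

// Convert with default options.
Converter.ConvertHTML(htmlPath, new MarkdownSaveOptions(), savePath);

// Inspect the result: per the text above, STYLE and SCRIPT content should be absent.
string markdown = File.ReadAllText(savePath);
Console.WriteLine(markdown);
Console.WriteLine("Contains script text: " + markdown.Contains("console.log"));
```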
Inline HTML
Markdown allows you to include pure HTML code, which is rendered as is. The feature that enables this behavior is called “Inline HTML”. To use it, place one of the elements supported by this feature at the beginning of a new line. Alternatively, you can mark such an element as “Inline HTML” by adding the attribute markdown with the value inline to it. Here is a small example that demonstrates how to use this attribute:
// Prepare a path for converted file saving
string savePath = Path.Combine(OutputDir, "inline-html.md");

// Prepare HTML code and save it to the file
var code = "text<div markdown='inline'><code>text</code></div>";
File.WriteAllText(Path.Combine(OutputDir, "inline.html"), code);

// Call ConvertHTML() method to convert HTML to Markdown.
Converter.ConvertHTML(Path.Combine(OutputDir, "inline.html"), new MarkdownSaveOptions(), savePath);

// Output file will contain: text\r\n<div markdown="inline"><code>text</code></div>
As you can see, the content of the <div> element is not converted to Markdown and is treated by the Markdown processor as is. The list of elements that support this feature is different for every Markdown processor.
The original Markdown specification supports these tags: BLOCKQUOTE, H1, H2, H3, H4, H5, H6, P, PRE, OL, UL, DL, DIV, INS, DEL, IFRAME, FIELDSET, NOSCRIPT, FORM, MATH.
GitLab Flavored Markdown extends this list with the following tags: ARTICLE, FOOTER, NAV, ASIDE, HEADER, ADDRESS, HR, DD, FIGURE, FIGCAPTION, ABBR, VIDEO, AUDIO, OUTPUT, CANVAS, SECTION, DETAILS, HGROUP, SUMMARY.
Features nesting
Markdown supports a lot of features, but not all of them can be used together. For example, list elements inside table elements are not converted. The following table shows which features can be nested. Each feature is a member of the MarkdownFeatures enumeration.
Parent feature | Features which can be processed inside |
---|---|
Header | Link, Emphasis, Strong, InlineCode, Image, Strikethrough, Video |
Blockquote | Any |
List | AutomaticParagraph, Link, Emphasis, Strong, InlineCode, Image, LineBreak, Strikethrough, Video, TaskList, List |
Link | Emphasis, Strong, InlineCode, Image, LineBreak, Strikethrough |
AutomaticParagraph | Link, Emphasis, Strong, InlineCode, Image, LineBreak, Strikethrough |
Strikethrough | Link, Emphasis, Strong, InlineCode, Image, LineBreak |
Table | Video, Strikethrough, Image, InlineCode, Emphasis, Strong, Link |
Emphasis | Link, InlineCode, Image, LineBreak, Strikethrough, Video |
Strong | Link, InlineCode, Image, LineBreak, Strikethrough, Video |
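To see how the nesting rules play out, one could convert a header that contains emphasis and a link (a combination the table lists as supported) next to a table cell that contains a list (a combination the text describes as not converted). The snippet below is a sketch under those assumptions; the outputDir folder, the file names, and the sample markup are placeholders, and the output is printed for inspection rather than predicted.

```csharp
using System;
using System.IO;
using Aspose.Html.Converters;
using Aspose.Html.Saving;

// Placeholder output folder (an assumption for this sketch).
string outputDir = Path.Combine(Directory.GetCurrentDirectory(), "output");
Directory.CreateDirectory(outputDir);

string htmlPath = Path.Combine(outputDir, "nesting.html");
string savePath = Path.Combine(outputDir, "nesting.md");

File.WriteAllText(htmlPath,
    // Header with nested Emphasis and Link: listed as allowed inside Header.
    "<h1>Release <em>notes</em> for <a href='https://example.com'>Example</a></h1>" +
    // Table cell with a nested list: List is not listed as allowed inside Table.
    "<table><tr><td><ul><li>item</li></ul></td></tr></table>");

// Enable only the features involved in this experiment.
var options = new MarkdownSaveOptions();
options.Features = MarkdownFeatures.Header | MarkdownFeatures.Emphasis |
                   MarkdownFeatures.Link | MarkdownFeatures.Table |
                   MarkdownFeatures.List;

Converter.ConvertHTML(htmlPath, options, savePath);

// Print the result to compare the header line with the table content.
Console.WriteLine(File.ReadAllText(savePath));
```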
You can download the complete examples and data files from GitHub.
Download the Aspose.HTML for .NET library, which allows you to quickly and easily convert your HTML, MHTML, EPUB, SVG, and Markdown documents to the most popular formats.
Aspose.HTML offers a free online HTML to MD Converter that converts HTML to Markdown quickly, easily, and with high quality. Just upload your files, convert them, and get results in a few seconds!