Paragraph Features Supported on HTML Import

General Formatting

Paragraph style and formatting can be imported from HTML in the form of tags such as <h1> to <h6> or from <p> tags that have CSS styles.

<h1> to <h6> tags are imported into the Aspose.Words DOM as the built-in Heading styles: Heading 1 - Heading 6.

Inline CSS (through use of the style attribute) is imported as direct formatting on the paragraph (stored in the ParagraphFormat of the Paragraph node).

An Embedded or Linked CSS style (through use of the class attribute) is imported as a Style and applied to the Paragraph node in the document. This style formatting can be accessed using the ParagraphFormat.Style property. A linked CSS sheet can also be downloaded automatically from an external address on the internet.

When there is conflicting formatting on inline and embedded/external CSS, as with CSS the formatting from inline styles are taken first, then the embedded formatting and finally the external formatting.

Feature Supported Comment See Also
Paragraph Style Yes Styles are imported from embedded or external style sheets. If there is no linked style sheet of either of these kinds then the document is imported with no styles (apart from default Normal style). To make sure styles are imported use a style sheet of any kind. There is a load option to control whether embedded or external style sheets are read or skipped during HTML import. There is also an option to supply your own CSS style sheet instead.

- ParagraphFormat

- ParagraphFormat.Style

Alignment Yes Imported from the “text-align” paragraph style attribute. - ParagraphFormat.Alignment
Right to Left Paragraph Planned   - ParagraphFormat.Bidi
Bullets and Numbers Yes Imported from <ol>, <ul>, <li> tags.

Simulated lists using <p> and <span> look correct but will not be imported as proper lists in the DOM.

- ParagraphFormat.ListFormat

- ParagraphFormat.ListLabel

Outline Level Planned   - ParagraphFormat.OutlineLevel
Run Properties for the Paragraph Mark Planned Can be implemented with Microsoft Office specific techniques.

During import the formatting from the last span from <p> becomes the font properties for the paragraph.
- ParagraphFormat.ParagraphBreakFont
Suppress Line Numbers Planned   - ParagraphFormat.SurpressLineNumbers
Suppress Hyphenation Planned   - ParagraphFormat.SurpressAutoHyphens

Indents

Feature Supported Comment See Also
Left Indent Yes Imported from margin-left on style attribute. - ParagraphFormat.LeftIndent
Right Indent Yes Imported from margin-right on style attribute. - ParagraphFormat.RightIndent
First Line Indent Yes Imported from text-indent on style attribute. - ParagraphFormat.FirstLineIndent
Hanging Indent Yes Imported from a combination of margin-left and text-indent style attribute. - ParagraphFormat.FirstLineIndent
Mirror Indents N/A    
Automatically Adjust Right Indent N/A    

Spacing

Feature Supported Comment See Also
Space Before Yes Imported from “margin-top” style attribute.

If this attribute is missing from a paragraph during import from HTML then Space Before is set to Auto.
- ParagraphFormat.SpaceBefore
Space After Yes Imported from “margin-bottom” style attribute.

If this attribute is missing from a paragraph during import from HTML then Space After is set to Auto.
- ParagraphFormat.SpaceAfter
Space Auto Yes Paragraphs imported from HTML without margin-top or margin-bottom style attributes are imported as Auto spacing by default.

- ParagraphFormat.SpaceBeforeAuto

- ParagraphFormat.SpaceAfterAuto

Line Spacing Yes Imported from “line-height” style attribute.

- ParagraphFormat.LineSpacing

- ParagraphFormat.LineSpacingRule

No Space between Conforming Paragraphs Planned   - ParagraphFormat.NoSpaceBetweenParagraphsOfSameStyle
Snap To Grid Planned    

Keeps and Breaks

Feature Supported Comment See Also
Widow/Orphan Control Yes Imported from “widows” CSS attribute.

A value of 0 from this attribute is imported as Widow/Orphan control as being disabled. A value of 1 or greater is imported as enabled.

A paragraph without this attrubite is automatically given Widow/Orphan control in the model.
- ParagraphFormat.WidowControl
Keep With Next Yes Imported from style attribute with “page-break-after:avoid”. - ParagraphFormat.KeepWithNext
Keep Lines Together Yes Imported from style attribute with “page-break-inside:avoid”. - ParagraphFormat.KeepTogether
Page Break Before Yes Imported from “page-break-before” on style attribute. - ParagraphFormat.PageBreakBefore

Text Frames

This is the legacy text frames from Word 97, not to be confused with the Autoshape Textbox which is discussed under Drawing Objects.

Text frames are preserved in the model but there is no API or node to modify or access information about frames.

Frames exported to HTML as paragraphs surronded by a border.

These are round-tripped back to a document with similar formatting but not as actual text frames

Feature Supported Comment See Also
Text Frames Planned    

Tab Stops

All features of tab stops are supported in Aspose.Words except for relative tab stops.

Using Aspose.Words you can find tab stops based off position or index. You can change tab stop features like position, alignment etc or remove tabstops completely.

Tab stops are not natively available in HTML so Aspose.Words exports spacing as a set of non-breaking spaces. These can not be imported back as tab-stops again.

In future improvements, Aspose.Words will convert tab stops as a fixed space which will should allow proper round-trip. In the same way we will also provide support for the Microsoft Office mso-tab-count attribute.

See the following link in the documentation for further information:

  • ParagraphFormat.TabStops
Feature Supported Comment See Also
Absolute Position Planned   - TabStop.Position
Relative Position Planned A relative position tab can be inserted in Microsoft Word using the “Insert Alignment Tab” button. This type of tab is relative to either the page margin or the indent of the paragraph.

This allows tab stops to appear in the same relative place even when the position of the paragraph or page is modified.

Currently Aspose.Words supports these types of tab stops in OOXML and WordML formats only. There is currently no API to retrieve the properties of this tab e.g RelativeTo, Alignment, Leader etc. Further support is planned.

This feature might be supported during HTML import if a proper analog can be found.
- AbsolutePositionTab
Alignment: Left, Center, Right, Decimal, Bar Planned   - TabStop.Alignment
Leader Planned   - TabStop.Leader

Drop Caps

Drop Caps are partially supported and preserved during document conversion. A drop cap is a text frame which is imported as a separate paragraph (from the rest of the paragraph as seen in the source document).

You can modify drop cap properties and position, however the new settings are not applied to the drop cap. You cannot yet create new drop caps (although you can easily simulate them through the use of a textbox).

This will be improved in a future version of Aspose.Words.

Drop cap is a frame. During import the appearance of a drop cap is round-tripped correctly, however it is not imported as proper drop cap therefore options cannot be modified.

See the following links in the documentation for further information:

  • ParagraphFormat.DropCapPositon
  • ParagraphFormat.LinesToDrop
Feature Supported Comment See Also
Drop Caps Yes    

Borders

Borders are imported from border-style, border-width etc on style or from indivudal borders using border-xxx-style and border-xxx-width etc style attributes.

A div with embedded or linked CSS containing a border style has all of the paragraphs and spans inside the div imported with full borders. This will be improved in a future version.

Feature Supported Comment See Also
Border Sides Yes  

- ParagraphFormat.Borders

- LineStyle

Shadow Planned   - Border.Shadow
3D Frame Planned   - Border.LineStyle
Style Yes   - Border.LineStyle
Color Yes   - Border.Color
Width Yes   - Border.LineWidth
Distance from Text Yes Imported from “padding-xxx” settings. - Border.DistanceFromText

Shading

Fill color imported from “background-color” on style attribute.

Currently cell background is imported as paragraph shading. This will be improved in a future version of Aspose.Words.

See the following link in the documentation for further information:

  • ParagraphFormat.Shading
Feature Supported Comment See Also
Shading Yes    

Asian Typography

Asian Typography settings is fully supported during conversion. However there is currently no API to access or modify these settings.

Feature Supported Comment See Also
Use Asian Rules for Controlling First and Last Characters Planned    
Allow Latin Text to Wrap in the Middle of a Word Planned    
Allow Hanging Punctuation Planned    
Allow Punctuation at Start of a Line to Compress Planned    
Automatically Adjust Space between Asian and Latin Text Planned    
Automatically Adjust Space between Asian Text and Numbers Planned    
Text Vertical Alignment Planned