Aspose.Words for Python via .NET 22.7 Release Notes

Major Features

There are 85 improvements and fixes in this regular monthly release. The most notable are:

  • Implemented the ability to convert PDF documents to fixed-page formats with high fidelity and performance.
  • Implemented support for WCAG 2.0 PDF.
  • Implemented our own glyph outline parsing for OpenType (CFF) fonts.
  • Introduced a new HTML import mode for block-level elements.
  • Added the ability to set shadow formatting on shape objects.

Full List of Issues Covering all Changes in this Release (Reported by .NET Users)

Key Summary Category
WORDSNET-13702 Support parsing of glyph data for OpenType(CFF) New Feature
WORDSNET-15752 Support DATABASE field New Feature
WORDSNET-19220 Add feature to support WCAG 2.0 PDF New Feature
WORDSNET-23295 Add a flag to take EXIF orientation in account while inserting a JPEG image by LINQ Reporting Engine New Feature
WORDSNET-23654 Add a new mode for import HTML block-level elements during inserting HTML via DocumentBuilder.InsertHtml method New Feature
WORDSNET-18125 Make sure saving to tagged PDF follows Section 508 Guidelines Enhancement
WORDSNET-6892 TextBox is not preserved on HTML import Bug
WORDSNET-14009 Text Font and Gradient fill not saved in PDF output Bug
WORDSNET-20981 Word document converted to PDF results in different font for last page Bug
WORDSNET-21368 Unexpected Bold Formatting to custom style during Word to HTML to Word conversion Bug
WORDSNET-22323 DOCX to PDF conversion issue with formula/equation rendering Bug
WORDSNET-22948 Import of SVG image differs from what is in browser Bug
WORDSNET-23313 Invalidate document layout after calling Document.Compare with two PDF documents Bug
WORDSNET-23544 Document missing sections after saving Bug
WORDSNET-23646 Date X-Axis shows values with incorrect step Bug
WORDSNET-23684 Incorrect calculation of indents for border box around the formula Bug
WORDSNET-23701 Font size is not exported to HTML Bug
WORDSNET-23706 Numbering is broken after converting document to HTML Bug
WORDSNET-23709 Shape stroke is not rendered to JPEG Bug
WORDSNET-23783 Consider disabling support for external resources when loading EPUB documents Bug
WORDSNET-23810 Incorrect background image after Pdf2Word conversion Bug
WORDSNET-23817 Header height is changed that leads to layout issues Bug
WORDSNET-23828 Content is removed after saving the document Bug
WORDSNET-23829 DOCX to PDF: Characters rendered as boxes Bug
WORDSNET-23841 Text orientation is turned to vertical after converting to HTML Bug
WORDSNET-23851 Data label values are rendered improperly Bug
WORDSNET-23855 CryptographicException: The input data is not a complete block Bug
WORDSNET-23865 KeepSourceFormatting does not honor source document style Bug
WORDSNET-23866 Field updating hangs if document is optimized for Word2016 Bug
WORDSNET-23867 Wrong outlines are returned for the space character Bug
WORDSNET-23869 Incorrect font detection when rendering a formula Bug
WORDSNET-23874 Thickness of hairline is different when render with .NET and .NET Standard versions Bug
WORDSNET-23875 Header row is not repeated upon rendering for a floating table Bug
WORDSNET-23878 Text is wrapped improperly Bug
WORDSNET-23886 Style applied to text is changed after open/save DOCX document Bug
WORDSNET-23888 Aspose.Words hangs for a while upon loading MHTML file Bug
WORDSNET-23889 Wrong list numbering in SDT bound to custom XML part Bug
WORDSNET-23890 Evaluation watermark in ODT document overlaps content of the document Bug
WORDSNET-23902 Redundant space between letter is added upon rendering SVG image Bug
WORDSNET-23913 FileNotFoundException is thrown upon loading DOCX document Bug
WORDSNET-23918 ArgumentException because of duplicates in CustomDocumentProperties Bug
WORDSNET-23919 Aspose.Words hangs upon updating fields or layout Bug
WORDSNET-23922 Incorrect font detection for East Asian characters when rendering a formula Bug
WORDSNET-23924 InvalidCastException is thrown upon updating fields Bug
WORDSNET-23925 Word document not saving PNG Bug
WORDSNET-23929 Text is wrapped differently after rendering Bug
WORDSNET-23936 Reverse order of replies on the comment in the air Bug
WORDSNET-23941 ZlibException: Bad state (invalid distance code) Bug
WORDSNET-23942 Images are rendered in PDF as red cross Bug
WORDSNET-23947 System.OverflowException: Value was either too large or too small for an Int32 Bug
WORDSNET-23948 InvalidOperationException: MediaBox is null Bug
WORDSNET-23950 Reply naming differences within export to PDF Bug
WORDSNET-23951 Formatting issue on the latest Pdf2Word release Bug
WORDSNET-23952 Chart axis are not visible when render as SVG Bug
WORDSNET-23954 List labels in Swedish are rendered in English Bug
WORDSNET-23955 Spacing between numbers and Chinese hieroglyphs is too big in chart axis labels Bug
WORDSNET-23958 Exception when comparing documents Bug
WORDSNET-23963 List label is added to the paragraph on the next page when ExtractPages is used Bug
WORDSNET-23965 InvalidOperationException is thrown upon rendering document Bug
WORDSNET-23974 Style separator produces line break after rendering Bug
WORDSNET-23976 Korean text is not wrapped properly when WordWrap option is disabled Bug
WORDSNET-23981 DOCX to MD conversion exception Bug
WORDSNET-24010 ImportStyle() returns null for KeepDifferentStyles Bug
WORDSNET-24034 InvalidOperationException is thrown upon comparing document Bug

Full List of Issues Covering all Changes in this Release (Reported by Java Users)

Key Summary Category
WORDSNET-21279 Arabic text rendered LTR (garbled) when converting from document to PDF Bug
WORDSNET-21764 Math equations are blurred during exporting Word to HTML on Linux Bug
WORDSNET-22648 Incorrect Rendering of Math Equations in PDF Bug
WORDSNET-22896 Font Fallback does not work properly for text within SVG images Bug
WORDSNET-23598 Part of content is moved to previous page Bug
WORDSNET-23599 Whitespaces font is reset to Arial upon importing HTML Bug
WORDSNET-23623 API fails to load EML files as MHTML Bug
WORDSNET-23781 UpdatePageLayout hangs Bug
WORDSNET-23862 Chinese text in SVG is rendered as tofu when convert to PDF Bug
WORDSNET-23877 Provide API to remove the shape shadows Bug
WORDSNET-23893 InvalidOperationException is thrown upon executing mail merge Bug
WORDSNET-23909 Numbering is changed after inserting document Bug
WORDSNET-23910 Font is changed after inserting document when KeepDifferentStyles is used Bug
WORDSNET-23927 NullReferenceException is thrown upon rendering document Bug
WORDSNET-23937 Layout is different after DOCX to PDF conversion Bug
WORDSNET-23938 FileCorruptedException is thrown upon loading DOCX document Bug
WORDSNET-23968 Hanging during export to PDF Bug
WORDSNET-23970 Header and footer are lost after rendering Bug
WORDSNET-23979 Word to PDF - conversion issue with floating table header rows Bug
WORDSNET-23980 IF field with wildcard is updated improperly Bug
WORDSNET-24007 FileCorruptedException on loading RTF file Bug

Public API and Backward Incompatible Changes

This section lists public API changes that were introduced in Aspose.Words 22.7. It includes not only new and obsoleted public methods, but also a description of any changes in the behavior behind the scenes in Aspose.Words which may affect existing code. Any behavior introduced that could be seen as regression and modifies the existing behavior is especially important and is documented here.

Added a new mode for importing HTML block-level elements when inserting HTML via the DocumentBuilder.insert_html() method

Related issue: WORDSNET-23654

A new HTML insertion option has been added to the HtmlInsertOptions enum.

class HtmlInsertOptions(IntEnum):
    ...
    
    # Preserve properties of block-level elements.
    #
    # By default, properties of parent blocks are merged and stored on their child elements (i.e. paragraphs or tables).
    # If this option is specified, properties of each block are stored separately in a special logical structure.
    # As a result, this option better preserves individual borders and margins seen in the HTML document
    # and get better conversion results. The downside is that the resulting document gets harder to modify, since borders
    # and margins stored in the logical structure are not available for editing.
    #
    # Only margins and borders of 'body', 'div', and 'blockquote' HTML elements are preserved. Properties of each HTML
    # element are stored separately.
    #
    # If this option is specified, Aspose.Words mimics MS Word's behavior regarding import of block properties.
    PRESERVE_BLOCKS = 4

When HTML is inserted via the DocumentBuilder.insert_html() method, the new import mode for block-level elements better preserves the borders and margins seen in the HTML document and produces better conversion results.

import aspose.words as aw

html = """
<html>
    <div style='border:dotted'>
        <div style='border:solid'>
            <p>paragraph 1</p>
            <p>paragraph 2</p>
        </div>
    </div>
</html>
"""

# Set the new mode of import HTML block-level elements.
insert_options = aw.HtmlInsertOptions.PRESERVE_BLOCKS
builder = aw.DocumentBuilder()
builder.insert_html(html, insert_options)
builder.document.save(my_dir + "sample.docx")

Added new public property shadow_format

Related issue: WORDSNET-23877

A new public shadow_format property has been added to the ShapeBase class.

class ShapeBase:
    ...

    @property
    def shadow_format(self) -> aw.drawing.ShadowFormat:
        """Gets shadow formatting for the shape."""
        ...

With this property customers can set or get one of the preset shadow types.

class ShadowFormat:
    ...

    @property
    def type(self) -> aw.drawing.ShadowType:
        """Gets the specified ShadowType for ShadowFormat."""
        ...

    @type.setter
    def type(self, value: aw.drawing.ShadowType):
        """Sets the specified ShadowType for ShadowFormat."""
        ...

Users can also get information about a shadow’s visibility.

class ShadowFormat:
    ...

    @property
    def visible(self) -> bool:
        """Returns True if the formatting applied to this instance is visible.
        
        Unlike clear(), assigning False to visible does not clear the formatting,
        it only hides the shape effect."""
        ...

It is also possible to clear the ShadowFormat.

class ShadowFormat:
    ...

    def clear(self):
        """Clears shadow format."""
        ...

Use Case:

doc = aw.Document("DocumentWithShape.docx")
shape = doc.first_section.body.get_child(aw.NodeType.SHAPE, 0, True).as_shape()
# Checking whether the shadow effect is visible and whether the preset type is SHADOW2.
if shape.shadow_format.visible and shape.shadow_format.type == aw.drawing.ShadowType.SHADOW2:
    # Setting the preset shadow type to SHADOW7.
    shape.shadow_format.type = aw.drawing.ShadowType.SHADOW7
# Checking whether the shadow is customized, i.e. the preset type is SHADOW_MIXED.
if shape.shadow_format.type == aw.drawing.ShadowType.SHADOW_MIXED:
    # Clearing ShadowFormat.
    shape.shadow_format.clear()

Added ReportBuildOptions.RESPECT_JPEG_EXIF_ORIENTATION enum member

Related issue: WORDSNET-23295

The following member has been added to the ReportBuildOptions enum:

class ReportBuildOptions(IntEnum):
    ...

    # Specifies that the engine should use EXIF image orientation values to appropriately rotate inserted
    # JPEG images.
    RESPECT_JPEG_EXIF_ORIENTATION = 16

The option can be applied while building a report in the following way:

engine = aw.reporting.ReportingEngine()
engine.options |= aw.reporting.ReportBuildOptions.RESPECT_JPEG_EXIF_ORIENTATION
engine.build_report(...)
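
Because the option is a bit flag, `|=` adds it without discarding options that are already set. The sketch below illustrates this with a stand-in enum built on Python's enum.IntFlag. Only the name and value of RESPECT_JPEG_EXIF_ORIENTATION come from these notes; the class name ReportBuildOptionsSketch and the member OTHER_OPTION are hypothetical placeholders, and the code does not use the Aspose.Words library.

```python
from enum import IntFlag


class ReportBuildOptionsSketch(IntFlag):
    """Stand-in for illustration only; this is not the Aspose.Words enum."""

    NONE = 0
    OTHER_OPTION = 1  # hypothetical placeholder for an option already in use
    RESPECT_JPEG_EXIF_ORIENTATION = 16  # name and value taken from the notes


# Start from some existing configuration.
options = ReportBuildOptionsSketch.OTHER_OPTION

# `|=` sets the new bit and keeps the bits that were already set.
options |= ReportBuildOptionsSketch.RESPECT_JPEG_EXIF_ORIENTATION

# A bitwise AND tests whether a particular flag is present.
exif_enabled = bool(options & ReportBuildOptionsSketch.RESPECT_JPEG_EXIF_ORIENTATION)
other_kept = bool(options & ReportBuildOptionsSketch.OTHER_OPTION)
```

Plain assignment (`=`) instead of `|=` would replace the whole option set, which is why the example above uses the in-place OR.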

Added new class for saving PDFs to other fixed formats

Related feature task: WORDSNET-23059

We’ve added a new way to work with PDF input files. They can now be converted into a fixed format without using the Aspose.Words layout model.

That is, the feature runs without the Document class and returns the result as a stream object.

Example:

pdf_renderer = aw.pdf2word.fixedformats.PdfFixedRenderer()
options = aw.pdf2word.fixedformats.PdfFixedOptions()
options.page_index = 0
options.page_count = 2
result_stream = pdf_renderer.save_pdf_as_html(pdf_stream, options)

Pros:

  • More accurate conversion (positions of text and other elements).
  • Better performance and memory usage (less logic to run, no need to build flow models, etc).

Cons:

  • The list of output formats is limited for now (PDF, HTML, XPS, JPEG, PNG, TIFF, BMP).
  • There is no way to edit the data during the conversion.
  • Only a small number of options are available, such as password, page range, and JPEG image quality.

Supported methods:

save_pdf_as_html(...)
save_pdf_as_xps(...)
save_pdf_as_images(...)
save_pdf_as_pdf(...)
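
One way to relate the output formats listed earlier to these four methods is a small lookup table. The sketch below is an illustration, not part of the library: the names METHOD_BY_FORMAT and method_name_for are hypothetical, and routing every raster format to save_pdf_as_images is an assumption based on the method name rather than a documented mapping.

```python
# Output formats named in the notes, keyed in lower case, mapped to the method
# names listed above. Sending every raster format to save_pdf_as_images is an
# assumption based on the method name, not a documented mapping.
METHOD_BY_FORMAT = {
    "pdf": "save_pdf_as_pdf",
    "html": "save_pdf_as_html",
    "xps": "save_pdf_as_xps",
    "jpeg": "save_pdf_as_images",
    "png": "save_pdf_as_images",
    "tiff": "save_pdf_as_images",
    "bmp": "save_pdf_as_images",
}


def method_name_for(output_format: str) -> str:
    """Return the renderer method name for a format, or raise ValueError."""
    key = output_format.strip().lower()
    if key not in METHOD_BY_FORMAT:
        raise ValueError(f"Unsupported fixed output format: {output_format!r}")
    return METHOD_BY_FORMAT[key]
```

Given a renderer created as in the example above, getattr(pdf_renderer, method_name_for("Png")) would then look up the corresponding method by name.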

Available options:

  • page_index and page_count can be used to select a subset of pages.
  • password - allows an encrypted PDF to be decoded. The result is decrypted.
  • jpeg_quality - can be set before calling save_pdf_as_images to control the JPEG quality of the output images.
  • image_format - specifies the output image format for save_pdf_as_images.

All options are optional and can be omitted in favor of default values.
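
To make the page range options concrete, the following sketch computes which pages a given page_index and page_count would select. It is a plain-Python illustration under stated assumptions, not the library's logic: it assumes page_index is zero-based, that an unset page_count means "through the last page", and that a range running past the end is clamped. The function name selected_pages is hypothetical.

```python
from typing import Optional


def selected_pages(
    total_pages: int, page_index: int = 0, page_count: Optional[int] = None
) -> range:
    """Return the zero-based page indices selected by the two options.

    Assumptions (not confirmed by the release notes): page_index is
    zero-based, page_count=None means all remaining pages, and ranges
    that run past the end of the document are clamped.
    """
    if total_pages < 0 or page_index < 0:
        raise ValueError("total_pages and page_index must be non-negative")
    if page_count is not None and page_count < 0:
        raise ValueError("page_count must be non-negative")

    start = min(page_index, total_pages)
    if page_count is None:
        end = total_pages
    else:
        end = min(start + page_count, total_pages)
    return range(start, end)
```

With the values from the example above (page_index = 0, page_count = 2), list(selected_pages(10, 0, 2)) yields the first two pages of a ten-page document.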