Aspose.Words for Python via .NET 22.5 Release Notes
Major Features
There are 126 improvements and fixes in this regular monthly release. The most notable are:
- Added support for loading EPUB documents.
- Added support for loading XML documents.
- Added support for the “Envelope No. 10” page size for printing.
- Implemented rendering of a border box around the MathML formulas and the strike lines.
- Improved font detection when rendering characters in MathML formulas.
- Improved text wrapping for RTL paragraphs with custom left indent.
Full List of Issues Covering all Changes in this Release (Reported by .NET Users)
Key | Summary | Category |
---|---|---|
WORDSNET-3822 | Table headers are not wrapped properly | New Feature |
WORDSNET-8319 | Table column widths are calculated incorrectly during rendering | New Feature |
WORDSNET-8487 | Paragraphs followed by Tightly wrapped Shapes render incorrectly in PDF | New Feature |
WORDSNET-8838 | Support loading EPUB file format | New Feature |
WORDSNET-8931 | Tab spacing is not respected in fixed page formats | New Feature |
WORDSNET-9253 | Shaping issues with Telugu, Tamil, and Chinese characters | New Feature |
WORDSNET-10869 | Add feature to format page number | New Feature |
WORDSNET-12720 | Table contents do not render correctly in output PDF | New Feature |
WORDSNET-14941 | FILLIN fields are lost in output PDF and print | New Feature |
WORDSNET-22284 | Text position is changed after DOC to PDF conversion | New Feature |
WORDSNET-22697 | Add support for loading of XML documents | New Feature |
WORDSNET-22887 | Add loading progress notification | New Feature |
WORDSNET-23577 | Add .NET 6.0 assemblies to the release build | New Feature |
WORDSNET-7128 | Text wrapping in Cell is not correct in PDF | Enhancement |
WORDSNET-8325 | WordML to PDF conversion issue with table rendering | Enhancement |
WORDSNET-9075 | Table column widths are calculated incorrectly during rendering | Enhancement |
WORDSNET-12186 | Picture and Textbox cause Aspose.Words to render content on one additional page | Enhancement |
WORDSNET-13405 | Table width in percent is not honored when converted from DOCX to XPS | Enhancement |
WORDSNET-5460 | Table inside header of RTF was not rendered in PDF | Bug |
WORDSNET-5619 | Table widths are disturbed upon rendering to PDF | Bug |
WORDSNET-8037 | WordML to PDF conversion issue with text rendering | Bug |
WORDSNET-8327 | WordML to Pdf conversion issue with shape rendering | Bug |
WORDSNET-9172 | DOCX to PDF conversion issue with table formatting | Bug |
WORDSNET-9788 | DOC to PDF conversion issue with text (date) alignment | Bug |
WORDSNET-10017 | DrawingML TextBoxes are pushed to the left beyond the left boundary in fixed page formats | Bug |
WORDSNET-10410 | Table indentation is not preserved during rendering | Bug |
WORDSNET-10700 | RTF to PDF conversion issue with table rendering | Bug |
WORDSNET-10947 | Incorrect tab positioning causes incorrect text wrapping | Bug |
WORDSNET-11123 | Table widths are not calculated correctly during rendering to PDF | Bug |
WORDSNET-11500 | Incorrect position of wrapped text on conversion to PDF | Bug |
WORDSNET-11641 | Widths of Tables and cells are not preserved during rendering to PDF | Bug |
WORDSNET-11806 | DOC to PDF conversion issue with table layout | Bug |
WORDSNET-12099 | Table layouts are not correct in PDF | Bug |
WORDSNET-12381 | Table Cells widths are incorrect in rendered PDF | Bug |
WORDSNET-12750 | Table Cells widths are incorrect in rendered PDF | Bug |
WORDSNET-12979 | RenderedDocument and lines issue within table cells | Bug |
WORDSNET-13196 | Thai font is displayed in the wrong way in PDF | Bug |
WORDSNET-14989 | Thai characters are not preserved when rendered to PDF | Bug |
WORDSNET-16037 | Field.isDirty value always false | Bug |
WORDSNET-16742 | Arabic text is not rendered correctly in output PDF | Bug |
WORDSNET-18524 | Conversion RTF to PDF inconsistent table width | Bug |
WORDSNET-19215 | OfficeMath enclosing formula is crushed when outputting PDF | Bug |
WORDSNET-19798 | Cells in Table gets misplaced during open/save a DOC | Bug |
WORDSNET-22023 | Text alignments in narrow cells of PDF differs from Word after conversion | Bug |
WORDSNET-22605 | Split string in LINQ Reporting not working as expected | Bug |
WORDSNET-22669 | Table Content Pushed Down from its Original Position in PDF | Bug |
WORDSNET-22725 | Table Cut off Issue when converting Html to Word | Bug |
WORDSNET-22726 | Exception is thrown while converting from DOCX to HTML | Bug |
WORDSNET-22733 | Extra vertical spacing added between Rows of a Table with Merged Cells | Bug |
WORDSNET-22736 | Image position is changed after MHTML to PDF Conversion | Bug |
WORDSNET-22843 | Incorrect rendering of Column3D in PDF | Bug |
WORDSNET-22987 | Import differs from what is in browser | Bug |
WORDSNET-23025 | ArgumentException: Incorrect hex length | Bug |
WORDSNET-23225 | Aspose.Words hangs on document rendering | Bug |
WORDSNET-23279 | Horizontal axis labels are wrapped improperly | Bug |
WORDSNET-23330 | Image is not visible after import from AZW3 | Bug |
WORDSNET-23332 | Aspose.Words hangs when loading a MOBI document | Bug |
WORDSNET-23370 | UpdatePageLayout throws exception | Bug |
WORDSNET-23371 | Structured Document Tag gets removed | Bug |
WORDSNET-23394 | Document.UpdatePageLayout() throws System.InvalidOperationException : Infinite loop detected | Bug |
WORDSNET-23396 | Text wrapping does not match Word | Bug |
WORDSNET-23485 | Tab is lost upon converting document to HTML | Bug |
WORDSNET-23500 | Content is shifted upon rendering document | Bug |
WORDSNET-23504 | Text is wrapped improperly upon rendering | Bug |
WORDSNET-23505 | Aspose.Words improperly selects paper source upon printing. | Bug |
WORDSNET-23511 | RemoveEmptyParagraphs cleanup option does not work in case of nested IF fields | Bug |
WORDSNET-23527 | Graphics is lost on PDF import | Bug |
WORDSNET-23531 | Math equations alignment issue | Bug |
WORDSNET-23535 | Consider disabling LoadOptions.ResourceLoadingCallback invocations for data URLs | Bug |
WORDSNET-23536 | FileCorruptedException is thrown upon loading HTML document | Bug |
WORDSNET-23540 | DOCX to PDF: Text overlapping the document layout | Bug |
WORDSNET-23545 | Problem when editing PDF form field in Chrome | Bug |
WORDSNET-23563 | Content is lost upon loading PDF document | Bug |
WORDSNET-23565 | Numbers are rendered as tofu when use NumeralFormat.ArabicIndic | Bug |
WORDSNET-23578 | Inaccurate vertical alignment in equations when saving to PDF | Bug |
WORDSNET-23588 | ArgumentException is thrown upon loading MHTML document | Bug |
WORDSNET-23596 | Text alignment in table is incorrect | Bug |
WORDSNET-23604 | List numbering is wrong for lists from HTML altChunk’s | Bug |
WORDSNET-23607 | “Unsupported file format: Unknown” on loading TXT file | Bug |
WORDSNET-23642 | DOCX to PDF conversion causes layout issues in output PDF file | Bug |
WORDSNET-23643 | Chart series are lost after DOCX to PDF conversion | Bug |
WORDSNET-23644 | Bar charts' height decreases after DOCX to PDF conversion | Bug |
WORDSNET-23660 | AW does not imitate MS Word handling of an unsupported xml element | Bug |
WORDSNET-23661 | ReportingEngine.BuildReport throws an exception on .NET 6 when reflection optimization is on | Bug |
WORDSNET-23665 | Text in category labels is not wrapped | Bug |
WORDSNET-23667 | Font name and size does not match MS Word on WML to DOCX conversion | Bug |
WORDSNET-23668 | Extra paragraph in header on WML to DOCX conversion | Bug |
WORDSNET-23672 | Incorrect shape positions on WML to DOCX conversion | Bug |
WORDSNET-23677 | Do not invoke ResourceLoadingCallback for empty URLs | Bug |
WORDSNET-23685 | Document.ExtractPages() causes line numbers restarting | Bug |
WORDSNET-23693 | InvalidOperationException: Sequence contains more than one matching element | Bug |
WORDSNET-23696 | TestSaveOdt performance test fails on net5 and net6 CLR | Bug |
WORDSNET-23698 | DOC to PDF: Text with Shadow effect not correctly converted | Bug |
WORDSNET-23699 | RTL paragraph is positioned incorrectly inside an inline table with different left and right spacings | Bug |
WORDSNET-23703 | Font is changed after appending document with KeepSourceFormatting | Bug |
WORDSNET-23707 | DOC Compare System.InvalidOperationException: Custom XML part is not found. | Bug |
WORDSNET-23715 | FileCorruptedException is thrown upon loading DOCX document | Bug |
WORDSNET-23717 | SVG letter-spacing style gets ignored when converting DOCX to PDF | Bug |
WORDSNET-23718 | Document.ExtractPages changes list numbering | Bug |
WORDSNET-23725 | Wrong paragraph format when adding an image after Pdf2Word conversion | Bug |
WORDSNET-23730 | Fix StringComparison warnings | Bug |
WORDSNET-23732 | Fix StringComparison warnings | Bug |
WORDSNET-23733 | Fix StringComparison warnings | Bug |
WORDSNET-23735 | Wrong list numbering due to loss and non-use of DurableId attribute values | Bug |
WORDSNET-23743 | Part of content is moved into table upon reading RTF | Bug |
WORDSNET-23745 | Fix StringComparison warnings in fields/mailmerge domain | Bug |
WORDSNET-23757 | Comments anchor is misplaced after the saving | Bug |
WORDSNET-23760 | PDF can’t be loaded because of “Sequence contains more than one matching element” error | Bug |
WORDSNET-23791 | Fix customer issues using SonarQube analysis | Bug |
Full List of Issues Covering all Changes in this Release (Reported by Java Users)
Key | Summary | Category |
---|---|---|
WORDSNET-15581 | RTF to PDF conversion issue with table’s cell width | New Feature |
WORDSNET-19386 | Text-shift observed during Word to PDF conversion | New Feature |
WORDSNET-17061 | Wrong Font for certain Arabic Characters used in PDF | Bug |
WORDSNET-19196 | Text position is changed in output PDF | Bug |
WORDSNET-20866 | DOC to HTML conversion throws System.NullReferenceException | Bug |
WORDSNET-21486 | Imported SVG-based 3D Pie Chart Renders Incorrectly in Word | Bug |
WORDSNET-22835 | Unexpected Column Widths after HTML with Merged Cells is Converted to DOCX | Bug |
WORDSNET-23277 | Axis labels are wrapped improperly | Bug |
WORDSNET-23569 | FileCorruptedException is thrown upon loading HTML document | Bug |
WORDSNET-23571 | Uppercase text is rendered as regular text | Bug |
WORDSNET-23592 | UpdateFields() fails with NPE | Bug |
WORDSNET-23658 | System.InvalidOperationException: Stack empty. is thrown on Range.Replace | Bug |
WORDSNET-23673 | FileCorruptedException is thrown upon loading DOCX document | Bug |
WORDSNET-23678 | Aspose.Words hangs upon rendering document | Bug |
WORDSNET-23695 | System.InvalidOperationException: Infinite loop detected. exception thrown | Bug |
WORDSNET-23716 | Images are lost after loading word 2003 XML document | Bug |
WORDSNET-23766 | Indent of list item is incorrect after comparing documents | Bug |
Public API and Backward Incompatible Changes
This section lists public API changes that were introduced in Aspose.Words 22.5. It includes not only new and obsoleted public methods, but also a description of any changes in the behavior behind the scenes in Aspose.Words which may affect existing code. Any behavior introduced that could be seen as regression and modifies the existing behavior is especially important and is documented here.
Added support for loading EPUB documents
Related issue: WORDSNET-8838
Aspose.Words can now load EPUB 2.0 documents.
EPUB is an e-book file format that uses the “.epub” file extension. An EPUB document is a collection of XHTML documents. Currently, Aspose.Words always loads all XHTML files from an EPUB document in the order in which they appear in the content file (OPF).
The following publicly visible enum values were added:
The FileFormatUtil class can now be used to determine whether a file is an EPUB document. For example, the following call
info = aw.FileFormatUtil.detect_file_format("book.epub")
will return an info instance with the FileFormatInfo.load_format property set to LoadFormat.EPUB.
An EPUB document can be loaded as follows:
doc = aw.Document("book.epub")
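To make "the order in which they appear in the content file (OPF)" concrete, the following standard-library sketch lists the XHTML entries of an EPUB package in OPF order. It does not use Aspose.Words and is not the library's implementation. Reading the phrase as manifest order is an assumption, and the helper names and the sample archive built in memory are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def xhtml_items_in_opf_order(epub_bytes: bytes) -> list:
    """Return hrefs of XHTML manifest items in the order the OPF lists them."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        # META-INF/container.xml names the OPF package file.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        package = ET.fromstring(archive.read(rootfile.get("full-path")))
    items = package.findall("opf:manifest/opf:item", OPF_NS)
    return [
        item.get("href")
        for item in items
        if item.get("media-type") == "application/xhtml+xml"
    ]


def build_sample_epub() -> bytes:
    """Build a tiny in-memory archive (hypothetical content) for the demo."""
    container_xml = (
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        "</container>"
    )
    content_opf = (
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0">'
        "<manifest>"
        '<item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>'
        '<item id="css" href="style.css" media-type="text/css"/>'
        '<item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>'
        "</manifest>"
        '<spine><itemref idref="ch1"/><itemref idref="ch2"/></spine>'
        "</package>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("META-INF/container.xml", container_xml)
        archive.writestr("OEBPS/content.opf", content_opf)
    return buffer.getvalue()


chapter_order = xhtml_items_in_opf_order(build_sample_epub())
```

For a real file, the bytes could be read with `open("book.epub", "rb").read()` and passed to the same helper to preview the reading order before loading the document.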
Added support for loading XML documents
Related issue: WORDSNET-22697
Aspose.Words can now load XML documents. The Extensible Markup Language (XML) is a simple text-based format for representing structured information: documents, data, configuration, books, transactions, invoices, and much more. Aspose.Words mimics MS Word behavior when importing XML documents.
The following publicly visible enum value was added:
The FileFormatUtil class can now be used to determine whether a file is an XML document. For example, the following call
info = aw.FileFormatUtil.detect_file_format("sample.xml")
will return an info instance with the FileFormatInfo.load_format property set to LoadFormat.XML.
An XML document can be loaded as follows:
doc = aw.Document("sample.xml")
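These notes do not say how the loader reacts to malformed input, so an optional pre-flight check on the caller's side may be useful. The sketch below uses only the Python standard library; `is_well_formed_xml` is a hypothetical helper name, not part of Aspose.Words.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(data: bytes) -> bool:
    """Return True if the bytes parse as well-formed XML, False otherwise."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


good = is_well_formed_xml(b"<invoice><total>10</total></invoice>")
bad = is_well_formed_xml(b"<invoice><total>10</invoice>")
```

If the check passes, the file path can then be handed to `aw.Document` as shown in the loading example.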
Introduced ChapterPageSeparator enum and added PageSetup.chapter_page_separator and PageSetup.heading_level_for_chapter properties
Related issue: WORDSNET-10869
The ChapterPageSeparator enum is introduced:
class ChapterPageSeparator(enum.IntEnum):
    """Defines the separator character that appears between the chapter and page number."""
    # A hyphen.
    HYPHEN = 0
    # A period.
    PERIOD = 1
    # A colon.
    COLON = 2
    # An emphasized dash.
    EM_DASH = 3
    # A standard dash.
    EN_DASH = 4
The following public properties are added to PageSetup class:
class PageSetup:
    ...
    @property
    def heading_level_for_chapter(self) -> int:
        """Gets or sets the heading level style that is applied to the chapter titles in the document.
        Can be a number from 0 through 9. 0 means no chapter number if applied to page number.
        Before you can create page numbers that include chapter numbers, the document headings must have a numbered outline format applied."""
        ...
    @property
    def chapter_page_separator(self) -> ChapterPageSeparator:
        """Gets or sets the separator character that appears between the chapter number and the page number.
        Before you can create page numbers that include chapter numbers, the document headings must have a numbered outline format applied."""
        ...
Use Case:
doc = aw.Document(file_name)
page_setup = doc.first_section.page_setup
page_setup.page_number_style = aw.NumberStyle.UPPERCASE_ROMAN
page_setup.chapter_page_separator = aw.ChapterPageSeparator.COLON
page_setup.heading_level_for_chapter = 1
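As a rough illustration of what these settings describe, the sketch below builds the kind of label a reader would expect from a chapter number, a separator, and an uppercase Roman page number. It is plain Python, not the library's rendering code. The enum is a local mirror of the values listed above, and the mapping of each value to a character and the helper names are assumptions made for this example.

```python
import enum


class ChapterPageSeparator(enum.IntEnum):
    """Local mirror of the enum above so the sketch runs without the library."""
    HYPHEN = 0
    PERIOD = 1
    COLON = 2
    EM_DASH = 3
    EN_DASH = 4


# Assumed character for each separator value.
SEPARATOR_CHARS = {
    ChapterPageSeparator.HYPHEN: "-",
    ChapterPageSeparator.PERIOD: ".",
    ChapterPageSeparator.COLON: ":",
    ChapterPageSeparator.EM_DASH: "\u2014",
    ChapterPageSeparator.EN_DASH: "\u2013",
}

ROMAN_NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
    (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_upper_roman(number: int) -> str:
    """Convert a positive integer to an uppercase Roman numeral."""
    parts = []
    for value, symbol in ROMAN_NUMERALS:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def chapter_page_label(chapter: int, page: int, separator: ChapterPageSeparator) -> str:
    """Join chapter number, separator character, and Roman page number."""
    return f"{chapter}{SEPARATOR_CHARS[separator]}{to_upper_roman(page)}"


label = chapter_page_label(1, 4, ChapterPageSeparator.COLON)
```

With the settings from the use case (colon separator, uppercase Roman numerals, heading level 1), page 4 of chapter 1 would be labelled in this style.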
Slight changes in markup nodes typed collection
Related issue: WORDSNET-23774
The default indexer of the markup nodes collection has changed. It now takes the index number of a structured document tag in the collection instead of the tag's identifier.
class StructuredDocumentTagCollection:
    ...
    def __getitem__(self, index: int) -> IStructuredDocumentTag:
        """Returns the structured document tag at the specified index.
        :param index: An index into the collection."""
        ...
In addition, a structured document tag can now be removed either at a specified index or by its identifier.
class StructuredDocumentTagCollection:
    ...
    def remove(self, id: int):
        """Removes the structured document tag with the specified identifier.
        :param id: The structured document tag identifier."""
        ...
    def remove_at(self, index: int):
        """Removes a structured document tag at the specified index.
        :param index: An index into the collection."""
        ...
The lookup by ID that the indexer previously performed is now available through the get_by_id() method.
class StructuredDocumentTagCollection:
    ...
    def get_by_id(self, id: int) -> IStructuredDocumentTag:
        """Returns the structured document tag by identifier.
        Returns None if the structured document tag with the specified identifier cannot be found.
        :param id: The structured document tag identifier."""
Use Case:
structured_document_tags = doc.range.structured_document_tags
# We iterate through all collection elements, getting each element by its index number.
for i in range(structured_document_tags.count):
    sdt = structured_document_tags[i]
    print(sdt.title)
# Get the structured document tag by its Id.
sdt = structured_document_tags.get_by_id(1160505028)
if sdt is not None:
    print(sdt.title)
# Remove the structured document tag by its Id.
structured_document_tags.remove(1160505028)
# Remove the structured document tag at position 0.
structured_document_tags.remove_at(0)
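When removing several tags by index in one pass, it is usually safer to walk the collection from the end. The sketch below shows this pattern with a small stand-in collection so that it runs without the library. `FakeTag` and `FakeTagCollection` are hypothetical and expose only `count`, indexing, `remove_at`, and `title`. That the real collection shifts later items down after `remove_at`, as a list does, is an assumption.

```python
class FakeTag:
    """Hypothetical stand-in for a structured document tag (only `title`)."""

    def __init__(self, title):
        self.title = title


class FakeTagCollection:
    """Hypothetical stand-in exposing count, indexing, and remove_at."""

    def __init__(self, tags):
        self._tags = list(tags)

    @property
    def count(self):
        return len(self._tags)

    def __getitem__(self, index):
        return self._tags[index]

    def remove_at(self, index):
        del self._tags[index]


def remove_tags_titled(tags, title):
    """Remove every tag with the given title and return how many were removed."""
    removed = 0
    # Walk from the last index to the first so that removing an item
    # does not shift the positions of items still to be visited.
    for i in range(tags.count - 1, -1, -1):
        if tags[i].title == title:
            tags.remove_at(i)
            removed += 1
    return removed


sample = FakeTagCollection([FakeTag("Draft"), FakeTag("Name"), FakeTag("Draft")])
removed_count = remove_tags_titled(sample, "Draft")
remaining_titles = [sample[i].title for i in range(sample.count)]
```

The same helper should work on `doc.range.structured_document_tags` if the real collection behaves like the stand-in, since it relies only on members shown in these notes.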
Added “NUMBER_10_ENVELOPE” value to PaperSize enum
Related issue: WORDSNET-23505
Added support for the “Envelope No. 10” page size (4.125 x 9.5 inches) for printing.
Use Case:
# This value is used to set the page size as follows:
doc = aw.Document(file_name)
doc.first_section.page_setup.paper_size = aw.PaperSize.NUMBER_10_ENVELOPE
# Or in a similar way using DocumentBuilder:
builder = aw.DocumentBuilder(doc)
builder.page_setup.paper_size = aw.PaperSize.NUMBER_10_ENVELOPE
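For reference, the envelope dimensions can be converted from inches to points (1/72 inch), which is assumed here to be the unit in which page width and height are expressed. This is plain arithmetic and does not call the library. The constant and function names are hypothetical.

```python
POINTS_PER_INCH = 72.0


def inches_to_points(inches: float) -> float:
    """Convert a length in inches to typographic points (1/72 inch)."""
    return inches * POINTS_PER_INCH


# "Envelope No. 10" size from the notes above: 4.125 x 9.5 inches.
ENVELOPE_10_INCHES = (4.125, 9.5)
envelope_10_points = tuple(inches_to_points(side) for side in ENVELOPE_10_INCHES)
```

This gives 297 x 684 points, which can be compared against the page dimensions reported after the paper size is set.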
HtmlSaveOptions.export_text_box_as_svg was marked as obsolete
Related issue: WORDSNET-23514
The HtmlSaveOptions.export_text_box_as_svg property is now obsolete.
Use HtmlSaveOptions.export_shapes_as_svg instead, which affects text boxes as well.