Aspose.Words for Java 22.5 Release Notes

Major Features

There are 130 improvements and fixes in this regular monthly release. The most notable are:

  • Added support for loading EPUB documents.
  • Added support for loading XML documents.
  • Added support for the “Envelope No. 10” page size for printing.
  • Implemented rendering of border boxes and strike lines in MathML formulas.
  • Improved font detection when rendering characters in MathML formulas.
  • Improved text wrapping for RTL paragraphs with custom left indent.

Full List of Issues Covering all Changes in this Release (Reported by Java Users)

Key Summary Category
WORDSNET-19386 Text-shift observed during Word to PDF conversion New Feature
WORDSNET-15581 RTF to PDF conversion issue with table’s cell width New Feature
WORDSJAVA-2636 JsonDataSource can’t properly parse root element of input stream. Bug
WORDSJAVA-2641 Chart Axis formatting for near-zero value. Bug
WORDSJAVA-2725 Incorrect LeftIndent() values for Xml documents Bug
WORDSJAVA-2726 Small files with ambivalent encoding and file format detection. Bug
WORDSNET-17061 Wrong Font for certain Arabic Characters used in PDF Bug
WORDSNET-23673 FileCorruptedException is thrown upon loading DOCX document Bug
WORDSNET-23678 Aspose.Words hangs upon rendering document Bug
WORDSNET-19196 Text position is changed in output PDF Bug
WORDSNET-23658 System.InvalidOperationException: Stack empty. is thrown on Range.Replace Bug
WORDSNET-23695 System.InvalidOperationException: Infinite loop detected. exception thrown Bug
WORDSNET-23716 Images are lost after loading word 2003 XML document Bug
WORDSNET-22835 Unexpected Column Widths after HTML with Merged Cells is Converted to DOCX Bug
WORDSNET-23766 Indent of list item is incorrect after comparing documents Bug
WORDSNET-23277 Axis labels are wrapped improperly Bug
WORDSNET-23569 FileCorruptedException is thrown upon loading HTML document Bug
WORDSNET-23571 Uppercase text is rendered as regular text Bug
WORDSNET-23592 UpdateFields() fails with NPE Bug
WORDSNET-21486 Imported SVG-based 3D Pie Chart Renders Incorrectly in Word Bug
WORDSNET-20866 DOC to HTML conversion throws System.NullReferenceException Bug

Full List of Issues Covering all Changes in this Release (Reported by .NET Users)

Key Summary Category
WORDSNET-9253 Shaping issues with Telugu, Tamil, and Chinese characters New Feature
WORDSNET-8319 Table column widths are calculated incorrectly during rendering New Feature
WORDSNET-8838 Support loading EPUB file format New Feature
WORDSNET-3822 Table headers are not wrapped properly New Feature
WORDSNET-8931 Tab spacing is not respected in fixed page formats New Feature
WORDSNET-14941 FILLIN fields are lost in output PDF and print New Feature
WORDSNET-22284 Text position is changed after DOC to PDF conversion New Feature
WORDSNET-22697 Add support for loading of XML documents New Feature
WORDSNET-22887 Add loading progress notification New Feature
WORDSNET-23577 Add .NET 6.0 assemblies to the release build New Feature
WORDSNET-12720 Table contents do not render correctly in output PDF New Feature
WORDSNET-8487 Paragraphs followed by Tightly wrapped Shapes render incorrectly in PDF New Feature
WORDSNET-10869 Add feature to format page number New Feature
WORDSNET-9075 Table column widths are calculated incorrectly during rendering Enhancement
WORDSNET-7128 Text wrapping in Cell is not correct in PDF Enhancement
WORDSNET-8325 WordML to PDF conversion issue with table rendering Enhancement
WORDSNET-12186 Picture and Textbox cause Aspose.Words to render content on one additional page Enhancement
WORDSNET-13405 Table width in percent is not honored when converted from DOCX to XPS Enhancement
WORDSNET-12750 Table Cells widths are incorrect in rendered PDF Bug
WORDSNET-5460 Table inside header of RTF was not rendered in PDF Bug
WORDSNET-10700 RTF to PDF conversion issue with table rendering Bug
WORDSNET-22733 Extra vertical spacing added between Rows of a Table with Merged Cells Bug
WORDSNET-12381 Table Cells widths are incorrect in rendered PDF Bug
WORDSNET-10410 Table indentation is not preserved during rendering Bug
WORDSNET-8327 WordML to Pdf conversion issue with shape rendering Bug
WORDSNET-10947 Incorrect tab positioning causes incorrect text wrapping Bug
WORDSNET-11641 Widths of Tables and cells are not preserved during rendering to PDF Bug
WORDSNET-9172 DOCX to PDF conversion issue with table formatting Bug
WORDSNET-18524 Conversion RTF to PDF inconsistent table width Bug
WORDSNET-11806 DOC to PDF conversion issue with table layout Bug
WORDSNET-11500 Incorrect position of wrapped text on conversion to PDF Bug
WORDSNET-8037 WordML to PDF conversion issue with text rendering Bug
WORDSNET-5619 Table widths are disturbed upon rendering to PDF Bug
WORDSNET-11123 Table widths are not calculated correctly during rendering to PDF Bug
WORDSNET-12979 RenderedDocument and lines issue within table cells Bug
WORDSNET-22669 Table Content Pushed Down from its Original Position in PDF Bug
WORDSNET-10017 DrawingML TextBoxes are pushed to the left beyond the left boundary in fixed page formats Bug
WORDSNET-12099 Table layouts are not correct in PDF Bug
WORDSNET-23607 “Unsupported file format: Unknown” on loading TXT file Bug
WORDSNET-23332 Aspose.Words hangs when loading a MOBI document Bug
WORDSNET-22023 Text alignments in narrow cells of PDF differs from Word after conversion Bug
WORDSNET-13196 Thai font is displayed in the wrong way in PDF Bug
WORDSNET-19215 OfficeMath enclosing formula is crushed when outputting PDF Bug
WORDSNET-16742 Arabic text is not rendered correctly in output PDF Bug
WORDSNET-23643 Chart series are lost after DOCX to PDF conversion Bug
WORDSNET-23642 DOCX to PDF conversion causes layout issues in output PDF file Bug
WORDSNET-23644 Bar charts' height decreases after DOCX to PDF conversion Bug
WORDSNET-9788 DOC to PDF conversion issue with text (date) alignment Bug
WORDSNET-23661 ReportingEngine.BuildReport throws an exception on .NET 6 when reflection optimization is on Bug
WORDSNET-23665 Text in category labels is not wrapped Bug
WORDSNET-23668 Extra paragraph in header on WML to DOCX conversion Bug
WORDSNET-23667 Font name and size does not match MS Word on WML to DOCX conversion Bug
WORDSNET-22605 Split string in LINQ Reporting not working as expected Bug
WORDSNET-23685 Document.ExtractPages() causes line numbers restarting Bug
WORDSNET-19798 Cells in Table gets misplaced during open/save a DOC Bug
WORDSNET-23698 DOC to PDF: Text with Shadow effect not correctly converted Bug
WORDSNET-23699 RTL paragraph is positioned incorrectly inside an inline table with different left and right spacings Bug
WORDSNET-23660 AW does not imitate MS Word handling of an unsupported xml element Bug
WORDSNET-23703 Font is changed after appending document with KeepSourceFormatting Bug
WORDSNET-23707 DOC Compare System.InvalidOperationException: Custom XML part is not found. Bug
WORDSNET-23693 InvalidOperationException: Sequence contains more than one matching element Bug
WORDSNET-23672 Incorrect shape positions on WML to DOCX conversion Bug
WORDSNET-23696 TestSaveOdt performance test fails on net5 and net6 CLR Bug
WORDSNET-23715 FileCorruptedException is thrown upon loading DOCX document Bug
WORDSNET-23717 SVG letter-spacing style gets ignored when converting DOCX to PDF Bug
WORDSNET-23718 Document.ExtractPages changes list numbering Bug
WORDSNET-23725 Wrong paragraph format when adding an image after Pdf2Word conversion Bug
WORDSNET-23732 Fix StringComparison warnings Bug
WORDSNET-23225 Aspose.Words hangs on document rendering Bug
WORDSNET-22987 Import differs from what is in browser Bug
WORDSNET-23371 Structured Document Tag gets removed Bug
WORDSNET-23743 Part of content is moved into table upon reading RTF Bug
WORDSNET-23745 Fix StringComparison warnings in fields/mailmerge domain Bug
WORDSNET-22843 Incorrect rendering of Column3D in PDF Bug
WORDSNET-23730 Fix StringComparison warnings Bug
WORDSNET-23394 Document.UpdatePageLayout() throws System.InvalidOperationException : Infinite loop detected Bug
WORDSNET-23396 Text wrapping does not match Word Bug
WORDSNET-22736 Image position is changed after MHTML to PDF Conversion Bug
WORDSNET-23757 Comments anchor is misplaced after the saving Bug
WORDSNET-23760 PDF can’t be loaded because of “Sequence contains more than one matching element” error Bug
WORDSNET-23677 Do not invoke ResourceLoadingCallback for empty URLs Bug
WORDSNET-22726 Exception is thrown while converting from DOCX to HTML Bug
WORDSNET-23279 Horizontal axis labels are wrapped improperly Bug
WORDSNET-23330 Image is not visible after import from AZW3 Bug
WORDSNET-16037 Field.isDirty value always false Bug
WORDSNET-23604 List numbering is wrong for lists from HTML altChunk’s Bug
WORDSNET-23735 Wrong list numbering due to loss and non-use of DurableId attribute values Bug
WORDSNET-23791 Fix customer issues using SonarQube analysis Bug
WORDSNET-23370 UpdatePageLayout throws exception Bug
WORDSNET-23025 ArgumentException: Incorrect hex length Bug
WORDSNET-23485 Tab is lost upon converting document to HTML Bug
WORDSNET-23500 Content is shifted upon rendering document Bug
WORDSNET-23504 Text is wrapped improperly upon rendering Bug
WORDSNET-23511 RemoveEmptyParagraphs cleanup option does not work in case of nested IF fields Bug
WORDSNET-23527 Graphics is lost on PDF import Bug
WORDSNET-23531 Math equations alignment issue Bug
WORDSNET-23535 Consider disabling LoadOptions.ResourceLoadingCallback invocations for data URLs Bug
WORDSNET-23536 FileCorruptedException is thrown upon loading HTML document Bug
WORDSNET-23545 Problem when editing PDF form field in Chrome Bug
WORDSNET-23540 DOCX to PDF: Text overlapping the document layout Bug
WORDSNET-23563 Content is lost upon loading PDF document Bug
WORDSNET-23565 Numbers are rendered as tofu when use NumeralFormat.ArabicIndic Bug
WORDSNET-23578 Inaccurate vertical alignment in equations when saving to PDF Bug
WORDSNET-23505 Aspose.Words improperly selects paper source upon printing. Bug
WORDSNET-23588 ArgumentException is thrown upon loading MHTML document Bug
WORDSNET-23596 Text alignment in table is incorrect Bug
WORDSNET-14989 Thai characters are not preserved when rendered to PDF Bug
WORDSNET-23733 Fix StringComparison warnings Bug
WORDSNET-22725 Table Cut off Issue when converting Html to Word Bug

Public API and Backward Incompatible Changes

This section lists the public API changes introduced in Aspose.Words 22.5. It covers new and obsoleted public methods as well as behind-the-scenes behavior changes in Aspose.Words that may affect existing code. Any change that modifies existing behavior and could be seen as a regression is especially important and is documented here.

Added support for loading EPUB documents

Related issue: WORDSNET-8838

Aspose.Words can now load EPUB 2.0 documents.

EPUB is an e-book file format that uses the “.epub” file extension. An EPUB document is a collection of XHTML documents. Currently, Aspose.Words always loads all XHTML files from an EPUB document in the order in which they appear in the content file (OPF).

The following publicly visible enum values were added:

FileFormat.Epub
LoadFormat.Epub
WarningSource.Epub

The FileFormatUtil class can now be used to determine whether a file is an EPUB document. For example, the following call

FileFormatInfo info = FileFormatUtil.DetectFileFormat("book.epub");

will return an info instance with the FileFormatInfo.LoadFormat property set to LoadFormat.Epub.

Of all load options, only LoadOptions.ResourceLoadingCallback currently has an effect when working with EPUB documents. It is useful when you want to control how external resources are loaded.
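As background for the format detection described above: under the EPUB Open Container Format, an EPUB file is a ZIP archive that carries a "mimetype" entry containing the text application/epub+zip. The following Java sketch checks for that marker. It is an illustration only and does not claim to reproduce how Aspose.Words detects the format; the class and method names (EpubSniffer, looksLikeEpub, zipWithEntry) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the Aspose.Words API.
class EpubSniffer {
    static final String EPUB_MIME_TYPE = "application/epub+zip";

    // Returns true if the bytes form a ZIP archive whose "mimetype" entry
    // holds the EPUB media type.
    static boolean looksLikeEpub(byte[] data) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("mimetype".equals(entry.getName())) {
                    ByteArrayOutputStream content = new ByteArrayOutputStream();
                    byte[] buffer = new byte[256];
                    int read;
                    while ((read = zip.read(buffer)) != -1) {
                        content.write(buffer, 0, read);
                    }
                    String text = new String(content.toByteArray(), StandardCharsets.US_ASCII);
                    return EPUB_MIME_TYPE.equals(text.trim());
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a one-entry ZIP archive in memory so the sketch can be run without files.
    static byte[] zipWithEntry(String name, String content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content.getBytes(StandardCharsets.US_ASCII));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] sample = zipWithEntry("mimetype", EPUB_MIME_TYPE);
        System.out.println(looksLikeEpub(sample)); // prints true
    }
}
```

In application code, FileFormatUtil remains the intended way to detect the format; the sketch only shows what makes an EPUB container recognizable.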

The use cases for loading EPUB documents are as follows:

Document doc = new Document("book.epub");

or

// CustomResourceLoadingCallback is your own IResourceLoadingCallback implementation.
LoadOptions options = new LoadOptions
{
    ResourceLoadingCallback = new CustomResourceLoadingCallback()
};
Document doc = new Document("book.epub", options);

Added support for loading XML documents

Related issue: WORDSNET-22697

Aspose.Words can now load XML documents. The Extensible Markup Language (XML) is a simple text-based format for representing structured information: documents, data, configuration, books, transactions, invoices, and much more. Aspose.Words mimics MS Word behavior when importing XML documents.

The following publicly visible enum value was added:

LoadFormat.Xml

The FileFormatUtil class can now be used to determine whether a file is an XML document. For example, the following call

FileFormatInfo info = FileFormatUtil.DetectFileFormat("sample.xml");

will return an info instance with the FileFormatInfo.LoadFormat property set to LoadFormat.Xml.

The use cases for loading XML documents are as follows:

Document doc = new Document("sample.xml");

Introduced ChapterPageSeparator enum and added PageSetup.ChapterPageSeparator and PageSetup.HeadingLevelForChapter properties

Related issue: WORDSNET-10869

The ChapterPageSeparator enum is introduced:

/// <summary>
/// Defines the separator character that appears between the chapter and page number.
/// </summary>
/// <seealso cref="PageSetup"/>
/// <seealso cref="PageSetup.ChapterPageSeparator"/>
public enum ChapterPageSeparator
{
    /// <summary>
    /// A hyphen.
    /// </summary>
    Hyphen = 0,
    /// <summary>
    /// A period.
    /// </summary>
    Period = 1,
    /// <summary>
    /// A colon.
    /// </summary>
    Colon = 2,
    /// <summary>
    /// An emphasized dash.
    /// </summary>
    EmDash = 3,
    /// <summary>
    /// A standard dash.
    /// </summary>
    EnDash = 4
}

The following public properties are added to the PageSetup class:

/// <summary>
/// Gets or sets the heading level style that is applied to the chapter titles in the document.
/// </summary>
/// <remarks>
/// <p>Can be a number from 0 through 9. 0 means that no chapter number is applied to the page number.</p>
/// <p>Before you can create page numbers that include chapter numbers, the document headings must have a numbered outline format applied.</p>
/// </remarks>
public int HeadingLevelForChapter { get; set; }
 
/// <summary>
/// Gets or sets the separator character that appears between the chapter number and the page number.
/// </summary>
/// <remarks>
/// <p>Before you can create page numbers that include chapter numbers, the document headings must have a numbered outline format applied.</p>
/// </remarks>
public ChapterPageSeparator ChapterPageSeparator { get; set; }

Use Case:

Document doc = new Document(fileName);
 
PageSetup pageSetup = doc.FirstSection.PageSetup;
 
pageSetup.PageNumberStyle = NumberStyle.UppercaseRoman;
pageSetup.ChapterPageSeparator = ChapterPageSeparator.Colon;
pageSetup.HeadingLevelForChapter = 1;
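To make the effect of these settings concrete, the following Java sketch composes a page label from a chapter number, a separator, and an uppercase Roman page number, which is the combination configured in the use case above. It is a minimal illustration under stated assumptions: the Separator enum and the format and toUpperRoman methods are hypothetical stand-ins, not Aspose.Words types, and the sketch assumes the chapter number precedes the page number, as in MS Word.

```java
// Hypothetical illustration, not part of the Aspose.Words API.
class ChapterPageNumberSketch {
    // Stand-in for the ChapterPageSeparator values listed above.
    enum Separator {
        HYPHEN("-"), PERIOD("."), COLON(":"), EM_DASH("\u2014"), EN_DASH("\u2013");

        final String text;

        Separator(String text) {
            this.text = text;
        }
    }

    // Converts a positive number to an uppercase Roman numeral.
    static String toUpperRoman(int number) {
        int[] values = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
        String[] symbols = {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            while (number >= values[i]) {
                result.append(symbols[i]);
                number -= values[i];
            }
        }
        return result.toString();
    }

    // Assumed layout: chapter number, then separator, then page number.
    static String format(int chapter, Separator separator, int page) {
        return chapter + separator.text + toUpperRoman(page);
    }

    public static void main(String[] args) {
        System.out.println(format(1, Separator.COLON, 4)); // prints 1:IV
    }
}
```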

LoadOptions.ResourceLoadingCallback is no longer invoked for data URLs

Related issue: WORDSNET-23535

LoadOptions.ResourceLoadingCallback is no longer invoked for resources that are embedded as data URLs (for example, data:image/gif;base64,R0lGODlhEAAQAMQAAORH…). The reason is that these URLs do not reference external resources and are decoded in place.
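The following Java sketch illustrates why no external loading is needed for such URLs: the payload is carried in the URL itself and can be decoded directly. It is a minimal sketch, not the Aspose.Words implementation; the DataUrlSketch class and its methods are hypothetical, only base64 payloads are handled, and text is decoded as UTF-8 by assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

// Hypothetical illustration, not part of the Aspose.Words API.
class DataUrlSketch {
    // A data URL starts with the "data:" scheme (case-insensitive).
    static boolean isDataUrl(String url) {
        return url != null && url.regionMatches(true, 0, "data:", 0, 5);
    }

    // Decodes the payload of a base64 data URL without any network or file access.
    static byte[] decodeBase64(String url) {
        if (!isDataUrl(url)) {
            throw new IllegalArgumentException("Not a data URL");
        }
        int comma = url.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Data URL has no payload separator");
        }
        String header = url.substring(5, comma).toLowerCase(Locale.ROOT);
        if (!header.endsWith(";base64")) {
            throw new IllegalArgumentException("Only base64 payloads are handled in this sketch");
        }
        return Base64.getDecoder().decode(url.substring(comma + 1));
    }

    // Convenience wrapper for textual payloads (UTF-8 is an assumption of this sketch).
    static String decodeText(String url) {
        return new String(decodeBase64(url), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeText("data:text/plain;base64,SGVsbG8=")); // prints Hello
    }
}
```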

LoadOptions.ResourceLoadingCallback is no longer invoked for empty URLs

Related issue: WORDSNET-23677

LoadOptions.ResourceLoadingCallback is no longer invoked for empty URLs, because empty URLs don’t reference any external resource.

Slight changes in markup nodes typed collection

Related issue: WORDSNET-23774

The default indexer of the markup nodes collection has been changed. It now takes the index number of a structured document tag in the collection instead of its identifier.

/// <summary>
/// Returns the structured document tag at the specified index.
/// </summary>
/// <param name="index">An index into the collection.</param>
public IStructuredDocumentTag this[int index] { get; }

In addition, a structured document tag can now be removed either by its identifier or by its index number.

/// <summary>
/// Removes the structured document tag with the specified identifier.
/// </summary>
/// <param name="id">The structured document tag identifier.</param>
public void Remove(int id)
 
/// <summary>
/// Removes a structured document tag at the specified index.
/// </summary>
/// <param name="index">An index into the collection.</param>
public void RemoveAt(int index)

Lookup by identifier, which the indexer previously performed, is now available through the GetById() method.

/// <summary>
/// Returns the structured document tag by identifier.
/// </summary>
/// <remarks>
/// <p>Returns null if the structured document tag with the specified identifier cannot be found.</p>
/// </remarks>
/// <param name="id">The structured document tag identifier.</param>
public IStructuredDocumentTag GetById(int id)

Use Case:

Document doc = new Document(fileName);
StructuredDocumentTagCollection structuredDocumentTags = doc.Range.StructuredDocumentTags;
IStructuredDocumentTag sdt;
// Iterate through all collection elements, getting each element by its index number.
for (int i = 0; i < structuredDocumentTags.Count; i++)
{
    sdt = structuredDocumentTags[i];
    Console.WriteLine(sdt.Title);
}
// Get the structured document tag by its Id; GetById() returns null if it is not found.
sdt = structuredDocumentTags.GetById(1160505028);
if (sdt != null)
    Console.WriteLine(sdt.Title);
// Remove the structured document tag by its Id.
structuredDocumentTags.Remove(1160505028);
// Remove the structured document tag at position 0.
structuredDocumentTags.RemoveAt(0);
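Because both the index and the identifier are integers, the two kinds of access are easy to confuse. The following Java sketch models the semantics described in this section with a toy collection. The TagCollectionSketch and Tag types are hypothetical and only mirror the behavior stated above (positional access, lookup by identifier returning null when nothing matches, removal by identifier or by position); they are not the Aspose.Words classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not part of the Aspose.Words API.
class TagCollectionSketch {
    static final class Tag {
        final int id;
        final String title;

        Tag(int id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    private final List<Tag> tags = new ArrayList<>();

    void add(Tag tag) {
        tags.add(tag);
    }

    int count() {
        return tags.size();
    }

    // Positional access: corresponds to the new default indexer.
    Tag get(int index) {
        return tags.get(index);
    }

    // Lookup by identifier: returns null when no tag has the given id.
    Tag getById(int id) {
        for (Tag tag : tags) {
            if (tag.id == id) {
                return tag;
            }
        }
        return null;
    }

    // Removal by identifier.
    void remove(int id) {
        tags.removeIf(tag -> tag.id == id);
    }

    // Removal by position.
    void removeAt(int index) {
        tags.remove(index);
    }

    public static void main(String[] args) {
        TagCollectionSketch collection = new TagCollectionSketch();
        collection.add(new Tag(1160505028, "First"));
        collection.add(new Tag(42, "Second"));
        System.out.println(collection.get(0).title);      // prints First
        System.out.println(collection.getById(42).title); // prints Second
    }
}
```

Code that previously passed an identifier to the indexer should be reviewed, since the same integer argument is now interpreted as a position.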

Added “Number10Envelope” value to PaperSize enum

Related issue: WORDSNET-23505

Added support for the “Envelope No. 10” page size for printing.

/// <summary>
/// Specifies paper size.
/// </summary>
public enum PaperSize
{
   /// <summary>
   /// 4.125 x 9.5 inches.
   /// </summary>
   Number10Envelope
}

Use Case:

// This value is used to set the page size as follows:
Document doc = new Document(fileName);
doc.FirstSection.PageSetup.PaperSize = PaperSize.Number10Envelope;
 
// Or in a similar way using DocumentBuilder:
DocumentBuilder builder = new DocumentBuilder(doc);
builder.PageSetup.PaperSize = PaperSize.Number10Envelope;
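Aspose.Words expresses page dimensions in points (1 inch = 72 points), so it can help to see the 4.125 x 9.5 inch size quoted above in those units. The following Java sketch does the conversion; the EnvelopeSizeSketch class and its method are hypothetical helpers, not library members.

```java
// Hypothetical helper, not part of the Aspose.Words API.
class EnvelopeSizeSketch {
    static final double POINTS_PER_INCH = 72.0;

    static double inchesToPoints(double inches) {
        return inches * POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // Envelope No. 10 dimensions from the enum comment above.
        System.out.println(inchesToPoints(4.125) + " x " + inchesToPoints(9.5)); // prints 297.0 x 684.0
    }
}
```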

HtmlSaveOptions.ExportTextBoxAsSvg was marked as obsolete

Related issue: WORDSNET-23514

The HtmlSaveOptions.ExportTextBoxAsSvg property is now obsolete. Use HtmlSaveOptions.ExportShapesAsSvg instead, which affects text boxes as well.