Aspose.Words for Java 22.4 Release Notes

Major Features

There are 71 improvements and fixes in this regular monthly release. The most notable are:

  • Added saving to PDF/A-4 and several other improvements in PDF output.
  • Implemented reading of Photoshop metadata resolution in JPEG images.
  • Added the ability to manipulate DrawingML chart legend entries.
  • Added the ability to specify the name of the xls/xlsx file a DrawingML chart is linked to.
  • Implemented a new mode for importing HTML block-level elements.

Full List of Issues Covering all Changes in this Release (Reported by Java Users)

Key | Summary | Category
WORDSNET-23301 | Consider providing API to access SDTs by id or name | New Feature
WORDSNET-23210 | Add feature to show/hide items in the chart’s legend | New Feature
WORDSJAVA-2638 | Wrong “reply to comment” author name after Document.compare | Bug
WORDSJAVA-2665 | Calculation of number fields with only dot separator (German Locale) is incorrect | Bug
WORDSJAVA-2690 | DOCX to PDF: some characters are rotated | Bug
WORDSJAVA-2693 | Range.Replace fails on complicated regex. | Bug
WORDSJAVA-2698 | Basque Time Field gets modified incorrectly | Bug
WORDSNET-21829 | Aspose.Words.FileCorruptedException is thrown while loading DOCX | Bug
WORDSNET-23566 | Text becomes white after open/save DOCX document | Bug
WORDSNET-18037 | Document.Compare does not mimic MS Word | Bug
WORDSNET-23586 | ArgumentException upon setting bookmark text | Bug
WORDSNET-23593 | DOCX to PDF: Italic arabic characters not rendered properly | Bug
WORDSNET-23412 | Issue regarding conversion of Docx with continuous section break to Html | Bug
WORDSNET-23409 | UpdateFields throws System.NullReferenceException | Bug
WORDSNET-23217 | NullReferenceException is thrown upon UpdateFields | Bug
WORDSNET-23465 | Unexpected replacement w:hyperlink during document comparison | Bug
WORDSNET-22602 | Multilevel list renders incorrectly after DOCX to HTML conversion | Bug

Full List of Issues Covering all Changes in this Release (Reported by .NET Users)

Key | Summary | Category
WORDSNET-23522 | Provide public setter for Chart.SourceFullName property | New Feature
WORDSNET-23547 | New OpenXML File Format attribute for bulleted and numbered lists | New Feature
WORDSNET-23594 | Implement reading of Photoshop metadata resolution in Jpeg images | New Feature
WORDSNET-23432 | Implement column widths re-calculation for tables with more than 63 columns | New Feature
WORDSNET-23475 | Add saving to PDF/A-4 | New Feature
WORDSNET-22888 | Add loading progress notification for RTF loading | Enhancement
WORDSNET-22889 | Add loading progress notification for WML loading | Enhancement
WORDSNET-23523 | Performance test fails with great time excess | Enhancement
WORDSNET-23560 | Unsupported file format on loading ODT | Bug
WORDSNET-23610 | Line break is lost when re-saving a PDF | Bug
WORDSNET-23622 | Document model compatibility option value does not match MS Word UI | Bug
WORDSNET-23275 | Font in SmartArt diagram is smaller than in MS Word | Bug
WORDSNET-22982 | Table cell preferred does not match MS Word in Aspose.Words DOCX output | Bug
WORDSNET-23616 | Grid calculation fall-back is not detected for a nested table | Bug
WORDSNET-22930 | Incorrect charts rendering for round join style outline | Bug
WORDSNET-22460 | Square blue points in Chart are become round in the PDF | Bug
WORDSNET-23156 | Vertical table cell merge disappears on saving to DOCX and PDF | Bug
WORDSNET-23631 | System.ArgumentNullException: Value cannot be null | Bug
WORDSNET-23603 | REF field with relative position option is not localized when saving to PDF | Bug
WORDSNET-23517 | Images are scaled down when saving to XPS | Bug
WORDSNET-23476 | HtmlReader.HandleText method fails | Bug
WORDSNET-23499 | Inaccurate Arabic text on PDF import | Bug
WORDSNET-23580 | Formatting cannot be applied because the table is empty | Bug
WORDSNET-23400 | Incorrect line wrapping of a line with zero-width spaces | Bug
WORDSNET-23401 | Incorrect line wrapping with a symbolic font | Bug
WORDSNET-23609 | Comparison does not show changes between documents | Bug
WORDSNET-23608 | FCE on loading DOC | Bug
WORDSNET-23538 | Using Document.ExtractPages method causing list labels numbering issue | Bug
WORDSNET-22915 | The rotation of the horizontal axis labels is changed after converting to PDF | Bug
WORDSNET-23602 | DetectBackgroundColor fails with “InvalidOperationException: Sequence contains no elements” | Bug
WORDSNET-23363 | Improving DOCX to HtmlFixed conversion | Bug
WORDSNET-23548 | FootnoteDetector fails to find footnotes above a page number | Bug
WORDSNET-23582 | Issue with How to Define Default Options for ChartDataLabels of ChartSeries sample | Bug
WORDSNET-23543 | Legend entry text becomes hidden when updating font of a new/empty legend entry | Bug
WORDSNET-22718 | DOCX to HTML image not visible | Bug
WORDSNET-23554 | NullReferenceException when save document to PDF | Bug
WORDSNET-23546 | Incorrect color in chart when saving to PDF | Bug
WORDSNET-22619 | Sizes of Series Point Shapes in Combo Chart Increased during Word to PDF Conversion | Bug
WORDSNET-22184 | Cannot compile Xamarin.Mac project with Aspose.Words | Bug
WORDSNET-23443 | Problem after converting DOCX to PDF | Bug
WORDSNET-14098 | Left and Top margins of Div are not lost are re-saving Html | Bug
WORDSNET-23506 | Colored cell background issues on PDF import | Bug
WORDSNET-23419 | Large scan images are not removed from a Searchable PDF | Bug
WORDSNET-23515 | Aspose.Words.FileCorruptedException on loading DOCX document | Bug
WORDSNET-19222 | Compare generates incorrect result | Bug
WORDSNET-23512 | Tables are not merged | Bug
WORDSNET-23459 | Check how Aspose.Words works with .NET 6 performance | Bug
WORDSNET-23353 | Legend entry not removed when deleting chart series | Bug
WORDSNET-23526 | Revisions changed after adding CustomXmlPart | Bug
WORDSNET-23458 | Incorrect markup after appending documents | Bug
WORDSNET-23442 | System.NullReferenceException on UpdatePageLayout | Bug
WORDSNET-23529 | Aspose.Words hangs on document layout | Bug
WORDSNET-23570 | Aspose.Words does not work with .NET 6 Ready to Run option | Bug
WORDSNET-23507 | Table is distorted on PDF import | Bug

Public API and Backward Incompatible Changes

This section lists public API changes that were introduced in Aspose.Words 22.4. It covers new and obsoleted public methods as well as behind-the-scenes changes in Aspose.Words behavior that may affect existing code. Any change that modifies existing behavior and could be seen as a regression is especially important and is documented here.

Added saving to PDF/A-4

Related issue: WORDSNET-23475

PDF/A-4 (ISO 19005-4:2020) is the latest version of the PDF/A format. The conformance levels have been revised in PDF/A-4: unlike previous versions, PDF/A-4 does not provide the A, B and U conformance levels. Regular PDF/A-4 conformance is equivalent to level U conformance in previous versions (i.e. preservation of the document's visual appearance and Unicode representation of text). Level A conformance (logical structure requirements) has been removed because the PDF/UA format serves that purpose.

A new value has been added to the PdfCompliance enum:

public enum PdfCompliance
{
    /// <summary>
    /// The output file will comply with the PDF/A-4 (ISO 19005-4:2020) standard.
    /// PDF/A-4 has the objective of preserving document static visual appearance over time, independent of the tools
    /// and systems used for creating, storing or rendering the files. Additionally any text contained in the document
    /// can be reliably extracted as a series of Unicode codepoints.
    /// </summary>
    PdfA4
}
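
Use Case (a minimal sketch, not taken from the original notes): it assumes the new value is applied through the existing PdfSaveOptions.Compliance property, and the file names are placeholders.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

// "input.docx" and "output.pdf" are placeholder file names.
Document doc = new Document("input.docx");

// Request PDF/A-4 output through the existing Compliance property.
PdfSaveOptions saveOptions = new PdfSaveOptions();
saveOptions.Compliance = PdfCompliance.PdfA4;

doc.Save("output.pdf", saveOptions);
```

With this compliance level set, the option restrictions described in the PdfSaveOptions remarks apply automatically.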

The following options are prohibited when saving to PDF/A-4:

public class PdfSaveOptions
{
    /// <summary>
    /// Specifies whether to preserve Microsoft Word form fields as form fields in PDF or convert them to text.
    /// Default is <c>false</c>.
    /// </summary>
    /// <remarks>
    ...
    /// <para>Editable forms are prohibited by PDF/A compliance. <c>false</c> value will be used automatically
    /// when saving to PDF/A.</para>
    /// </remarks>
    public bool PreserveFormFields;
  
    /// <summary>
    /// Gets or sets the details for encrypting the output PDF document.
    /// </summary>
    /// <remarks>
    ...
    /// <para>Encryption is prohibited by PDF/A compliance. This option will be ignored when saving to PDF/A.</para>
    /// </remarks>
    public PdfEncryptionDetails EncryptionDetails;
  
    /// <summary>
    /// Specifies the font embedding mode.
    /// </summary>
    /// <remarks>
    ...
    /// <para>PDF/A and PDF/UA compliance requires all fonts to be embedded.
    /// <see cref="PdfFontEmbeddingMode.EmbedAll"/> value will be used automatically when saving to
    /// PDF/A and PDF/UA.</para>
    /// </remarks>
    public PdfFontEmbeddingMode FontEmbeddingMode;
  
    /// <summary>
    /// Gets or sets a value determining whether or not to substitute TrueType fonts Arial, Times New Roman,
    /// Courier New and Symbol with core PDF Type 1 fonts.
    /// </summary>
    /// <remarks>
    ...
    /// <para>PDF/A and PDF/UA compliance requires all fonts to be embedded. <c>false</c> value will be used
    /// automatically when saving to PDF/A and PDF/UA.</para>
    /// </remarks>
    public bool UseCoreFonts;
  
    /// <summary>
    /// Gets or sets a value determining the way <see cref="Document.CustomDocumentProperties"/> are exported to PDF file.
    /// </summary>
    /// <remarks>
    ...
    /// <para><see cref="PdfCustomPropertiesExport.Metadata"/> value is not supported when saving to PDF/A.
    /// <see cref="PdfCustomPropertiesExport.Standard"/> will be used instead for PDF/A-1 and PDF/A-2 and
    /// <see cref="PdfCustomPropertiesExport.None"/> for PDF/A-4.</para>
    ...
    /// </remarks>
    public PdfCustomPropertiesExport CustomPropertiesExport;
  
    /// <summary>
    /// Specifies how the color space will be selected for the images in PDF document.
    /// </summary>
    /// <remarks>
    ...
    /// <para><see cref="PdfImageColorSpaceExportMode.SimpleCmyk"/> value is not supported when saving to PDF/A.
    /// <see cref="PdfImageColorSpaceExportMode.Auto"/> value will be used instead.</para>
    /// </remarks>
    public PdfImageColorSpaceExportMode ImageColorSpaceExportMode;
  
    /// <summary>
    /// A flag indicating whether image interpolation shall be performed by a conforming reader.
    /// When <c>false</c> is specified, the flag is not written to the output document and
    /// the default behaviour of reader is used instead.
    /// </summary>
    /// <remarks>
    ...
    /// <para>Interpolation flag is prohibited by PDF/A compliance. <c>false</c> value will be used automatically
    /// when saving to PDF/A.</para>
    /// </remarks>
    public bool InterpolateImages;
}

Implemented the ability to set the Chart.SourceFullName property

Related issue: WORDSNET-23522.

Implemented the ability to specify the name of the xls/xlsx file a DrawingML chart is linked to:

/// <summary>
/// Gets or sets the path and name of an xls/xlsx file this chart is linked to.
/// </summary>
public string SourceFullName { get; set; }

Use Case:

Document doc = new Document(fileName);
Shape shape = (Shape)doc.GetChild(NodeType.Shape, 0, true);
shape.Chart.SourceFullName = @"C:\Documents\ChartData.xlsx";
doc.Save(fileName);

Implemented a new mode for importing HTML block-level elements

Related issue: WORDSNET-16334

A new HTML loading option was added to the HtmlLoadOptions class:

public class HtmlLoadOptions
{
    /// <summary>
    /// Gets or sets a value that specifies how properties of block-level elements are imported.
    /// Default value is <see cref="BlockImportMode.Merge"/>.
    /// </summary>
    public BlockImportMode BlockImportMode { get; set; }
}

The new BlockImportMode enum specifies how properties of block-level elements are imported:

/// <summary>
/// Specifies how properties of block-level elements are imported from HTML-based documents.
/// </summary>
public enum BlockImportMode
{
    /// <summary>
    /// Properties of parent blocks are merged and stored on child elements (i.e. paragraphs or tables).
    /// </summary>
    /// <remarks>
    /// <para>
    /// Properties of parent blocks are merged as follows: margins are added together; borders of higher-level blocks
    /// are discarded and only the most inner-level borders are preserved. As a result, when this mode is specified,
    /// some formatting of blocks from the original document will be lost.
    /// </para>
    /// <para>
    /// On the other hand, since all merged block-level properties are stored on document nodes, all formatting
    /// in the resulting document will be available for modification.
    /// </para>
    /// </remarks>
    Merge,
 
    /// <summary>
    /// Properties of parent blocks are imported to a special logical structure and are stored separately from
    /// document nodes.
    /// </summary>
    /// <remarks>
    /// <para>
    /// Only margins and borders of 'body', 'div', and 'blockquote' HTML elements are imported. Properties of each HTML
    /// element are stored individually.
    /// </para>
    /// <para>
    /// This mode better preserves borders and margins seen in the HTML document and gives better conversion
    /// results. The downside is that the resulting document gets harder to modify, since borders and margins stored
    /// in the logical structure are not available for editing.
    /// </para>
    /// <para>
    /// This mode mimics MS Word's behavior regarding import of block properties.
    /// </para>
    /// </remarks>
    Preserve
}

Use Case: The new mode for importing HTML block-level elements better preserves the borders and margins seen in the HTML document and gives better conversion results.

const string html = @"
<html>
    <div style='border:dotted'>
        <div style='border:solid'>
            <p>paragraph 1</p>
            <p>paragraph 2</p>
        </div>
    </div>
</html>";

HtmlLoadOptions loadOptions = new HtmlLoadOptions();

// Set the new mode of import HTML block-level elements.
loadOptions.BlockImportMode = BlockImportMode.Preserve;
MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(html));
Document doc = new Document(stream, loadOptions);
doc.Save("sample.docx");

Implemented chart legend entry API

Related issue: WORDSNET-23210.

The ChartLegendEntry and ChartLegendEntryCollection public classes have been implemented.

namespace Aspose.Words.Drawing.Charts
{
    /// <summary>
    /// Represents a chart legend entry.
    /// </summary>
    /// <remarks>
    /// A legend entry corresponds to a specific chart series or trendline.
    /// The text of the entry is the name of the series or trendline. The text cannot be changed.
    /// </remarks>
    public class ChartLegendEntry
    {
        /// <summary>
        /// Gets or sets a value indicating whether this entry is hidden in the chart legend.
        /// The default value is false.
        /// </summary>
        /// <remarks>
        /// When a chart legend entry is hidden, it does not affect the corresponding chart series or trendline that
        /// is still displayed on the chart.
        /// </remarks>
        public bool IsHidden { get; set; }
 
        /// <summary>
        /// Provides access to the font formatting of this legend entry.
        /// </summary>
        public Font Font { get; }
    }
 
    /// <summary>
    /// Represents a collection of chart legend entries.
    /// </summary>
    public class ChartLegendEntryCollection : IEnumerable<ChartLegendEntry>
    {
        /// <summary>
        /// Returns the number of ChartLegendEntry in this collection.
        /// </summary>
        public int Count { get; }
 
        /// <summary>
        /// Returns ChartLegendEntry for the specified index.
        /// </summary>
        public ChartLegendEntry this[int index] { get; }
    }
}

The LegendEntries public property has been added to the ChartLegend class.

/// <summary>
/// Returns a collection of legend entries for all series and trendlines of the parent chart.
/// </summary>
public ChartLegendEntryCollection LegendEntries { get; }

The LegendEntry public property has been added to the ChartSeries class.

/// <summary>
/// Gets a legend entry for this chart series.
/// </summary>
public ChartLegendEntry LegendEntry { get; }

The constructor of the ChartLegend class has been marked obsolete. It will not be possible to create instances of this class.

Use Case:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
 
Shape shape = builder.InsertChart(ChartType.Column, 432, 252);
 
Chart chart = shape.Chart;
ChartSeriesCollection series = chart.Series;

// Delete default generated series.
series.Clear();
 
string[] categories = new string[] { "AW Category 1", "AW Category 2" };
 
ChartSeries series1 = series.Add("Series 1", categories, new double[] { 1, 2 });
series.Add("Series 2", categories, new double[] { 3, 4 });
series.Add("Series 3", categories, new double[] { 5, 6 });
series.Add("Series 4", categories, new double[] { 0, 0 });
 
ChartLegendEntryCollection legendEntries = chart.Legend.LegendEntries;
legendEntries[3].IsHidden = true;
 
foreach (ChartLegendEntry legendEntry in legendEntries)
    legendEntry.Font.Size = 12;
 
series1.LegendEntry.Font.Italic = true;
 
doc.Save("output.docx");

Implemented typed collection for markup nodes

Related issue: WORDSNET-23301

Implemented an interface exposing common properties for both StructuredDocumentTag and StructuredDocumentTagRangeStart/StructuredDocumentTagRangeEnd nodes.

public interface IStructuredDocumentTag
{
    /// <summary>
    /// Returns true if this instance is a ranged structured document tag.
    /// </summary>
    bool IsRanged();
 
    /// <summary>
    /// Returns Node object that implements this interface.
    /// </summary>
    Node StructuredDocumentTagNode();
 
    /// <summary>
    /// <para>Specifies a unique read-only persistent numerical Id for this <b>SDT</b>.</para>
    /// </summary>
    int Id { get; }
 
    /// <summary>
    /// Specifies a tag associated with the current SDT node.
    /// Can not be null.
    /// </summary>
    string Tag { get; set; }
 
    /// <summary>
    /// Specifies the friendly name associated with this <b>SDT</b>.
    /// Can not be null.
    /// </summary>
    string Title { get; set; }
 
    /// <summary>
    /// Gets the <see cref="BuildingBlock"/> containing placeholder text which should be displayed when this SDT run contents are empty,
    /// the associated mapped XML element is empty as specified via the <see cref="XmlMapping"/> element
    /// or the <see cref="IsShowingPlaceholderText"/> element is true.
    /// </summary>
    /// <remarks>Can be null, meaning that the placeholder is not applicable for this Sdt.</remarks>
    BuildingBlock Placeholder { get; }
 
    /// <summary>
    /// <para>Gets or sets Name of the <see cref="BuildingBlock"/> containing placeholder text.</para>
    /// <para>
    /// BuildingBlock with this name <see cref="BuildingBlock.Name"/> has to be present in the <see cref="Document.GlossaryDocument"/>
    /// otherwise <see cref="InvalidOperationException"/> will occur.</para>
    /// </summary>
    string PlaceholderName { get; set; }
 
    /// <summary>
    /// <para>
    /// Specifies whether the content of this <b>SDT</b> shall be interpreted to contain placeholder text
    /// (as opposed to regular text contents within the SDT).
    /// </para>
    /// <para>
    /// If set to true, this state shall be resumed (showing placeholder text) upon opening this document.
    /// </para>
    /// </summary>
    bool IsShowingPlaceholderText { get; set; }
 
    /// <summary>
    /// Gets the level at which this <b>SDT</b> occurs in the document tree.
    /// </summary>
    MarkupLevel Level { get; }
 
    /// <summary>
    /// Gets type of this <b>Structured document tag</b>.
    /// </summary>
    SdtType SdtType { get; }
 
    /// <summary>
    /// When set to true, this property will prohibit a user from deleting this <b>SDT</b>.
    /// </summary>
    bool LockContentControl { get; set; }
 
    /// <summary>
    /// When set to true, this property will prohibit a user from editing the contents of this <b>SDT</b>.
    /// </summary>
    bool LockContents { get; set; }
 
    /// <summary>
    /// Gets or sets the color of the structured document tag.
    /// </summary>
    System.Drawing.Color Color { get; set; }
 
    /// <summary>
    /// Gets an object that represents the mapping of this structured document tag to XML data
    /// in a custom XML part of the current document.
    /// </summary>
    /// <remarks>
    /// You can use the <see cref="Markup.XmlMapping.SetMapping(CustomXmlPart,string,string)"/> method of this object to map
    /// a structured document tag to XML data.
    /// </remarks>
    /// <dev>
    /// If this element is present and the parent Sdt is not of a rich text type, then the current
    /// value of the Sdt shall be determined by finding the XML element (if any) which is
    /// determined by the attributes on this element.
    /// See Iso29500, chapter 1, 17.5.2.6 dataBinding (XML Mapping).
    /// If DataBinding information does not result in an XML element, then the
    /// application can use any algorithm desired to find the closest available match. If this information does result in an
    /// XML element, then the contents of that element shall be used to replace the current run content within the
    /// document.
    /// </dev>
    XmlMapping XmlMapping { get; }
 
    /// <summary>
    /// Gets a string that represents the XML contained within the node in the <see cref="SaveFormat.FlatOpc"/> format.
    /// </summary>
    string WordOpenXML { get; }
}

Implemented a typed collection of IStructuredDocumentTag.

public class StructuredDocumentTagCollection : IEnumerable<IStructuredDocumentTag>
{
    /// <summary>
    /// Returns the first structured document tag encountered in the collection with the specified title.
    /// </summary>
    /// <remarks>
    /// <p>Returns null if the structured document tag with the specified title cannot be found.</p>
    /// </remarks>
    /// <param name="title">The title of structured document tag.</param>
    public IStructuredDocumentTag GetByTitle(string title);
 
    /// <summary>
    /// Returns the first structured document tag encountered in the collection with the specified tag.
    /// </summary>
    /// <remarks>
    /// <p>Returns null if the structured document tag with the specified tag cannot be found.</p>
    /// </remarks>
    /// <param name="tag">The tag of the structured document tag.</param>
    public IStructuredDocumentTag GetByTag(string tag);
 
    /// <summary>
    /// Returns the number of structured document tags in the collection.
    /// </summary>
    public int Count { get; }
 
    /// <summary>
    /// Returns the structured document tag by Id.
    /// </summary>
    /// <param name="id">The structured document tag identifier.</param>
    public IStructuredDocumentTag this[int id] { get; }
 
    /// <summary>
    /// Removes the structured document tag with the specified identifier.
    /// </summary>
    /// <param name="id">The structured document tag identifier.</param>
    public void Remove(int id);
}

Added a new property to the Range class.

/// <summary>
/// Returns a <see cref="StructuredDocumentTagCollection"/> collection that represents all structured document tags in the range.
/// </summary>
public StructuredDocumentTagCollection StructuredDocumentTags { get; }

Use Case:

Document doc = new Document("some document with markup");
 
// Get the structured document tag by Id.
IStructuredDocumentTag sdt = doc.Range.StructuredDocumentTags[1160505028];
Console.WriteLine(sdt.IsRanged());
Console.WriteLine(sdt.Title);
 
// Get the structured document tag or ranged tag by Title.
sdt = doc.Range.StructuredDocumentTags.GetByTitle("Alias4");
Console.WriteLine(sdt.Id);
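
Because GetByTitle and GetByTag return null when nothing matches, a lookup is usually guarded before use. The following is a minimal sketch that uses only the members listed above. The file name and the tag "CustomerName" are hypothetical, and the new types are assumed to live in the Aspose.Words.Markup namespace alongside the existing SDT classes.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Markup;

// "input.docx" and the tag "CustomerName" are hypothetical placeholders.
Document doc = new Document("input.docx");
StructuredDocumentTagCollection tags = doc.Range.StructuredDocumentTags;

// GetByTag returns null when no structured document tag has the given tag.
IStructuredDocumentTag sdt = tags.GetByTag("CustomerName");
if (sdt != null)
{
    Console.WriteLine("Found SDT {0}, ranged: {1}", sdt.Id, sdt.IsRanged());

    // Remove the structured document tag by its identifier.
    tags.Remove(sdt.Id);
}

Console.WriteLine("Remaining tags: " + tags.Count);
```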

Document.UpdateTableLayout() method marked as obsolete

Related issue: WORDSNET-23539

The UpdateTableLayout() method was an early attempt to reproduce MS Word logic for re-calculating table column widths without relying on the column width data stored in the document. The method was mostly intended as an alternative way to re-calculate table layouts when relying on the stored column widths caused incorrect results (e.g. for generated documents with incorrect column widths stored by Aspose.Words itself). Though the method produced correct results for some cases, it never reproduced MS Word table layout logic entirely. As a result, applying the method to an arbitrary document could often produce incorrect results for tables that were handled correctly by the default method relying on the stored column widths.

As a result, after the method was recommended to customers, we often received reports of incorrect table layouts caused by applying it. The customers were frustrated because some tables were handled correctly only with UpdateTableLayout() and some were handled correctly only without UpdateTableLayout().

Since then, much effort has been invested in reproducing MS Word table layout logic without relying on stored column widths. It turned out that replacing the default logic for arbitrary tables and content is not feasible: there are too many nuances with content metrics and combinations of table/cell/container properties to take into account and still be sure of correct results. So a limited approach was adopted, in which Aspose.Words replaces stored column widths only after checking that everything that can influence the layout of a specific table is supported by the re-calculation algorithm. The class of supported tables was widened significantly in release 22.2 and will be widened further in the future.

As of now, the most common combinations of table content and table/cell properties are supported by the new approach. The new approach also replaces the UpdateTableLayout() logic for the supported tables, so UpdateTableLayout() may currently produce different results only for tables not supported by the new table layout logic. Because these are exactly the tables for which reproducing MS Word logic has known issues, it is highly likely that UpdateTableLayout() will not produce the correct layout for them either.

Deprecating UpdateTableLayout() will clearly indicate that the method should not be used.