Aspose.Words for Java 21.7 Release Notes
Major Features
There are 94 improvements and fixes in this regular monthly release. The most notable are:
- Implemented rendering to PDF/A-2 format.
- Added an ability to work with Framesets.
- Introduced a new overload of DocumentBuilder.InsertHtml and a new enumeration HtmlInsertOptions.
- Provided new API for working with Fill patterns.
- Supported several exact date-time parse formats while loading JSON for LINQ Reporting Engine.
Full List of Issues Covering all Changes in this Release
Key | Summary | Category |
---|---|---|
WORDSNET-22279 | Implement Fill.Patterned method | New Feature |
WORDSNET-17557 | Provide way to get URL from Target attribute inside webSettings.xml.rels file | New Feature |
WORDSNET-21750 | Highlighted Content is not visible after exporting to PDF | New Feature |
WORDSNET-15400 | Implement maskPen rasterOp mode in InkML rendering | New Feature |
WORDSNET-22370 | Support several exact date-time parse formats while loading JSON | New Feature |
WORDSNET-15984 | Ink annotations are not aligned in PDF output | New Feature |
WORDSNET-21781 | Support using of arrays with data bands and LINQ extension methods for Java | New Feature |
WORDSNET-22159 | Implement rendering of brushes with maskPen rasterOp mode | New Feature |
WORDSNET-22009 | Make links clickable when converting CHM to HTML when the file pointing itself | New Feature |
WORDSNET-5971 | Support of PdfCompliance PDF/A-2 | New Feature |
WORDSNET-11488 | Add feature to support of PdfCompliance PDF/A-2a | New Feature |
WORDSNET-15836 | Add feature to support array type identifiers in template syntax | New Feature |
WORDSNET-20505 | Add support for PDF/A-2u, PDF/A-2a, PDF/A-3u, PDF/A-3a compliance | New Feature |
WORDSNET-15129 | Support of DOCX to PDF_A_2U conversion | New Feature |
WORDSNET-15126 | Support of DOCX to PDF_A_2B conversion | New Feature |
WORDSNET-15125 | Support of DOCX to PDF_A_2A conversion | New Feature |
WORDSNET-13778 | Add feature to support of PdfCompliance PDF/A-2b | New Feature |
WORDSNET-17478 | API needed to set the Scope attribute in Table in rendered PDF | Enhancement |
WORDSNET-21089 | Shapes are lost after RTF to PDF conversion | Enhancement |
WORDSNET-22397 | How to use “Mark as decorative” IsDecorative property | Enhancement |
WORDSNET-22255 | About Data Source Name in ReportingEngine.BuildReport Method | Enhancement |
WORDSNET-17726 | Notify user when he tries to load MOBI document but the document is AZW3 | Enhancement |
WORDSJAVA-2619 | Fix standard system colours in Java | Bug |
WORDSNET-21259 | Unwanted spaces are added after conversion from DOCX>HTML>DOCX | Bug |
WORDSNET-22312 | NullReferenceException when save attached document as PDF | Bug |
WORDSNET-22263 | ReportingEngine.BuildReport throws System.InvalidCastException | Bug |
WORDSNET-22416 | DOCX to PDF: Output file text not formated properly | Bug |
WORDSNET-21766 | Document.Compare generates incorrect revisions | Bug |
WORDSNET-16528 | Incorrect convert DOCX to PDF when docx in compatible mode | Bug |
WORDSNET-22404 | Incorrect page number for FieldPageRef | Bug |
WORDSNET-19592 | Exception is thrown when converting DOCX to HTML | Bug |
WORDSNET-21940 | HTML img’s base64 data is not getting converted to doc in Aspose.words java | Bug |
WORDSNET-18852 | Empty paragraphs added when inserting unordered list using DocumentBuilder.InsertHtml | Bug |
WORDSNET-22277 | OutOfMemoryException throws wnen extract pages and save it to PNG | Bug |
WORDSNET-21669 | Mobi file - Huffman compression is not yet supported for specific file | Bug |
WORDSNET-21963 | Value of attribute “lang” is invalid | Bug |
WORDSNET-22388 | NullPointerException when calling mailMerge.execute() | Bug |
WORDSNET-21988 | Incorrect page size during conversion of HTML with landscape orientation to DOCX | Bug |
WORDSNET-22228 | Text is pushed down to next pages after DOCX to PDF conversion | Bug |
WORDSNET-22357 | System.NullReferenceException occurs upon DOC to PDF conversion | Bug |
WORDSNET-22311 | Duplicate list item created when inserting multiple paragraph breaks using Range.Replace() | Bug |
WORDSNET-22321 | Replacing text containing a paragraph break is poorly represented with TrackRevisions in enabled | Bug |
WORDSNET-21500 | Image displays as red cross in converted documents | Bug |
WORDSNET-22387 | Content are lost after PDF to DOCX conversion | Bug |
WORDSNET-17077 | Row height increases after DOCX-HTML-DOCX roundtrip | Bug |
WORDSNET-19823 | Paragraph is pushed down to next page after DOCX>HTML>DOCX conversion | Bug |
WORDSNET-22303 | DOCX does not open in MS Word after after re-saving | Bug |
WORDSNET-22389 | System.InvalidCastException occurs upon loading DOC | Bug |
WORDSNET-20793 | DOCX to PDF conversion issue with PDF accessibility | Bug |
WORDSNET-22330 | Series lines are rendered incorrectly after converting to PDF | Bug |
WORDSNET-20708 | Wrong position of chart legend in output PNG file | Bug |
WORDSNET-22356 | Object reference not set to an instance of an object | Bug |
WORDSNET-22374 | File Corrupted Exception occurs upon loading a RTF Document | Bug |
WORDSNET-20964 | DOCX to PDF conversion issue with chart rendering | Bug |
WORDSNET-22358 | System.NullReferenceException occurs upon DOC to PDF conversion | Bug |
WORDSNET-22386 | Document is corrupted exception thrown while loading DOC | Bug |
WORDSNET-22293 | HTML to TXT conversion issue with table layout | Bug |
WORDSNET-22364 | Document word find and replace issue | Bug |
WORDSNET-21454 | FileCorruptedException is thrown during import HTML file | Bug |
WORDSNET-21137 | Image is not loaded from HTML if the “src” attribute value has leading or trailing whitespace characters | Bug |
WORDSNET-22333 | Paragraph border is rendered at different position when ExportDocumentStructure is used | Bug |
WORDSNET-18137 | Document.UpdateFields does not update TEMPLATE field | Bug |
WORDSNET-22360 | Field.Update does not update FieldHyperlink and show | Bug |
WORDSNET-22332 | Bullet symbol is lost after DOCX to PDF conversion | Bug |
WORDSNET-22355 | Incorrectly read SPRM:D62F. Expected 0, but read 11 bytes. | Bug |
WORDSNET-20958 | Page number in the footer is wrong in output TIFF | Bug |
WORDSNET-21944 | ODT to HTML | Frame’s bottom border is missing | Bug |
WORDSNET-22285 | Exception is thrown when loading MOBI file | Bug |
WORDSNET-22231 | Negative letter-spacing after conversion from PDF to HTML | Bug |
WORDSNET-21789 | Aspose.Words.FileCorruptedException error when converting MHTML | Bug |
WORDSNET-22233 | System.NullReferenceException is thrown when DOC is saved to PDF | Bug |
WORDSNET-20402 | HTML export issues | Bug |
WORDSNET-22152 | Tab stop in a list item gets considerably wider after conversion to HTML | Bug |
WORDSNET-21796 | DOCX to PDF/A conversion and validation fails: Several cases with header cells that are not tagged | Bug |
WORDSNET-21947 | DOCX to PDF/A conversion: accessibility validation fails: Bullet list items are broken into several tags | Bug |
WORDSNET-14245 | Document.Compare generates incorrect format revisions | Bug |
WORDSNET-22297 | Extra Text becomes Visible in PDF | Bug |
WORDSNET-22191 | Problem with nested tables in RTF content | Bug |
WORDSNET-21035 | Incorrect rendering of Clustered Column Type Chart in PDF | Bug |
WORDSNET-18229 | The title of the horizontal axis overlaps the axis labels | Bug |
WORDSNET-20135 | The Units are no longer aligned with the tick marks in Chart when rendered to PDF | Bug |
WORDSNET-17543 | Document.UpdateFields leaves INCLUDETEXT field with “Error! Not a valid filename.” | Bug |
WORDSNET-22262 | Track changes - Comments are shown in the outline | Bug |
WORDSNET-22083 | Accessibility issues are appeared after DOCX to PDF conversion | Bug |
WORDSNET-22242 | Accessibility tags not behaving properly in Aspose PDF compared to Acrobat/Word PDF | Bug |
WORDSNET-22190 | Table header tags are not exported after DOCX to PDF/a conversion | Bug |
WORDSNET-21161 | Table tag structure is incorrect after DOCX to PDF conversion | Bug |
WORDSNET-22223 | Transparent PNG image became non-transparent after DOCX to PDF conversion | Bug |
WORDSNET-21846 | DOCX to PDF (PdfA1a) conversion issue with transparent image rendering | Bug |
WORDSNET-17086 | PDF version support // Text effects are lost in PDFA_1B output | Bug |
WORDSNET-16968 | Saving to PDF with PdfCompliance.PdfA1a results in a large file | Bug |
WORDSNET-21123 | SmartArt drawing corruption during open save a DOCX | Bug |
WORDSNET-21127 | Images and and text content are changed after re-saving DOCX at Windows Server 2019 | Bug |
WORDSNET-17657 | Document.UpdateFields does not process FieldStyleRef.SuppressNonDelimiters | Bug |
Public API and Backward Incompatible Changes
This section lists public API changes that were introduced in Aspose.Words 21.7. It includes not only new and obsoleted public methods, but also a description of any changes in the behavior behind the scenes in Aspose.Words which may affect existing code. Any behavior introduced that could be seen as regression and modifies the existing behavior is especially important and is documented here.
Added new public property IsDecorative
Related issue: WORDSNET-22397
A new public property IsDecorative has been added to ShapeBase class.
/// <summary>
/// Gets or sets the flag that specifies whether the shape is decorative.
/// </summary>
/// <remarks>
/// Note that shape having not empty <see cref="ShapeBase.AlternativeText"/> cannot be decorative.
/// </remarks>
public bool IsDecorative { get; set; }
Use Case: Explains how to use IsDecorative property.
Document doc = new Document("input.docx");
Shape shape = doc.FirstSection.Body.Shapes[0];
shape.IsDecorative = true;
doc.Save("output.docx");
Implemented rendering to PDF/A-2 format
New values added to PdfCompliance enum.
public enum PdfCompliance
{
/// <summary>
/// The output file will comply with the PDF/A-2a standard.
/// This level includes all the requirements of PDF/A-2u and additionally requires
/// that document structure be included (also known as being "tagged"),
/// with the objective of ensuring that document content can be searched and repurposed.
/// </summary>
/// <remarks>
/// Note that exporting the document structure significantly increases the memory consumption, especially
/// for the large documents.
/// </remarks>
PdfA2a,
/// <summary>
/// The output file will comply with the PDF/A-2u standard.
/// PDF/A-2u has the objective of preserving document static visual appearance over time, independent of the tools
/// and systems used for creating, storing or rendering the files. Additionally any text contained in the document
/// can be reliably extracted as a series of Unicode codepoints.
/// </summary>
PdfA2u
}
PDF/A-2 is based on PDF-1.7 format and removes significant limitations of PDF/A-1 like prohibited transparency and prohibited object compression.
There are only PDF/A-2a and PDF/A-2u conformance levels and there are not PDF/A-2b because Aspose.Words regular output already conforms to PDF/A-2u level which is more strict than PDF/A-2b.
Please note that PDF/A-2 standard adds two new requirements to the content of the document in addition to the requirements of PDF/A-1:
- Both PDF/A-2a and PDF/A-2u standards prohibits the usage of .notdef glyph
- PDF/A-2a prohibits the usage of Unicode PUA codepoints. They are commonly used in symbolic fonts like “Symbol”, “Wingdings”, etc.
PdfCompliance.PdfA1a and PdfCompliance.PdfA1b are marked as obsolete. Aspose.Words PDF/A-2 output do not have limitations of PDF/A-1 output described above and is more consistent. So there is no reason to keep saving to PDF/A-1. However if you have requirements to save particularly to PDF/A-1 please report it and we may reconsider this decision.
Several PdfSaveOptions made prohibited when saving to PDF/A-1 and PDF/A-2
public class PdfSaveOptions
{
/// <summary>
/// Specifies whether to preserve Microsoft Word form fields as form fields in PDF or convert them to text.
/// Default is <c>false</c>.
/// </summary>
/// <remarks>
...
/// <para>This value is ignored when saving to PDF/A compliance because editable forms are prohibited.</para>
/// </remarks>
public bool PreserveFormFields;
/// <summary>
/// Gets or sets a value determining the way <see cref="Document.CustomDocumentProperties"/> are exported to PDF file.
/// </summary>
/// <remarks>
...
/// <para><see cref="PdfCustomPropertiesExport.Metadata"/> value is not supported when saving to PDF/A.</para>
/// </remarks>
public PdfCustomPropertiesExport CustomPropertiesExport;
/// <summary>
/// Gets or sets a value determining whether hyperlinks in the output Pdf document
/// are forced to be opened in a new window (or tab) of a browser.
/// </summary>
/// <remarks>
...
/// <para>Settings this value to <c>true</c> is not allowed when saving to PDF/A because JavaScript actions are prohibited.</para>
/// </remarks>
public bool OpenHyperlinksInNewWindow;
/// <summary>
/// Specifies how the color space will be selected for the images in PDF document.
/// </summary>
/// <remarks>
...
/// <para><see cref="PdfImageColorSpaceExportMode.SimpleCmyk"/> value is not supported when saving to PDF/A.</para>
/// </remarks>
public PdfImageColorSpaceExportMode ImageColorSpaceExportMode;
/// <summary>
/// A flag indicating whether image interpolation shall be performed by a conforming reader.
/// When <c>false</c> is specified, the flag is not written to the output document and
/// the default behaviour of reader is used instead.
/// </summary>
/// <remarks>
...
/// <para>Settings this value to <c>true</c> is not allowed when saving to PDF/A according to compliance requirements.</para>
/// </remarks>
public bool InterpolateImages;
}
A typo in a new enum name introduced in 21.6 corrected
“Aspose.Words.Layout.ContinuosSectionRestart” enum introduced in 21.6 was renamed to ContinuousSectionRestart in order to correct a typo.
Implemented Frameset API
Related issue: WORDSNET-17557
Implemented a new Frameset API for accessing and updating a frame URL. It is available through the new Frameset property of a document.
namespace Aspose.Words
{
public class Document
{
...
/// <summary>
/// Returns a <see cref="Frameset"/> instance if this document represents a frames page.
/// </summary>
/// <remarks>
/// If the document is not framed, the property has the <b>null</b> value.
/// </remarks>
public Frameset Frameset { get; }
}
}
namespace Aspose.Words.Framesets
{
/// <summary>
/// Represents a frames page or a single frame on a frames page.
/// </summary>
/// <remarks>
/// If the <see cref="ChildFramesets"/> property contains items, this instance is a frames page, otherwise it is
/// a single frame.
/// </remarks>
public class Frameset
{
/// <summary>
/// Gets or sets the web page URL or document file name to display in this frame.
/// </summary>
public string FrameDefaultUrl { get; set; }
/// <summary>
/// Gets or sets a value indicating whether the web page or document file name specified in the
/// <see cref="FrameDefaultUrl"/> property is an external resource the frame is linked with.
/// </summary>
public bool IsFrameLinkToFile { get; set; }
/// <summary>
/// Gets the collection of child frames and frames pages.
/// </summary>
public FramesetCollection ChildFramesets { get; }
}
/// <summary>
/// Represents a collection of instances of the <see cref="Frameset"/> class.
/// </summary>
public class FramesetCollection : IEnumerable<Frameset>
{
/// <summary>
/// Gets the number of frames/frames pages contained in the collection.
/// </summary>
public int Count { get; }
/// <summary>
/// Gets a frame or frames page at the specified index.
/// </summary>
public Frameset this[int index] { get; }
}
}
Use Case:
Document doc = new Document("input.docx");
doc.Frameset.ChildFramesets[0].FrameDefaultUrl = "https://www.aspose.com";
doc.Frameset.ChildFramesets[0].IsFrameLinkToFile = true;
doc.Save("output.docx");
Introduced a new overload of DocumentBuilder.InsertHtml and a new enumeration HtmlInsertOptions
Related issue: WORDSNET-18852
The following new overload of DocumentBuilder.InsertHtml has been implemented:
/// <summary>
/// Inserts an HTML string into the document. Allows to specify additional options.
/// </summary>
/// <param name="html">An HTML string to insert into the document.</param>
/// <param name="options">Options that are used when HTML string is inserted.</param>
/// <remarks>
/// You can use this method to insert an HTML fragment or whole HTML document.
/// </remarks>
public void InsertHtml(string html, HtmlInsertOptions options)
And the following public enumeration has been introduced:
/// <summary>
/// Specifies options for the <see cref="DocumentBuilder.InsertHtml(string, HtmlInsertOptions)"/> method.
/// </summary>
[Flags]
public enum HtmlInsertOptions
{
/// <summary>
/// Use the default options when inserting HTML.
/// </summary>
None = 0,
/// <summary>
/// Use font and paragraph formatting specified in <see cref="DocumentBuilder"/> as base formatting for text
/// inserted from HTML.
/// </summary>
/// <remarks>
/// <para>
/// If this option is not specified, formatting of <see cref="DocumentBuilder"/> is ignored and text is inserted
/// with default HTML formatting. As a result, the text looks as it is rendered in browsers.
/// </para>
/// <para>
/// If this option is specified, formatting of inserted text is based on formatting specified in
/// <see cref="DocumentBuilder"/>, and the text looks as if it were inserted using <see cref="DocumentBuilder.Write"/>.
/// </para>
/// </remarks>
UseBuilderFormatting = 1,
/// <summary>
/// Remove the empty paragraph that is normally inserted after HTML that ends with a block-level element.
/// </summary>
/// <remarks>
/// By default, <see cref="DocumentBuilder"/> makes sure that the last block-level element imported from HTML
/// is closed after import and inserts a paragraph break after the element. This paragraph break separates
/// content imported from HTML from content of the template document. However, if a HTML fragment is inserted into
/// an empty paragraph, that paragraph break will create an extra empty paragraph. If this behavior is undesired,
/// specify this option.
/// </remarks>
RemoveLastEmptyParagraph = 2
}
The old DocumentBuilder.InsertHtml overloads are now aliases for the new overload of DocumentBuilder.InsertHtml as follows:
builder.InsertHtml(html);
// Is equivalent to:
builder.InsertHtml(html, HtmlInsertOptions.None);
builder.InsertHtml(html, false);
// Is equivalent to:
builder.InsertHtml(html, HtmlInsertOptions.None);
builder.InsertHtml(html, true);
// Is equivalent to:
builder.InsertHtml(html, HtmlInsertOptions.UseBuilderFormatting);
Use Case: When DocumentBuilder.InsertHtml inserts a HTML fragment that ends with a block-level HTML element (for example, a paragraph or a list), it normally closes that block-level element and inserts a paragraph break. As a result, a new empty paragraph appears after inserted document. This behavior may be undesired when HTML fragments are inserted into a template document. For example, consider the following mail merge scenario.
// Default behavior.
builder.MoveToMergeField("NAME");
builder.InsertHtml("<p>John Smith</p>", true);
builder.MoveToMergeField("EMAIL");
builder.InsertHtml("<p>jsmith@example.com</p>", true);
In the resulting document, there will be an extra empty paragraph after each inserted HTML paragraph. However, if we specify HtmlInsertOptions.RemoveLastEmptyParagraph, those extra empty paragraphs will be removed.
// RemoveLastEmptyParagraph is specified.
builder.MoveToMergeField("NAME");
builder.InsertHtml("<p>John Smith</p>", HtmlInsertOptions.UseBuilderFormatting | HtmlInsertOptions.RemoveLastEmptyParagraph);
builder.MoveToMergeField("EMAIL");
builder.InsertHtml("<p>jsmith@example.com</p>", HtmlInsertOptions.UseBuilderFormatting | HtmlInsertOptions.RemoveLastEmptyParagraph);
Inside the resulting document the empty paragraphs will be removed.
Introduced FieldOptions.TemplateName property
Related issue: WORDSNET-18137
As a part of implementing WORDSNET-18137, we have introduced the FieldOptions.TemplateName property which is used to specify template file name for the TEMPLATE field:
/// <summary>
/// Gets or sets the file name of the template used by the document.
/// </summary>
/// <remarks>
/// <p>This property is used by the <see cref="FieldTemplate"/> field if the <see cref="Document.AttachedTemplate"/> property is empty.</p>
/// <p>If this property is empty the default template file name <c>Normal.dotm</c> is used.</p>
/// </remarks>
public string TemplateName { get; set; }
Use Case:
document.FieldOptions.TemplateName = @"C:\Users\AW\AppData\Roaming\Microsoft\Templates\Normal.dotm";
Public API for working with patterns has been introduced
Related issue: WORDSNET-22279
The following new public methods were added into the Fill class:
/// <summary>
/// Sets the specified fill to a pattern.
/// <param name="patternType"><see cref="Drawing.PatternType"/></param>
/// </summary>
public void Patterned(PatternType patternType)
/// <summary>
/// Sets the specified fill to a pattern.
/// <param name="patternType"><see cref="Drawing.PatternType"/></param>
/// <param name="foreColor">The color of the foreground fill.</param>
/// <param name="backColor">The color of the background fill.</param>
/// </summary>
public void Patterned(PatternType patternType, Color foreColor, Color backColor)
A new public property Fill.Pattern has been added:
/// <summary>
/// Gets a <see cref="Drawing.PatternType"/> for the fill.
/// </summary>
public PatternType Pattern { get; }
A new public enum has been introduced:
/// <summary>
/// Specifies the fill pattern to be used to fill a shape.
/// </summary>
public enum PatternType
Use Case: Explains how to get and apply a pattern to a fill.
// Open some document with a shape.
Document doc = new Document("DocWithShape.docx");
// Get Fill object for the first shape.
Fill fill = doc.FirstSection.Body.Shapes[0].Fill;
// Check Fill Pattern value.
Console.WriteLine("Pattern value is: {0}", fill.Pattern);
// Apply DiagonalBrick pattern to the shape fill.
fill.Patterned(PatternType.DiagonalBrick);
doc.Save("DiagonalBrick.docx");