Latest release (March 2023)

Deprecation warning

What was changed

Key | Summary | Category
OCRJAVA‑314, OCRJAVA‑315 | A slimmer, faster, and more straightforward API has been introduced. See Added public APIs for details. | New feature
OCRJAVA‑314 | Most of the existing API classes and methods have been marked as deprecated to remind you to update your existing code. They remain functional but will be removed in release 23.11.0 (November 2023) in favor of the new API introduced in this release. See Deprecated APIs for details. | Enhancement

Public API changes and backwards compatibility

This section lists all public API changes introduced in Aspose.OCR for Java 23.3.0 that may affect the code of existing applications.

Added public APIs:

The following public APIs have been introduced in this release:

Aspose.OCR.OcrInput class

The universal class for providing any type of data (images, PDF documents, archives, folders, streams, arrays, and so on) to the new image processing and recognition methods.

Aspose.OCR.AsposeOcr.Recognize method

Recognize one or more files provided as an OcrInput object. It is a universal replacement for the following recognition methods:

Method | Action
RecognizePage | Extract text from a raster image provided as a file, memory stream, or pixel array.
RecognizePdf | Extract text from a PDF document.
RecognizeTiff | Extract text from a multi-page TIFF image.
RecognizePageFromUri | Recognize a file hosted on a website without downloading it to the computer.
RecognizeMultiplePages | Batch recognition.

Aspose.OCR.AsposeOcr.DetectRectangles method

Find the areas of images that contain text. It is an extended replacement for the getTextAreas method.

Aspose.OCR.ImageProcessing class

Adjust one or more files to improve the accuracy and reliability of OCR. This class provides extended replacements for the Aspose.OCR.AsposeOcr.PreprocessImage method:

Method | Action
Save(OcrInput images, string folderPath) | Saves processed images to a folder. Replaces the PreprocessImage method.
Render(OcrInput images) | Processes files and returns an OcrInput object with adjusted images that can be passed to recognition methods.

Aspose.OCR.AsposeOcr.CalculateSkew method

Detect the skew angles of the provided images. It is a universal replacement for the following methods:

Method | Action
CalcSkewImage | Detect the skew angle of an image.
CalcSkewImageFromUri | Detect the skew angle of an image hosted on a website without downloading it to the computer.

AsposeOCRPdf.GetImages method

Extract individual images from a PDF document.

Updated public APIs:

No changes.

Removed public APIs:

No changes.

Deprecated APIs

The following public APIs have been marked as deprecated and will be removed in the 23.11.0 (November 2023) release:

CalcSkewImage method

Replaced with Aspose.OCR.AsposeOcr.CalculateSkew method.

CalcSkewImageFromUri method

Replaced with Aspose.OCR.AsposeOcr.CalculateSkew method.

PreprocessImage method

Replaced with Render and Save methods of Aspose.OCR.ImageProcessing class.

RecognizePage method

Replaced with Aspose.OCR.AsposeOcr.Recognize method.

RecognizePageFromUri method

Replaced with Aspose.OCR.AsposeOcr.Recognize method.

RecognizeMultiplePages method

Replaced with Aspose.OCR.AsposeOcr.Recognize method.

RecognizePdf method

Replaced with Aspose.OCR.AsposeOcr.Recognize method.

RecognizeTiff method

Replaced with Aspose.OCR.AsposeOcr.Recognize method.

RecognizeLine method

Replaced with Aspose.OCR.AsposeOcr.Recognize method.

getTextAreas method

Replaced with Aspose.OCR.AsposeOcr.DetectRectangles method.

AsposeOcr(string alphabet) constructor

No longer required. Define the list of allowed characters through the setAllowedCharacters method of the recognition settings instead.

RecognitionSettings.setDetectAreas method

No longer required. Disable document area detection or override the default detection mode with the setDetectAreasMode method of the recognition settings instead.

RecognitionSettings.setAutoSkew method

No longer required. Enable the automatic skew correction image processing filter instead.

RecognitionSettings.setSkew method

No longer required. Specify image rotation angle in image processing filters instead.

RecognitionSettings.setThresholdValue method

No longer required. Specify binarization threshold in image processing filters instead.

RecognitionSettings.setAutoContrast method

No longer required. Enable or disable automatic contrast adjustments in image processing filters instead.

RecognitionSettings.setAutoDenoising method

No longer required. Enable or disable automatic noise removal in image processing filters instead.
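
When auditing existing code against the list above, a plain text scan can flag direct calls to the deprecated methods. The following is a minimal sketch that uses only the Java standard library. The DeprecatedApiScanner class and its scan method are hypothetical helpers written for this illustration and are not part of Aspose.OCR. The replacement hints are copied from the list above, and the scan only catches calls made by name; it does not cover the deprecated AsposeOcr(string alphabet) constructor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Aspose.OCR.
class DeprecatedApiScanner {
    // Deprecated method name -> replacement hint, taken from the deprecation list.
    static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("CalcSkewImage", "AsposeOcr.CalculateSkew");
        REPLACEMENTS.put("CalcSkewImageFromUri", "AsposeOcr.CalculateSkew");
        REPLACEMENTS.put("PreprocessImage", "ImageProcessing.Render / ImageProcessing.Save");
        REPLACEMENTS.put("RecognizePage", "AsposeOcr.Recognize");
        REPLACEMENTS.put("RecognizePageFromUri", "AsposeOcr.Recognize");
        REPLACEMENTS.put("RecognizeMultiplePages", "AsposeOcr.Recognize");
        REPLACEMENTS.put("RecognizePdf", "AsposeOcr.Recognize");
        REPLACEMENTS.put("RecognizeTiff", "AsposeOcr.Recognize");
        REPLACEMENTS.put("RecognizeLine", "AsposeOcr.Recognize");
        REPLACEMENTS.put("getTextAreas", "AsposeOcr.DetectRectangles");
        REPLACEMENTS.put("setDetectAreas", "setDetectAreasMode");
        REPLACEMENTS.put("setAutoSkew", "automatic skew correction filter");
        REPLACEMENTS.put("setSkew", "rotation angle in image processing filters");
        REPLACEMENTS.put("setThresholdValue", "binarization threshold in image processing filters");
        REPLACEMENTS.put("setAutoContrast", "contrast adjustment in image processing filters");
        REPLACEMENTS.put("setAutoDenoising", "noise removal in image processing filters");
    }

    // Returns one "name -> replacement" entry per deprecated method called in the source text.
    static List<String> scan(String source) {
        List<String> findings = new ArrayList<>();
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            // Word boundary plus "(" so that RecognizePage does not match RecognizePageFromUri.
            Pattern call = Pattern.compile("\\b" + Pattern.quote(entry.getKey()) + "\\s*\\(");
            if (call.matcher(source).find()) {
                findings.add(entry.getKey() + " -> " + entry.getValue());
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        // Sample legacy snippet; in practice the text would be read from source files.
        String legacy = "RecognitionResult r = api.RecognizePage(\"source.png\", settings);\n"
                + "settings.setAutoDenoising(true);";
        for (String finding : scan(legacy)) {
            System.out.println(finding);
        }
    }
}
```

A text scan like this is only a first pass. The compiler's deprecation warnings remain the more reliable signal, since the methods are marked as deprecated in this release.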

Usage examples

The examples below illustrate the changes introduced in this release:

Migrating to the new API

Original code (Aspose.OCR for Java 23.2.0 and below):

AsposeOCR api = new AsposeOCR();
// Correct geometric distortions
PreprocessingFilter filters = new PreprocessingFilter();
filters.add(PreprocessingFilter.AutoDewarping());
// Specify recognition settings
RecognitionSettings recognitionSettings = new RecognitionSettings();
recognitionSettings.setAutoDenoising(true);
recognitionSettings.setSkew(90);
recognitionSettings.setPreprocessingFilters(filters);
recognitionSettings.setLanguage(Language.Ukr);
// Extract text from image
RecognitionResult result = api.RecognizePage("source.png", recognitionSettings);
System.out.println("Recognition result:\n" + result.recognitionText + "\n\n");

New code (Aspose.OCR for Java 23.3.0 and above):

AsposeOCR api = new AsposeOCR();
// Configure image processing
PreprocessingFilter filters = new PreprocessingFilter();
filters.add(PreprocessingFilter.AutoDewarping());
filters.add(PreprocessingFilter.AutoDenoising());
filters.add(PreprocessingFilter.Rotate(90));
// Specify recognition settings
RecognitionSettings recognitionSettings = new RecognitionSettings();
recognitionSettings.setLanguage(Language.Ukr);
// Add an image to OcrInput object and apply processing filters
OcrInput input = new OcrInput(InputType.SingleImage, filters);
input.add("source.png");
// Extract text from any source data using a universal call
ArrayList<RecognitionResult> results = api.Recognize(input, recognitionSettings);
// Results are returned as a list, one entry per recognized image
System.out.println("Recognition result:\n" + results.get(0).recognitionText + "\n\n");

Process and save all images from PDF documents

// Set processing filters
PreprocessingFilter filters = new PreprocessingFilter();
filters.add(PreprocessingFilter.AutoDewarping());
filters.add(PreprocessingFilter.ContrastCorrection());
// Add all PDF documents to OcrInput object and apply processing filters
OcrInput input = new OcrInput(InputType.PDF, filters);
input.add("source1.pdf", 0, 3);
input.add("source2.pdf");
// Save all images from provided PDFs to the folder
ImageProcessing.Save(input, "C://images");

Detect skew angles

AsposeOCR api = new AsposeOCR();
// Add all PDF documents to OcrInput object
OcrInput input = new OcrInput(InputType.PDF);
input.add("source1.pdf", 0, 3);
input.add("source2.pdf");
// Detect skew angles
ArrayList<SkewOutput> angles = api.CalculateSkew(input);
for(SkewOutput x : angles) {
	System.out.println(x.Angle);
}