Image recognition settings
Aspose.OCR for Java lets you customize recognition accuracy, performance, and other parameters by calling the methods of the RecognitionSettings object.
These settings are applicable when extracting text from single-page raster images in JPEG, PNG, TIFF, BMP, and GIF formats.
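If your application accepts arbitrary files, you may want to check that an input is one of these raster formats before you configure recognition. The sketch below is a hypothetical helper (SupportedFormats.isSupportedImage is not part of Aspose.OCR). It assumes the usual file extensions for these formats and only inspects the file name. It does not verify the file contents or whether the image has a single page.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Aspose.OCR: checks whether a file name
// ends with an extension of one of the raster formats listed above.
class SupportedFormats {
    // Assumed extension spellings for JPEG, PNG, TIFF, BMP, and GIF.
    private static final Set<String> EXTENSIONS =
            Set.of("jpg", "jpeg", "png", "tif", "tiff", "bmp", "gif");

    static boolean isSupportedImage(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension to check
        }
        // Compare case-insensitively so "scan.TIFF" is accepted.
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(ext);
    }

    public static void main(String[] args) {
        System.out.println(isSupportedImage("source.png")); // true
        System.out.println(isSupportedImage("scan.TIFF"));  // true
        System.out.println(isSupportedImage("report.pdf")); // false
    }
}
```

An extension check is cheap but can be fooled by a misnamed file, so treat it as a first filter rather than proof of the format.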
| Method | Parameter | Default state | Description |
|---|---|---|---|
| setAllowedCharacters | Case-sensitive string of characters, or one of the predefined character sets (CharactersAllowedType values such as LATIN_ALPHABET) | All characters of the selected recognition language | Whitelist of characters that the Aspose.OCR engine will look for. |
| setAutoContrast | boolean | Disabled | Automatically increase image contrast before recognition. |
| setAutoDenoising | boolean | Disabled | Automatically remove noise from images before recognition. |
| setAutoSkew | boolean | Enabled | Automatically correct image tilt (deskew) before recognition. |
| setDetectAreas | boolean | Enabled | Automatically select the areas detection algorithm best suited to the most common use cases. |
| setDetectAreasMode | DetectAreasMode | Automatic | Manually override the default document areas detection method. |
| setIgnoredCharacters | Case-sensitive string of characters | All characters are recognized | Blacklist of characters that are ignored during recognition. |
| setLanguage | Recognition language | Extended Latin characters, including diacritics | Language used for recognition. |
| setLinesFiltration | boolean | Enabled | Set to true to recognize text in tables. Set to false to improve performance by ignoring table structures and treating tables as plain text. |
| setPreprocessingFilters | Image preprocessing filter | None | Image processing filters that enhance an image before it is sent to the OCR engine. |
| setRecognitionAreas | `ArrayList<Rectangle>` | Entire image | List of image areas from which to extract text. |
| setRecognizeSingleLine | boolean | Disabled | Recognize a single-line image. Disables automatic document region detection and improves recognition performance on simple images. |
| setSkew | Skew angle, double | 0 | Manually rotate the image by the specified number of degrees. |
| setThreadsCount | Number of threads, int | Automatic | Number of CPU threads used for recognition. |
| setThresholdValue | Binarization threshold, int | Automatic | Manually set the binarization threshold instead of using the automatic one. |
| setUpscaleSmallFont | boolean | Disabled | Improve recognition of small fonts and detection of dense lines. |
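setRecognitionAreas takes an ArrayList of rectangles describing the parts of the image to read. The sketch below assumes these are java.awt.Rectangle instances in pixel coordinates, with x and y giving the top-left corner. It builds such a list and clips each area to the image bounds so that no region extends past the edge of the image. The RecognitionAreas.clipToImage helper and the sample coordinates are hypothetical. Pass the resulting list to setRecognitionAreas in your own code.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prepares a list of areas for setRecognitionAreas,
// assuming java.awt.Rectangle in pixel coordinates (x, y = top-left corner).
class RecognitionAreas {
    static ArrayList<Rectangle> clipToImage(List<Rectangle> areas, int width, int height) {
        Rectangle bounds = new Rectangle(0, 0, width, height);
        ArrayList<Rectangle> clipped = new ArrayList<>();
        for (Rectangle area : areas) {
            // Keep only the part of the area that lies inside the image.
            Rectangle r = area.intersection(bounds);
            // isEmpty() is true for a zero or negative width/height, i.e. no overlap.
            if (!r.isEmpty()) {
                clipped.add(r);
            }
        }
        return clipped;
    }

    public static void main(String[] args) {
        List<Rectangle> areas = List.of(
                new Rectangle(0, 0, 800, 120),     // page header
                new Rectangle(50, 150, 700, 1200), // body text, runs past the bottom edge
                new Rectangle(900, 0, 100, 100));  // entirely outside an 800-px-wide image
        ArrayList<Rectangle> result = clipToImage(areas, 800, 1000);
        System.out.println(result.size());  // 2
        System.out.println(result.get(1));  // java.awt.Rectangle[x=50,y=150,width=700,height=850]
    }
}
```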
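The setThresholdValue row refers to binarization, the step that converts a grayscale image to pure black and white before recognition. To illustrate what a manual threshold controls, the sketch below applies a single global threshold to 8-bit grayscale values. Pixels darker than the threshold become black (0), and all other pixels become white (255). The 0 to 255 range, the comparison direction, and the Binarize class are assumptions made for illustration. This is not how Aspose.OCR computes its automatic threshold, and the valid range for setThresholdValue is not stated here.

```java
import java.util.Arrays;

// Illustrative global thresholding on 8-bit grayscale values (0 = black, 255 = white).
// Hypothetical class; assumes pixels below the threshold are treated as ink.
class Binarize {
    static int[][] threshold(int[][] gray, int t) {
        int[][] out = new int[gray.length][];
        for (int y = 0; y < gray.length; y++) {
            out[y] = new int[gray[y].length];
            for (int x = 0; x < gray[y].length; x++) {
                // Darker than the threshold -> black; otherwise -> white.
                out[y][x] = gray[y][x] < t ? 0 : 255;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] gray = {
            {30, 120, 200},
            {90, 160, 250}
        };
        // A higher threshold treats more mid-gray pixels as ink.
        System.out.println(Arrays.deepToString(threshold(gray, 100))); // [[0, 255, 255], [0, 255, 255]]
        System.out.println(Arrays.deepToString(threshold(gray, 170))); // [[0, 0, 255], [0, 0, 255]]
    }
}
```

A threshold that is too low can drop faint strokes, while one that is too high can merge background noise into the text. This trade-off is why overriding the automatic value is usually reserved for images where it performs poorly.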
Example
The following code example shows how to fine-tune recognition:
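```java
import com.aspose.ocr.*; // Aspose.OCR for Java classes (AsposeOCR, RecognitionSettings, ...)
public class RecognitionSettingsExample {
    public static void main(String[] args) throws Exception {
        // Create instance of OCR API
        AsposeOCR api = new AsposeOCR();
        // Specify recognition settings
        RecognitionSettings recognitionSettings = new RecognitionSettings();
        // Only look for Latin letters
        recognitionSettings.setAllowedCharacters(CharactersAllowedType.LATIN_ALPHABET);
        // Remove noise from the image before recognition
        recognitionSettings.setAutoDenoising(true);
        // Override automatic areas detection with the DOCUMENT mode
        recognitionSettings.setDetectAreasMode(DetectAreasMode.DOCUMENT);
        // Rotate the image by 90 degrees before recognition
        recognitionSettings.setSkew(90);
        // Extract text from image
        RecognitionResult result = api.RecognizePage("source.png", recognitionSettings);
        System.out.println("Recognition result:\n" + result.recognitionText + "\n\n");
    }
}
```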