Recognition
Aspose.OCR for Java can extract text from a wide variety of file formats and media sources.
Extracting text from images
- Extracting text from an image
Reading text from raster images in JPEG, PNG, WBMP, BMP, and GIF formats.
- Extracting text from multi-page TIFF
Reading text from multi-page TIFF images.
- Extracting text from pixel array
Reading text from images provided as an array of pixels.
- Fast recognition
Reading images in the fastest recognition mode, which consumes the fewest resources.
- Recognizing single line
Reading text from images containing a single line of text.
- Extracting text from receipts
Digitizing scanned receipts without manual retyping.
Extracting text from documents
- Extracting text from PDF document
Reading text from a PDF document that consists of scanned images without searchable text.
Extracting text from alternative media
- Batch recognition
Reading text from a list of raster images, a folder, or a ZIP archive.
- Extracting text from URL
Reading text from raster images hosted on websites.
Identifying recognition problems
Non-fatal recognition errors are stored as a list of strings in the warnings property of the recognition result.
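import com.aspose.ocr.AsposeOCR;
import com.aspose.ocr.Language;
import com.aspose.ocr.RecognitionResult;
import com.aspose.ocr.RecognitionSettings;

// Create the OCR engine and select the recognition language
AsposeOCR api = new AsposeOCR();
RecognitionSettings recognitionSettings = new RecognitionSettings();
recognitionSettings.setLanguage(Language.Ukr);
// Recognize the image
RecognitionResult result = api.RecognizePage("source.png", recognitionSettings);
// Show recognition errors
result.warnings.forEach(w -> System.out.println(w));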
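Because the warnings are plain strings, they can also be collected into a single report instead of being printed one by one. The following is a minimal sketch of that idea, not part of the Aspose.OCR API: the WarningReport class and its summarize method are hypothetical names introduced for illustration, and the sample strings are made-up placeholders rather than real library messages. It assumes the warnings property can be passed wherever a List<String> is expected.

```java
import java.util.List;

public class WarningReport {
    // Hypothetical helper (not an Aspose.OCR method): turns a list of
    // warning strings into one numbered, human-readable report.
    public static String summarize(List<String> warnings) {
        if (warnings == null || warnings.isEmpty()) {
            return "No recognition warnings.";
        }
        StringBuilder report = new StringBuilder();
        report.append(warnings.size()).append(" recognition warning(s):");
        for (int i = 0; i < warnings.size(); i++) {
            // Number the warnings starting from 1, one per line
            report.append("\n").append(i + 1).append(". ").append(warnings.get(i));
        }
        return report.toString();
    }

    public static void main(String[] args) {
        // Placeholder data standing in for result.warnings (assumption:
        // the property is usable as a List<String>).
        List<String> sample = List.of("placeholder warning A", "placeholder warning B");
        System.out.println(summarize(sample));
    }
}
```

In an application, the call would presumably look like WarningReport.summarize(result.warnings), with the returned text written to a log or shown to the user next to the recognized text.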