Defining the whitelist of characters
Restricting recognition to a subset of characters, instead of the full character set of the selected language, can greatly improve recognition accuracy, increase speed, and reduce resource consumption. A list of characters can also be identified automatically from an image using the built-in Aspose.OCR mechanisms.
Predefined character sets
To restrict recognition to a predefined set of characters, set the allowed_characters property of the recognition settings to one of the following values:
Subset | Action |
---|---|
characters_allowed_type::ALL | Try to recognize all characters. |
characters_allowed_type::LATIN_ALPHABET | Only recognize Latin / English text (A to Z and a to z), without accented characters. |
characters_allowed_type::DIGITS | Only recognize binary, octal, decimal, or hexadecimal numbers (0 to 9 and A to F). |
Characters that do not match the provided subset are ignored.
std::string image_path = "source.png";
// Buffer that receives the recognized text
const size_t len = 4096;
wchar_t buffer[len] = { 0 };
// Restrict recognition to the DIGITS subset
RecognitionSettings settings;
settings.allowed_characters = characters_allowed_type::DIGITS;
// Recognize the image and print the result
size_t res_len = aspose::ocr::page_settings(image_path.c_str(), buffer, len, settings);
std::wcout << buffer;
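When a subset is in effect, it can be useful to sanity-check the recognized text before passing it on. The following is a minimal sketch of such a check in plain C++. It does not call Aspose.OCR. The helper `first_unexpected` and the constant `digits_subset` are hypothetical names introduced here, and the sketch assumes the output may contain spaces and line breaks in addition to the characters listed for DIGITS in the table above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Aspose.OCR.
// Returns the index of the first character of `text` that is neither in
// `allowed` nor in `separators`, or std::wstring::npos if every character passes.
std::size_t first_unexpected(const std::wstring& text,
                             const std::wstring& allowed,
                             const std::wstring& separators = L" \t\r\n") {
    assert(!allowed.empty() && "the allowed set must not be empty");
    return text.find_first_not_of(allowed + separators);
}

// The DIGITS subset as the table describes it: 0 to 9 and A to F.
// Treating whitespace as acceptable is an assumption of this sketch.
const std::wstring digits_subset = L"0123456789ABCDEF";
```

After the recognition call, `first_unexpected(buffer, digits_subset)` would be expected to return `std::wstring::npos`. Any other value points at a character worth inspecting.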
Custom characters list
You can specify your own list of characters to be recognized. To define the exact set of characters Aspose.OCR engine will look for, use one of the following methods:
page_abc(), page_abc_all(), page_abc_from_raw_bytes(), page_abc_all_from_raw_bytes(), page_rect_abc(), page_rect_abc_from_raw_bytes(), line_abc(), line_abc_from_raw_bytes().
The characters are provided as a case-sensitive string. Characters that do not match the provided list are ignored.
std::string image_path = "source.png";
const size_t len = 4096;
wchar_t buffer[len] = { 0 };
size_t size = aspose::ocr::page_abc(image_path.c_str(), buffer, len, L"AÁBCDEÉFG12345");
std::wcout << buffer << L"\n";
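For longer character lists, typing the string by hand is error-prone. Below is a small sketch that builds the list from character ranges instead. `append_range` and `make_alphabet` are hypothetical helpers, not part of Aspose.OCR, and the choice of digits plus uppercase letters is only an illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of Aspose.OCR.
// Appends every character in the inclusive range [first, last] to `alphabet`.
void append_range(std::wstring& alphabet, wchar_t first, wchar_t last) {
    assert(first <= last && "range bounds are in the wrong order");
    for (wchar_t c = first; ; ++c) {
        alphabet.push_back(c);
        if (c == last) break;  // stop before incrementing past `last`
    }
}

// Builds a list of decimal digits, uppercase Latin letters, and any extra
// symbols supplied by the caller (for example punctuation or accented letters).
std::wstring make_alphabet(const std::wstring& extra) {
    std::wstring alphabet;
    append_range(alphabet, L'0', L'9');
    append_range(alphabet, L'A', L'Z');
    alphabet += extra;
    return alphabet;
}
```

Assuming the last parameter of `page_abc()` accepts a null-terminated wide string, as the literal in the example above suggests, the result could be passed as `make_alphabet(L"-/").c_str()`. Because the list is case-sensitive, lowercase letters would be ignored unless they are added explicitly.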