Analyzing your prompt, please hold on...
An error occurred while retrieving the results. Please refresh the page and try again.
In this article, we’ll look into the details of extracting text from a PDF file. All of these extraction features are provided at one place, in PdfExtractor class. We’ll see how to use these features in our code.
PdfExtractor class provides three types of extraction capabilities. These three categories are Text, Images and Attachments. In order to perform extraction under each of these three categories PdfExtractor provide various methods which work together to give you the final output.
For example, in order to extract text you can use three methods i.e. ExtractText, GetText, HasNextPageText and GetNextPageText. Now, in order to start extracting text, first of all, you need to call ExtractText method; this will extract the text from the PDF file and will store it into memory. After that, GetText method will take this extracted text and save on to the disk at specified location in a file. HasNextPageText helps you loop through each page and check whether the next page has any text or not. If it contains some text then GetNextPageText will help you save the text of an individual page into the file.
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ExtractText()
{
// The path to the documents directory
var dataDir = RunExamples.GetDataDir_AsposePdf_Text();
bool wholeText = true;
// Create an object of the PdfExtractor class
using (var pdfExtractor = new Aspose.Pdf.Facades.PdfExtractor())
{
// Bind PDF document
pdfExtractor.BindPdf(dataDir + "sample.pdf");
// ExtractText
pdfExtractor.ExtractText();
if (!wholeText)
{
pdfExtractor.GetText(dataDir + "sample.txt");
}
else
{
// Extract the text into separate files
int pageNumber = 1;
while (pdfExtractor.HasNextPageText())
{
pdfExtractor.GetNextPageText($"{dataDir}\\sample{pageNumber:D3}.txt");
pageNumber++;
}
}
}
}
To Extract the Text Extraction Mode use the following code:
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void ExtractTextExtractonMode()
{
// The path to the documents directory
var dataDir = RunExamples.GetDataDir_AsposePdf_Text();
bool wholeText = true;
// Create an object of the PdfExtractor class
using (var pdfExtractor = new Aspose.Pdf.Facades.PdfExtractor())
{
// Bind PDF document
pdfExtractor.BindPdf(dataDir + "ExtractTextExtractonMode.pdf");
// ExtractText
// pdfExtractor.ExtractTextMode = 0; // pure mode
pdfExtractor.ExtractTextMode = 1; // raw mode
pdfExtractor.ExtractText();
if (!wholeText)
{
pdfExtractor.GetText(dataDir + "ExtractTextExtractonMode_out.txt");
}
else
{
// Extract the text into separate files
int pageNumber = 1;
while (pdfExtractor.HasNextPageText())
{
pdfExtractor.GetNextPageText($"{dataDir}\\sample{pageNumber:D3}.txt");
pageNumber++;
}
}
}
}
Analyzing your prompt, please hold on...
An error occurred while retrieving the results. Please refresh the page and try again.