PdfDocumentProcessor.GetText(PdfDocumentArea, PdfTextExtractionOptions) Method
Retrieves text from the specified document area using the specified extraction options.
Namespace: DevExpress.Pdf
Assembly: DevExpress.Docs.v24.1.dll
NuGet Package: DevExpress.Document.Processor
Declaration
public string GetText(
    PdfDocumentArea area,
    PdfTextExtractionOptions options
)
Parameters
Name | Type | Description |
---|---|---|
area | PdfDocumentArea | The document area from which the content should be extracted. |
options | PdfTextExtractionOptions | A PdfTextExtractionOptions object that contains extraction options. |
Returns
Type | Description |
---|---|
String | The text obtained from the specified area. |
Remarks
The GetText method uses the page coordinate system. Refer to the following help topic for more details: Coordinate Systems.
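As a rough illustration of working in page coordinates, the sketch below converts a region measured in inches from the top-left corner of the crop box into left, bottom, right, and top values. It assumes that the page coordinate system places its origin at the bottom-left with the Y axis pointing up and uses units of 1/72 inch, as in default PDF user space; check the Coordinate Systems topic to confirm. `PageCoordinates` and `FromTopLeftInches` are hypothetical names and are not part of the DevExpress API.

```csharp
using System;

public static class PageCoordinates
{
    // Assumption: one page unit equals 1/72 inch (default PDF user space).
    private const double PointsPerInch = 72.0;

    // Hypothetical helper: converts a region given in inches from the
    // top-left corner of the crop box into page-coordinate values.
    // cropLeft and cropTop are the crop box's Left and Top values.
    public static (double Left, double Bottom, double Right, double Top) FromTopLeftInches(
        double cropLeft, double cropTop,
        double xInches, double yInches,
        double widthInches, double heightInches)
    {
        double left = cropLeft + xInches * PointsPerInch;
        // Y grows upward, so moving down the page subtracts from the top edge.
        double top = cropTop - yInches * PointsPerInch;
        double right = left + widthInches * PointsPerInch;
        double bottom = top - heightInches * PointsPerInch;
        return (left, bottom, right, top);
    }
}
```

For a crop box whose left edge is 0 and top edge is 792, a 3 x 2 inch region placed one inch from the top-left corner gives left 72, bottom 576, right 288, and top 720. These values can then be passed as `new PdfRectangle(r.Left, r.Bottom, r.Right, r.Top)` when building the PdfDocumentArea.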
Set the PdfTextExtractionOptions.ClipToCropBox property to false to extract content without clipping it to the crop box.
The code sample below retrieves document content from the specified area:
using System;
using DevExpress.Pdf;

using (PdfDocumentProcessor processor = new PdfDocumentProcessor())
{
    processor.LoadDocument("TextExtraction.pdf");
    PdfPage page = processor.Document.Pages[0];

    // Left third of the first page (exact when the crop box starts at x = 0).
    PdfRectangle pdfRectangle = new PdfRectangle(
        page.CropBox.Left / 3, page.CropBox.Bottom,
        page.CropBox.Right / 3, page.CropBox.Top);
    PdfDocumentArea pageArea = new PdfDocumentArea(1, pdfRectangle);

    // Extract text from the area without clipping to the crop box.
    string pageText = processor.GetText(
        pageArea, new PdfTextExtractionOptions { ClipToCropBox = false });
    Console.WriteLine(pageText);
}
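The same call can be repeated for every page. The sketch below is one possible way to do this, not an official sample. It assumes that the page number passed to PdfDocumentArea is 1-based, as the sample above suggests by pairing Pages[0] with page number 1. `AreaTextExtractor` and `GetLeftThirdText` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using DevExpress.Pdf;

public static class AreaTextExtractor
{
    // Hypothetical helper: returns the text found in the left third of each page.
    public static List<string> GetLeftThirdText(PdfDocumentProcessor processor)
    {
        var options = new PdfTextExtractionOptions { ClipToCropBox = false };
        var results = new List<string>();

        for (int i = 0; i < processor.Document.Pages.Count; i++)
        {
            PdfRectangle cropBox = processor.Document.Pages[i].CropBox;

            // Measure the third from the crop box's left edge so the area
            // stays correct when the crop box does not start at x = 0.
            double right = cropBox.Left + cropBox.Width / 3;
            var rectangle = new PdfRectangle(
                cropBox.Left, cropBox.Bottom, right, cropBox.Top);

            // Assumption: page numbers are 1-based, while Pages is 0-based.
            var area = new PdfDocumentArea(i + 1, rectangle);
            results.Add(processor.GetText(area, options));
        }
        return results;
    }
}
```

Call it on a processor that has already loaded a document, for example `AreaTextExtractor.GetLeftThirdText(processor)` after `processor.LoadDocument("TextExtraction.pdf")`. The list holds one string per page, in page order.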