PdfDocumentProcessor.GetText(PdfDocumentArea, PdfTextExtractionOptions) Method
Retrieves text from the specified document area using the specified extraction options.
Namespace: DevExpress.Pdf
Assembly: DevExpress.Docs.v24.2.dll
NuGet Package: DevExpress.Document.Processor
#Declaration
public string GetText(
PdfDocumentArea area,
PdfTextExtractionOptions options
)
#Parameters
Name | Type | Description |
---|---|---|
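area | PdfDocumentArea | The document area from which the content should be extracted. |
options | PdfTextExtractionOptions | A PdfTextExtractionOptions object that specifies the text extraction options. |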
#Returns
Type | Description |
---|---|
String | The text obtained from the specified area. |
#Remarks
The GetText method uses the page coordinate system. Refer to the following help topic for more details: Coordinate Systems.
Set the PdfTextExtractionOptions.ClipToCropBox property to false to extract content without clipping to the crop box.
The code sample below retrieves document content from the specified area:
using System;
using DevExpress.Pdf;

using (PdfDocumentProcessor processor = new PdfDocumentProcessor())
{
    processor.LoadDocument("TextExtraction.pdf");

    // Define the area in page coordinates: PdfRectangle(left, bottom, right, top).
    PdfPage page = processor.Document.Pages[0];
    PdfRectangle pdfRectangle = new PdfRectangle(page.CropBox.Left / 3, page.CropBox.Bottom, page.CropBox.Right / 3, page.CropBox.Top);

    // The page number passed to PdfDocumentArea is 1-based, so 1 refers to Pages[0].
    PdfDocumentArea pageArea = new PdfDocumentArea(1, pdfRectangle);

    // Extract text from the area without clipping to the crop box.
    string pageText = processor.GetText(pageArea, new PdfTextExtractionOptions { ClipToCropBox = false });
    Console.WriteLine(pageText);
}
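The same call can be applied to every page of a document. The sketch below is a minimal illustration, not part of the original sample: it extracts text from the upper half of each page. The `TopHalfTextExtraction` class and the `TopHalf` helper are hypothetical names introduced here. The sketch assumes that the page coordinate system has its origin at the bottom-left corner with the Y axis pointing up (see the Coordinate Systems topic to confirm), and that `PdfDocumentArea` page numbers start at 1, as the sample above suggests.

```csharp
using System;
using DevExpress.Pdf;

public static class TopHalfTextExtraction
{
    // Hypothetical helper: returns the upper half of a crop box.
    // Assumes the Y axis points up, so the upper half spans
    // from the vertical midpoint to cropBox.Top.
    public static PdfRectangle TopHalf(PdfRectangle cropBox)
    {
        double middle = (cropBox.Bottom + cropBox.Top) / 2;
        return new PdfRectangle(cropBox.Left, middle, cropBox.Right, cropBox.Top);
    }

    public static void Main()
    {
        using (PdfDocumentProcessor processor = new PdfDocumentProcessor())
        {
            processor.LoadDocument("TextExtraction.pdf");
            PdfTextExtractionOptions options = new PdfTextExtractionOptions { ClipToCropBox = false };

            for (int i = 0; i < processor.Document.Pages.Count; i++)
            {
                PdfPage page = processor.Document.Pages[i];

                // Pages is zero-based; the PdfDocumentArea page number is assumed to be 1-based.
                PdfDocumentArea area = new PdfDocumentArea(i + 1, TopHalf(page.CropBox));

                Console.WriteLine(processor.GetText(area, options));
            }
        }
    }
}
```

Keeping the rectangle calculation in a separate helper makes it easy to swap in another region (for example, a header band or a single column) without touching the extraction loop.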