PdfDocumentProcessor.GetText(PdfDocumentPosition, PdfDocumentPosition, PdfTextExtractionOptions) Method
Retrieves the text located between the specified document positions, using the specified extraction options.
Namespace: DevExpress.Pdf
Assembly: DevExpress.Docs.v24.2.dll
NuGet Package: DevExpress.Document.Processor
#Declaration
public string GetText(
    PdfDocumentPosition startPosition,
    PdfDocumentPosition endPosition,
    PdfTextExtractionOptions options
)
#Parameters
Name | Type | Description |
---|---|---|
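startPosition | PdfDocumentPosition | The area’s start position. |
endPosition | PdfDocumentPosition | The area’s end position. |
options | PdfTextExtractionOptions | A PdfTextExtractionOptions object that specifies text extraction options. |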
#Returns
Type | Description |
---|---|
String | The text obtained from the specified area. |
#Remarks
The GetText method uses the page coordinate system. Refer to the following help topic for more details: Coordinate Systems.
If there is no text between the specified positions, this method returns text that is nearest to these positions.
The code sample below retrieves the content located between two positions on the first page:
using System;
using DevExpress.Pdf;

using (PdfDocumentProcessor processor = new PdfDocumentProcessor())
{
    processor.LoadDocument("TextExtraction.pdf");

    // Both positions are on page 1 and are expressed in page coordinates.
    PdfDocumentPosition startPosition = new PdfDocumentPosition(1, new PdfPoint(0, 0));
    PdfDocumentPosition endPosition = new PdfDocumentPosition(1, new PdfPoint(500, 500));

    // ClipToCropBox = false: do not restrict extraction to the page's crop box.
    string pageText = processor.GetText(startPosition, endPosition,
        new PdfTextExtractionOptions { ClipToCropBox = false });
    Console.WriteLine(pageText);
}
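Because GetText expects positions in the page coordinate system, points measured from the top-left corner of a page (as in many UI frameworks) may need to be converted first. The sketch below is one possible way to do that. It assumes the page coordinate origin is the bottom-left corner with the y-axis pointing up, and that the page height is known in points; confirm both in the Coordinate Systems topic. The `PageCoordinates` class and its `FromTopLeft` method are hypothetical helpers, not part of the DevExpress API.

```csharp
using System;

public static class PageCoordinates
{
    // Hypothetical helper (not a DevExpress API member).
    // Converts a point measured from the top-left corner of a page (y grows downward)
    // into page coordinates, assuming the origin is the bottom-left corner
    // and y grows upward.
    public static (double X, double Y) FromTopLeft(double x, double yFromTop, double pageHeight)
    {
        if (pageHeight <= 0)
            throw new ArgumentOutOfRangeException(nameof(pageHeight), "Page height must be positive.");

        // The x coordinate is unchanged; the y coordinate is flipped against the page height.
        return (x, pageHeight - yFromTop);
    }

    public static void Main()
    {
        // Assumed page size for illustration: US Letter, 612 x 792 points.
        var (x, y) = FromTopLeft(72, 100, 792);
        Console.WriteLine($"{x}, {y}"); // prints "72, 692"
    }
}
```

The resulting values can then be passed to `new PdfPoint(x, y)` when building the start and end positions.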