How to: Extract Text from a Document

Important

You require a license to the DevExpress Office File API or DevExpress Universal Subscription to use these examples in production code. Refer to the DevExpress Subscription page for pricing information.

This tutorial describes how to extract the text of a PDF file at runtime using the PDF Document API.

To extract the text of a PDF file, do the following.

  1. Create a PdfDocumentProcessor.
  2. To open a PDF file, pass its file path, or a stream that contains the document data, to the PdfDocumentProcessor.LoadDocument method.
  3. After the document is loaded, you can extract its plain text using the PdfDocumentProcessor.Text property.

The following code implements this functionality.

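using System;
using DevExpress.Pdf;

string ExtractTextFromPDF(string filePath) {
    string documentText = "";
    try {
        using (PdfDocumentProcessor documentProcessor = new PdfDocumentProcessor()) {
            // Load the document from the file path, then read its plain text.
            documentProcessor.LoadDocument(filePath);
            documentText = documentProcessor.Text;
        }
    }
    catch (Exception ex) {
        // Still return an empty string on failure, but report the error instead of hiding it.
        Console.Error.WriteLine("Failed to extract text from '" + filePath + "': " + ex.Message);
    }
    return documentText;
}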
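
The function above loads the document from a file path. When the PDF data is already available as a stream, as in step 2, the same steps can be sketched with the stream overload of LoadDocument. This is a minimal sketch, not a definitive implementation: the class name PdfTextExtraction, the method names, and the file name Document.pdf are hypothetical and chosen only for illustration.

```csharp
using System;
using System.IO;
using DevExpress.Pdf;

static class PdfTextExtraction {
    // Hypothetical helper: extracts plain text from PDF data supplied as a stream.
    // The caller owns the stream and is responsible for disposing of it.
    public static string ExtractTextFromStream(Stream pdfStream) {
        using (PdfDocumentProcessor documentProcessor = new PdfDocumentProcessor()) {
            documentProcessor.LoadDocument(pdfStream);
            return documentProcessor.Text;
        }
    }

    // Usage sketch: "Document.pdf" is a placeholder file name.
    public static void PrintDocumentText() {
        using (FileStream stream = File.OpenRead("Document.pdf")) {
            Console.WriteLine(ExtractTextFromStream(stream));
        }
    }
}
```

Unlike the file-path version, this sketch lets exceptions propagate, so the caller decides how to handle a missing or corrupted file.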