How to: Get Coordinates of All Words in a Document

The code sample below shows how to use the PdfDocumentProcessor.NextWord method to iterate over all words in a document and retrieve their coordinates.

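The PdfDocumentProcessor.NextWord method returns a PdfPageWord object, or null when the document contains no more words. The object's Rectangles property returns the rectangles that encompass the current word; in most cases the list contains a single rectangle.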

Tip

The Rectangles property returns more than one PdfOrientedRectangle object when a part of a word is carried over to the next line. Use the Segments property to obtain information about each part of the word.

using System;
using System.Collections.Generic;
using DevExpress.Pdf;

// Declare a list to store each word segment and its coordinates
List<Tuple<string, PdfOrientedRectangle>> WordCoordinates = new List<Tuple<string, PdfOrientedRectangle>>();
using (PdfDocumentProcessor processor = new PdfDocumentProcessor())
{
    processor.LoadDocument("Document.pdf");
    PdfPageWord currentWord = processor.NextWord();
    while (currentWord != null)
    {
        for (int i = 0; i < currentWord.Rectangles.Count; i++)
        {
            // Retrieve the number of the page on which the word
            // is located:
            int pageNumber = currentWord.PageNumber;

            // Retrieve the rectangle encompassing this part of the word
            var wordRectangle = currentWord.Rectangles[i];

            // Add the segment's content and its coordinates to the list
            WordCoordinates.Add(new Tuple<string, PdfOrientedRectangle>(currentWord.Segments[i].Text, wordRectangle));
        }
        // Switch to the next word
        currentWord = processor.NextWord();
    }
}
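
The sample above reads each word's page number but does not store it. The following variation is a minimal sketch, not part of the original sample, that groups the collected segments by page and prints their coordinates. It assumes that PdfOrientedRectangle exposes Left, Top, Width, Height, and Angle properties and that the project allows top-level statements (C# 9 or later). The wordsByPage dictionary and the Text and Bounds tuple names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using DevExpress.Pdf;

// Page number -> list of (segment text, segment rectangle) pairs.
// "wordsByPage" is an illustrative name, not part of the DevExpress API.
var wordsByPage = new SortedDictionary<int, List<(string Text, PdfOrientedRectangle Bounds)>>();

using (PdfDocumentProcessor processor = new PdfDocumentProcessor())
{
    processor.LoadDocument("Document.pdf");
    PdfPageWord currentWord = processor.NextWord();
    while (currentWord != null)
    {
        // Create the list for this page the first time the page is seen
        if (!wordsByPage.TryGetValue(currentWord.PageNumber, out var entries))
        {
            entries = new List<(string Text, PdfOrientedRectangle Bounds)>();
            wordsByPage.Add(currentWord.PageNumber, entries);
        }

        // Store every part of the word together with its rectangle
        for (int i = 0; i < currentWord.Rectangles.Count; i++)
            entries.Add((currentWord.Segments[i].Text, currentWord.Rectangles[i]));

        // Switch to the next word
        currentWord = processor.NextWord();
    }
}

// Print the collected coordinates page by page.
// Left, Top, Width, Height, and Angle are assumed PdfOrientedRectangle members.
foreach (var page in wordsByPage)
{
    foreach (var (text, bounds) in page.Value)
    {
        Console.WriteLine($"Page {page.Key}: \"{text}\" at ({bounds.Left:F1}, {bounds.Top:F1}), " +
            $"size {bounds.Width:F1} x {bounds.Height:F1}, angle {bounds.Angle:F2}");
    }
}
```

Keying the dictionary by page number keeps the words of each page together, which can help when the coordinates are later used for page-level operations such as highlighting.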