How to: Get Coordinates of All Words in a Document
The code sample below shows how to use the PdfDocumentProcessor.NextWord method to iterate through all words in a document and retrieve their coordinates.
The PdfDocumentProcessor.NextWord method returns a DevExpress.Pdf.PdfWord object, or null when no words remain. The PdfWord.Rectangles property returns the rectangles that encompass the current word.
Tip
The PdfWord.Rectangles property returns more than one DevExpress.Pdf.PdfOrientedRectangle object when a part of a word is carried over to the next line. Use the PdfWord.Segments property to obtain information about each part of the word.
// Declare a list to store the word and its coordinates
List<Tuple<string, PdfOrientedRectangle>> WordCoordinates = new List<Tuple<string, PdfOrientedRectangle>>();
using (PdfDocumentProcessor processor = new PdfDocumentProcessor())
{
    processor.LoadDocument("Document.pdf");
    PdfWord currentWord = processor.NextWord();
    while (currentWord != null)
    {
        for (int i = 0; i < currentWord.Rectangles.Count; i++)
        {
            // Retrieve the rectangle encompassing the word
            var wordRectangle = currentWord.Rectangles[i];
            // Add the segment's content and its coordinates to the list
            WordCoordinates.Add(new Tuple<string, PdfOrientedRectangle>(currentWord.Segments[i].Text, wordRectangle));
        }
        // Switch to the next word
        currentWord = processor.NextWord();
    }
}
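
The tip above points to the PdfWord.Segments property. As a minimal sketch of that variant, the code below walks each word's segments directly instead of indexing Rectangles and Segments in parallel, and prints each segment's text and position. It assumes that PdfWordSegment exposes a Rectangle property next to Text, and that PdfOrientedRectangle exposes Left, Top, Width, Height, and Angle; check these member names against the API reference for your DevExpress version. "Document.pdf" is the same placeholder path used in the sample above.

```csharp
using System;
using DevExpress.Pdf;

class Program
{
    static void Main()
    {
        using (PdfDocumentProcessor processor = new PdfDocumentProcessor())
        {
            // Placeholder path, as in the sample above
            processor.LoadDocument("Document.pdf");

            PdfWord word;
            // NextWord returns null once no words remain
            while ((word = processor.NextWord()) != null)
            {
                // A word carried over to the next line has more than one segment
                foreach (PdfWordSegment segment in word.Segments)
                {
                    // Assumed members: segment.Rectangle; Left, Top, Width, Height, Angle
                    PdfOrientedRectangle rect = segment.Rectangle;
                    Console.WriteLine(
                        "{0}: left={1}, top={2}, width={3}, height={4}, angle={5}",
                        segment.Text, rect.Left, rect.Top, rect.Width, rect.Height, rect.Angle);
                }
            }
        }
    }
}
```

Reading the rectangle from the segment keeps each piece of text paired with its own coordinates, so the loop does not depend on Rectangles and Segments having the same length.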