Using PDFium to search for text in .NET

First, install these packages:

  • PDFiumSharpV2
  • PDFium.WindowsV2

The default wrapper does not expose any search methods, so we have to call the exported PDFium functions directly:


using System;
using System.IO;
using PDFiumSharp;
using PDFiumSharp.Enums;  // SearchFlags (namespace assumed; adjust to your package version)
using PDFiumSharp.Types;  // FPDF_TEXTPAGE, FPDF_SCHHANDLE (namespace assumed)

string searchText = "sometext";

PdfDocument pdfDocument = new PdfDocument(File.ReadAllBytes(@"path to pdf"));
foreach (PdfPage pdfPage in pdfDocument.Pages)
{
    FPDF_TEXTPAGE fpdfTextPage = PDFium.FPDFText_LoadPage(pdfPage.Handle);
    FPDF_SCHHANDLE searchHandle = PDFium.FPDFText_FindStart(fpdfTextPage, searchText, SearchFlags.MatchWholeWord, 0);

    // FindNext moves to the first match; it returns false if the page has none.
    if (PDFium.FPDFText_FindNext(searchHandle))
    {
        // Character index of the match and the number of characters matched.
        int searchIndex = PDFium.FPDFText_GetSchResultIndex(searchHandle);
        int matchLength = PDFium.FPDFText_GetSchCount(searchHandle);

        string text = PDFium.FPDFText_GetText(fpdfTextPage, searchIndex, matchLength);
        Console.WriteLine(text);

        /* Have to call this method for the GetRect to work */
        int cnt = PDFium.FPDFText_CountRects(fpdfTextPage, searchIndex, matchLength);

        if (cnt > 0 && PDFium.FPDFText_GetRect(fpdfTextPage, 0, out double left, out double top, out double right, out double bottom))
        {
            // Widen the box 50 units to the left to also pick up text just before the match.
            text = PDFium.FPDFText_GetBoundedText(fpdfTextPage, left - 50, top, right, bottom);
            Console.WriteLine(text);
        }
    }

    // Release the native search handle and text page.
    PDFium.FPDFText_FindClose(searchHandle);
    PDFium.FPDFText_ClosePage(fpdfTextPage);
}
pdfDocument.Close();
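
The listing above only looks at the first match on each page. If we want every occurrence, one option is to keep calling FPDFText_FindNext until it returns false. The following is a minimal sketch under a few assumptions: the PdfSearch class and its FindAll method are hypothetical names made up for this example, the PDFiumSharp namespaces in the using lines may differ in the V2 packages, and FPDFText_ClosePage is assumed to be exported by the wrapper in the same way as the other fpdf_text.h functions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using PDFiumSharp;        // PDFium, PdfDocument, PdfPage
using PDFiumSharp.Enums;  // SearchFlags (namespace assumed)
using PDFiumSharp.Types;  // FPDF_TEXTPAGE, FPDF_SCHHANDLE (namespace assumed)

public static class PdfSearch
{
    // Hypothetical helper: returns (1-based page number, character index, matched text) for every hit.
    public static List<(int Page, int Index, string Text)> FindAll(string pdfPath, string searchText)
    {
        if (string.IsNullOrEmpty(searchText))
            throw new ArgumentException("Search text must not be empty.", nameof(searchText));

        var matches = new List<(int Page, int Index, string Text)>();
        PdfDocument pdfDocument = new PdfDocument(File.ReadAllBytes(pdfPath));
        try
        {
            int pageNumber = 0;
            foreach (PdfPage pdfPage in pdfDocument.Pages)
            {
                pageNumber++;
                FPDF_TEXTPAGE textPage = PDFium.FPDFText_LoadPage(pdfPage.Handle);
                FPDF_SCHHANDLE searchHandle = PDFium.FPDFText_FindStart(textPage, searchText, SearchFlags.MatchWholeWord, 0);

                // Each successful FindNext advances to the next match on this page.
                while (PDFium.FPDFText_FindNext(searchHandle))
                {
                    int index = PDFium.FPDFText_GetSchResultIndex(searchHandle);
                    int length = PDFium.FPDFText_GetSchCount(searchHandle);
                    matches.Add((pageNumber, index, PDFium.FPDFText_GetText(textPage, index, length)));
                }

                // Release the native handles before moving to the next page.
                PDFium.FPDFText_FindClose(searchHandle);
                PDFium.FPDFText_ClosePage(textPage);
            }
        }
        finally
        {
            pdfDocument.Close();
        }
        return matches;
    }
}
```

Calling PdfSearch.FindAll(@"path to pdf", "sometext") then gives a list we can loop over and print, one entry per occurrence. The try/finally is there so the document is closed even if one of the native calls throws.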

More documentation on these functions can be found in the fpdf_text.h header: https://pdfium.googlesource.com/pdfium/+/refs/heads/main/public/fpdf_text.h
