Convert iTextSharp Hyperlink from remote webpage to local page number

This post is in response to a comment here.

Let’s say you have a PDF with hyperlinks pointing to URLs like http://www.bing.com and you want to make these instead point to a page internal to the PDF. (Personally I can’t think of why this would be needed but someone apparently has this need.)

We’ll use the PDF annotation code that I posted on Stack Overflow here and modify it just a little bit. The code below is written in VB.Net 2010 and targets iTextSharp 5.1.2.0. See the individual code comments for specifics. If you have any questions you can leave a comment here, but it’s probably faster to post your code and problems on Stack Overflow and just link to this post.

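First, we’ll create some global variables to work with. All of the code below assumes Imports statements for System.IO, iTextSharp.text and iTextSharp.text.pdf at the top of the file: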

    ''//Folder that we are working in
    Private Shared ReadOnly WorkingFolder As String = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Hyperlinked PDFs")
    ''//Pdf with sample hyperlinks
    Private Shared ReadOnly BaseFile As String = Path.Combine(WorkingFolder, "Base.pdf")
    ''//Pdf with adjusted hyperlinks
    Private Shared ReadOnly FinalFile As String = Path.Combine(WorkingFolder, "Final.pdf")

Next we’ll create a sample PDF whose URLs we can modify later. Nothing special here: it writes ten labeled pages and puts two links on the first page, one to an external website and one to page 5.

    Private Shared Sub CreateSamplePdf()
        ''//Create our output directory if it does not exist
        Directory.CreateDirectory(WorkingFolder)

        ''//Create our sample PDF
        Using Doc As New iTextSharp.text.Document(PageSize.LETTER)
            Using FS As New FileStream(BaseFile, FileMode.Create, FileAccess.Write, FileShare.Read)
                Using writer = PdfWriter.GetInstance(Doc, FS)
                    Doc.Open()

                    ''//Turn our hyperlinks blue
                    Dim BlueFont As Font = FontFactory.GetFont("Arial", 12, iTextSharp.text.Font.NORMAL, iTextSharp.text.BaseColor.BLUE)

                    ''//Create 10 pages with simple labels on them
                    For I = 1 To 10
                        Doc.NewPage()
                        Doc.Add(New Paragraph(String.Format("Page {0}", I)))
                        ''//On the first page add some links
                        If I = 1 Then
                            ''//Add an external link
                            Doc.Add(New Paragraph(New Chunk("Go to website", BlueFont).SetAction(New PdfAction("http://www.bing.com/", False))))

                            ''//Go to a specific hard-coded page number
                            Doc.Add(New Paragraph(New Chunk("Go to page 5", BlueFont).SetAction(PdfAction.GotoLocalPage(5, New PdfDestination(0), writer))))
                        End If
                    Next
                    Doc.Close()
                End Using
            End Using
        End Using
    End Sub

Lastly we’ll write some code to modify all of the external hyperlinks. The key here is to update the /S entry of the annotation’s action dictionary. A remote URL has /URI there (NOTE: the letter I and not L, “eye” not “el”) and we need to change this to /GoTo (exposed in iTextSharp as PdfName.GOTO). The second trick is that the destination (/D) is an array, of which the first item is an indirect reference to the page that you want to go to and the second item is a fitting option.
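
As a rough illustration of what that means inside the file, here is approximately how the action looks before and after the change. This is a hand-written sketch, not output from a real PDF: the object number 12 0 R is a made-up placeholder for whatever reference page 5 happens to have, and only the entries that matter here are shown. Note that the name is spelled /GoTo in the file itself.

    % Before: a URI action
    /A << /S /URI /URI (http://www.bing.com/) >>

    % After: a GoTo action pointing at page 5, fit to the window
    /A << /S /GoTo /D [12 0 R /Fit] >>

The routine that makes this change: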

    Private Shared Sub ListPdfLinks()

        ''//Setup some variables to be used later
        Dim R As PdfReader
        Dim PageCount As Integer
        Dim PageDictionary As PdfDictionary
        Dim Annots As PdfArray

        ''//Open our reader
        R = New PdfReader(BaseFile)
        ''//Get the page count
        PageCount = R.NumberOfPages

        ''//Loop through each page
        For I = 1 To PageCount
            ''//Get the current page
            PageDictionary = R.GetPageN(I)

            ''//Get all of the annotations for the current page
            Annots = PageDictionary.GetAsArray(PdfName.ANNOTS)

            ''//Make sure we have something
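            ''//NOTE: Size is the number of items in the array, Length is inherited from PdfObject and is not an item count
            If (Annots Is Nothing) OrElse (Annots.Size = 0) Then Continue For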

            ''//Loop through each annotation
            For Each A In Annots.ArrayList

                ''//Convert the iText-specific object to a generic PDF object
                Dim AnnotationDictionary = DirectCast(PdfReader.GetPdfObject(A), PdfDictionary)

                ''//Make sure this annotation is a link
                If Not AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK) Then Continue For

                ''//Get the ACTION for the current annotation, GetAsDict also resolves indirect references
                Dim AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A)

                ''//Make sure this annotation has an ACTION
                If AnnotationAction Is Nothing Then Continue For

                ''//Test if it is a URI action. NOTE: URI and not URL
                If AnnotationAction.Get(PdfName.S).Equals(PdfName.URI) Then
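                    ''//Remove the old action type. Put below would overwrite it anyway, so this is optional
                    AnnotationAction.Remove(PdfName.S)
                    ''//Also remove the old /URI target, which has no meaning in a GoTo action
                    AnnotationAction.Remove(PdfName.URI)
                    ''//Add a new action that is a GOTO action
                    AnnotationAction.Put(PdfName.S, PdfName.GOTO)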
                    ''//The destination is an array containing an indirect reference to the page as well as a fitting option
                    Dim NewLocalDestination As New PdfArray()
                    ''//Link it to page 5
                    NewLocalDestination.Add(DirectCast(R.GetPageOrigRef(5), PdfObject))
                    ''//Set it to fit page
                    NewLocalDestination.Add(PdfName.FIT)
                    ''//Add the array to the annotation's destination (/D)
                    AnnotationAction.Put(PdfName.D, NewLocalDestination)
                End If
            Next
        Next

        ''//The above code modified an in-memory representation of a PDF, the code below writes these changes to disk
        Using FS As New FileStream(FinalFile, FileMode.Create, FileAccess.Write, FileShare.None)
            Using Doc As New Document()
                Using writer As New PdfCopy(Doc, FS)
                    Doc.Open()
                    For I = 1 To R.NumberOfPages
                        writer.AddPage(writer.GetImportedPage(R, I))
                    Next
                    Doc.Close()
                End Using
            End Using
        End Using

        ''//Release the reader now that the copy has been written
        R.Close()
    End Sub
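
To try it out, call the two routines in order, since the second one reads the file that the first one writes. A minimal sketch, assuming both routines live in the same class of a console application and that this Main (which is not part of the original code) is set as the startup object:

    Public Shared Sub Main()
        ''//Writes Base.pdf with one external link and one internal link
        CreateSamplePdf()
        ''//Reads Base.pdf, retargets the external link to page 5 and writes Final.pdf
        ListPdfLinks()
    End Sub

Afterwards, both links on page 1 of Final.pdf should jump to page 5.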