Microsoft’s Web Platform Installer is great, but for some reason I prefer to install things the harder way. Here’s the official Microsoft link to download the Visual Studio 11 Express Beta for Web installer. It’s still a bootstrap downloader, but if I find an ISO I’ll update this post.
March 1, 2012
January 27, 2012
Convert iTextSharp Hyperlink from remote webpage to local page number
This post is in response to a comment here.
Let’s say you have a PDF with hyperlinks pointing to URLs like http://www.bing.com and you want to make these instead point to a page internal to the PDF. (Personally I can’t think of why this would be needed but someone apparently has this need.)
We’ll use the PDF annotation code that I posted on Stack Overflow here and modify it just a little bit. The code below is written in VB.Net 2010 and targets iTextSharp 5.1.2.0. See the individual code comments for specifics. If you have any questions you can leave a comment here, but it’s probably faster to post your code and problems on Stack Overflow and just link to this post.
First, we’ll create some global variables to work with:
''//Folder that we are working in
Private Shared ReadOnly WorkingFolder As String = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Hyperlinked PDFs")
''//Pdf with sample hyperlinks
Private Shared ReadOnly BaseFile As String = Path.Combine(WorkingFolder, "Base.pdf")
''//Pdf with adjusted hyperlinks
Private Shared ReadOnly FinalFile As String = Path.Combine(WorkingFolder, "Final.pdf")
Next we’ll create a sample PDF whose URLs we can modify later. Nothing special here; it should be self-explanatory.
Private Shared Sub CreateSamplePdf()
''//Create our output directory if it does not exist
Directory.CreateDirectory(WorkingFolder)
''//Create our sample PDF
Using Doc As New iTextSharp.text.Document(PageSize.LETTER)
Using FS As New FileStream(BaseFile, FileMode.Create, FileAccess.Write, FileShare.Read)
Using writer = PdfWriter.GetInstance(Doc, FS)
Doc.Open()
''//Turn our hyperlinks blue
Dim BlueFont As Font = FontFactory.GetFont("Arial", 12, iTextSharp.text.Font.NORMAL, iTextSharp.text.BaseColor.BLUE)
''//Create 10 pages with simple labels on them
For I = 1 To 10
Doc.NewPage()
Doc.Add(New Paragraph(String.Format("Page {0}", I)))
''//On the first page add some links
If I = 1 Then
''//Add an external link
Doc.Add(New Paragraph(New Chunk("Go to website", BlueFont).SetAction(New PdfAction("http://www.bing.com/", False))))
''//Go to a specific hard-coded page number
Doc.Add(New Paragraph(New Chunk("Go to page 5", BlueFont).SetAction(PdfAction.GotoLocalPage(5, New PdfDestination(0), writer))))
End If
Next
Doc.Close()
End Using
End Using
End Using
End Sub
Lastly we’ll write some code to modify all of the external hyperlinks. The key here is to update the /S entry in the annotation’s action dictionary. A remote URL has an /S of /URI (NOTE: the letter I and not L, “eye” not “el”) and we need to change this to /GoTo (exposed in iTextSharp as PdfName.GOTO). The second trick is that the destination (/D) is an array, of which the first item is an indirect reference to the page that you want to go to and the second item is a fitting option.
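As a rough illustration of that change in raw PDF syntax: the object number `12 0 R` and the /Rect values here are invented for the example, and a real file will differ. Note that the code below only swaps /S and adds /D, so the old /URI entry is left behind in the action dictionary; as far as I know viewers simply ignore it for a GoTo action.

```text
% Before: the link annotation carries a URI action
<< /Type /Annot /Subtype /Link /Rect [36 700 120 715]
   /A << /S /URI /URI (http://www.bing.com/) >> >>

% After: the action type is GoTo and /D holds [page reference, fit option]
% "12 0 R" stands in for the indirect reference to page 5
<< /Type /Annot /Subtype /Link /Rect [36 700 120 715]
   /A << /S /GoTo /URI (http://www.bing.com/) /D [12 0 R /Fit] >> >>
```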
Private Shared Sub ListPdfLinks()
''//Setup some variables to be used later
Dim R As PdfReader
Dim PageCount As Integer
Dim PageDictionary As PdfDictionary
Dim Annots As PdfArray
''//Open our reader
R = New PdfReader(BaseFile)
''//Get the page count
PageCount = R.NumberOfPages
''//Loop through each page
For I = 1 To PageCount
''//Get the current page
PageDictionary = R.GetPageN(I)
''//Get all of the annotations for the current page
Annots = PageDictionary.GetAsArray(PdfName.ANNOTS)
''//Make sure we have something
If (Annots Is Nothing) OrElse (Annots.Size = 0) Then Continue For
''//Loop through each annotation
For Each A In Annots.ArrayList
''//Convert the iText-specific object to a generic PDF object
Dim AnnotationDictionary = DirectCast(PdfReader.GetPdfObject(A), PdfDictionary)
''//Make sure this annotation has a link
If Not AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK) Then Continue For
''//Make sure this annotation has an ACTION
If AnnotationDictionary.Get(PdfName.A) Is Nothing Then Continue For
''//Get the ACTION for the current annotation
Dim AnnotationAction = DirectCast(AnnotationDictionary.Get(PdfName.A), PdfDictionary)
''//Test if it is a URI action. NOTE: URI and not URL
If AnnotationAction.Get(PdfName.S).Equals(PdfName.URI) Then
''//Remove the old action, I don't think this is actually necessary but I do it anyways
AnnotationAction.Remove(PdfName.S)
''//Add a new action that is a GOTO action
AnnotationAction.Put(PdfName.S, PdfName.GOTO)
''//The destination is an array containing an indirect reference to the page as well as a fitting option
Dim NewLocalDestination As New PdfArray()
''//Link it to page 5
NewLocalDestination.Add(DirectCast(R.GetPageOrigRef(5), PdfObject))
''//Set it to fit page
NewLocalDestination.Add(PdfName.FIT)
''//Add the array to the annotation's destination (/D)
AnnotationAction.Put(PdfName.D, NewLocalDestination)
End If
Next
Next
''//The above code modified an in-memory representation of a PDF, the code below writes these changes to disk
Using FS As New FileStream(FinalFile, FileMode.Create, FileAccess.Write, FileShare.None)
Using Doc As New Document()
Using writer As New PdfCopy(Doc, FS)
Doc.Open()
For I = 1 To R.NumberOfPages
writer.AddPage(writer.GetImportedPage(R, I))
Next
Doc.Close()
End Using
End Using
End Using
End Sub
January 6, 2012
How to recompress images in a PDF using iTextSharp
(I originally posted this on Stack Overflow)
iText and iTextSharp have some methods for replacing indirect objects. Specifically there’s PdfReader.KillIndirect() which does what it says and PdfWriter.AddDirectImageSimple(iTextSharp.text.Image, PRIndirectReference) which you can then use to replace what you killed off.
In pseudo C# code you’d do:
var oldImage = PdfReader.GetPdfObject();
var newImage = YourImageCompressionFunction(oldImage);
PdfReader.KillIndirect(oldImage);
yourPdfWriter.AddDirectImageSimple(newImage, (PRIndirectReference)oldImage);
Below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.1.0. It takes an existing JPEG on your desktop called “LargeImage.jpg” and creates a new PDF from it. Then it opens the PDF, extracts the image, physically shrinks it to 90% of the original size, applies 85% JPEG compression and writes it back to the PDF. See the comments in the code for more of an explanation. The code needs lots more null/error checking. Also look for the NOTE comments, which mark where you’ll need to expand the code to handle other situations.
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Drawing.Drawing2D;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1 {
public partial class Form1 : Form {
public Form1() {
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e) {
//Our working folder
string workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
//Large image to add to sample PDF
string largeImage = Path.Combine(workingFolder, "LargeImage.jpg");
//Name of large PDF to create
string largePDF = Path.Combine(workingFolder, "Large.pdf");
//Name of compressed PDF to create
string smallPDF = Path.Combine(workingFolder, "Small.pdf");
//Create a sample PDF containing our large image, for demo purposes only, nothing special here
using (FileStream fs = new FileStream(largePDF, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document()) {
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
iTextSharp.text.Image importImage = iTextSharp.text.Image.GetInstance(largeImage);
doc.SetPageSize(new iTextSharp.text.Rectangle(0, 0, importImage.Width, importImage.Height));
doc.SetMargins(0, 0, 0, 0);
doc.NewPage();
doc.Add(importImage);
doc.Close();
}
}
}
//Now we're going to open the above PDF and compress things
//Bind a reader to our large PDF
PdfReader reader = new PdfReader(largePDF);
//Create our output PDF
using (FileStream fs = new FileStream(smallPDF, FileMode.Create, FileAccess.Write, FileShare.None)) {
//Bind a stamper to the file and our reader
using (PdfStamper stamper = new PdfStamper(reader, fs)) {
//NOTE: This code only deals with page 1, you'd want to loop more for your code
//Get page 1
PdfDictionary page = reader.GetPageN(1);
//Get the xobject structure
PdfDictionary resources = (PdfDictionary)PdfReader.GetPdfObject(page.Get(PdfName.RESOURCES));
PdfDictionary xobject = (PdfDictionary)PdfReader.GetPdfObject(resources.Get(PdfName.XOBJECT));
if (xobject != null) {
PdfObject obj;
//Loop through each key
foreach (PdfName name in xobject.Keys) {
obj = xobject.Get(name);
if (obj.IsIndirect()) {
//Get the current key as a PDF object
PdfDictionary imgObject = (PdfDictionary)PdfReader.GetPdfObject(obj);
//See if it's an image
if (imgObject.Get(PdfName.SUBTYPE).Equals(PdfName.IMAGE)) {
//NOTE: There's a bunch of different types of filters, I'm only handling the simplest one here which is basically raw JPG, you'll have to research others
if (imgObject.Get(PdfName.FILTER).Equals(PdfName.DCTDECODE)) {
//Get the raw bytes of the current image
byte[] oldBytes = PdfReader.GetStreamBytesRaw((PRStream)imgObject);
//Will hold bytes of the compressed image later
byte[] newBytes;
//Wrap a stream around our original image
using (MemoryStream sourceMS = new MemoryStream(oldBytes)) {
//Convert the bytes into a .Net image
using (System.Drawing.Image oldImage = Bitmap.FromStream(sourceMS)) {
//Shrink the image to 90% of the original
using (System.Drawing.Image newImage = ShrinkImage(oldImage, 0.9f)) {
//Convert the image to bytes using JPG at 85%
newBytes = ConvertImageToBytes(newImage, 85);
}
}
}
//Create a new iTextSharp image from our bytes
iTextSharp.text.Image compressedImage = iTextSharp.text.Image.GetInstance(newBytes);
//Kill off the old image
PdfReader.KillIndirect(obj);
//Add our image in its place
stamper.Writer.AddDirectImageSimple(compressedImage, (PRIndirectReference)obj);
}
}
}
}
}
}
}
this.Close();
}
//Standard image save code from MSDN, returns a byte array
private static byte[] ConvertImageToBytes(System.Drawing.Image image, long compressionLevel) {
if (compressionLevel < 0) {
compressionLevel = 0;
} else if (compressionLevel > 100) {
compressionLevel = 100;
}
ImageCodecInfo jpgEncoder = GetEncoder(ImageFormat.Jpeg);
System.Drawing.Imaging.Encoder myEncoder = System.Drawing.Imaging.Encoder.Quality;
EncoderParameters myEncoderParameters = new EncoderParameters(1);
EncoderParameter myEncoderParameter = new EncoderParameter(myEncoder, compressionLevel);
myEncoderParameters.Param[0] = myEncoderParameter;
using (MemoryStream ms = new MemoryStream()) {
image.Save(ms, jpgEncoder, myEncoderParameters);
return ms.ToArray();
}
}
//standard code from MSDN
private static ImageCodecInfo GetEncoder(ImageFormat format) {
ImageCodecInfo[] codecs = ImageCodecInfo.GetImageDecoders();
foreach (ImageCodecInfo codec in codecs) {
if (codec.FormatID == format.Guid) {
return codec;
}
}
return null;
}
//Standard high quality thumbnail generation from http://weblogs.asp.net/gunnarpeipman/archive/2009/04/02/resizing-images-without-loss-of-quality.aspx
private static System.Drawing.Image ShrinkImage(System.Drawing.Image sourceImage, float scaleFactor) {
int newWidth = Convert.ToInt32(sourceImage.Width * scaleFactor);
int newHeight = Convert.ToInt32(sourceImage.Height * scaleFactor);
var thumbnailBitmap = new Bitmap(newWidth, newHeight);
using (Graphics g = Graphics.FromImage(thumbnailBitmap)) {
g.CompositingQuality = CompositingQuality.HighQuality;
g.SmoothingMode = SmoothingMode.HighQuality;
g.InterpolationMode = InterpolationMode.HighQualityBicubic;
System.Drawing.Rectangle imageRectangle = new System.Drawing.Rectangle(0, 0, newWidth, newHeight);
g.DrawImage(sourceImage, imageRectangle);
}
return thumbnailBitmap;
}
}
}
October 13, 2011
Speed up iTextSharp’s PdfReader when reading multiple files or one very large file
Most users of iTextSharp’s PdfReader are used to using the constructor that takes a single string representing a file path. For small files or only a couple of files this is fine, but if you have a document with a large number of pages or just a large number of documents then you might run into some performance problems.
Luckily there’s already a built-in, albeit non-obvious, solution to the problem: iTextSharp.text.pdf.RandomAccessFileOrArray. When you create a PdfReader using the PdfReader(string) constructor you are actually creating one of these behind the scenes, just not an optimal one. The default one basically sets up a standard FileStream object that reads your file, nothing too special. But there’s an overload called RandomAccessFileOrArray(string fileName, bool forceRead) that will (generally) give you a giant performance boost if you pass true to the second parameter. When forceRead is true the entire file that you are reading will be read into memory as a byte array. You can understand why the default is false, hopefully. But if you’ve got a fairly modern machine you should have enough memory to be able to take advantage of this overload. Obviously test this and stress test this in a production environment. One person loading a 500MB file into memory isn’t a big deal but 100 people doing it is.
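To make the trade-off concrete, here is a minimal sketch that uses only the .NET base class library. It is not iTextSharp's implementation; the class and method names are hypothetical and exist only for this illustration. Both helpers read the last few bytes of a file (a PDF parser typically starts near the end of the file), one by seeking on disk and the other by first loading the whole file into a byte array, which is roughly what forceRead = true amounts to.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only, not part of iTextSharp
public static class ForceReadSketch
{
    // Reads the last `count` bytes of any seekable stream
    public static byte[] ReadTail(Stream source, int count)
    {
        int toRead = (int)Math.Min(count, source.Length);
        source.Seek(-toRead, SeekOrigin.End);
        byte[] buffer = new byte[toRead];
        int offset = 0;
        while (offset < toRead)
        {
            int read = source.Read(buffer, offset, toRead - offset);
            if (read == 0) break; // unexpected end of stream
            offset += read;
        }
        return buffer;
    }

    // Streaming approach: every seek and read goes through the file system
    public static byte[] TailFromDisk(string path, int count)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            return ReadTail(fs, count);
        }
    }

    // In-memory approach: one large read up front, then seeks are just index math.
    // Memory use is roughly the size of the file.
    public static byte[] TailFromMemory(string path, int count)
    {
        byte[] all = File.ReadAllBytes(path);
        using (var ms = new MemoryStream(all, false))
        {
            return ReadTail(ms, count);
        }
    }
}
```

Both methods return the same bytes; the difference is where the seeking happens. A parser that jumps around a file many times pays for each jump on disk in the first case and almost nothing in the second, at the cost of holding the whole file in memory.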
Below is a proof-of-concept WinForms app targeting iTextSharp 5.1.1.0. Just create a blank C# WinForms app (VS2010) and paste this into the source. Modify the variables at the top to your liking for testing. On my machine, the regular PdfReader constructor takes about 22 seconds for 4,000 files and between 1 and 2 seconds using a RandomAccessFileOrArray.
using System;
using System.Diagnostics;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Threading;
using System.Windows.Forms;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
//Location to create temporary files. NOTE: This folder will get DELETED when cleaned up!
private readonly string workingFolder = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Many Files");
//Number of test files to create
private readonly int fileCount = 4000;
//Maximum number of pages in each test file
private readonly int maxNumberOfPages = 20;
//Will hold our threads
private Thread tw;
private Thread tm;
private void Form1_Load(object sender, EventArgs e)
{
//Resize the main form
this.Width = 350;
this.Height = 150;
//Create various buttons
var btn1 = new Button();
btn1.Text = "Create sample files";
btn1.Click += (a, b) => BtnClick_CreateSampleFiles();
btn1.Location = new Point(0, 0);
btn1.Width = 150;
this.Controls.Add(btn1);
var btn2 = new Button();
btn2.Text = "Count pages old way";
btn2.Click += (a, b) => BtnClick_CountPages_Slow();
btn2.Location = new Point(0, 25);
btn2.Width = 150;
this.Controls.Add(btn2);
var btn3 = new Button();
btn3.Text = "Count pages new way";
btn3.Click += (a, b) => BtnClick_CountPages_Fast();
btn3.Location = new Point(150, 25);
btn3.Width = 150;
this.Controls.Add(btn3);
var btn4 = new Button();
btn4.Text = "Clean up";
btn4.Click += (a, b) => CleanUp(true);
btn4.Location = new Point(0, 50);
btn4.Width = 150;
this.Controls.Add(btn4);
var pbFileCreated = new ProgressBar();
pbFileCreated.Name = "pbFileCreated";
pbFileCreated.Location = new Point(0, 75);
pbFileCreated.Width = 300;
this.Controls.Add(pbFileCreated);
}
/// <summary>
/// Enable/Disable buttons on the main form
/// </summary>
private void SetFormState(bool enabled){
//If we are called outside of the main UI thread then we need to invoke into it
if (this.InvokeRequired){
this.Invoke(new MethodInvoker(delegate() { SetFormState(enabled); }));
}else{
//Disable all buttons
foreach (Control c in this.Controls){
if (c is Button) c.Enabled = enabled;
}
}
}
#region Button Click Events
private void BtnClick_CreateSampleFiles(){
//Disable the UI
SetFormState(false);
//Create a thread to do our work
tw = new Thread(new ThreadStart(this.CreateSampleFiles));
//Start the thread
tw.Start();
//Create a thread to monitor our progress
tm = new Thread(new ThreadStart(this.Monitor));
//Start the thread
tm.Start();
}
private void BtnClick_CountPages_Slow(){
//Disable the UI
SetFormState(false);
//Create a thread to do our work
tw = new Thread(new ThreadStart(this.CountPages_Slow));
//Start the thread
tw.Start();
//Create a thread to monitor our progress
tm = new Thread(new ThreadStart(this.Monitor));
tm.Start();
}
private void BtnClick_CountPages_Fast(){
//Disable the UI
SetFormState(false);
//Create a thread to do our work
tw = new Thread(new ThreadStart(this.CountPages_Fast));
//Start the thread
tw.Start();
//Create a thread to monitor our progress
tm = new Thread(new ThreadStart(this.Monitor));
//Start the thread
tm.Start();
}
#endregion
#region Monitor And ProgressBar
/// <summary>
/// Used to monitor the progress of the worker thread so that we know when to re-enable the form's UI
/// </summary>
private void Monitor()
{
//IsAlive stays true while the worker is sleeping or blocked, unlike a ThreadState.Running check
while (tw != null && tw.IsAlive)
{
Thread.Sleep(250);
}
SetFormState(true);
}
/// <summary>
/// Called from various methods on various threads to update the main progress bar
/// </summary>
private void updatePB(int value, int max){
//Get the progress bar, there should only be one
var pb = (ProgressBar)this.Controls.Find("pbFileCreated", false)[0];
//See if we are on another thread
if (pb.InvokeRequired){
//If so, have the main thread invoke our method with the same parameters for us
pb.Invoke(new MethodInvoker(delegate() { updatePB(value, max); }));
}else{
//Otherwise update the progress bar's values
pb.Maximum = max;
pb.Value = value;
}
}
#endregion
private void CreateSampleFiles(){
//Just in case, erase current files
CleanUp(false);
//Create our output directory
Directory.CreateDirectory(workingFolder);
//Placeholder for our random number of pages to create
int pageCount;
//Random number generator
Random r = new Random();
//Loop through each file that we need to create
for (int i = 1; i <= fileCount; i++){
//Every 100 files update the main progress bar
if (i % 100 == 0){
updatePB(i, fileCount);
}
//Create our temporary PDF
using (FileStream fs = new FileStream(Path.Combine(workingFolder, String.Format("{0}.pdf", i.ToString().PadLeft(8, '0'))), FileMode.Create, FileAccess.Write, FileShare.None)){
using (Document doc = new Document(PageSize.LETTER)){
using (PdfWriter w = PdfWriter.GetInstance(doc, fs)){
doc.Open();
//Get a random number of pages to create
pageCount = r.Next(1, maxNumberOfPages + 1);
for (int j = 1; j <= pageCount; j++){
//Add a page
doc.NewPage();
//Add some content on the page, just to give the page a little "weight"
doc.Add(new Paragraph(String.Format("File {0}, Page {1}", i, j)));
}
doc.Close();
}
}
}
}
//Give an alert to let people know we're done
MessageBox.Show(String.Format("Created {0} Files", fileCount));
}
/// <summary>
/// Clean up the files we created by erasing the entire directory
/// </summary>
/// <param name="msg">Whether to show a message alerting when done</param>
private void CleanUp(bool msg){
if (Directory.Exists(workingFolder)){
Directory.Delete(workingFolder, true);
}
if (msg){
MessageBox.Show("Test files deleted");
}
}
/// <summary>
/// Make sure we have a working folder and the correct number of files in it
/// </summary>
private bool SanityCheck()
{
if (!Directory.Exists(workingFolder)){
MessageBox.Show("Folder not found, please create first");
return false;
}
if (Directory.EnumerateFiles(workingFolder, "*.pdf").Count() != fileCount){
MessageBox.Show("Not enough files exist in source folder, please create files before using.");
return false;
}
return true;
}
private void CountPages_Slow(){
//Make sure we've got files to work with
if (!SanityCheck()) return;
//Create a timer
var st = new Stopwatch();
//Start it
st.Start();
//Get our files
var files = Directory.EnumerateFiles(workingFolder, "*.pdf");
//Total number of pages found
int totalPageCount = 0;
//Used to update the progress bar
int i = 0;
int localFileCount = files.Count();
//Loop through each file
foreach (string f in files){
//This is a total perf hit but the differences between the two methods is so great it doesn't really matter
//Every 100 files update the progress bar
i++;
if (i % 100 == 0){
updatePB(i, localFileCount);
}
//Add the page count to the total
totalPageCount += new PdfReader(f).NumberOfPages;
}
//Stop our timer
st.Stop();
MessageBox.Show(String.Format("Found {0:N0} pages in {1:N0} seconds", totalPageCount, st.Elapsed.TotalSeconds));
}
private void CountPages_Fast(){
//Make sure we've got files to work with
if (!SanityCheck()) return;
//Create a timer
var st = new Stopwatch();
//Start it
st.Start();
//Get our files
var files = Directory.EnumerateFiles(workingFolder, "*.pdf");
//Total number of pages found
int totalPageCount = 0;
//Used to update the progress bar
int i = 0;
int localFileCount = files.Count();
//Loop through each file
foreach (string f in files){
//This is a total perf hit but the differences between the two methods is so great it doesn't really matter
//Every 100 files update the progress bar
i++;
if (i % 100 == 0){
updatePB(i, localFileCount);
}
//Add the page count to the total
totalPageCount += new PdfReader(new RandomAccessFileOrArray(f, true), null).NumberOfPages;
}
//Stop our timer
st.Stop();
MessageBox.Show(String.Format("Found {0:N0} pages in {1:N0} seconds", totalPageCount, st.Elapsed.TotalSeconds));
}
}
}
September 3, 2011
#3 – VB.Net iTextSharp Tutorial – Add a scaled image to a document
This is part of a series of iTextSharp tutorials for VB 2010 Express. See this post for an overview and to answer any basic questions that you may have.
This post is a follow-up to the previous one; this time the image is scaled based on the document’s size.
Option Explicit On
Option Strict On
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Public Class Form1
Private Sub Form1_Load(sender As System.Object, e As System.EventArgs) Handles MyBase.Load
''//The main folder that we are working in
Dim WorkingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop)
''//The file that we are creating
Dim WorkingFile = Path.Combine(WorkingFolder, "Output.pdf")
Dim SampleImage = Path.Combine(WorkingFolder, "IMG_5605.JPG")
''//Create our file with an exclusive writer lock
Using FS As New FileStream(WorkingFile, FileMode.Create, FileAccess.Write, FileShare.None)
''//Create our PDF document
Using Doc As New Document(PageSize.LETTER)
''//Bind our PDF object to the physical file using a PdfWriter
Using Writer = PdfWriter.GetInstance(Doc, FS)
''//Open our document for writing
Doc.Open()
''//Insert a blank page
Doc.NewPage()
''//Create a PDF image object from our physical image
Dim ThisImage = iTextSharp.text.Image.GetInstance(SampleImage)
''//Use standard ratio resizing algorithms to calculate new image dimensions based on the documents dimensions. This will shrink or grow images to fit
''//Will hold our new image dimensions
''//Documents sometimes have margins (and this sample does) so subtract them so that our image is centered in the page
Dim NewW, NewH As Single
NewW = Doc.PageSize.Width - (Doc.LeftMargin + Doc.RightMargin)
NewH = Doc.PageSize.Height - (Doc.TopMargin + Doc.BottomMargin)
''//Scale the image
ThisImage.ScaleToFit(NewW, NewH)
''//Add the image to the document
Doc.Add(ThisImage)
''//Close our document
Doc.Close()
End Using
End Using
End Using
Me.Close()
End Sub
End Class
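The arithmetic behind an aspect-ratio-preserving fit like the one above can be sketched in a few lines of C#. This is only an illustration of the usual technique, not iTextSharp's own code, and the class and method names are made up for the example.

```csharp
using System;

// Hypothetical helper for illustration only, not part of iTextSharp
public static class ScaleToFitSketch
{
    // Computes the largest size that fits inside the box while keeping
    // the image's original width-to-height ratio
    public static void Fit(float imageW, float imageH, float boxW, float boxH,
                           out float newW, out float newH)
    {
        // Use the smaller of the two ratios so that neither side overflows the box
        float scale = Math.Min(boxW / imageW, boxH / imageH);
        newW = imageW * scale;
        newH = imageH * scale;
    }
}
```

For example, a LETTER page is 612 x 792 points; assuming 36-point margins on every side (which I believe is the default), the usable box is 540 x 720. A 1000 x 500 landscape image is then limited by its width and comes out at about 540 x 270, while a 500 x 1000 portrait image is limited by its height and comes out at about 360 x 720.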
#2 – VB.Net iTextSharp Tutorial – Add an image to a document
This is part of a series of iTextSharp tutorials for VB 2010 Express. See this post for an overview and to answer any basic questions that you may have.
Option Explicit On
Option Strict On
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Public Class Form1
Private Sub Form1_Load(sender As System.Object, e As System.EventArgs) Handles MyBase.Load
''//The main folder that we are working in
Dim WorkingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop)
''//The file that we are creating
Dim WorkingFile = Path.Combine(WorkingFolder, "Output.pdf")
Dim SampleImage = Path.Combine(WorkingFolder, "IMG_0259.JPG")
''//Create our file with an exclusive writer lock
Using FS As New FileStream(WorkingFile, FileMode.Create, FileAccess.Write, FileShare.None)
''//Create our PDF document
Using Doc As New Document(PageSize.LETTER)
''//Bind our PDF object to the physical file using a PdfWriter
Using Writer = PdfWriter.GetInstance(Doc, FS)
''//Open our document for writing
Doc.Open()
''//Insert a blank page
Doc.NewPage()
''//Add an image to a document. This does not scale the image or anything so if your image is large it might go off the canvas
Doc.Add(iTextSharp.text.Image.GetInstance(SampleImage))
''//Close our document
Doc.Close()
End Using
End Using
End Using
Me.Close()
End Sub
End Class
#1 – VB.Net iTextSharp Tutorial – Hello World
This is the first in the series of iTextSharp tutorials for VB 2010 Express. See this post for an overview and to answer any basic questions that you may have.
This is the starter, the “hello world” program done in VB.Net. The comments in the code should be enough to explain what’s going on. After running it (and it should run fast, it just opens and closes), you should have a PDF on your desktop called “Output.pdf”.
Option Explicit On
Option Strict On
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Public Class Form1
Private Sub Form1_Load(sender As System.Object, e As System.EventArgs) Handles MyBase.Load
''//The main folder that we are working in
Dim WorkingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop)
''//The file that we are creating
Dim WorkingFile = Path.Combine(WorkingFolder, "Output.pdf")
''//Create our file with an exclusive writer lock
Using FS As New FileStream(WorkingFile, FileMode.Create, FileAccess.Write, FileShare.None)
''//Create our PDF document
Using Doc As New Document(PageSize.LETTER)
''//Bind our PDF object to the physical file using a PdfWriter
Using Writer = PdfWriter.GetInstance(Doc, FS)
''//Open our document for writing
Doc.Open()
''//Insert a blank page
Doc.NewPage()
''//Add a simple paragraph with text
Doc.Add(New Paragraph("Hello World"))
''//Close our document
Doc.Close()
End Using
End Using
End Using
Me.Close()
End Sub
End Class
VB.Net tutorial for iTextSharp
iTextSharp is a great open source PDF creation and manipulation library and is a port of the original Java version iText. Unfortunately I’ve found the documentation and samples lacking, especially for VB.Net. The only good collection of tutorials that I’ve found was written by Mike Brind. While they were a great start for me, they targeted the 4.x series and were written in C#. The tutorials are fairly easy to upgrade and convert to VB.Net but I wanted to have my own collection. So stay tuned to this post for what I hope will be several iTextSharp tutorials.
A couple of things about the tutorials themselves:
- All will be written targeting iTextSharp 5.1.1.0 unless otherwise noted.
- All will be written using Visual Basic 2010 Express
- My comments are written a little weird, they all start with ''//. This is because StackOverflow’s (SO) HTML code highlighting system seems to break when using VB comments. So I “open and close” a VB comment and then start a C-style comment which seems to work best. One other problem I’ve had with SO is that apostrophes seem to mess up highlighting in comments, so you’ll usually see me say “do not” instead of “don’t” or “the objects properties” instead of “the object’s properties”.
- All code samples are complete WinForms apps unless otherwise noted. This means that you should be able to launch VB Express 2010, create a new Windows Forms Application, add a reference to iTextSharp, switch to the code-behind on the form and paste the entire portion of my code on top of the existing code and it will work for you. The only modifications needed might be variables pointing to specific files and those will be called out at the top. If you hunt-and-peck at my code and it doesn’t work for you, don’t complain right away. Start with my exact base and modify bits at a time.
- The reason I use WinForms apps for samples over Console Apps is because I occasionally need System.Drawing. While you can definitely use System.Drawing with a Console App, this route makes copying and pasting code easier.
- If you have a question about a specific tutorial, feel free to post a comment. If a tutorial is about XYZ and you want to know ABC, feel free to also post a comment but don’t expect an immediate answer. I might eventually get around to it but there’s no guarantee. Instead, also post your question on StackOverflow. Feel free to cite my tutorial as a reference.
- If you find a tutorial helpful I really do enjoy feedback!
- All code that I post here is free for you to use as far as I’m concerned. For licensing of iTextSharp itself please contact iText. I in no way represent them, work for them or anything. I just like their product.
- Some or all examples will execute code directly in Form1_Load. Because of this you’ll often see a Me.Close() at the end of the code. Because this is all sample code this is just so that I don’t need to close an empty form every time.
- VB.Net iTextSharp Tutorial – Hello World
- VB.Net iTextSharp Tutorial – Add an image to a document
- VB.Net iTextSharp Tutorial – Add a scaled image to a document
July 31, 2011
Getting color information from iTextSharp’s TextRenderInfo and ITextExtractionStrategy
In order to get color information when using an ITextExtractionStrategy in iTextSharp (5.1.1.0) you need to make the following changes to the main iTextSharp source code. Once you make these changes you can follow my SO post here for getting font information as well.
iTextSharp.text.pdf.parser.GraphicsState.cs
//New Fields:
internal BaseColor colorStroke;
internal BaseColor colorNonStroke;
//New accessor methods:
public BaseColor GetColorStroke() {
return colorStroke;
}
public BaseColor GetColorNonStroke() {
return colorNonStroke;
}
//changed constructors:
public GraphicsState(){
ctm = new Matrix();
characterSpacing = 0;
wordSpacing = 0;
horizontalScaling = 1.0f;
leading = 0;
font = null;
fontSize = 0;
renderMode = 0;
rise = 0;
knockout = true;
colorStroke = null;
colorNonStroke = null;
}
/**
* Copy constructor.
* @param source another GraphicsState object
*/
public GraphicsState(GraphicsState source){
// note: all of the following are immutable, with the possible exception of font
// so it is safe to copy them as-is
ctm = source.ctm;
characterSpacing = source.characterSpacing;
wordSpacing = source.wordSpacing;
horizontalScaling = source.horizontalScaling;
leading = source.leading;
font = source.font;
fontSize = source.fontSize;
renderMode = source.renderMode;
rise = source.rise;
knockout = source.knockout;
colorStroke = source.colorStroke;
colorNonStroke = source.colorNonStroke;
}
iTextSharp.text.pdf.parser.PdfContentStreamProcessor.cs
//append to end of method PopulateOperators()
RegisterContentOperator("G", new SetStrokingGray());
RegisterContentOperator("g", new SetNonStrokingGray());
RegisterContentOperator("RG", new SetStrokingRGB());
RegisterContentOperator("rg", new SetNonStrokingRGB());
RegisterContentOperator("K", new SetStrokingCMYK());
RegisterContentOperator("k", new SetNonStrokingCMYK());
RegisterContentOperator("CS", new SetStrokingGeneral());
RegisterContentOperator("cs", new SetNonStrokingGeneral());
RegisterContentOperator("SC", new SetStrokingGeneral());
RegisterContentOperator("sc", new SetNonStrokingGeneral());
RegisterContentOperator("SCN", new SetStrokingGeneral());
RegisterContentOperator("scn", new SetNonStrokingGeneral());
//add new classes:
public abstract class SetColorBase : IContentOperator {
public enum ColorStyle { Stroke = 1, NonStroke = 2 };
public enum ColorSpace { RGB = 1, CMYK = 2, Gray = 3, Other = 4 };
public abstract BaseColor GetColor(PdfLiteral oper, List<PdfObject> operands);
private ColorStyle style;
private ColorSpace space;
public SetColorBase(ColorStyle colorStyle, ColorSpace colorSpace) {
this.style = colorStyle;
this.space = colorSpace;
}
public void Invoke(PdfContentStreamProcessor processor, PdfLiteral oper, List<PdfObject> operands) {
BaseColor c = GetColor(oper, operands);
GraphicsState gs = processor.gsStack.Peek();
if (this.style == ColorStyle.Stroke) {
gs.colorStroke = c;
}
else if (this.style == ColorStyle.NonStroke) {
gs.colorNonStroke = c;
}
}
}
private class SetStrokingGray : SetColorBase {
public SetStrokingGray() : base(ColorStyle.Stroke, ColorSpace.Gray) { }
public override BaseColor GetColor(PdfLiteral oper, List<PdfObject> operands) {
PdfNumber g = (PdfNumber)operands[0];
return new GrayColor(g.FloatValue);
}
}
private class SetNonStrokingGray : SetColorBase {
public SetNonStrokingGray() : base(ColorStyle.NonStroke, ColorSpace.Gray) { }
public override BaseColor GetColor(PdfLiteral oper, List<PdfObject> operands) {
PdfNumber g = (PdfNumber)operands[0];
return new GrayColor(g.FloatValue);
}
}
private class SetStrokingRGB : SetColorBase {
public SetStrokingRGB() : base(ColorStyle.Stroke, ColorSpace.RGB) { }
public override BaseColor GetColor(PdfLiteral oper, List<PdfObject> operands) {
PdfNumber r = (PdfNumber)operands[0];
PdfNumber g = (PdfNumber)operands[1];
PdfNumber b = (PdfNumber)operands[2];
return new BaseColor(r.FloatValue, g.FloatValue, b.FloatValue);
}
}
private class SetNonStrokingRGB : SetColorBase {
public SetNonStrokingRGB() : base(ColorStyle.NonStroke, ColorSpace.RGB) { }
public override BaseColor GetColor(PdfLiteral oper, List<PdfObject> operands) {
PdfNumber r = (PdfNumber)operands[0];
PdfNumber g = (PdfNumber)operands[1];
PdfNumber b = (PdfNumber)operands[2];
return new BaseColor(r.FloatValue, g.FloatValue, b.FloatValue);
}
}
private class SetStrokingCMYK : SetColorBase {
public SetStrokingCMYK() : base(ColorStyle.Stroke, ColorSpace.CMYK) { }
public override BaseColor GetColor(PdfLiteral oper, List<PdfObject> operands) {
PdfNumber c = (PdfNumber)operands[0];
PdfNumber m = (PdfNumber)operands[1];
PdfNumber y = (PdfNumber)operands[2];
PdfNumber k = (PdfNumber)operands[3];
return new CMYKColor(c.FloatValue, m.FloatValue, y.FloatValue, k.FloatValue);
}
}
private class SetNonStrokingCMYK : SetColorBase {
public SetNonStrokingCMYK() : base(ColorStyle.NonStroke, ColorSpace.CMYK) { }
public override BaseColor GetColor(PdfLiteral oper, List<PdfObject> operands) {
PdfNumber c = (PdfNumber)operands[0];
PdfNumber m = (PdfNumber)operands[1];
PdfNumber y = (PdfNumber)operands[2];
PdfNumber k = (PdfNumber)operands[3];
return new CMYKColor(c.FloatValue, m.FloatValue, y.FloatValue, k.FloatValue);
}
}
private class SetNonStrokingGeneral : SetColorBase {
public SetNonStrokingGeneral() : base(ColorStyle.NonStroke, ColorSpace.Other) { }
public override BaseColor GetColor(PdfLiteral oper, List<PdfObject> operands) {
if (operands.Count == 2 && operands[0].IsNumber() && ((PdfNumber)operands[0]).IntValue == 0) {
return new BaseColor(0);
}
if (operands.Count == 2 && operands[0].IsName()) {
return new BaseColor(0);
}
if (operands.Count == 4) {
PdfNumber r = (PdfNumber)operands[0];
PdfNumber g = (PdfNumber)operands[1];
PdfNumber b = (PdfNumber)operands[2];
return new BaseColor(r.FloatValue, g.FloatValue, b.FloatValue);
}
return null;
}
}
private class SetStrokingGeneral : SetColorBase {
public SetStrokingGeneral() : base(ColorStyle.Stroke, ColorSpace.Other) { }
public override BaseColor GetColor(PdfLiteral oper, List<PdfObject> operands) {
if (operands.Count == 2 && operands[0].IsNumber() && ((PdfNumber)operands[0]).IntValue == 0) {
return new BaseColor(0);
}
if (operands.Count == 2 && operands[0].IsName()) {
return new BaseColor(0);
}
if (operands.Count == 4) {
PdfNumber r = (PdfNumber)operands[0];
PdfNumber g = (PdfNumber)operands[1];
PdfNumber b = (PdfNumber)operands[2];
return new BaseColor(r.FloatValue, g.FloatValue, b.FloatValue);
}
return null;
}
}
iTextSharp.text.pdf.parser.TextRenderInfo.cs
//new methods
public BaseColor GetColorStroke() {
return gs.GetColorStroke();
}
public BaseColor GetColorNonStroke() {
return gs.GetColorNonStroke();
}
This code is very experimental but so far it works pretty well. What ends up in the content stream depends on which tool generated the PDF. Word’s built-in PDF generator seems to take the easier route and just writes simple RGB values. Adobe’s PDF plug-in appears to produce the same colors in a more complicated way, creating “named” color spaces (I think), but I’m not completely sure how to use them yet.
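For what it’s worth, here is a rough sketch of how the two new TextRenderInfo methods might be consumed. It assumes an iTextSharp build with all of the changes above compiled in, and the class name ColorReportingStrategy is made up for this example. The exact members that ITextExtractionStrategy requires (RenderImage in particular) vary between iTextSharp releases, so treat this as a starting point rather than tested code.

```csharp
using System;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf.parser;

// Hypothetical strategy: only compiles against an iTextSharp build
// that includes the GraphicsState/TextRenderInfo changes shown above.
public class ColorReportingStrategy : ITextExtractionStrategy {
    private readonly StringBuilder result = new StringBuilder();

    // Nothing to do for these callbacks in this sketch
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderImage(ImageRenderInfo renderInfo) { }

    public void RenderText(TextRenderInfo renderInfo) {
        // GetColorNonStroke() is the patched-in method; it returns null when
        // no fill color operator has been seen or the operator was not understood
        BaseColor fill = renderInfo.GetColorNonStroke();
        string color = (fill == null)
            ? "unknown"
            : String.Format("{0},{1},{2}", fill.R, fill.G, fill.B);
        result.AppendFormat("[{0}] {1}", color, renderInfo.GetText());
        result.AppendLine();
    }

    public string GetResultantText() {
        return result.ToString();
    }
}
```

You would pass an instance to PdfTextExtractor.GetTextFromPage(reader, pageNumber, new ColorReportingStrategy()) the same way as any other extraction strategy. The non-stroke color is the one to look at for ordinary filled text; the stroke color only matters for outlined text.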
February 2, 2011
Don’t bother with TcpClient.Connected
The TcpClient has a Connected property that is very convenient to use but unfortunately it doesn’t do what you think it should do. A better name for this property would be WasConnected or WasLastOperationSuccessful. The problem is that this property only tells you the status of the last operation. For example, if you sent some data 30 seconds ago and the send succeeded, this property would be true. If the client that you sent data to then calls Close() on their end, or their network connection goes down, this property will still be true. The latter case you can probably understand, but for the former case you need to understand that there’s no “connection agreement” between the two parties. When one side calls Close() it’s not going to stay open to send data to the other and potentially wait forever on a slow network connection. Instead, Close() just means “terminate my side”. If you want, you can roll your own handshake implementation and have the client send a ‘closing connection’ packet, but neither side should assume that the packet will arrive.
For more information see MSDN:
Because the Connected property only reflects the state of the connection as of the most recent operation, you should attempt to send or receive a message to determine the current state. After the message send fails, this property no longer returns true. Note that this behavior is by design. You cannot reliably test the state of the connection because, in the time between the test and a send/receive, the connection could have been lost. Your code should assume the socket is connected, and gracefully handle failed transmissions.
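To make the stale flag concrete, here is a small loopback sketch. It opens a connection to itself, closes the accepting side, and then compares Connected against a probe that uses Socket.Poll and Socket.Available. The PeerHasClosed helper and the one second timeout are my own choices for illustration, not something from MSDN, and the probe is still only a snapshot: the connection can drop right after it returns, so the send and receive error handling is still required.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class ConnectedDemo {
    // Hypothetical helper: Poll(SelectRead) reports true when data is waiting
    // OR when the connection was closed/reset. If it is readable but zero
    // bytes are available, the peer has shut down its side.
    static bool PeerHasClosed(Socket socket) {
        // The timeout is in microseconds (1,000,000 = 1 second)
        return socket.Poll(1000000, SelectMode.SelectRead) && socket.Available == 0;
    }

    static void Main() {
        // Port 0 asks the OS for any free port on the loopback interface
        TcpListener listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        using (TcpClient client = new TcpClient()) {
            client.Connect(IPAddress.Loopback, port);

            // Accept the connection and immediately close the remote side
            TcpClient remote = listener.AcceptTcpClient();
            remote.Close();

            // No operation has failed on our side yet, so this flag is not updated
            Console.WriteLine("Connected:       " + client.Connected);
            // The probe looks at the socket itself rather than the cached flag
            Console.WriteLine("PeerHasClosed(): " + PeerHasClosed(client.Client));
        }

        listener.Stop();
    }
}
```

On a typical machine I would expect Connected to still report true here while the probe notices the close, but the timing depends on the network stack, so don’t build logic that relies on either value alone.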