


How To Articles
by Thom Parker of WindJack Solutions.
Copyright © 2004 by WindJack Solutions
 
Explore Annotations in a PDF Document
by Thom Parker



This article discusses how you can use PDF CanOpener to find and identify the parts of an Annotation Object in a PDF Document.

PDF File Used in this article: WJAnnotExample1.pdf

PDF CanOpener Symbols:

  - Document Catalog
  - Page Object
  - Pages Object

  - Annotation Object
  - Indirect Object Indicator

General:

The PDF Content, the stuff in the Page Content Stream, is static; it's just like a picture. With PDF 1.5, Adobe added Optional Content Groups, which gave the content some minor interactivity, but it's the Annotations that really liven up PDF Documents.  Form Fields, Links, Multimedia Objects, Notes, Highlights, Stamps, and anything else on a page that responds to the user is an Annotation. They are primarily a way of extending the PDF with location-oriented, user-interactive functionality. To a lesser extent they are also used as local storage objects. An Annotation is a live object only if an appropriate Annotation Handler is available in the viewer application.  The Handler is a set of functions for responding to mouse, keyboard, and drawing events.
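To make the Handler idea concrete, here is a minimal Python sketch. It is only a model of the concept, not the Acrobat SDK: the `LinkHandler` class, the `HANDLERS` registry, and the "WJExample" Subtype are all made-up names, and annotations are modeled as plain dictionaries.

```python
class LinkHandler:
    """Illustrative handler: one method per kind of event it responds to."""

    def on_mouse(self, annot, event):
        return f"mouse {event} on {annot['Subtype']}"

    def on_key(self, annot, key):
        return f"key {key} on {annot['Subtype']}"

    def draw(self, annot):
        return f"draw {annot['Subtype']} at {annot['Rect']}"


# Hypothetical registry: Subtype name -> handler object.
HANDLERS = {"Link": LinkHandler()}


def is_live(annot):
    """An annotation is live only if a handler is registered for its Subtype."""
    return annot.get("Subtype") in HANDLERS


link = {"Subtype": "Link", "Rect": [72, 700, 200, 720]}
custom = {"Subtype": "WJExample", "Rect": [72, 600, 200, 620]}  # made-up Subtype

print(is_live(link))    # a handler is registered for "Link"
print(is_live(custom))  # no handler is registered for "WJExample"
```

In this model the custom Annotation still exists in the file; it just has nothing to respond to events on its behalf.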

The PDF 1.5 Spec. defines a rich set of Annotations. The handlers for most of these Annotations are built into Acrobat.  If you want to make your own, the SDK provides facilities for creating custom Annotation Handlers.  This article will look at the "Object Views" of four different Annotations, one of which will be a simple custom Annotation.

Where is that Annot?

Every Annotation is associated with a specific page and a specific location on that page, even if it has no graphical representation.  As such, each Page that has an Annotation on it has its own "Annots" Array in the Page's Dictionary. The Annotation Objects in this array are unique; they don't (or shouldn't anyway) exist in any other "Annots" array on any other page.  So the first place to look for an Annotation is always in this array on the page of interest.

PDF CanOpener offers some different ways to find Annotations. 

Tree Walking:

  1. Open the sample document in Acrobat and activate PDF CanOpener.  The PDF CanOpener display will show the root node of the document's CosObj tree.
  2. Open the "Pages" entry.
  3. Open the "Kids" entry.  This is the list of pages for this document.  There is only one page.
  4. Open the Page Object.
  5. The "Annots" array entry in the Page Dictionary is the list of Annotations contained on this page. Open it.
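The same walk can be sketched in code. The Python below models the CosObj tree with plain dictionaries (an assumption made for illustration; it does not read a real PDF file), and the function name `find_annots` is made up. It follows the same path as the steps above: Catalog, "Pages", "Kids", Page, "Annots".

```python
def find_annots(catalog):
    """Yield (page_number, annotation) for every annotation in the page tree."""
    page_number = 0
    stack = [catalog["Pages"]]
    while stack:
        node = stack.pop()
        if node.get("Type") == "Pages":
            # Intermediate node: push kids in reverse so they pop in document order.
            stack.extend(reversed(node["Kids"]))
        else:
            # Leaf node: a Page Object. "Annots" is optional.
            page_number += 1
            for annot in node.get("Annots", []):
                yield page_number, annot


# A one-page document modeled as nested dictionaries.
link = {"Subtype": "Link", "Rect": [72, 700, 200, 720]}
highlight = {"Subtype": "Highlight", "Rect": [72, 650, 300, 665]}
page = {"Type": "Page", "Annots": [link, highlight]}
pages = {"Type": "Pages", "Kids": [page], "Count": 1}
catalog = {"Type": "Catalog", "Pages": pages}

for number, annot in find_annots(catalog):
    print(number, annot["Subtype"])
```

A "Kids" array may itself contain further "Pages" nodes in larger documents, which is why the sketch uses a stack rather than a single loop.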

Snap To and Highlight Annotations in the Document Page View:

  1. Right click anywhere in the tree view portion of the PDF CanOpener Display to bring up the popup context menu.  Make sure both the "Highlight Selected" and "Snap to Selected" options are checked.
  2. Click on any one of the Annotation Objects in the "Annots" array. The Page View will scroll to make the selected Annotation visible and it will be highlighted with a blue rectangle. This is an easy way to see which Annotation in the object tree is which Annotation on the page.


(Screen shot of the sample PDF Document.) 

Object Selector Tool:

In the above image three methods are shown for activating the Selector Tool.  The Selector tool is the easiest way to locate a specific Annotation in the COS Object tree from the Page View.

  1. After you activate the Selector Tool, the Page View cursor changes to the Selector Tool cursor.
  2. As it passes over an Annotation, the Annotation will be highlighted. 
  3. Click on the Annotation to navigate to its representation in the Object Tree. Very easy. 
  4. Click on it a second time and the Annotation dictionary expands. 
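Conceptually, going from a point on the page to an Annotation is a hit test against each "Rect" in the page's "Annots" array. The Python sketch below illustrates that idea only; it is not how PDF CanOpener is implemented, the function name `annotations_at` is made up, and annotations are modeled as plain dictionaries.

```python
def annotations_at(page, x, y):
    """Return every annotation on the page whose Rect contains the point (x, y)."""
    hits = []
    for annot in page.get("Annots", []):
        x1, y1, x2, y2 = annot["Rect"]
        # The two corners of a Rect may be stored in either order, so normalize.
        if min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2):
            hits.append(annot)
    return hits


link = {"Subtype": "Link", "Rect": [72, 700, 200, 720]}
highlight = {"Subtype": "Highlight", "Rect": [72, 650, 300, 665]}
page = {"Type": "Page", "Annots": [link, highlight]}

print([a["Subtype"] for a in annotations_at(page, 100, 710)])
print([a["Subtype"] for a in annotations_at(page, 10, 10)])
```

The sketch returns all matches rather than picking one, because overlapping Annotations are possible and the choice between them is left to the viewer.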

Simple Link Annotation:

Shown below are the contents of a Link Annotation Dictionary.  This Annotation Type is one of the simplest and most utilized of the built-in Annotations.

This Link Annotation has no static visual representation, so it is missing some entries that are common to most Annotation types. We'll get to those later.  The Info Window, immediately below the CosObj tree display, shows us that the Link Annotation Object is really an indirect reference to a Cos Dictionary.  All Annotations are indirect so they can be referenced in other locations.  Some of the entries shown are specific to the "Link" Annot and some are common to all Annots. 

Common Annotation Entries in the Link Object:

  1. "Type" - Optional entry.  Cannot be relied on for programmatically IDing Annotations.

  2. "Subtype" - Required.

  3. "Rect" - Required. Location on page in User Coordinates.

  4. "Border" - Border Style Array.  Original PDF implementation. Redundant here but needed for backward compatibility.

  5. "BS" - Also Border Style, introduced in PDF 1.2

  6. "A" - Action to take when Annotation is clicked on. Common to all types but really only used in a few. 

The "H" entry is the only one shown here that is specific to the Link Annotation. It selects the type of highlight drawn by the Handler when the link is clicked, that is, when the mouse button is pressed inside its active area. Another entry specific to the Link Annotation, but not present here, is the "Dest" entry, which holds a Destination Object.  In this Annotation the "A" entry is taking its place.

The Link Annotation will even work with fewer members.  Delete the "BS", "Border", and "H" entries. The Annotation Handler will supply them with default values. 
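As an illustration of that defaulting, the Python sketch below models a stripped-down Link Annotation as a dictionary and fills in the missing entries. The default values shown for "Border" ([0 0 1]) and "H" ("I", invert) are the ones given in the PDF Reference; the function name `with_defaults`, the rectangle, and the URI are made up for the example, and "BS" is left out of the table on the assumption that "Border" alone is enough when no "BS" is present.

```python
# Defaults a viewer applies when the entries are absent (per the PDF Reference).
LINK_DEFAULTS = {
    "Border": [0, 0, 1],  # horizontal corner radius, vertical corner radius, width
    "H": "I",             # highlight mode: invert the contents of the Rect
}


def with_defaults(annot):
    """Return a copy of the annotation with missing entries filled in."""
    effective = dict(LINK_DEFAULTS)
    effective.update(annot)  # entries present in the annotation win
    return effective


# A Link with "BS", "Border", and "H" deleted.
minimal_link = {
    "Subtype": "Link",
    "Rect": [72, 700, 200, 720],
    "A": {"S": "URI", "URI": "http://www.example.com/"},  # illustrative action
}

effective = with_defaults(minimal_link)
print(effective["Border"], effective["H"])
```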

A more complex Annotation: the Highlight

The Highlight Annotation features a visual representation on the page and a text input box.

There are a lot more entries in this Annotation than in the Link Example.  These entries are common to a large group of built-in Annotations used for text markup and commenting.

  1. "F" (Flags) - Determines how Acrobat (not necessarily the Handler) treats the Annotation.  It sets characteristics like the visibility and printability. Very common entry for all Annotations that have a static appearance on the page.

  2. "Subj" - User entered string. From the "Subject" field in the Properties Dialog. 

  3. "CreationDate" - Check the PDF Spec sec. 3.8.3 for format details.

  4. "NM" - Annotation's Name, auto generated by Acrobat.  This string is the input value you use with the JavaScript method "doc.getAnnot()".

  5. "C" - This array is the fill color used by the Annotation Handler whenever it needs to draw something, like the popup text box.  For example, the Highlight in the sample PDF is yellow, by default the popup is also drawn in yellow. So the three numbers in the "C" array are the RGB values for yellow [1,1,0].  If you change these to [1,0,0] for red, the highlight color is unchanged, but the popup is now drawn in red.

  6. "M" - Modified date.

  7. "P" - Reference to the Page Object in which the Annotation appears. Typically in every Annotation with an appearance. It is an aid to navigation and it fixes the Annotation's association with the page.

  8. "T" - User entered string, through the "Author" field in the Properties Dialog.

  9. "StructParent" - Associates this Annotation with a node somewhere in the logical structure tree ("StructTreeRoot") in the document catalog.

  10. "QuadPoints" - This entry is specific to the set of Text Markup Annotations (highlights, underlines and strikeouts).  It's an array of 4 points (8 numbers) that give the vertices of the rectangle that bounds the affected text on the page. 4 points allows the representation of rotated rectangles.
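Two of the entries above are easy to decode by hand. The Python sketch below unpacks the flags value (stored under the "F" key) into the flag names listed in the PDF Reference, and computes the axis-aligned box that encloses one set of "QuadPoints". The function names and the sample numbers are made up for illustration.

```python
# Annotation flag names, lowest bit first (bit positions 1 through 9 in the PDF Reference).
FLAG_NAMES = [
    "Invisible", "Hidden", "Print", "NoZoom", "NoRotate",
    "NoView", "ReadOnly", "Locked", "ToggleNoView",
]


def decode_flags(value):
    """Return the names of the flags that are set in an integer flags value."""
    return [name for bit, name in enumerate(FLAG_NAMES) if value & (1 << bit)]


def quad_bounds(quad_points):
    """Return [min_x, min_y, max_x, max_y] enclosing a flat list of x, y pairs."""
    xs = quad_points[0::2]
    ys = quad_points[1::2]
    return [min(xs), min(ys), max(xs), max(ys)]


print(decode_flags(4))   # ['Print']
print(decode_flags(68))  # ['Print', 'ReadOnly']
print(quad_bounds([72, 720, 200, 720, 72, 700, 200, 700]))  # [72, 700, 200, 720]
```

For an unrotated highlight the enclosing box matches the quadrilateral itself; for rotated text it is only the outer bound.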

The two most important entries of the Highlight Annotation for this discussion are "AP" and "Popup".  The "AP" or Appearance Dictionary contains the Annotation's graphical appearance in a Form XObject.  This Annotation has only one, but an Annotation may have many, each representing a different visual state. The PDF Spec divides these appearances into three broad categories: Normal, Down, and Rollover.  In this example, the Highlight Annotation has only one appearance, so it must be named "N", for normal.  The Down appearance is used when the Annotation is clicked on, and the Rollover appearance is used when the cursor passes over the Annotation.  If there is more than one appearance in a normal/down/rollover category, then the corresponding "N", "D" or "R" entry becomes a Dictionary Object.  The Appearance Streams in these dictionaries are named according to visual states defined by the Annotation Handler.  The Annotation dictionary must then also contain an "AS" (Appearance State) entry that indicates to the Annotation Handler, or Acrobat if the handler is missing, which appearance to draw on the page.  We'll see this later in the discussions on both Form Fields and the Custom Annotation.
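The lookup described above can be sketched in a few lines of Python. This is a model only: annotations are plain dictionaries, the `Stream` class stands in for a Form XObject, and the function name `select_appearance` is made up. The fallback from "D" or "R" to "N" follows the PDF Reference, which gives the "N" entry as the default for the other two.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    """Stand-in for a Form XObject (an Appearance Stream)."""
    data: bytes


def select_appearance(annot, category="N"):
    """Pick the Appearance Stream to draw for category "N", "D", or "R"."""
    ap = annot.get("AP", {})
    entry = ap.get(category, ap.get("N"))  # "D" and "R" default to "N"
    if entry is None or isinstance(entry, Stream):
        return entry  # no appearance at all, or a single stream
    # A sub-dictionary: "AS" names which of its streams is current.
    return entry.get(annot.get("AS"))


highlight = {"Subtype": "Highlight", "AP": {"N": Stream(b"highlight")}}
radio = {
    "Subtype": "Widget",
    "AS": "Sel1",
    "AP": {
        "N": {"Off": Stream(b"n-off"), "Sel1": Stream(b"n-on")},
        "D": {"Off": Stream(b"d-off"), "Sel1": Stream(b"d-on")},
    },
}

print(select_appearance(highlight).data)   # b'highlight'
print(select_appearance(radio, "D").data)  # b'd-on'
```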

The "Popup" entry is another Annotation.  It is used in this case as an input window.  If you look inside it you'll see it does not have an "AP" entry.  The Annotation Handler is responsible for drawing it on the page and processing the user input.  The Popup is connected to the Highlight Annotation through the "Parent" entry of its dictionary.  Popup Annotations are used as a kind of local storage for the Annotation.  One use is to store a history of "Comment Status" changes made to an Annotation. In this case, the "Popup" Annotation is just a way of keeping track of the last position and state of the input box.

Form Fields:

Form Field Objects are all kept in the "AcroForm" entry of the Document Catalog, so they have global, or Document Scope, meaning they have the same value everywhere in the document.  You cannot have two Form Fields with the same name in the same document.  But, you say, Form Fields have a graphical appearance on specific pages in the document.  And furthermore, you can copy them all over the place, so you can have lots of Form Fields with the same name.  Well, not really, as I'll explain a little later.

The screen shot below shows the "AcroForm" dictionary for the document displayed on the right.  This document has 4 Form Fields: one text field and 3 radio buttons.  The "Fields" entry of the AcroForm Dictionary contains a list of all the Form Fields in this document.  It only contains 2 entries, and one of those isn't even a Form Object, it's an Annotation. What gives? 

The Widget Annotation is the Form Field's graphical representation on the specific pages.  Each Form Field has one Widget Annotation for each place it is used in the document.  If there is only one location in a document where a Form Field is used, Acrobat combines the Form Field Dictionary and the Widget Annotation into a single object, like the Text Field in this example.  If a Form Field is used more than once, Acrobat puts it in a proper Form Field Object. The location-specific Widget Annotations are stuffed into the "Kids" entry.  See the Radio Button entries in the above example.
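The merged and unmerged layouts can be handled with one small recursive function. The Python sketch below models the "AcroForm" dictionary with plain dictionaries; the field names "Name" and "Choice", the rectangles, and the function name `widgets_of` are made up for illustration.

```python
def widgets_of(field):
    """Return the Widget Annotations that display a Form Field."""
    if field.get("Subtype") == "Widget":
        return [field]  # field and widget merged into a single object
    widgets = []
    for kid in field.get("Kids", []):
        widgets.extend(widgets_of(kid))
    return widgets


# Text field used in one place: field entries and widget entries share one dictionary.
text_field = {"FT": "Tx", "T": "Name", "Subtype": "Widget", "Rect": [72, 700, 300, 720]}

# Radio button field used in three places: a field object with three widget kids.
radio_field = {"FT": "Btn", "T": "Choice", "Kids": []}
for i in range(3):
    radio_field["Kids"].append(
        {"Subtype": "Widget", "Parent": radio_field, "Rect": [72, 600 - 30 * i, 90, 618 - 30 * i]}
    )

acro_form = {"Fields": [text_field, radio_field]}

for field in acro_form["Fields"]:
    print(field["T"], len(widgets_of(field)))
```

Two entries in "Fields" account for four Widgets on the page, which is the situation the screen shot shows.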

For the Form Field to be visible on a page, it must have a Widget Annotation in the page's "Annots" Array.  Here is a view of the same Radio Button Widget as above inside the "Annots" Array of a Page Dictionary.

There aren't too many things here that are different from the other Annotation Types. The big difference between this Annotation and the earlier examples is the "AP" entry. This one has 4 Appearance Streams, two each in the normal appearance entry "N" and the down appearance entry "D".  There is one appearance for each of the states this radio button can have, "Off" or "Sel1". "Sel1" is the export value of this button set by the user.  The Annotation's "AS" entry tells Acrobat which of these XObjects to use when drawing it on the page.  It is the Annotation Handler's responsibility to set this value.

Form Fields are highly interactive. The Field is "Active" whenever it has the mouse and keyboard focus, for example, when the user enters text into a Text Field or clicks on a button.  In this state the Form Handler is responsible for drawing the Widget Annotation's Appearance.  The Form Field becomes "Inactive" when it loses focus.  At this point the Form Field's static appearance (the Widget's Appearance Stream, or "AP" entry) needs to be changed to reflect the user's changes.  

If this change requires regenerating the Appearance Stream, then the "MK" entry provides the Handler with hints on how to do this.  The entries in the "MK" Dictionary are set by the user in the Properties Dialog for the Form Field. The one in this example contains only two entries, one for the border color, and one for the background color.  If you set more properties in the dialog, more entries will appear in this dictionary.  For the Radio Button Field, all the Appearance Streams it will ever need are created when these properties are set.  So, user interaction causes the Handler to set the value of the "AS" (Appearance State) entry, rather than regenerate the Appearance Stream.
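Here is a rough Python sketch of that last point: selecting a radio button only changes "AS" on each Widget and leaves the Appearance Streams alone. It is a model under stated assumptions, not the Form Handler's actual code. Only the "Sel1" export value comes from the example document; "Sel2", "Sel3", the field name, and the function name `select_radio` are made up, and strings stand in for the Form XObjects.

```python
def select_radio(field, export_value):
    """Select one button of a radio group by switching each Widget's "AS"."""
    field["V"] = export_value  # the field's value, shared by the whole group
    for widget in field["Kids"]:
        states = widget["AP"]["N"]  # state name -> Appearance Stream
        widget["AS"] = export_value if export_value in states else "Off"


def make_widget(on_state):
    # Strings stand in for the pre-built Form XObjects.
    streams = {"Off": "<off stream>", on_state: "<on stream>"}
    return {"Subtype": "Widget", "AS": "Off", "AP": {"N": dict(streams), "D": dict(streams)}}


radio_field = {"FT": "Btn", "T": "Choice", "Kids": [make_widget(s) for s in ("Sel1", "Sel2", "Sel3")]}

select_radio(radio_field, "Sel2")
print([w["AS"] for w in radio_field["Kids"]])  # ['Off', 'Sel2', 'Off']
```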

Custom Annotation:

No Handler was written for this example.  The Annotation was designed to simply be visible on the page.  For an Annotation to be displayable, it needs to:

  1. Be in the "Annots" array of the Page Dictionary.

  2. Have a "Rect" entry that places it on the page.

  3. Have a "Subtype" entry, so Acrobat can identify the correct Handler.

  4. Have an "AP" entry with at least an "N" entry that is a Form XObject.

That's all. Any other entries in the Annotation Dictionary are parameters for the Handler.  The screen shot below shows a custom Annotation in a section of the Page "Annots" array. This one is a little more complex than the minimum.  It has 2 Appearance Streams and an "AS" entry that selects which one Acrobat displays.  The only way to change the "AS" entry is to either write an Annotation Handler or get PDF CanOpener.
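The four requirements can be written down as a simple checklist function. The Python below is a sketch over a dictionary model of the page, not a validator for real files; the "WJExample" Subtype, the "On" and "Off" state names, and the function name `missing_for_display` are made up for illustration.

```python
def missing_for_display(page, annot):
    """Return a list of the display requirements the annotation fails to meet."""
    problems = []
    if not any(a is annot for a in page.get("Annots", [])):
        problems.append("not in Annots")
    rect = annot.get("Rect")
    if not (isinstance(rect, list) and len(rect) == 4):
        problems.append("no Rect")
    if "Subtype" not in annot:
        problems.append("no Subtype")
    if "N" not in annot.get("AP", {}):
        problems.append("no AP/N")
    return problems


# A custom annotation with two appearances and an "AS" entry selecting one.
custom = {
    "Subtype": "WJExample",
    "Rect": [72, 500, 200, 560],
    "AS": "On",
    "AP": {"N": {"On": "<form xobject>", "Off": "<form xobject>"}},
}
page = {"Type": "Page", "Annots": [custom]}

print(missing_for_display(page, custom))              # []
print(missing_for_display(page, {"Subtype": "Link"}))  # ['not in Annots', 'no Rect', 'no AP/N']
```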

We hope this material was helpful to you.  If you have any questions or comments for us or want more info on PDF CanOpener, please send email to info@windjack.com.

Check back regularly for new articles.

