Gloucestershire Local History

Creating PDF (Portable Document Format) Files

Guidelines No. 8         Issue 1.2:         June 2007


The aim of these guidelines is to provide local historians with basic information about Adobe Acrobat PDF files and the advantages they offer for publishing material, since they can be read on almost any computer. Various methods are described which can be used to create PDF documents for distribution over the Internet or by other means such as CD-ROM.


1. Introduction
2. Technical Background
3. Creating Your Own PDF Documents
4. Special Features
5. Concluding Remarks

1.    Introduction

Most users of the Internet have encountered the now familiar Portable Document Format (PDF), developed by Adobe Systems over ten years ago. PDF files have the great advantage that exactly the same file can be read on virtually any computer that is equipped with the appropriate PDF reader software. This reader software is available free of charge from Adobe and can be downloaded from the Internet or installed directly from many of the CD-ROMs given away on the covers of computing magazines.

PDF files can contain text, images and tables such as those produced by spreadsheets. Many documents are based on standard paper sizes such as A4 and so can be readily printed. However, the PDF format can handle pages up to about five metres by five metres and so can accommodate the largest maps and plans. Word processor files, such as Microsoft Word documents, are not a real option for publication, since not everyone uses Word and, even for Word users, the appearance of a given file depends too heavily on the version of the software and the printer used. Similarly, publication as web pages is not always satisfactory, because what the reader sees depends on their browser and monitor resolution. Apple Macintosh users, of course, do not use Windows at all, but happily the same PDF file works on Apple computers as well as on Windows machines.
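The page sizes mentioned above are easier to compare once it is known that PDF measures pages in points of 1/72 inch. The short Python sketch below is only an illustration (the function names are hypothetical): it converts A4 to points and works out the longest page side accepted by Acrobat 5 and later viewers, 14,400 units or 200 inches, which is presumably the origin of the "five metres" figure.

```python
# Minimal sketch: PDF measures pages in points, where 1 point = 1/72 inch
# (the default PDF user-space unit). Function names are illustrative only.
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72


def mm_to_points(mm: float) -> float:
    """Convert a length in millimetres to PDF points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


def points_to_metres(points: float) -> float:
    """Convert a length in PDF points to metres."""
    return points / POINTS_PER_INCH * MM_PER_INCH / 1000


# A4 paper is 210 mm x 297 mm.
print(f"A4: {mm_to_points(210):.1f} x {mm_to_points(297):.1f} points")  # A4: 595.3 x 841.9 points

# Acrobat 5 and later accept pages up to 14,400 x 14,400 units (200 x 200 inches).
print(f"Largest page side: {points_to_metres(14_400):.2f} m")  # Largest page side: 5.08 m
```

So an A4 page occupies only a small corner of the space PDF allows, and a large estate map or tithe plan still fits on a single page.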

The PDF format is therefore well worth consideration by local historians looking for an alternative means of publishing their work. This might be in the familiar format of a printed article, but it could equally be an analysis performed using a spreadsheet, a listing of the contents of a database, or a map. This note provides links to some of the many resources on the Internet concerning PDF files.

The main disadvantage of PDF files for individual authors is that the 'industry standard' PDF writing software from Adobe costs nearly £250 including VAT. However, one purpose of this note is to provide advice on the other PDF writing software available, including some solutions which are both free and legal.

2.    Technical Background

This section provides background information on the origins of the PDF format and of PostScript, on which PDF is based. Readers who are not interested in this may go directly to Section 3, Creating Your Own PDF Documents.

Adobe's Portable Document Format (PDF) is based on its well-established page description language called PostScript. A PostScript file is basically a text file containing commands which describe where elements of the page, such as text characters, lines and curves, should be positioned and what size they should be. An output device such as a printer equipped with a PostScript 'interpreter' takes the commands from the PostScript file presented to it and generates the specific instructions that this particular printer needs to reproduce the text, lines and curves exactly. Two completely different printers with the same output resolution (provided it is adequate for the job in hand) will therefore produce identical output from a single file, without the need for a separate file for each type of printer. A PostScript device of lower resolution than the one the file was prepared for may still make a reasonable job of printing the page, although some details may be lost. This technique is employed when low-resolution proofs are made from material that is destined to be output on the very high-resolution imagesetters used to make printing plates. It would be wasteful to use the imagesetter itself to produce expensive plates for what are only initial proofs; only the final proofs need be made on the actual machine.
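To make the idea of a page description language concrete, the Python sketch below builds a tiny one-page PostScript program as ordinary text. The operators it uses (findfont, scalefont, setfont, moveto, show, lineto, stroke, showpage) are standard PostScript; the function name and sample text are only illustrations. Saved to a file, the output should be printable on a PostScript printer or viewable in a PostScript viewer.

```python
# Minimal sketch: a PostScript page description is plain text made of commands.
# The function name and sample text are illustrative; the operators are standard PostScript.


def make_postscript_page(text: str, x: int = 72, y: int = 720, size: int = 24) -> str:
    """Return a one-page PostScript program that prints `text` at (x, y).

    Coordinates are in points (1/72 inch) from the bottom-left corner of the
    page, so (72, 720) is one inch from the left edge and ten inches up.
    """
    # Backslashes and parentheses are special inside PostScript strings.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "\n".join([
        "%!PS",                                           # marks the file as PostScript
        f"/Helvetica findfont {size} scalefont setfont",  # choose font and size
        f"{x} {y} moveto",                                # move to the start position
        f"({escaped}) show",                              # paint the text there
        f"{x} {y - 20} moveto {x + 228} {y - 20} lineto stroke",  # rule a line below it
        "showpage",                                       # output the finished page
        "",
    ])


print(make_postscript_page("Sample page (draft)"))
```

Because every position is given in device-independent units, the same text can be interpreted by printers of quite different resolutions, which is exactly the property described above.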

PostScript was developed for the print industry, but it soon became apparent that the system also needed to be adapted so that documents could be displayed consistently on computer screens. Hence the development of PDF. The internal representation of a document within a PDF file is still closely based on PostScript. Many features, such as encryption, annotation and digital signatures, were also added to PDF, some of which are discussed in Section 4.
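The closeness of PDF to PostScript can be seen by looking inside a PDF file. The Python sketch below writes a minimal one-page file by hand; it is an illustration only, and real PDF software adds a great deal more. The page's content stream uses the PDF operators Tf, Td and Tj, which play the same roles as setfont, moveto and show in PostScript, while the numbered objects and the cross-reference ("xref") table of byte offsets are what PDF adds so that a viewer can jump straight to any part of the file. The function name is hypothetical.

```python
# Minimal sketch: write a one-page A4 PDF by hand to show its internal structure.


def make_minimal_pdf(text: str) -> bytes:
    """Return the bytes of a one-page A4 PDF that displays `text`."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # PostScript-like page description: BT/ET begin and end text,
    # Tf selects the font, Td moves to a position, Tj shows the string.
    content = f"BT /F1 24 Tf 72 770 Td ({escaped}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length " + str(len(content)).encode() + b" >>\nstream\n"
        + content + b"\nendstream",
        # Helvetica is a standard font every viewer provides, so it is not embedded.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, recorded in the xref table
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # entry 0 is always the head of the free list
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode()  # each entry is exactly 20 bytes
    out += f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n".encode()
    out += f"startxref\n{xref_start}\n%%EOF\n".encode()
    return bytes(out)


print(make_minimal_pdf("Hello from a hand-made PDF").decode("latin-1"))
```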

There are two basic ways to use the Adobe Acrobat PDF creation software. The simplest is to install a component of the Adobe software on your computer, very much in the way you might install a Windows printer driver. When you wish to create a PDF document, you select Acrobat PDF as your printer and, when prompted, supply a suitable file name; the system then creates a PDF file on disk as easily as if you were printing to paper in the normal manner. Unfortunately, this approach provides little control over the process, it may not be able to cope with certain features found in some documents, and strange results may sometimes occur. The alternative approach is to use the other main component of the Acrobat PDF writing software, called Distiller. In this approach the document is first 'printed' using a PostScript printer driver (either the one supplied with Distiller or one you specify yourself). The resulting output (a PostScript file) is then 'distilled' into a PDF file, with far greater control over the parameters used in the process. It may take some time to get to grips with the many settings available, but happily this may not be necessary, as default sets of parameters that work quite well are supplied for common purposes such as creating PDF files for web sites or for general printing.

It should be noted that some of the cheaper alternatives only emulate the PDF printer driver aspect of the actual Adobe software.

The smallest PDF files are produced from material that is already in a standard data file format such as Word, Excel or Access. However, multiple-page image files created by scanning multiple-page documents can readily be converted into PDF documents, albeit with much larger file sizes than if the data were in Word or Excel format. It is possible to convert the images to text using OCR (Optical Character Recognition). However, an image of the document should also be retained, since OCR output can contain recognition errors.
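The difference in size between scanned pages and text is easy to estimate. The Python sketch below works out the uncompressed size of an A4 page scanned at 300 dpi in black and white, greyscale and colour, and compares it with the same page held as text. The 300 dpi resolution and the figure of about 3,000 characters per typed page are illustrative assumptions, not figures from this note, and the function name is hypothetical.

```python
# Minimal sketch: uncompressed size of a scanned page versus the same page as text.
# Assumptions (illustrative only): A4 page, 300 dpi, about 3,000 characters of text.
MM_PER_INCH = 25.4


def scan_size_bytes(width_mm: float, height_mm: float, dpi: int, bits_per_pixel: int) -> int:
    """Return the uncompressed size in bytes of a page scanned at `dpi`."""
    width_px = round(width_mm / MM_PER_INCH * dpi)    # 2480 pixels for A4 at 300 dpi
    height_px = round(height_mm / MM_PER_INCH * dpi)  # 3508 pixels for A4 at 300 dpi
    return width_px * height_px * bits_per_pixel // 8


for label, bits in [("black and white", 1), ("greyscale", 8), ("colour", 24)]:
    size = scan_size_bytes(210, 297, 300, bits)
    print(f"A4 at 300 dpi, {label}: {size / 1_000_000:.1f} MB uncompressed")

print(f"Same page as plain text: about {3_000 / 1_000:.0f} KB")
```

Compression inside the PDF file reduces the image figures considerably, but a scanned page still remains many times larger than its text equivalent.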

The Planet PDF website is an excellent resource for all matters relating to PDF, including details of hundreds of software products to make and process PDF files, along with much helpful advice.

3.    Creating Your Own PDF Documents

You have many choices when it comes to software to create PDF files because, although Adobe controls the PDF specification, it permits anyone to produce and sell software to create PDF files. However, there is a strict proviso that the documents created must adhere to the PDF standard. This is a good thing, as it ensures that all PDF documents can be read on any computer even though different software was used to create them. The existence of alternative suppliers also means that Adobe does not have a monopoly, so users can be confident that they won't be held to ransom by the company at some future date. Adobe make their money in the PDF area simply by being the biggest producer of the commercial software used to create PDF files.

Over the last ten years there have been several revisions to the standard, which now stands at version 1.6. Each new version is backwards compatible with all the earlier versions, so you do not normally have to recreate your documents to the new specification each time a new version is released. Nevertheless, it is good practice to ensure that all your 'original' material is archived in a format that should remain readable and comprehensible a long way into the future. At present, text material should be archived as ASCII text files (or as HTML files if you wish to preserve limited formatting information), and images should be archived as uncompressed TIFF (Tagged Image File Format) files.
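Every PDF file begins with a header line such as %PDF-1.4 that declares the version of the standard it was written to. The Python sketch below reads that header; the function names are hypothetical. It is only a first indication, because from PDF 1.4 onwards a document may also declare a later version in its catalog (the /Version entry), which this sketch does not examine.

```python
# Minimal sketch: report the PDF version declared in a file's header.
import re
from typing import Optional, Tuple

HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from a '%PDF-x.y' header, or None if there is none.

    Acrobat viewers accept the header anywhere in the first 1024 bytes,
    so only that part of the data is searched.
    """
    match = HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def pdf_file_version(path: str) -> Optional[Tuple[int, int]]:
    """Read just the start of a file on disk and report its header version."""
    with open(path, "rb") as f:
        return pdf_header_version(f.read(1024))


print(pdf_header_version(b"%PDF-1.6\n"))  # (1, 6)
```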

Adobe Acrobat Version 7, which became available at the end of December 2004, is the current release of both the free Reader and the PDF creation software. The Standard version costs about £250 (inc. VAT), while the new Professional version costs about £380 (inc. VAT). The latter contains many features of interest mainly to the professional printing community, but it also employs new, improved image compression techniques (see Section 4).

Adobe may be contacted at http://www.adobe.com/products/acrobat/main.html

At the cheaper end of the market are the following products which, although they have fewer features than the Adobe products, all have significant PDF creation capabilities:

•   Jaws for PDF (www.jawspdf.com), about £50 (inc. VAT)

•   RoboPDF (www.ehelp.com/products/robopdf), about £50 (inc. VAT)

•   OmniPage optical character recognition software, Version 12 (www.scansoft.com/omnipage)

•   PaperPort scanning software (www.scansoft.com/paperport)

•   Serif PagePlus 10

Freeware

The comprehensive Ghostscript package has a good PostScript-to-PDF converter. This converts the PostScript file produced by a 'Print to File' operation from any Windows application, using a freely available PostScript printer driver, in much the same way as the Distiller component of the Adobe Acrobat software described in Section 2.
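As a rough illustration of the Ghostscript route, the Python sketch below builds and runs the usual Ghostscript command line for PostScript-to-PDF conversion. It assumes Ghostscript is installed and on the path (the program is normally called gs, or gswin32c on Windows). The -sDEVICE=pdfwrite option selects Ghostscript's PDF writer, and the -dPDFSETTINGS presets (/screen, /ebook, /printer, /prepress, /default) play roughly the same role as Distiller's default parameter sets. The function names are hypothetical.

```python
# Minimal sketch: convert a PostScript file to PDF with Ghostscript's pdfwrite device.
# Assumes Ghostscript is installed; the executable is usually "gs" ("gswin32c" on Windows).
import subprocess
from typing import List

# Ghostscript presets, broadly comparable to Distiller's default parameter sets.
PRESETS = {"screen", "ebook", "printer", "prepress", "default"}


def ps2pdf_command(ps_path: str, pdf_path: str, preset: str = "printer",
                   gs: str = "gs") -> List[str]:
    """Return the Ghostscript command line that converts ps_path to pdf_path."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [
        gs,
        "-dBATCH", "-dNOPAUSE", "-dSAFER",  # run non-interactively, restrict file access
        "-sDEVICE=pdfwrite",                 # Ghostscript's PDF-writing output device
        f"-dPDFSETTINGS=/{preset}",          # e.g. /screen for web sites, /printer for print
        f"-sOutputFile={pdf_path}",
        ps_path,
    ]


def ps2pdf(ps_path: str, pdf_path: str, preset: str = "printer", gs: str = "gs") -> None:
    """Run the conversion, raising CalledProcessError if Ghostscript reports a failure."""
    subprocess.run(ps2pdf_command(ps_path, pdf_path, preset, gs), check=True)


print(" ".join(ps2pdf_command("article.ps", "article.pdf", preset="screen")))
```

Ghostscript also comes with a ps2pdf script that wraps a similar command; calling the program directly simply makes the chosen settings visible.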

The Planet PDF website (www.planetpdf.com) describes itself as "A World of Adobe Acrobat News, tools Tips and Resources". It has details of hundreds of commercial utilities and software products to make and process PDF files and is highly recommended.

4.    Special Features

'Navigation aids' may be set up in a PDF document to help readers find the section they require. These can take the form of either bookmarks or links. Bookmarks set up by the author may be displayed in a separate window and, as the name suggests, allow the reader to click on a bookmark and go straight to the relevant page. Links are similar to web page links and are embedded in the text of the PDF document.
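When the Distiller route (or Ghostscript) is used, bookmarks can be added without special software by putting pdfmark commands, an Adobe extension to PostScript that both Distiller and Ghostscript understand, into the PostScript sent for conversion. The Python sketch below generates one top-level bookmark per section; the titles and page numbers are placeholders, the function names are hypothetical, and nested bookmarks (which need an extra /Count entry) are not handled.

```python
# Minimal sketch: generate pdfmark commands that create top-level bookmarks
# when a PostScript file is converted by Distiller or Ghostscript.
# Titles and page numbers used below are placeholders.
from typing import Iterable, Tuple

# Conventional guard: defines pdfmark as a no-op on devices (e.g. printers)
# that do not understand it, so the same PostScript can still be printed.
GUARD = "/pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse"


def ps_string(text: str) -> str:
    """Return `text` as a PostScript string literal, escaping special characters."""
    return "(" + text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"


def bookmark_pdfmarks(entries: Iterable[Tuple[str, int]]) -> str:
    """Return PostScript lines creating one bookmark per (title, page) pair."""
    lines = [GUARD]
    for title, page in entries:
        lines.append(f"[/Title {ps_string(title)} /Page {page} /OUT pdfmark")
    return "\n".join(lines) + "\n"


print(bookmark_pdfmarks([
    ("1. Introduction", 1),
    ("2. Technical Background", 2),
    ("3. Creating Your Own PDF Documents", 3),
]))
```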

Documents may be encrypted so that a password is required before they can be opened.

Users can often select text or images in a PDF document using the mouse and then copy these items into their own documents such as Microsoft Word documents. However, the author of the PDF document can set an option such that text or images cannot be copied.
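Both the opening password and the 'no copying' option rely on PDF's standard encryption. A 'user' password is needed to open the file, while an 'owner' password protects the permission settings, which are stored as a single number, the /P entry, in which each permission is one bit. The Python sketch below computes that number for four common permissions, following the bit table in Adobe's PDF Reference; the constant and function names are hypothetical. Note that these permissions depend on the reading software honouring them.

```python
# Minimal sketch: compute the /P (permissions) value stored in an encrypted PDF.
# Bit numbers follow the standard security handler table in Adobe's PDF Reference,
# with bit 1 the least significant. Only four common permissions are modelled.
from typing import Iterable

PRINT, MODIFY, COPY, ANNOTATE = 3, 4, 5, 6  # illustrative constant names (bit numbers)


def permissions_value(denied: Iterable[int] = ()) -> int:
    """Return /P as a signed 32-bit integer with the `denied` bits cleared."""
    p = 0xFFFFFFFC  # every bit set except bits 1-2, which must be 0
    for bit in denied:
        p &= ~(1 << (bit - 1))  # clear the bit for a withheld permission
    # The PDF file stores /P as a signed 32-bit integer.
    return p - (1 << 32) if p & 0x80000000 else p


print(permissions_value())               # -4: everything allowed
print(permissions_value(denied=[COPY]))  # -20: text and images cannot be copied
```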

A PDF document circulated for comment may be 'annotated' by the recipient and the document returned to the originator complete with the comments.

Adobe Acrobat Version 7 Professional includes some of the most advanced image compression technology currently available: JPEG 2000 for photographs and JBIG2 for text and line art.

For compatibility with earlier versions of the Acrobat Reader software, PDF documents created with the Version 7 software may be saved as 'earlier version' documents. This restricts the features available.
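Choosing an 'earlier version' amounts to choosing a PDF version that older Readers understand. The Python sketch below records which Acrobat release introduced each PDF version up to 1.6 (Acrobat 7); the helper function is a hypothetical illustration of picking the right setting for the oldest Reader your audience is likely to have.

```python
# Minimal sketch: match each PDF version to the Acrobat release that introduced it,
# then pick the newest version that an audience's oldest Reader understands.
FIRST_ACROBAT_FOR_PDF = {
    (1, 0): 1, (1, 1): 2, (1, 2): 3, (1, 3): 4,
    (1, 4): 5, (1, 5): 6, (1, 6): 7,
}


def choose_compatibility(oldest_reader: int) -> tuple:
    """Return the highest PDF version supported by Acrobat/Reader `oldest_reader`."""
    suitable = [v for v, r in FIRST_ACROBAT_FOR_PDF.items() if r <= oldest_reader]
    if not suitable:
        raise ValueError(f"no PDF version known for Reader {oldest_reader}")
    return max(suitable)


print(choose_compatibility(5))  # (1, 4), i.e. Acrobat 5.0 (PDF 1.4) compatibility
```

Saving for an older Reader gives up features introduced later, such as the JPEG 2000 compression mentioned above, which first appeared in PDF 1.5.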

5.    Concluding Remarks

The PDF format is currently the de facto standard for distributing documents in a single format that can be read by all computer users. PDF creation software ranges in price from freeware to about £400. In view of the high costs associated with traditional printed publications, the PDF file is worthy of careful consideration by local historians wishing to bring their work to the widest possible audience.

Feedback on these notes will be welcomed. Please send comments to the author at ray.wilson@coaley.net.
