PDF File Format

The Portable Document Format (PDF) is a file format created by Adobe Systems in 1993 for document exchange. PDF is a fixed-layout format used for representing two-dimensional documents in a manner independent of the application software, hardware, and operating system. Each PDF file encapsulates a complete description of a 2-D document (and, with Acrobat 3D, embedded 3-D content) that includes the text, fonts, images, and 2-D vector graphics that make up the document.

PDF is a universal file format for document exchange that preserves all the fonts, formatting, colours, and graphics of any source document (whether it’s on paper or from the Web or other electronic sources). Preservation is faithful regardless of the application and platform used to create or view the material. PDF files can be shared, viewed, navigated, and printed on a broad range of operating systems by anyone using free Adobe Acrobat Reader™ software.

PDF files look and print exactly as intended across a wide variety of platforms. Free Acrobat Reader software is easy to download from the Adobe Web site. More than 200 million copies have been downloaded or preloaded onto personal computers. This means that if you want to distribute documents to a broad audience, the ubiquity of Acrobat Reader software ensures that all your PDF files can be read by anyone across a broad range of computing environments.

Navigating PDF documents, even long ones, is easy. Unlike standard TIFF files, PDF files can hold navigation information—such as hyperlinks for tables of contents, indexes, and URLs—all within one self-contained file. It’s easy to search for any word or number in a document and see the desired information in context on the page.

The PDF file format has changed several times as new versions of Adobe Acrobat were released. The eight versions of PDF and their corresponding Acrobat releases are:

(1993) - PDF 1.0 / Acrobat 1.0 
(1994) - PDF 1.1 / Acrobat 2.0 
(1996) - PDF 1.2 / Acrobat 3.0 
(1999) - PDF 1.3 / Acrobat 4.0 
(2001) - PDF 1.4 / Acrobat 5.0 
(2003) - PDF 1.5 / Acrobat 6.0 
(2005) - PDF 1.6 / Acrobat 7.0 
(2006) - PDF 1.7 / Acrobat 8.0 
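
The version a file declares can usually be read from its first line, which begins with `%PDF-` followed by the version number. The following is a minimal Python sketch, assuming the header sits at the very start of the data. The helper name `pdf_version` and the lookup table are illustrative only; the table simply restates the list above. Files from PDF 1.4 onwards may override the header version inside the document itself, which this sketch ignores.

```python
import re

# Acrobat release that accompanied each PDF version, restating the list above.
ACROBAT_FOR_PDF = {
    "1.0": "Acrobat 1.0",
    "1.1": "Acrobat 2.0",
    "1.2": "Acrobat 3.0",
    "1.3": "Acrobat 4.0",
    "1.4": "Acrobat 5.0",
    "1.5": "Acrobat 6.0",
    "1.6": "Acrobat 7.0",
    "1.7": "Acrobat 8.0",
}


def pdf_version(data: bytes):
    """Return the version string declared in the PDF header, or None."""
    match = re.match(rb"%PDF-(\d\.\d)", data)
    return match.group(1).decode("ascii") if match else None


# Example: the first bytes of a hypothetical PDF 1.4 file.
header = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"
version = pdf_version(header)
print(version, ACROBAT_FOR_PDF.get(version, "unknown"))
```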

The Four Types of Adobe PDF documents

With scanning software, volumes of legacy paper documents may be converted to Adobe PDF so you can search, annotate, publish, and archive all of your information in a digital environment. 

However, there are four different types of Adobe PDF for use with paper-based documents:

PDF Image Only
PDF Searchable Image Exact
PDF Searchable Image Compact
PDF Formatted Text and Graphics

Each provides distinct advantages that enable you to customise your electronic files to meet your information needs. By examining the type of document you’re converting and how you intend to use the electronic file, you can choose the most suitable PDF option. 

The following table summarises the four types and the best use for each. 

PDF Type and Notes

PDF Image Only
• Transactional documents
• Documents that don’t require searchable text
• The simplest scanning

PDF Searchable Image Exact (also known as PDF Image+Text)
• Documents that need to retain scanned images for legal accuracy
• Documents that you need to be able to search quickly
• Full-colour documents
• The OCR'd text is held 'behind' the document to allow full-text searching

PDF Searchable Image Compact
• Documents that need to retain their original look but must be reduced to the smallest possible file size for network distribution or Web posting
• Possibly 'lossy' compression

PDF Formatted Text and Graphics (also known as PDF Normal)
• Documents that need to have the highest possible on-screen viewing and printing quality
• Documents that must be reduced to the smallest possible file size for network distribution or Web posting
• A conversion from a scanned image to a text + graphics document
• May be labour intensive to achieve an accurate document
• May not be legally acceptable


PDF Image Only

PDF Image Only takes a bitmapped image of a document (such as a TIFF file) and applies a PDF wrapper to that raster image. Because PDF Image Only files do not contain OCR text, their content is not searchable. However, the file can be combined with other PDF documents and read by anyone on any platform with Adobe Acrobat Reader software. You can also add keywords to the file so that you can search for the document later.
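
To make the idea of a "PDF wrapper" concrete, here is a minimal Python sketch that wraps an 8-bit greyscale bitmap in a one-page PDF using only the standard library. It is an illustration rather than how any particular scanning product works. The function name `image_only_pdf` is hypothetical, and the sketch assumes one pixel per PDF unit and Zip (Flate) compression for the image data.

```python
import zlib


def image_only_pdf(pixels: bytes, width: int, height: int) -> bytes:
    """Wrap an 8-bit greyscale bitmap in a one-page PDF with no text layer."""
    image = zlib.compress(pixels)
    # Scale the unit image square up to the page size and paint it.
    content = b"q %d 0 0 %d 0 0 cm /Im0 Do Q" % (width, height)
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %d %d] "
        b"/Resources << /XObject << /Im0 4 0 R >> >> "
        b"/Contents 5 0 R >>" % (width, height),
        b"<< /Type /XObject /Subtype /Image /Width %d /Height %d "
        b"/ColorSpace /DeviceGray /BitsPerComponent 8 "
        b"/Filter /FlateDecode /Length %d >>\nstream\n"
        % (width, height, len(image)) + image + b"\nendstream",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_at = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # mandatory free entry for object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)


# A 4x4 checkerboard stands in for a scanned page.
pixels = bytes([0, 255, 0, 255, 255, 0, 255, 0] * 2)
pdf = image_only_pdf(pixels, 4, 4)
print(len(pdf), "bytes")
```

The page contains a single image and no text operators, which is why a file built this way cannot be searched.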

PDF Image Only is ideal for transactional documents, such as invoices and forms. For example, you can use Image Only to scan invoices into an imaging archive. Digital versions of invoices must be absolutely faithful to the originals, yet they are rarely retrieved once they have been entered into the system. When an invoice does need to be retrieved, it can easily be found with an index search for the invoice number or customer name.

PDF Searchable Image Exact and Compact

A document created in PDF Searchable Image offers the best of both worlds—an exact replica of the original document that is also fully searchable. PDF Searchable Image files contain two layers: a bitmapped layer and a hidden text layer. The bitmapped layer maintains the visual representation of the original document. The text layer contains the OCR version so you can search for any word on any page. PDF Searchable Image comes in two variants: Exact and Compact. These two are similar in many ways, but they have a few key differences.
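
The source does not say how the hidden text layer is produced. One common way to realise it in PDF is to draw the OCR text in text rendering mode 3, which neither fills nor strokes the glyphs, on top of the scanned image. The Python sketch below builds such a page content stream under that assumption. The function name `searchable_page_content`, the resource names `/Im0` and `/F1`, and the word list are all hypothetical, and the text is assumed to be Latin-1.

```python
def searchable_page_content(width, height, words):
    """Build a page content stream: visible scan plus invisible OCR text.

    `words` is a list of (text, x, y, font_size) tuples in page units,
    as a hypothetical OCR engine might report them.
    """
    # Bitmapped layer: paint the scanned image across the whole page.
    parts = [f"q {width} 0 0 {height} 0 0 cm /Im0 Do Q"]
    # Text layer: rendering mode 3 makes the text invisible but still present.
    parts.append("BT 3 Tr")
    for text, x, y, size in words:
        # Escape the characters that are special inside a PDF literal string.
        escaped = text.replace("\\", r"\\").replace("(", r"\(").replace(")", r"\)")
        parts.append(f"/F1 {size} Tf 1 0 0 1 {x} {y} Tm ({escaped}) Tj")
    parts.append("ET")
    return "\n".join(parts).encode("latin-1")


stream = searchable_page_content(612, 792, [("Invoice", 72, 700, 12)])
print(stream.decode("latin-1"))
```

A viewer searching the page finds "Invoice" in the text layer and highlights the matching area of the image, while the reader only ever sees the scan.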

Exact

The Exact version of PDF Searchable Image—also known as PDF Image+Text—is great for preserving your most richly coloured, intricately designed documents. This PDF flavour stores image information on one layer and maintains a text version of the document on another hidden layer, so you can easily search your documents. 
The Exact option preserves colour as 8-bit to 24-bit files, so you can distinguish between shades of the same colour and between multiple colours on a page. The trade-off is a larger file size. PDF Searchable Image Exact may not be the best option where small file sizes are required. However, if you are archiving corporate information, your need for accurate, searchable files may outweigh concerns about file size. In that case, PDF Searchable Image Exact may be preferable.

Compact

PDF Searchable Image Compact uses a new colour-segmentation process to create small file sizes from certain types of colour documents. The Compact format is advantageous when the document you need to scan has some regions that are colour images and some regions that are monochrome (for example, text in any two colours).
With the Compact option, a page is segmented into two types of regions. Image (colour) regions are stored within the PDF file as JPEG data. Text (monochrome) regions are stored within the file as G4 or Zip compressed data.
Depending on how large the text regions are in the original document, this storage process can substantially reduce file size.
By producing smaller files, PDF Searchable Image Compact makes it easier for you to share your electronic documents, output them to printers, and post them on your Web site. 
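
The segmentation idea can be sketched in Python as follows. This is a toy illustration, not the actual colour-segmentation process: it assumes a region counts as "text" when it contains at most two distinct colours, and it uses Zip (Flate) for such regions because G4 and JPEG encoders are not in the standard library. The names `choose_filter` and `pack_bilevel` are hypothetical, and `FlateDecode` and `DCTDecode` are the PDF filter names for Zip and JPEG data.

```python
import zlib


def choose_filter(region_pixels):
    """Pick a storage filter for a region given as a list of (r, g, b) tuples."""
    colours = set(region_pixels)
    # Two colours or fewer: treat as text and store it as bilevel data.
    return "FlateDecode" if len(colours) <= 2 else "DCTDecode"


def pack_bilevel(region_pixels):
    """Pack a two-colour region into 1 bit per pixel, then Zip-compress it.

    The region is treated as one flat run of pixels; the per-row byte
    alignment that a real PDF image needs is left out for brevity.
    """
    background = region_pixels[0]
    bits = [0 if p == background else 1 for p in region_pixels]
    bits += [0] * (-len(bits) % 8)  # pad to a whole number of bytes
    packed = bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[n:n + 8]))
        for n in range(0, len(bits), 8)
    )
    return zlib.compress(packed)


text_region = [(255, 255, 255)] * 900 + [(0, 0, 128)] * 100  # two colours
photo_region = [(i % 256, (i * 7) % 256, (i * 13) % 256) for i in range(1000)]

print(choose_filter(text_region))   # FlateDecode
print(choose_filter(photo_region))  # DCTDecode
print(len(pack_bilevel(text_region)), "bytes instead of", 3 * len(text_region))
```

Storing the two-colour region at 1 bit per pixel rather than 24 is where most of the saving comes from, which is why the benefit depends on how much of the page is text.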

 

PDF Formatted Text and Graphics

PDF Formatted Text and Graphics - also known as PDF Normal - is the most widely seen type of PDF file. It is the usual PDF output produced from a word processing or authoring environment, such as Microsoft Word. It contains the full text of the page with appropriate coding to define fonts, font sizes, and so on. Many applications come with a 'Save As PDF' or 'Print To PDF' function, which allows users to convert their documents to PDF Normal.

This format does not use bitmapped images but has true computer-generated text and graphics, using only one layer. This makes PDF Formatted Text and Graphics the most compact of the four PDF types.

Additionally, documents created in this type are fully searchable, and they look as good and print as well as files generated from software applications. Rather than viewing a scanned image, you see computer-generated text and graphics that scale and retain their crispness on-screen and in print.

For scanned documents, creating PDF Normal is significantly more complicated and expensive. The scanned image is not stored; instead, OCR converts it to text and graphics, which then require proofreading and correction.
The trade-off for these compact, high-quality files is that it takes more time and effort to ensure they are 100% error-free.
Because the image text is replaced with formatted text, PDF Formatted Text and Graphics files are easy to read. And because this type of PDF produces small file sizes, you can easily post these files to the Web or e-mail them to clients and colleagues.
PDF Formatted Text and Graphics is also ideal for out-of-print documents, such as rare books. Once scanned, the material can be converted to computer-generated text and graphics and then fine-tuned to produce a version of the original with identical or, in many cases, greatly improved quality.

 

PDF/A

PDF/A is a variant of PDF for long-term document archiving, standardised as International Standard ISO 19005. The first part of the standard, PDF/A-1, is based on PDF 1.4 (Acrobat 5.0).

 

 

See also: TIFF file format, PDF/A archiving format, and PDF Searchable

Alliance BatchScan can scan into PDF formats.

 
