PDF Portable Document Format
| PDF Type | Notes |
| --- | --- |
| PDF Image Only | • Transactional documents • Documents that don’t require searchable text • The simplest scanning option |
| PDF Searchable Image Exact (also known as PDF Image+Text) | • Documents that need to retain scanned images for legal accuracy • Documents that you need to be able to search quickly • Full-colour documents • The OCR'd text is held 'behind' the document to allow full-text searching |
| PDF Searchable Image Compact | • Documents that need to retain their original look but must be reduced to the smallest possible file size for network distribution or Web posting • Possibly 'lossy' compression |
| PDF Formatted Text and Graphics (also known as PDF Normal) | • Documents that need to have the highest possible on-screen viewing and printing quality • Documents that must be reduced to the smallest possible file size for network distribution or Web posting • A conversion from a scanned image to a text + graphics document • May be labour intensive to achieve an accurate document • May not be legally acceptable |
PDF Image Only takes a bitmapped image of a document (like a TIFF file) and applies a PDF wrapper to that raster image. Because PDF Image Only files do not contain OCR text, their content is not searchable. However, the file can be integrated with other PDF documents and read by anyone on any platform with Adobe Acrobat Reader software. In addition, you can add keywords to the file, so you can search for the document later.
PDF Image Only is ideal for
transactional documents, such as invoices and forms. For example, you can use
Image Only to scan invoices into an imaging archive. Digital versions of
invoices must be absolutely faithful to the originals, yet they are rarely
retrieved once they have been entered into the system. When an invoice does need
to be retrieved, it can easily be found with an index search for the invoice
number or customer name.
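The idea of a "PDF wrapper" around a raster image, with keywords attached for later retrieval, can be illustrated with a minimal sketch. It assumes a raw 8-bit greyscale bitmap as input and writes the PDF objects by hand using only the Python standard library; the function name `build_image_only_pdf` and its parameters are hypothetical, and real scanning software will produce considerably richer files.

```python
import zlib


def _escape(text: str) -> str:
    # Escape characters that are special inside a PDF literal string.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_image_only_pdf(pixels: bytes, width: int, height: int,
                         keywords: str = "") -> bytes:
    """Wrap a raw 8-bit greyscale bitmap in a one-page PDF with no text layer.

    Assumptions (illustrative only): one pixel maps to one PDF unit, and
    keywords are plain ASCII.
    """
    if len(pixels) != width * height:
        raise ValueError("pixel data does not match width * height")

    image_data = zlib.compress(pixels)  # stored with the FlateDecode filter
    # Page content: scale the unit image square to the page and paint it.
    content = f"q {width} 0 0 {height} 0 0 cm /Im0 Do Q".encode("ascii")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         f"/Resources << /XObject << /Im0 4 0 R >> >> "
         f"/Contents 5 0 R >>").encode("ascii"),
        (f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
         f"/ColorSpace /DeviceGray /BitsPerComponent 8 /Filter /FlateDecode "
         f"/Length {len(image_data)} >>\nstream\n").encode("ascii")
        + image_data + b"\nendstream",
        f"<< /Length {len(content)} >>\nstream\n".encode("ascii")
        + content + b"\nendstream",
        # Document information dictionary carrying the searchable keywords.
        f"<< /Keywords ({_escape(keywords)}) >>".encode("ascii"),
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R /Info 6 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode("ascii")
    return bytes(out)
```

Calling `build_image_only_pdf(bytes(4 * 2), 4, 2, "invoice 1234")` returns the bytes of a tiny PDF whose only page content is the image. There is no text to search inside the page, but the keywords in the information dictionary remain available to an indexing system.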
A document created in PDF Searchable Image offers the best of both worlds—an exact replica of the original document that is also fully searchable. PDF Searchable Image files contain two layers: a bitmapped layer and a hidden text layer. The bitmapped layer maintains the visual representation of the original document. The text layer contains the OCR version so you can search for any word on any page. PDF Searchable Image comes in two variants: Exact and Compact. These two are similar in many ways, but they have a few key differences.
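As a rough sketch of the two-layer arrangement described above, the following builds a PDF page content stream that first paints the bitmap and then writes the OCR text using text rendering mode 3, which the PDF specification defines as invisible text. The function name, the resource names `/Im0` and `/F1`, and the word-tuple layout are assumptions made for illustration; actual OCR products position and encode their text in more elaborate ways.

```python
def searchable_page_content(page_width, page_height, words):
    """Return a content stream with a visible image and a hidden text layer.

    `words` is a list of (text, x, y, font_size) tuples in PDF points,
    as might be produced by an OCR engine (hypothetical input format).
    """
    ops = [
        # Visible layer: scale the image XObject to fill the page.
        f"q {page_width} 0 0 {page_height} 0 0 cm /Im0 Do Q",
        # Hidden layer: rendering mode 3 neither fills nor strokes glyphs,
        # so the text is searchable but never drawn.
        "BT 3 Tr",
    ]
    for text, x, y, size in words:
        escaped = (text.replace("\\", "\\\\")
                       .replace("(", "\\(")
                       .replace(")", "\\)"))
        ops.append(f"/F1 {size} Tf 1 0 0 1 {x} {y} Tm ({escaped}) Tj")
    ops.append("ET")
    return "\n".join(ops)
```

For example, `searchable_page_content(612, 792, [("Invoice", 72, 700, 12)])` yields a stream in which the word "Invoice" sits invisibly over the place where it appears in the scanned image.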
Exact
The Exact version of PDF Searchable
Image—also known as PDF Image+Text—is great for preserving your most richly
coloured, intricately designed documents. This PDF flavour stores image
information on one layer and maintains a text version of the document on another
hidden layer, so you can easily search your documents.
The Exact option preserves colour as 8-bit to
24-bit files, so you can distinguish between shades of the same colour and
between multiple colours on a page. The trade-off is
a larger file size. PDF Searchable Image Exact may not be the
best option where small file sizes are required. However, if you are archiving corporate
information, your need for accurate, searchable files may outweigh concerns
about file size. In that case, PDF Searchable Image Exact may be preferable.
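The effect of colour depth on file size can be estimated with some simple arithmetic. The sketch below assumes a US Letter page scanned at 300 dpi and computes the uncompressed bitmap size only; the page size, resolution, and function name are illustrative assumptions, and real files will be smaller once compression is applied.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed bitmap size in bytes, with each row padded to a whole byte."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


def size_table(width_in=8.5, height_in=11, dpi=300, depths=(1, 8, 24)):
    """Map each colour depth (bits per pixel) to its raw page size in bytes."""
    return {bits: raw_page_bytes(width_in, height_in, dpi, bits) for bits in depths}
```

Under these assumptions, `size_table()` gives roughly 1.05 MB for a 1-bit page, 8.4 MB for 8-bit, and 25.2 MB for 24-bit, which shows why preserving full colour comes at a noticeable cost in file size.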
Compact
PDF Searchable Image Compact uses a
new colour-segmentation process to create small file sizes from certain types of
colour documents. The Compact format is advantageous when the document you need
to scan has some regions that are colour images and some regions that are
monochrome (for example, text in any two colours).
With the Compact option, a page is segmented into two types of regions. Image (colour) regions are stored within the PDF
file as JPEG data. Text (monochrome) regions are stored within the file as G4 or
Zip compressed data.
Depending on how large the text regions are in the original document, this
storage process can substantially reduce file size.
By producing smaller files, PDF Searchable Image Compact makes it easier for you to share your
electronic documents, output them to printers, and post them on your Web site.
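The per-region choice of compression described above can be sketched as follows. This is a deliberately simplified illustration, not the segmentation process used by any particular product: it treats a region with at most two distinct colours as text and anything else as an image. The function name and input format are hypothetical; the returned strings are the standard PDF filter names for G4 fax compression (`CCITTFaxDecode`) and JPEG (`DCTDecode`).

```python
def choose_filter(region_pixels):
    """Pick a PDF compression filter for one page region.

    `region_pixels` is an iterable of (r, g, b) tuples. Real scans contain
    noise, so a practical implementation would need a more tolerant test
    than counting exact colours.
    """
    distinct_colours = set(region_pixels)
    if len(distinct_colours) <= 2:
        # Monochrome region (text in any two colours): G4 compression.
        # FlateDecode (Zip) would be the alternative mentioned above.
        return "CCITTFaxDecode"
    # Colour image region: stored as JPEG data.
    return "DCTDecode"
```

A black-on-white text block such as `[(0, 0, 0), (255, 255, 255), (0, 0, 0)]` would be assigned `CCITTFaxDecode`, while a photographic region with many colours would be assigned `DCTDecode`.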
PDF Formatted Text and Graphics - also known as PDF Normal - is the most widely seen type of PDF file. It is the usual PDF output produced from a text processing or authoring environment, such as Microsoft Word. It contains the full text of the page, with appropriate coding to define fonts, font sizes, and so on. Many applications come with a 'Save As PDF' or 'Print To PDF' function, which allows users to convert their documents to PDF Normal.
This format does not use bitmapped images but has true computer-generated text and graphics, using only one layer. This makes PDF Formatted Text and Graphics the most compact of the four PDF types.
Additionally, documents created in
this type are fully searchable, and they look as good and print
as well as files generated from software applications. Rather than viewing a
scanned image, you see computer-generated text and graphics that scale and
retain their crispness on-screen and in print.
For scanned documents, creating PDF Normal is significantly more
complicated and expensive. The scanned image is not stored; instead, OCR converts it
to text and graphics, which then require proofreading and correction. The trade-off for these compact, high-quality files is the extra
time and effort needed to ensure they are 100% error-free.
Because the image text is
replaced with formatted text, PDF Formatted Text and Graphics
files are easy to read. And because this type of PDF produces small file
sizes, you can easily post these files to the Web or e-mail them to clients and
colleagues.
PDF Formatted Text and Graphics is
also ideal for out-of-print documents, such as rare books. Once scanned, the
material can be converted to computer-generated text and graphics and then
fine-tuned to produce a version of the original with identical or, in many
cases, greatly improved quality.
PDF/A is a variant of PDF for document archiving, standardised as International Standard ISO 19005. The first part of the standard, PDF/A-1, is based on PDF release 1.4 (Acrobat 5.0).
See also TIFF file format, PDF/A archiving format and PDF Searchable
Alliance BatchScan can scan into PDF formats.