AMS Adviser
Volume 3 Issue 3 - May / June 1998
Welcome to a new issue of the AMS Adviser.
This month we cover the Kodak ds 3500 Scanner.
Also, we continue with the second part of a new series on the Dynamics of Cost in
Document Capture.
Take special note of our new services.
Plus we have all the usual bits (Funny
Bit? and AMS Services).
The Dynamics of Cost in Image Capture (Part 2)
In the last issue, under the heading
"The Elements of Production Document Capture", we covered Batch Preparation and
Document Scanning. In Part 2 we continue with the same topic and look at the various
stages of batch scanning.
OCR and Image Cleanup
Optical character recognition is frequently used in
production capture systems to extract information about a document directly from the
document itself. There are two forms of OCR: zonal and full-text. Zonal OCR is
typically used on forms, where only specific fields on the form are of interest. Full-text
OCR is used on free-form documents, such as legal briefs, to read the entire document and
then prepare a searchable, full-text index of the document.
Image cleanup is a broad term that includes various methods for cleaning up scanned
images to make them more readable. Techniques include:
- Deskewing, despeckling, deshading, streak removal, and other basic cleanup functions
- Line removal and character reconstruction for use on forms
- Edge enhancement, which sharpens character edges to increase OCR accuracy
In addition, enhanced thresholding options are available on some scanners (for example,
Fujitsu's IPC2 and Bell+Howell's ACE). All of these techniques make the images
more readable, increase the accuracy of OCR, and assist the indexing process.
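To give a feel for what one of these cleanup steps does, here is a minimal sketch of despeckling in Python. It is an illustration only, not how any particular scanner or capture product works: it assumes the page is already a black-and-white grid (1 for ink, 0 for paper), and the function name `despeckle` is made up for this example.

```python
def despeckle(image):
    """Return a copy of a binary image with isolated black pixels removed.

    `image` is a list of rows; 1 = black (ink), 0 = white (paper).
    Assumption for this sketch: a black pixel with no black neighbour in
    its surrounding 3x3 window is treated as a speck.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]  # leave the original untouched
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            has_neighbour = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                cleaned[y][x] = 0  # erase the speck
    return cleaned


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # isolated speck
    [0, 0, 0, 1, 1],  # part of a character stroke
    [0, 0, 0, 1, 0],
]
cleaned_page = despeckle(page)
```

In this example the lone pixel on the second row is erased, while the three connected pixels of the character stroke are kept.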
Indexing
Indexing consists of creating meaningful descriptive
information for each scanned document and then writing this
information into a database that will be used to retrieve the images later. In most cases,
the index information is entered by a keyboard operator based on information on the image
itself, an operation known as "key from image." In some cases, however, the
index information is extracted automatically from the images via a recognition process --
typically optical character recognition or bar code recognition. Some indexing information
may also be assigned automatically to all images included in a particular batch.
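The indexing step described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under our own assumptions: the table name `document_index`, the fields (`department`, `invoice_number`) and the file paths are hypothetical and do not come from any particular imaging product.

```python
import sqlite3


def create_index(conn):
    # One row of descriptive information per scanned image.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS document_index ("
        " image_path TEXT PRIMARY KEY,"
        " batch_id TEXT NOT NULL,"
        " department TEXT,"        # assigned once for the whole batch
        " invoice_number TEXT,"    # keyed from image, or read by OCR/bar code
        " index_source TEXT)"      # how the value was obtained
    )


def index_batch(conn, batch_id, batch_fields, documents):
    """Write index records for every document in a batch."""
    rows = [
        (
            doc["image_path"],
            batch_id,
            batch_fields.get("department"),  # batch-level value, applied to all
            doc["invoice_number"],
            doc["source"],
        )
        for doc in documents
    ]
    conn.executemany("INSERT INTO document_index VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def find_images(conn, invoice_number):
    """Retrieve the images later by their index value."""
    cur = conn.execute(
        "SELECT image_path FROM document_index"
        " WHERE invoice_number = ? ORDER BY image_path",
        (invoice_number,),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
create_index(conn)
index_batch(
    conn,
    batch_id="batch42",
    batch_fields={"department": "Accounts"},
    documents=[
        {"image_path": "batch42/0001.tif", "invoice_number": "INV-1001",
         "source": "key from image"},
        {"image_path": "batch42/0002.tif", "invoice_number": "INV-1002",
         "source": "bar code"},
    ],
)
matches = find_images(conn, "INV-1001")
```

Note how the department is supplied once for the batch, while the invoice number is recorded per document along with how it was obtained.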
QA and Rescanning
Quality assurance entails systematic reviews and
checks to ensure that the scanned images are readable and the indexes are accurate. It
includes methods for flagging bad images and explaining why or how images should be
rescanned, as well as correcting errors or shortcomings in indexing. The QA step can be
performed either by a QA operator or by an index operator.
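As a rough sketch of how a QA step might record its decisions, the Python below flags an image for rescanning with a reason and corrects an index error on another. The class `ScannedImage` and the helper names are hypothetical, chosen for this example only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ScannedImage:
    path: str
    index: Dict[str, str]
    rescan_reason: Optional[str] = None  # None means the image passed QA


def flag_for_rescan(image: ScannedImage, reason: str) -> None:
    """Mark a bad image and explain why it should be rescanned."""
    image.rescan_reason = reason


def correct_index(image: ScannedImage, field_name: str, value: str) -> None:
    """Fix an indexing error found during review."""
    image.index[field_name] = value


def review_batch(
    images: List[ScannedImage],
) -> Tuple[List[ScannedImage], List[ScannedImage]]:
    """Split a batch into accepted images and images awaiting rescan."""
    accepted = [img for img in images if img.rescan_reason is None]
    to_rescan = [img for img in images if img.rescan_reason is not None]
    return accepted, to_rescan


batch = [
    ScannedImage("batch42/0001.tif", {"invoice_number": "INV-1001"}),
    ScannedImage("batch42/0002.tif", {"invoice_number": "INV-1O02"}),  # letter O typed for zero
]
flag_for_rescan(batch[0], "page skewed, text cut off at right edge")
correct_index(batch[1], "invoice_number", "INV-1002")
accepted, to_rescan = review_batch(batch)
```

Keeping the reason with the image means the scanner operator knows what to fix when the page comes back for rescanning.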
Release
Release is the final stage of the capture process,
and consists of handing off batches of in-process images and index information to users of
the document imaging system. Typically, this is when the document images are written to
optical disk or other long-term storage, and the associated index information is merged
with the document database of the larger system. In addition, the release of a document
might trigger a workflow process, initiate the foldering and filing of documents, etc.
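The release step can be pictured as a simple hand-off. In the sketch below, a plain dictionary stands in for the document database of the larger system and a list stands in for a workflow queue; both are assumptions made for illustration, as is the function name `release_batch`.

```python
def release_batch(batch, document_store, on_release=None):
    """Hand off a finished batch to the wider imaging system.

    `batch` is a list of dicts with "image_path" and "index" keys.
    `document_store` stands in for the main document database.
    `on_release`, if given, is called once per released document,
    for example to start a workflow process.
    """
    released = []
    for doc in batch:
        # Merge the index information into the larger system.
        document_store[doc["image_path"]] = dict(doc["index"])
        released.append(doc["image_path"])
        if on_release is not None:
            on_release(doc["image_path"])
    return released


document_store = {}
workflow_queue = []
batch = [
    {"image_path": "batch42/0001.tif", "index": {"invoice_number": "INV-1001"}},
    {"image_path": "batch42/0002.tif", "index": {"invoice_number": "INV-1002"}},
]
released = release_batch(batch, document_store, on_release=workflow_queue.append)
```

After the call, both documents appear in the store and each one has been queued for whatever follow-up processing the system provides.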
In the next issue
we cover "Analysing Capture Costs".
This series has been reproduced from a Kofax
white paper and will be continued over the next few issues.
As usual, the whole article is available for those impatient types. Just
contact AMS and it can be mailed, faxed or
e-mailed in full.
Kodak ds 3500 Scanner