Part of my responsibility at the library is managing an archive of digitized materials including books, pamphlets, maps, photographs, and more. When most people look at digital archives they see how easy it is to use the materials and are probably unaware of the journey the items took to arrive on their screen. Today I have tried to summarize the steps that the staff at our library take to go from physical items on our shelves or in our archives to digital objects available for you to search from home.
This process will be different for all institutions depending on their source material, budget, environment, staffing, procedures, priorities, hosting service, etc. But I hope this will provide a little insight so that next time you visit an online archive you can have a better appreciation for the complexity of digitizing historical collections.
What people think digitization is:
What digitization actually is:
- Select an item to digitize: Is this item worth digitizing given the time and resources it will require? How does it contribute to our mission? How does it compare to other items we have or have not digitized? How does it contribute to presenting a broad and diverse historical record? Will it be useful and/or interesting to the public who accesses our digital archive? What is the copyright status of the item and does that impact our decision to digitize it? Has the item already been digitized by another organization? Is it in good enough condition to be subjected to the scanning process? Is it the most complete and best quality copy available?
- Prepare the item for scanning: Does the item need to be disbound? Does it need to be removed from a case? Are there folded pages that need to be unfolded? Are there tears that need to be mended? How will inserts be handled? How will missing pages be handled? Can the item be scanned using equipment we have, or will it need to be sent out to a vendor that can handle fragile/oversized materials?
- Prepare the scanner and software: What are the appropriate scanner settings? What DPI should be used? What file format should be used? What color depth should be used? Is descreening necessary?
- Scan the item: Is the item straight? Does it need to be weighted? How closely should it be cropped?
- Save the images: What will the file name be? How will it be numbered? Where in the folder scheme will it be stored? How large is the file compared to the available space? Will the file be backed up? What media is this being stored on (external drive, internal drive, network drive)? Is the file being compressed in transfer to the storage media?
- Edit the images: Does the image need to be straightened? What if the item is not squared? How closely should it be cropped? Are any other color corrections or alterations necessary? Do multiple scans need to be knit together to form a larger image? How many versions will be saved and in what formats?
- Review images for quality control: Have any pages been skipped? Are any pages duplicated? Are any images cropped too closely? Are any pages out of order? Have page numbers/names been appended to the file names?
- Package the images into a single object: Which version of the images will be used for the final object? Have all the images, and only the necessary images, been imported to the application? Did the application report any import errors? Will the images remain as a group of images or be converted to a PDF?
- Add metadata to the object: What metadata should be associated with this item to make it discoverable? What fields should be left blank? What metadata standards govern formatting individual fields? What rules are imposed on the metadata by partner organizations who will automatically import the object from our site? What authorities, such as the Library of Congress, can supply standard terms for some of the fields? Will the transcription be generated by OCR or produced manually? Is it worthwhile to take the time to manually check the quality of the OCR text?
- Upload the object to the hosting service: Have the item and associated metadata been uploaded for approval? Who will approve the final item? Does the item appear on the site the way it is supposed to appear?
- Handle the remaining physical item: Will the original item be returned to the shelf or placed in an alternate location, such as storage? Will a note be placed on the physical item that it has been digitized? Will patrons be given access to the physical item or will they be pointed to the digital version?
- Provide patron services related to the digital object: Does the catalog need to be updated with a link to the digital object? Are there any other channels that need to be informed of the upload, such as social media sites, administrative reports, reference staff, newsletters, or partner organizations? How will patron requests for high-quality scans of the digital object be handled? How will usage statistics for the item be recorded and what decisions will they inform?
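To make the storage questions in the scanner-settings and "Save the images" steps a little more concrete, here is a minimal Python sketch that estimates how much space uncompressed scans would need. The page dimensions, DPI, color depth, and page count are made-up illustrative values, not our library's actual settings, and the function names are hypothetical. Files saved in a compressed format will usually be smaller than this estimate.

```python
import shutil


def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the uncompressed size of one scanned page, in bytes."""
    width_px = round(width_in * dpi)    # pixels across
    height_px = round(height_in * dpi)  # pixels down
    return width_px * height_px * bits_per_pixel // 8


def fits_in_free_space(required_bytes, path="."):
    """Return True if the drive holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


# Hypothetical example: a 200-page book of 8.5 x 11 inch pages,
# scanned at 400 DPI in 24-bit color.
page = uncompressed_size_bytes(8.5, 11, 400, 24)
book = page * 200
print(f"{page / 1e6:.1f} MB per page")   # 44.9 MB per page
print(f"{book / 1e9:.2f} GB per book")   # 8.98 GB per book
print("Fits on this drive:", fits_in_free_space(book))
```

Doubling the DPI quadruples the pixel count, which is one reason the choice of scanner settings and the choice of storage media end up being the same conversation.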
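Some of the quality-control questions above, such as whether pages were skipped or duplicated, can be partly checked from the file names alone. The sketch below assumes a hypothetical naming scheme of an item name, an underscore, a four-digit sequence number, and a .tif extension (for example map_0001.tif), and assumes all the files belong to one item. It only catches gaps and repeats in the numbering; a person still has to look at the images to confirm that the pages themselves are complete and in order.

```python
import re
from collections import Counter

# Assumed (hypothetical) naming scheme: <item>_<4-digit sequence>.tif
PATTERN = re.compile(r"^(?P<item>.+)_(?P<seq>\d{4})\.tif$")


def check_sequence(filenames):
    """Report missing and duplicated sequence numbers among scan file names."""
    seqs = []
    unrecognized = []
    for name in filenames:
        match = PATTERN.match(name)
        if match:
            seqs.append(int(match.group("seq")))
        else:
            unrecognized.append(name)  # does not follow the naming scheme

    counts = Counter(seqs)
    duplicated = sorted(n for n, c in counts.items() if c > 1)
    # Every number from 1 up to the highest one seen should be present.
    missing = sorted(set(range(1, max(seqs) + 1)) - set(seqs)) if seqs else []
    return {"missing": missing, "duplicated": duplicated, "unrecognized": unrecognized}


# Hypothetical folder listing: page 3 was skipped and page 2 was scanned twice.
names = [
    "map_0001.tif",
    "map_0002.tif",
    "map_rescan_0002.tif",
    "map_0004.tif",
    "notes.txt",
]
report = check_sequence(names)
print(report["missing"])       # [3]
print(report["duplicated"])    # [2]
print(report["unrecognized"])  # ['notes.txt']
```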