The aim of these guidelines is to provide local historians with an introduction to the MrSID and DjVu computer file formats which promise significant improvements for the delivery of high resolution digital images and fully illustrated documents, respectively, over the Internet or by other electronic means such as CD-ROMs.
1. Introduction
2. MrSID Image File Format
3. DjVu Document File Format
4. Further Information and Downloads
5. Update January 2005
6. Concluding Remarks
1. Introduction
The well-established digital image formats such as JPG/JPEG, TIFF, PNG and GIF can handle extremely high resolution images provided the image is physically small; otherwise the file sizes involved soon become prohibitive. An alternative file format that overcomes this problem was developed as part of a US Government research programme investigating digital storage techniques for tens of millions of fingerprint records. It was realised that this technology had much wider application, and in 1992 the US Government therefore licensed it to a private company, LizardTech Inc., which was formed to exploit these developments.
In the new format an image is stored in a computer file called a Multi-resolution Seamless Image Database (MrSID). This permits the production of highly compressed images for transmission over the Internet without visible compromise of quality. LizardTech have also utilised this technology in their new approach to encoding documents, called DjVu.
A full colour A3 sheet scanned at 600 samples per inch will occupy more than 200 MB in an uncompressed format. The use of a "lossless" compression technique (which means that after encoding, the reconstructed image will be identical to the original image) will reduce this. The TIFF Group 4 format is one example of lossless compression. A higher degree of compression is possible by using a "lossy" compression technique (e.g. JPEG), where after encoding the reconstructed image may differ from the original image. This inevitably results in some loss of quality when "zooming in" on a detail of the image. However, whether the file is 20 MB or 200 MB, no one is going to wait the one or ten hours it will take to download over a 56K dial-up Internet connection.
This clearly indicates the need for improved file formats like MrSID to handle documents of particular interest to historians, such as maps. Section 2 gives a brief introduction to this format and Section 3 provides the same for DjVu. Details of how to get further information on either format are given in Section 4, along with links to free downloads of the MrSID and DjVu software and sample files.
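The figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming an A3 sheet of 297 x 420 mm, 24-bit colour (3 bytes per pixel) and the nominal 56,000 bits per second line rate; the function names are illustrative only.

```python
# Worked arithmetic for the A3 example: uncompressed size and dial-up download time.
MM_PER_INCH = 25.4


def uncompressed_bytes(width_mm, height_mm, samples_per_inch, bytes_per_pixel=3):
    """Size in bytes of an uncompressed scan (3 bytes per pixel = 24-bit colour)."""
    width_px = round(width_mm / MM_PER_INCH * samples_per_inch)
    height_px = round(height_mm / MM_PER_INCH * samples_per_inch)
    return width_px * height_px * bytes_per_pixel


def download_hours(size_bytes, line_rate_bits_per_s=56_000):
    """Best-case transfer time in hours at the nominal line rate."""
    return size_bytes * 8 / line_rate_bits_per_s / 3600


a3_bytes = uncompressed_bytes(297, 420, 600)
print(f"A3 at 600 samples per inch: {a3_bytes / 1e6:.0f} MB uncompressed")

for size_mb in (20, 200):
    hours = download_hours(size_mb * 1e6)
    print(f"{size_mb} MB over a 56K line: at least {hours:.1f} hours")
```

This gives roughly 209 MB for the scan, and best-case download times of about 0.8 and 7.9 hours for 20 MB and 200 MB. Real dial-up connections rarely sustain the nominal rate, which is consistent with the rounder figures of one and ten hours quoted above.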
2. MrSID Image File Format
LizardTech's MrSID, used with GeoExpress, is a means of encoding large, high-resolution images to a fraction of their original file size while effectively maintaining the original image quality. Images can be reduced, enlarged, zoomed, panned or printed without compromising integrity. MrSID eliminates the need for multiple image files for different purposes, as a single image file can be used for anything from a Web site thumbnail to a billboard-size poster. MrSID files are supported in commonly used graphic applications and standard Web browsers.
LizardTech has developed a facility called 'Selective Decompression', which gives users only the pixels they need, when they need them, thus conserving bandwidth and processing time. They claim that encoded files may be as small as 3% of the size of the original source files. They further claim that MrSID's high-encoding ratios result in the greatest overall file-size reduction in the industry without visible compromise of quality and that it can provide the greatest portability for high-resolution images, no matter what the original file sizes.
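The idea of delivering only the pixels that are needed can be illustrated with a short sketch. This is not LizardTech's software or API: it simply assumes that the image is available at successive half-size resolution levels, and the helper names `pick_level` and `pixels_needed` are hypothetical.

```python
# Illustrative sketch only: choose a resolution level for a requested region
# so that no more pixels are decoded than the screen can show.

def pick_level(region_w, region_h, screen_w, screen_h, max_level):
    """Return the coarsest half-size level at which the region still fills the screen."""
    level = 0
    while (level < max_level
           and region_w // 2 ** (level + 1) >= screen_w
           and region_h // 2 ** (level + 1) >= screen_h):
        level += 1
    return level


def pixels_needed(region_w, region_h, level):
    """Number of pixels to decode for the region at the given level."""
    scale = 2 ** level
    return (region_w // scale) * (region_h // scale)


# A 600 samples-per-inch A3 scan (7016 x 9921 pixels) viewed on a 1024 x 768 screen.
full_w, full_h = 7016, 9921

overview_level = pick_level(full_w, full_h, 1024, 768, max_level=5)
overview_pixels = pixels_needed(full_w, full_h, overview_level)
print(f"Whole map: level {overview_level}, {overview_pixels:,} of {full_w * full_h:,} pixels")

# Zoomed right in: a screen-sized window of the map at full resolution.
detail_level = pick_level(1024, 768, 1024, 768, max_level=5)
detail_pixels = pixels_needed(1024, 768, detail_level)
print(f"Detail view: level {detail_level}, {detail_pixels:,} pixels")
```

In both cases only a small fraction of the full image has to be decoded and sent, which is the saving in bandwidth and processing time described above.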
A number of large institutions appear to be adopting MrSID to deliver high quality images. The National Library of Scotland claims on its website that its map library is among the ten largest in the world. It has made a large number of its maps and plans available on the Internet in the MrSID format.
3. DjVu Document File Format
LizardTech's DjVu (pronounced like the French 'déjà vu'), used with Document Express, is an image encoding technology specifically designed for scanned document pages such as books, magazines, catalogues, newspaper articles, technical publications, and ancient and historical documents. The technology employed is similar to that for MrSID, but whereas MrSID is used for single, possibly large format, images, DjVu is applicable to multipage documents comprising text and images. In some respects it is an alternative to the popular Adobe Acrobat PDF format for documents. However, while DjVu may produce smaller documents for distribution via the Internet, the Adobe Acrobat format has become so well established in the print world that DjVu is unlikely to be a serious rival in that area.
DjVu produces extremely compact files by separating a printed page into two types of object: characters and pictures. The pictures are smooth, have relatively few edges and are rich in colour, while the characters have many hard edges but little colour or texture. The two components are therefore separated and encoded with different methods. In DjVu, the backgrounds and pictures are coded with a wavelet-based technique, while the text and drawings are coded with a new bi-level compression technique.
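The separation step can be pictured with a toy example. The sketch below is not DjVu's actual method, which is considerably more sophisticated: it assumes a greyscale page and a fixed darkness threshold, and the function name `split_layers` is hypothetical. It only shows how one page becomes a bi-level text mask plus a smooth background layer, each of which could then go to a different encoder.

```python
# Toy illustration of splitting a page into a text mask and a background layer.

def split_layers(page, threshold=128):
    """Split a greyscale page (rows of 0-255 values) into two layers.

    Returns (mask, background): mask holds 1 for dark "ink" pixels and 0
    elsewhere; background is the page with the ink pixels replaced by the
    average paper shade.
    """
    mask = [[1 if px < threshold else 0 for px in row] for row in page]
    paper = [px for row, mask_row in zip(page, mask)
             for px, m in zip(row, mask_row) if not m]
    fill = round(sum(paper) / len(paper)) if paper else 255
    background = [[fill if m else px for px, m in zip(row, mask_row)]
                  for row, mask_row in zip(page, mask)]
    return mask, background


# A tiny 3 x 4 "page": light paper with three dark ink pixels.
page = [
    [250, 248, 10, 249],
    [251, 12, 11, 247],
    [249, 250, 252, 248],
]

mask, background = split_layers(page)
for row in mask:
    print(row)
for row in background:
    print(row)
```

The mask contains only hard-edged, two-tone shapes, while the background is left smooth once the ink has been filled in, which is why the two layers suit different compression techniques.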
Conventional image-viewing software decompresses images in their entirety before displaying them. This is impractical for high-resolution document images, since the decompressed images typically exceed the memory capacity of most PCs. DjVu, on the other hand, never decompresses the entire image, but instead keeps the image in memory in a compact form and decodes only the piece displayed on the screen, in real time, as the user views the image. As a result, the initial view of the page loads very quickly, and the visual quality progressively improves as more bits arrive. LizardTech claim that DjVu can achieve file size reduction ratios of as much as 500:1 while preserving excellent image quality.
DjVu performs "lossy" encoding and thus, like JPEG and many other image encoding algorithms, allows the loss of some high frequency information to achieve high compression rates. However, unlike JPEG, this is not done at the expense of document readability.
DjVu documents can be created directly from images produced by a scanner or, in the commercial version, optical character recognition (OCR) technology may be employed to convert text images to machine readable text.
4. Further Information and Downloads
You will need to download and install the MrSID or DjVu 'plug-in' for your Internet browser before you can view MrSID or DjVu files that are on your own computer. However, if the files are held on a host server then it is likely that the server will do all the necessary work at its end and deliver the results direct to your browser without the need for a plug-in. Try it and see!
Examples from the National Library of Scotland's excellent map collection are available in MrSID format.
5. Update January 2005
The MrSID format continues to be used for certain specialist sites like the National Library of Scotland map collection referred to in the previous section. However, the DjVu format has not been taken up generally, for a number of reasons which it is beyond the scope of this article to discuss. Furthermore, the Adobe Acrobat PDF writing software has vastly improved image compression technology from Version 6 onwards (see Guidelines No 8). The Adobe Acrobat software now uses the JPEG2000 standard for photographs and JBIG2 for text and line art (Professional version only). The DjVu format is capable of producing smaller file sizes than these methods and so may still be of use in certain applications. However, the universal acceptance of PDF documents means that DjVu is unlikely to achieve its early promise. If sufficient compression can be achieved using the Adobe Acrobat PDF (Professional) software then this is probably the best approach to use, as the Acrobat Reader software is now found on most computers.
6. Concluding Remarks
The use of high compression, high resolution digital image techniques is becoming more widespread. The MrSID file format developed by LizardTech Inc. is a useful standard for large format, high resolution applications such as maps and pictures. The DjVu document format, also from LizardTech Inc., is probably the best approach where very high compression is demanded, but for general use it appears to have been eclipsed by the latest Adobe Acrobat PDF writing software.