Digital Libraries: The Year of the Open
By Roy Tennant -- Library Journal, 09/15/2007
Two events this year are ushering in a new era of openness, both in the source code and in the file formats of commercial software. Adobe and Microsoft have each announced technologies that are open and transparent (see “IDPF Hosts Digital Book 2007,” LJ 6/1/07, p. 27ff.). It is hard to gauge the full impact of these developments, since much of what they'll enable is yet to be seen. Still, they represent enormous potential for anyone interested in libraries, information technology, and coming digital services.
PDF evolves
Many people may not be aware that the Adobe Acrobat file format has long been an openly published specification, or that the full version of Adobe Acrobat could save an XML (extensible markup language) version of a PDF (portable document format). This trend is only intensifying as Adobe works with the Open Publication Structure (OPS) specification managed by the International Digital Publishing Forum (IDPF).
The beta release of the Adobe Integrated Runtime (AIR)™ environment “allows developers to use HTML/CSS, Ajax, Adobe Flash®, and Adobe Flex™ to extend rich Internet applications (RIAs) to the desktop,” according to Adobe. Users can dynamically reflow the text and repaginate as the font size changes, which provides a much richer and more natural screen reading experience than a standard Adobe Acrobat PDF file can offer.
Adobe Digital Editions, which uses the OPS file format, is an AIR application. If you go to the Adobe Digital Editions site and download an ebook, you'll see the potential of this publishing platform (see “Digital Books Redux” in the link list).
The extensible office
An even larger development is the news that Microsoft is introducing a completely new (and open) file format with Office 2007. When you save a document in Word, PowerPoint, or Excel, the file will have the character “x” added to the typical filename extension, so “.doc” will be “.docx” in Word 2007. This signifies that the document is in XML, specifically Office Open XML (often shortened to Open XML), an emerging standard.
But there's more. If you add “.zip” to the end of the filename, turning “my.docx” into “my.docx.zip,” and then unzip it (by double-clicking on it), the “file” becomes a directory that reveals a package consisting of the document itself in XML as well as potentially a number of other components—for example, higher-resolution versions of the images in the document and the metadata describing it.
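The same inspection can be done without renaming anything, because the file is an ordinary ZIP archive. What follows is a minimal sketch in Python using only the standard library; the function name `list_package_parts` is made up for illustration and is not part of any Microsoft tool.

```python
import zipfile


def list_package_parts(path_or_file):
    """Return the names of the parts stored inside an Office 2007 file.

    Accepts a filename (for example "my.docx") or an open binary file object.
    """
    # A .docx, .xlsx, or .pptx file is a ZIP archive, so the standard
    # zipfile module can open it directly.
    with zipfile.ZipFile(path_or_file) as package:
        return sorted(package.namelist())
```

Calling `list_package_parts("my.docx")` on a file saved by Word 2007 should list the XML parts that make up the package, along with any images or other components stored beside them.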
The true beauty of this design, however, is that it is extensible. Anyone can add components to this package. I could create a Dublin Core record describing my document, put it in the package, and zip it back up. When I give this document to someone else, it will have my contribution as well as the original files.
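As a rough sketch of that idea, the Python below builds a tiny Dublin Core record and writes a copy of the package with the record added. The part name `customMetadata/dublin_core.xml`, the `metadata` wrapper element, and both function names are assumptions made for illustration. This covers only the ZIP-level step: a fully conformant package may also need the new part declared in its content-types and relationship files, and an application that later re-saves the document may not keep a part it does not recognize.

```python
import zipfile
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
DC_PART_NAME = "customMetadata/dublin_core.xml"  # hypothetical location


def build_dublin_core(fields):
    """Build a small Dublin Core record (as bytes) from a dict of fields."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        child.text = value
    return ET.tostring(root, encoding="utf-8")


def add_part(src, dst, part_name, data):
    """Copy the package at src to dst, adding one extra part."""
    with zipfile.ZipFile(src) as old, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as new:
        # Copy every existing part unchanged, then append the new one.
        for item in old.infolist():
            new.writestr(item, old.read(item.filename))
        new.writestr(part_name, data)
```

For example, `add_part("my.docx", "my-with-dc.docx", DC_PART_NAME, build_dublin_core({"title": "My Document"}))` leaves the original file untouched and writes the enriched copy alongside it.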
Implications for libraries
Documents will increasingly be open to other applications to manipulate, index, and transform. Librarians (and others) will find it much easier to capture files in their native format and do interesting things with them, such as indexing them for access and transforming them into canonical, standard formats for preservation, such as TEI (Text Encoding Initiative).
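To make the indexing case concrete, here is a simplified Python sketch that pulls the plain text of each paragraph out of a Word 2007 file. It assumes the main document part is stored at `word/document.xml`, which is where Word conventionally puts it, and it ignores headers, footnotes, and other parts. The function name `extract_paragraphs` is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the word-processing markup inside a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_paragraphs(path_or_file):
    """Return the text of each non-empty paragraph in the main document."""
    with zipfile.ZipFile(path_or_file) as package:
        # Assumption: the main part lives at the conventional location.
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> elements.
        text = "".join(t.text or "" for t in p.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return paragraphs
```

The resulting list of strings could be handed to a search indexer or used as the starting point for a conversion to another format.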
Also, the open “package” format of Microsoft files offers interesting opportunities for libraries to create metadata packages that can be inserted into the original document's ZIP package and transported transparently as one file. Only those who need the library metadata have to look for it.
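On the receiving side, a program that cares about the library metadata might look for it and carry on as normal when it is absent. The Python sketch below assumes a library has stored a Dublin Core record at the hypothetical part name `customMetadata/dublin_core.xml`; that name and the function name are illustrative only. For simplicity it keeps one value per field, so repeated elements would overwrite one another.

```python
import zipfile
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace
LIBRARY_PART = "customMetadata/dublin_core.xml"  # hypothetical location


def read_library_metadata(path_or_file, part_name=LIBRARY_PART):
    """Return Dublin Core fields as a dict, or None if the part is absent."""
    with zipfile.ZipFile(path_or_file) as package:
        if part_name not in package.namelist():
            return None  # an ordinary document with no library record
        root = ET.fromstring(package.read(part_name))
    # Keep only elements in the Dublin Core namespace, keyed by local name.
    return {
        el.tag[len(DC_NS):]: el.text
        for el in root.iter()
        if el.tag.startswith(DC_NS)
    }
```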
With open software and file formats, the opportunities to enrich, expand, and embellish are unlimited. From such fertile fields innovation can flower. If we need a single word to describe 2007, I nominate open.
LINK LIST
Adobe AIR: labs.adobe.com/technologies/air
Adobe Digital Editions: www.adobe.com/products/digitaleditions
Adobe Flex: labs.adobe.com/technologies/flex
Adobe Mars Project: labs.adobe.com/technologies/mars
Digital Books Redux: libraryjournal.com/blog/1090000309/post/1840011784.html
Microsoft Open XML Format: msdn2.microsoft.com/en-us/library/aa338205.aspx
Office Open XML Formats: www.ecma-international.org/memento/TC45.htm
Open Publication Structure (OPS): www.idpf.org/2007/ops