

Travels into Several Remote Digital Realms of the World
PART I: A Voyage to Libraryland

Location: Champaign, Illinois, United States


Can images be metadata?

I spent some time last week reviewing various metadata formats for the Metadata in Theory and Practice course. It occurred to me that discussions about metadata privilege text over other ways we might provide metadata for cultural heritage artifacts.

Often museums create records that describe physical artifacts and attach an identification image to those records. In some ways this ID image is serving as a form of visual description, just as the textual metadata does.

Some of this bias may be due to the fact that our tools for analyzing images lack the visual literacy needed to extract meaning from them. Image formats also differ from text because they are essentially a long, uninterrupted string of bits. In "Markup Systems and the Future of Scholarly Text Processing," Coombs et al. mention the ancient practice of scriptio continua, in which text is written without whitespace as one continuous string of characters. In essence, this is what we have for images today, and probably something worse, since the right reader can make pretty good guesses about what the words are in scriptio continua.
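To make the scriptio continua comparison a little more concrete, here is a small Python sketch. It assumes an uncompressed 8-bit grayscale raster (real formats such as JPEG or PNG add headers and compression, which this ignores), and the pixel values are made up. The same run of bytes can be read as rows of four pixels or rows of three; nothing in the bytes marks where a row ends, much as nothing in scriptio continua marks where a word ends.

```python
def segment(data: bytes, width: int) -> list[list[int]]:
    """Split a flat run of 8-bit pixel values into rows of the given width.

    Like word breaks in scriptio continua, the row breaks are not in the
    data itself; they come entirely from the width the reader assumes.
    """
    if width <= 0 or len(data) % width != 0:
        raise ValueError("width must evenly divide the number of pixels")
    return [list(data[i:i + width]) for i in range(0, len(data), width)]


# Twelve hypothetical pixel values with no delimiters between them.
pixels = bytes([0, 255, 0, 255, 255, 0, 255, 0, 0, 255, 0, 255])

as_wide = segment(pixels, 4)  # read as 3 rows of 4 pixels
as_tall = segment(pixels, 3)  # read as 4 rows of 3 pixels

for row in as_wide:
    print(row)
print()
for row in as_tall:
    print(row)
```

Read four pixels wide, these bytes form a tidy checkerboard; read three wide, the pattern disappears. The "meaning" depends on an interpretation that lives outside the bits.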

For text we moved from presentational markup to descriptive markup to solve some of the problems of encoding meaning in texts. I'm now wondering what a similar system would look like for creating meaning out of the undifferentiated bits in an image. Web services like Flickr allow users to crudely "tag" portions of an image with text (I assume using external textual metadata). I'm not familiar enough with the bits under the hood of common image formats to tell whether it would ever be possible to mark up portions of an image the way one does with text. Medical and astronomical imaging might provide some hints, but generally they start with a set of data that then gets represented visually.
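The Flickr-style approach can be sketched as an annotation layer that sits beside the image rather than inside its bits. The `Note` class and its fields below are hypothetical, not Flickr's actual data model: each note is a rectangle in pixel coordinates plus a text label, and a lookup finds which notes cover a given point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """A hypothetical text annotation attached to a rectangular region.

    Coordinates are in pixels, with (x, y) as the top-left corner.
    The pixel data itself is never modified.
    """
    x: int
    y: int
    width: int
    height: int
    text: str

    def contains(self, px: int, py: int) -> bool:
        # Half-open bounds: the right and bottom edges are excluded.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def notes_at(notes: list[Note], px: int, py: int) -> list[str]:
    """Return the text of every note whose region covers the point."""
    return [n.text for n in notes if n.contains(px, py)]


notes = [
    Note(10, 20, 100, 50, "signature"),
    Note(0, 0, 40, 40, "detail of frame"),
]
print(notes_at(notes, 30, 30))    # ['signature', 'detail of frame']
print(notes_at(notes, 200, 200))  # []
```

The point of the design is that the meaning lives entirely in the note records; the image could be swapped for a higher-resolution copy and the notes would still apply as long as the coordinates were rescaled.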

Within METS one can specify an area of a visual image (with the <area> element), but I need to look more closely at how the coordinates of the area are represented. Could I export a vector map from Photoshop into a METS record that would let me associate textual metadata with just a portion of an image?
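For the METS question, here is a rough sketch of what such a record fragment might contain. In METS, an `<area>` element inside an `<fptr>` points at a file through `FILEID` and can carry `SHAPE` (`RECT`, `CIRCLE`, or `POLY`) and `COORDS` attributes, with coordinates written the way HTML image maps write them; the enclosing `<div>` can point to descriptive metadata through `DMDID`. The polygon points, the file ID `IMG001`, and the metadata ID `DMD-detail` are hypothetical stand-ins for a path exported from an image editor, and the output is a fragment, not a complete or validated METS document.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def area_div(points, file_id: str, dmd_id: str, label: str) -> ET.Element:
    """Build a structMap <div> that ties a polygonal image region to metadata.

    points  -- (x, y) pixel pairs, e.g. from a hypothetical vector-path export
    file_id -- ID of the image's <file> element in the fileSec (assumed)
    dmd_id  -- ID of a <dmdSec> holding the textual description (assumed)
    """
    div = ET.Element(q("div"), {"LABEL": label, "DMDID": dmd_id})
    fptr = ET.SubElement(div, q("fptr"))
    # POLY coordinates are a flat, comma-separated list: x1,y1,x2,y2,...
    coords = ",".join(f"{x},{y}" for x, y in points)
    ET.SubElement(fptr, q("area"),
                  {"FILEID": file_id, "SHAPE": "POLY", "COORDS": coords})
    return div


region = [(120, 40), (310, 40), (300, 220), (130, 230)]
div = area_div(region, "IMG001", "DMD-detail", "Detail of the central figure")
print(ET.tostring(div, encoding="unicode"))
```

If a Photoshop path could be flattened into a list of points like `region`, the remaining work would mostly be bookkeeping: making sure `IMG001` and `DMD-detail` match real IDs elsewhere in the record.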

Image-based searching also seems like an area to explore, although, like simple text indexing that matches character strings, it appears focused on colors and shapes, not the "meaning" of those colors and shapes. Some form of image markup (possibly still relying on text) could serve the same purpose for images that descriptive markup serves for literary texts. Importantly, image markup could provide the contexts that make shapes and colors meaningful.
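A toy version of color-based matching shows the limitation. The sketch below uses a generic technique, not the method of any particular search engine: it reduces each image to a coarse RGB histogram and compares histograms by intersection. The "images" are hypothetical lists of (r, g, b) tuples. Two pictures with the same palette score as identical whatever they depict, because nothing in the histogram records what the colors mean.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel: int = 4) -> dict:
    """Quantize each (r, g, b) pixel into coarse bins and return normalized counts."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def intersection(h1: dict, h2: dict) -> float:
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(h1.get(b, 0.0), h2.get(b, 0.0)) for b in h1.keys() | h2.keys())


# Hypothetical 4-pixel "images": a red roof over a blue door, a blue sky over a red car.
roof_and_door = [(200, 30, 30), (200, 30, 30), (20, 40, 210), (20, 40, 210)]
sky_and_car = [(20, 40, 210), (20, 40, 210), (200, 30, 30), (200, 30, 30)]
grass = [(30, 180, 40)] * 4

print(intersection(color_histogram(roof_and_door), color_histogram(sky_and_car)))  # 1.0
print(intersection(color_histogram(roof_and_door), color_histogram(grass)))        # 0.0
```

The roof and the car are a perfect match here, which is exactly the gap descriptive markup for images would need to fill.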

While I haven't seen anything that states this clearly, practice does seem to assume that images are metadata in certain contexts. How can we refine this practice and state it explicitly?

James H. Coombs, Allen Renear, and Steven J. DeRose. "Markup Systems and the Future of Scholarly Text Processing." Communications of the Association for Computing Machinery 30, no. 11 (1987).


Anonymous said...

Hi Richard,

I enjoyed your musings about how images could be thought of as metadata - and it reminded me of a project I was involved with in an advisory position of sorts a while back, the Union Catalog of Art Images (UCAI). The project team always spoke of the thumbnails which were supposed to populate the catalog as "metadata" - they weren't useful in and of themselves (too small), but they were useful for pure identification purposes, just as a title and a creator name would be. Unfortunately, UCAI folded not too long ago - for those interested, I've written a short post-mortem at http://hangingtogether.org/?p=31, and you can find the UCAI project pages at http://gort.ucsd.edu/ucai/. And yes, you can tag areas of images with METS - I'm sure Jerry will be just too delighted to show you how!


2/13/2006 12:48 PM  
Blogger Richard Urban said...

I'm sure he will!

It occurred to me later that some intelligence agencies might be developing similar capabilities, e.g. detecting familiar shapes in satellite imagery.

I assume this would require a lexicon of shapes that a system could recognize, similar to the way OCR works for text. I'm not sure how one would do this for all of cultural heritage, but within a limited and well-defined scope it might work.
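The OCR analogy can be sketched with simple template matching: a small, hypothetical lexicon of binary shape templates is slid across a binary image, and every position where a template matches exactly is reported with its label. Real OCR and object-recognition systems are far more tolerant of noise, scale, and rotation; this only illustrates why a limited, well-defined vocabulary of shapes makes the problem tractable.

```python
Grid = list[list[int]]

# A hypothetical lexicon of labeled binary shapes (1 = feature, 0 = background).
LEXICON: dict[str, Grid] = {
    "cross": [[0, 1, 0],
              [1, 1, 1],
              [0, 1, 0]],
    "bar":   [[1, 1, 1]],
}


def find_shapes(image: Grid, lexicon: dict[str, Grid]) -> list[tuple[str, int, int]]:
    """Return (label, row, col) for every exact match of a template in the image."""
    hits = []
    rows, cols = len(image), len(image[0])
    for label, tpl in lexicon.items():
        th, tw = len(tpl), len(tpl[0])
        # Try every top-left position where the template fits inside the image.
        for r in range(rows - th + 1):
            for c in range(cols - tw + 1):
                if all(image[r + i][c + j] == tpl[i][j]
                       for i in range(th) for j in range(tw)):
                    hits.append((label, r, c))
    return hits


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(find_shapes(image, LEXICON))  # [('cross', 1, 1), ('bar', 2, 1)]
```

Note that the horizontal stroke of the cross also registers as a "bar," a small reminder that even a limited shape lexicon needs rules for overlapping entries.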

2/13/2006 11:02 PM  
Anonymous Mark said...

This post was a Ringleader's Selection for the Carnival of the Infosciences #25.


2/20/2006 12:23 PM  
Blogger Richard Urban said...

A quick follow-up on this. For other purposes I've been reading Introduction to MPEG-7: Multimedia Content Description Interface (ISBN 0-471-48678-7). It includes an entire section on "Visual Descriptions," including color, shape, texture and motion. These still rely on text-based metadata but do seem fairly expressive. Some of this metadata would come from capture hardware (such as camera settings, camera motion sensors, etc.), but much of it would still require human intervention. Examples include frames from a soccer game where players are "tagged" along with the ball and the goal posts. Interesting...

2/20/2006 8:30 PM  
