Newspapers embracing metadata in image workflows
RIT study finds that newspapers are putting digital data to good use.
By Howard Vogl and Franziska Frey
Special to Newspapers & Technology
In her classic book “Drawing on the Right Side of the Brain,” Betty Edwards explains that all of us have the ability to draw if we learn to integrate the creative right side of our brains with the logical left side, forming a holistic environment.
In a similar way, digital images used by newspapers have both a visual and a technical personality.
The visual portion, the image itself, is easy to see.
The technical side, however, is equally compelling. It contains a wealth of information detailing a file’s basic elements, from camera settings to the conditions surrounding the image capture (see Figure 1).

Fig. 1: A display of technical metadata as generated by Adobe Bridge, with the image that metadata describes shown below.

Mining this information, known as technical metadata, can benefit newspapers in planning their workflow, management and archiving operations.
But before newspapers can exploit the value these metadata can provide, they first have to understand how they can be used in a news environment.
And to do that, they need to understand the gap between the visual and technical aspects of newspaper digital images.
To that end, the Rochester Institute of Technology launched a research study to examine how five large newspapers used technical metadata in their workflows.
The five newspapers evaluated were the Chicago Tribune, the New York Times, USA Today, the Washington Post and the Rochester (N.Y.) Democrat and Chronicle.
The study looked at the digital imaging workflow of each organization and attempted to answer two main questions: First, how do these newspapers currently use technical metadata in their digital imaging workflows? Second, where could technical metadata be used in the future to improve the creative and editorial processes within these organizations?
XML and RDF
The storage and transfer of technical metadata are underpinned by two standards: XML and RDF.
XML, or eXtensible Markup Language, stores information by means of user-defined tags that permit the exchange of data between disparate applications and systems.
Importantly, XML enables computer systems not only to exchange data but also to extract meaning from it, in much the same way humans extract meaning from text.
Humans rely on where each word sits in a sentence to deduce meaning. Likewise, computer applications need to have their “words” arranged in a certain way, according to a specific syntax.
That’s where RDF, or Resource Description Framework, comes in. Simply put, RDF provides a standard structure for writing XML statements, each pairing a resource with a property and a value, that systems can understand (see Table 1, next page).
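To make the XML/RDF relationship concrete, here is a minimal Python sketch using only the standard library’s xml.etree.ElementTree. It assumes an Adobe XMP-style packet, one widely used way of embedding RDF/XML in image files; the camera make, model, date and exposure values are invented for illustration, and the function name extract_statements is hypothetical. Each child element of rdf:Description is read as one statement: the image (subject), a namespace-qualified property, and its value.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used in Adobe XMP packets (RDF serialized as XML).
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "tiff": "http://ns.adobe.com/tiff/1.0/",
    "exif": "http://ns.adobe.com/exif/1.0/",
}

# Hypothetical packet: all values are invented for illustration.
SAMPLE_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:tiff="http://ns.adobe.com/tiff/1.0/"
        xmlns:exif="http://ns.adobe.com/exif/1.0/">
      <tiff:Make>ExampleCam</tiff:Make>
      <tiff:Model>Model 1</tiff:Model>
      <exif:DateTimeOriginal>2007-03-14T09:30:00</exif:DateTimeOriginal>
      <exif:ExposureTime>1/250</exif:ExposureTime>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def extract_statements(packet: str) -> list[tuple[str, str, str]]:
    """Return (subject, property, value) triples from rdf:Description children.

    This sketch only handles properties written as child elements; XMP may
    also write simple properties as attributes of rdf:Description.
    """
    root = ET.fromstring(packet)
    about_attr = "{%s}about" % NS["rdf"]
    triples = []
    for desc in root.iter("{%s}Description" % NS["rdf"]):
        subject = desc.get(about_attr, "")  # "" means "this file"
        for child in desc:
            # child.tag has the form "{namespace-uri}localname", which keeps
            # properties from different vocabularies unambiguous.
            triples.append((subject, child.tag, (child.text or "").strip()))
    return triples
```

Calling extract_statements(SAMPLE_PACKET) returns four triples, one of which is ("", "{http://ns.adobe.com/tiff/1.0/}Make", "ExampleCam"). The RDF structure is what lets a program know that "ExampleCam" is the camera maker rather than, say, the photographer.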

Recently, several major software designers have embraced XML and RDF as a means to embed metadata in image files.
Standards
Once a method exists for systems to exchange information in a meaningful way, the logical next step is to determine the data to be embedded in the image file.
This issue has been addressed by several organizations, including the National Information Standards Organization (NISO), whose technical metadata standard was approved in December 2006.
NISO’s Technical MetaData for Still Images standard includes attributes that describe both the device settings and the conditions surrounding image capture. The group identified 111 separate fields of technical metadata, organized into four distinct categories (see http://www.niso.org/standards/index.html). They are:
• Basic image parameters: This includes technical metadata considered fundamental to the reconstruction of the image, covering such elements as file type, file size, image resolution, image compression and embedded ICC profiles.
• Image creation metadata: This includes data describing the image capture and the capture system used. Image creation metadata contains fields such as a unique image identifier, the creator of the image, details of the capture device and its settings, subject distance, scene illumination and GPS location.
• Image performance assessment: This consists of metrics that assess the accuracy of current image reproduction as well as the preservation of images during migration to a new file format. These include high-level spatial metrics, such as sampling frequency and the source X and Y dimensions, as well as more familiar lower-level metrics, such as information about test targets that serve as a reference for image analysis.
• Change history: This represents the cumulative record of image processing steps made since the image was captured.
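The four categories lend themselves to a simple lookup table. The Python sketch below assumes a metadata record has already been extracted into a dictionary; the field names are hypothetical shorthand for the examples listed above, not the official NISO element names. It reports which populated fields fall into each category, the kind of audit that would show, for instance, whether a file carries any change history at all.

```python
from enum import Enum


class Category(Enum):
    BASIC = "Basic image parameters"
    CREATION = "Image creation metadata"
    PERFORMANCE = "Image performance assessment"
    HISTORY = "Change history"


# Hypothetical field names modeled on the examples in the text;
# they are NOT the official NISO element names.
FIELD_CATEGORIES = {
    "file_type": Category.BASIC,
    "file_size": Category.BASIC,
    "resolution": Category.BASIC,
    "compression": Category.BASIC,
    "icc_profile": Category.BASIC,
    "image_id": Category.CREATION,
    "creator": Category.CREATION,
    "capture_device": Category.CREATION,
    "subject_distance": Category.CREATION,
    "scene_illumination": Category.CREATION,
    "gps_location": Category.CREATION,
    "sampling_frequency": Category.PERFORMANCE,
    "source_x": Category.PERFORMANCE,
    "source_y": Category.PERFORMANCE,
    "target_info": Category.PERFORMANCE,
    "processing_steps": Category.HISTORY,
}


def coverage(record: dict[str, object]) -> dict[Category, list[str]]:
    """Group the populated fields of a record by category.

    Empty values (None or "") count as missing; unknown fields are ignored.
    """
    result: dict[Category, list[str]] = {cat: [] for cat in Category}
    for field, value in record.items():
        cat = FIELD_CATEGORIES.get(field)
        if cat is not None and value not in (None, ""):
            result[cat].append(field)
    return result
```

A record holding only a file type, file size and image identifier would show two basic fields, one creation field and nothing in the other two categories, which mirrors the usage pattern the study describes below.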
The five newspapers were asked to rate the importance of each category of technical metadata.
As expected, metadata describing basic image parameters was judged most important.
That’s not surprising, since fields such as file size, file type and level of compression are essential to digital newspaper workflows.
However, image creation metadata was not as widely used, except for fields that established the date and time of capture or assigned each image a unique identifier.
Additionally, image performance assessment metadata, such as test targets, was not widely used. One exception was the Democrat and Chronicle, which printed test targets quarterly and made adjustments based on its analysis of those targets.
Change history metadata fell victim to the short cycle times required for publication. Again, that isn’t surprising, especially considering that a history of changes is automatically kept as images are saved throughout the digital workflow.
The future of technical metadata
As newspapers publish more of their content on multiple media platforms, the use of technical metadata could yield newspapers great benefits.
For example, a paper could use metadata to pinpoint when a particular image was captured. It could also use metadata to automate cross-media workflows, exploiting the fact that repurposing the same image for different media is a largely repetitive process.
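Because repurposing follows the same steps every time, it can be driven directly by metadata. As a minimal sketch, assuming an image’s pixel dimensions and an EXIF-style capture time ("YYYY:MM:DD HH:MM:SS") have already been read from the file, the Python below parses the capture time and derives one rendition size per outlet. The class names, channel names and target widths are hypothetical placeholders, not figures from the study.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ImageMeta:
    image_id: str
    width_px: int
    height_px: int
    captured: str  # EXIF-style "YYYY:MM:DD HH:MM:SS"


@dataclass
class Channel:
    name: str
    max_width_px: int  # hypothetical target width for this outlet


# Hypothetical outlet settings; real values depend on each paper's layouts.
CHANNELS = [
    Channel("print_2col", 1200),
    Channel("web_article", 800),
    Channel("mobile", 480),
]


def capture_time(meta: ImageMeta) -> datetime:
    """Parse the capture timestamp so images can be sorted or filtered by it."""
    return datetime.strptime(meta.captured, "%Y:%m:%d %H:%M:%S")


def plan_renditions(meta: ImageMeta) -> list[tuple[str, int, int]]:
    """Return (channel, width, height) for every outlet.

    The aspect ratio is preserved and images are never upscaled beyond
    their original pixel width.
    """
    jobs = []
    for ch in CHANNELS:
        width = min(ch.max_width_px, meta.width_px)
        height = round(meta.height_px * width / meta.width_px)
        jobs.append((ch.name, width, height))
    return jobs
```

For a 3000 x 2000 pixel image, plan_renditions yields 1200 x 800, 800 x 533 and 480 x 320 renditions. Since capture_time returns a datetime, a batch of images can also be ordered by when they were shot with sorted(images, key=capture_time).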
These practices are gaining steam even as verifying the details of image capture becomes more complex.
This complexity arises, first, because large newspapers tend to create their own image databases for use within their own organizations.
Second, the growing practice among newspapers of extracting still images from video makes it more difficult to verify each image’s authenticity.
Because computer storage space is inexpensive, it would be practical for newspapers to include technical metadata not currently considered necessary. Doing so could prove valuable as the use of technical metadata increases.
Unlocking metadata
Lowering the barriers to wider use of technical metadata could significantly improve newspapers’ digital workflows.
The first step is to ensure that technical metadata is embedded in image files.
Whenever possible, technical metadata that was not included at the time of image capture should be embedded automatically. The digital image cycle time in the newspaper industry is too short to leave technical metadata to manual entry, so newspapers should avoid relying on workers to populate these fields by hand.
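Automatic population can start with fields that are derivable from the file itself. The Python sketch below assumes the image arrives as raw bytes plus whatever fields the camera already wrote. It identifies the file type from well-known leading signature bytes, records the file size, uses a SHA-256 content hash as a stand-in unique identifier and stamps an ingest time. The field names are hypothetical, and writing the result back into the file is left to whatever embedding tool a paper already uses.

```python
import hashlib
from datetime import datetime, timezone

# Leading signature ("magic") bytes of common image formats.
MAGIC = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]


def sniff_type(data: bytes) -> str:
    """Identify the file type from its first bytes rather than its name."""
    for prefix, name in MAGIC:
        if data.startswith(prefix):
            return name
    return "unknown"


def fill_missing(data: bytes, existing: dict[str, str]) -> dict[str, str]:
    """Return a copy of `existing` with derivable fields added if absent.

    Fields already present (for example, written by the camera) are never
    overwritten, and the input dictionary is left unchanged.
    """
    derived = {
        "file_type": sniff_type(data),
        "file_size": str(len(data)),
        # A content hash serves as a hypothetical unique image identifier.
        "image_id": hashlib.sha256(data).hexdigest(),
        "ingest_time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    merged = dict(existing)
    for key, value in derived.items():
        if not merged.get(key):
            merged[key] = value
    return merged
```

Nothing here requires a person to type a value, which is the point: the fields are filled the moment a file enters the system.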
Second, technical metadata must work seamlessly with metadata embedded for other purposes, such as accounting, reproduction or archiving.
Third, metadata must be preserved and retrievable regardless of how many images are placed in a single document.
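The second requirement, that technical metadata coexist with accounting or archival metadata, is commonly met through namespacing, the same mechanism XML and RDF already rely on. In this Python sketch every key is qualified by a namespace URI, so two vocabularies can each define a Date property without colliding. The exif URI is the one used in XMP; the accounting and archive URIs, and the helper names qualify and merge_metadata, are hypothetical.

```python
# The exif namespace is real (XMP); the other two are hypothetical placeholders.
TECH_NS = "http://ns.adobe.com/exif/1.0/"
ACCT_NS = "http://example.com/ns/accounting/1.0/"
ARCH_NS = "http://example.com/ns/archive/1.0/"


def qualify(ns: str, name: str) -> str:
    """Build a "{namespace}name" key, the same form ElementTree uses for tags."""
    return "{%s}%s" % (ns, name)


def merge_metadata(*sources: dict[str, str]) -> dict[str, str]:
    """Merge namespace-qualified metadata from several departments.

    Identical qualified keys with different values are a genuine conflict,
    so raise instead of silently overwriting one department's data.
    """
    merged: dict[str, str] = {}
    for src in sources:
        for key, value in src.items():
            if key in merged and merged[key] != value:
                raise ValueError(f"conflicting values for {key}")
            merged[key] = value
    return merged
```

Here an accounting Date and an archive Date sit side by side in one record because their qualified keys differ; only a true disagreement within the same vocabulary is flagged.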
Because technical metadata represents the conditions and settings of image capture, it must be presented to the user through a well-designed graphical user interface if it is to be useful.
Therefore, increased use of technical metadata in the newspaper digital imaging workflow is expected to come only through the development of intuitive GUIs that let users adjust images visually while the technical metadata resides transparently in the background.
Barriers to adoption
The study found that the newspaper industry defined image quality in ways that went beyond creating pleasing images.
In fact, all of the newspapers studied said image quality had to include data verifying the authenticity of the original image capture, the consistency between visual and reported information, and the differences between previous image captures.
To address these matters, the five newspapers stressed clear communication between departments engaged in a cooperative photo editing process.
This process, which can be called editorial integrity, was found to be the common denominator in the newspaper industry’s definition of image quality.
The need to examine each image to determine its editorial integrity is in some instances at odds with using technical metadata to automate newspapers’ digital workflows.
Based on the results of this study, however, several steps could be taken to bridge this gap and increase the usefulness of technical metadata.
The use of technical metadata offers newspapers the opportunity to improve their digital imaging workflow while maintaining the image integrity that has become the hallmark of the newspaper industry.
How quickly newspapers embrace technical metadata, however, will depend on how successfully software designers respond to newspapers’ multifaceted definitions of image quality.
Like a fine work of art, the underlying structure of technical metadata must be presented in a holistic fashion.
Howard Vogl is a visiting professor in the School of Print Media at the Rochester Institute of Technology. He can be reached at hevppr@rit.edu. Franziska Frey is the McGhee Professor at the School of Print Media at the Rochester Institute of Technology and is also an extended faculty member in RIT’s Center for Imaging Science.