by Chris Lynn
Special to Newspapers & Technology
In the United States, newspaper publishers have
only begun to experiment with the notion of selling display ads online.
That’s not the case in the United Kingdom,
where Trinity Mirror plc has for the past three years posted display ads on its
newspapers’ Web sites.
Nearly half the population of the U.K. reads one
of Trinity’s titles. With more than 250 publications, Trinity is the largest
newspaper publisher in the country. The flags range from
million-plus-circulation national newspapers such as the Daily Mirror, Sunday
People and the Scottish Daily Record to regionals such as the Birmingham Post
& Mail, as well as specialty publications such as horseracing papers and magazines.
As with most print publishers, Trinity has
experimented with various online offerings. Today, its digital media group
operates Web sites for each of the national papers as well as 26 local portals,
dubbed the icNetwork. These portals — all of which have a common look and feel
— share editorial and advertising content generated by the national papers and
the local titles.
The portals also jointly access an advertising
database maintained by the U.K.’s Fish4 (“fish for”) Web site, a popular
search engine that contains information about jobs, homes and automobiles.
Unifying info flows
The portals’ disparate information sources
posed a challenge for Trinity when it first created the sites in the late ’90s.
System technicians and engineers wanted to integrate the content without
disrupting established workflows. To do that, Trinity decided to use a common
content management system to drive all the sites. In addition, managers opted to
base the company’s entire digital workflow on Adobe Systems’ Portable Document Format (PDF).
The decision to centralize around PDF made it
possible for Trinity to rely on existing workflows without requiring any
substantial changes. Still, some modifications were necessary.
The PDF format was created by Adobe to ensure
that documents can be reproduced with near-identical appearance, independent of
the output medium. A PDF document has been likened to a ransom note — the
letters are glued to the page in specific positions. That means a PDF file
carries no inherent record of which words or sentences belong to the same story,
particularly when a story flows across columns or pages.
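The ransom-note comparison can be made concrete. A PDF page is, in effect, a list of text fragments, each pinned to x/y coordinates, so software that wants to rebuild stories has to guess structure from geometry. The Python sketch below is a hypothetical illustration, not Trinity's or Iceni's method: the `Fragment` model and the `group_into_columns` function are invented for the example, and the column-clustering rule is a deliberately naive assumption.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A positioned run of text, roughly as a PDF page stores it (simplified model)."""
    text: str
    x: float  # left edge, in points from the left of the page
    y: float  # top edge, in points down from the top (real PDF coordinates start bottom-left)


def group_into_columns(fragments, column_gap=20.0):
    """Guess columns from geometry alone, then read each column top to bottom.

    Assumption: after sorting by x, a fragment whose left edge lies within
    `column_gap` points of the previous fragment's left edge is in the same column.
    """
    columns = []
    for frag in sorted(fragments, key=lambda f: f.x):
        if columns and frag.x - columns[-1][-1].x <= column_gap:
            columns[-1].append(frag)
        else:
            columns.append([frag])
    return [" ".join(f.text for f in sorted(col, key=lambda f: f.y)) for col in columns]


# Two stories side by side, plus the headline of a third story lower in the right column.
page = [
    Fragment("Bond vote set", 302, 400),
    Fragment("Council approves", 40, 100),
    Fragment("new budget", 42, 114),
    Fragment("Storm damage", 300, 100),
    Fragment("closes roads", 301, 114),
]
print(group_into_columns(page))
# ['Council approves new budget', 'Storm damage closes roads Bond vote set']
```

Geometry alone glues the unrelated "Bond vote set" headline onto the storm story, which is exactly the kind of ambiguity the ransom-note comparison describes.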
Initially, Trinity chose a manual solution to
this problem. System technicians developed a browser-based PDF previewer, which
enabled content managers to view thumbnail versions of PDF stories.
Improving the workflow
With that approach, stories destined for the Web
were copied and pasted into a Hypertext Markup Language (HTML) template or
manually rekeyed.
But that process, while satisfactory, wasn’t
very efficient. In a bid to automate it, Trinity Technology Manager
Trevor Pervin tapped Norwich, England-based Iceni Technology Ltd. (www.iceni.com).
Trinity used two Iceni applications. It first evaluated Gemini, a PDF conversion
and extraction application, before ultimately deploying 50 copies of News
Studio, a more robust product that uses artificial intelligence to identify every
component of a news story, from headlines and subheads to bylines and pictures,
and convert it from PDF to HTML.
The resulting workflow was much more productive
than the Gemini-based one, according to Trinity Technical Architect David Wells.
With News Studio, each content operator converts up to 30 stories per hour, more
than 10 times the number processed by operators using Gemini.
Display ads next battle
Editorial content was half the battle. The next
phase centered on overcoming problems associated with converting display ads.
While creating a PDF version of a print ad was relatively simple, the result was
inadequate: Web visitors had to zoom in, then pan and scroll to view the ad’s
details.
Worse, there was no suitable method to search for
a specific advertisement unless key search identifiers could be copied over from
the ad production system or manually rekeyed.
To solve those problems, Trinity installed Iceni’s
Adstract ad conversion application. The software converts PDF documents to
eXtensible Markup Language documents, automatically identifying key fields based
on the content of an ad.
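Adstract's own field-detection logic isn't described here, so the following is only a hypothetical illustration of the general idea: turning an ad's text into XML with separate, searchable fields. The `ad_text_to_xml` function, the element names (`ad`, `text`, `salary`, `phone`, `location`) and the simple regular-expression rules for a recruitment ad are all invented for the example, and the sketch assumes the ad's text has already been pulled out of the PDF.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rules for a recruitment ad: field name -> regex with one capture group.
FIELD_PATTERNS = {
    "salary": re.compile(r"£\s?([\d,]+)"),
    "phone": re.compile(r"\b(0\d{3,4}\s?\d{6,7})\b"),
    "location": re.compile(r"\bBased in ([A-Z][a-z]+)"),
}


def ad_text_to_xml(ad_id: str, text: str) -> str:
    """Wrap an ad's text in an <ad> element and add a child for each field a rule matches."""
    ad = ET.Element("ad", id=ad_id)
    ET.SubElement(ad, "text").text = text
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:  # fields with no match are simply omitted
            ET.SubElement(ad, field).text = match.group(1)
    return ET.tostring(ad, encoding="unicode")


sample = "Sales Manager. Based in Birmingham. Salary £28,000. Call 0121 4960123."
print(ad_text_to_xml("BP-1042", sample))
```

Rule-based matching like this is brittle (a salary written as "£28K" would be captured as "28"), which is one reason a production tool needs far more than a handful of patterns.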
Once the ad has been converted, a database record
that identifies the ad’s contents is linked to a server that meshes with the
Fish4 system. The record includes the ad’s images and enables users to view and
search its data. If for any reason an ad can’t be processed this way, Adstract
renders the PDF file as a TIFF image and passes it to an optical character
recognition (OCR) application. There, relevant data is extracted and the
advertisement is returned to the main workflow for database entry.
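The control flow described above, with direct conversion first and a raster-and-OCR fallback, can be sketched in Python. Every name here is hypothetical: `convert_ad` and `AdRecord` are invented for the example, and `extract_text`, `rasterize_to_tiff` and `run_ocr` are placeholder hooks for whatever PDF, imaging and OCR tools a real system would use. Treating empty extracted text as the failure signal is an assumption, not Adstract's documented trigger.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AdRecord:
    ad_id: str
    text: str
    source: str  # "pdf" = text taken straight from the PDF, "ocr" = via the TIFF fallback


def convert_ad(
    ad_id: str,
    pdf_bytes: bytes,
    extract_text: Callable[[bytes], Optional[str]],
    rasterize_to_tiff: Callable[[bytes], bytes],
    run_ocr: Callable[[bytes], str],
) -> AdRecord:
    """Try direct text extraction first; fall back to TIFF + OCR when it yields nothing.

    The three callables are hypothetical hooks. Treating empty text as failure
    is an assumption made for this sketch.
    """
    text = extract_text(pdf_bytes)
    if text and text.strip():
        return AdRecord(ad_id, text, "pdf")
    tiff = rasterize_to_tiff(pdf_bytes)
    return AdRecord(ad_id, run_ocr(tiff), "ocr")


# Demonstration stubs only; they stand in for real extraction, imaging and OCR tools.
def no_text(pdf_bytes: bytes) -> Optional[str]:
    return None  # e.g. an ad whose lettering was saved as artwork rather than text


def fake_rasterize(pdf_bytes: bytes) -> bytes:
    return b"TIFF:" + pdf_bytes


def fake_ocr(tiff_bytes: bytes) -> str:
    return "Ford Focus 2001, one owner"


record = convert_ad("AUTO-7", b"%PDF-1.3", no_text, fake_rasterize, fake_ocr)
print(record.source, "|", record.text)  # ocr | Ford Focus 2001, one owner
```

Both paths return the same record type because, in the workflow described here, both end at the same database-entry step.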
The human element
Human intervention has not been entirely removed.
No one wants to risk a customer’s wrath (or lose potential ad revenue) by
making an egregious error in an online ad, so an operator still reviews the
data and images that go online.
The browser-based review system is simple and
productive. As operators log on, they see which ads have been allocated for
review as well as each ad’s status.
The system has enabled Trinity to post thousands
of ads every week while using a relatively small number of operators.
“With 21 titles going through Birmingham alone,
we are managing massive amounts of data, and this will increase as we centralize
[in 2003],” Wells said. Next up: real estate ads. This year the publisher will
add property listings to the automobile and job ads the system currently
manages.
It is arguable that using a format as rigid as
PDF to recreate data that existed in a “repurposable” form upstream in the
production process is symptomatic of a badly designed workflow. But that argument
ignores the realities of newspaper production. Information technology budgets
and resources are stretched, print production managers are not interested in
diverting effort to support online systems and PDF is — for now — the common
denominator across multiple vendors’ advertising and production systems.
More important, Trinity’s success offers a lesson on which many newspaper
publishers can capitalize. Clearly, newspapers have to offer more options to
their print advertisers. By adopting evolving technologies available from a wide
variety of vendors, newspapers can reap additional revenue from editorial
content and display advertising that can be repurposed across multiple media.
Chris Lynn is an Atlanta-based consultant who
has held positions with vendors in Europe, Asia and the United States. He
specializes in working with European technology companies, among them Iceni,
that are planning their U.S. market entry. He can be reached at chris@hillamtech.com.