The St. Louis Post-Dispatch is
cutting the time it spends managing agate copy by rolling out a home-grown
app that automatically formats the tiny type, regardless of the source.
Using a script developed by
Newsroom Technology Director John Hurst and News Administration Director Charles
Arms, the newspaper (daily, 277,842; Sunday, 423,291) can process agate in
seconds, all with the push of a button.
Hurst said the paper is
deploying the tool one section at a time, in conjunction with its implementation
of MediaSpan’s Jazbox editorial app, which uses Adobe InCopy for word processing
and InDesign for page design.
The script has already been
used for the paper's sports agate and other features, and Hurst said it will be
used in the business and news sections as well.


Agate copy as it appears before and after
running through the functions of the Post-Dispatch's home-grown scripting app.
Graphics: St. Louis Post-Dispatch
Mixture of VB
The script, a mixture of
Visual Basic and the InCopy/InDesign document object model, runs under InCopy,
Hurst said.
“We take a sports agate file
or a recipe or a movie listing, for instance, from any source and the script
sets the file up for use on an InDesign page. It adds all the attributes - rule
lines, style, extra leading, tabs - and stylizes the file to get it ready for the
paper.”
What makes the approach unique
is that the script analyzes each line of agate individually to determine what
rules to apply, rather than forcing someone to hand-code or use macros, Hurst
said.
The script analyzes the copy
one line at a time, beginning with the first, where it finds a keyword or code
that determines what to do with the file.
“If the keyword is ‘NFL
longbox,’ then we run the appropriate modules in the script to stylize and code
up that box,” Hurst said.
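The keyword step Hurst describes can be pictured as a simple lookup table. The
sketch below is a minimal illustration in Python rather than the paper's Visual
Basic; the handler functions and the "recipe" keyword are hypothetical, and only
"NFL longbox" comes from Hurst's example.

```python
# Hypothetical formatting modules; the real script's modules are not public.
def format_nfl_longbox(lines):
    return ["[NFL longbox] " + line for line in lines]


def format_recipe(lines):
    return ["[recipe] " + line for line in lines]


# Keyword found on the first line -> module to run on the rest of the file.
# Only "NFL longbox" is taken from the article; "recipe" is invented here.
HANDLERS = {
    "nfl longbox": format_nfl_longbox,
    "recipe": format_recipe,
}


def dispatch(text):
    """Read the keyword on line one and hand the remaining lines to its module."""
    lines = text.splitlines()
    if not lines:
        return []
    keyword = lines[0].strip().lower()
    handler = HANDLERS.get(keyword)
    if handler is None:
        raise ValueError(f"no module registered for keyword {lines[0]!r}")
    return handler(lines[1:])
```

Raising an error for an unknown keyword is a design choice of this sketch; the
article does not say how the Post-Dispatch script treats unrecognized files.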
Once the script determines
what type of file it’s dealing with, it moves to the second line and checks it
against an array - an ordered arrangement of data elements - that the
programmers built to describe the file's structure, including every distinct
type of line the file can contain.
“If you were to take a file
and look at each line separately, you would say ‘OK, the pattern of these five
lines matches the pattern of these three lines, matches the pattern of this one
line, etc.,’” Hurst explained.
The Post-Dispatch’s scripting
app relies heavily on the use of regular expressions, which work by pattern
matching.
Matching patterns
“Instead of looking to see if
a line contains the word football, we’ve written it so that it maybe looks for a
capital letter F followed by lowercase letters and ending in a lowercase l,”
Hurst said. “We’ve set the array up in such a way that if something matches a
certain pattern, there are things we know need to be done to that pattern.”
Once a pattern is matched, the
script performs the steps the array defines for that pattern and moves on to the
next line, until it has worked through every line of copy.
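A pattern array of this kind might look something like the following Python
sketch. The three patterns and their tag names are assumptions made for
illustration; the article does not list the paper's actual expressions.

```python
import re

# Each entry pairs a regular expression with the tag to apply to a matching
# line. All patterns and tag names here are illustrative assumptions.
RULES = [
    # "City 24, City 17" style result line
    (re.compile(r"^[A-Z][A-Za-z. ]* \d+, [A-Z][A-Za-z. ]* \d+$"), "score"),
    # Capital F, lowercase letters, ending in lowercase l (Hurst's example)
    (re.compile(r"^F[a-z]*l$"), "sport_label"),
    # Any line that contains at least one tab character
    (re.compile(r"^\S.*\t.*$"), "tabbed_row"),
]


def classify(line):
    """Return the tag of the first rule whose pattern matches the line."""
    for pattern, tag in RULES:
        if pattern.match(line):
            return tag
    return "body"


def tag_lines(lines):
    """Walk the copy one line at a time, tagging each line in order."""
    return [(classify(line), line) for line in lines]
```

The first matching rule wins in this sketch, so the order of the array matters.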
Arms said tabs were among the
Post-Dispatch’s most trying style issues for agate copy, and an area where the
scripting tool has been key.
“A tab has a lot of attributes
to it - right or left, leader lines, where its position stops in a column - so
if you want to assign tab attributes to a style you might literally end up with
500 styles,” he explained. “We have to have a style for every set of stock
values and there might be 10 lines where the typography is exactly the same, but
if the tabs were in different places that would require different styles.”
That proved a major obstacle
for the paper. Because even a simple box score might contain five sets of tabs,
the number of different styles governing those tabs would be overwhelming, Arms
said.
“You can imagine how many
styles we would have just for agate,” he said. “Now we have one agate style and
by scripting we can set tab-stop values and other attributes of the tabbing. It
works very well and keeps our style palette clean.”
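The idea of one style with scripted tab stops can be sketched as data. In this
Python illustration the TabStop fields, the line-type names and the point values
are all hypothetical; nothing here is the InDesign or InCopy scripting API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TabStop:
    position: float          # distance from the column's left edge, in points
    alignment: str = "left"  # "left" or "right"
    leader: str = ""         # e.g. "." for a dotted leader line


# One shared style name; only the tab stops vary by line type.
AGATE_STYLE = "Agate"
TAB_STOPS = {
    "batting_row": (TabStop(80, "right"), TabStop(95, "right"), TabStop(110, "right")),
    "stock_row": (TabStop(60, "right", "."), TabStop(90, "right")),
}


def paragraph_spec(line_type, text):
    """Describe a paragraph: always the single agate style, plus its tab stops."""
    return {
        "style": AGATE_STYLE,
        "tab_stops": TAB_STOPS.get(line_type, ()),
        "text": text,
    }
```

Every paragraph carries the same style name, so adding a new tab layout means
adding one table entry rather than one more style to the palette.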
Abandoning macros
The scripting tool also let
the Post-Dispatch abandon the use of macros formerly needed to manage agate.
“That entails finding a word
and then jumping down so many lines, and it’s very mechanically driven,” Arms
said. “That concept was around 20 years ago and it’s basically just memorizing
keystrokes.”
Unfortunately, Hurst said,
that is the method most papers still rely on.
“The model we’re applying here
is - as far as I know - not done anywhere else,” he said. “Most papers are
hand-coding copy or using macros, which consists of recording keystrokes. So
you’re saying ‘go down five lines, find this word and put a colon on the end of
it’ - and if that word’s not there, it can’t do what it tried to do.”
Identifying lines by regular
expression is another benefit.
“The beauty of identifying
the lines by regular expression is that, for example, with something simple like
a major league baseball box score, on any given night, the nine starters may
play the whole game and then on another night they may have five different guys
pinch hit, so you never know how many lines are going to be in that box score,”
Arms said.
“You can’t use automated
cursor movements, you literally have to know when something starts and something
else stops. By having a regular expression pattern that represents each line, we
just move along and we know exactly when it’s changed regardless of how many
lines are in the agate file.”
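Arms' point about box scores of unpredictable length can be illustrated with a
short Python sketch. It assumes a simplified, tab-separated layout (four numeric
columns for batters, a decimal innings figure plus five columns for pitchers)
that is invented here and is not AP's actual format.

```python
import re
from itertools import groupby

# Assumed line shapes, for illustration only.
PATTERNS = [
    ("pitcher", re.compile(r"^[^\t]+\t\d+\.\d(?:\t\d+){5}$")),
    ("batter", re.compile(r"^[^\t]+(?:\t\d+){4}$")),
]


def kind_of(line):
    """Name the first pattern that matches the line, or 'other'."""
    for kind, pattern in PATTERNS:
        if pattern.match(line):
            return kind
    return "other"


def split_sections(lines):
    """Group consecutive lines of the same kind; a change of pattern marks a
    section boundary, however many lines each section holds."""
    return [(kind, list(group)) for kind, group in groupby(lines, key=kind_of)]
```

Because boundaries come from the patterns rather than from line counts, a game
with nine batters and a game with fourteen split the same way.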
The scripting tool has allowed
the Post-Dispatch to automate several styles unique to the paper: automatically
converting time references in AP wire stories from Eastern to Central;
formatting celebrity birthdays, which the Post-Dispatch handles differently than
many other papers that simply run the AP copy; identifying winners and losers of
pitching match-ups at the bottom of baseball box scores; and creating unique
overlines in NFL box scores.
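The Eastern-to-Central conversion lends itself to a regular expression. The
Python sketch below assumes times are written like "7:05 p.m. EDT" or
"8 p.m. EST"; that format, and the choice to ignore any change of date around
midnight, are assumptions of this example rather than details from the paper.

```python
import re

# Hour, optional minutes, a.m./p.m., then EST or EDT (assumed time format).
TIME_RE = re.compile(r"\b(\d{1,2})(?::(\d{2}))? (a|p)\.m\. E([SD])T\b")


def _to_central(match):
    hour = int(match.group(1))
    minutes = match.group(2)
    meridiem = match.group(3)
    zone = match.group(4)  # "S" for standard time, "D" for daylight time
    # Convert to a 24-hour clock, step back one hour, then convert back.
    hour24 = hour % 12 + (12 if meridiem == "p" else 0)
    hour24 = (hour24 - 1) % 24
    meridiem = "p" if hour24 >= 12 else "a"
    hour = hour24 % 12 or 12
    clock = f"{hour}:{minutes}" if minutes else str(hour)
    return f"{clock} {meridiem}.m. C{zone}T"


def eastern_to_central(text):
    """Rewrite every Eastern time reference in the text as Central time."""
    return TIME_RE.sub(_to_central, text)
```

For instance, `eastern_to_central("First pitch is at 7:05 p.m. EDT.")` gives
"First pitch is at 6:05 p.m. CDT."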
The daily wrote a special
function to create the overline in NFL boxes using team nicknames, based on the
city name provided by AP and the lines identified as the score.
“When we send ‘St. Louis’ over
to functions it brings back ‘Rams’ and we can automatically parse from that data
the winner, the loser, supply the nickname and write out that line,” Arms said.
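The nickname lookup Arms describes might be sketched in Python as follows. The
tab-separated score-line layout, the two-entry nickname table and the wording of
the finished overline are assumptions for illustration only.

```python
# Partial, illustrative lookup table: city name as sent by the wire -> nickname.
NICKNAMES = {"St. Louis": "Rams", "Seattle": "Seahawks"}


def nickname(city):
    """Return the team nickname, falling back to the city name if unknown."""
    return NICKNAMES.get(city, city)


def parse_score_line(line):
    """Assumed layout: city, one column per quarter, then the final total."""
    city, *_quarters, total = line.split("\t")
    return city.strip(), int(total)


def nfl_overline(first_line, second_line):
    """Build 'Winner NN, Loser NN' from the two score lines of a box."""
    teams = [parse_score_line(first_line), parse_score_line(second_line)]
    # sorted() is stable, so tied teams keep their original order.
    winner, loser = sorted(teams, key=lambda team: team[1], reverse=True)
    return f"{nickname(winner[0])} {winner[1]}, {nickname(loser[0])} {loser[1]}"
```

Called with the lines "Seattle\t3\t7\t0\t7\t17" and "St. Louis\t7\t3\t7\t7\t24",
`nfl_overline` returns "Rams 24, Seahawks 17".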
These features previously
required an agate copy clerk to enter the information by hand and took several
minutes. Now they take between 5 and 20 seconds, Hurst said.
“Having the one-button
functionality is huge, but then the bigger benefit is that the (copy) is so
close to being publication-ready when this thing runs,” Arms said.
The process is nearly instantaneous,
and the paper boasts a 99 percent success rate with the app. Typical corrections
include things like shortening names that are too long for a particular line in
score boxes, Arms said.
When errors are identified,
they are immediately corrected so that the same one never occurs twice.
“We almost amaze ourselves in
that we’ve had so few errors,” Arms said.
Although Visual Basic anchors
the script, for which Hurst and Arms have written more than 15,000 lines of
code, Hurst said the app could just as easily have been written in JavaScript.
“It’s just a script that goes
underneath one of the InCopy folders and once you put it in the folder, InCopy
sees it and then you can assign it to a function key or run it from the Script
palette in InCopy,” he explained.
Showing off script
Hurst and Arms will be showing
off their technology to other newspapers at the next MediaSpan User’s Group,
scheduled for February 2007.
They both agree that the tool
could be used even more widely at the paper, since the Visual Basic script can
manipulate any copy that contains recognizable patterns.
“What makes this unique is the
way our model works,” Hurst said. “Because of the pattern matching we can apply
this any place that we want to automate something.”