The very act of publishing a daily
newspaper is often referred to in the business as “the daily miracle,” and
with good reason.
A vast array of resources provided
by a wide range of individuals has to flow ultimately into a single path,
converging at a single point: the printing press. Within the past five years
most newspapers have added another step to that procedure:
simultaneously updating a Web site, often more frequently than the
print product. Many of the editorial and publishing systems were installed at
these organizations before Web site production became commonplace, necessitating
a number of interesting techniques to get information out in a usable form and
manage it.
Bob Cauthorn, vice president of
Digital Media for the San Francisco Chronicle, which has been owned by the
Hearst Corp. since July 2000, has had overall responsibility for all of the
paper’s digital media efforts since he came aboard in September of that year.
“We get the text shipped to us
early every morning, on which we run processing to strip out any artifacts from
the editorial system. One of the interesting flukes in the way we’re
structured here, which is quite unusual in the business, but ends up being a
benefit for us, is that we actually get our material from the library archive,
after they’ve processed it first thing in the morning,” he said.
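The kind of cleanup pass Cauthorn describes might look something like the following Python sketch. The artifact patterns shown (angle-bracket typesetting codes and soft hyphens) are assumptions made for illustration only; the article does not say what debris the Chronicle's editorial system actually leaves behind, and the function name is hypothetical.

```python
import re

# Hypothetical artifact patterns; a real editorial system's codes would differ.
TYPESET_CODE = re.compile(r"<[^<>]*>")   # assumed "<ep>"-style typesetting commands
SOFT_HYPHEN = "\u00ad"                   # discretionary hyphen left by hyphenation


def strip_artifacts(raw: str) -> str:
    """Return story text with the assumed typesetting debris removed."""
    text = TYPESET_CODE.sub("", raw)
    text = text.replace(SOFT_HYPHEN, "")
    # Collapse runs of spaces and tabs left behind by the removed codes.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

A nightly job could run each incoming story through such a function before it is loaded into the site's own database.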
Cauthorn pointed out the benefits of
getting the information at that stage.
“What’s enabling about that is
that we have a lot of additional meta-information that the library has provided
that would not be in there if we were simply grabbing the text at the
typesetter, which has been the case in other operations I’ve been involved in,”
he said.
This provides some benefits to the
online editors and ultimately the readers as well.
“For example, when we get a (San
Francisco Mayor) Willie Brown story, or maybe it’s a city government story and
Willie Brown’s name isn’t mentioned, there’s additional information there
that the library has provided, and I can do things with that information.”
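One thing that can be done with such library-supplied information is to group related stories even when a name never appears in the text. The Python sketch below assumes each story record carries a list of subject terms added by the library; the record layout, field names and sample data are all invented for illustration and are not taken from the Chronicle's system.

```python
from collections import defaultdict


def build_subject_index(stories):
    """Map each library-assigned subject to the IDs of stories tagged with it."""
    index = defaultdict(list)
    for story in stories:
        for subject in story["subjects"]:
            index[subject].append(story["id"])
    return dict(index)


# Invented sample records; the text of the second never names the mayor,
# but the (assumed) library tagging still links it to him.
stories = [
    {"id": "a1", "text": "The mayor, Willie Brown, said ...",
     "subjects": ["Willie Brown"]},
    {"id": "a2", "text": "City supervisors debated ...",
     "subjects": ["Willie Brown", "City government"]},
]
index = build_subject_index(stories)
```

With an index like this, a story page could link to everything filed under the same subject, whether or not the subject's name appears in the story itself.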
Cauthorn also described how pictures
get from the photographers to the Web site.
“We have remote access to the
photo front end, since we’re housed in a separate building than the main
newspaper plant. Unfortunately it’s not quite as well-organized as it could
be, but we’re working with the photographers and news desk to get that fixed
because we spend too much time searching for these photos. We grab them; we
color correct them … we don’t do any cropping because we want the photo to
reflect what was in the newspaper. We’re coming up with a ‘foldering’
system, which will allow our photographers to submit to us, in these times of
tight newsprint, photos that can’t end up in the print newspaper,” he said.
The Web site, www.sfgate.com, runs on an
in-house-developed system and does not use a third-party workflow manager.
Moving from the West Coast to the
Midwest, one finds NewsOK.com, the combined news
portal of The (Oklahoma City) Oklahoman and News 9, the local CBS affiliate. It
has been operating since August 2001 as a limited liability company. There is a
high degree of integration between the two operations, according to Ron Cariker,
managing editor for NewsOK, and that affects how daily site management is
performed.
“Our content management system was
done in-house, by a programmer who is no longer with us,” he said. “It’s a
very simple text editor, straight [HyperText Markup Language]. Its interface was
set up and designed to interface with the newspaper’s export system and the
way their content is done, through Harris. There’s a translation table that
allows the content to be exported to us where we can categorize the content from
the newspaper. They put identifiers on different sections, such as the sports
section, which would be labeled ‘SPT.’ From there the editors can go in,
enhance the story, clean it up and then put it in the different categories that
we’ve created.”
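A translation table of the sort Cariker describes can be as simple as a lookup from section identifier to site category. The Python sketch below is illustrative only: “SPT” is the one code mentioned in the article, while the other codes, the category names and the function name are hypothetical, and NewsOK's actual system is not written this way as far as the article says.

```python
# Hypothetical translation table. Only "SPT" appears in the article;
# the other codes and all category names are invented for illustration.
SECTION_TO_CATEGORY = {
    "SPT": "sports",
    "BIZ": "business",
    "NWS": "news",
}


def categorize(section_code: str, default: str = "uncategorized") -> str:
    """Look up the site category for a newspaper section identifier."""
    return SECTION_TO_CATEGORY.get(section_code.strip().upper(), default)
```

Falling back to a default category rather than raising an error means that an unrecognized code still lands somewhere an editor can find and refile it.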
The export occurs during the
overnight hours.
“The newspaper is printed
overnight, so the export process happens a couple times because of the multiple
editions. The actual export that’s sent to us happens around 5:30 a.m. every
day, and it dumps into categories that aren’t live on the site. Between three
and four editors come on about the same time, and from there it’s pretty
hands-on as to the amount of work that goes on.”
The site handles photographs in a
similar manner.
“Photos are exported along with
the stories but in a separate database, and our editors can go in and pick and
choose whatever they want to use,” Cariker said.
NewsOK.com is built largely on a
long-established, text-friendly programming language.
“Perl is the main programming
language for how our site functions, as far as the way it pulls in a template
around a story, the way a lot of the nuts-and-bolts work,” said Cariker. “Since
we have so many different functions behind our site, though, there’s some ASP,
some Java as well … we’re always trying new things to see what works best
and gives us the most options down the road.”
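Pulling a template around a story generally means substituting the story's fields into a fixed page shell. NewsOK does this in Perl; the sketch below shows the same idea in Python using the standard library's string.Template, and the page layout, placeholder names and function name are all assumptions made for illustration.

```python
from html import escape
from string import Template

# Hypothetical page shell; a real site template would be far larger.
PAGE = Template(
    "<html><head><title>$headline</title></head>\n"
    "<body><h1>$headline</h1>\n$body\n</body></html>"
)


def render_story(headline: str, paragraphs: list[str]) -> str:
    """Wrap a story's headline and paragraphs in the page shell."""
    # Escape the text so characters such as "&" are valid in HTML.
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return PAGE.substitute(headline=escape(headline), body=body)
```

Keeping the shell separate from the story means a redesign touches only the template, not the stored content.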
Azcentral.com
is the home page for The (Phoenix) Arizona Republic, a 451,000-circulation daily
owned by Gannett Co. Inc. Gannett also owns KPNX-TV Channel 12, the Phoenix NBC
affiliate that partnered with azcentral.com in March 2001 to provide streaming
video and other content from KPNX newscasts.
“In the newsroom we have a CCI
Europe front end that is able to export [eXtensible Markup Language] and from
there we take it into a homebrew content system,” said John Leach, senior
editor for online news at azcentral.com. “Before we developed our own system
around two years ago we had been through a number of systems, including most
recently IPS Xcelerate from FutureTense. There were issues of speed and user
interface that made us think we could do better on our own.”
Like many other sites, azcentral.com pulls the
data just before it is typeset, so that it reflects the latest editing changes.
“XML tells us what section of the
paper the content is from,” Leach said. “It drops the headline into a
headline field, byline into a byline field and so on. We pull just about
everything … some graphics don’t make it, some photos, but by and large the
site is a reflection of the print edition.”
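The field mapping Leach describes can be sketched with Python's standard XML parser. The element and attribute names used here (story, section, headline, byline, body, p) are hypothetical; the article does not describe the schema that CCI exports, and azcentral.com's own importer may work quite differently.

```python
import xml.etree.ElementTree as ET


def parse_story(xml_text: str) -> dict:
    """Pull the assumed fields out of one exported story document."""
    root = ET.fromstring(xml_text)
    return {
        "section": root.get("section", ""),
        "headline": root.findtext("headline", default="").strip(),
        "byline": root.findtext("byline", default="").strip(),
        "body": [p.text or "" for p in root.iter("p")],
    }


# Invented sample document in the assumed layout.
sample = """<story section="sports">
  <headline>Sample headline</headline>
  <byline>By A. Reporter</byline>
  <body><p>First paragraph.</p><p>Second paragraph.</p></body>
</story>"""

record = parse_story(sample)
```

Each key in the resulting record corresponds to one field in the content system, which is what lets the import run without an editor retyping anything.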
Text handling is heavily automated,
but photos still require manual intervention.
“Photos are associated with the
text in CCI, but we ran into problems with sizing … all sorts of things became
problematic,” Leach said. “We tried to work an automated process, but we end
up with getting a photo out of the archive and doing manual touchup. We tend to
crop a lot tighter (than in print).”