Geography matters.
So says Gilad Elbaz, the chief information
officer and co-founder of Applied Semantics Inc., a Los Angeles-based technology
company.
The company resisted the mass migration of young
upstarts moving lemming-like to Silicon Valley during the mid- and late 1990s.
Many of those companies met the same fate as lemmings, it turns out, but not
Applied Semantics. Its conference table may have a few chips in it, and the
chairs aren't Aerons, but the company is making money and recently closed a deal
with USAToday.com.
"When I came to [Los Angeles] I found a lot
more affinity for the idea of focusing on what people want, what people need,"
said Elbaz. "It was easier to find people who were interested in an idea and
technology rather than if you had (venture capital firm) Sequoia backing you.
Besides, my friend was starting Google at the time and everyone went to go work
for him."

Part of the Applied Semantics team at their
office in Los Angeles. From left, Gilad Elbaz, co-founder and chief information
officer; Chris Daniels, director of sales and business development; and Jordan
Libit, chief executive officer.
Photo by Hays Goodman
Chris Daniels, director of sales and business
development at Applied Semantics, approached USAToday.com in 2001 with the idea
of using Applied Semantics' Categorizer product, which would help the Gannett
Co. Inc. newspaper properly categorize and summarize editorial content on its
Web site.
"I literally cold-called them," Daniels
recalled. "We were really just discovering newspapers and starting to go to
the trade shows. I knew that they were on the forefront with experimenting with
online architectures and online delivery."
USA Today had built an in-house editorial workflow system based on eXtensible
Markup Language (XML). Early on, the paper recognized
the need for consistent, quality metadata to accompany articles. Complete
metadata often allows an editorial system to provide accurate retrieval and a
much richer search experience, especially when it comes to archives.
However, when metadata isn't consistently
applied, it leads to problems. One editor may categorize a story one way, and
another may have a different idea and assign different keywords.
Inconsistently applied metadata, while better than none at all, still has a
degree of randomness.
"The one thing we introduced there that I don't
believe they considered in the past was the [International Press and
Telecommunications Council] taxonomy," Daniels said. "I think that was a big
draw: Applied Semantics was going with standards that were used in the newspaper
industry."
The companies agreed to test the process, which
is based on a proprietary technology known as Conceptual Information Retrieval
and Communication Architecture, or CIRCA. This system draws on the fundamental
relationships of knowledge to organize textual data, which lets it work with
the meanings contained in the text rather than simply recognizing the
words.
A typical example is the word "Java," which has a
number of meanings, including a synonym for coffee, an Indonesian island and a
computer programming language. All three meanings are on the same level,
which makes it relatively easy to recognize and tag the right one based on the
surrounding text.
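
To make the "Java" example concrete, here is a minimal sketch of choosing a sense from the surrounding text. It is an illustration only, not how CIRCA works internally: the cue-word lists and the function name `disambiguate` are assumptions invented for this example.

```python
# Hypothetical cue words for each sense of "Java"; not taken from CIRCA.
SENSE_CUES = {
    "coffee": {"cup", "brew", "espresso", "roast", "caffeine"},
    "island": {"indonesia", "jakarta", "volcano", "island", "sumatra"},
    "programming language": {"code", "compiler", "class", "software", "applet"},
}


def disambiguate(text, cues=SENSE_CUES):
    """Return the sense whose cue words overlap most with the text, or None."""
    # Lowercase each word and strip surrounding punctuation before comparing.
    words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
    scores = {sense: len(words & vocab) for sense, vocab in cues.items()}
    best = max(scores, key=scores.get)
    # With no cue words present, there is no evidence for any sense.
    return best if scores[best] > 0 else None
```

Because the three senses are treated as equals, a simple overlap count is enough here; no sense needs to be weighted above another.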
In the case of a word like "Ford," however, the
system has to rank the relationships generated. Ford is a car manufacturer as
well as a company. The concept "car manufacturer" is more specific than
"company," so it would receive a stronger value. This entire scheme of how
concepts relate is called an ontology and forms the core of most linguistics
engines produced today.
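
One simple way to picture "more specific receives a stronger value" is to measure how deep a concept sits in a hierarchy. The sketch below assumes a tiny made-up ontology (the `PARENT` table) and uses depth as the specificity score; the article does not say how CIRCA actually computes its values.

```python
# Hypothetical mini-ontology: each concept points to its broader parent.
PARENT = {
    "car manufacturer": "manufacturer",
    "manufacturer": "company",
    "company": "organization",
    "organization": None,
}


def depth(concept, parent=PARENT):
    """Count the steps from a concept up to the root; deeper means more specific."""
    steps = 0
    while parent[concept] is not None:
        concept = parent[concept]
        steps += 1
    return steps


def rank_concepts(candidates, parent=PARENT):
    """Order candidate concepts from most to least specific."""
    return sorted(candidates, key=lambda c: depth(c, parent), reverse=True)
```

Under these assumed relationships, `rank_concepts(["company", "car manufacturer"])` places "car manufacturer" first, mirroring the Ford example.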
USAToday.com supplied Applied Semantics with
thousands of example documents, and the company achieved correct results on more
than 90 percent of them.
"Other categorization products get correct
results in the 75-percent range, so we were very pleased with that," Daniels
said.
Once fine-tuning was complete, Applied Semantics
shipped pre-configured servers to USAToday.com. The Document Type Definition
(DTD) for the XML editorial system had been supplied earlier, so Applied
Semantics was able to write the appropriate hooks into USAToday.com's system
ahead of time. A DTD defines the legal building blocks of an XML
document: it specifies the document structure with a list of legal elements.
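
The idea of a "list of legal elements" can be sketched as follows. This is not real DTD validation (Python's standard library does not validate against DTDs), and the element names are hypothetical rather than USAToday.com's actual schema. The toy rules only check which child elements may appear inside which parent, not their order or whether they are required.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules in the spirit of a DTD declaration such as
# <!ELEMENT story (headline, summary, body)>; not USAToday.com's real DTD.
ALLOWED_CHILDREN = {
    "story": ["headline", "summary", "body"],
    "headline": [],
    "summary": [],
    "body": [],
}


def find_violations(xml_text, rules=ALLOWED_CHILDREN):
    """Return a list of messages for elements the rules do not permit."""
    problems = []
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.tag not in rules:
            problems.append(f"unknown element <{element.tag}>")
            continue
        for child in element:
            if child.tag not in rules[element.tag]:
                problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
    return problems
```

A vendor that has the rules in advance can write output that passes such a check before the servers ever reach the customer, which is the benefit the article describes.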
USAToday.com unpacked the servers in early June, and the system was
implemented within two to three weeks.
The servers have additional functionality beyond
generating a categorization scheme and keyword metadata. They also create
three summaries. The first is very short and can be seen on the front of
www.usatoday.com, right below the title of an article. The second is a
summary limited by character length, for delivery on personal digital assistants
and other future small-screen delivery media. The third is a longer summary,
an abstract that is inserted into the archive system.
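
The character-limited summary is the easiest of the three to illustrate. The sketch below only trims existing text to a budget at a word boundary; it does not attempt the summarization itself, and the function name `fit_summary` and any specific limit are assumptions made for this example.

```python
import textwrap


def fit_summary(text, max_chars):
    """Trim text to at most max_chars characters, cutting at a word boundary."""
    # textwrap.shorten collapses whitespace and appends the placeholder
    # only when words had to be dropped to fit the width.
    return textwrap.shorten(text, width=max_chars, placeholder="...")
```

For instance, a handheld device with a hypothetical 40-character display could call `fit_summary(summary, 40)` and be sure the result never overflows the screen, while text that already fits is returned unchanged.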
"The editors can have Applied Semantics return
results via pull-down in the USAToday.com XML Editor application," said Adrian
Bouten, vice president of technology and business development at USAToday.com.
"The menu item calls a macro that submits the story to Applied Semantics and
returns results to the application."
According to Bouten, the editors have been
trained to use the system, and it doesn't significantly change the workflow.
"The system has greatly increased consistency
in keyword tagging and categorization," he said. "So far, the increase in
editorial productivity is hard to determine (from the auto-summarization),
(but) the consistency in the metadata that is stored within the story greatly
increases USAToday.com's ability to syndicate and re-distribute the content."
So far, the taxonomies and applications for the
technology have been entirely designed around the English language. So what
would happen if a French newspaper phoned up Applied Semantics tomorrow and
wanted to integrate its technology?
"We'd ask them to kindly wait for a year or
two," Daniels laughed. "When you think of an ontology, it's really
language-independent. A chair is a chair no matter what language, and it's
related to the floor, but so far we've only mapped our ontology to English.
A fair amount of it is also mapped to Spanish, but more of the complications
have to do with the natural language processing on the front end before you even
get to the ontology. It's on the roadmap for us, though. By the end of '03,
it would be nice to have two or three languages done."
Applied Semantics would likely pursue the idea
with Spanish first, Daniels said, given the number of Spanish-language
newspapers in the U.S. What are the company's other plans for 2003? Applied
Semantics can't yet publicly name the content management companies it would
like to pair up with to integrate its technology into well-established editorial
and workflow systems.
According to Jordan Libit, chief executive
officer, the company will continue to focus on maintaining profitability and
growing the staff as necessary. He thinks the increasing penetration of XML-based
editorial systems in mid- and large-sized newspapers makes accurate tagging of
metadata an expanding market and one that is ripe for Applied Semantics'
product line.