By Hays Goodman
Associate Editor
When The Washington Post needed to bring vast
amounts of employment listings online, directly from corporate Web sites and
recruitment sites, they realized the task wasn't necessarily going to be an easy
one.
When a classified ad system is able to stay
entirely internal, as when a newspaper publishes only their own classifieds,
then data organization is considerably easier than integration with external
systems. To build the largest possible database in the area, The Washington Post
not only wanted to put their own paper's classifieds online, they also wanted to
incorporate ads from other Web sites and other partners. The problem was that
the data followed no common schema. Bringing it together would have
required an unrealistic level of reformatting and recopying by personnel unless
some more automated means of data normalization could be found. The Post turned
to an American company with an extensive engineering presence in India for a
solution.
"Prior to using Zycus technology in July 2000, WashingtonJobs.com used a
third-party vendor to provide the technology that scrapes our customers' Web
sites for their job listings and allows users to search our listings," said Mary
Moslander, vice president and general manager of WashingtonJobs.com. "With our
previous vendor, we did not have access to the job listing data and had only
limited control over the scraping process and frequency."
"In 2000, we determined that controlling our jobs
database was critical to developing the most powerful and effective local online
recruitment solution," she continued. "We were familiar with Zycus because they
had contacted us the year before, at a time when their technology was not
compatible with our third-party job hosting arrangement. When we began building
our own searchable jobs database, we looked closely at the options available and
then chose Zycus. They were also chosen as the technology for displaying The
Washington Post's text ads online."
Founded in 1998, Zycus Inc. is a privately held
company headquartered in Santa Clara, Calif., and is a subsidiary of Zycus
InfoTech Pvt. Ltd. The company's chief executive officer, Aatish Dedhia, was
formerly with Intel, where he worked on the Pentium Pro Design Chip team. Even
while working on that project, his interest in finding structure from chaos led
him to establish an Intranet management tool for his design group, where 20,000
technical documents were indexed and made searchable to the team of 200 people.
All of Zycus' solutions focus on distilling a
hierarchy from documents that have a structure and a repeatable pattern but
are not necessarily stored in a database, although extraction from existing
databases is also possible. Each product targets a specific industry niche, with
an emphasis on electronic catalogs and classified advertising.
The Washington Post is using two solutions from
Zycus: EzeePost and ClassAct. EzeePost attempts to automate the flow of job
listing content from employers to the job portal. In effect, it scrapes existing
online job postings at regular intervals, takes in other data feeds from
disparate sources such as e-mails, Microsoft Word documents and other databases,
and then runs Zycus' proprietary filters over the results. From there, the
combined data is delivered to the client in the format and structure, and on the
schedule, of their choosing.
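The article does not describe how EzeePost works internally. As a rough illustration of the general idea only, the Python sketch below maps records from differently structured sources onto one common schema and combines them into a single feed. The source names, field names and the `JobListing` type are all hypothetical and are not taken from Zycus' product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobListing:
    """Common schema that every incoming record is mapped onto (hypothetical)."""
    employer: str
    title: str
    location: str
    source: str


# Hypothetical mappings from each source's own field names to the common schema.
FIELD_MAPS = {
    "web_scrape": {"company": "employer", "position": "title", "city": "location"},
    "email": {"from_org": "employer", "subject": "title", "where": "location"},
}


def normalize(raw, source):
    """Map one raw record to a JobListing, collapsing stray whitespace."""
    mapping = FIELD_MAPS[source]
    fields = {
        target: " ".join(str(raw.get(src, "")).split())
        for src, target in mapping.items()
    }
    return JobListing(source=source, **fields)


def combine(feeds):
    """Normalize every record from every source into one combined list."""
    return [
        normalize(raw, source)
        for source, records in feeds.items()
        for raw in records
    ]
```

Calling `combine` with a dictionary keyed by source name returns one uniform list, which a delivery step could then export in whatever format a client asks for.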
"The primary function that Zycus performs is
scraping our customers' Web sites three times a week for their job listings. After
going to our featured employers' Web sites and copying their job listings, Zycus
processes the listings, compiles the data and prepares a data feed. This feed is
then sent to WashingtonJobs.com, where our database is updated and the listings
become available on our Web site. This process allows companies to maintain
their brand image by reproducing job listings on WashingtonJobs.com with a
look and feel that is consistent with their own Web site."
"The second function that Zycus provides is
processing the print ads that come from The Washington Post. WashingtonJobs.com
sends Zycus display and lightface ad feeds once a week. Zycus processes this
data against specified criteria and provides a data feed that is processed at
WashingtonJobs.com and made available to our users."
This second function is the work of another Zycus
product, ClassAct. This service helps newspapers move classifieds online and
improves the searchability and usability of the ads. The service accepts a wide
range of input formats including plain text, PDF, eXtensible Markup Language,
Word and others. From there, a sophisticated rules-based classification engine
runs against the customer-defined schema to generate the initial
classification. Next, validation checks are performed; these allow the software
to continually learn and, in time, to reach accuracy above 95 percent,
according to Sanjay Gala, executive vice president for business development at
Zycus.
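ClassAct's actual rules and schema format are not published in this article. The Python sketch below is only a minimal illustration of what rules-based classification followed by a validation pass can look like. The categories, keyword patterns and function names are assumptions made for the example.

```python
import re

# Hypothetical customer-defined schema: each category lists keyword patterns.
SCHEMA = {
    "automotive": [r"\bsedan\b", r"\bmileage\b", r"\b4wd\b"],
    "employment": [r"\bsalary\b", r"\bresume\b", r"\bfull[- ]time\b"],
    "real_estate": [r"\bbedroom\b", r"\bcondo\b", r"\brent\b"],
}


def classify(ad_text, schema=SCHEMA):
    """Return the category whose patterns match the ad most often, or None."""
    text = ad_text.lower()
    scores = {
        category: sum(1 for pattern in patterns if re.search(pattern, text))
        for category, patterns in schema.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def validate(ads, schema=SCHEMA):
    """Split ads into (ad, category) pairs and ads that matched no rule."""
    classified, needs_review = [], []
    for ad in ads:
        category = classify(ad, schema)
        if category is None:
            needs_review.append(ad)
        else:
            classified.append((ad, category))
    return classified, needs_review
```

In this sketch, ads that match no rule are set aside for review, and new patterns can be appended to the schema as they are identified. That is one simple way a rule set might improve over time; it is not a description of how Zycus' software learns.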
"We can get the data 95 (percent) to 98 percent
accurate, and we are constantly improving," Gala remarked. "We have a team that
handles all this 24/7 and our services are highly scalable. Our customers are
not in the business of data extraction or processing; this is a tough business
to run efficiently and cost-effectively unless it is your core business, as it
is in our case."
Notably, Zycus delivers its software as a service, under an
application service provider (ASP) arrangement.
"We can, and will, productize it when our
customers ask for it. So far, they prefer to keep it as a service so that we do
all of the maintenance and keep up with changes in the market. For example, when
new car models come out, we update all our rules to handle that change. However,
in addition to the ASP model we will probably productize the newspaper software
within the next year."
Pricing of the service always requires a custom
estimate, since each situation is unique in terms of data sourcing, the number
of feeds and the update frequency required.
"The price is based on volume and on the quality
of the input data," said Gala. "So on a job-for-job or ad-for-ad basis, it's
less expensive to process job listings from an employer Web site than from a
newspaper front-end system. This is because the data from the employer Web site
is more highly structured and in a consistent format. However, there is usually
a larger volume of newspaper ads, so the volume drives the price down on a
strictly ad-for-ad basis."
The Washington Post is pleased with the level of
customer service they have received from Zycus.
"Given the critical role of online recruitment in
our business, and the stiff competition in this industry, it is essential that
we have high-quality customer service," said Moslander. "Zycus has done a very
good job in providing rapid, pro-active customer service to WashingtonJobs.com,
regardless of the time of day."