The International Journal 
of Newspaper Technology

January 2002

Zycus
408.855.1060
www.zycus.com

Making sense of chaos: Zycus sifts through data to find logical structure

By Hays Goodman
Associate Editor


When The Washington Post needed to bring vast numbers of employment listings online, drawn directly from corporate Web sites and recruitment sites, it realized the task would not be an easy one.

When a classified ad system stays entirely internal, as when a newspaper publishes only its own classifieds, organizing the data is considerably easier than integrating with external systems. To build the largest possible database in the area, The Washington Post wanted not only to put its own classifieds online but also to incorporate ads from other Web sites and partners. The problem was that the data followed no common schema. Without some automated means of normalizing it, staff would have faced an unrealistic amount of reformatting and recopying. For a solution, the Post turned to an American company with an extensive engineering presence in India.

“Prior to using Zycus’ technology in July 2000, WashingtonJobs.com used a third-party vendor to provide the technology that ‘scrapes’ our customers’ Web sites for their job listings and allows users to search our listings,” said Mary Moslander, vice president and general manager of WashingtonJobs.com. “With our previous vendor, we did not have access to the job listing data and had only limited control over the scraping process and frequency.

“In 2000, we determined that controlling our jobs database was critical to developing the most powerful and effective local online recruitment solution,” she continued. “We were familiar with Zycus because they had contacted us the year before, at a time when their technology was not compatible with our third-party job hosting arrangement. When we began building our own searchable jobs database, we looked closely at the options available and then chose Zycus. They were also chosen as the technology for displaying The Washington Post’s text ads online.”

Founded in 1998, Zycus Inc. is a privately held company headquartered in Santa Clara, Calif., and a subsidiary of Zycus InfoTech Pvt. Ltd. The company’s chief executive officer, Aatish Dedhia, was formerly with Intel, where he worked on the Pentium Pro chip design team. Even while he was working on that project, his interest in finding structure in chaos led him to build an intranet management tool for his design group, which indexed 20,000 technical documents and made them searchable for the team of 200 people.

All of Zycus’ solutions focus on distilling a hierarchy from documents that may have a structure and a repeatable pattern but are not necessarily stored in a database, although extraction from existing databases is also possible. Each product targets a specific industry niche, with an emphasis on electronic catalogs and classified advertising.

The Washington Post is using two solutions from Zycus: EzeePost and ClassAct. EzeePost aims to automate the flow of job listing content from employers to the job portal. In effect, it scrapes existing online job postings at regular intervals, takes in other data feeds from disparate sources such as e-mails, Microsoft Word documents and other databases, and then runs the combined material through Zycus’ proprietary filters. The resulting data is delivered to the client in the format and structure, and on the schedule, of its choosing.
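The flow described above, with several differently shaped sources reduced to one common record and then delivered as a single feed, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `JobListing` fields, the HTML and e-mail layouts, the function names and the JSON output are hypothetical, and Zycus’ proprietary filters are not described in enough detail here to reproduce.

```python
import json
import re
from dataclasses import asdict, dataclass


@dataclass
class JobListing:
    """Hypothetical target schema; the real one is not described in the article."""
    employer: str
    title: str
    location: str
    source: str


def from_web_page(html: str, employer: str) -> list[JobListing]:
    """Extract listings from a page that repeats one simple pattern per job."""
    pattern = re.compile(r'<li class="job">(.*?) - (.*?)</li>')
    return [
        JobListing(employer, title.strip(), location.strip(), "web")
        for title, location in pattern.findall(html)
    ]


def from_email(body: str, employer: str) -> list[JobListing]:
    """Parse 'Title: ...' / 'Location: ...' line pairs from a plain-text e-mail."""
    titles = re.findall(r"^Title:\s*(.+)$", body, flags=re.MULTILINE)
    locations = re.findall(r"^Location:\s*(.+)$", body, flags=re.MULTILINE)
    return [
        JobListing(employer, title.strip(), location.strip(), "email")
        for title, location in zip(titles, locations)
    ]


def build_feed(listings: list[JobListing]) -> str:
    """Drop duplicates that arrive from more than one source, then serialize."""
    seen = set()
    unique = []
    for job in listings:
        key = (job.employer.lower(), job.title.lower(), job.location.lower())
        if key not in seen:
            seen.add(key)
            unique.append(asdict(job))
    return json.dumps(unique, indent=2)
```

Calling `from_web_page` and `from_email` on the same employer’s material and passing the combined list to `build_feed` yields one feed in which a job that appears in both sources is kept only once, under the source that was read first.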

“The primary function that Zycus performs is scraping our customer Web sites three times a week for their job listings. After going to our featured employer Web sites and copying their job listings, Zycus processes the listings, compiles the data and prepares a data feed. This feed is then sent to WashingtonJobs.com, where our database is updated and the listings become available on our Web site. This process allows companies to maintain their brand image by reproducing job listings on WashingtonJobs.com with the look and feel that is consistent with their own Web site.

“The second function that Zycus provides is processing the print ads that come from The Washington Post. WashingtonJobs.com sends Zycus display and lightface ad feeds once a week. Zycus processes this data against specified criteria and provides a data feed that is processed at WashingtonJobs.com and made available to our users.”

This second function is the work of another Zycus product, ClassAct. The service helps newspapers move classifieds online and improves the searchability and usability of the ads. It accepts a wide range of input formats, including plain text, PDF, eXtensible Markup Language (XML), Word and others. A rules-based classification engine then runs against the customer-defined schema to generate the initial classification. Next, validation checks are performed; these allow the software to learn continually and, in time, to produce accuracy above 95 percent, according to Sanjay Gala, executive vice president for business development at Zycus.
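As a rough illustration of what rules-based classification against a customer-defined schema followed by a validation check might look like, consider the short Python sketch below. The schema, the regular-expression rules and the `classify` function are all hypothetical and far simpler than a production engine; the sketch also does not attempt the learning step, beyond flagging ads that a person or a later rule update would need to handle.

```python
import re
from typing import Optional

# Hypothetical customer-defined schema: category -> fields every ad should yield.
SCHEMA = {
    "autos": ["make", "year", "price"],
    "rentals": ["bedrooms", "price"],
}

# Hypothetical rules: (category, field, regex with one capture group).
RULES = [
    ("autos", "make", r"\b(Honda|Ford|Toyota)\b"),
    ("autos", "year", r"\b((?:19|20)\d{2})\b"),
    ("autos", "price", r"\$([\d,]+)"),
    ("rentals", "bedrooms", r"\b(\d)\s*BR\b"),
    ("rentals", "price", r"\$([\d,]+)"),
]


def classify(ad_text: str) -> dict:
    """Pick the category whose rules fill the most fields, then validate it."""
    best_category: Optional[str] = None
    best_fields: dict = {}
    for category in SCHEMA:
        fields = {}
        for rule_category, field, pattern in RULES:
            if rule_category != category:
                continue
            match = re.search(pattern, ad_text, flags=re.IGNORECASE)
            if match:
                fields[field] = match.group(1)
        # Ties keep the earlier category because the comparison is strict.
        if len(fields) > len(best_fields):
            best_category, best_fields = category, fields
    # Validation: every field the schema requires must have been extracted.
    missing = [f for f in SCHEMA.get(best_category, []) if f not in best_fields]
    return {
        "category": best_category,
        "fields": best_fields,
        "missing": missing,
        "needs_review": best_category is None or bool(missing),
    }
```

An ad such as "1998 Honda Accord, 4 dr, auto, $8,500 obo" fills all three auto fields and passes validation, while "Ford pickup, runs great, call eves" is classified as an auto ad but flagged for review because the year and price are missing. In this sketch, improving accuracy over time would mean adding or adjusting entries in `RULES` in response to the flagged ads.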

“We can get the data 95 (percent) to 98 percent accurate, and we are constantly improving,” Gala remarked. “We have a team that handles all this 24/7 and our services are highly scalable. Our customers are not in the business of data extraction or processing … this is a tough business to run efficiently and cost-effectively unless it is your core business, as it is in our case.”

Notably, Zycus offers its software as a hosted service, under an application service provider (ASP) arrangement, rather than as a packaged product.

“We can, and will, productize it when our customers ask for it. So far, they prefer to keep it as a service so that we do all of the maintenance and keep up with changes in the market. For example, when new car models come out, we update all our rules to handle that change. However, in addition to the ASP model we will probably productize the newspaper software within the next year.”

Pricing of the service always requires a custom estimate, since each situation is unique in terms of data sourcing, the number of feeds and the update frequency required.

“The price is based on volume and on the quality of the input data,” said Gala. “So on a job-for-job or ad-for-ad basis, it’s less expensive to process job listings from an employer Web site than from a newspaper front-end system. This is because the data from the employer Web site is more highly structured and in a consistent format. However, there is usually a larger volume of newspaper ads, so the volume drives the price down on a strictly ad-for-ad basis.”

The Washington Post is pleased with the level of customer service it has received from Zycus.

“Given the critical role of online recruitment in our business, and the stiff competition in this industry, it is essential that we have high-quality customer service,” said Moslander. “Zycus has done a very good job in providing rapid, pro-active customer service to WashingtonJobs.com, regardless of the time of day.”