Publishing & Data Services

Data conversion

Newgen has a 150-person unit in Chennai devoted to major backlist and frontlist conversion projects. The unit operates under the tagline “Anything In, Anything Out,” but in practice clients draw on three main strands of service: XML/SGML/HTML conversion, PDF scans with background OCR text, and e-book conversions.

We use more than 20 DTDs in daily conversion work, including a number developed internally. Input material can be hard copy or electronic legacy files. After scanning and OCR, keyboarding, or text extraction (depending on the source material), content is normalized to a short-tagging system known to all of Newgen’s data conversion technicians and is then converted automatically to the destination DTD. Quality levels of 99.95 or 99.995 per cent, depending on the customer’s requirements, are ensured through a combination of online visual QC using cascading style sheets, validation through parsing and semantic checks, and statistical sampling of the output.
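
To make the quality targets above concrete, the following is a minimal Python sketch, not Newgen's actual tooling. It assumes the percentages are measured per character (the text does not say which unit is used), and the function names `max_errors`, `sample_passes`, and `is_well_formed` are hypothetical. The parsing check covers XML well-formedness only; validation against a DTD would need a validating parser, which is outside the Python standard library.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET


def max_errors(units: int, accuracy_pct: str) -> int:
    """Largest error count that still meets the accuracy target.

    accuracy_pct is passed as a string (e.g. "99.95") so that Decimal
    arithmetic stays exact. 'units' is assumed to be a character count.
    """
    allowed = Decimal(units) * (Decimal(100) - Decimal(accuracy_pct)) / 100
    return int(allowed)  # truncate: a fractional error cannot be "used"


def sample_passes(errors_found: int, sample_units: int, accuracy_pct: str) -> bool:
    """Simplified acceptance rule: the sample itself must meet the target.

    A real sampling plan would also account for sample size and
    confidence level; this sketch deliberately ignores both.
    """
    return errors_found <= max_errors(sample_units, accuracy_pct)


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for target in ("99.95", "99.995"):
        print(target, "->", max_errors(100_000, target), "errors per 100,000 characters")
    print(is_well_formed("<chapter><title>Sample</title></chapter>"))
```

Under the per-character assumption, the two quality levels differ by a factor of ten in the number of errors they tolerate for the same volume of text.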

For Oxford University Press, the Institute of Physics Publishing, and Cambridge University Press combined, we have now converted more than eight million printed pages dating back to the mid-nineteenth century to searchable online PDFs. In the scanning process, color, images, and text are scanned in separate passes, optimized individually, and then recombined in the PDFs to balance resolution with file size. Headers and references are tagged in full XML so that they can be used for searching and linking outwards, and the main text is OCRed so that it can be searched by readers.

Newgen supports onward conversion to all the major e-book formats, including Mobipocket, Sony, Acrobat Reader, Microsoft .lit, and Kindle.