When one of our "local" clients first introduced us to the effective use of OCR for translation purposes a few years ago, it was relatively rare in the world of commercial translation and too often incompetently performed, even though most professional translation agencies had already been using the technology in one way or another for a decade. In many cases the "standard" procedures for optical character recognition are simply not well suited to our purposes, so we had a lot to learn.
The basic idea is to avoid using OCR for anything other than word-count estimation and similar tasks. Unfortunately, few do even that efficiently or usefully. Depending on the source, the estimated word count for a quotation may be very accurate or only rough (for example, if there are serious contrast problems that cannot be compensated for). Most professional translators do this not only with PDFs and bitmap files such as JPEG or TIFF, but also with large, complex documents in other formats.
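As a rough illustration of the estimating step, here is a minimal sketch that counts words in the plain text an OCR pass has already produced and reports a range rather than a single figure. The function names, the definition of a "word" and the 10% default uncertainty are our own assumptions for the example, not a rule from any OCR program or agency; a poor scan may deserve a much wider margin.

```python
import re
from typing import Tuple

# Assumption: a "word" is a run of letters or digits, optionally joined by
# internal apostrophes or hyphens. Quoting conventions differ between agencies.
WORD_RE = re.compile(r"[^\W_]+(?:['’-][^\W_]+)*")


def count_words(ocr_text: str) -> int:
    """Count word-like tokens in plain text produced by an OCR pass."""
    return len(WORD_RE.findall(ocr_text))


def estimate_range(ocr_text: str, uncertainty: float = 0.10) -> Tuple[int, int]:
    """Return a (low, high) word-count range for quoting.

    `uncertainty` is a hypothetical fudge factor chosen by the estimator:
    raise it for low-contrast scans, lower it for clean sources.
    """
    n = count_words(ocr_text)
    margin = round(n * uncertainty)
    return max(0, n - margin), n + margin
```

Calling `estimate_range(text, uncertainty=0.25)` on the text of a badly scanned fax, for instance, would give a deliberately wide range to quote against.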
One cannot always rely on the counts from Microsoft Word itself or from various translation tools. Embedded objects, even editable ones, are generally not included in the counts. We all know this, but from time to time we fail to admit it.
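One way to catch such omissions before quoting is to check whether a document contains embedded objects at all. The sketch below assumes a .docx file (a ZIP package) in which Word conventionally stores embedded objects under `word/embeddings/`; the function name is ours, and the check does not cover older .doc files, linked objects or text inside images.

```python
import zipfile


def list_embedded_objects(docx_file) -> list:
    """Return the names of parts stored under word/embeddings/ in a .docx.

    Assumption: embedded objects live in that folder, as Word normally
    writes them. An empty list does not prove the word count is complete.
    """
    with zipfile.ZipFile(docx_file) as package:
        return sorted(
            name
            for name in package.namelist()
            if name.startswith("word/embeddings/") and not name.endswith("/")
        )
```

If the list is not empty, the embedded content has to be opened and counted separately (or printed to PDF and run through OCR) before the quotation goes out.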
Using OCR to prepare translations is often straightforward, but there are a number of traps that people commonly fall into. Do not, under any circumstances, be seduced by the automatic conversion settings of any commercial OCR program or by options to save with the "original formatting". This is nearly always a disaster when working with translation tools. Problems may include bizarre text changes, chunks of text that disappear because of text box sizing problems, a plague of tags, and more.
An OCR obsession is d-a-n-g-e-r-o-u-s!!! Be safe!