If you are a professional translator, you are probably aware of the difficulties of translating PDF documents.
There are two fundamental problems we face when accepting a PDF from a client:
As a presentation format, PDF is all about visuals: layout is king. Getting at the content inside a PDF was never a priority for Adobe when they designed the format, and as a result it's very difficult to do. With no concept of paragraphs, text boxes or story flow, even if you can get at the raw characters inside, you still have to reconstruct them into meaningful language.
Now at last, there is a glimmer of hope for those of us frustrated by this situation.
A software house based in England has been working on Infix PDF Editor, which attempts to resolve both issues. Its latest revision (version 5) provides a good, usable solution for professional translators. You can download a fully functional trial version from their web site.
The software doesn't do the translation itself; that is left to your own CAT tool, such as Trados, Sisulizer or any of the dozens of sophisticated translation-memory-based CAT solutions on the market. What it does do is export all of the content as XML and then re-import the translated XML into the existing PDF, retaining all of the original layout.
Clever, automatic
text-fitting handles most of the cases where the translation is
longer than the original. But for the times when even this can't squeeze a wordy translation into an existing
space, the software offers a large array of editing tools to help resolve any layout-related
problem.
[Figure: Typical workflow when translating a PDF using Infix PDF Editor]
Since the text inside a PDF isn't stored in any particular order, there are no neat paragraphs, indents or stories, just individual characters at specific x,y positions. When you export a PDF, Infix first pieces all these fragments together into lines, paragraphs and text blocks, and then writes out simple, well-formed XML.
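Infix doesn't document exactly how it reassembles these fragments. The following is only a minimal sketch of the first step, grouping characters into lines, and it is not Infix's method. It assumes each glyph comes with a left x position, a baseline y position and an advance width. The `Glyph` class, the `glyphs_to_lines` function and both tolerance values are hypothetical. Real extractors also have to cope with rotation, columns, kerning and much more.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the character
    y: float      # baseline position (PDF y grows upwards)
    width: float  # advance width


def glyphs_to_lines(glyphs, line_tolerance=2.0, space_factor=0.3):
    """Hypothetical heuristic: rebuild lines of text from positioned glyphs."""
    # Top of the page first (largest y), then left to right.
    ordered = sorted(glyphs, key=lambda g: (-g.y, g.x))
    rows = []
    for g in ordered:
        # Glyphs whose baselines lie within the tolerance share a line.
        if rows and abs(rows[-1][0].y - g.y) <= line_tolerance:
            rows[-1].append(g)
        else:
            rows.append([g])
    lines = []
    for row in rows:
        row.sort(key=lambda g: g.x)
        text = row[0].char
        for prev, cur in zip(row, row[1:]):
            # A visible gap between neighbouring glyphs is treated as a word space.
            gap = cur.x - (prev.x + prev.width)
            if gap > space_factor * prev.width:
                text += " "
            text += cur.char
        lines.append(text)
    return lines
```

For example, glyphs spelling "Hi there" on one baseline and "Bye" on a lower one come back as two lines, whatever order they were stored in.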
[Figure: Infix's export dialog also offers a second format, plain text, which can be handy for quickly passing text through Google Translate to get a rough-and-ready PDF translation.]
The XML is divided into paragraphs and includes markup for font, size and color. Most CAT software will easily handle such basic XML, though the software does come with user-supplied configuration files for Trados and Across, which you probably won't need.
One thing the software doesn't do automatically is link text blocks together into a story flow. This is not surprising, since it's difficult enough for a human being to work out where a story flows in some documents. If your document's body text is in a three-column layout, for example, Infix will treat each column as a distinct, isolated block of text.
Thankfully, Infix offers a tool for linking text blocks into
stories. Using the T123
tool, you click on each text block in the story to connect
them together. Now when you re-import the translated text, the
software will reflow the text from the first connected box to
the last, filling each as it goes. If you use the automated text
fitting option, it will even ensure the best fit across all
linked boxes giving a very neat finish.
[Figure: Clicking on text blocks links them together. The software numbers the linked blocks to show the reading order of the story. These connections are recorded in the PDF, so you should only need to do this once.]
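The reflow across linked boxes can be pictured as filling each box in reading order and carrying whatever doesn't fit on to the next one. The sketch below is a deliberately crude model. It assumes each box holds a fixed number of lines of a fixed number of characters. Real text fitting works with font metrics, hyphenation and font-size adjustment, and none of that is modelled here. `flow_into_boxes` is a hypothetical name, not part of Infix.

```python
def flow_into_boxes(text, boxes):
    """Flow words through linked boxes in reading order (crude illustration).

    boxes: list of (max_lines, max_chars_per_line) tuples standing in for
    real box geometry and font metrics.
    Returns (filled, overset): filled holds one list of lines per box, and
    overset lists the words that did not fit into any box.
    """
    words = text.split()
    filled = []
    i = 0
    for max_lines, max_chars in boxes:
        lines = []
        while i < len(words) and len(lines) < max_lines:
            # Start a new line; an over-long word still gets a line of its own
            # (a real engine would hyphenate or shrink the text instead).
            line = words[i]
            i += 1
            while i < len(words) and len(line) + 1 + len(words[i]) <= max_chars:
                line += " " + words[i]
                i += 1
            lines.append(line)
        filled.append(lines)
    return filled, words[i:]
```

Whatever remains in `overset` after the last box is what layout software calls overset text.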
Each paragraph exported to the XML is given a unique number, and the software tags the corresponding paragraph inside the PDF with the same number. When you import your translated XML, Infix therefore knows where each paragraph belongs. To preserve this relationship between paragraphs, it's important to save the PDF as soon as you have exported it. Once it is saved, you can be sure that the paragraph tags in the XML will always match the tags stored in the PDF.
[Figure: Once the text is exported, Infix shows how it was split up by coloring the blocks. Each block corresponds to a separate paragraph in the output XML. The coloring is only an aid and doesn't affect the PDF, which will look normal when viewed in any other PDF reader.]
Use your preferred CAT tool to translate the XML, making sure you preserve the paragraph markup, as well as the font and size changes if you want to keep them.
[Figure: A sample of the XML Infix creates. Your CAT software should be able to shield you from the XML details, allowing you to concentrate on just the text content.]
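The import relies on those paragraph numbers, so it can be worth checking before importing that your CAT tool hasn't dropped or altered any of them. The sketch below uses Python's standard `xml.etree.ElementTree`. For illustration only, it assumes each paragraph is an element named `p` that carries its number in an `id` attribute. Check a real Infix export for the actual element and attribute names, and adjust the hypothetical `tag` and `id_attr` parameters to match.

```python
import xml.etree.ElementTree as ET


def check_paragraph_ids(exported_xml, translated_xml, tag="p", id_attr="id"):
    """Compare paragraph IDs between the exported and the translated XML.

    The element/attribute names are assumptions, not Infix's documented schema.
    Returns (missing, unexpected), each in document order.
    """
    def ids(xml_text):
        root = ET.fromstring(xml_text)
        return [el.get(id_attr) for el in root.iter(tag)]

    exported, translated = ids(exported_xml), ids(translated_xml)
    exported_set, translated_set = set(exported), set(translated)
    # IDs the CAT tool lost: these paragraphs would not be updated on import.
    missing = [pid for pid in exported if pid not in translated_set]
    # IDs that don't exist in the export: these have nowhere to go in the PDF.
    unexpected = [pid for pid in translated if pid not in exported_set]
    return missing, unexpected
```

If both lists come back empty, the paragraph markup has survived translation intact.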
The first and most important thing to do is to tell Infix which language you will be importing. The language setting governs how words are hyphenated when they are broken across lines. Get this wrong and you'll get some strange hyphenation. Infix supports 25 languages, so it should cover most common cases.
[Figure: The Import dialog has more options than the Export dialog. That's because importing is a little more complicated than exporting, but not much.]
Next, decide whether you want Infix to use its clever text fitting during the import. As a general rule this is helpful, though it does slow things down a little. You can always go through the document later and add or remove auto-fitting on any text boxes that need adjustment.
We'll ignore Font Substitutions for now - we'll come to them
later once we've done the first import.
Choose the file containing your translated XML and press OK to
start the ball rolling.
The first thing the software does is to check that all the
imported text can be represented using the fonts included in the
original PDF. This is important since a typical PDF will only
include subsets of the fonts containing just those characters
actually used in the PDF. This is done automatically by most PDF
generators to reduce the overall size of the PDFs.
[Figure: When a PDF font doesn't have the characters needed to display your translation, the software looks on your own computer for the same font. If it doesn't find a match, it asks you to choose a substitute font. The characters missing from the font are shown together with a menu of all the fonts you could try instead. You will be asked to make this choice for every font that lacks the needed characters.]
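At its core, the check is set arithmetic. For each font, collect the characters it must display in the translation and subtract the characters its embedded subset actually contains. This is a minimal sketch that assumes you already know each subset's character set, which in practice means reading the font data. The function name and the example font names are hypothetical, and this is not how Infix inspects fonts internally.

```python
def fonts_needing_substitution(runs, font_subsets):
    """Report which (subset) fonts cannot display the translated text.

    runs: iterable of (font_name, translated_text) pairs.
    font_subsets: dict mapping font_name -> characters embedded in the PDF.
    Returns a dict of font_name -> sorted list of missing characters,
    containing only the fonts that have gaps.
    """
    needed = {}
    for font, text in runs:
        # Whitespace is ignored here for simplicity.
        needed.setdefault(font, set()).update(ch for ch in text if not ch.isspace())
    report = {}
    for font, chars in needed.items():
        missing = chars - set(font_subsets.get(font, ""))
        if missing:
            report[font] = sorted(missing)
    return report
```

For instance, a subset that only ever had to show "Helo wrd" lacks the "W", "a" and "t" needed for the German "Hallo Welt".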
Once you've made all the required substitutions, Infix begins the import process. Depending on the size of your document, this can take a while. The software only imports a text block when it finds a difference between the text in the PDF and the text in the XML, so for minor changes to your translation it should be quite a quick process.
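The idea of touching only paragraphs that have changed can be sketched as a simple comparison keyed on paragraph number. Both inputs here are assumed to be plain dictionaries that map paragraph IDs to text. `paragraphs_to_update` is a hypothetical helper, not part of Infix.

```python
def paragraphs_to_update(current, translated):
    """Return the IDs of paragraphs whose translated text differs from the PDF.

    current: dict of paragraph_id -> text currently in the PDF.
    translated: dict of paragraph_id -> text from the translated XML.
    IDs with no matching paragraph in the PDF are skipped here.
    """
    return [pid for pid, text in translated.items()
            if pid in current and current[pid] != text]
```

Re-importing after fixing a single typo would then update just one paragraph rather than the whole document.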
Once the import is complete, you will be left with a translated PDF. It may look perfect, but chances are some things will need tweaking. For example, even with automatic text fitting, you may find some text blocks simply have too much text. You then have two options: rephrase the translation to make it shorter, or reshape the text box to make it larger. Both are easy to do in Infix.
[Figure: It's wise to use the software's basic preflight checks to find any overset text boxes. A text box is overset when there is too much text to fit into it (or into its linked chain of text boxes). The overflowing, "overset" text is hidden from view and the text box is marked with a small red box. The preflight check locates any overset text boxes so that you can reshape them or rephrase the text.]
This is also a good time to review any font substitutions you made during the import.
If you did have to make some substitutions then the software can list them and allow you to modify, save or load them from disk. Over time you can then build up a set of font substitutions to use for a particular client or language.

Well, almost. There are a few things I would like to see in future versions.
Judging by some of the posts about PDF translation on Proz.com's forums, the company appears to pay attention to feedback from users. So perhaps these kinds of improvements will come in time.
However, the ability to open up any PDF and translate it with little more difficulty than an RTF document, plus all the additional tools included in the price, makes it a "must have" addition to this translator's tool chest.
Infix comes in "Standard" and "Pro" versions, though only "Pro" offers the translation facilities discussed here. It costs $159.
There is also a pay-as-you-go option, known as 'Pay & Save'. It lets you use all the "Pro" features for free whilst working, and you then pay only $30 at the end of the project to remove the watermarks from the finished document. More details of pricing and enterprise licensing are available from the company's web site.