Can Swordfish process PDF documents?
Thread poster: Thomas Johansson
Thomas Johansson  Identity Verified
Peru
Local time: 11:03
English to Swedish
Apr 21, 2011

I will receive a PDF file with approx. 40,000 words and have been asked to process it with a CAT tool while generating a TM (for future versions). Is this something I can do with Swordfish?

Also, I got the impression that Swordfish is written in Java (though I am not sure). Is it by any chance slow to work with, or does it perform well?

Thomas

[Edited at 2011-04-21 19:12 GMT]

Rodolfo Raya  Identity Verified
Local time: 13:03
English to Spanish
No PDF Apr 21, 2011

Hi,

Swordfish doesn't support PDF files. You will have to use OCR software to extract the text into a better format (.docx, for example).

Java is not slow; it is as fast as C++. Speed depends mostly on your hardware (memory and processor).

Regards,
Rodolfo

Laurent KRAULAND (X)  Identity Verified
France
Local time: 18:03
French to German
PDF = pain in the back Apr 22, 2011

Hi Thomas,
while I certainly understand that clients may have their reasons for requesting translations from PDF files, it must be said once again that PDF was designed as a non-editable format.

And if the document is as you described it, there must be an original in an editable and CAT-compatible format somewhere.

Thomas Johansson  Identity Verified
Peru
Local time: 11:03
English to Swedish
TOPIC STARTER
It is in an "editable" format Apr 22, 2011

Well, it is in an "editable" format, at least in the sense that I can copy the text and paste it into a Word document, for example, if I like. So OCR shouldn't really be needed. (I am not sure whether "editable" is the right word here, but it is one of those modern PDF formats that started appearing a few years ago, where you can, e.g., highlight text, copy it and paste it into another file.)

Given this, is Swordfish still unable to process it? (I would prefer to deliver the translation back to the client as a PDF file, i.e. in the same format as the source file.)

If not, which CAT tool could process PDF files of this sort?

Thomas

Piotr Bienkowski  Identity Verified
Poland
Local time: 18:03
Member (2005)
English to Polish
PDF files are best served by OCR Apr 22, 2011

PDF files are best served by OCR, unless your client has tools to convert the PDF into a DTP format that Swordfish can import and export.

OCR processing has its drawbacks, and you have to know the quirks of your OCR software. Many clients do not like the way FineReader formats documents, for example, so I learned to mark blocks for recognition manually to avert their rage.

I once prepared a 24-page PDF for translation manually; it had graphics, tables and occasional Greek characters. Afterwards it looked like the original PDF, but it took me more than two days!

Regards,

Piotr

Milos Prudek  Identity Verified
Czech Republic
Local time: 18:03
English to Czech
Print into PDF May 6, 2011

"I would prefer to deliver the translation back to the client as a PDF file, i.e. in the same format as the source file."


Here is the workflow:
- Use OCR to convert PDF to MS Word (or read the text and translate it)
- Translate the MS Word file with any CAT
- Print the MS Word file into PDF (OpenOffice can do this)