Find/Replace across multiple files, multiple directories Thread poster: Samuel Murray
Samuel Murray Netherlands Local time: 16:22 ProZ.com member since 2006 English => Afrikaans + ...
G'day everyone

Can you please give me your recommendation of a program or programs that can do the following? It needn't be freeware, but the cheaper the better, obviously.

I need to make edits on multiple plain text files located in multiple directories (but usually only one tree, i.e. only one top-level or ancestor directory). I'm using Windows XP Pro. The files are usually in UTF8 format (but if your tool can handle other formats too, so much the better). The files often do not have UTF8 byte order marks, but sometimes they do. The files usually have LF (Unix) line endings but may also have CRLF (Dos/Windows) line endings.

What I need to do is this:

1. Find CRLF (carriage return/line feed, aka Dos/Windows line endings) and replace them with LF (line feed, aka Unix line endings). (must have)
2. Find a UTF8 byte order mark and remove it (or optionally also add it). This can usually be done if the program is capable of hex editing, because the byte order mark is nothing more than three specific bytes at the start of a file. (must have)
3. If at all possible, regex should be built into your recommended tool. (optional, but nice)
4. Do find/replace operations in UTF8, even if the file has no byte order mark (or alternatively even if the file has a byte order mark). (kinda non-optional, but depends on the tool)

So, what can you recommend?

Samuel
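For readers who would rather script items 1 and 2 than install a tool, here is a minimal sketch in Python 3. It is only an illustration under stated assumptions: Python is not mentioned in the original post, the function names and the `.txt` filter are hypothetical, and it overwrites files without making backups, so it should be tried on a copy of the tree first.

```python
import os

UTF8_BOM = b"\xef\xbb\xbf"  # the three bytes EF BB BF mentioned in item 2


def normalize_bytes(data: bytes) -> bytes:
    """Strip a leading UTF8 BOM, then convert CRLF line endings to LF."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return data.replace(b"\r\n", b"\n")


def normalize_tree(root: str, suffix: str = ".txt") -> list:
    """Rewrite matching files under root in place; return the changed paths.

    The suffix filter is an assumption made for this sketch.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            # Binary mode: no decoding and no newline translation on Windows.
            with open(path, "rb") as f:
                original = f.read()
            fixed = normalize_bytes(original)
            if fixed != original:
                with open(path, "wb") as f:
                    f.write(fixed)
                changed.append(path)
    return changed
```

Working on raw bytes means the file's encoding never has to be guessed, which matters here because most of the files carry no byte order mark. Calling `normalize_tree` with the top-level directory covers the whole tree without listing individual file names.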
Adam Łobatiuk Poland Local time: 16:22 ProZ.com member since 2009 English => Polish + ...
Rainbow (from Okapi Tools) - free and Ultra Edit 32 (commercial, but is probably available as a time-limited demo). You might want to try both for the different features. Good luck.
Robert Tucker (X) United Kingdom Local time: 15:22 German => English + ...
Samuel Murray Netherlands Local time: 16:22 ProZ.com member since 2006 English => Afrikaans + ... Thread poster
Robert Tucker wrote: But command line Perl probably can too. Unix to DOS text conversion with Perl is shown here: http://sial.org/howto/perl/one-liner/ Not quite sure if it will work recursively though.

I'll gladly use Perl, if I can use it recursively and not have to specify the individual file names. I know that Perl can change line endings. Can Perl find/replace or find/remove hex characters like \xEF\xBB\xBF ?
Robert Tucker (X) United Kingdom Local time: 15:22 German => English + ...
Kevin Lossner Portugal Local time: 15:22 German => English + ... You could treat it as a "translation" project | Dec 8, 2009
Both MemoQ & DVX will enable you to do this. Copy the source text to the target by the usual methods and do your search/replace. The results will be exported in the same directory structure.
Samuel Murray Netherlands Local time: 16:22 ProZ.com member since 2006 English => Afrikaans + ... Thread poster Don't use XReplace-32! | Dec 9, 2009
Robert Tucker wrote: Came across XReplace-32: http://xreplace.vestris.com/ "XReplace-32 is the tool you need for massive search-and-replace operations among all of your text files, including html web documents and source code."

I tried to test this program on a single file in a single directory, but when I pressed "Go!" it started processing all files (all non-binary files) on my Desktop and in all subdirectories of all folders on the Desktop! And its progress report indicated that it was making changes to all of the files. I tried to cancel the replacement process by clicking on the "x" but the program refused to close. There is no "Stop" button in the program as far as I can see either.

Luckily I have a process killer utility on my desktop (Taskill, which I normally use for programs that hang) and I was able to kill XReplace-32 before it damaged all files on my computer. Luckily the program makes backups of all changed files, so I was able to revert the changes by deleting the changed files and renaming the backups back.
Samuel Murray Netherlands Local time: 16:22 ProZ.com member since 2006 English => Afrikaans + ... Thread poster Comment on UltraEdit 32 | Dec 9, 2009
Adam Łobatiuk wrote: ....and Ultra Edit 32 (commercial, but is probably available as a time-limited demo). You might want to try both for the different features.

I tried UltraEdit 32, thanks. It can find/replace CRLF and LF, using Perl regular expressions, if you specify the hexadecimal values:

Find: \x0D\x0A
Replace: \x0A

But it can't find the UTF8 BOM. I tried to search for \xEF\xBB\xBF but it could not find it. It did open a BOM'ed and a BOM-less UTF8 file both as UTF8, which is nice at least. But it once opened a BOM-less UTF8 file as ISO-8859-1 (see next paragraph).

One thing that is somewhat disconcerting is that if the file is UTF8 but is also valid ISO-8859-1, it sometimes opens it as ISO-8859-1 without asking, but sometimes it asks if I want it to "convert the file to DOS format" (and if I answer "no", then it opens the file as UTF8). In one case it opened a UTF8 file without asking if it should convert it, but when I saved the file, it was in ISO-8859-1, not UTF8. I haven't checked extensively, but I could not find any option to tell it to always assume that UTF8 files are in UTF8 and to always save such files as UTF8, even if they can be saved as ISO-8859-1 without data loss.
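The guessing problem described above can be sidestepped by testing the bytes directly before an editor gets to decide. The following is a minimal Python 3 sketch, not anything UltraEdit itself does; the function name and the four labels it returns are hypothetical.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def classify(data: bytes) -> str:
    """Return a rough label for the encoding of the given file contents."""
    try:
        data.decode("utf-8")  # strict decoding: raises on invalid sequences
    except UnicodeDecodeError:
        return "not-utf8"
    if data.startswith(UTF8_BOM):
        return "utf8-bom"
    if all(byte < 0x80 for byte in data):
        # Pure ASCII reads identically as UTF8 and as ISO-8859-1.
        return "ascii"
    return "utf8"
```

A file labelled "utf8" or "utf8-bom" here is one where silently saving as ISO-8859-1 changes the bytes on disk, so those are the files worth checking after a batch edit.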
Samuel Murray Netherlands Local time: 16:22 ProZ.com member since 2006 English => Afrikaans + ... Thread poster Perl, hex and Windows | Dec 9, 2009
Hmm, it doesn't seem to work. I tried this line:

perl -pe 's/\xEF\xBB\xBF//g' file1.txt > file1.txt

on a file with a UTF8 byte order mark (EF BB BF) but it fails to remove it.

Another problem is that the "-i" switch, which means "process the file itself, in place", doesn't seem to work on Windows, which means that I have to write the result to a new file, and I would then have to find a way to ensure that files that weren't modified are also copied to the new location. And because the target location must be mentioned, and must typically be a full path, in quotes, the way to use Perl would involve writing a large BAT file with each replacement on a new line. I've been down that road... it ain't pretty.

The lines mentioned on that page do not work, but then, they look rather weird anyway. Besides, the "find" command in Windows does not search for files (as it seems to do in Linux) but for lines in files. The relevant command in Windows may be "dir", and you may have to pipe the commands, but I have been unsuccessful in tinkering with it.
[Edited at 2009-12-09 10:46 GMT]
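One likely contributor to the failure above: redirecting output to the same file that is being read (`file1.txt > file1.txt`) generally makes the shell empty that file before the program gets to read it. A common workaround is to write the result to a temporary file and then swap it into place. Below is a minimal Python 3 sketch of that idea; `rewrite_in_place` and its `transform` parameter are hypothetical names, and this does not reproduce what Perl's `-i` switch does internally.

```python
import os
import tempfile


def rewrite_in_place(path, transform):
    """Apply transform (bytes -> bytes) to a file; return True if it changed."""
    with open(path, "rb") as f:
        original = f.read()
    updated = transform(original)
    if updated == original:
        return False  # untouched files stay exactly where they are

    # Write next to the target so the final swap stays on the same drive.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(updated)
        os.replace(tmp_path, path)  # swap the new file in for the old one
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

Because the original is read completely before anything is written, and unchanged files are skipped, there is no need for a separate output location or for copying unmodified files across.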
Samuel Murray Netherlands Local time: 16:22 ProZ.com member since 2006 English => Afrikaans + ... Thread poster Rainbow does it | Dec 9, 2009
Adam Łobatiuk wrote: Rainbow (from Okapi Tools)...

I installed Rainbow-R00003-v5.0.1 (found on my computer somewhere) and it works. It can process multiple files in multiple directories, and you can drop a directory tree into Rainbow as-is. The relevant options are on the toolbar menu:

Utilities > Line-break conversion
Utilities > Byte-order-mark conversion

I'm not sure if it is required to specify the input and output encoding in the "Options" tab, but it can't hurt to do so. One can replace files or create new files with a regular type of name. Rainbow doesn't do find/replace, though.

Added: I see the latest version of Rainbow is 5.0.20, here: http://okapi.sourceforge.net/downloads.html
[Edited at 2009-12-09 11:02 GMT]
Robert Tucker (X) United Kingdom Local time: 15:22 German => English + ... Perl – Remove BOM | Dec 9, 2009
Samuel Murray wrote: Hmm, it doesn't seem to work. I tried this line: perl -pe 's/\xEF\xBB\xBF//g' file1.txt > file1.txt on a file with a UTF8 byte order mark (EF BB BF) but it fails to remove it.

Try:

perl -CD -pe 'tr/\x{feff}//d' bom.txt > nobom.txt

It seemed to work on Linux. (I opened Bluefish, typed Ctrl+Shift+U EF, Ctrl+Shift+U BB, Ctrl+Shift+U BF so that I got ï»¿ and then just added some text; ran the above command and in the new file the ï»¿ was absent.)

http://www.perlmonks.org/?node_id=724474
[Edited at 2009-12-09 12:26 GMT]
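The difference between \xEF\xBB\xBF and \x{feff} is the difference between bytes and characters: once a file is decoded as UTF8, the three BOM bytes become the single character U+FEFF. A small Python 3 sketch of the same character-level removal follows, with an "add BOM" counterpart for the optional half of the original requirement. The function names are hypothetical, and this mirrors the `tr/\x{feff}//d` idea rather than reproducing Perl's behaviour exactly.

```python
import codecs


def remove_bom_chars(data: bytes) -> bytes:
    """Decode as UTF8, delete every U+FEFF character, and re-encode."""
    text = data.decode("utf-8")  # the plain utf-8 codec keeps a BOM as U+FEFF
    return text.replace("\ufeff", "").encode("utf-8")


def add_bom(data: bytes) -> bytes:
    """Prepend the UTF8 BOM unless the data already starts with one."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data
```

If only a leading mark should go, decoding with Python's "utf-8-sig" codec instead drops a single BOM at the start and leaves any later U+FEFF characters alone.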
Robert Tucker (X) United Kingdom Local time: 15:22 German => English + ...
Samuel Murray wrote: The lines mentioned on that page do not work, but then, they look rather weird anyway. Besides, the "find" command in Windows does not search for files (as it seems to do in Linux) but for lines in files. The relevant command in Windows may be "dir", and you may have to pipe the commands, but I have been unsuccessful in tinkering with it.

I tried them on Linux. The one with the "find" command seemed to edit only one file in a directory (when in fact I had two files which could have been edited). The command with grep seemed to work. Don't know if this Windows grep would be up to the job: http://pages.interlog.com/~tcharron/grep.html
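Where neither a Unix-style "find" nor grep is at hand, the file-listing step can be sketched in Python 3 instead. This is an illustration only: `files_matching` and the `.txt` default are hypothetical, and the pattern is matched against raw bytes so that CRLF pairs and BOM bytes can be searched for directly.

```python
import os
import re


def files_matching(root, pattern, suffix=".txt"):
    """List files under root whose raw bytes match the given bytes regex."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                if regex.search(f.read()):
                    hits.append(path)
    return hits
```

For example, `files_matching(top, rb"\r\n")` would list the files that still contain CRLF line endings, and `files_matching(top, rb"\A\xef\xbb\xbf")` the files that start with a UTF8 byte order mark, where `top` stands for the top-level directory.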