LinuxQuestions.org
07-19-2010, 01:58 AM   #1
/dev/stderr
LQ Newbie
 
Registered: Jul 2010
Posts: 1

Rep: Reputation: 0
Renaming files with asian characters


Hi,

I have a kind of strange problem that I haven't been able to resolve for a couple of days now so I thought I'd ask to see if anyone else had come across this.

I have a bunch of files that I need to rename. Ordinarily this is a pretty easy task, but these file names contain Chinese / Japanese characters (sorry for my ignorance, I can't tell the difference).

i.e. [$$$$$$$$].SOMETHING BLAH BLAH.ext

Where all the "$$$$" stand in for the Chinese characters.

The problem is that neither sed nor perl seems to handle the Chinese characters correctly, so a regular expression like 's/^[*.]//', which I would normally expect to work, doesn't.

From what I have read so far I believe these characters are double encoded UTF-8 (not 100% sure) which could be the problem.

So far I've tried numerous different regexes, and I've played around with convmv to see if I could convert the filenames to single-encoded characters, but I've had no luck.

Has anyone else come across this? I don't really want to rename 100+ files by hand.

Cheers,
/dev/stderr
 
07-19-2010, 06:09 AM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Yes, it can be quite difficult to work with foreign-language filenames on the command line. CJK characters are encoded as 2-4 bytes each, depending on the encoding (three bytes apiece for the characters in the UTF-8 example below), and unless you have the appropriate input method (IM) and the knowledge to use it, they're almost impossible to type directly.

First of all, sed does work with unicode. However, the example pattern you gave above is wrong. The "wildcard" is ".*" (not "*."), and it shouldn't go inside brackets unless you're trying to match literal characters: "[*.]" matches exactly one literal "*" or ".". So...
Code:
sed 's/^.*SOMETHING/SOMETHING/'
...should strip all the characters in front of "SOMETHING".
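
Keep in mind that sed only edits text, so to actually rename anything, the new name has to be computed for each file and handed to mv. Here's a minimal sketch of that, assuming the filenames contain no newlines and that "SOMETHING" is plain text with no regex special characters. The strip_prefix function name is just something I made up for the illustration.

```shell
# Hypothetical helper: print a file name with everything before the marker
# removed. $1 = file name, $2 = marker text that starts the part to keep.
strip_prefix() {
    printf '%s\n' "$1" | sed "s/^.*$2/$2/"
}

# Dry run: this only prints the mv commands. Remove "echo" once the
# output looks right.
for f in *SOMETHING*; do
    [ -e "$f" ] || continue          # the glob matched nothing
    new=$(strip_prefix "$f" "SOMETHING")
    [ "$f" = "$new" ] && continue    # nothing to strip
    echo mv -- "$f" "$new"
done
```

Because ".*" is greedy, a name containing the marker twice would be cut at the last occurrence, so check the dry-run output first.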

Second, try using globbing to grab the file by the part of the name you can type.
Code:
# the * has to stay outside the quotes, or the shell won't expand it
mv *"SOMETHING BLAH BLAH.ext" "new_file_name.txt"
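
A glob is only safe for this if it matches exactly one file; with two or more matches, mv would treat the last argument as a target directory. A rough sketch of a guard follows. The rename_by_suffix name is hypothetical, not an existing command.

```shell
# Hypothetical helper: rename the one file whose name ends with $1 to $2.
# Refuses to act unless the glob matches exactly one existing file.
rename_by_suffix() {
    set -- "$1" "$2" *"$1"      # $3 onward now hold the glob matches
    if [ "$#" -ne 3 ] || [ ! -e "$3" ]; then
        echo "expected exactly one match for *$1" >&2
        return 1
    fi
    mv -- "$3" "$2"
}

# Usage:
# rename_by_suffix "SOMETHING BLAH BLAH.ext" "new_file_name.txt"
```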
Now a more complicated method. The uniname command (available in the uniutils package) will show you the multi-byte encodings of the individual characters, which you can then use in the shell, with a bit of trickery.
Code:
$ ls *file1.txt
日本語file1.txt

$ ls *file1.txt |uniname -bnpu
character  encoded as     glyph
        0   E6 97 A5       日
        1   E6 9C AC       本
        2   E8 AA 9E       語
        3   66             f
        4   69             i
        5   6C             l
        6   65             e
        7   31             1
        8   2E             .
        9   74             t
       10   78             x
       11   74             t
       12   0A

$ mv $'\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9Efile1.txt' new_file_name.txt
$'...' is a bash quoting form (ANSI-C quoting) which expands various escape sequences into their actual characters. Hexadecimal byte codes like the ones above can be used in it in the form "\xNN", and octal ones in the form "\NNN".

This is cumbersome though, and only really good for scripting purposes. I'm only including it here for completeness.
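
If you don't have uniname installed, the standard od command can show the same bytes. The following is only a sketch: hex_escape is a made-up helper name, and for simplicity it escapes every byte, including the plain ASCII ones, which $'...' accepts just the same.

```shell
# Hypothetical helper: print a string as a bash $'...' literal with every
# byte written as a \xNN escape.
hex_escape() {
    # od -tx1 dumps each byte as two hex digits; tr removes the spacing
    hex=$(printf '%s' "$1" | od -An -v -tx1 | tr -d '[:space:]')
    # put \x in front of every pair of hex digits and wrap the result in $'...'
    printf "\$'%s'\n" "$(printf '%s' "$hex" | sed 's/../\\x&/g')"
}

hex_escape '日本語'
# prints: $'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e'
```

The output can be pasted straight into a bash mv command in place of the characters you can't type.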

One option I recommend is to use the qmv command from the renameutils package. qmv will load all the filenames given to it into a text editor, which you can then edit by hand. When you save the file, it will automatically rename all the files at once.

The following uses my personal set-up for qmv, which places the from and to names on sequential lines. You can set it up as an alias if you want. The editor used is determined by your $EDITOR environment variable, or by the -e option.

Code:
$ ls *
日本語file1.txt
本日語file2.txt
語日本file3.txt
語本日file4.txt

$ qmv -v -f sc -o separate,indicator1='f|',indicator2='t|' *

#outputs to nano:

f|\346\227\245\346\234\254\350\252\236file1.txt
t|\346\227\245\346\234\254\350\252\236file1.txt

f|\346\234\254\346\227\245\350\252\236file2.txt
t|\346\234\254\346\227\245\350\252\236file2.txt

f|\350\252\236\346\227\245\346\234\254file3.txt
t|\350\252\236\346\227\245\346\234\254file3.txt

f|\350\252\236\346\234\254\346\227\245file4.txt
t|\350\252\236\346\234\254\346\227\245file4.txt
Unfortunately though, it doesn't seem to want to send them to the editor as pure unicode strings, so all you get are byte codes. Notice that in this case the bytes are displayed in octal (\346 is the same byte as \xE6) instead of hexadecimal, but the basic concept is the same. Just edit the "t" lines (but don't touch the "f" lines), then save and close the editor.

Finally, there are other bulk renamers out there (mostly GUI), such as krename and pyrenamer.
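
If you want to convince yourself that these escapes are the same bytes as the \xNN ones used with mv earlier, printf and od can do a quick check. This is just an illustration using the first character from the listing:

```shell
# printf treats \NNN in its format string as an octal byte value;
# od -tx1 then prints each resulting byte in hexadecimal.
printf '\346\227\245' | od -An -tx1
# prints the three bytes e6 97 a5 (the spacing varies between od versions)
```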
 
  

