Japanese Unicode encoding

(@akson)
Active Member
Joined: 19 years ago
Posts: 5
Topic starter  

Hi all, I've got a bit of a dilemma and I was hoping some of the bright minds on this forum might be able to help. I was given a DVD full of documents, spreadsheets and .PST files that are either in Japanese or a mix of English and Japanese. My task is to produce all of the documents in their native language. I thought, no problem, EnCase handles Unicode - I'll be done by lunch. Well, EnCase read the majority of them, but a significant chunk remains that appears to be gibberish. I've tried installing all the Microsoft Japanese IMEs and language packs, and if I throw the PSTs into Outlook I can get most of them, but some still remain gibberish. I would like to look at the suspect hard drive to see what kind of software is installed, since I'm guessing there is some kind of font or encoding mechanism that I am missing, but the original hard drives are not accessible.

Any ideas out there? Anybody have much experience in Japanese or other foreign character based languages and solved a similar problem?

Thanks!


   
_nik_
(@_nik_)
Trusted Member
Joined: 19 years ago
Posts: 93
 


1) Make sure support for the East Asian codepages is installed on your examiner machine.

2) EnCase does support Unicode and codepages, but some file formats/documents do not record which codepage was used. In that case it is usually the system codepage of the machine that created them.

For Japanese this would most likely be 932 (Shift-JIS), though there are many others.

Try setting the Text Style to one of those.
Then you can export the text view as Unicode.
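
Outside EnCase, the same "try a codepage, then export as Unicode" idea can be sketched in a few lines of Python. This is only a rough illustration: the candidate list and the function names `decode_candidates` and `to_utf16` are my own assumptions, not anything EnCase uses, and a clean decode does not prove the guess is right, so the output still needs to be checked by eye.

```python
# Candidate encodings to try for Japanese text. The list and its order are
# an assumption for illustration; extend it as needed.
CANDIDATES = ["cp932", "euc_jp", "iso2022_jp", "utf-8"]


def decode_candidates(raw, encodings=CANDIDATES):
    """Return {encoding: text} for every candidate that decodes without error."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # This encoding cannot represent the byte sequence; skip it.
            continue
    return results


def to_utf16(raw, encoding="cp932"):
    """Decode raw bytes with the chosen codepage and re-encode as UTF-16 (with BOM)."""
    return raw.decode(encoding).encode("utf-16")
```

Here `cp932` is Python's name for the Windows variant of Shift-JIS. If several candidates decode cleanly, compare the results and keep the one that reads as real Japanese.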

The same thing holds for PSTs, but with a twist:
Outlook encodes the data/text with "compressible encryption",
and the underlying data might then also be in Shift-JIS or another encoding.

Another option is to open the documents in Word and then save them as Unicode. This can be scripted.

For a single PST message there are two fields,
PR_MESSAGE_CODEPAGE and PR_INTERNET_CPID. The first represents a codepage, the latter an encoding.

To get the codepage from the encoding, look in the registry under HKEY_LOCAL_MACHINE\SOFTWARE\Classes\MIME\Database\Charset and HKEY_LOCAL_MACHINE\SOFTWARE\Classes\MIME\Database\Codepage.
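
Once you have a codepage number for a message, decoding the body is straightforward. The sketch below assumes the property value and the raw body bytes have already been pulled out of the PST by whatever tool you use. The table `CODEPAGE_TO_CODEC` and the function `decode_body` are hypothetical names, and the table is only a small subset of well-known Windows codepage identifiers.

```python
# Small, illustrative map from Windows codepage identifiers to Python codec
# names. Only a handful of common values; not a complete list.
CODEPAGE_TO_CODEC = {
    932: "cp932",         # Shift-JIS (Windows variant)
    50220: "iso2022_jp",  # ISO-2022-JP
    51932: "euc_jp",      # EUC-JP
    65001: "utf-8",       # UTF-8
    1252: "cp1252",       # Western European
}


def decode_body(raw, codepage, fallback="cp932"):
    """Decode message bytes using the codec mapped to the given codepage.

    Unknown codepages fall back to `fallback` (an assumption suited to a
    Japanese data set); undecodable bytes are replaced rather than raising.
    """
    codec = CODEPAGE_TO_CODEC.get(codepage, fallback)
    return raw.decode(codec, errors="replace")
```

Replacement characters (U+FFFD) in the output are a hint that the codepage stored with the message does not match the actual bytes, in which case trying the other Japanese encodings is worthwhile.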

hope this helps.
Nik


   