I'm working on a case in which I have to find some keywords in emails and documents. Unfortunately, I found that my target evidence files (emails) use the ISO-2022-JP charset, which is a stateful encoding.
Since ISO/IEC 2022 is a stateful encoding, a program cannot jump in the middle of a block of text to search, insert or delete characters. This makes manipulation of the text very cumbersome and slow when compared to non-stateful encodings. Any jump in the middle of the text may require a back up to the previous escape sequence before the bytes following the escape sequence can be interpreted.
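To make the problem concrete, here is a minimal Python sketch of that behaviour. It relies only on Python's built-in `iso2022_jp` codec; the helper name `governing_escape` and the sample string are made up for illustration, and the helper assumes plain ISO-2022-JP, where every escape sequence is three bytes long.

```python
# Escape sequences used by plain ISO-2022-JP (all three bytes long).
ESC_JIS = b"\x1b$B"    # switch to two-byte JIS X 0208
ESC_ASCII = b"\x1b(B"  # switch back to ASCII


def governing_escape(data: bytes, offset: int) -> bytes:
    """Back up from offset to the previous escape sequence.

    Hypothetical helper: returns the escape in effect at offset,
    or the ASCII escape if none precedes it (ASCII is the initial state).
    """
    pos = data.rfind(b"\x1b", 0, offset)
    if pos == -1:
        return ESC_ASCII
    return data[pos:pos + 3]


# Sample text mixing ASCII and Japanese (illustrative only).
mixed = "case 日本語 file".encode("iso2022_jp")

# Locate the two-byte run that sits between the two escape sequences.
start = mixed.index(ESC_JIS) + len(ESC_JIS)
end = mixed.index(ESC_ASCII, start)
fragment = mixed[start:end]

# Jumping into the middle: without the escape, the bytes read as ASCII.
naive = fragment.decode("iso2022_jp")

# Backing up to the governing escape restores the intended text.
prefix = governing_escape(mixed, start)
restored = (prefix + fragment + ESC_ASCII).decode("iso2022_jp")

print(repr(naive))
print(restored)
```

The same bytes decode to two different strings depending on the shift state, which is why a search tool has to track escape sequences rather than compare bytes at arbitrary offsets.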
Do you know which software (EnCase, FTK, X-Ways, or something else) can do this work?
Thank you.
Hi,
One approach is raw search with codepages.
For example, in EnCase or similar software you can run a raw search with the appropriate code pages, such as 932, 50221, 51932, 65001, and 1200 (code pages commonly encountered in Japan). Please note that a raw search doesn't find encoded data. A document attached to an email is Base64-encoded, an Office 2007 document is ZIP-compressed, and PDF content is stored in its own encoded format, so keywords inside them are not found by a raw search.
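As a rough illustration of what such a raw search does, outside of any forensic product, the Python sketch below encodes one keyword under several codecs and looks for the resulting byte patterns. The mapping from Windows code page numbers to Python codec names is my own approximation (Python's `iso2022_jp` and `euc_jp` are not byte-for-byte identical to code pages 50221 and 51932), and `keyword_patterns` and `raw_search` are hypothetical helper names, not functions of EnCase or any other tool.

```python
import base64

# Assumed nearest Python codec for each Windows code page number.
CODECS = {
    "932 (Shift_JIS)": "cp932",
    "50221 (ISO-2022-JP)": "iso2022_jp",
    "51932 (EUC-JP)": "euc_jp",
    "65001 (UTF-8)": "utf-8",
    "1200 (UTF-16LE)": "utf-16-le",
}

ESC_JIS = b"\x1b$B"    # ISO-2022-JP: switch to JIS X 0208
ESC_ASCII = b"\x1b(B"  # ISO-2022-JP: switch back to ASCII


def keyword_patterns(keyword: str) -> dict:
    """Encode the keyword once per codec to get raw byte patterns."""
    patterns = {}
    for label, codec in CODECS.items():
        raw = keyword.encode(codec)
        if codec == "iso2022_jp":
            # Inside a longer Japanese run the keyword is not wrapped in
            # escapes, so strip them. The remaining bytes are all in the
            # ASCII range, so false positives are possible.
            if raw.startswith(ESC_JIS):
                raw = raw[len(ESC_JIS):]
            if raw.endswith(ESC_ASCII):
                raw = raw[:-len(ESC_ASCII)]
        patterns[label] = raw
    return patterns


def raw_search(data: bytes, keyword: str) -> list:
    """Return the code page labels whose byte pattern occurs in data."""
    return [label for label, pattern in keyword_patterns(keyword).items()
            if pattern in data]


keyword = "日本語"
# Illustrative mail body, stored as ISO-2022-JP.
body = "これは日本語のメールです".encode("iso2022_jp")

hits_plain = raw_search(body, keyword)
# The same bytes as a Base64-encoded attachment.
hits_b64 = raw_search(base64.b64encode(body), keyword)

print(hits_plain)
print(hits_b64)
```

The plain body is hit under the ISO-2022-JP pattern, while the Base64 copy of the same bytes yields no hits, which is the limitation described above. A real tool would also need to decode or decompress container formats before searching.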
Index Search is another approach. However, some products can't index Japanese words correctly. I don't know whether FTK can index them correctly.
Thank you,
Hi,
Today I tried FTK4. FTK4 can parse and index Japanese (ISO-2022) very well. However, I can confirm that EnCase7 cannot do this job by indexing. I have also been trying EnCase6, and I saw that EnCase6 parsed ISO-2022 incorrectly in Doc View.
Thanks,