Hi I know this is a bit non-forensicky but thought there might be someone who could help.
I'm working with our internal audit dept, getting data dumps from various accounting packages for processing.
One of the output files I have received (in CSV) seems to have random 0x0D characters (when viewed in WinHex) inserted in a description field, which then messes up records when imported into Excel (which is what we have to use for processing 'cos we've bought some licenses for an add-in). The records look messy in TextPad and WordPad but not in Notepad or MS Word.
What I want to do is get rid of all the random 0x0D characters but of course every line/record ends with 0x0D 0x0A so that would mess things right royally.
So I thought I'd try:
1. Replace 0x0D 0x0A with “anything fairly unique”
2. Then replace any remaining instances of 0x0D with 0x20 (or possibly with nothing at all)
3. And finally replace “anything fairly unique” with 0x0D 0x0A
Thus getting rid of the random 0x0D characters which are causing me problems.
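For anyone wanting to try the three steps outside Word, here is a minimal Python sketch of that procedure working directly on bytes. The function name `strip_stray_cr` is made up for illustration, and it assumes the file contains no NUL (0x00) bytes so that a single NUL can serve as the "anything fairly unique" placeholder. Treat it as a sketch under that assumption, not a tested tool.

```python
def strip_stray_cr(data: bytes, replacement: bytes = b" ") -> bytes:
    """Replace lone 0x0D bytes while keeping 0x0D 0x0A record terminators.

    Hypothetical helper; assumes the data contains no NUL bytes.
    """
    placeholder = b"\x00"  # the "anything fairly unique" marker
    if placeholder in data or placeholder in replacement:
        raise ValueError("NUL byte present; pick a different placeholder")

    # Step 1: hide the genuine CR+LF line endings behind the placeholder.
    protected = data.replace(b"\r\n", placeholder)
    # Step 2: any CR still present is a stray one; swap it for the replacement.
    cleaned = protected.replace(b"\r", replacement)
    # Step 3: put the genuine line endings back.
    return cleaned.replace(placeholder, b"\r\n")
```

The file should be read and written in binary mode (`open(path, "rb")` and `open(path, "wb")`) so that Python does not translate the line endings itself. Passing `replacement=b""` removes the stray bytes instead of turning them into spaces.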
I did step 1 above in MS Word (replacing ^p with "anything fairly unique"), but there are still 0x0D 0x0A pairs all over the place, so I'm no further forward.
Anyone got any ideas?
Cheers
Peter
I did step 1 above in MS Word (replacing ^p with "anything fairly unique"), but there are still 0x0D 0x0A pairs all over the place, so I'm no further forward.
Anyone got any ideas?
If you are convinced of the procedure (which does sound "OK") use a "better" (or "more suited") tool than MS Word to do the replacement.
gsar
http//
is such a suitable tool.
jaclaz
Try out PowerGrep. You can build a regular expression to make the changes in a sane manner.
RegexBuddy can help you build the actual regex, with a preview of what will match too.
This is a very simple program to write in almost any language.
The better quality text editors will also do this easily. I like UltraEdit.
Another solution would be to add double quotes around the field; Excel should then import it correctly and keep the embedded line breaks. Really it is the fault of the accounts package, which didn't do a valid CSV export in the first place.
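To illustrate the quoting point, here is a small Python sketch using the standard `csv` module. The sample row is invented for illustration. It only shows what a validly quoted export looks like and that a CSV parser then treats the embedded 0x0D as part of the field; it does not repair an existing unquoted file.

```python
import csv
import io

# Invented sample record whose description field contains a bare CR.
rows = [["1001", "Invoice\rsecond line", "25.00"]]

# Write with every field quoted, as a well-behaved export would.
buf = io.StringIO(newline="")
csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(rows)
text = buf.getvalue()

# Read it back; newline="" stops Python translating line endings first.
parsed = list(csv.reader(io.StringIO(text, newline="")))

print(repr(text))
print(parsed == rows)
```

Because the CR sits inside double quotes, the reader returns one record with three fields rather than splitting it in two.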
Thanks everyone. Hat tip to Passmark, UltraEdit appears to do the job on initial testing.
PowerGrep - may do the job but is more expensive.
Tjaberg - sorry, I just didn't follow through on it.
Simple to write - 2014 may just be the year when I finally learn to write a little bit of code!
Thanks again
Update:
Notepad++ lets you convert from ASCII to Hex (under Plugins), do the search/replace, then convert back again.
The only problem was that it replaced just 164 of 167 known instances in a relatively small sample file, so it's difficult to know how it would behave with my 1MB behemoth.
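A quick way to check a result like that is to count the stray 0x0D bytes before and after the replacement. A minimal Python sketch follows; the function name `count_stray_cr` is hypothetical. It relies on the fact that every CR+LF pair contains exactly one CR, so the lone ones are the total CR count minus the CR+LF count.

```python
def count_stray_cr(data: bytes) -> int:
    """Count 0x0D bytes that are not immediately followed by 0x0A."""
    total_cr = data.count(b"\r")      # every CR, stray or not
    crlf_pairs = data.count(b"\r\n")  # CRs that belong to a line ending
    return total_cr - crlf_pairs
```

Running it on the file contents (read in binary mode) before and after a tool has done its work shows whether any instances were missed.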
Oh, and I forgot to say, we're stuck (for the moment anyway) with the output we've got, anything more means our request goes into a development queue and we wait another 6 months. Might happen anyway but we needed something now.
Oh and also BTW, we have at least 7 different accounting/ERP packages spread across the Group, all with their own separate support and development teams.
Thanks again
Tjaberg - sorry, I just didn't follow through on it.
Well, with all due respect, it is not like brain surgery 😯.
The logical step you miss is probably this piece of info:
DOS/Windows use 0x0D 0x0A as "end of line" (CR+LF)
'nixes use only 0x0A (LF)
gsar has a built-in function to convert from DOS to Unix (and vice versa):
-du
Convert a DOS ASCII file to UNIX (strips carriage return).
-ud
Convert a UNIX ASCII file to DOS (adds carriage return).
So I would try:
Convert the file to Unix:
gsar -du -o file.txt
(the above will overwrite the source file with the modified one)
Replace all the remaining 0x0D occurrences with spaces:
gsar -s:x0D -r:x20 -o file.txt
or remove them altogether:
gsar -s:x0D -r -o file.txt
And finally convert back from Unix to DOS:
gsar -ud -o file.txt
Using another tool with regular expression support or the like might be more elegant, of course.
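As a rough idea of what the regular expression route could look like, here is a minimal Python sketch. The names `STRAY_CR` and `remove_stray_cr` are made up for illustration. The pattern uses a negative lookahead to match a CR only when it is not immediately followed by LF, so the real line endings are never touched and no placeholder or format conversion is needed.

```python
import re

# A 0x0D that is NOT immediately followed by 0x0A.
STRAY_CR = re.compile(rb"\r(?!\n)")


def remove_stray_cr(data: bytes, replacement: bytes = b" ") -> bytes:
    """Replace every lone CR with `replacement`, leaving CR+LF pairs alone."""
    # A function is used as the replacement so the bytes are inserted
    # literally, without backslash-escape processing by the re module.
    return STRAY_CR.sub(lambda match: replacement, data)
```

The same pattern, written as `\r(?!\n)`, should carry over to other tools whose regex flavour supports lookahead, though that depends on the tool.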
jaclaz
I did step 1 above in MS Word (replacing ^p with "anything fairly unique"), but there are still 0x0D 0x0A pairs all over the place, so I'm no further forward.
You already have several good suggestions. I'll just observe that if you want to operate at the byte level of a document, don't use a program that operates on a level above that, such as Word. It's liable to add things of its own accord.
I would like to add a recommendation for HxD as a small, fast and very handy hex editor. Editing bytes in the way you suggest would be very easy.
Brilliant, thanks. Not to mention the attractive price-point.