File concatenation on recovery from unallocated space

 SteD
(@sted)
Posts: 1
New Member
Topic starter
 

Hope this is the right place to post this. I did consider the general forum, but thought it was more of a software issue.

Some time ago I found some free time, and while playing at deleting and recovering files from a USB pen, I noticed that some of the recovered text files seemed to have paragraphs or text out of place or missing. I only noticed because I was the author of the text document that I recovered, so when I happened to read a bit of it, it didn't seem correct. I checked and found that the text I had recovered actually came from two versions of a document with the same name, both of which I had previously deleted. So the software had recovered the file by finding a start and an end, and had built the recovered file from the two older same-name files.

Following on from this, and testing it over and over, I found it didn't always happen, but sometimes I managed to recover file-A with an odd few lines of text from a totally different file-B of the same extension type.

So I built myself a small test bed using a brand new USB stick. On my workstation I made a small directory of files and a set of subdirectories with the same basic names, each containing the same files, such as

file-01.txt
file-02.txt
file-03.txt
-/file-01
-/file-01/file-01.txt
-/file-01/file-02.txt
-/file-01/file-03.txt
-/file-02
-/file-02/file-01.txt
-/file-02/file-02.txt
-/file-02/file-03.txt
-/file-03
-/file-03/file-01.txt
-/file-03/file-02.txt
-/file-03/file-03.txt

Ok, hope that stays in format and posts ok.

What I did was to edit the contents of each file to be just the number of the file, but in a simplified way, so that I could easily check it later, so the content of file-01.txt was similar to

1111 11 1 111111111 11111 1111 111111 11111 111 11 11 11 11111111111 1 1 111111

11111111111 11 11111111111 111111 1111111111 111111 111111111111 11111111111
11111111111 11111111 1111111111 11111111111

You get the idea: it only contained fake text and paragraphs, all as 1's.

You guessed it, file-02.txt was similar, but not the same text pattern, and again all 2's, and file-03.txt was all 3's.
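A test bed like the one described could be generated with a short script. This is only a sketch: the function names `fake_text` and `build_test_bed`, the word and paragraph lengths, and the choice to make every copy of a given file number identical are assumptions, not details taken from the post.

```python
import random
from pathlib import Path


def fake_text(digit: str, paragraphs: int = 3, seed: int = 0) -> str:
    """Build paragraphs of pseudo-words made of a single repeated digit."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(paragraphs):
        words = [digit * rng.randint(1, 12) for _ in range(rng.randint(8, 16))]
        blocks.append(" ".join(words))
    return "\n\n".join(blocks) + "\n"


def build_test_bed(root: Path) -> list[Path]:
    """Create file-01..03.txt in root and in subdirectories file-01..03."""
    created = []
    directories = [root] + [root / f"file-{n:02d}" for n in (1, 2, 3)]
    for directory in directories:
        directory.mkdir(parents=True, exist_ok=True)
        for n in (1, 2, 3):
            path = directory / f"file-{n:02d}.txt"
            # Assumption: the seed depends only on the file number, so each
            # file number has its own pattern and all its copies are identical.
            path.write_text(fake_text(str(n), seed=n), encoding="ascii")
            created.append(path)
    return created
```

Calling `build_test_bed(Path("testbed"))` writes twelve files, each containing only its own digit plus spaces and newlines, so any foreign digit in a recovered file is easy to spot.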

I then copied, deleted, copied, deleted, formatted, copied, deleted some, copied, overwrote, deleted, etc etc onto the USB stick, and then worked on it, and recovered some of the deleted files from it

I found that in some of the recovered files the content was in a format similar to

1111111111111 11 111111 111111111111 11111 222222222 222 2222222222 222
22222222 22 2222 2 222222222 222222222222222 222222222 2 11111111111 111

A few of the recovered files had mixed content, and I found that the more times I did this, and the more initial files I started with, the more often it recovered mixed content.
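Checking recovered files for mixed content by eye gets tedious as the number of files grows. A minimal sketch of an automated check follows, assuming the digit-only test files described above; `digit_runs` and `is_mixed` are hypothetical helper names.

```python
def digit_runs(text: str) -> list[tuple[str, int]]:
    """Collapse text into (character, count) runs, ignoring whitespace."""
    runs: list[tuple[str, int]] = []
    for ch in text:
        if ch.isspace():
            continue
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def is_mixed(text: str) -> bool:
    """True if the text contains more than one distinct non-space character."""
    return len({ch for ch, _ in digit_runs(text)}) > 1


recovered = "1111 11 222 22 1"
print(digit_runs(recovered))  # [('1', 6), ('2', 5), ('1', 1)]
print(is_mixed(recovered))    # True
```

The run lengths also show roughly how much of each source file ended up in the recovered one.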

Oddly I only found it affected file types that were txt based..

I really don't want to say which software I used (I might do that later). I just wanted to see if anyone else had ever tried this or found something similar. Is this a known issue when recovering from unallocated space? If it is, how can text-based content such as chat logs etc when recovered from unallocated space be used at all?

Ste

 
Posted : 19/10/2014 1:04 am
joakims
(@joakims)
Posts: 224
Estimable Member
 

Keeping the software a secret just decreases the likelihood of any good replies. Not to mention how you used this unknown software… The target filesystem would have been nice to know too.

 
Posted : 19/10/2014 2:01 am
(@athulin)
Posts: 1156
Noble Member
 

Oddly I only found it affected file types that were txt based..

So you did additional tests with other file types? And … ?

Without knowing what software you used, and what kind of method it uses to locate and join file fragments, it's just guessing.

However, txt files don't have any internal structure to help in deciding if fragment 2 comes before or after fragment 1. If there is any smartness in the tested software, it can only be brought to bear on actual contents … if I had to choose, I'd probably rank software that only produced a sequence of '1' somewhat lower than software that went for slightly more variation.

That may be where your testing methodology falls down – if the software you use is smart, you had better test its smartness. If it looks for things like recognizable words, or character, digram, trigram or other statistics that are … 'even' from one fragment to another … then you had better hand it real text to work with. Otherwise you'll probably just see the usual GIGO effects. Hand those strings of 1 and 2 over to Eliza …

… such as chat logs etc when recovered from unallocated space be used at all.

Well, think about it. If there are no file structures to line up, any matching must be based on contents. If an analyzer can say 'fragment 1 looks like English', but 'fragment 2 looks like Spanish', you wouldn't expect those two fragments to be joined up, except perhaps if nothing else makes better sense. And if you have two fragments in the same language, the 'join' between them should preferably not produce a word, crossing the join, that isn't in that language (here's where digram and trigram statistics may enter the picture).
If you want to draw conclusions about whether the software can do chat logs, give it chat logs.
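To make the trigram idea concrete, here is a rough sketch of scoring a join between two fragments against a sample of known text. It is an illustration only, not a description of how any particular recovery tool works; `trigram_counts`, `join_score` and the reference sentence are made up for the example.

```python
from collections import Counter


def trigram_counts(sample: str) -> Counter:
    """Count every character trigram in a sample of reference text."""
    s = sample.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def join_score(left: str, right: str, model: Counter) -> int:
    """Sum reference counts for the two trigrams that straddle the join."""
    window = (left[-2:] + right[:2]).lower()
    return sum(model[window[i:i + 3]] for i in range(len(window) - 2))


reference = "the quick brown fox jumps over the lazy dog and then the other dog"
model = trigram_counts(reference)

good = join_score("jumps over th", "e lazy dog", model)
bad = join_score("jumps over th", "qx zzz", model)
print(good > bad)  # True
```

With fragments made only of 1's and 2's, every candidate join that does not switch digits scores the same, which is one reason such test data gives a content-based matcher nothing to work with.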

But what do your sequences of 1 and 2 look like? More like code than natural language. And how do you decide on the criteria for evaluating restoration attempts?

Looks like a Turing test to me … why not give a random person the 'original' file, and the restored file, and ask him/her to tell you which of them is correct, and perhaps also why…

 
Posted : 19/10/2014 11:40 am
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

As always I may be a lot off target 😯 , but it seems like it's some fuss over something that is "quite normal".

After the

I then copied, deleted, copied, deleted, formatted, copied, deleted some, copied, overwrote, deleted, etc etc onto the USB stick, and then worked on it, and recovered some of the deleted files from it

processes, there are only TWO possibilities
1) the files were contiguous
2) the files were fragmented

And three possibilities about the way the recovery software worked
a. "dumbly" i.e. not using any of the (if remaining after the process) filesystem addressing data
b. "smartly" i.e. using (if remaining after the process) some filesystem addressing data
c. "randomly" i.e. inventing an order in which to read the sectors.

IF #1.b or #2.b then the "recovery software" would be able (IF the filesystem addressing data survived the procedure ) to recover the text files "as they were".

IF #1.a the result could be either the files as they were or a larger file containing all the files.

IF #2.a the result could be either a set of "mish-mashed" files or a larger file containing mish-mashed contents.

IF #1.c or #2.c the result could be *anything*, depending on the specific method/algorithm the specific tool uses.
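The difference between cases #2.a and #2.b can be shown with a toy "disk" of three 512-byte sectors, where a file of 1's occupies sectors 0 and 2 and a file of 2's sits in sector 1 between them. This is a simplified sketch; `dumb_carve`, `smart_recover` and the layout are assumptions for illustration, not any real tool's behaviour.

```python
SECTOR = 512


def dumb_carve(disk: bytes, start: int, length: int) -> bytes:
    """Case a: read `length` bytes contiguously from sector `start`,
    ignoring any filesystem addressing data."""
    offset = start * SECTOR
    return disk[offset:offset + length]


def smart_recover(disk: bytes, chain: list[int], length: int) -> bytes:
    """Case b: follow a surviving list of sector numbers for the file."""
    data = b"".join(disk[s * SECTOR:(s + 1) * SECTOR] for s in chain)
    return data[:length]


# Fragmented layout: file A = sectors 0 and 2, file B = sector 1.
disk = b"1" * SECTOR + b"2" * SECTOR + b"1" * SECTOR

carved = dumb_carve(disk, start=0, length=2 * SECTOR)
chained = smart_recover(disk, chain=[0, 2], length=2 * SECTOR)

print(set(carved))   # {49, 50}: bytes for '1' and '2', i.e. mish-mashed
print(set(chained))  # {49}: only '1', the file as it was
```

The contiguous read returns half of file A followed by all of file B, which is the kind of mixed output described in the first post.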

A "plain" txt file does not have any "header" or "footer" by which to tell it apart from "random data" or to work out where it begins and where it ends.

An ASCII txt could be recognized because it has only a given subset of byte values (printable characters) and CR's and LF's, and in the case of Unicode a large number of alternating 00's, but that's all you can gather by looking at a sector view on disk, there is no way to know if the "next" sector belongs to the same file or is a fragment of another similar file.
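A rough sketch of that kind of sector-level guess is below. The byte ranges, the 90% threshold and the assumption that "Unicode" means UTF-16LE text of mostly Latin characters are illustrative choices, and `classify_sector` is a hypothetical name.

```python
# Printable ASCII plus tab, LF and CR.
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def classify_sector(sector: bytes) -> str:
    """Guess whether a raw sector looks like ASCII or UTF-16LE text."""
    if not sector:
        return "empty"
    if all(b in PRINTABLE for b in sector):
        return "ascii-text"
    # Assumption: UTF-16LE Latin text has 0x00 in nearly every odd position.
    even, odd = sector[0::2], sector[1::2]
    if odd and odd.count(0) / len(odd) > 0.9 and all(b in PRINTABLE for b in even):
        return "utf16le-text"
    return "other"


print(classify_sector(b"1" * 512))                      # ascii-text
print(classify_sector(b"2" * 512))                      # ascii-text
print(classify_sector("hello ".encode("utf-16-le") * 40))  # utf16le-text
```

A sector of 1's and a sector of 2's get the same label, so a classification like this says that both are text but nothing about which file either one belongs to.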

So besides the specific "unknown" recovery tool used, it is to be ascertained HOW EXACTLY the delete/rename/format/whatever procedure was carried out and what filesystem addressing data may have survived it (and this depends on the OS, on the specific commands and also on the filesystem used, as Joakim pointed out).

Athulin is instead making a reference to what I would consider a "further step", grammatical/language/syntax analysis.

Imagine that you have three split sentences, and you have them "found" in this order

  1. Between the devil and
  2. an angry man
  3. A hungry man is
  4. a friend indeed
  5. A friend in need is
  6. the deep sea

Would you take them "as given" or would you re-order them as 1-6, 3-2 and 5-4?
What software could do that?
And how would it work?

jaclaz

 
Posted : 19/10/2014 8:18 pm
(@shep47)
Posts: 51
Trusted Member
 

I then copied, deleted, copied, deleted, formatted, copied, deleted some, copied, overwrote, deleted, etc etc onto the USB stick, and then worked on it, and recovered some of the deleted files from it
Ste

I may be far off the point given the limited information about the tool & methodology used, but surely if you carry out the above practice and then use some linear carving process, you are likely to recover fragmented patterns of ASCII once the carved items are concatenated into a file? Are the 'patterns' in 512-byte (or multiples of) sections?
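One way to answer the 512-byte question for the digit-only test files is to find where the content switches from one digit to another and check whether a sector boundary falls at that switch. A minimal sketch, assuming a 512-byte sector size; `transitions` and `on_sector_boundary` are hypothetical helpers.

```python
WHITESPACE = b" \t\r\n"


def transitions(data: bytes) -> list[tuple[int, int]]:
    """Return (last_old, first_new) offsets wherever the non-whitespace
    byte value changes."""
    pairs = []
    prev_byte = None
    prev_offset = -1
    for offset, byte in enumerate(data):
        if byte in WHITESPACE:
            continue
        if prev_byte is not None and byte != prev_byte:
            pairs.append((prev_offset, offset))
        prev_byte, prev_offset = byte, offset
    return pairs


def on_sector_boundary(pair: tuple[int, int], sector: int = 512) -> bool:
    """True if a sector boundary lies between the old and the new content."""
    last_old, first_new = pair
    return first_new // sector > last_old // sector


recovered = b"1" * 512 + b"2" * 512 + b"1" * 100
for pair in transitions(recovered):
    print(pair, on_sector_boundary(pair))
# (511, 512) True
# (1023, 1024) True
```

If most switches land on sector (or cluster) boundaries, that would be consistent with whole sectors from different files being concatenated rather than text being altered within a sector.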

I think we need to know a little more about your methodology and tools.

Rgds

Shep

 
Posted : 19/10/2014 8:21 pm