
Could two seemingly identical PDF files contain different information?

16 Posts
3 Users
9 Reactions
3,270 Views
(@ray-m)
Eminent Member
Joined: 3 years ago
Posts: 18
Topic starter  

@c-r-s 

Thank you very much for the explanation! I appreciate your time and effort a lot.

I have one more question regarding file system metadata: Reading the first paragraph of your latest post, I would assume that sending a PDF file to another device without also transferring the file system it is stored on would cause any file system metadata to be lost (or rather: to remain on the sending device and not be copied to the recipient's device). Is that correct? 

If so, this seems to conflict with something the user @tuckerhst wrote in another topic about PDF metadata (here: https://www.forensicfocus.com/forums/general/pdf-metadata-2/#post-6606330 ). In the fifth post he writes that some file system metadata is usually transferred along with the file (the topic is about sending a PDF file via email to a different device). Do you happen to know what data he could mean by that, and does this really conflict with what you wrote? 

(@c-r-s)
Estimable Member
Joined: 14 years ago
Posts: 170
 

@ray-m To be precise, the name and extension of a file are file system metadata that is usually transferred with the file, e.g. by a mail client. If an application wants to create a new file, it needs to provide this information, so it makes sense for the application to enforce certain extensions and to suggest a name to the user, e.g. the extension that matches the application's native file type, or the name that the email attachment originally had and that was transmitted in the email. As this requirement is very generic, it gives no hint as to which file system the file had been stored on.

Do you mean this sentence?

...and within a strictly controlled closed system, ensure that file transfer services maintain the integrity of the usage log metadata...

I don't think he means here that other file system metadata is "usually" transferred, as he is talking about options for new file systems. At least, that is not the case for email.

Think of it this way: file system metadata is maintained by the file system, not by the user application that accesses a file. Which file system metadata an application can read or change depends entirely on the permissions it runs with. Because this metadata serves the file system's own purposes, e.g. recording when a file was created on that file system, a user-mode application usually has no reason, and often no permission, to change it.

Of course, there are many cases where you need to replicate file system metadata, and you use appropriate tools and permission settings to allow that. For example, if you migrate enterprise storage, you want to keep all the ACLs, time stamps, archive bits etc., and you can use e.g. robocopy to read the metadata from the source file system and set it in the destination file system. However, this is not how files are regularly exchanged between users.
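To make the copying point concrete, here is a minimal Python sketch (Python is just a convenient choice; the file names and timestamp are hypothetical). It backdates a file's modification time, then copies it twice: once with `shutil.copyfile`, which copies only the data, roughly what happens when a mail client saves an attachment, and once with `shutil.copy2`, which also tries to carry over timestamps, in the spirit of (but far less complete than) robocopy's metadata copying.

```python
import os
import shutil
import tempfile

# Hypothetical past timestamp (2001-09-09 UTC) used to backdate the source file.
OLD_MTIME = 1_000_000_000


def copy_mtimes(workdir: str) -> tuple:
    """Return (source mtime, content-only copy mtime, metadata-aware copy mtime)."""
    src = os.path.join(workdir, "report.pdf")  # hypothetical file name
    with open(src, "wb") as f:
        f.write(b"%PDF-1.7 example payload")
    # Backdate the source's access and modification times.
    os.utime(src, (OLD_MTIME, OLD_MTIME))

    plain = os.path.join(workdir, "plain_copy.pdf")
    shutil.copyfile(src, plain)  # copies the data stream only

    aware = os.path.join(workdir, "aware_copy.pdf")
    shutil.copy2(src, aware)  # copies the data plus, where possible, timestamps

    return tuple(int(os.stat(p).st_mtime) for p in (src, plain, aware))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src_m, plain_m, aware_m = copy_mtimes(d)
        print("source:", src_m, "copyfile:", plain_m, "copy2:", aware_m)
```

The content-only copy gets the current time as its modification time, while the `copy2` copy keeps the backdated one. Even `copy2` does not copy owners or ACLs (and on Windows not alternate data streams), which is one reason dedicated migration tools exist.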

(@ray-m)
Eminent Member
Joined: 3 years ago
Posts: 18
Topic starter  

@c-r-s Thanks again!

I actually meant another part of what tuckerhst wrote, but your reply nevertheless covers perfectly what interested me.

 

In the past few days I've read about something else that made me curious: so-called extended file attributes. I haven't found a lot of information about that topic, and some of the things I've read seem contradictory to me. The basic idea of these EFAs seems to be that they allow associating files with metadata that is stored by the file system but not interpreted by it. They are also supposed to work across file systems and operating systems. 

Considering these points, EFAs appear to be a way of associating a piece of information (possibly some sort of ID) with a file without altering that file's content. If they did indeed work across file systems, I would expect them to be preserved when a file is sent across devices, so this method should make it possible to transfer information while not being detectable by simply analyzing the file's content. 

However, from some sources I got the impression that they work slightly differently on different operating systems (or that the term "extended file attribute" might even refer to different concepts), while other sources clearly gave the opposite impression. Also, there doesn't seem to be a way of displaying or editing EFAs on Windows. 

Despite the idea of cross-OS and cross-FS compatibility, I've even read somewhere that simply saving a file on non-NTFS storage or zipping and unzipping it would cause EFAs to be deleted. I did some experimenting with the Linux commands for reading, adding and altering EFAs (getfattr, setfattr) and indeed found that sending a file via email or zipping/unzipping it caused the added EFAs to be lost. While that's in line with what you, @c-r-s, wrote in the second-to-last paragraph of your latest reply about metadata in general, it contradicts how EFAs specifically seem to be supposed to work. 
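The zip experiment described above can be reproduced with a short Python sketch. It assumes Linux (Python's `os.setxattr`/`os.listxattr` are Linux-only) and a file system that accepts attributes in the `user.` namespace; the attribute name and value are hypothetical. Python's `zipfile` module does not store extended attributes, so the extracted copy should come back without the tag. The function returns `None` where xattrs cannot be set at all.

```python
import errno
import os
import tempfile
import zipfile
from typing import Optional

ATTR = "user.example_id"  # hypothetical attribute name in Linux's "user." namespace
VALUE = b"12345"          # hypothetical value


def xattr_survives_zip(workdir: str) -> Optional[bool]:
    """Tag a file with an xattr, zip and unzip it, and report whether the tag survived.

    Returns None if the platform or file system does not support user xattrs.
    """
    if not hasattr(os, "setxattr"):  # the os.*xattr functions exist on Linux only
        return None
    src = os.path.join(workdir, "doc.pdf")
    with open(src, "wb") as f:
        f.write(b"%PDF-1.7 example payload")
    try:
        # Roughly the same as: setfattr -n user.example_id -v 12345 doc.pdf
        os.setxattr(src, ATTR, VALUE)
    except OSError as e:
        if e.errno in (errno.ENOTSUP, errno.EOPNOTSUPP, errno.EPERM):
            return None
        raise
    assert os.getxattr(src, ATTR) == VALUE  # the tag is present on the original

    archive = os.path.join(workdir, "doc.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        # Stores the data stream plus basic info (name, timestamp, mode bits).
        zf.write(src, arcname="doc.pdf")
    out_dir = os.path.join(workdir, "out")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)

    restored = os.path.join(out_dir, "doc.pdf")
    return ATTR in os.listxattr(restored)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("xattr survived zip round trip:", xattr_survives_zip(d))
```

Whether an attribute survives depends on every tool in the chain knowing about it, not on the attribute itself.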

Could someone please explain these extended file attributes to me?

I would be specifically interested in

1. whether the term describes the same concept across different operating systems.

2. whether they would work in the way described above to transmit information via email.

3. how one could display or edit them on Windows.

(@c-r-s)
Estimable Member
Joined: 14 years ago
Posts: 170
 
Posted by: @ray-m

1. whether the term describes the same concept across different operating systems.

I wouldn't say so. They are "extended" in the sense that they add features which aren't required by the file system. That is a difficult categorization, because they still necessarily have to be implemented in the file system, which either supports them or not. Therefore, they are as specific to the file system as any other file system metadata, even though compatibility may exist between different file systems.

Compatibility means that two file systems can hold the same metadata for a file. File transfer software is most likely to make use of that compatibility if both file systems are mounted locally. If you have to go through a network protocol/file system or an application encoding like email, you are facing their specific limitations. You can pass practically any data via any channel if the software on both ends works hand in hand to do so. But a regular email client is designed to encode only the file's data stream and attach it together with information on the original file name and extension.
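What a regular email client transmits can be illustrated with Python's standard `email` package (a sketch; the addresses, subject and file name are placeholders). The attachment part carries the base64-encoded bytes plus a suggested file name in its MIME headers, and nothing else from the sender's file system. RFC 2183 does define optional `creation-date` and `modification-date` parameters for `Content-Disposition`, but a sender has to add them deliberately and the receiving client has to choose to apply them, which is exactly the "software on both ends working hand in hand" case.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import Optional, Tuple


def roundtrip_attachment(data: bytes, filename: str) -> Tuple[str, bytes, Optional[str]]:
    """Attach bytes to an email, serialize it, parse it back, and inspect the attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Report"            # placeholder header values
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg.set_content("See attached.")
    # Only the bytes and a suggested file name are put into the attachment part.
    msg.add_attachment(data, maintype="application", subtype="pdf", filename=filename)

    raw = msg.as_bytes()  # the serialized message, as it would travel over SMTP
    received = BytesParser(policy=policy.default).parsebytes(raw)
    part = next(p for p in received.walk() if p.get_content_disposition() == "attachment")
    # RFC 2183 allows a modification-date parameter, but nothing here sets one.
    mdate = part.get_param("modification-date", header="content-disposition")
    return part.get_filename(), part.get_content(), mdate


if __name__ == "__main__":
    name, payload, mdate = roundtrip_attachment(b"%PDF-1.7 example payload", "report.pdf")
    print(name, len(payload), mdate)
```

The recipient gets the same bytes and the same file name back; any timestamps the saved file ends up with are assigned by the recipient's own file system.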

A good example of EFAs is the Zone.Identifier ADS on NTFS, which is interpreted by Windows Explorer or Office. I consider that a niche application with a purely local function. Wikipedia lists several other purposes of EFAs, which don't seem practical or commonplace to me:
Storing an author? Relevant file types either can store this internal metadata, or, if you create a DMS that needs additional data fields, it is preferable to use a file system-agnostic overlay, e.g. stored in a database, in order to support multiple platforms.
Are there file systems that store checksums for application use? I don't know. But if we are talking about ZFS checksums or ReFS integrity streams, they are not "extended" but interpreted by the file system, because you want the file system itself to never return incorrect data, rather than leaving the checks to each individual application.
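Regarding the Zone.Identifier example (and question 3 above): on Windows, `dir /r` in cmd lists a file's alternate data streams, and in PowerShell `Get-Item -Path file.pdf -Stream *` lists them while `Get-Content -Path file.pdf -Stream Zone.Identifier` prints one. The stream itself is a small INI-style text, so a minimal Python sketch can read and parse it as below. The `file:stream` path syntax only works on NTFS under Windows; elsewhere the reader simply returns `None`. The zone names follow the usual Windows URL security zone numbering.

```python
import configparser
import os
from typing import Optional

# Standard Windows URL security zone numbers.
ZONE_NAMES = {0: "Local machine", 1: "Local intranet", 2: "Trusted sites",
              3: "Internet", 4: "Restricted sites"}


def parse_zone_identifier(text: str) -> Optional[int]:
    """Return the ZoneId from Zone.Identifier stream content, or None if absent/malformed."""
    parser = configparser.ConfigParser(interpolation=None)  # URLs may contain '%'
    try:
        parser.read_string(text.lstrip("\ufeff").replace("\r\n", "\n"))
        return parser.getint("ZoneTransfer", "ZoneId")
    except (configparser.Error, ValueError):
        return None


def read_zone_identifier(path: str) -> Optional[int]:
    """Read the Zone.Identifier ADS of a file; only meaningful on Windows with NTFS."""
    if os.name != "nt":
        return None
    try:
        # "file:stream" addresses a named alternate data stream on NTFS.
        with open(path + ":Zone.Identifier", encoding="utf-8", errors="replace") as f:
            return parse_zone_identifier(f.read())
    except OSError:
        return None  # no such stream, or the volume does not support ADS


if __name__ == "__main__":
    sample = "[ZoneTransfer]\r\nZoneId=3\r\nHostUrl=https://example.com/report.pdf\r\n"
    zone = parse_zone_identifier(sample)
    print(zone, ZONE_NAMES.get(zone))
```

Note that the stream lives next to the file's primary data stream, so it is lost as soon as the file leaves NTFS via a channel that only carries the data.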

(@ray-m)
Eminent Member
Joined: 3 years ago
Posts: 18
Topic starter  

@c-r-s 

Thank you! Your replies have been very helpful for me.

(@ray-m)
Eminent Member
Joined: 3 years ago
Posts: 18
Topic starter  

I actually have one last question regarding metadata in general. I feel like you, @c-r-s, already answered it, but I'd like to make sure that I got everything right: In one of your replies above you distinguished the "payload" / primary data stream of a file from (depending on the file system) possible, but not necessarily existing, alternate data streams and from file system metadata. If I understand it correctly, this primary data stream is the part of the file which contains/constitutes the actual file content, and every change to this part would be apparent in a byte-by-byte comparison (and in the case of a PDF file, this would be the part you see when opening the file raw, e.g. in a text or hex editor?). Furthermore, this is the part of the file which is (usually) hashed, and it is also the part which is transferred when the file is sent as an email attachment.

So, in sum, every piece of metadata (or data in general, for that matter) which isn't file system metadata and isn't stored in something like an alternate data stream would 1. be included in a byte-by-byte comparison and 2. be expected to be covered by the hash. Is this correct?

The reason I'm asking this is that in another topic about PDF metadata it has been explained to me that PDF metadata is stored inside the file and affects the hash, as long as it isn't file system metadata. I wasn't sure if this only applies specifically to PDF metadata or to metadata in general. After reading and thinking about your replies (and the replies of the other helpful users) I suspect the latter to be true, but would like to be sure.
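The distinction can be checked with a small Python sketch. It hashes only what `open()` returns, i.e. the primary data stream, then changes file system metadata (timestamps via `os.utime` and the name via `os.rename`) and finally edits an `/Author` string stored inside the file. The byte string is a hypothetical PDF-like fragment, not a complete valid PDF; with a real PDF you would normally change such fields with a PDF tool rather than by byte replacement.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    """Hash only the primary data stream, i.e. the bytes returned by open()."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def demo(workdir: str) -> dict:
    path = os.path.join(workdir, "a.pdf")
    # Hypothetical PDF-like fragment with an in-file Info dictionary entry.
    with open(path, "wb") as f:
        f.write(b"%PDF-1.7\n1 0 obj << /Author (Alice) >> endobj\n")
    hashes = {"original": sha256_of(path)}

    # 1. Change file system metadata only: timestamps and the file name.
    os.utime(path, (1_000_000_000, 1_000_000_000))
    renamed = os.path.join(workdir, "renamed.pdf")
    os.rename(path, renamed)
    hashes["fs_metadata_changed"] = sha256_of(renamed)

    # 2. Change metadata stored inside the file: the /Author string.
    with open(renamed, "r+b") as f:
        data = f.read().replace(b"(Alice)", b"(Bob)")
        f.seek(0)
        f.write(data)
        f.truncate()
    hashes["in_file_metadata_changed"] = sha256_of(renamed)
    return hashes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for label, digest in demo(d).items():
            print(label, digest)
```

The first two digests should match and the third should differ. On NTFS, an alternate data stream such as Zone.Identifier would not be read by `sha256_of` either, since `open()` on a plain path reads only the unnamed data stream.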
