Recovery of SQLite deleted records

 p4c0
(@p4c0)
Active Member
Joined: 17 years ago
Posts: 9
Topic starter  

Hi all,

I'm trying to understand how to extract/recover deleted records from a sqlite3 db file. I've read some posts on the forum (and the related external articles they point to), especially those about the WAL mechanism and the "vacuum feature".
My questions/doubts are the following:

- I had the following output using sqlite3_analyzer. Is it correct that "Pages of auto-vacuum overhead 1" means that there is 1 page of old entries (marked as deleted) to be "vacuumed" but still in the db? Since the content of this page is ignored by the normal sqlite engine, how do I extract that content?
Page size in bytes.................... 1024
Pages in the whole file (measured).... 10
Pages in the whole file (calculated).. 10
Pages that store data................. 9 90.0%
Pages on the freelist (per header).... 0 0.0%
Pages on the freelist (calculated).... 0 0.0%
Pages of auto-vacuum overhead......... 1 10.0%
Number of tables in the database...... 8
Number of indices..................... 0
Number of named indices............... 0
Automatically generated indices....... 0
Size of the file in bytes............. 10240
Bytes of user payload stored.......... 522 5.1%

- If, in the same folder as the .db file, I have the corresponding (non-empty) WAL file, should opening the .db without the WAL show me the tables' state before the last commit, which is still in the WAL file? Is that correct?

- what exactly is the role (and the content) of the corresponding SHM file in all this?

Thank you very much.

P4c0


   
(@alexc)
Reputable Member
Joined: 16 years ago
Posts: 301
 

> Hi all,
>
> - I had the following output using sqlite3_analyzer. Is it correct that "Pages of auto-vacuum overhead 1" means that there is 1 page of old entries (marked as deleted) to be "vacuumed" but still in the db? Since the content of this page is ignored by the normal sqlite engine, how do I extract that content?

The overhead refers to the fact that in order for autovacuum to operate, there must be one or more pages designated as Pointer Map pages. The overhead in this case tells you that you have 1 pointer map page present. There are no pages in the freelist, so no, there are no pages left which are awaiting vacuuming.
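The fields sqlite3_analyzer reports here come straight out of the 100-byte file header. As a minimal sketch (the demo database, path and table name are made up for illustration), this is how you can read the relevant header fields yourself, including the offset-52 value whose being non-zero is what makes the database "auto-vacuum capable" and costs you the pointer-map page:

```python
import os
import sqlite3
import struct
import tempfile

# Build a small throwaway database (path and table name are made up for
# the demo) so the header parse below has something to read.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("PRAGMA auto_vacuum = FULL")   # must be set before the first table
con.execute("CREATE TABLE t(x)")
con.commit()
con.close()

# The first 100 bytes of any SQLite database file are the header.
with open(path, "rb") as f:
    header = f.read(100)

page_size      = struct.unpack(">H", header[16:18])[0]
first_freelist = struct.unpack(">I", header[32:36])[0]   # first freelist trunk page
freelist_count = struct.unpack(">I", header[36:40])[0]   # total freelist pages
largest_root   = struct.unpack(">I", header[52:56])[0]   # non-zero => auto-vacuum on

print(page_size, first_freelist, freelist_count, largest_root != 0)
```

On a freshly created database like this one, the freelist fields are both zero, matching the analyzer output above.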

> - If, in the same folder as the .db file, I have the corresponding (non-empty) WAL file, should opening the .db without the WAL show me the tables' state before the last commit, which is still in the WAL file? Is that correct?

Since the last WAL CHECKPOINT. You can commit a transaction and still have all of the data residing in the WAL.
I wrote a blog post a while back which I think answers this question in a fair bit of detail. You can find it here.
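The commit-versus-checkpoint distinction is easy to see from Python (a small sketch; the path and table name are invented for the demo): a committed row can live only in the `-wal` file until a checkpoint copies its pages back into the main database.

```python
import os
import sqlite3
import tempfile

# Demo: a committed row lives in the -wal file until a checkpoint
# copies it back into the main database (names are made up).
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("PRAGMA journal_mode = WAL")
con.execute("CREATE TABLE msgs(body TEXT)")
con.execute("INSERT INTO msgs VALUES ('hello wal')")
con.commit()                                   # committed, but still in the WAL

wal_before = os.path.getsize(path + "-wal")    # non-empty: holds the new pages

con.execute("PRAGMA wal_checkpoint(TRUNCATE)") # force a checkpoint
wal_after = os.path.getsize(path + "-wal")     # truncated back to zero bytes
con.close()

print(wal_before, wal_after)
```

Before the checkpoint, opening only the main .db file would miss the committed row entirely, which is exactly the forensic point above.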

> - what exactly is the role (and the content) of the corresponding SHM file in all this?
>
> Thank you very much.
>
> P4c0

The SHM file is an index for the WAL. The way the WAL file is structured means you have to read the whole thing to work out what's going on, the SHM basically gives the database engine the offsets of the most recent versions of each page in the WAL. That said, it doesn't NEED it to be present for WAL to work, it just speeds things up. If you delete the SHM file it'll just get rebuilt next time you access the database.
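To make that concrete, here is a sketch of the index the SHM caches, rebuilt by walking the WAL frames directly (demo names are invented; a real tool should also validate each frame's salt and checksum against the WAL header before trusting it): each 24-byte frame header carries the database page number, and later frames supersede earlier ones.

```python
import os
import sqlite3
import struct
import tempfile

# Build a WAL-mode demo db (names made up) and keep the connection open:
# closing the last connection would checkpoint and delete the -wal file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("PRAGMA journal_mode = WAL")
con.execute("CREATE TABLE t(x)")
con.execute("INSERT INTO t VALUES (1)")
con.commit()

with open(path + "-wal", "rb") as f:
    wal = f.read()

page_size = struct.unpack(">I", wal[8:12])[0]  # WAL header: magic, version, page size, ...
latest = {}                                    # page number -> offset of its freshest copy
off, frame_size = 32, 24 + page_size           # 32-byte WAL header, 24-byte frame headers
while off + frame_size <= len(wal):
    pgno = struct.unpack(">I", wal[off:off + 4])[0]
    latest[pgno] = off + 24                    # later frames supersede earlier ones
    off += frame_size
con.close()

print(sorted(latest))
```

The `latest` map is essentially what the SHM lets the engine skip computing on every read.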

If you don't have it already, this link http://sqlite.org/fileformat2.html is worth perusing. It explains a lot of what goes on behind the scenes in SQLite.

Hopefully that helps. Let me know if you have any other questions and I'll try my best to assist.


   
 p4c0
(@p4c0)
Active Member
Joined: 17 years ago
Posts: 9
Topic starter  

Hi Alex,

Thanks for your reply. I had read your post (it's the one I was referring to, actually); it's indeed very valuable. Thanks.
But I'm still missing something.

> Hi all,
>
> - I had the following output using sqlite3_analyzer. Is it correct that "Pages of auto-vacuum overhead 1" means that there is 1 page of old entries (marked as deleted) to be "vacuumed" but still in the db? Since the content of this page is ignored by the normal sqlite engine, how do I extract that content?
>
> The overhead refers to the fact that in order for autovacuum to operate, there must be one or more pages designated as Pointer Map pages. The overhead in this case tells you that you have 1 pointer map page present. There are no pages in the freelist, so no, there are no pages left which are awaiting vacuuming.

Ok, so can you help me understand this? Looking at the following output of sqlite3_analyzer, this is a db before issuing the VACUUM command:
Page size in bytes.................... 1024
Pages in the whole file (measured).... 12
Pages in the whole file (calculated).. 12
Pages that store data................. 11 91.7%
Pages on the freelist (per header).... 0 0.0%
Pages on the freelist (calculated).... 0 0.0%
Pages of auto-vacuum overhead......... 1 8.3%
Number of tables in the database...... 6
Number of indices..................... 3
Number of named indices............... 0
Automatically generated indices....... 3
Size of the file in bytes............. 12288
Bytes of user payload stored.......... 7 0.057%

and this is after:
Page size in bytes.................... 1024
Pages in the whole file (measured).... 10
Pages in the whole file (calculated).. 10
Pages that store data................. 9 90.0%
Pages on the freelist (per header).... 0 0.0%
Pages on the freelist (calculated).... 0 0.0%
Pages of auto-vacuum overhead......... 1 10.0%
Number of tables in the database...... 6
Number of indices..................... 3
Number of named indices............... 0
Automatically generated indices....... 3
Size of the file in bytes............. 10240
Bytes of user payload stored.......... 7 0.068%

The size of the db file shrank from 12K to 10K, which tells me that some data has actually been "vacuumed" from the db, am I right? If so, given that even in the first case there were no pages on the freelist, how am I supposed to tell whether there is still hidden data to be vacuumed out? Only by issuing the command and checking the db file size?


   
(@alexc)
Reputable Member
Joined: 16 years ago
Posts: 301
 

The routine executed by an Autovacuum operation is NOT the same as issuing a VACUUM command.

http://www.sqlite.org/lang_vacuum.html

In short, Autovacuum removes free pages, while a VACUUM command actually consolidates data which may be spread across multiple pages.

For example, imagine a situation where there were 4 pages in the database which were formerly full of live records. You do a bunch of deletions, so now they only have 1 live record each. Autovacuum would not do anything to these pages - they are not "free pages" as long as at least one live record is still present. Issuing VACUUM, on the other hand, would take each of these remaining 4 live records, put them on a new page together and jettison the 4 old pages (assuming those records were all from the same table, of course - SQLite won't put records from different tables on the same page).
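That distinction can be reproduced in a few lines (a sketch; the path, table and row sizes are made up for the demo): deleting most rows puts completely emptied pages on the freelist, while pages keeping even one live record stay allocated until VACUUM rebuilds the file.

```python
import os
import sqlite3
import tempfile

# Demo of autovacuum-vs-VACUUM (all names made up).
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("PRAGMA page_size = 1024")
con.execute("CREATE TABLE t(x TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [("A" * 200,) for _ in range(200)])
con.commit()
pages_before = con.execute("PRAGMA page_count").fetchone()[0]

con.execute("DELETE FROM t WHERE rowid % 10 != 0")   # keep roughly 1 row in 10
con.commit()
freelist = con.execute("PRAGMA freelist_count").fetchone()[0]

con.execute("VACUUM")        # consolidates the surviving rows onto fresh pages
pages_after = con.execute("PRAGMA page_count").fetchone()[0]
con.close()

print(pages_before, freelist, pages_after)
```

After the deletions, `freelist_count` is non-zero (fully emptied pages) but the file has not shrunk; only the VACUUM drops `page_count`, just as in the analyzer outputs quoted above.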

Does that make sense?


   
 p4c0
(@p4c0)
Active Member
Joined: 17 years ago
Posts: 9
Topic starter  

Ok thanks, that makes sense :)

To summarize (hopefully I got it right :))
1 - A freepage is a page with zero live records;
2 - If the autovacuum/checkpoint hasn't taken place yet (for whatever reason), I may find freepages in the output of sqlite3_analyzer;
3 - These freepages may contain old records which I won't see when querying the db normally, because they're invisible to the db engine;
4 - If there are no freepages but the VACUUM command shrinks the db, it may mean there were "dead records" in the db (as in your example: only 1 live record on each page of the db).

If all the above is (hopefully) correct:
- If I find freepages, how do I extract them from the db?
- In the case of point 4, would it be enough to do
diff file_name.db file_name_vacuumed.db
to get out the eventual (probably unstructured) dead records?

Thanks


   
(@alexc)
Reputable Member
Joined: 16 years ago
Posts: 301
 

That… might work if it was a binary diff… I wouldn't want to deal with the output though, not by hand at least :). Also, you can have deleted records on non-free pages (and this approach would miss those).

If you want to output the freelist, the best thing to do is to traverse it directly. The structure of the freelist is detailed at the file format link and is a pretty straightforward linked list (you'll need to read a few values out of the header as well, i.e. the page size and the page number of the first freelist trunk page).
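A minimal sketch of that traversal, following the trunk/leaf layout from the file format document (the demo database at the end, and all its names, are invented purely to check the walk against `PRAGMA freelist_count`):

```python
import os
import sqlite3
import struct
import tempfile

def freelist_pages(path):
    """Walk the freelist chain described in the SQLite file format doc.

    Header: offset 16 = page size, offset 32 = first freelist trunk page,
    offset 36 = total freelist page count. Each trunk page starts with the
    next trunk's page number (4 bytes) and a count K (4 bytes), followed
    by K 4-byte leaf page numbers.
    """
    pages = []
    with open(path, "rb") as f:
        header = f.read(100)
        page_size = struct.unpack(">H", header[16:18])[0]
        if page_size == 1:                      # 1 encodes a 65536-byte page
            page_size = 65536
        trunk = struct.unpack(">I", header[32:36])[0]
        while trunk:                            # a zero pointer ends the chain
            pages.append(trunk)
            f.seek((trunk - 1) * page_size)     # page numbers are 1-based
            page = f.read(page_size)
            next_trunk, k = struct.unpack(">II", page[:8])
            pages.extend(struct.unpack(">%dI" % k, page[8:8 + 4 * k]))
            trunk = next_trunk
    return pages

# Quick check against PRAGMA freelist_count on a throwaway db:
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t(x TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [("B" * 300,) for _ in range(100)])
con.commit()
con.execute("DELETE FROM t")                    # frees the emptied pages
con.commit()
expected = con.execute("PRAGMA freelist_count").fetchone()[0]
con.close()

print(expected, len(freelist_pages(path)))
```

Once you have the page numbers, dumping each page's raw bytes for carving is just a seek and a read.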

That being said (and I've tried really hard to stay away from anything approaching advertising because it makes me feel dirty), have you had a look at epilog? I'm obviously biased (as I wrote it) but I think it's pretty comprehensive when it comes to recovering SQLite records and will be far less painful! But if this is more of a learning exercise then obviously I don't want to dissuade you from digging deeper yourself!


   
 p4c0
(@p4c0)
Active Member
Joined: 17 years ago
Posts: 9
Topic starter  

Ahah, don't worry… after all the helpful information you've provided in all the SQLite-related posts, I don't think anyone could (or should dare to) say anything if you mention epilog. Which, btw, I had checked out of course, because it seems to be one of the few useful tools for this. Is there a trial version? I don't think I saw one mentioned on the website, but it would be useful to try.

In the meantime, I'll try to write two Python "exercise" scripts:
- one to carve "dead records" out of active pages;
- another to go through the db file and extract eventual freepages.

Not sure how long it will take, but it sounds like a nice exercise :). If successful, I'll post the code here.

One more question/curiosity (sorry if I'm flooding you, I'm new to db forensics): what about dbs that still use the "rollback journal"? I've found many of these in Android apps. Is there any potentially interesting "recovery trick" even if the file_name.db-journal size is zero?


   
(@alexc)
Reputable Member
Joined: 16 years ago
Posts: 301
 

Old rollback journals. If they are 0 bytes… well, if you have a physical image of the storage media you could take a look around the area where the truncated file begins, but given the way YAFFS2 works… that probably wouldn't be very useful.

That said, Epilog can do a search for records against an arbitrary block of data (like a flash dump/image) and you could recover the deleted records that way - we've done that a lot of times on Android with a lot of success.
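As a very rough illustration of the "arbitrary block of data" idea (this is not how Epilog works, just a cheap first pass anyone can write): scan a binary blob such as a flash dump for the 16-byte SQLite header magic to locate candidate embedded databases before attempting record carving proper.

```python
# Everything here is illustrative: a fake "dump" and a naive scan for the
# SQLite header magic string that starts every database file.
MAGIC = b"SQLite format 3\x00"

def find_sqlite_headers(blob):
    hits, pos = [], blob.find(MAGIC)
    while pos != -1:
        hits.append(pos)
        pos = blob.find(MAGIC, pos + 1)
    return hits

dump = b"\x00" * 100 + MAGIC + b"\xff" * 500 + MAGIC   # fake dump, two headers
print(find_sqlite_headers(dump))
```

Carving individual records (rather than whole databases) out of such a dump means recognising cell and serial-type patterns instead of the header magic, which is considerably harder.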

If you'd like a trial version, drop epilog@ccl-forensics.com a line and we'll see what we can do :)


   
(@alexc)
Reputable Member
Joined: 16 years ago
Posts: 301
 

Incidentally, Epilog started out life as a Python script (a BIG one, I don't think I've approached anything the same size since - though that might just be because I'm a better programmer now!) so I wish you the best of luck on your Pythonic Journey!


   