Hi,
How can I tell where a TEXT or BLOB field ends in a sqlite3 record?
I have a database with the following field sequence:
(…)
- data TEXT
- timestamp INTEGER (4 bytes)
(…)
The TEXT field seems to have a variable length.
After checking with a hex editor, I could not see any separator character between the "data" field and the "timestamp" field.
So, where is the length of each TEXT (or BLOB) field stored for each record?
The fact is that I'm trying to parse some corrupted databases where the header was damaged.
Other databases using the same structure are OK, but it seems that using their header would not suffice, since sqlite3 databases are divided into pages.
Thanks for your advice.
The sqlite database stores the length of a text field in the page header. Basically the type and length of each record in each row is stored at the start of each page and the records (the data) are stored towards the end of each page. So to determine the length of a text value (or an integer as sqlite uses variable length integers) you need to start at the beginning of each page and then work through each record in turn.
The information you require is here (mainly sections 1.5 and 2.1)
http//
I have a program on my web site which can recover corrupt sqlite databases here
http//
Quote: "The sqlite database stores the length of a text field in the page header."
I'm not sure this is quite right… Isn't it the case that the database stores, in the page header, the offset of each "database cell" on the page (each cell containing one record)?
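To make the page-header point concrete, here is a minimal Python sketch that reads the cell pointer array of one b-tree page. It assumes the layout from the file format doc (page-type byte at offset 0, cell count at offsets 3-4, an 8-byte header on leaf pages and 12 bytes on interior pages); the name read_cell_pointers is made up for illustration. On page 1 the page header starts after the 100-byte database header, hence the header_offset parameter.

```python
import struct

# Page-type byte at the start of every b-tree page header.
PAGE_TYPES = {
    0x02: "interior index",
    0x05: "interior table",
    0x0A: "leaf index",
    0x0D: "leaf table",
}

def read_cell_pointers(page: bytes, header_offset: int = 0):
    """Return (page_type, cell_offsets) for one b-tree page.

    header_offset is 100 for page 1 (the database header comes first)
    and 0 for every other page.
    """
    page_type = page[header_offset]
    if page_type not in PAGE_TYPES:
        raise ValueError(f"not a b-tree page: type byte {page_type:#x}")
    # Bytes 3-4 of the page header: number of cells (big-endian).
    (num_cells,) = struct.unpack_from(">H", page, header_offset + 3)
    # Interior pages carry an extra 4-byte right-child pointer.
    header_size = 12 if page_type in (0x02, 0x05) else 8
    # Cell pointer array: one 2-byte big-endian offset per cell,
    # measured from the start of the page.
    offsets = struct.unpack_from(f">{num_cells}H", page, header_offset + header_size)
    return PAGE_TYPES[page_type], list(offsets)
```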
The database cells themselves (on a table leaf page) are made up of 3 parts:
- A Payload Length value (a variable length integer giving the length in bytes of the record)
- A ROWID (a variable length integer which is a unique numerical value for each record in a table)
- The record (the length of which is given in Payload Length value)
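Both the Payload Length and the ROWID are varints, so a parser has to decode those first. A minimal sketch, assuming a table leaf cell whose record fits on the page (a record that spills onto overflow pages is followed by a 4-byte overflow page number, which is not handled here); read_varint and read_table_leaf_cell are hypothetical helper names:

```python
def read_varint(buf: bytes, pos: int):
    """Decode an SQLite varint at buf[pos]; return (value, next_pos).

    Up to 8 bytes contribute their low 7 bits while the high bit is set;
    a 9th byte, if reached, contributes all 8 bits.
    """
    value = 0
    for i in range(8):
        byte = buf[pos + i]
        value = (value << 7) | (byte & 0x7F)
        if byte < 0x80:
            return value, pos + i + 1
    return (value << 8) | buf[pos + 8], pos + 9

def read_table_leaf_cell(page: bytes, offset: int):
    """Split a table leaf cell into (payload_length, rowid, payload_start).

    Assumes no overflow pages, as noted above.
    """
    payload_len, pos = read_varint(page, offset)
    rowid, pos = read_varint(page, pos)
    if rowid >= 1 << 63:  # rowids are signed 64-bit integers
        rowid -= 1 << 64
    return payload_len, rowid, pos
```

For example, a cell starting 05 03 followed by "hello" gives a 5-byte record for rowid 3, starting 2 bytes into the cell.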
The record itself is made up of a record header and a record body.
The record header contains a variable length integer giving the length in bytes of the header (including this value). This is followed by an array of variable length integers called "serial type codes" (their meanings are defined in the link that Paul posted). These values give you the type and length of each of the values in the record body - which follows the header directly.
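A sketch of reading the record header described above, using the OP's layout (a TEXT column followed by a 4-byte INTEGER) as the test case. read_record_header is a hypothetical name, and the varint decoder is repeated so the snippet stands alone:

```python
def read_varint(buf: bytes, pos: int):
    """Decode an SQLite varint at buf[pos]; return (value, next_pos)."""
    value = 0
    for i in range(8):
        byte = buf[pos + i]
        value = (value << 7) | (byte & 0x7F)
        if byte < 0x80:
            return value, pos + i + 1
    return (value << 8) | buf[pos + 8], pos + 9

def read_record_header(record: bytes):
    """Return (serial_types, body_start) for one record.

    The first varint is the header size in bytes, counting itself;
    the rest of the header is one serial-type varint per column.
    """
    header_size, pos = read_varint(record, 0)
    serial_types = []
    while pos < header_size:
        serial_type, pos = read_varint(record, pos)
        serial_types.append(serial_type)
    # The body starts right where the header ends.
    return serial_types, header_size
```

For a record whose header bytes are 03 15 04, this gives serial types [21, 4] with the body starting at byte 3: a 4-byte string, then a 4-byte integer, with nothing in between.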
In the case of strings, the serial type code will be odd and at least 13. To get the length of the string in bytes, take this value, subtract 13, and divide what's left by 2. So if you have a 4-byte string, the serial type code will be 21 (0x15).
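The arithmetic both ways, as a tiny Python sketch (the function names are illustrative only):

```python
def text_length(serial_type: int) -> int:
    """Byte length of a TEXT value from its serial type (odd, >= 13)."""
    if serial_type < 13 or serial_type % 2 == 0:
        raise ValueError(f"serial type {serial_type} is not TEXT")
    return (serial_type - 13) // 2

def text_serial_type(length: int) -> int:
    """Inverse: the serial type stored for a TEXT value of `length` bytes."""
    return 2 * length + 13
```

So text_serial_type(4) is 21 and text_length(21) is 4; serial type 13 is the empty string.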
The lengths are not stored explicitly; the stored type of a field determines its length.
From the sqlite file format document:
Serial Type | Content Size | Meaning
0 | 0 | NULL
1 | 1 | 8-bit twos-complement integer
2 | 2 | Big-endian 16-bit twos-complement integer
3 | 3 | Big-endian 24-bit twos-complement integer
4 | 4 | Big-endian 32-bit twos-complement integer
5 | 6 | Big-endian 48-bit twos-complement integer
6 | 8 | Big-endian 64-bit twos-complement integer
7 | 8 | Big-endian IEEE 754-2008 64-bit floating point number
8 | 0 | Integer constant 0. Only available for schema format 4 and higher.
9 | 0 | Integer constant 1. Only available for schema format 4 and higher.
10, 11 | n/a | Not used. Reserved for expansion.
N ≥ 12 and even | (N-12)/2 | A BLOB that is (N-12)/2 bytes in length
N ≥ 13 and odd | (N-13)/2 | A string in the database encoding and (N-13)/2 bytes in length. The nul terminator is omitted.
So a type of 0 is NULL and therefore has no corresponding stored data,
type 3 is a 24-bit integer so there are 3 bytes of stored data,
and for a string (as per the OP's question) of, say, type 55 (an odd type ≥ 13 is a string), the size is (55 - 13)/2 = 42/2 = 21 bytes.
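The whole table can be folded into one function that says how many body bytes a value occupies. This is a sketch based only on the table above; content_size is a hypothetical name:

```python
# Fixed content sizes for serial types 0-9, from the table above.
FIXED_SIZES = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 6, 6: 8, 7: 8, 8: 0, 9: 0}

def content_size(serial_type: int) -> int:
    """Number of record-body bytes used by a value of this serial type."""
    if serial_type in FIXED_SIZES:
        return FIXED_SIZES[serial_type]
    if serial_type in (10, 11):
        raise ValueError("serial types 10 and 11 are reserved")
    if serial_type % 2 == 0:  # N >= 12 and even: BLOB
        return (serial_type - 12) // 2
    return (serial_type - 13) // 2  # N >= 13 and odd: TEXT
```

For types 0, 3 and 55 this returns 0, 3 and 21, matching the examples.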
This is the danger of not posting a clear and concise reply. My intention was to get the OP to read the format doc, not to do the work for him, so I typed up a quick reply; apologies if it wasn't as clear as it could be.
My post should have said: the database stores the offset of each record after the page header. Each record has a field definition array (storing the lengths, implicitly), and this array is followed by the data.
With reference to the OP's question, there are no separators. To get a field length you need to start at the page header to determine where each record lies, then read the record header to determine the type of each field. Each field is stored in order immediately following the record header.
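Putting those steps together, here is a rough Python sketch that decodes every row on one table leaf page. Assumptions: the database text encoding is UTF-8, no record spills onto overflow pages, and only table leaf pages (type byte 0x0D) are handled. walk_table_leaf and the other helpers are hypothetical names, not part of any library.

```python
import struct

def read_varint(buf, pos):
    """Decode an SQLite varint at buf[pos]; return (value, next_pos)."""
    value = 0
    for i in range(8):
        byte = buf[pos + i]
        value = (value << 7) | (byte & 0x7F)
        if byte < 0x80:
            return value, pos + i + 1
    return (value << 8) | buf[pos + 8], pos + 9

def content_size(serial_type):
    """Body bytes used by one value, per the serial type table."""
    sizes = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 6, 6: 8, 7: 8, 8: 0, 9: 0}
    if serial_type in sizes:
        return sizes[serial_type]
    if serial_type in (10, 11):
        raise ValueError("reserved serial type")
    if serial_type % 2 == 0:
        return (serial_type - 12) // 2
    return (serial_type - 13) // 2

def decode_value(serial_type, data):
    """Turn the raw body bytes for one column into a Python value."""
    if serial_type == 0:
        return None
    if 1 <= serial_type <= 6:
        return int.from_bytes(data, "big", signed=True)
    if serial_type == 7:
        return struct.unpack(">d", data)[0]
    if serial_type in (8, 9):
        return serial_type - 8  # integer constants 0 and 1
    if serial_type % 2 == 0:
        return bytes(data)  # BLOB
    return data.decode("utf-8")  # TEXT, assuming a UTF-8 database

def walk_table_leaf(page, header_offset=0):
    """Yield (rowid, values) for each cell on one table leaf page."""
    if page[header_offset] != 0x0D:
        raise ValueError("not a table leaf page")
    (num_cells,) = struct.unpack_from(">H", page, header_offset + 3)
    pointers = struct.unpack_from(f">{num_cells}H", page, header_offset + 8)
    for cell_offset in pointers:
        # Cell: payload length, rowid, then the record itself.
        payload_len, pos = read_varint(page, cell_offset)
        rowid, pos = read_varint(page, pos)
        if rowid >= 1 << 63:
            rowid -= 1 << 64
        record = page[pos:pos + payload_len]  # assumes no overflow pages
        # Record header: its own size, then one serial type per column.
        header_size, hpos = read_varint(record, 0)
        serial_types = []
        while hpos < header_size:
            serial_type, hpos = read_varint(record, hpos)
            serial_types.append(serial_type)
        # Record body: values packed back to back, no separators.
        values, bpos = [], header_size
        for serial_type in serial_types:
            size = content_size(serial_type)
            values.append(decode_value(serial_type, record[bpos:bpos + size]))
            bpos += size
        yield rowid, values
```

To try it against a healthy file, slice page n out of the raw bytes as raw[(n - 1) * page_size : n * page_size], passing header_offset=100 only for page 1. On a damaged file, the same loop can be run page by page, skipping pages whose first byte is not 0x0D.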
That's probably more info than I wanted to supply here, because it just regurgitates the info in the file format doc at the link supplied above, which I find very concise and which is where the OP needs to go if he wants to decode by hand.
IMHO there are only slightly different degrees among:
- giving a man a fish
- directing a man to the hunting and fishing section of the public library
- teaching a man how to fish
jaclaz