
NAND flash disk ECC

42 Posts
6 Users
0 Reactions
5,952 Views
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

Ah, ok, now it's clear: you artificially divided each of the two 4 GB chip dumps into two more manageable 2 GB sized files.

Let's go on

After this action, I was able to recognize the structure:
16x (512 data bytes / 24 ECC bytes) and 64 bytes of management data.

For each 8 KB of data, 512 bytes of spare would be saved.
…..

For the pure ECC and management data, only 448 bytes would be needed. I filled up the remainder with my own control data like NAND no., NAND block no., NAND page no., and other variables.

Here I am losing you again.
An 8 KB size is made of 16*512=8,192 bytes of "all data" (please, "data", not "dates" wink).
Then there is a whole sector of ECC, and that would be:
16*24=384 bytes of "ECC checksums"
1*64=64 bytes of "management data" ?
1*64=64 bytes of "unused data" ? (all 00's?)

So, to recap, the minimum "chunk" you can get is 16 KB of data, i.e. 16,384 bytes of data + 2 sectors (1,024 bytes) for the corresponding ECC and "management data".

For better handling, I separated the dumps again into a data file and a spare file.

Ok.

This also let me establish the correspondence of data pages to the spare file.
For example:
0x800 in the spare file corresponds to data start address 0x8000.

Let's see if I get this right:
First "chunk" of the "all data only" main file is at offset 0, and the corresponding first chunk in the "ECC only" file is at offset 0.
Second "chunk" of the "all data only" main file is at offset 8,192 (0x2000), and the corresponding second chunk in the "ECC only" file is at offset 512 (0x200).
Third "chunk" of the "all data only" main file is at offset 16,384 (0x4000), and the corresponding third chunk in the "ECC only" file is at offset 1,024 (0x400).
Etc.
Is this correct?
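If that mapping is right, the correspondence is just a fixed 16:1 ratio (8,192 data bytes per 512 spare bytes). A minimal C sketch of the offset translation, assuming the layout above (names and constants are illustrative, not from either poster's tools):

```c
#include <stdint.h>

/* Hypothetical helper, assuming the split described above:
 * each 8,192-byte chunk in the data file pairs with a
 * 512-byte chunk at the same index in the spare file. */
#define DATA_CHUNK_SIZE  8192u
#define SPARE_CHUNK_SIZE 512u

static uint64_t spare_offset(uint64_t data_offset)
{
    /* chunk index in the data file, scaled to spare-file units */
    return (data_offset / DATA_CHUNK_SIZE) * SPARE_CHUNK_SIZE;
}
```

So data offset 0x8000 would map to spare offset 0x800, matching the example quoted above.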

jaclaz


   
(@ka8712)
Active Member
Joined: 15 years ago
Posts: 11
Topic starter  

(deleted)


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

Posted Sat May 25, 2013 2:39 am
If I understand you right, your question is about the organization of the data. Well, it seems that my USB stick was probably organized in a bit different way, but I can tell you what I did…

First I unsoldered the 2 flash chips of the 8 GB stick and downloaded the data sheet from the manufacturer. It says the following:

- each chip has 2 planes
- each plane has 4096 blocks
- each block has 128 pages
- each page has 4314 bytes

=> I dumped each chip completely and got two files, each one with 2x4096x128x4314 bytes = 4,523,556,864 bytes. Next question: how is the data spread around these 2 files?

For this I did a search for human-readable ASCII strings, and with some luck I found some fragments of file names. It turned out that one chip is responsible for all even byte positions and one for all odd byte positions. For example, in one chip I found "atrn" and in the other one I found "uou " at the same byte position -> combining these byte by byte, alternating, gives two possibilities, "uaotur n" or "autorun". Now it was easy to combine these two files into one big file with 9,047,113,728 bytes.
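The recombination step described above can be sketched as a plain byte interleave (an illustration, not ka8712's actual tool):

```c
#include <stddef.h>
#include <stdint.h>

/* Merge two chip dumps into one stream: 'even' holds bytes 0,2,4,...
 * of the original data, 'odd' holds bytes 1,3,5,... */
static void interleave(const uint8_t *even, const uint8_t *odd,
                       uint8_t *out, size_t n_pairs)
{
    for (size_t i = 0; i < n_pairs; i++) {
        out[2 * i]     = even[i];  /* byte from the "even positions" chip */
        out[2 * i + 1] = odd[i];   /* byte from the "odd positions" chip  */
    }
}
```

Applied to "atrn" and "uou " this yields "autorun "; swapping the two inputs produces the garbled "uaotur n", which is exactly how the correct chip order can be confirmed.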

The next step now was to look inside a "page": how is the data organized within a page? The smallest unit that can be read or programmed independently in a NAND flash is a "page", which has 2x4314=8628 bytes (2 chips combined) in my case. By doing a string search again I found some directory structures (the USB stick had a FAT32 file system on it). But the structures of the directory were "distorted": there were always 512 bytes, followed by 24 bytes of "something", and that repeated 16 times per page. Conclusion:

=> each page contains 16 sectors
=> after each sector we have 24 bytes, possibly ECC data

16 sectors with each one having 512+24 bytes = 536 x 16 = 8576 bytes, but the page has 8628 bytes, so what is inside these 52 remaining bytes? I dumped all of them and saw that most of them are unused, except the first few ones.
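The deduced layout can be captured in a few constants; a sketch under ka8712's numbers (16 sectors of 512+24 bytes plus a 52-byte remainder; names are illustrative):

```c
/* Combined-page geometry as deduced above (two chips interleaved). */
#define SECTORS_PER_PAGE 16
#define SECTOR_DATA_SIZE 512
#define SECTOR_ECC_SIZE  24
#define PAGE_SIZE        8628
#define PAGE_TAIL_SIZE \
    (PAGE_SIZE - SECTORS_PER_PAGE * (SECTOR_DATA_SIZE + SECTOR_ECC_SIZE))

/* Byte offsets of sector s's data and ECC within one page. */
static unsigned sector_data_off(unsigned s)
{
    return s * (SECTOR_DATA_SIZE + SECTOR_ECC_SIZE);
}

static unsigned sector_ecc_off(unsigned s)
{
    return sector_data_off(s) + SECTOR_DATA_SIZE;
}
```

The 52 "remaining bytes" then sit at offset 8576 of each page, after the last sector's ECC.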

But enough for this time, I can explain this in the next post. I hope this explanation was clear up to this point, or do you have any open questions so far?


Yep, it is clear. The difference from rs8191's experience is then seemingly just in the "exceeding data" and in the page size. In your case:
4314*2=8628
16*(512+24)=8576
8628-8576=52
leaves 52 bytes of "excess data", whilst in rs8191's case the "excess data" seems to be 64 (times 2), and - to be confirmed - he seemingly has a page that is 8,192+512=8,704 bytes as opposed to your page of 4314. In his case:
8704*2=17408
2*16*(512+24)=17152
17408-17152=256

jaclaz


   
(@rs8191)
Active Member
Joined: 12 years ago
Posts: 7
 

Hello jaclaz, I have edited the previous post with the description of the NAND and added an example hex dump to make it easier to understand.
KA8712 has, I think, answered your question very well.
I am now going to reconstruct the long way (the ECC solution) and put it on paper.
I hope that I can present this as soon as possible.
Kind regards


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

Hello jaclaz, I have edited the previous post with the description of the NAND and added an example hex dump to make it easier to understand.

Thanks for that, but - probably it is just me 😯 - you seem to have a "quirk" for either making things more complex than needed or mixing info "liberally".

In the edited post you stated mixed/contrasting info:

After this action, I was able to recognize the structure:
16x (512 data bytes / 24 ECC bytes) and 64 bytes of management data.

in my case the NANDs have a 256-byte data area and a 14-byte spare area: (16x256 + 16x14) x2 = 8640 bytes

16*(512+24)+64=8640
2*16*(256+14)+0=8640

Maybe the actual structure is something like
2*16*(256+12)+64=8640
i.e. the last 2 bytes of each 14-byte spare area form the 64 bytes of "excess data"?
2*16*2=64

If you combine 256 bytes (odd bytes) with 256 bytes (even bytes) you have 512 bytes alright, but if you combine 14 bytes with 14 bytes, you get 28 bytes and not 24.

I am now going to reconstruct the long way (the ECC solution) and put it on paper.
I hope that I can present this as soon as possible.

No hurry whatsoever; better to take a little more time and have things laid out as clearly as possible.

jaclaz


   
jhup
(@jhup)
Noble Member
Joined: 16 years ago
Posts: 1442
 

I know this is weird, but I just try to eyeball the data in hex if I know some known structures existed on the device, like file system structures or text files (love log files on embedded systems!). I adjust the columns and rearrange areas (be it chip, plane, block or page) to make sense of it.

My nightmares are crazy spans of data across chips, planes, blocks, and the neighbor's back yard… whatever the controller feels like. oops


   
(@ka8712)
Active Member
Joined: 15 years ago
Posts: 11
Topic starter  

(deleted)


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

Posted Tue May 28, 2013 11:17 am
Hi all, in my setup I have some different flash geometry, but the principle seems to be the same here.

When I first read out the dump of data from the flash, I tried to just ignore all ECC and other data and stored only the contents of the sectors into a separate file. But that was quite a mess: no file system structures, everything seemed to be scrambled. Well, this was not unexpected, as it is well known that those flash chips need some "wear leveling" and remap the data around the whole chip. I read some reports that some chips have some kind of "lookup table" for the mapping, but I had no luck in finding such a thing…

But I knew that the file system was FAT32, and so I wrote a little program that searched for sectors that contain the start of a directory. The format of a directory is well documented, and a very helpful fact is that each subdirectory contains an entry "." at the very start, which points to the cluster in which it is located. Before that I wrote a tool that searched for the boot sector of the disk, so I knew how many sectors belong to one cluster and at which sector the numbering starts. That gave me a little list of directory start locations + FAT locations and the boot sector location, so I had the locations "where things are" and "where they should be".
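As a sketch of what such a search can look for: a FAT32 subdirectory's first 32-byte entry has the 8.3 name "." (0x2E padded with spaces), and that entry's cluster number is split between offsets 20 (high word) and 26 (low word). Assuming 512-byte sectors, an illustrative detector (not ka8712's actual program) could be:

```c
#include <stdint.h>
#include <string.h>

/* If this 512-byte sector starts a FAT32 subdirectory, return the
 * cluster number its "." entry points to; otherwise return 0. */
static uint32_t dot_entry_cluster(const uint8_t sector[512])
{
    if (memcmp(sector, ".          ", 11) != 0)  /* 8.3 name of "." */
        return 0;
    uint32_t hi = sector[20] | ((uint32_t)sector[21] << 8); /* cluster high word */
    uint32_t lo = sector[26] | ((uint32_t)sector[27] << 8); /* cluster low word  */
    return (hi << 16) | lo;
}
```

The cluster it returns is exactly the "where it should be" side of the mapping, once the boot sector gives the cluster-to-sector formula.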

After some experiments I found out that the solution of this puzzle lies in the correlation between the "page tail" (the data which follows directly after the sectors and their ECC) and the logical page locations.

The page tail in my case looks like this:
40154015FFFF4B

This can be interpreted like this:
- the first two 16-bit numbers should be the same (tail0 and tail1)
- bit 15 is always zero, bit 14 is always one
- bit 0 of each tail value is the parity bit
- the third 16-bit value seems to be the "state": 0xFFFF means "in use",
  other values seem to mean "unused" or "no longer valid"
- the 8-bit value at the end is some kind of CRC
- each value could have up to 2 bit errors in my case, so one has to be careful in interpreting the values
=> the interesting "masked tail" value is (tail0 & 0x3FFF) >> 1
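Read literally, that decoding can be sketched like this (assuming the bytes appear in the order of the example 40154015FFFF4B, i.e. big-endian 16-bit values; the 2-bit-error tolerance and the parity check are deliberately left out of this sketch):

```c
#include <stdint.h>

struct page_tail {
    uint16_t tail0, tail1;  /* two copies of the remap value */
    uint16_t state;         /* 0xFFFF = "in use"             */
    uint8_t  crc;           /* some kind of CRC              */
};

static struct page_tail parse_tail(const uint8_t t[7])
{
    struct page_tail r;
    r.tail0 = (uint16_t)((t[0] << 8) | t[1]);
    r.tail1 = (uint16_t)((t[2] << 8) | t[3]);
    r.state = (uint16_t)((t[4] << 8) | t[5]);
    r.crc   = t[6];
    return r;
}

/* "Obviously valid": both copies agree, bit 15 clear, bit 14 set. */
static int tail_plausible(struct page_tail r)
{
    return r.tail0 == r.tail1 && !(r.tail0 & 0x8000) && (r.tail0 & 0x4000);
}

/* The interesting value used for remapping. */
static uint16_t masked_tail(struct page_tail r)
{
    return (r.tail0 & 0x3FFF) >> 1;
}
```

For the example tail, tail0 = tail1 = 0x4015, so the masked value is (0x0015 >> 1) = 0x000A.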

I analyzed these values and made a linear equation system from all obviously valid tails (tail0 == tail1 and parity is ok), to get the CRC (-> Gauss-Jordan algorithm).

Additionally I found out that all pages within one block normally have the same tail, so I ignored all tails within the block and took only the first tail (if it was valid) - this gave some more blocks of valid data at the end. It seems that the manufacturer does not take the tails of pages (except the first one) too seriously.

=> Conclusion: the tail is responsible for remapping blocks to some location.

It also seems that most tail values exist 4x128 times in the whole dump (in groups of 128 pages). -> A tail value therefore is unique within a bank. Unfortunately the maximum number of blocks per bank seems not to be constant; each bank seems to have some other logical start location. I found out this mapping:

bank[0] 0x0786 blocks, first block = 0
bank[1] 0x0788 blocks, first block = 0x786
bank[2] 0x0788 blocks, first block = 0x786 + 0x788
bank[3] 0x0790 blocks, first block = 0x786 + 0x788 + 0x788

For getting the real page number within a bank, I had to take the "masked tail" value from above and remap (demix) the bits of that value: bit 7 -> bit 0, bit 0 -> bit 1, bit 1 -> bit 2, … , bit 8 -> bit 8, … and so on.

Here is some source code to make it a bit clearer
(inputs: page_abs = absolute number of the page in the dump, tail = raw tail value; types from <stdint.h>)

uint64_t bank  = (page_abs / PAGES_PER_BLOCK) / BLOCKS_PER_BANK;
uint64_t block = (page_abs / PAGES_PER_BLOCK) % BLOCKS_PER_BANK;
uint64_t page  = (page_abs % PAGES_PER_BLOCK);
uint64_t offset_src = page_abs * BYTES_PER_PAGE;          // <= read from here
uint16_t lba = ((tail & 0x3FFF) >> 1);                    // the "masked tail" value

uint32_t mixed   = (lba * PAGES_PER_BLOCK) + page;
uint32_t demixed = demix(mixed);                          // bit remapping, see above
uint16_t lba2  = demixed / PAGES_PER_BLOCK;
uint16_t page2 = demixed % PAGES_PER_BLOCK;
uint64_t block_dst  = _first_block_of_bank[bank] + lba2;
uint64_t page_dst   = (block_dst * PAGES_PER_BLOCK) + page2;
uint64_t offset_dst = page_dst * (SECTORS_PER_PAGE * SECTOR_DATA_SIZE); // <= write there

to be continued…

….
- each value could have up to 2 bit errors in my case, so one
has to be careful in interpreting the values
=> the interesting "masked tail" value is (tail0 & 0x3FFF) >> 1

I analyzed these values and made a linear equation system from all obviously valid tails (tail0 == tail1 and parity is ok), to get the CRC (-> Gauss-Jordan algorithm).

I guess that what one should always know in life is
http://en.wikipedia.org/wiki/The_Gambler_(song)

You got to know when to hold 'em, know when to fold 'em,
Know when to walk away and know when to run.

I'll have to walk away, while I still have some small grey cells not busy in attempting to understand these "explanations". cry

After all, Vogon poetry
http://en.wikipedia.org/wiki/Vogon#Poetry
is not that bad.

More seriously, ka8712, it is clear that you have delved very deep into the thingy you analyzed, but - sorry to say so - you won't get my recommendation for a job as "technical facilitator" 😯 ; it is really very difficult (if possible at all) to follow your post, at least for my limited knowledge oops .

But please do continue your posts; maybe someone will be able to fully understand your reports and put them in more layman's terms.

jaclaz


   
(@ka8712)
Active Member
Joined: 15 years ago
Posts: 11
Topic starter  

(deleted by author)


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

Posted Thu May 30, 2013 3:50 am
well, don't mind! I know that these things are really hard to understand without the mathematical background. As you can see from the start date of my post, I have had time since ~2011 to dig deep into this matter (and of course, some of the stuff I learned at university 15 years ago was also quite helpful)

If you are interested, there are some nice explanations on Wikipedia;
I suggest reading the articles which belong to these search terms (in this order):

- "system of linear equations"
- "boolean algebra"
- "Gauss-Jordan elimination"
- "Hamming distance"
- "cyclic code"
- "cyclic redundancy check"
- "BCH code"

well, don't mind! I know that these things are really hard to understand without the mathematical background.

I doubt that you can actually know at what level my mathematical background is 😯 . I personally would attribute my losing myself in your "explanations" to the actual "explanations" themselves roll (of course these are just differing opinions wink ), and to my (declared) lack of familiarity with C (or whatever language your "source code" is supposed to be).

There is an evident "gap" (lack of continuity) between some of the steps you reported: in your previous post you had 52 "excess data" bytes.

Suddenly (probably?) they changed name to "page tails", becoming only 7 significant (or decoded) bytes mapping spare sectors.
Of these 7 bytes you decode them as:
2 bytes = 16-bit number -> referenced/named as "tail0"
2 bytes = 16-bit number -> referenced/named as "tail1"
(insert here some unrelated babbling about bit values of the above)
2 bytes = 16-bit number -> NOT referenced/named -> explained as "in use" state
1 byte = 8-bit number -> NOT referenced/named -> explained as "some kind of CRC"
(insert here some unreferenced babbling about 2-bit errors)
For some strange reason (NOT explained) it is "interesting" to mask "tail0" as "(tail0 & 0x3FFF) >> 1"

Maybe you could take a step back, attribute a unique "name/reference" to ALL those bytes/values, provide a few examples, etc.

To clarify the above, I am providing a snippet of source code:

++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++
..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.

I am sure you are familiar with
http://en.wikipedia.org/wiki/Brainfuck

jaclaz


   