Notifications
Clear all

NAND flash disk ECC

42 Posts
6 Users
0 Likes
2,620 Views
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

I wouldn't be so sure.
USB sticks may still use (up to 8 Gb) ol' CHS.

USB sticks came along way after CHS was dead, why would they implement a rotating media solution on someting that does not rotate? makes no sense. Just because CHS works with < 8GB doesn't mean we should expect it. Clearly we cant rule it out which is why I said "likely".

Yep ) , my "not so sure" was intended just as your "likely", but with a little more "emphasis".

The way things have been implemented in the past (and still are implemented nowadays) are sometimes, in my experience, illogical or more simply outdated/a leftover form previous technologies/re-used code.

As a completely unrelated example wink , if Acer BIOS programmers can make a code to check for a single byte in the MBR
http//reboot.pro/10503/page__st__4
then I can claim that the good guys at USBest (I mean the USB controller chip programmers) may have used CHS in the ECC code (or is the ECC code coming from the actual NAND chip manufacturer? ? ).

jaclaz

 
Posted : 17/01/2011 7:26 pm
(@ka8712)
Posts: 11
Active Member
Topic starter
 

(deleted)

 
Posted : 23/01/2011 3:57 pm
PaulSanderson
(@paulsanderson)
Posts: 651
Honorable Member
 

Hi, could we please close the "CHS" topic?

Seems to have been closed about a week ago,

 
Posted : 23/01/2011 8:01 pm
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

Posted Sun Jan 23, 2011 457 am
Hi, could we please close the "CHS" topic? CHS/sector addressing is done at a much higher level than I try to deal with and I think it is

proved by several sectors filled with 0xFF that this does not influence the ECC in any way.

Just to come back to my previous questions
I found some sectors that are completely filled with 0x00 => they have no visible ECC at the end (also 0x00). Doesn't this prove that the seed

of the algorithm must be zero as well?

And another question, just to be sure I got it right if the datasheet of the controller says that it has hardware support for "BCH-14 or

BCH-8", does that mean that the grade of the generator polynomial has grade 14 or 8 and that the output of such a "hardware unit" spits out 14

or 8 bits as result?

Seems to have been closed about a week ago,

I cannot but confirm. )

jaclaz

 
Posted : 23/01/2011 8:17 pm
(@rs8191)
Posts: 7
Active Member
 

Hello
I was exact in the same position! (reconstruct the complete filesystem, but all the images are seriously
damaged caused of thousands of "bit-flips" that occurred on the NAND-flashes)
with try and trai I finde out the solution
The algorithmus is a bch 14 code [8373,8191,14]. The minimal polynom is {1,0,0,0,0,0,0,0,0,1,1,0,1,1} (0X201B)

1. Place the 512 * 0XFF field at bit pos 0-4095 (fill the rest with NULL) and calculate the ecc value.
2. clear the data field and place the new calculated ecc value at bit pos 3699.
3. calculate now again the ecc.
BINGO -> 0x59 0xC0 0xED 0x58 …. 0xD2 0x46 0x90
I hope this mini description will you help.
kind regards

 
Posted : 12/05/2013 12:08 am
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

I hope this mini description will you help.
kind regards

If you could spend some more time better explaining the actual procedure you used, which tools (if any), etc., I am sure that your contribution would be much more helpful.
Also some reference (again if any) to documents containing the algorithm would be very helpful.

jaclaz

 
Posted : 12/05/2013 9:08 pm
(@ka8712)
Posts: 11
Active Member
Topic starter
 

(deleted)

 
Posted : 14/05/2013 11:25 pm
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

Posted Tue May 14, 2013 1125 am
Hi, I am also *very* curious how you came to these assumptions! …

Since I started this thread long time ago I continued to work on that problem for months and got some partial success.

My approach was to find the system matrix of the ECC block. For getting this, I took all sectors of the image and sorted them by some "score". Sector which had many duplicates got higher score, those which were unused or damaged got score zero. This gave some few million rows of data. Then I sorted out those which had a too small minimal hamming distance "d" compared to other rows (assuming that those might contain multi bit errors), a limit of "d" ~16 bits gave quite good results. A too small value of "d" gives an unsolvable equation, a too high value left not enough independent vectors to solve the equation.

This left a linear equation system with ~300000 rows, each with 4096+182 columns, which I solved by standard Gauss-Jordan and returned a complete solution with exactly 4096 rows and 182 columns - perfect

With this system matrix I can perfectly compute the ECC of all sectors, find out which one are damaged and which ones are ok. By using a simple trial and error (loop over all bits toggle, calculate and check ECC) I can fix all single bit and even 2-bit errors. Unfortunately this is not sufficient and gave a recovery rate of some few percents only. Using trial & error for higher bit error counts is unfortunately not really feasible as the effort rises by ~4096^n

My next approach was to get the generator polynome of the ECC. The theory says that it is the response of the system to the vector [0,0,0,…,0,1]. Unfortunately this does not work, no matter how I order the bits, the bytes, or whatever, I tried all kinds of bit- and byte-swapping and reversing, with no luck

My conclusion now is that either the original chip implemented some kind of "scrambling" of input and output data, or the data takes some "extra rounds" through the shift registers.

Now I have no idea how to continue solving this puzzle…

Which maybe you could share with us…. (possibly) in more layman terms.

Personally I can understand more or less 10% of your post, and maybe 5% of rs8191's one. 😯

I would expect (if possible/allowed/etc. of course) either a simple algorithm or (better) a working program/spreadsheet/whatever that puts the results into practice.

I believe (since the ECC is built-in in the controller and the controller must not be a "Cray-1") that an algorithm must exist and it must be a failrly simple one, due to the limited computing power and the speed with which data is written.

Would these documents from Samsung help at all?
http//www.elnec.com/sw/samsung_ecc_algorithm_for_256b.pdf
http//www.elnec.com/sw/samsung_ecc_algorithm_for_512b.pdf
This
http//www.infradead.org/pipermail/linux-mtd/2004-March/009500.html
says that it exists C code for the algorithm
https://dev.openwrt.org/browser/trunk/target/linux/generic-2.6/files-2.6.26/fs/yaffs2/yaffs_ecc.c?rev=11427
https://dev.openwrt.org/browser/trunk/tools/firmware-utils/src/nand_ecc.c?rev=16458

Also, it seems like there is a library in Linux
http//lwn.net/Articles/426856/

And maybe this is useful as well
http//www.channelscience.com/ecc.html

jaclaz

 
Posted : 15/05/2013 1:27 am
(@ka8712)
Posts: 11
Active Member
Topic starter
 

(deleted)

 
Posted : 15/05/2013 2:16 am
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

Would these documents from Samsung help at all?

www.elnec.com/sw/samsu...r_256b.pdf
www.elnec.com/sw/samsu...r_512b.pdf

Sorry, but no. The samsung algorithm is pretty primitive and has nothing to do with BCH (which is the algorithm used, for sure).

This
www.infradead.org/pipe...09500.html
says that it exists C code for the algorithm

jaclaz

My problem is not the lack of source code or code templates, there are many many examples around and you can implement the algorithm itself quite easily if you have understood how it works.

It's more about how to retrieve the parameters the algorithm uses. There are a lot of things that you need to know, e.g. block size, ecc size, bit order, polynome size, number of passes of the shift registers, input/output scrambling etc.

Maybe rs8191 can give us some hints …

My problem is not the lack of source code or code templates, there are many many examples around and you can implement the algorithm itself quite easily if you have understood how it works.

Never said it was your problem, as a matter of fact I just said how that was my problem (actually two of them)

  1. understanding the way it works
  2. having some code or examples I can understand
  3. [/listo]
    I thought that if you shared what you have done so far, maybe that would have helped me (and possibly someone else still at my same primitive level).

    It's more about how to retrieve the parameters the algorithm uses. There are a lot of things that you need to know, e.g. block size, ecc size, bit order, polynome size, number of passes of the shift registers, input/output scrambling etc.

    Sure, and that is your problem.

    Maybe rs8191 can give us some hints wink …

    That's what I hope, maybe it could help with all the three problems.

    jaclaz

 
Posted : 15/05/2013 3:28 pm
Page 2 / 5
Share: