In a trade secret case, the suspect took lots of photos and screenshots of BOMs, R&D papers… We have to conduct a keyword search to find out what he/she stole. We're not going to spend time "taking a look" at every document and picture, so we need the OCR function to do the work.
You guys can take a look at my blog to see what's going on:
http://www.cnblogs.com/pieces0310/p/5297350.html
To my surprise, FTK could not recognize the text in those pics. I used to trust its OCR function, but now my confidence in FTK's OCR is eroded.
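A minimal sketch of the keyword-search step, assuming the OCR text for each image has already been exported into a Python dict keyed by filename. The function name `keyword_hits` and the sample filenames and text are hypothetical, not actual FTK output:

```python
import re
from collections import defaultdict


def keyword_hits(ocr_text_by_file, keywords):
    """Return {keyword: sorted list of files whose OCR text contains it}.

    Matching is case-insensitive and on whole words, so "bom" does not
    match inside "bombastic". Keywords with no hits are left out.
    """
    hits = defaultdict(list)
    for filename, text in ocr_text_by_file.items():
        lowered = text.lower()
        for kw in keywords:
            # \b = word boundary; re.escape guards keywords such as "R&D"
            pattern = r"\b" + re.escape(kw.lower()) + r"\b"
            if re.search(pattern, lowered):
                hits[kw].append(filename)
    return {kw: sorted(files) for kw, files in hits.items()}


if __name__ == "__main__":
    # Hypothetical OCR output, one string per image
    ocr_text = {
        "IMG_0001.jpg": "BOM rev C - part list for model X",
        "IMG_0002.jpg": "R&D meeting notes, confidential",
        "screenshot_03.png": "holiday photo caption",
    }
    print(keyword_hits(ocr_text, ["BOM", "confidential", "prototype"]))
```

Of course, this only finds what the OCR engine actually recognized; if the engine returns nothing for a picture, no search will find anything in it.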
From the given link
Sorry I can't show you guys contents in the evidence.
What a pity :( , though I am pretty sure that trade secret thieves do tend to re-caption celebrities' images with witty quotes, it would have been nice to have a confirmation.
You should let the good AccessData guys know about this severe real-life limitation of their software.
On the other hand, it is possible that they were served with a FOIA order (and an accompanying gag order) forcing them to exclude "apple" and "fbi" (plus a number of three-or-more-letter US government agencies) from the results, and to build a special version of the software, for US government use only, that can actually find those strings through OCR.
As often happens when such measures are taken, there is of course "collateral damage": imagine how many trade secrets related to apple growing and orchards can now be stolen without digital forensics investigators finding any traces.
jaclaz
Hardly surprising that OCR misses stuff.
I'm pretty sure that FTK runs on the ABBYY OCR engine as opposed to the Tesseract engine.
If it's an important job, you might want to consider using a tool that uses Tesseract in addition to FTK's OCR.
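If you do run two engines side by side, one simple way to see what each of them caught is plain set arithmetic over (filename, keyword) hit pairs. This is a hypothetical sketch: it assumes you have already exported each engine's hits yourself, and `compare_engines` and the sample data are made up for illustration:

```python
def compare_engines(hits_a, hits_b):
    """Compare keyword hits from two OCR engines.

    hits_a, hits_b: sets of (filename, keyword) pairs found by engine A
    and engine B respectively.
    """
    return {
        "both": hits_a & hits_b,     # found by both engines
        "only_a": hits_a - hits_b,   # missed by engine B
        "only_b": hits_b - hits_a,   # missed by engine A
        "either": hits_a | hits_b,   # everything worth reviewing by hand
    }


if __name__ == "__main__":
    # Hypothetical hits; in practice these would come from each tool's export
    engine_a_hits = {("order.pdf", "order")}
    engine_b_hits = {("order.pdf", "order"), ("IMG_0002.jpg", "confidential")}
    result = compare_engines(engine_a_hits, engine_b_hits)
    for label in ("both", "only_a", "only_b"):
        print(label, sorted(result[label]))
```

The "only_a" / "only_b" sets are the interesting part: they show exactly which files a single-tool search would have missed.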
Hmmm.
From the same blog
jaclaz
Ooops… well, try ABBYY rather than Tesseract :)
So, the message is "whatever you are doing, you are doing it wrong"? 😯
jaclaz
OCR accuracy can be fairly awful depending on font, size, italics/bold, etc.
Each engine has varying success with these and may work better on a different font.
So basically: dual tooling :)
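Not something suggested above, but a common complement when the OCR quality is poor: match keywords approximately, so a misread like "c0nfidential" still counts as a hit. A minimal sketch using Python's standard `difflib`; `fuzzy_keyword_hits` is a hypothetical helper, and the 0.8 threshold is an arbitrary assumption you would need to tune (too low and the false positives pile up):

```python
import difflib
import re


def fuzzy_keyword_hits(text, keyword, threshold=0.8):
    """Return the words in OCR text that approximately match keyword.

    Similarity is difflib.SequenceMatcher.ratio(), which ranges from 0.0
    to 1.0; the default threshold of 0.8 is an untuned starting point.
    """
    kw = keyword.lower()
    matches = []
    # Split the OCR text into word tokens (letters, digits, underscore)
    for word in re.findall(r"\w+", text.lower()):
        if difflib.SequenceMatcher(None, kw, word).ratio() >= threshold:
            matches.append(word)
    return matches


if __name__ == "__main__":
    # Hypothetical OCR output where "O" was misread as a zero
    print(fuzzy_keyword_hits("Marked C0NFIDENTIAL draft", "confidential"))
```

This works on a word-by-word basis, so it will not help when the engine returns no text at all for an image; it only softens the effect of misread characters.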
Thank you, guys. FTK's OCR did extract text from "order.pdf", as shown in my blog. Unfortunately, it could not handle the other JPG files. If JOCR could extract text from those JPG files, FTK should be OK too, right? I believe the OCR checkbox is not just there as decoration…