Something wrong with FTK OCR

(@gorvq7222)
Posts: 229
Reputable Member
Topic starter
 

In a trade-secret case, the suspect took lots of photos and screenshots of BOMs, R&D papers… We have to run a keyword search to find out what he/she had stolen. We're not going to spend time "taking a look" at every document and picture, so we need the OCR function to figure it out.

You guys could take a look at my blog to see what's going on.
http://www.cnblogs.com/pieces0310/p/5297350.html

To my surprise, FTK could not recognize the text in those pics. I used to trust its OCR function, but now my confidence in it is eroded.

 
Posted : 20/03/2016 10:30 am
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

From the given link:

"Sorry I can't show you guys contents in the evidence."

What a pity :( . Though I am pretty sure that trade secret thieves are in the habit of re-captioning celebrity images with witty quotes, it would have been nice to have confirmation.

You should let the good AccessData guys know about that severe real-life limitation of their software.

On the other hand, it is possible that they were served a FOIA order (with an accompanying gag order) making them exclude "apple" and "fbi" (plus a number of three-or-more-letter US government agencies) from the results, and make a special version of the software, for US government use only, that can actually find those strings through OCR.

As often happens when such measures are taken, there is of course "collateral damage": imagine how many trade secrets related to apple growing and orchards can now be stolen without digital forensics investigators finding a trace.

jaclaz

 
Posted : 20/03/2016 1:11 pm
minime2k9
(@minime2k9)
Posts: 481
Honorable Member
 

Hardly surprising that OCR misses stuff.
I'm pretty sure that FTK runs on the ABBYY OCR engine as opposed to the Tesseract engine.
If it's an important job, you might want to consider using a tool that uses Tesseract as well as the OCR from FTK.

 
Posted : 20/03/2016 1:23 pm
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

Hmmm.
From the same blog

jaclaz

 
Posted : 20/03/2016 1:30 pm
minime2k9
(@minime2k9)
Posts: 481
Honorable Member
 

Ooops… well, try ABBYY rather than Tesseract :)

 
Posted : 20/03/2016 5:43 pm
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

So, the message is "whatever you are doing, you are doing it wrong"? 😯

😉 😁

jaclaz

 
Posted : 20/03/2016 7:00 pm
minime2k9
(@minime2k9)
Posts: 481
Honorable Member
 

OCR accuracy can be fairly awful depending on font, size, italics/bold, etc.
Each engine has varying success with these, and one may work better than another for a given font.
So basically: dual tooling :)
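
For what it's worth, the dual-tooling idea could be sketched roughly like this in Python. This is only a sketch under assumptions: it presumes the text each engine extracted has already been exported per file, and the function names (`keyword_hits`, `merge_hits`), the dict layout and the sample strings are all hypothetical. Nothing here calls FTK, ABBYY or Tesseract.

```python
import re
from typing import Dict, Iterable, Set


def keyword_hits(ocr_text: Dict[str, str], keywords: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each file name to the set of keywords found in its OCR text."""
    # Case-insensitive literal match; re.escape keeps keywords from acting as regexes.
    patterns = {kw: re.compile(re.escape(kw), re.IGNORECASE) for kw in keywords}
    hits: Dict[str, Set[str]] = {}
    for name, text in ocr_text.items():
        found = {kw for kw, pat in patterns.items() if pat.search(text)}
        if found:
            hits[name] = found
    return hits


def merge_hits(*per_engine: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Union the hits of several engines: a file is flagged if ANY engine hit."""
    merged: Dict[str, Set[str]] = {}
    for hits in per_engine:
        for name, found in hits.items():
            merged.setdefault(name, set()).update(found)
    return merged


# Hypothetical per-file text exported from two different OCR engines.
engine_a = {"order.pdf": "Purchase order: BOM rev 3", "photo1.jpg": ""}
engine_b = {"order.pdf": "Purchase 0rder: B0M rev 3", "photo1.jpg": "BOM for prototype"}
keywords = ["BOM", "prototype"]

combined = merge_hits(keyword_hits(engine_a, keywords), keyword_hits(engine_b, keywords))
for name in sorted(combined):
    print(name, sorted(combined[name]))
```

Taking the union rather than the intersection is deliberate: for a keyword search the costly error is a missed file, so anything either engine flags goes to a human for review.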

 
Posted : 20/03/2016 7:06 pm
(@gorvq7222)
Posts: 229
Reputable Member
Topic starter
 

Thank you guys. FTK OCR did extract text from "order.pdf", as shown in my blog. Unfortunately, it could not handle the other jpg files. If JOCR could extract text from those jpg files, FTK should be OK, right? I believe the OCR function checkbox right there is not just a decoration…

 
Posted : 21/03/2016 2:44 pm