
Something wrong with FTK OCR

(@gorvq7222)
Reputable Member
Joined: 11 years ago
Posts: 236
Topic starter  

In a trade-secret case, the suspect took lots of photos and screenshots of BOMs, R&D papers… We have to run a keyword search to find out what he/she stole. We're not going to spend time "taking a look" at every document and picture, so we need the OCR function to figure it out.

You guys could take a look at my blog to see what's going on.
http://www.cnblogs.com/pieces0310/p/5297350.html

To my surprise, FTK could not recognize the text in those pics. I used to trust its OCR function, but now my confidence in it is eroded.


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 


From the given link: "Sorry I can't show you guys contents in the evidence."

What a pity. Though I am pretty sure that trade-secret thieves do habitually re-caption celebrity images with witty quotes, it would have been nice to have a confirmation.

You should let the good AccessData guys know about that severe real-life limitation of their software.

On the other hand, it is possible that they were served a FOIA order (with an attached gag order) making them exclude "apple" and "fbi" (plus a number of US government agencies with three or more letters) from the results, and had to make a special version of the software, for US government use only, that can actually find those strings through OCR.

As often happens when such measures are taken, there is of course "collateral damage": imagine how many trade secrets related to apple-growing and orchards can now be stolen without digital forensics investigators finding a trace.

jaclaz


   
minime2k9
(@minime2k9)
Honorable Member
Joined: 14 years ago
Posts: 481
 

Hardly surprising that OCR misses stuff.
I'm pretty sure that FTK runs on the ABBYY OCR engine as opposed to the Tesseract engine.
If it's an important job, you might want to consider using a tool that uses Tesseract as well as the OCR from FTK.


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 


Hmmm.
From the same blog

jaclaz


   
minime2k9
(@minime2k9)
Honorable Member
Joined: 14 years ago
Posts: 481
 

Ooops… well, try ABBYY rather than Tesseract.


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 


So, the message is "whatever you are doing, you are doing it wrong"? 😯


jaclaz


   
minime2k9
(@minime2k9)
Honorable Member
Joined: 14 years ago
Posts: 481
 

OCR accuracy can be fairly awful depending on font, size, italics/bold, etc.
Each engine has varying success with these, and one may work better than another for a given font.
So basically: dual tooling.
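To make the dual-tooling idea concrete, here is a minimal sketch of how the keyword search could be run over the text output of two OCR engines and the hits combined. It assumes each engine's recognised text has already been exported as plain text per file; the function names, the sample texts and the file name IMG_0001.jpg are made up for illustration and are not part of FTK, ABBYY or Tesseract.

```python
import re

def keyword_hits(ocr_text_by_file, keywords):
    """Return {filename: set of keywords found} for one engine's OCR output."""
    hits = {}
    for name, text in ocr_text_by_file.items():
        # Collapse whitespace and lowercase so that line breaks in the OCR
        # output do not hide a multi-word keyword.
        flat = re.sub(r"\s+", " ", text).lower()
        found = {kw for kw in keywords if kw.lower() in flat}
        if found:
            hits[name] = found
    return hits

def merge_hits(*per_engine_hits):
    """Union the hits: a file is flagged if any engine found a keyword in it."""
    merged = {}
    for hits in per_engine_hits:
        for name, found in hits.items():
            merged.setdefault(name, set()).update(found)
    return merged

# Hypothetical text exports, one dict per engine (file name -> recognised text).
engine_a = {"IMG_0001.jpg": "", "order.pdf": "Purchase order\nBOM rev 3"}
engine_b = {"IMG_0001.jpg": "bill of\nmaterials draft", "order.pdf": "Purchase 0rder"}

keywords = ["BOM", "bill of materials"]
combined = merge_hits(keyword_hits(engine_a, keywords),
                      keyword_hits(engine_b, keywords))
```

In this made-up data each engine misses something the other one catches, so only the union flags both files. The match is a plain substring test, so short keywords can also hit inside longer words; anything flagged still needs a human look.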


   
(@gorvq7222)
Reputable Member
Joined: 11 years ago
Posts: 236
Topic starter  

Thank you guys. FTK OCR did extract text from "order.pdf", as shown in my blog. Unfortunately it could not handle the other jpg files. If JOCR could extract text from those jpg files, FTK should be OK too, right? I believe that the OCR function checkbox right there is not just a decoration…


   