Something wrong with FTK OCR

gorvq7222

(@gorvq7222)

Reputable Member

Joined: 11 years ago

Posts: 236

Topic starter 20/03/2016 11:30 am

This is a trade-secret case: the suspect took lots of photos and screenshots of BOMs, R&D papers… We have to run a keyword search to find out what he/she stole. We're not going to spend time "taking a look" at every document and picture, so we need the OCR function to figure it out.

You can take a look at my blog to see what's going on:
http://www.cnblogs.com/pieces0310/p/5297350.html

To my surprise, FTK could not recognize the text in those pictures. I used to trust its OCR function, but now my confidence in it has eroded.

jaclaz

(@jaclaz)

Illustrious Member

Joined: 18 years ago

Posts: 5133

20/03/2016 2:11 pm

From the given link:

"Sorry I can't show you guys contents in the evidence."

What a pity :( . Though I am pretty sure that trade secret thieves routinely re-caption celebrity images with witty quotes, it would have been nice to have confirmation.

You should let the good AccessData guys know about that severe real-life limitation of their software.

On the other hand, it is possible that they were served a FOIA order (and an accompanying gag order) making them exclude "apple" and "fbi" (plus a number of three-or-more-letter US government agencies) from the results, and having them make a special version of the software, for US government use only, that can actually find those strings through OCR.

As often happens when such measures are taken, there is of course "collateral damage": imagine how many trade secrets related to apple growing and orchards can now be stolen without digital forensics investigators finding any trace.

jaclaz

minime2k9

(@minime2k9)

Honorable Member

Joined: 14 years ago

Posts: 481

20/03/2016 2:23 pm

Hardly surprising that OCR misses stuff.
I'm pretty sure that FTK runs on the ABBYY OCR engine as opposed to the Tesseract engine.
If it's an important job, you might want to consider using a tool that uses Tesseract in addition to the OCR from FTK.

jaclaz

20/03/2016 2:30 pm

Hmmm.
From the same blog

jaclaz

minime2k9

20/03/2016 6:43 pm

Ooops… well, try ABBYY rather than Tesseract :)

jaclaz

20/03/2016 8:00 pm

So, the message is "whatever you are doing, you are doing it wrong"? 😯 ;) :D

jaclaz

minime2k9

20/03/2016 8:06 pm

OCR accuracy rates can be fairly awful depending on font, size, italics/bold, etc.
Each engine has varying success with these, and one may work better than another for a given font.
So basically: dual tooling :)
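
To make the dual-tooling idea concrete, here is a rough sketch, not anything FTK or either engine actually does. It assumes you have already exported the recognized text for the same image from two engines as plain strings; the engine labels, sample strings, and function names are all made up for illustration. It records which engine's output contains each keyword, so a term missed by one engine can still surface through the other.

```python
import re
from typing import Dict, Iterable, Set


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so OCR line breaks do not split a phrase."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def keyword_hits(ocr_text: str, keywords: Iterable[str]) -> Set[str]:
    """Return the keywords found in one engine's OCR output."""
    haystack = normalize(ocr_text)
    return {kw for kw in keywords if normalize(kw) in haystack}


def dual_tool_hits(outputs: Dict[str, str], keywords: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each keyword to the set of engines whose output contained it."""
    keywords = list(keywords)
    result: Dict[str, Set[str]] = {kw: set() for kw in keywords}
    for engine, text in outputs.items():
        for kw in keyword_hits(text, keywords):
            result[kw].add(engine)
    return result


# Hypothetical text exported for the same picture from two engines (invented data).
outputs = {
    "engine_a": "Bill of Materia1s\nrev 3",    # misreads "l" as "1"
    "engine_b": "Bill of\nMaterials rev 3",    # correct, but with a line break
}
hits = dual_tool_hits(outputs, ["bill of materials", "rev 3", "schematic"])

for keyword, engines in hits.items():
    print(keyword, "->", sorted(engines) or "no hit")
```

A keyword reported by only one engine is exactly the case where relying on a single tool would have missed it; a keyword reported by neither still proves nothing about the picture itself.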

gorvq7222

Topic starter 21/03/2016 3:44 pm

Thank you guys. FTK OCR did extract text from "order.pdf", as shown in my blog. Unfortunately, it could not handle the other jpg files. If JOCR could extract text from those jpg files, FTK should be able to as well, right? I believe that the OCR function checkbox is there for more than decoration…
