
Unscrambling text for an assignment

18 Posts
4 Users
0 Likes
1,680 Views
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

Take a look at byte-wise histogram.

If that's not "helping too much", I will translate the above (for the OP's benefit):
1) get HxD (http://mh-nexus.de/en/hxd/), as it is both portable and freeware
2) open the file with it and choose Analysis -> Statistics
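For readers without HxD at hand, the same byte-wise statistics can be reproduced with a few lines of Python. A minimal sketch (the demo input is made up; for the assignment file you would pass the result of `open("scrambled.bin", "rb").read()`, where the file name is hypothetical):

```python
from collections import Counter

def byte_histogram(data: bytes) -> Counter:
    """Count how many times each byte value 0x00-0xFF occurs."""
    return Counter(data)

# Demo on a short in-memory sample:
counts = byte_histogram(b"hello world")
for value, n in sorted(counts.items()):
    print(f"0x{value:02X}: {n}")
```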

@jhup
Nice example, I found myself giggling at the idea of scrubbing one's back in the shower with the photo of the box containing the sponge.

jaclaz

 
Posted : 24/05/2014 3:21 am
(@athulin)
Posts: 1156
Noble Member
 

I mentioned to my lecturer that I've found a lot of character pairs that exist in the text. He stated that I'm doing lexical analysis - upon wikipedia'ing it, I understand that you feed a stream of characters into a tool like lex, and, using a program written in C, you tokenize particular characters, like attributing an integer or string to a global variable.

You are confusing two concepts here. The 'lexical analysis' you're talking about is the kind used when interpreting programming languages or structures of a similar nature: there, lexical analysis is the foundation for identifying the symbols (reserved words, identifiers, numbers, strings, etc.) that are used in the input. lex is a tool for creating such lexical analyzers.

The lexical analysis that your lecturer is talking about is related, in that it deals with the analysis of the smallest 'units' of a message or a text. But the purpose is quite different. Instead of using the lexical units (the characters) to build up symbols/tokens (i.e. each lexical unit is well known and needs no interpretation: if you see a '[', it's a '['), you're trying to identify what the lexical units are and how they are to be interpreted.

Thus, don't bother too much with lex and such tools – unless you know for certain what you're doing.

 
Posted : 24/05/2014 12:08 pm
(@forveux)
Posts: 20
Eminent Member
Topic starter
 

Thank you for all your answers! I've checked out the statistics and seen that the graph indicates the count of each character found.

http://imgur.com/w64g56C

Is the above a representation of the histogram you mentioned, jhup?

And is there a way in HxD to tell the software not to display statistics for characters that appear over a threshold, say 30 times? I'm just trying to narrow down the clutter. I'd like to display only the characters that appear above the 1% mark.

Appreciate the explanation on Lexical analysis athulin

 
Posted : 24/05/2014 5:58 pm
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

And is there a way in HxD to tell the software not to display statistics for characters that appear over a threshold, say 30 times? I'm just trying to narrow down the clutter. I'd like to display only the characters that appear above the 1% mark.

No (there is no such provision in HxD) and no (you should concentrate not on the "lower" percentages/frequencies, but rather on the "higher" ones).
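That said, while HxD has no such filter, a short script can apply one. A minimal Python sketch (the sample bytes are made up) that lists only the byte values whose relative frequency exceeds a threshold:

```python
from collections import Counter

def frequent_bytes(data: bytes, threshold: float = 0.01):
    """Return (byte, count) pairs whose relative frequency exceeds threshold."""
    counts = Counter(data)
    total = len(data)
    return [(b, n) for b, n in sorted(counts.items()) if n / total > threshold]

sample = b"aaaaaaaaaabc"  # 'a' dominates; 'b' and 'c' are rare
print(frequent_bytes(sample, threshold=0.5))
```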

Try re-reading the clues jhup posted in the light of these three additional ones.

  1. The English alphabet has 26 characters, but in *almost any* text file there are a number of "non-letters", such as numbers, spaces, punctuation marks, parentheses and, in a computer text file, non-printable characters like carriage returns/line feeds, tabs, etc.
  2. As a matter of fact, the characters in the English alphabet are NOT 26, but rather 52, as you distinguish between UPPERCASE LETTERS and lowercase letters.
  3. All common computer encodings of text into hex values (ASCII, ANSI, etc.) use contiguous values following alphabetic order; as an example, in ASCII, A=65, B=66, C=67, etc.
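Clues 2 and 3 can be verified directly from a Python prompt (a trivial check, assuming nothing beyond standard ASCII):

```python
# Uppercase letters occupy one contiguous run of codes, lowercase another.
print([ord(c) for c in "ABC"])   # 65, 66, 67
print([ord(c) for c in "abc"])   # 97, 98, 99

# Consecutive letters are one code apart within each case...
assert ord("B") - ord("A") == 1 and ord("b") - ord("a") == 1
# ...and the two cases are a fixed 0x20 (32) apart.
assert ord("a") - ord("A") == 32
```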

    jaclaz

 
Posted : 24/05/2014 9:48 pm
(@forveux)
Posts: 20
Eminent Member
Topic starter
 

Looking at the busy area of the histogram, there is a two-byte gap, then a two-byte count.

I'm interpreting this as what is squared in red

http://imgur.com/9F3n70P

2E and 2F are together in the first square, and then in every other square there are 16 bits in a row. Does this mean I take the 1s and 0s that make up the 16 bits and convert them into something else?

From http://www.prepressure.com/library/technology/ascii-binary-hex

2E = 46 00101110
2F = 47 00101111

Referring to the prepressure table

93 = an open bracket…hmmm

Looking at the 52 possible letters in the alphabet (upper and lower case), I'm trying to see how it fits in. It's not as simple as just 'get the hex values and compare with ASCII', evidently.

Thank you in advance!

 
Posted : 12/06/2014 9:37 am
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

No.

Let's try another way.

Get *any* piece of text (not too short and not too long, let's say what fits in one page when printed on A4 paper).

Just copy and paste it in Notepad and save it as mytest.txt.

Open mytest.txt in HxD.

You will see a "pattern" (let's call it a "skyline" for simplicity) where you will have a few "skyscrapers" (possibly corresponding to 0x20, 0x61, 0x65 and 0x74).
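As a side note, HxD's graph is not the only way to get such a skyline; a few lines of Python can print a crude one in a terminal. A minimal sketch (the pangram used as sample text is made up):

```python
from collections import Counter

def skyline(data: bytes, width: int = 50):
    """Return one crude horizontal bar per byte value, scaled to the peak count."""
    counts = Counter(data)
    peak = max(counts.values())
    return [f"0x{value:02X} " + "#" * max(1, n * width // peak)
            for value, n in sorted(counts.items())]

print("\n".join(skyline(b"the quick brown fox jumps over the lazy dog")))
```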

Now compare visually that skyline against the one you get from your file.

What is the impression you get of these differences?

In mytest.txt, aren't all the "buildings" one next to another, with no or few "gaps" between them?
And aren't all the buildings on the left side of the screen?

Which kind of algorithm do you think you may apply to mytest.txt to transform its "skyline" view in such a way that it is more similar to that of your encrypted file?

jaclaz

 
Posted : 12/06/2014 1:47 pm
(@forveux)
Posts: 20
Eminent Member
Topic starter
 

Thanks for having faith jac!

Ok - the skyscrapers are next to one another in mytest.txt because the English characters are one after the other in the hex chart, so it makes sense that the skyscrapers relating to those hex values are together.

Comparing to my original file, the findings are almost the opposite, and there are gaps in the original.

We aren't dealing with encryption of the text, are we? Since then the characters would fall outside the range of normal ASCII characters, rendering it impossible to read.

So we are therefore relying on obfuscation through a cipher. Correct? And since the 'skyscrapers' could be deemed to be at opposite ends of the hex chart, could it be a substitution cipher?

 
Posted : 16/06/2014 11:10 pm
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

So we are therefore relying on obfuscation through a cipher. Correct? And since the 'skyscrapers' could be deemed to be at opposite ends of the hex chart, could it be a substitution cipher?

Yes and no.

Meaning that you are doing this:

http://www.exploratorium.edu/brain_explorer/images/jumping2.gif

What you know now is that a "normal", "plain", unencrypted text has around 26 (or a few more) tallish buildings one next to the other, and possibly another 26 or so smaller buildings, also one next to another.
On the other hand, the object of the test has these gaps between "couples" of skyscrapers (and actually the whole landscape is shifted way to the right); but still, if you count the total number of buildings in your sample text and compare that number with the total number of buildings in the encrypted text, they will turn out similar.

So, in the original you have something *like*:
1, 2, 3, 4, …
and in the test text you have something *like*:
100, 101, 104, 105, 106, 107, …

Try answering just the asked question for the moment:

Which kind of algorithm do you think you may apply to mytest.txt to transform its "skyline" view in such a way that it is more similar to that of your encrypted file?

How can you obtain the second series from the first?
Can you just multiply the first set of numbers by a fixed multiplier?
Or can you add a fixed amount to each term of the first series?
Or can you raise all terms of the first series to (say) the power of 2?
Etc.
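Each of these hypotheses can be tested mechanically. A minimal Python sketch of the "add a fixed amount" case (the plaintext and the offset 57 are made up for the demo): scramble a known text with a fixed byte offset, then recover it by brute-forcing all 256 possible offsets and keeping the ones that decode to printable ASCII.

```python
def shift_bytes(data: bytes, k: int) -> bytes:
    """Add a fixed offset k to every byte value, wrapping modulo 256."""
    return bytes((b + k) % 256 for b in data)

# Scramble a known plaintext with an arbitrary offset, then undo it:
plain = b"attack at dawn"
scrambled = shift_bytes(plain, 57)
recovered = shift_bytes(scrambled, -57)
print(recovered)

# Against an unknown file you would try all 256 offsets and keep the
# ones whose output is entirely printable ASCII, then inspect by eye:
candidates = [k for k in range(256)
              if all(32 <= b < 127 for b in shift_bytes(scrambled, -k))]
```

The same try-and-check loop works for the other hypotheses (multiply, square, etc.); only the transform function changes.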

Remember that you know nothing of what has been used to encrypt the original text, so you need to explore possibilities in a logical way, one step at a time, while being ready to change hypothesis; but you should always start from the simplest ones.

jaclaz

 
Posted : 17/06/2014 2:06 am