How do you use statistics in investigations?

Audio

(@audio)

Estimable Member

Joined: 19 years ago

Posts: 149

Topic starter 11/10/2010 6:10 pm

We know that programming plays a role in investigations, but how much of a role does statistics have? Does anyone have any stories where standard deviation, z-score, correlation, probability, etc. played a role in analysis?

I just started learning statistics, but here is an example from a brute force attack on a honeypot, where I quantified the correlation between sessions and Snort alerts, which was 0.98 (very strongly related).

#!/usr/bin/perl
use warnings;
use strict;
use Statistics::Basic qw(:all);

# Session counts from the honeypot
my @session = qw(2 3 26 2 7 7 27 16 5 7 16 16 9 21 10 19 14 8 8 30 20 3 10 0 12
        4 6 7 7 6 112 236 237 605 224 13 7 14 1 6 21 111 7 32 19 13 20 16 38 12);

# Snort alert counts, paired position by position with @session
my @snort = qw(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 46 100 99 187 63 0 0 0 0 0 0 34 0 2 1 0 0 0 1 0);

# Correlation coefficient between the two series
my $cor = correlation( \@session, \@snort );
print "Correlation $cor\n";

Correlation 0.98
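For readers who want to see what the coefficient actually measures, here is a minimal Python sketch that computes the Pearson product-moment correlation by hand on the same two series. It is an illustration only: the `pearson` helper is a name introduced here, and it is assumed (not confirmed by the post) that the Perl module's `correlation` computes this same coefficient.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

# Same counts as in the Perl script above
session = [2, 3, 26, 2, 7, 7, 27, 16, 5, 7, 16, 16, 9, 21, 10, 19, 14, 8, 8, 30,
           20, 3, 10, 0, 12, 4, 6, 7, 7, 6, 112, 236, 237, 605, 224, 13, 7, 14,
           1, 6, 21, 111, 7, 32, 19, 13, 20, 16, 38, 12]
snort = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 46, 100, 99, 187, 63, 0, 0, 0,
         0, 0, 0, 34, 0, 2, 1, 0, 0, 0, 1, 0]

r = pearson(session, snort)
print(f"Correlation {r:.2f}")
```

Note that a handful of very large values (the burst around 605 sessions) dominates the result, so the coefficient mostly reflects that one spike.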

azrael

(@azrael)

Honorable Member

Joined: 19 years ago

Posts: 656

11/10/2010 7:32 pm

Very pretty :-)

I've no stories in particular, but I really, really, really think that statistics and metrics have a role to play. (See me spouting here.) You might also like to have a look at the following:

Security Data Visualization - Greg Conti
Applied Security Visualization - Raffael Marty
Security Metrics - Andrew Jaquith

and Andrew's mailing list and website at www.securitymetrics.org

Audio

(@audio)

Estimable Member

Joined: 19 years ago

Posts: 149

Topic starter 11/10/2010 8:00 pm

Thanks for the links! I've read Security Data Visualization and plan on reading Security Metrics since I've heard good things about it. :)

I've actually been using basic descriptive statistics like min, max, sum, average, range, etc. without thinking about it. Now I've been reading a little more on statistics and am encountering equations that I know must have uses in investigations, but I'm having trouble thinking of what those uses are.

Various correlation equations let you quantify relationships. The correlation of 0.98 in sessions and alerts strongly links the two, or a correlation of -0.98 on data contained in a binary I would think would suggest it was encrypted.

There have to be a lot of examples of how these kinds of equations can help someone investigate an incident, but I'm having a brain block. :(

azrael

(@azrael)

Honorable Member

Joined: 19 years ago

Posts: 656

11/10/2010 8:31 pm

Remember:

"There are three kinds of lies: lies, damned lies, and statistics."

The trouble is that a correlation is not evidence of causation - don't forget that the decline in pirate numbers is directly responsible for global warming.

And this is where forensics and statistics usually part ways, I imagine! However, there is, I think, room for correlations between data sets where other evidence supports them, e.g.:

Suspect "A" has logged on to computer "C" at times "X", "Y" and "Z".
The websites "W1", "W2" and "W3" were defaced at times "X+n", "Y+n" and "Z+n".
There are files that are on computer "C" that showed up in the defacing.

Although no other evidence exists that Suspect "A" committed the crime, there is a correlation between the times that might lead us to believe that there may be more than coincidence in the events.

This is, however, an opinion, and isn't reliant on fact - so unless asked directly what you "might infer" from the above, it's beyond the realm of a statement of fact.
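One way to put a number on the "X+n" pattern in this scenario is to measure the delay between each defacement and the most recent login before it, then look at how tightly those delays cluster. The Python sketch below does that. All timestamps are hypothetical values invented for illustration, and `offsets_minutes` is a helper name introduced here. A small spread is consistent with the pattern described, but as the post says, it supports an inference and is not proof.

```python
from datetime import datetime
from statistics import mean, pstdev

# Hypothetical timestamps, invented purely for illustration
logins = [
    datetime(2010, 10, 1, 22, 5),
    datetime(2010, 10, 8, 23, 40),
    datetime(2010, 10, 15, 21, 15),
]
defacements = [
    datetime(2010, 10, 1, 22, 47),
    datetime(2010, 10, 9, 0, 19),
    datetime(2010, 10, 15, 21, 58),
]

def offsets_minutes(logins, events):
    """Minutes from the most recent login at or before each event (None if no such login)."""
    result = []
    for event in events:
        earlier = [login for login in logins if login <= event]
        if earlier:
            result.append((event - max(earlier)).total_seconds() / 60)
        else:
            result.append(None)
    return result

offsets = offsets_minutes(logins, defacements)
known = [o for o in offsets if o is not None]
print(f"offsets (min): {known}")
print(f"mean {mean(known):.1f} min, std dev {pstdev(known):.1f} min")
```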

azrael

(@azrael)

Honorable Member

Joined: 19 years ago

Posts: 656

11/10/2010 8:32 pm

Of course, your point about randomness in files and encryption is a valid practical application!

jelle

(@jelle)

Trusted Member

Joined: 18 years ago

Posts: 52

11/10/2010 8:46 pm

Another good use for statistics - not on a per-case basis but on an aggregated one - is for describing general information security trends.

Within our team we use the VERIS framework to aggregate statistics from all our breach investigations - a practical application is the Data Breach Investigations Report. It's a good example (in my - probably slightly biased - opinion) of how you can learn from other people's mistakes by using risk metrics for your decision making.

There is also a presentation from one of my colleagues about this subject on the Securitymetrics.org site that was already mentioned earlier.

benfindlay

(@benfindlay)

Estimable Member

Joined: 16 years ago

Posts: 142

11/10/2010 8:56 pm

"The correlation of 0.98 in sessions and alerts strongly links the two, or a correlation of -0.98 on data contained in a binary I would think would suggest it was encrypted."

Just wanted to check; your correlation here I assume is using a standard correlation co-efficient, such as the Product-Moment Correlation Co-efficient? If so, a strong negative correlation would not necessarily imply encryption. Encryption methods typically work by introducing as much entropy (chaos) as possible into the system involved, so (unless I am getting completely the wrong end of the stick) a result of zero would suggest encryption, or simply that there is actually no correlation. A result of -0.98 would suggest that there is a strong negative correlation between the 2 entities compared. In your example of snort attacks and sessions, I'm struggling to see how a strong negative correlation would occur, and what its significance would be.

Simplifying these further, a correlation of +1 is fundamentally a graph of y=x (showing perfect coincidence), whereas a correlation of -1 is fundamentally a graph of y=-x (showing that coincidence definitely does not occur, in fact the complete opposite!).

Just to clarify; a -ve correlation does not mean that there is no correlation, that is what zero represents.

How you determine whether a zero result represents encryption or simply nothing of interest would in itself be an interesting topic, but one that I would expect to be beyond the scope of an honours project.
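The entropy idea mentioned in this post is usually measured directly and not through a correlation coefficient. The following Python sketch computes Shannon entropy in bits per byte; `shannon_entropy` is a name introduced here, and `os.urandom` is used only as a stand-in for encrypted content. Values close to 8 are consistent with encrypted data, but compressed data scores similarly, so a high reading alone does not identify encryption.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # -sum(p * log2(p)) over the observed byte values
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

text = b"the quick brown fox jumps over the lazy dog " * 200
random_like = os.urandom(len(text))  # stand-in for encrypted or compressed data

print(f"plain text : {shannon_entropy(text):.2f} bits/byte")
print(f"random data: {shannon_entropy(random_like):.2f} bits/byte")
```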

pbobby

(@pbobby)

Estimable Member

Joined: 16 years ago

Posts: 239

11/10/2010 9:24 pm

Check out this paper by Geoff Black:

http://www.geoffblack.com/downloads/IQPC_E-Discovery_West%202010_(with_Notes)-Geoff_Black.pptx
Statistical Validation and Data Analytics in eDiscovery

And check out his blog too. It's cool, 'cause he links to mine :)

Audio

(@audio)

Estimable Member

Joined: 19 years ago

Posts: 149

Topic starter 11/10/2010 9:40 pm

Thanks for the warning! It sounds like statistics is similar to forensics: just as you can't jump to conclusions in statistics by saying correlation implies causation, you can't jump to conclusions in forensics by saying a suspect read a file because an artifact like the a-time changed. You have to test your hypothesis, which is something I haven't yet learned about.
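As a rough illustration of what testing a hypothesis can look like for the session and alert counts from the first post, here is a Python sketch of a permutation test. It asks how often a correlation at least as strong as the observed one appears when the alert counts are randomly re-paired with the session counts. The function names, the 2000 trials and the fixed seed are choices made for this sketch. It also assumes the observations are exchangeable, which is questionable for counts taken over consecutive time periods, so treat the result as indicative only.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, ys, trials=2000, seed=1):
    """Share of random re-pairings whose |r| is at least the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (trials + 1)

session = [2, 3, 26, 2, 7, 7, 27, 16, 5, 7, 16, 16, 9, 21, 10, 19, 14, 8, 8, 30,
           20, 3, 10, 0, 12, 4, 6, 7, 7, 6, 112, 236, 237, 605, 224, 13, 7, 14,
           1, 6, 21, 111, 7, 32, 19, 13, 20, 16, 38, 12]
snort = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 46, 100, 99, 187, 63, 0, 0, 0,
         0, 0, 0, 34, 0, 2, 1, 0, 0, 0, 1, 0]

p = permutation_p_value(session, snort)
print(f"permutation p-value: {p:.4f}")
```

A small p-value says the pairing is unlikely to be a chance alignment. It still says nothing about which series drives the other.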

Audio

(@audio)

Estimable Member

Joined: 19 years ago

Posts: 149

Topic starter 11/10/2010 9:42 pm

Thanks for setting me straight. I guess I need more than one weekend of statistics :) As for the sessions and alerts, I found the relationship first without the correlation coefficient. The Snort alerts about a password guessing attack started at the same time the sessions suddenly increased due to the password guessing attack. I established that they were related first by looking at the Snort and session logs, and then went and quantified how strongly they were related.

Is there a problem with doing it that way and do you know of any good intro to statistics books that you can recommend?
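The opening question also mentions z-scores, and the "sudden increase" in sessions described above is a natural place to try them. The Python sketch below flags the session counts from the first post that sit well above the mean. The 1.5 cut-off is an assumption chosen for this sketch, not an established threshold: the spike itself inflates both the mean and the standard deviation, so a stricter cut-off would miss part of the burst.

```python
from statistics import mean, pstdev

session = [2, 3, 26, 2, 7, 7, 27, 16, 5, 7, 16, 16, 9, 21, 10, 19, 14, 8, 8, 30,
           20, 3, 10, 0, 12, 4, 6, 7, 7, 6, 112, 236, 237, 605, 224, 13, 7, 14,
           1, 6, 21, 111, 7, 32, 19, 13, 20, 16, 38, 12]

def z_scores(values):
    """Distance of each value from the mean, in population standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined for a constant series")
    return [(v - mu) / sigma for v in values]

THRESHOLD = 1.5  # assumed cut-off for this sketch

# (position, count) pairs whose z-score exceeds the cut-off
spikes = [(i, v) for i, (v, z) in enumerate(zip(session, z_scores(session)))
          if z > THRESHOLD]
print(spikes)
```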
