How do you use statistics in investigations?

(@audio)
Estimable Member
Joined: 19 years ago
Posts: 149
Topic starter  

We know that programming plays a role in investigations, but how much of a role does statistics have? Does anyone have any stories where standard deviation, z-score, correlation, probability, etc. played a role in analysis?

I just started learning statistics, but here is an example from a brute-force attack on a honeypot, where I quantified the correlation between sessions and Snort alerts, which was 0.98 (very strongly related).

#!/usr/bin/perl
use warnings;
use strict;
# Statistics::Basic (CPAN) exports correlation() through its :all tag.
use Statistics::Basic qw(:all);

# Session counts and Snort alert counts for the same 50 observations.
my @session = qw(2 3 26 2 7 7 27 16 5 7 16 16 9 21 10 19 14 8 8 30 20 3 10 0 12
4 6 7 7 6 112 236 237 605 224 13 7 14 1 6 21 111 7 32 19 13 20 16 38 12);
my @snort = qw(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 46 100 99 187 63 0 0 0 0 0 0 34 0 2 1 0 0 0 1 0);

# Correlation coefficient between the two series.
my $cor = correlation( \@session, \@snort );
print "Correlation $cor\n";

Correlation 0.98
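
For anyone without the CPAN module, the figure can be reproduced by hand. This is a minimal sketch in Python rather than the Perl above, and it assumes the value printed is a Pearson product-moment coefficient; the function name `pearson` is illustrative only.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Data copied from the post above.
session = [
    2, 3, 26, 2, 7, 7, 27, 16, 5, 7, 16, 16, 9, 21, 10, 19, 14, 8, 8, 30, 20, 3, 10, 0, 12,
    4, 6, 7, 7, 6, 112, 236, 237, 605, 224, 13, 7, 14, 1, 6, 21, 111, 7, 32, 19, 13, 20, 16, 38, 12,
]
snort = [0] * 25 + [  # the first 25 observations had no alerts
    0, 0, 0, 0, 0, 46, 100, 99, 187, 63, 0, 0, 0, 0, 0, 0, 34, 0, 2, 1, 0, 0, 0, 1, 0,
]

r = pearson(session, snort)
print(f"Correlation {r:.2f}")
```

Worked by hand, the coefficient comes out at roughly 0.984, which rounds to the 0.98 reported above. Note that a handful of very large observations dominate both sums, so the figure mostly reflects the attack window.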


   
azrael
(@azrael)
Honorable Member
Joined: 19 years ago
Posts: 656
 

Very pretty :-)

I've no stories in particular, but I really, really, really think that statistics and metrics have a role to play. (See me spouting here.) You might also like to have a look at the following:

Security Data Visualization - Greg Conti
Applied Security Visualization - Raffael Marty
Security Metrics - Andrew Jaquith

and Andrew's mailing-list/website at www.securitymetrics.org


   
(@audio)
Estimable Member
Joined: 19 years ago
Posts: 149
Topic starter  

Thanks for the links! I've read Security Data Visualization and plan on reading Security Metrics since I've heard good things about it. :)

I've actually been using basic descriptive statistics like min, max, sum, average, range, etc. without thinking about it. Now I've been reading a little more on statistics and am encountering equations that I know have uses in investigations but am having trouble thinking of them.
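
As a concrete illustration of those descriptive statistics, here is a rough sketch in Python that summarises the session counts from the first post and flags unusually large values with a z-score. The threshold of 3 standard deviations is a common rule of thumb and an assumption here, not something taken from this thread.

```python
import statistics

# Session counts copied from the first post in this thread.
session = [
    2, 3, 26, 2, 7, 7, 27, 16, 5, 7, 16, 16, 9, 21, 10, 19, 14, 8, 8, 30, 20, 3, 10, 0, 12,
    4, 6, 7, 7, 6, 112, 236, 237, 605, 224, 13, 7, 14, 1, 6, 21, 111, 7, 32, 19, 13, 20, 16, 38, 12,
]

lowest, highest = min(session), max(session)
total = sum(session)
mean = statistics.mean(session)
value_range = highest - lowest
stdev = statistics.stdev(session)  # sample standard deviation

def z_scores(values):
    """Distance of each value from the mean, in sample standard deviations."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

# (index, value) pairs more than 3 standard deviations above the mean.
THRESHOLD = 3  # assumed rule of thumb
outliers = [
    (i, v)
    for i, (v, z) in enumerate(zip(session, z_scores(session)))
    if z > THRESHOLD
]

print(f"min={lowest} max={highest} sum={total} mean={mean:.2f} range={value_range}")
print(f"stdev={stdev:.2f} outliers={outliers}")
```

One caution: the attack itself inflates the mean and standard deviation, so only the most extreme observation clears the threshold. A baseline computed from quiet periods only would be more sensitive.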

Various correlation equations let you quantify relationships. The correlation of 0.98 in sessions and alerts strongly links the two, or a correlation of -0.98 on data contained in a binary I would think would suggest it was encrypted.

There have to be a lot of examples of how these kinds of equations can help someone investigate an incident, but I'm having a brain block. :(


   
azrael
(@azrael)
Honorable Member
Joined: 19 years ago
Posts: 656
 

Remember:

"There are three kinds of lies: lies, damned lies, and statistics."

The trouble is that a correlation is not evidence of causation - don't forget that the decline in pirate numbers is directly responsible for global warming.

And this is where forensics and statistics usually part ways, I imagine! However, there is, I think, room for correlations between data sets where other evidence supports them, e.g.

Suspect "A" has logged on to computer "C" at times "X", "Y" and "Z".
The websites "W1", "W2" and "W3" were defaced at times "X+n", "Y+n" and "Z+n".
There are files that are on computer "C" that showed up in the defacing.

Although no other evidence exists that Suspect "A" committed the crime, there is a correlation between the times that might lead us to believe that there may be more than co-incidence in the events.

This is, however, an opinion, and isn't reliant on fact - so unless asked directly what you "might infer" from the above - it's beyond the realm of a statement of fact.
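
The logon/defacement scenario above can be sketched in a few lines. This is only an illustration in Python: the timestamps are invented, the 12-minute offset stands in for "n", and the helper name `lags_after_logon` is hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps, invented purely for illustration.
logons = [
    datetime(2010, 3, 1, 9, 0),     # time "X"
    datetime(2010, 3, 4, 14, 30),   # time "Y"
    datetime(2010, 3, 9, 22, 15),   # time "Z"
]
# Defacements at "X+n", "Y+n" and "Z+n", with n assumed to be 12 minutes.
defacements = [t + timedelta(minutes=12) for t in logons]

def lags_after_logon(logon_times, event_times):
    """For each event, the delay since the most recent earlier logon (or None)."""
    lags = []
    for event in event_times:
        earlier = [t for t in logon_times if t <= event]
        lags.append(event - max(earlier) if earlier else None)
    return lags

lags = lags_after_logon(logons, defacements)
# True when every event follows a logon by the same delay.
constant_lag = None not in lags and len(set(lags)) == 1

print(lags)
print("constant lag:", constant_lag)
```

A consistent lag like this supports an inference and nothing more; as the post says, it stays in the realm of opinion unless other evidence backs it up.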


   
azrael
(@azrael)
Honorable Member
Joined: 19 years ago
Posts: 656
 

Of course, your point about randomness in files & encryption is a valid practical application!


   
(@jelle)
Trusted Member
Joined: 18 years ago
Posts: 52
 

Another good use for statistics - not on a per-case basis but on an aggregated basis - is for describing general information security trends.

Within our team we use the VERIS framework to aggregate statistics from all our breach investigations - a practical application is the Data Breach Investigations Report. Good example (in my - probably slightly biased - opinion) of how you can learn from other people's mistakes by using risk metrics for your decision making.

There is also a presentation from one of my colleagues about this subject on the Securitymetrics.org site that was already mentioned earlier.


   
benfindlay
(@benfindlay)
Estimable Member
Joined: 16 years ago
Posts: 142
 

audio wrote: "The correlation of 0.98 in sessions and alerts strongly links the two, or a correlation of -0.98 on data contained in a binary I would think would suggest it was encrypted."

Just wanted to check; your correlation here I assume is using a standard correlation co-efficient, such as the Product-Moment Correlation Co-efficient? If so, a strong negative correlation would not necessarily imply encryption. Encryption methods typically work by introducing as much entropy (chaos) as possible into the system involved, so (unless I am getting completely the wrong end of the stick) a result of zero would suggest encryption, or simply that there is actually no correlation. A result of -0.98 would suggest that there is a strong negative correlation between the 2 entities compared. In your example of snort attacks and sessions, I'm struggling to see how a strong negative correlation would occur, and what its significance would be.

Simplifying these further, a correlation of +1 is fundamentally a graph of y=x (showing perfect co-incidence), whereas a correlation of -1 is fundamentally a graph of y=-x (showing that co-incidence definitely does not occur, in fact the complete opposite!).

Just to clarify; a -ve correlation does not mean that there is no correlation, that is what zero represents.
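
The three cases can be checked with toy numbers. A minimal sketch in Python, assuming the Pearson coefficient; the data are made up for illustration and the function name `pearson` is not from any library.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation (assumes non-constant inputs)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]  # toy data

rising = pearson(x, [2 * v + 3 for v in x])   # straight line, positive slope
falling = pearson(x, [-v for v in x])         # straight line, negative slope
curved = pearson(x, [v * v for v in x])       # y = x^2, symmetric about zero

print(rising, falling, curved)
```

Two things are worth noting. Any straight line with a positive slope gives +1, not only y=x, and likewise any negative slope gives -1. And the y = x^2 case gives zero even though y is completely determined by x, so a zero coefficient rules out a linear relationship only.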

How you determine whether a zero result represents encryption or simply nothing of interest would in itself be an interesting topic, but one that I would expect to be beyond the scope of an honours project.
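
Since entropy came up: a measure often used for spotting encrypted or compressed content is byte-level Shannon entropy rather than a correlation coefficient. The sketch below, in Python, is a swapped-in technique and not what either poster described; random bytes from `os.urandom` stand in for ciphertext, and the function name is illustrative.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p) over the byte values that actually occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

constant = shannon_entropy(b"A" * 4096)
text = shannon_entropy(b"the quick brown fox jumps over the lazy dog " * 100)
random_like = shannon_entropy(os.urandom(65536))  # stand-in for ciphertext

print(f"constant={constant:.2f} text={text:.2f} random={random_like:.2f}")
```

Values close to 8 bits per byte are what you would expect from encrypted data, but compressed data looks much the same, so high entropy on its own does not prove encryption.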


   
pbobby
(@pbobby)
Estimable Member
Joined: 16 years ago
Posts: 239
 

Check out this paper by Geoff Black

http://www.geoffblack.com/downloads/IQPC_E-Discovery_West%202010_(with_Notes)-Geoff_Black.pptx
Statistical Validation and Data Analytics in eDiscovery

And check out his blog too. It's cool, 'cause he links to mine :)


   
(@audio)
Estimable Member
Joined: 19 years ago
Posts: 149
Topic starter  

Thanks for the warning! It sounds like statistics might be similar to forensics in that where you can't jump to conclusions in statistics by saying correlation implies causation, you can't jump to conclusions in forensics by saying a suspect read a file because an artifact like the a-time changed. You have to test your hypothesis which is something I haven't yet learned about.
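
One simple way to test such a hypothesis is a permutation test: shuffle one series many times and see how often chance alone produces an association as strong as the observed one. This is a rough sketch in Python using the data from the first post; the trial count, seed and function name are arbitrary assumptions.

```python
import random

# Data copied from the first post in this thread.
session = [
    2, 3, 26, 2, 7, 7, 27, 16, 5, 7, 16, 16, 9, 21, 10, 19, 14, 8, 8, 30, 20, 3, 10, 0, 12,
    4, 6, 7, 7, 6, 112, 236, 237, 605, 224, 13, 7, 14, 1, 6, 21, 111, 7, 32, 19, 13, 20, 16, 38, 12,
]
snort = [0] * 25 + [
    0, 0, 0, 0, 0, 46, 100, 99, 187, 63, 0, 0, 0, 0, 0, 0, 34, 0, 2, 1, 0, 0, 0, 1, 0,
]

def permutation_p_value(xs, ys, trials=2000, seed=1):
    """One-sided permutation test for a positive association between xs and ys.

    Shuffling ys leaves both means and variances unchanged, so comparing
    sum(x * y) is equivalent to comparing Pearson's r across shuffles.
    """
    rng = random.Random(seed)
    observed = sum(x * y for x, y in zip(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if sum(x * y for x, y in zip(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)

p = permutation_p_value(session, snort)
print(f"p-value estimate: {p:.4f}")
```

A small p-value only says the pairing is unlikely to arise from random shuffling. It still says nothing about causation, and consecutive observations in a log are rarely independent, so the result should be read as supporting evidence at best.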


   
(@audio)
Estimable Member
Joined: 19 years ago
Posts: 149
Topic starter  

Thanks for setting me straight. I guess I need more than one weekend of statistics :) As for the sessions and alerts, I found the relationship first without the correlation coefficient. The Snort alerts about a password guessing attack started at the same time the sessions suddenly increased due to the password guessing attack. I proved they were related first by looking at the Snort and session logs, and then went and quantified how much they were related.

Is there a problem with doing it that way, and do you know of any good intro to statistics books that you can recommend?


   