Hello,
I was wondering how many failed login attempts you think is worth investigating. The environment is a few hundred Unix servers, no Windows servers.
Currently I'm investigating whenever there are 5 or more failed login attempts within a span of 10 minutes. This method spawns dozens of new investigations a day, and so far has revealed zero malicious activity.
I'm thinking this attempts-per-time-window threshold should be retooled and would like your advice.
What do you think the value should be for Web facing boxes?
What do you think the value should be for intranet only boxes?
I'm thinking more along the lines of 10 or more failed logins in one minute would be worthy to investigate.
Any other suggestions would be greatly appreciated!
Edit: I realize this is a forensics forum, but I haven't really found another forum filled with infosec professionals. Also, after identifying a malicious attempt, the next step is a forensic investigation on the machines in question.
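For concreteness, the kind of rule being described (N failures from one source within a rolling T-minute window) can be sketched like this. This is a minimal illustration, not any particular product's logic; the event format is an assumption.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

THRESHOLD = 5                   # failures that trigger an investigation
WINDOW = timedelta(minutes=10)  # rolling window

def flag_bursts(events, threshold=THRESHOLD, window=WINDOW):
    """events: iterable of (timestamp, source) tuples parsed from auth logs.
    Yields (source, timestamp) whenever a source accumulates
    `threshold` failures inside a rolling `window`."""
    recent = defaultdict(deque)  # source -> timestamps still inside window
    for ts, source in sorted(events):
        q = recent[source]
        q.append(ts)
        while q and ts - q[0] > window:
            q.popleft()          # expire failures older than the window
        if len(q) >= threshold:
            yield source, ts
```

With 5 failures from one source spread over 4 minutes, the rule fires once, on the fifth failure.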
Greetings,
If the systems only have internal IP addresses, I'd look into any case where there are more than three failed logins in ten minutes. You might find the occasional script that is hitting a changed password, but that's worth solving, too.
I gave up worrying about failed logins on externally facing systems. Then again, whenever possible, I disabled any service that would allow remote login on an external IP.
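For reference, disabling password-based remote login on an externally exposed box is a few lines of `sshd_config`. This is a hedged example; the `ListenAddress` value is hypothetical, and you should adapt it to your own addressing and distro.

```
# /etc/ssh/sshd_config -- illustrative hardening, adjust to your environment
PermitRootLogin no          # no direct root login over SSH
PasswordAuthentication no   # keys only; removes password guessing entirely
ListenAddress 10.0.0.15     # hypothetical internal address; don't bind the external interface
```

With password authentication off, externally sourced brute-force noise stops being a triage problem at all.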
-David
Thanks for your quick reply, David.
Could you explain the reasoning for investigating 3+ failed logins in ten minutes? Is that part of some kind of security best practices?
So far I've investigated about 100 incidents, and every single one has been either a script using outdated credentials or a user error (e.g. a forgotten password).
Currently I'm investigating if there are 5 or more failed login attempts within the span of 10 minutes. This method spawns dozens of new investigations a day, and so far has revealed 0 malicious activities.
It depends on circumstances: how important is the information the system houses, what authentication methods are used, and what lock-out rules are in place? A system with secret and/or business-critical information should be treated differently from a file or terminal server. It also depends on the account being used: is it an ordinary user account, a sysadmin account, or the account that creates the local PKI infrastructure?
What do you think the value should be for Web facing boxes?
What do you think the value should be for intranet only boxes?
It should be what your organization needs them to be. If you have some kind of information classification system in place, start from there. If you don't, you'll need to get at least a rough idea of how sensitive a system or an account is, and set your limits accordingly. For very sensitive systems, you may have to accept that most investigations end up as user errors, just because you want to make sure you catch the single event in five years that is a deliberate intrusion attempt.
I'm thinking more along the lines of 10 or more failed logins in one minute would be worthy to investigate.
You haven't told us what the damage of a successful unauthorized login would be. If the damage is extremely high, you may not want to change the thresholds. If these are ordinary user accounts, then consider your threat model before you decide, as well as the general level of noise; as already noted, on an externally exposed system, be prepared for a fairly high noise level. (You still want to monitor that level, though: if it suddenly increases, or goes away entirely, you want to know.)
10/m seems more intended to detect automated guessing attempts – and that's important to discover no matter how sensitive the system/information/account is. Don't throw away raw data, though. If someone knew you investigate 10/m but not 9/m, it would be easy to mount a 9/m attack and not be discovered. You probably also want to detect 4/m events and, while not acting on them directly, feed them into the next layer of detection – if there are five 4/m events in a (rolling) period of 60 minutes on the same system, it might be another kind of blip.
But somewhere around here it becomes a cost/benefit problem which only you or your organization can solve.
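The two-layer idea described above (alert immediately on 10 failures/minute, but also remember sub-threshold 4/minute blips and escalate when several cluster inside a rolling hour) might be sketched like this. All names and thresholds are illustrative.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

FAST = 10                 # failures/minute that alert immediately
SLOW = 4                  # sub-threshold blip worth remembering
BLIPS = 5                 # how many slow blips in an hour escalate
HOUR = timedelta(hours=1)

def classify(per_minute_counts):
    """per_minute_counts: iterable of (minute_timestamp, host, failures).
    Yields ('alert'|'escalate', host, ts) events."""
    blips = defaultdict(deque)        # host -> timestamps of slow blips
    for ts, host, n in sorted(per_minute_counts):
        if n >= FAST:
            yield "alert", host, ts   # obvious automated guessing
        elif n >= SLOW:
            q = blips[host]
            q.append(ts)
            while q and ts - q[0] > HOUR:
                q.popleft()           # expire blips older than an hour
            if len(q) >= BLIPS:       # a 9/m-style low-and-slow attack
                yield "escalate", host, ts
```

The point of the second layer is exactly the one made above: an attacker who stays just under the fast threshold still accumulates blips and eventually surfaces.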
Where you are discovering broken scripts, surely there is value in identifying these and removing them from the equation – if you've seen 100 incidents and, say, 70 of these are broken scripts, are you now seeing a reduction? One would hope so, and until you are seeing nothing except user error / valid cracking attempts, I'd keep looking at them all.
When you're down to just those, then, on internal systems, lock the account after three or five incorrect attempts, and make the user phone the help desk to reset – you don't need to investigate these. If it's a user, then it's sorted. If it's a legitimate script, it should flag a failure, get fixed, and have its password reset – sorted. If it's an attempt to break in, then they've had five guesses and failed; after that they can go no further until the password is reset. These attempts flag up very quickly, because the end user will kick up a fuss, and the IT department will for broken scripts. Administrative accounts on UNIX shouldn't be accessible for remote login, so this isn't an issue (you shouldn't be able to log in as root … that's what sudo is for).
When you get a lot of issues from users/IT - then you should investigate.
(I concur with the above about disabling web logins, by the way – you'll see a lot of traffic there.)
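The lockout-after-a-few-attempts approach is typically done in PAM on Linux. Below is an illustrative `pam_faillock` configuration; exact file names, stacking, and option syntax vary by distro and PAM version, so treat this as a sketch rather than a drop-in config.

```
# /etc/pam.d/system-auth -- illustrative pam_faillock setup (RHEL-style)
auth        required      pam_faillock.so preauth deny=5 fail_interval=900
auth        [default=die] pam_faillock.so authfail deny=5 fail_interval=900
account     required      pam_faillock.so
```

Pair this with the help-desk procedure described above: a locked user calls in, and an admin clears the counter with `faillock --user NAME --reset`.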
Here is my frank approach.
When I sat at the helm and had to review hundreds of these noisy events on a daily basis, I was finally successful in persuading my management to back me up on the following premise:
Failed login attempts (as well as much of the other noise) are indicative of poor system administration practices. Your system administrators are not doing their jobs properly. Etc., etc.
Management loved this, the sys admins hated it. Because I then became the internal affairs for our IT group.
Rather than assuming all these events were malicious, I assumed they were simply lax sysadmin practices. And for the first six months, they were, 100%. After six months of emails, copying managers, and being reflected in monthly metrics, the situation cleaned itself up and our network environment was a lot cleaner.
The noise level dropped, and now this process has become commonplace. The system administrators in fact welcome these notifications, because they are now one of the earliest indicators that something has broken.
Again, it required management buy-in – so sell it as a business case. Management loves this kind of stuff.
Currently I'm investigating if there are 5 or more failed login attempts within the span of 10 minutes. This method spawns dozens of new investigations a day, and so far has revealed 0 malicious activities.
How the internet/intranet works and how manual/automated attacks work are totally different.
On the inside, you may want to look for N number of guesses per minute. But like mentioned above, N - 1 attempts will go unnoticed. On the outside, you have directed (fast) attacks that could come from anywhere (via proxies, botnets or other compromised hosts), as well as automated attacks that could try multiple passwords against multiple accounts over weeks.
What you need to do is sit down and create a baseline: what is normal and what is not. Educate users not to try to guess passwords, so that outside attacks will be easy to spot. Make a policy to restrict unnecessary remote access and voilà – fewer events.
Insiders are a totally different story. Here you have to look at user habits to determine whether a failed password guess was legit or not; e.g. a user working as a programmer who normally works from 07:30 to 17:45 will typically not access his/her account outside that time. Is the user coming back from a long vacation? Well, then expect lots of legit password attempts.
Also make use of the helpdesk: is the user known to change passwords often because the account gets locked, because the user is really bad at remembering passwords, or because they're a bad typist? Visit the user's desk – can you see indicators of password post-its? Then that is the norm.
You can also use fake accounts that just lie there in the AD and see if they are accessed; if they are, it is definitely time to investigate.
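One way to encode the habit baseline described above: record each user's normal active hours from past successful logins, and flag failures that fall well outside them (treating unknown accounts as always suspicious, which also covers the fake-account trick). A toy sketch; the baseline data and user names are invented.

```python
from datetime import datetime

# Hypothetical per-user baselines learned from past successful logins:
# user -> (earliest_hour, latest_hour) normally active.
BASELINE = {"alice": (7, 18)}   # roughly the 07:30-17:45 programmer

def is_unusual(user, ts: datetime, baseline=BASELINE):
    """True if a failed login falls outside the user's normal hours.
    Users with no baseline at all are always unusual -- that case also
    catches decoy/fake accounts that should never be touched."""
    if user not in baseline:
        return True
    lo, hi = baseline[user]
    return not (lo <= ts.hour <= hi)
```

A real deployment would learn the baseline from logs and account for vacations and schedule changes, but the triage logic stays this simple.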
Good luck.
Here you have to look at user habits to determine whether a failed password guess was legit or not; e.g. a user working as a programmer who normally works from 07:30 to 17:45 will typically not access his/her account outside that time. Is the user coming back from a long vacation? Well, then expect lots of legit password attempts.
For the record, programmers working from 07:30 to 17:45:
- are NOT programmers (those usually work from 20:12 to 04:23, on odd days and holidays, wink)
- are NOT allowed to take vacations, let alone long ones
- if they are programmers, they usually remember their password all right
Apparent probability level of mentioned example happening in real life between 0.47% and 0.63%.
Just for the record, it could happen 😯
http//
(though "not a common case")
:D
jaclaz
I'm just a student, but these blog posts might help.
World class forensics engineers are the ones who quickly and intelligently reduce millions of sessions to about a dozen worthy of deeper analysis.
What constitutes quickly? I suppose it depends on the tool being used to perform the analysis, but I’d generalize by saying no more than a couple minutes and/or the same number of clicks. We’ll see this in a moment.
What constitutes intelligently? We can answer this question by looking at a host-based forensics analogy. Suppose you were given the hard disk of a compromised machine and you needed to find the malware. There could be millions of files on the computer, so where do you start? Most of the time, especially for standard compromises, the following steps will work (this is an over-generalization, but one that works nonetheless):
1. Show only PE files (exe, dll, etc..). At this point you’ve probably gone from nearly a million to about 100,000.
2. Show only PE files outside the Program Files directory. Here you may go from about a hundred thousand files to tens of thousands.
3. Depending on the assumed time of compromise, show only those PE files modified or created in a specific range of days. At this point you should go from tens of thousands to less than 100.
4. Since malware tends to be smaller in size, show only those PE files less than 500k. At this point you should be looking at only a handful of files, and most of the time, the malware you’re looking for will be one of them.
In the above steps, you found malware NOT by looking for known traits of malware. You did it by examining general characteristics about file traits. In other words, by examining characteristics external to the file, not by searching for signatures or other characteristics internal to the file. Typically, each of those traits by itself is completely uninteresting until it is combined with other “uninteresting” traits, making them very interesting when layered together.
Source
Pre-Summary
Even though a large amount of the web traffic coming into your organization is gzip-compressed, making most inline/real-time security products totally “blind” to what’s inside, we can use standard forensic principles to identify which of those sessions are worth examination. In this case, we combined the following traits to reduce 50,000 network sessions to a single one:
1. Gzip’ed web content
2. Suspicious country
3. Uncommon webserver application
Once we drilled into that single session, we saw how trivial it was to use NetWitness to automatically decompress the content, extract it, then validate it as “bad.”
Source
Aside from just the number of login attempts per minute, are you seeing any other traits that can be used to separate the legitimate from the malicious? For example:
1. Failed passwords might be normal, but what about taking into account invalid user attempts?
2. Can you narrow it down by focusing on sources that also have failed logins across multiple hosts?
3. Any normal/suspicious SSH client banners if you have that data?
4. Can you narrow it down by the time of day or by the time between failed login attempts?
5. Signs of recon, like "sshd[5648]: Did not receive identification string from 111.222.333.444"?
6. Can you focus on investigating possible incidents involving more sensitive servers to be more productive?
7. Do the admin scripts stop after a certain number of failed logins? Can they be configured to stop and alert the admin of the problem after 2 failed attempts?
8. Is there a certain admin (or admins) creating these scripts who is the source of the problem?
etc.
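Several of these traits can be layered the same way the NetWitness example layered gzip + country + server: score each failed-login event on multiple weak indicators and only investigate the top scorers. A hedged sketch; the trait names and weights below are invented, not from any real product.

```python
# Combine weak indicators into a single triage score.
WEIGHTS = {
    "invalid_user": 3,      # the username doesn't exist at all
    "multi_host": 2,        # same source failing on several hosts
    "odd_hours": 1,         # outside the account's normal window
    "no_ident_string": 2,   # sshd "Did not receive identification string"
    "sensitive_host": 3,    # host houses business-critical data
}

def score(event: dict, weights=WEIGHTS):
    """Sum the weights of every indicator present on the event."""
    return sum(w for trait, w in weights.items() if event.get(trait))

def worth_investigating(events, cutoff=5):
    """Return events scoring at or above `cutoff`, highest first."""
    return sorted((e for e in events if score(e) >= cutoff),
                  key=score, reverse=True)
```

As in the disk-triage example, no single trait is conclusive on its own; the scoring just makes the layering explicit and tunable.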
Thank you kindly for the many thoughtful and insightful responses. It will take me some time to digest all of the information provided so far. A few things I can add about my scenario to clarify: we have over 40,000 employees, and the failed login attempts I am seeing are not from an IDS; they come from log aggregation software that spits out Failed Login reports daily. My influence isn't over the content of the reports; all I can do is advise on what we should be looking for in them. So my ability to correlate these events with other factors is quite limited. I will go through all the suggestions again with one of my senior coworkers to see what we can apply. Thanks again!