
Thoughts on testing tools

jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

At least as long as you are not prepared to argue that flipping a coin was the appropriate method.

No problem. 🙂
Flipping the coin has its merits.
On average it is correct about 50% of the time, i.e. its accuracy is predictable.
Whether that 50% accuracy (statistically), i.e. the corresponding 50% measurement uncertainty, is acceptable according to point 5.4.6 of the norm is another matter.

BUT now imagine a different setup for coin flipping.

You are not allowed to flip the coin, but you ask me (I am in another room and you cannot see what I am doing) to flip it and tell you the result. [1]

We just introduced some degrees of uncertainty: I may flip the coin and either tell you the truth or lie about the result, and - for that matter - I could even NOT flip the coin and tell you the result using the I-Ching (which is slightly less accurate than coin flipping 😉 ) or simply tell you "heads" or "tails" as I feel like.

Now, in order to have some sort of compliance with UNI 17025, you would need to verify what I am actually doing in the other room, and this you cannot do in the case of an automated proprietary tool.

If there are several competing tools that output the same kind of info, you might compare their results, but if both (or all three, or *all*) the tools are wrong you would still not be able to validate the results.

And what if two out of three tools output "heads" and the third "tails"?
Do you file a "minority report"? 😯

What happens in the case of a single tool (like the example that kacos just posted)?

Is the use (according to the vendor instructions) of a proprietary third-party tool a "non-standard method" (5.4.4) or (for each laboratory) a "laboratory-developed method" (5.4.3)?

What if the vendor provides no (or not detailed enough) instructions?

You have to ignore its results since you have no way to know if they are correct, and - since you are limited by UNI 17025 - you cannot, even in theory, use it before validation as a "non-standard" or "laboratory-developed" method.

More or less, in practice you will be limited to the use of the (I presume very, very few) tools (and methods) for which the manufacturer will provide UNI 17025 conformity, and even then (just as it will be an issue for 5.4.3 and 5.4.4 methods) you will need updated validation for each operating system release (and its updates) and for each piece of hardware.

jaclaz

[1] This is more or less what happens when you run an automated third-party tool.


   
kacos
(@kacos)
Trusted Member
Joined: 10 years ago
Posts: 93
 

ISO 17025 is about method validation and result verification. Not tool validation. I say again METHOD. As for calibration, that is mostly inapplicable to DF.

Now, in order to have some sort of compliance with UNI 17025, you would need to verify what I am actually doing in the other room, and this you cannot do in the case of an automated proprietary tool.

You forget 5.9.1 c), which is all about quality assurance of tests and their results:
"c) replicate tests or calibrations using the same or different methods"
which, in the case of the example I mentioned above, means (I assume) that all one needs to show is that, with the same source file and the same tool, another examiner can replicate the result.
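To make that concrete, a minimal sketch of such a replicate check (the `run_tool` wrapper and the file names are hypothetical placeholders, not a real command line): two independent runs on the same source file, compared by hash.

```python
import hashlib
import subprocess
from pathlib import Path

def sha256(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def run_tool(source: Path, output: Path) -> None:
    """Hypothetical wrapper: invoke the (possibly proprietary) tool on the source.
    The actual command line depends entirely on the tool in question."""
    subprocess.run(["some_tool", "--input", str(source), "--output", str(output)],
                   check=True)

def replicate_check(source: Path, out_a: Path, out_b: Path) -> bool:
    """5.9.1 c)-style replicate test: two runs (e.g. by two examiners) on the
    same source file should yield the same documented result."""
    run_tool(source, out_a)
    run_tool(source, out_b)
    return sha256(out_a) == sha256(out_b)

if __name__ == "__main__":
    same = replicate_check(Path("evidence.img"),
                           Path("run_examiner_A.csv"),
                           Path("run_examiner_B.csv"))
    print("replicated" if same else "results differ - investigate")
```

In practice the comparison would probably have to be made on the extracted artefacts rather than on the raw report file, since reports often embed run-specific metadata such as dates and examiner names.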

and 5.10, which is about documentation of the method and the results.


   
(@Anonymous 6593)
Guest
Joined: 17 years ago
Posts: 1158
 

There are cases (such as the Windows Media Player database CurrentDatabase_XXX.wmdb files) where there is only one tool available, and no documentation whatsoever from either the tool provider or the source (in this case Microsoft). How would you test/verify/double-check the results of such tools?

The tool's operation needs to be based on something. If no other reliable tool is available, that something would need to be observations, conclusions drawn from those observations, and perhaps also hypotheses. (A bit like the work done on the Windows prefetch/superfetch format over at forensicswiki, largely by Joachim Metz.) If such material is not available (even on request from the tool maker), the research process needs to start from the beginning, and testing or verification is clearly not possible until those results are at hand.

Otherwise, it's a question of creating test data and verifying that the expected results are forthcoming – preferably using original mechanisms (in this case, WMP) for the creation. (In some cases there may be alternative mechanisms: for the NTFS file system, say, we have one implementation from Microsoft, one from NTFS-3G, there seems to be one from Axis, etc. In those cases, additional work will be required.)

Once it is clear that test data for a test domain can be produced in a controlled and repeatable manner (see below for an example), it is possible to design a testing method and implement it. (Sometimes this will uncover assumptions that may not hold true, which need to be revisited, and the design and tests possibly updated.)
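As a very rough sketch of what "design a testing method and implement it" can boil down to in code — everything below (file names, metadata fields, the `extract` wrapper that parses the tool's output) is hypothetical and only for illustration — the core is a table of reference items whose properties are known by construction, and a field-by-field comparison against what the tool reports:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Expected:
    """One reference item: a file fed to the original mechanism (here WMP)
    together with the metadata it is known to have been created with."""
    name: str
    title: str
    duration_s: int

# Reference data created under controlled, documented conditions.
# (Values are placeholders; real ones come from the test-data creation step.)
EXPECTED = [
    Expected("track01.wma", "Test Track 1", 30),
    Expected("track02.wma", "Test Track 2", 45),
]

def run_test(extract: Callable[[str], dict], database_path: str) -> list[str]:
    """Run the tool (via the supplied `extract` wrapper, which parses the tool's
    output into {file name: {field: value}}) and compare field by field against
    what the test data is known to contain. Returns a list of discrepancies."""
    records = extract(database_path)
    failures = []
    for item in EXPECTED:
        rec = records.get(item.name)
        if rec is None:
            failures.append(f"{item.name}: not reported at all")
        elif rec.get("title") != item.title or rec.get("duration_s") != item.duration_s:
            failures.append(f"{item.name}: tool reported {rec}, expected {item}")
    return failures
```

The test protocol mentioned below would then record which reference set was used, how it was created, and the list of discrepancies returned.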

Finally, there needs to be some kind of test protocol to document the test, and some kind of evaluation criteria, though the latter depend very much on what the test is actually about, and may even depend on the individual organization.

The operation of the tool itself may need testing: does the tool need configuration to operate correctly, is such configuration persistent or does it need to be set up each time, is there a risk of misinterpretation, and are the relevant configurations documented in the tool output? These considerations will influence any SOPs related to that tool.

If the tool creator has any kind of test setup, like the disktype tool that comes with a small number of test images, that may be taken into account. If there is no test setup, and the tool maker cannot or does not explain how he tests his tool before making it available, that is also something that may need to be taken into consideration.

This is rather specialized work. To some extent it could be done once for every test situation, while other parts (tool configuration and operation) probably need to be done for every installation, and possibly also for every user. (In observational physics, the observer usually had to establish his 'personal equation' before being trusted: what systematic errors did he make? Colour blindness or astigmatism could be serious sources of error.)

In this case, it seems it might be possible to use WMP to create this database, and once the basic items were known, other tools might be used to create the different files and file metadata that the database was created from.

Example: Extracting NTFS time stamps from FILE_NAME attributes.

Last time I looked, these were undocumented by Microsoft, so basic research results need to be established. I find lots of claims, but very little solid foundation for those claims, so I suspect that either a) a thorough literature study is required to establish whether there are any well-founded claims, or b) the research actually needs to be done, or at least repeated and properly documented. Briefly, the data needs to be established to be timestamps, and not just leftover data.

Normal manipulation (by API call) seems to be impossible - or at least unknown - so a manipulation mechanism needs to be established (and verified) if further tests of extraction are to be possible. (Some results seem to be known – again, they need to be tested and verified on their own.)

But once there, extraction tests are fairly simple: does the tool identify the extracted data correctly (i.e. does it identify the correct fields of the time stamps?), and is the extracted information 'correct' in some sense – probably by minimum error, that is, the original time stamp and a time stamp created from the extracted representation should differ as little as possible, and preferably also be stable over the entire domain, or sub-domain, of the test data.
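For concreteness, a minimal decoding sketch, assuming the $FILE_NAME layout commonly reported in the forensics literature (four 64-bit FILETIME values at byte offsets 8-39 of the attribute body, counted in 100-nanosecond ticks since 1601-01-01 UTC) — exactly the kind of claim that, per the above, would still need to be verified against controlled test data rather than taken on faith:

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME epoch: 1601-01-01 00:00:00 UTC, counted in 100-nanosecond ticks.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(ticks: int) -> datetime:
    """Convert a 64-bit FILETIME tick count to a timezone-aware datetime
    (sub-microsecond precision is lost in the conversion)."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks / 10)

def filename_attribute_times(body: bytes) -> dict:
    """Decode the four timestamps from a $FILE_NAME attribute body,
    assuming the commonly documented field order and offsets (0x08-0x27)."""
    created, modified, mft_changed, accessed = struct.unpack_from("<4Q", body, 8)
    return {
        "created":     filetime_to_datetime(created),
        "modified":    filetime_to_datetime(modified),
        "mft_changed": filetime_to_datetime(mft_changed),
        "accessed":    filetime_to_datetime(accessed),
    }
```

An extraction test in the sense described above would then compare these decoded values with timestamps set through a verified manipulation mechanism and record the residual error.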

If there are shortcomings in the tool, additional method/process steps may need to be tested: does a particular analyst do the right thing when faced with a situation that the tool doesn't handle correctly? (Say, as in EnCase, where timestamps outside a fairly well-defined range are rendered as '' (blank). Or if lab ops require errors to be less than x, but the tool fails to stay within that error limit for some time ranges.)

Interpretation of the extracted timestamps is, for this example, another matter; however, if a tool provides such interpretation, that, too, must be backed up by solid research in a similar manner.


   
(@Anonymous 6593)
Guest
Joined: 17 years ago
Posts: 1158
 

We just introduced some degrees of uncertainty: I may flip the coin and either tell you the truth or lie about the result, and - for that matter - I could even NOT flip the coin and tell you the result using the I-Ching (which is slightly less accurate than coin flipping 😉 ) or simply tell you "heads" or "tails" as I feel like.

Which is exactly what tool testing is intended to catch. So before accepting your proposed tool (an oracle in another room), I would want to test it: send in data for which I already knew the expected answer, and see what result you produced.

If it works within the error margins that are considered acceptable, the tool would probably also be acceptable.
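As a minimal sketch of that kind of acceptance test (the `oracle` callable stands in for the tool — or the person in the other room — and the 5% margin is an arbitrary illustration, not a figure taken from the norm):

```python
import random
from typing import Callable

def measured_error_rate(oracle: Callable[[object], str],
                        known_cases: list) -> float:
    """Fraction of known-answer cases the oracle gets wrong."""
    wrong = sum(1 for data, expected in known_cases if oracle(data) != expected)
    return wrong / len(known_cases)

def acceptable(oracle: Callable[[object], str],
               known_cases: list,
               max_error: float) -> bool:
    """Accept the tool only if its measured error stays within the agreed margin."""
    return measured_error_rate(oracle, known_cases) <= max_error

# Example: a coin-flipping "oracle" measured against 1000 known answers hovers
# around 50% error, so it fails any reasonable margin.
cases = [(i, "heads") for i in range(1000)]
coin_flip = lambda _: random.choice(["heads", "tails"])
print(acceptable(coin_flip, cases, max_error=0.05))   # almost certainly False
```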

Now, in order to have some sort of compliance with UNI 17025, you would need to verify what I am actually doing in the other room, and this you cannot do in the case of an automated proprietary tool.

I'm not sure if you regard your 'other room' operations as a method (in which case they probably need to be part of the quality system, hence documented, and under established quality control), or as a tool (in which case the internal operations are of less importance – what matters is only that it produces results within the acceptable error limits). Methods often ensure that tools are operated correctly.

If the tool provides a faster way to extract data, or with lower error – which is usually the case – I would use old, slow methods to validate that it works. Once I have done that, I can use the tool instead, as long as I also ensure that its error rate doesn't increase.

If the tool provides entirely new data, not available by any other method, I cannot use it with any reasonable degree of confidence.

And what if two out of three tools output "heads" and the third "tails"?
Do you file a "minority report"?

Not sure what you refer to. But there will be no hiding of results: all three results would be presented as part of the result, probably with a summary stating the combined confidence level.

For each tool, I have (or should have) an error estimate, based on tool testing. If that expected error is sufficiently large that a second option (or a third one) is required, then that's what I'll get. If that information is inconsistent, it's still a result but the confidence with which I conclude (whatever) will be lower.

If it gets too low, I (or possibly someone else) may repeat the tests once or twice.
(All tests are reported.) If the result is still inconclusive, and the error estimate is below the limit I accept for reports, then no result will be forthcoming from this particular test.
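For what it's worth, one simple way to turn three inconsistent reports plus per-tool error estimates into such a "combined confidence" figure — assuming the tools err independently and their accuracies are known from prior testing, both assumptions that would themselves need to be justified — is a straightforward Bayesian combination:

```python
def combined_confidence(reports: list, accuracies: list,
                        hypothesis: str = "heads") -> float:
    """Posterior probability of `hypothesis` given each tool's report and its
    previously measured accuracy, assuming independent tools and a 50/50 prior."""
    p_if_true = p_if_false = 1.0
    for report, accuracy in zip(reports, accuracies):
        if report == hypothesis:
            p_if_true *= accuracy            # tool is right when hypothesis holds
            p_if_false *= (1 - accuracy)     # tool is wrong when it does not
        else:
            p_if_true *= (1 - accuracy)
            p_if_false *= accuracy
    return p_if_true / (p_if_true + p_if_false)

# Two tools say "heads", one says "tails"; all three are 90% accurate.
print(combined_confidence(["heads", "heads", "tails"], [0.9, 0.9, 0.9]))  # 0.9
```

All three raw results would still be reported alongside the combined figure, not replaced by it.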

What happens in the case of a single tool (like the example that kacos just posted)?

Simple: can it be validated to work within acceptable error limits or not?

Is the use (according to the vendor instructions) of a proprietary third-party tool a "non-standard method" (5.4.4) or (for each laboratory) a "laboratory-developed method" (5.4.3)?

Ask someone more into 17025 about that. I'm discussing tool testing, not methods or processes.

But as you framed your question, the answer must reflect that it can almost certainly be either or neither, depending on circumstances entirely under your control.

What if the vendor provides no (or not detailed enough) instructions?

If tests show unexpected or unacceptable variance between actual test outcome and expected outcome, the first issue to solve is almost always 'do I use the tool correctly? Are the requirements for use of the tool fulfilled? Is the power cord plugged in?' How you address that depends on circumstances – does the vendor provide training? Or any kind of technical support? (That in itself is a factor in any purchasing decision.)

If the vendor plays deaf or mute, the tool test report would probably report a test failure: this tool does not meet the established testing criteria. Or possibly that the criteria for starting the tests were not at hand. (Added: Or you may start a post-test method-research project: what settings or configurations affect the results? Then a follow-up test of those: do these, separately or together, produce better results? And then go back to the original test for a repeat. That is clearly a question of economy.)

And so that tool would either not be part of any method your lab has established (which is the best outcome), or you pass on the uncertainty, provided you can quantify it. But admitting to no better confidence in results than comes from flipping a coin is probably not a good thing to do …


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

@Athulin
We are discussing two different topics: one is the "common sense", usual validating procedures (that everyone currently uses, or should use) and that (mistakes aside) have generally speaking provided valid results till now; the other is formalizing them in such a way that they are UNI 17025 compatible.

The point is that if the examiner is honest, knowledgeable and prudent (besides accurate and responsible), he/she can provide (and has provided till now) valid results with an undetermined (but good enough) approximation. With the introduction of UNI 17025 (which, as said, is largely "wrong" for the digital forensics field) you need to measure this indeterminacy with a certain degree of accuracy, and this is either impossible or takes too much time for any practical use.

The issue is exactly the same as with good ol' ISO 9001: it is a norm that can be applied to industry and to "standardized", repetitive production where all factors are known, and it is mis-applied (and actually wrongly applied) in fields that produce few, custom or specific *whatever*, i.e. to artisans, craftsmen and the like.

@kacos
Repeatability is only a part of validation.
If I give you a simple formula, for the sake of the example
x^2=4
a tool may always provide the right answer x=2, but it may well miss that x=-2 is also a valid one.
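A toy version of that point in code: a 'tool' that always answers x = 2 passes any repeatability check, yet still fails a correctness check against the full solution set {-2, 2} — repeatability alone is not validation.

```python
def tool_solve(_equation: str) -> set:
    """A perfectly repeatable but incomplete 'tool': it always answers {2}."""
    return {2.0}

def correct_solutions() -> set:
    """The actual solution set of x^2 = 4."""
    return {2.0, -2.0}

runs = [tool_solve("x^2 = 4") for _ in range(10)]
print(all(r == runs[0] for r in runs))       # True: perfectly repeatable
print(runs[0] == correct_solutions())        # False: x = -2 is missing
```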

jaclaz


   
(@Anonymous 6593)
Guest
Joined: 17 years ago
Posts: 1158
 

The point is that if the examiner is honest, knowledgeable and prudent (besides accurate and responsible), he/she can provide (and has provided till now) valid results with an undetermined (but good enough) approximation.

Valid result – so how is validity determined? 'Undetermined but good enough' is a contradiction: the only way to decide 'good enough' is by fiat.

With the introduction of UNI 17025 (which, as said, is largely "wrong" for the digital forensics field) you need to measure this indeterminacy with a certain degree of accuracy, and this is either impossible or takes too much time for any practical use.

Impossible? – I'd like to see some concrete examples of that.

The issue is exactly the same as with good ol' ISO 9001: it is a norm that can be applied to industry and to "standardized", repetitive production where all factors are known, and it is mis-applied (and actually wrongly applied) in fields that produce few, custom or specific *whatever*, i.e. to artisans, craftsmen and the like.

That sounds like a basic misunderstanding of ISO 9001. In an earlier life, I worked with a company that started to use ISO 9001 for software engineering projects, where practically no two jobs were alike, each special to each customer. We were still able to say 'this is how we decide your requirements, and how we get your approval that we have understood your requirements, this is how we document and communicate project progress, this is how you can influence or modify project parameters, and this is how we manage our project' as well as 'this is our quality management process, and how we improve the things we just mentioned, and this is how we measure your satisfaction with what we are doing'. Those claims were what the ISO auditors checked. Not that the customer got what they had asked for, or that it was according to specifications, or fulfilled any particular requirements, or even that the customer was satisfied: we never made any claims to do any of that, so there was nothing for the auditors to check. (Counter-intuitive, yes, and one of the major mental hurdles we had to get over: the mental readjustment from what we thought was a product quality assurance standard to what – at that time at least, there have been updates since – was a *process* quality assurance standard took some time, and caused a lot of heat.)

If we said 'our method to decide how we design X is by flipping a coin', the auditor would have asked for proof that the coin was flipped, and what it showed. They would not have questioned the method.

After writing that, I decided to find out what 17025 actually said. As always, it's probably a question of reading and rereading more times than one thinks necessary, but based on what I learned from 9001 work …

A considerable part of 17025 is quality management, and process quality, which has very small connections to digital forensics. Not really a problem as such, except that it has to be done.

But as far as I can see from a quick read-through, there is nothing that seems to require standard testing methods: it seems fully possible to make those customer-specific (non-standard, probably) and to develop them on a customer-by-customer basis, using customer requirements to specify where uncertainty estimates are required. That at least makes them more concrete and easier to discuss, and to design according to customer needs and wants.

From a 17025 perspective, then, a statement that this is how testing methods are developed, and that at this early stage no methods have yet been fully developed, should probably be enough for a pass on that particular part. (You would probably need to identify pre-17025 jobs, pilot jobs done with limited compliance, and post-17025 jobs as well. So some kind of 'this is how we're going to introduce the new quality system' document would be useful.)

And until any methods have been created, testing the tools used in those methods … is not an issue.

From a general perspective, as for the requirements in 5.6 (Measurement traceability), it seems that only areas "having a significant effect on the accuracy or validity of the result of the test" are important. Again, those can probably be left to be identified by the normal operation of other parts of the quality system (improvement? corrective action?), unless there are well-known areas that are already covered in current business practice and can easily be handled.

But that's just my personal take on it.

One of the recommendations we got from the 9001 consultants we used was basically: do as little as possible prior to certification, and let the remaining things develop naturally as part of the operation of the quality system. We didn't follow that advice, and so we probably spent twice as much time and effort as we really needed for the certificate. Something like this is a long-term effort, and there is probably no reason to rush it.


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

That sounds like a basic misunderstanding of ISO 9001.

And that sounds vaguely offensive.

In an earlier life, I worked with a company that started to use ISO 9001 for software engineering projects, where practically no two jobs were alike, each special to each customer. We were still able to say 'this is how we decide your requirements, and how we get your approval that we have understood your requirements, this is how we document and communicate project progress, this is how you can influence or modify project parameters, and this is how we manage our project' as well as 'this is our quality management process, and how we improve the things we just mentioned, and this is how we measure your satisfaction with what we are doing'. Those claims were what the ISO auditors checked. Not that the customer got what they had asked for, or that it was according to specifications, or fulfilled any particular requirements, or even that the customer was satisfied: we never made any claims to do any of that, so there was nothing for the auditors to check. (Counter-intuitive, yes, and one of the major mental hurdles we had to get over: the mental readjustment from what we thought was a product quality assurance standard to what – at that time at least, there have been updates since – was a *process* quality assurance standard took some time, and caused a lot of heat.)

Which is good, as in an earlier life, in the '90s, I took part in the ISO 9001 certification of a large building company, and we were actually among the faster ones to be accredited under ISO 9001 – a little under six months from start to actual accreditation – with a very large number of operations and different methods and procedures.

I actually claim to have gained, at the time, a slightly more than basic understanding of ISO 9001.

ISO 9001 is (was) much more "vague" in a number of respects; ISO 17025 clearly follows the same base principles, but is aimed (expressly, and unlike ISO 9001, which was titled generically "Quality management systems") at a specific target: "General requirements for the competence of testing and calibration laboratories".

Now the issue here is that parts of the norm that can clearly apply to "testing and calibration laboratories" are not suitable for a digital forensics investigation.

Namely, once the few methods that fall under 5.4.2 (standard methods) are set aside, the problem comes with 5.4.3 and 5.4.4 (laboratory-developed and non-standard methods): the a)-k) list in 5.4.4 is daunting, but while items a)-i) represent more or less "common sense" and j) is doable, k) is (as I see it) a large question mark.

And the real problem comes with 5.4.5.

I am aware that it is possible to use the contents of Note 3

Validation is always a balance between costs, risks and technical possibilities. There are many cases in which the range and uncertainty of the values (e.g. accuracy, detection limit, selectivity, linearity, repeatability, reproducibility, robustness and cross-sensitivity) can only be given in a simplified way due to lack of information.

to provide a "simplified validation" in "selected cases", of course, but what about the rest?

With all due respect 🙂, this

From a 17025 perspective, then, a statement that this is how testing methods are developed, and that at this early stage no methods have yet been fully developed, should probably be enough for a pass on that particular part.

sounds midway between Socrates
https://en.wikipedia.org/wiki/I_know_that_I_know_nothing
and the common practice of studying to get a good grade at school (as opposed to learning something).

jaclaz


   