When we begin an evaluation, the process starts with a two-week setup period. We provide the vendor with access to a dedicated Microsoft Azure environment, configured identically across all vendors. The vendor installs and configures their tool in detect/alert-only mode; preventions, protections, and responses are out of scope for the evaluation. Next comes the hands-on portion of the evaluation, a three-day process. As a general guideline, Day One exercises our Cobalt Strike scenario, Day Two exercises our PowerShell Empire scenario, and Day Three serves as an overflow day for retesting steps as needed. During the hands-on portion, MITRE and the vendor stay in open communication, either via teleconference or in person. We announce each technique and procedure as it is executed, and the vendor shows us their detections and describes their process so that we can verify each detection. We take screenshots to provide proof of detection.
After the hands-on portion, MITRE processes the results. We apply detection categories, summarize the detections into short notes, and select screenshots to support those notes. We consider each vendor independently based on their capabilities, but we calibrate across vendors to ensure detection categories are applied consistently. We then provide our initial results to the vendor for a two-week review period. The vendor provides feedback for MITRE to consider, though we are not obligated to make further modifications based on that feedback. When reviewing the vendor's feedback, we again weigh how we apply detection categories across the entirety of the vendor's evaluation, as well as against the other vendors' results, to ensure our decisions are consistent and fair. Once the results are finalized, we publish them to our ATT&CK Evaluations website.
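As a rough illustration of the result-processing step described above, each executed procedure can be thought of as a record that pairs an ATT&CK technique with a detection category, a short note, and screenshot evidence. This is a minimal sketch, not MITRE's actual tooling; the schema, field names, and category labels below are hypothetical examples:

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical category labels for illustration only; the real
# evaluation defines and calibrates its own detection categories.
class DetectionCategory(Enum):
    NONE = "None"
    TELEMETRY = "Telemetry"
    GENERAL_BEHAVIOR = "General Behavior"
    SPECIFIC_BEHAVIOR = "Specific Behavior"

@dataclass
class DetectionResult:
    technique_id: str                 # ATT&CK technique ID, e.g. "T1059"
    procedure: str                    # how the red team executed the technique
    category: DetectionCategory       # category applied during processing
    note: str = ""                    # short summary of the detection
    screenshots: list = field(default_factory=list)  # proof of detection

# Example record, as might be produced after the hands-on portion:
result = DetectionResult(
    technique_id="T1059",
    procedure="PowerShell one-liner launched by the red team",
    category=DetectionCategory.TELEMETRY,
    note="Process creation event captured with full command line.",
    screenshots=["screenshot_012.png"],
)
```

A structure like this makes the later steps natural: the two-week vendor review becomes a pass over these records, and calibration across vendors is a comparison of how the same procedure was categorized in each vendor's result set.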