The evaluation focuses on articulating how detections occur, rather than assigning scores to vendor capabilities.
For the evaluation, we categorize each detection and capture notes about how those detections occur. We organize detections according to each technique. Techniques may have more than one detection if the capability detects the technique in different ways, and detections we observe are included in the results. While we make every effort to capture different detections, vendor capabilities may be able to detect procedures in ways that we did not capture. For a detection to be included for a given technique, it must apply to that technique specifically (i.e. just because a detection applies to one technique in a Step or Sub-Step does not mean it applies to all techniques of that Step). For proof of detection in each category, MITRE requires that the proof be provided to us, but we may not include all detection details in public results, particularly when those details are sensitive.
To determine the appropriate category for a detection, we review the screenshot(s) provided, notes taken during the evaluation, results of follow-up questions to the vendor, and vendor feedback on draft results. We also independently test procedures in a separate lab environment and review open-source tool detections and forensic artifacts. This testing informs what is considered to be a detection for each technique.
After performing detection categorizations, we calibrate the categories across all vendors to look for discrepancies and ensure categories are applied consistently. The decision of what category to apply is ultimately based on human analysis and is therefore subject to discretion and biases inherent in all human analysis, although we do make efforts to hedge against these biases by structuring analysis as described above.
There are nine possible detection categories. These categories are divided into two types: “Main” and “Modifier.” Each detection receives one “Main” category designation and may optionally receive one or more “Modifier” category designations.
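The "one Main, optionally several Modifiers" rule can be sketched as a small data model. This is only an illustration, not part of the evaluation tooling: the class and field names (`Main`, `Modifier`, `Detection`, `label`) are hypothetical, and the category names are taken from the definitions in this section.

```python
from dataclasses import dataclass, field
from enum import Enum


class Main(Enum):
    """The six Main detection categories."""
    NONE = "None"
    TELEMETRY = "Telemetry"
    INDICATOR_OF_COMPROMISE = "Indicator of Compromise"
    ENRICHMENT = "Enrichment"
    GENERAL_BEHAVIOR = "General Behavior"
    SPECIFIC_BEHAVIOR = "Specific Behavior"


class Modifier(Enum):
    """The three Modifier detection categories."""
    DELAYED = "Delayed"
    TAINTED = "Tainted"
    CONFIGURATION_CHANGE = "Configuration Change"


@dataclass(frozen=True)
class Detection:
    """One detection recorded against one technique (hypothetical structure)."""
    technique: str
    main: Main                                   # exactly one Main category
    modifiers: frozenset = field(default_factory=frozenset)  # zero or more Modifiers
    notes: str = ""

    def __post_init__(self):
        # Enforce the typing rule: one Main value, only Modifier values as modifiers.
        if not isinstance(self.main, Main):
            raise TypeError("a detection needs exactly one Main category")
        for m in self.modifiers:
            if not isinstance(m, Modifier):
                raise TypeError("modifiers must be Modifier values")

    def label(self) -> str:
        """Render the Main category followed by any Modifiers, sorted by name."""
        return ", ".join([self.main.value] + sorted(m.value for m in self.modifiers))
```

For example, `Detection("Credential Dumping", Main.SPECIFIC_BEHAVIOR, frozenset({Modifier.DELAYED})).label()` yields `"Specific Behavior, Delayed"`.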
We break the two category types down as follows:
Main Detection Types
None
The vendor is unable to detect red activity due to capability limitations or other reasons. If data is available that is not directly relevant to the procedure tested, this will be categorized as “None.” In these cases, a vendor may receive a categorization of “None” with additional notes and screenshots about that data.
Detection constraints: None is a Main detection type.
|No data is collected related to the particular action being performed.|
|An alert fires at the same time that the red team performs the procedure, but it is not related to the technique under test.|
|The tool allows a user to run forensic analysis of a file. This forensic analysis shows the file’s capabilities but does not directly show that a procedure was performed.|
Telemetry
The capability produces some minimally processed data that is accessible to an end user and directly indicates that the red team activity occurred after the user performs human analysis. There is no evidence of complex logic or an advanced rule leading to the data output, and no labeling occurs other than simple field labeling. The detection needs to be demonstrably and logically related to the actual procedure performed. Proof of detection could include the view, query, or API search used to access the data and/or the detection output (e.g., table view or process tree).
Detection constraints: Telemetry is a Main detection type. The Telemetry category is only applied once per procedure, though there may be multiple ways of using different Telemetry to find the events in question.
|Command-line output is produced that shows a certain command was run on a workstation by a given username.|
|The capability shows all the potential behaviors a malicious file could perform but does not indicate which behaviors actually occur. This is not a Telemetry detection because the detection is not demonstrably related to the procedure being performed.|
Indicator of Compromise
The vendor identifies the red team activity based on known hashes, IP addresses, C2 domains, tool names, tool strings, or module names. Proof of detection could include the rule name, API/query used to access the data, and/or detection output.
Detection constraints: Indicator of Compromise is a Main detection type.
|The red team C2 IP address is identified as malicious.|
Enrichment
The capability captures data (usually data as described above in the “Telemetry Available” category) and then enriches it with additional information such as a rule name, labels, tags, or ATT&CK tactics or techniques that would assist in a user’s analysis of the data beyond what would have been originally presented. The enrichment must be demonstrably and logically related to the ATT&CK technique under test. There is no evidence of complex logic or an advanced rule leading to the data output beyond a simple “if X, tag with Y” condition. Proof of detection could include the view, query, or API search used to access the data and/or the detection output (e.g., table view or process tree).
Detection constraints: Enrichment is a Main detection type.
|A simple rule looking for any execution of foo.exe produces an alert called “Foo Discovery Occurred” with the supporting data “cmd.exe foo”.|
|Data showing “cmd.exe foo” that is tagged with the information “Reconnaissance observed.”|
|Data showing “cmd.exe foo” that is tagged with the information “ATT&CK Technique T9999 Foo Discovery.”|
|Data showing “Process: cmd.exe foo” that is tagged with “Process created” counts as Telemetry rather than Enrichment because “Process created” would already be apparent to an analyst and is not related to the technique under test.|
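The simple “if X, tag with Y” condition that separates Enrichment from Telemetry can be sketched as follows. This is a minimal illustration under assumptions: the event layout (a dict with a `command_line` field), the `enrich` function, and the rule format are all hypothetical and do not describe any vendor's implementation.

```python
def enrich(event: dict, rules: list) -> dict:
    """Return a copy of the event with a tag added for each matching rule.

    Each rule is a (needle, tag) pair: if the needle appears as a token in the
    command line, the tag is attached. No further logic is applied, which is
    what keeps this at the Enrichment level rather than a behavior alert.
    """
    tokens = event["command_line"].split()
    tags = [tag for needle, tag in rules if needle in tokens]
    return {**event, "tags": tags}


# Hypothetical rule mirroring the "Foo Discovery Occurred" example above.
RULES = [("foo", "Foo Discovery Occurred")]

tagged = enrich({"command_line": "cmd.exe foo"}, RULES)
untagged = enrich({"command_line": "cmd.exe bar"}, RULES)
```

Here `tagged["tags"]` holds the rule name, so the event would be a candidate for Enrichment, while `untagged["tags"]` is empty and the event remains plain Telemetry. Whether a given tag actually relates to the technique under test is still a matter for human analysis.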
General Behavior
The capability produces an alert detection for suspicious or potentially malicious behavior based on some type of reported complex logic or rule (beyond a simple “if X, display Y Rule Name,” which would be categorized as Enrichment). The alert must be demonstrably and logically related to the technique under test, and the capability must have a visual designation indicating this is an alert (e.g. an icon indicating an alert, a notification, an impact score, or a similar visual representation). This detection may provide an ATT&CK “tactic”-level description (e.g. Discovery, Execution, etc.) and/or a general description indicating the behavior is anomalous but does not provide specific details on the procedure detected (i.e. a “black box”). (An ATT&CK reference is not necessary for a detection to be categorized in this way, though it would be noted if included.) Proof of detection could include the rule name or module that performs the detection as well as detection output.
Detection constraints: General Behavior is a Main detection type.
|An alert called “Malicious Discovery” is triggered on a series of discovery techniques. The alert has a score indicating the alert is likely malicious. The alert does not identify the specific type of discovery performed.|
|A “Suspicious File” alert triggered upon initial execution of the executable file.|
Specific Behavior
The capability detects suspicious behavior based on some complex rule or logic and provides an ATT&CK “technique”-level description of the activity (beyond a simple “if X, display Y Rule Name,” which would be categorized as Enrichment). (An ATT&CK reference is not necessary for a detection to be categorized in this way, though it would be noted if included.) The detection includes additional language and/or explanation to provide more detail beyond a general designation that the behavior is malicious. The alert must be demonstrably and logically related to the technique under test, and the capability must have a visual designation indicating this is an alert (e.g. an icon indicating an alert, a notification, an impact score, or a similar visual representation). Proof of detection could include the rule name or module that performs the detection as well as detection output.
Detection constraints: Specific Behavior is a Main detection type.
|The capability produces an alert named “UAC Bypass.” The alert contains details showing the capability detects a change in process integrity levels over a sequence of related events.|
|The capability produces an alert named “Credential Dumping” for Mimikatz logonpasswords credential dump.|
Modifier Detection Types
Delayed
The capability does not detect the activity in real-time or near-real-time when the red team executes the action, but subsequent alerts, data, enrichment, or additional processing produce a detection for the activity. The Delayed category is not applied to data based solely on the time it takes for regular detections to appear in the capability, nor is it applied due to range or connectivity issues that are unrelated to the capability itself. Proof of detection must include an explanation for the delayed detection and proof that could include other alerts, narrative, queries used to obtain the detection, and/or the detection output.
Detection constraints: Delayed is a Modifier detection type that is applied to a Main detection.
|The capability’s cloud service component uses machine learning algorithms that trigger a detection on credential dumping hours after the red team performs it. This detection would receive the Main detection category of Specific Behavior with a Modifier detection category of Delayed.|
|The capability sends data back to an analyst team, who manually analyze the raw data and create an alert called “Malicious Behavior Detected” that appears in the interface three hours after the red team performs the procedure. This detection will receive a Main detection category of General Behavior and a Modifier detection category of Delayed.|
Tainted
The capability detects the activity based on previously identified suspicious/malicious behavior that is related to or “tainted by” the detection. An example could include previously identifying a process as malicious and marking subsequent events related to that process as malicious, either by direct action/result or other relationship. Proof of detection must include clear visual evidence indicating that the detection uses tainted propagation, including what other activity has led to this detection.
Detection constraints: Tainted is a Modifier detection type that is applied to a Main detection.
|The capability produces a General Behavior alert for “Malicious Process Detected,” and then shows in a tree view that cmd.exe ran ipconfig. The ipconfig Telemetry detection is shown to be tainted due to a line going from the parent General Behavior detection to the child Telemetry detection, so the Telemetry detection would be categorized as both Telemetry and Tainted.|
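The tree-view example above can be sketched as a walk up a process tree: an event is a candidate for the Tainted modifier when one of its ancestors was previously flagged. This is an illustrative sketch only; the `parent_of` mapping, the `tainted_events` function, and the process names are hypothetical, and real capabilities may propagate taint through relationships other than parentage.

```python
def tainted_events(parent_of: dict, flagged: set) -> set:
    """Return the processes that have at least one flagged ancestor.

    parent_of maps each process to its parent (None for a root process).
    flagged holds processes that already carry a detection of their own.
    """
    tainted = set()
    for proc in parent_of:
        seen = set()                      # guards against malformed, cyclic data
        cur = parent_of.get(proc)
        while cur is not None and cur not in seen:
            if cur in flagged:
                tainted.add(proc)
                break
            seen.add(cur)
            cur = parent_of.get(cur)
    return tainted


# Hypothetical tree: a flagged process spawns cmd.exe, which runs ipconfig.
tree = {
    "malware.exe": None,
    "cmd.exe": "malware.exe",
    "ipconfig": "cmd.exe",
    "notepad.exe": None,
}
result = tainted_events(tree, flagged={"malware.exe"})
```

In this sketch `result` contains `cmd.exe` and `ipconfig` but not `notepad.exe`, mirroring how the ipconfig Telemetry detection in the example is categorized as both Telemetry and Tainted.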
Configuration Change
A detection is made possible by a special configuration change or additional API access that allows data not normally accessible to the end user to become available. This category is applied to detections that are produced because of changes made to the initial configuration (as described by the vendor at the start of the evaluation). Proof of detection should include the output of the detection that results from the change along with notes about the reason for the change, how it is changed, and how an end user could request it.
Detection constraints: Configuration Change is a Modifier detection type that is applied to a Main detection.
|Data showing account creation is collected on the backend but not displayed to the end user by default. The vendor changes a backend setting to allow Telemetry on account creation to be displayed in the user interface, so a detection of Telemetry and Configuration Change would be given for the Create Account technique.|
|The vendor toggles a setting that would display an additional label of “Discovery” when the foo, foo1, and foo2 discovery commands are executed. A detection of Enrichment and Configuration Change would be given (as opposed to a detection of Telemetry that would have been given before the change).|
|A rule or detection logic is created and applied retroactively or is later retested to show functionality that exists in the capability.|
We differentiate between types of detection, but we do not weigh, score, or rank the types of detection. This approach allows end users of the results to determine what they value most in a detection (e.g. some organizations may want telemetry, while others would want specific behavioral detection).
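A reader of the results who wants to apply their own priorities could start from a plain count of detections per Main category, with no weights attached. The sketch below assumes a hypothetical list of `(technique, main_category)` pairs; the sample rows are invented for illustration and are not evaluation results.

```python
from collections import Counter


def tally(detections: list) -> Counter:
    """Count detections per Main category without weighting or ranking them."""
    return Counter(main for _technique, main in detections)


# Invented sample data, for illustration only.
sample = [
    ("Credential Dumping", "Specific Behavior"),
    ("Create Account", "Telemetry"),
    ("System Network Configuration Discovery", "Telemetry"),
]
counts = tally(sample)
```

Here `counts["Telemetry"]` is 2 and `counts["Specific Behavior"]` is 1. How much each category matters is left to the organization reading the results.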