
Round 2 Detection Categories


The evaluation focuses on articulating how detections occur, rather than assigning scores to vendor capabilities.

For the evaluation, we categorize each detection and capture notes about how it occurs. Detections are organized by technique. A technique may have more than one detection if the capability detects it in different ways, and all detections we observe are included in the results. While we make every effort to capture different detections, vendor capabilities may be able to detect procedures in ways that we did not capture. For a detection to be included for a given technique, it must apply to that technique specifically (i.e., a detection that applies to one technique in a Step or Sub-Step does not necessarily apply to all techniques of that Step). For proof of detection in each category, MITRE requires that the proof be provided to us, but we may not include all detection details in public results, particularly when those details are sensitive.


[Figure: Round 2 detection categories, described in detail below]


To determine the appropriate category for a detection, we review the screenshot(s) provided, notes taken during the evaluation, results of follow-up questions to the vendor, and vendor feedback on draft results. We also independently test procedures in a separate lab environment and review open-source tool detections and forensic artifacts. This testing informs what is considered to be a detection for each technique.

After performing detection categorizations, we calibrate the categories across all vendors to look for discrepancies and ensure categories are applied consistently. The decision of what category to apply is ultimately based on human analysis and is therefore subject to discretion and biases inherent in all human analysis, although we do make efforts to hedge against these biases by structuring analysis as described above.

For Round 2, there are 13 possible detection categories. These categories are divided into two types: “Main” and “Modifier.” Each detection receives one “Main” category designation and may optionally receive one or more “Modifier” category designations.
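The structure just described (exactly one Main category per detection, plus zero or more Modifiers, 13 categories in all) can be sketched as a simple record format. This is a hypothetical schema for illustration only; the evaluation publishes no machine-readable format. Category names follow the Round 2 results, and sub-typed modifiers such as Delayed-Manual are written with a hyphenated suffix.

```python
from dataclasses import dataclass, field
from typing import List

# Round 2 category names as used in the published results. This schema and
# the helper below are illustrative sketches, not an official format.
MAIN_CATEGORIES = ["None", "Telemetry", "MSSP", "General", "Tactic", "Technique"]
MODIFIER_CATEGORIES = ["Alert", "Correlated", "Delayed", "Host Interrogation",
                       "Residual Artifact", "Configuration Change", "Innovative"]

@dataclass
class Detection:
    technique: str                 # the ATT&CK technique the detection applies to
    main: str                      # exactly one Main category
    modifiers: List[str] = field(default_factory=list)  # zero or more Modifiers

def is_valid(det: Detection) -> bool:
    """Check that categories are drawn from the Round 2 vocabulary.
    Sub-typed modifiers (e.g. "Delayed-Manual") are checked by their base name."""
    if det.main not in MAIN_CATEGORIES:
        return False
    return all(m.split("-")[0] in MODIFIER_CATEGORIES for m in det.modifiers)
```

For example, a credential-dumping alert produced hours after execution by cloud-side processing would be recorded as `Detection("Credential Dumping", "Technique", ["Delayed-Processing"])`.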

We break the two category types down as follows:

Main Detection Types

None

No data was automatically collected, processed, and made available within the capability related to the behavior under test. If data is available that is not directly relevant to the procedure tested, the detection is still categorized as "None"; in these cases, a vendor may receive a categorization of "None" with additional notes and screenshots about that data.

Examples
No data is collected related to the particular action being performed.
An alert fires at the same time that the red team performs the procedure, but it is not related to the technique under test.

Telemetry

Minimally processed data collected by the capability showing that event(s) occurred specific to the behavior under test (i.e., showing the procedure/command that was executed). Evidence must show definitively that the behavior occurred (did happen vs. may have happened) and must be related to the execution mechanism that caused the behavior. There is no evidence of complex logic or an advanced rule leading to the data output, and no labeling occurred other than simple field labeling.

Example
Command-line output is produced that shows a certain command was run on a workstation by a given username.

MSSP

Data is presented by a managed security service provider (MSSP) or monitoring service based on human analysis, indicating that an incident occurred. MSSP detections are inherently delayed due to the manual analysis required and will be marked as Delayed to remain consistent with other delayed detections.

Example
An email was received from an analyst describing the context of actions related to data exfiltration.

General

Processed data specifying that malicious/abnormal event(s) occurred, with relation to the behavior under test (e.g., flagging "cmd.exe /c copy cmd.exe sethc.exe" as abnormal/malicious activity, or an identifier stating that "suspicious activity occurred"). No or limited details are provided as to why the action was performed (tactic) or how the action was performed (technique).

Examples
An alert describing "cmd.exe /c copy cmd.exe sethc.exe" as abnormal/malicious activity, but not stating that it is related to Accessibility Features or giving a more specific description of what occurred.
A "Suspicious File" alert triggered upon initial execution of the executable file.
An alert stating that "suspicious activity occurred" related to an action, with no further detail.

Tactic

Processed data specifying an ATT&CK Tactic or equivalent level of enrichment to the data collected by the capability. This gives the analyst information on the potential intent of the activity, or helps answer the question "why would this be done" (e.g., persistence was set up, or there was a sequence of Discovery commands).

Examples
An alert called “Malicious Discovery” is triggered on a series of discovery techniques. The alert has a score indicating the alert is likely malicious. The alert does not identify the specific type of discovery performed.
An alert describing that persistence occurred but not specifying how persistence was achieved.

Technique

Processed data specifying an ATT&CK Technique or equivalent level of enrichment to the data collected by the capability. This gives the analyst information on how the action was performed, or helps answer the question "what was done" (e.g., Accessibility Features or Credential Dumping).

Examples
An alert called "Credential Dumping" is triggered with enough detail to show what process originated the behavior against lsass.exe and/or provides detail on what type of credential dumping occurred.
An alert for "Lateral Movement with Service Execution" is triggered describing what service launched and what system was targeted.

Modifier Detection Types

Alert

Data is presented as a priority notification to the analyst as an indication of a suspicious or malicious event occurring that warrants further investigation (e.g., icon, queue, highlight, popup). Not a modifier of Telemetry.

Examples
A visual notification occurred in a dashboard and/or alert queue that "Lateral Movement" occurred.
A recognizable identifier populated to a dashboard so that an analyst recognizes that a high severity event may have occurred.

Correlated

Data is presented as descendant of events previously identified as suspicious/malicious, based on an alert or high-suspicion activity that was brought to the attention of an analyst. Examples of correlation evidence include annotated process trees or tags applied to chains of events showing the relationship between the suspicious/malicious event and data from the technique under test.

Examples
A process tree or chain of events is annotated showing the relationship between a net.exe process and a prior alert on "Credential Dumping".
Telemetry in a dashboard shows a relationship between a process launch for ipconfig.exe and a prior alert on an IOC, showing the lineage of activity.

Delayed

Detection (alerts, telemetry, tactic, technique, etc.) is unavailable due to some factor that slows or defers its presentation to the user; for example, subsequent or additional processing produces a detection for the activity. The Delayed category is not applied for normal automated data ingestion and routine processing taking minimal time for data to appear to the user, nor is it applied for range or connectivity issues unrelated to the capability itself. The Delayed modifier will always be applied with sub-modifiers describing the nature of the delay.

Delayed is subdivided into:

  • Manual – Processing was triggered by human action and not initiated automatically. In the case of detections provided by an MSSP, human analysts reviewed and produced the outputs that were later presented to an analyst.
  • Processing – Detection incurred a delay based on additional data processing to apply complex logic to the events, where the results were later available to an analyst.
Examples
The capability’s cloud service component uses machine learning algorithms that trigger a detection on credential dumping hours after the red team performs it. This detection would receive the Main detection category of Technique with a Modifier detection category of Delayed-Processing.
The capability sends data back to an analyst team, who manually analyze the raw data and create an alert called “Malicious Behavior Detected” that appears in the interface three hours after the red team performs the procedure. This detection will receive a Main detection category of MSSP and a Modifier detection category of Delayed-Manual.

Host Interrogation

Data is manually pulled from an endpoint via the capability for analysis. This category represents possible behavior identified through manual analysis of data that is not automatically ingested and analyzed by the capability to show an analyst that event(s) occurred specific to the behavior under test. Though useful to some security teams, capturing data through these means may be difficult and/or depend on the skill level of the analyst to derive actionable information. Host Interrogation is a modifier that will only apply to the None category. It will be marked as Delayed to remain consistent with other delayed detections.

Example
There is a remote shell component to the capability that can be used to pull native OS logs from a system suspected of being compromised for further analysis.

Residual Artifact

Data, such as a binary or process memory, that requires additional analysis to determine what capabilities or behaviors may have been used. This category represents possible behavior identified through manual analysis of data that is not automatically ingested and analyzed by the capability to show an analyst that event(s) occurred specific to the behavior under test. The collected data is more a byproduct of adversary actions and less indicative of adversary behavior. Though useful to some security teams, capturing data through these means may be difficult and/or depend on the skill level of the analyst to derive actionable information. Residual Artifact is a modifier that will only apply to the None category. It will be marked as Delayed to remain consistent with other delayed detections.

Examples
Process memory of svchost.exe was collected for later analysis because it was identified as a suspicious process. Later strings analysis showed that there may have been a keylogger present on the system.
PowerShell scripts are collected automatically upon execution. Later analysis of a script shows it contains the functionality to capture a screenshot of the user's desktop.
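The pairing rules stated for these two modifiers (they apply only to the None category and are always marked as delayed), together with the requirement that Delayed carry a sub-type, can be expressed as a small consistency check. This is an illustrative sketch, assuming a detection is represented as a main-category string plus a list of modifier strings; the helper name is hypothetical.

```python
# Modifiers that only apply to the "None" main category, per the text above.
NONE_ONLY = {"Host Interrogation", "Residual Artifact"}

def check_modifier_rules(main, modifiers):
    """Return a list of rule violations; an empty list means consistent."""
    problems = []
    for m in modifiers:
        base = m.split("-")[0]
        if base in NONE_ONLY:
            # Host Interrogation / Residual Artifact only modify "None" ...
            if main != "None":
                problems.append(f"{base} may only modify the None category")
            # ... and are marked as delayed, consistent with other delayed detections.
            if not any(x.startswith("Delayed") for x in modifiers):
                problems.append(f"{base} must also be marked as delayed")
        if m == "Delayed":
            # Delayed is always applied with a sub-type describing the delay.
            problems.append("Delayed requires a sub-type (Manual or Processing)")
    return problems
```

For example, `check_modifier_rules("None", ["Residual Artifact", "Delayed-Manual"])` reports no violations, while applying Host Interrogation to a Telemetry detection is flagged.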

Configuration Change

The configuration of the capability was changed since the start of the evaluation. This may be done to show that additional data can be collected and/or processed. The Configuration Change modifier may be applied with additional modifiers describing the nature of the change.

Configuration Change is subdivided into:

  • UX – Change was to the user experience and not to the capability's ability to detect behavior. Changes could include display of a certain type of data that was already collected but not visible to the user.
  • Detection – Change was to the capability's ability to capture or process information that impacts its ability to detect adversary behavior. Changes could include collecting a new type of information by the sensor or new processing logic that was deployed.
Examples
Data showing account creation is collected on the backend but not displayed to the end user by default. The vendor changes a backend setting to allow Telemetry on account creation to be displayed in the user interface, so a detection of Telemetry and Configuration Change-UX would be given for the Create Account technique.
The vendor toggles a setting that would display an additional label of “Discovery” when the foo, foo1, and foo2 discovery commands are executed. A detection of Tactic and Configuration Change-Detection would be given (as opposed to a detection of Telemetry that would have been given before the change).
A rule or detection logic is created and applied retroactively or is later retested to show functionality that exists in the capability. This would be labeled with a modifier Configuration Change-Detection.

Innovative

Designation applied to innovative and useful ways to detect a technique under test. Not all techniques will have, or can receive, this designation applied to vendor solutions. It is meant to highlight accurate and robust approaches that bring value and deeper insight to consumers. This modifier is applied at the Evaluation Team's discretion and takes into account the data collected, method of detection, accuracy of detection, context provided to the end user, and display of information.

Changes from Round 1

The Indicator of Compromise (IOC) category was removed. IOCs relate to known-bad identifiers that commonly target adversary tools and infrastructure. By definition they do not relate to behaviors that are the focus of this type of evaluation. IOC-based alerts will still be used for correlation of events, but will not be valid for main detections.
The Enrichment category was removed. The idea of enrichment as a way to build up relevant information on top of raw data collected by the capability continues in how General, Tactic, and Technique were defined but it is no longer itself a main detection category.
The General Behavior category was removed. General Behaviors from Round 1 will now be categorized as Tactic detections to provide a clearer definition of the level of detail the detection provides. The General category for Round 2 is to categorize nondescript suspicious activities (an alert stating “suspicious activity occurred” without further detail) and is not the same as General Behaviors from Round 1.
The Specific Behavior category was removed. Specific Behaviors will now be categorized as Technique detections to provide a clearer definition of the level of detail the detection provides.
The MSSP category was added. Labeling MSSP detections separately from other detection types provides more distinction around when an MSSP contributed to a detection than in Round 1, when they were listed only as Specific Behavior or General Behavior. The evaluation process is designed to assess vendor capabilities and not human analysts.
The Tainted modifier was changed to Correlated to provide a clearer label to how the modifier is applied to the results. In Round 1, Tainted was intended to be a useful data point, but the negative connotation of the word was confusing to users of the results.
The Host Interrogation and Residual Artifact modifiers were added to identify capabilities that do not meet the threshold of an automated detection (each applied only as a modifier to None). They are intended to cover capabilities that are relatively common among endpoint detection and response tools.
The Delayed modifier was updated with two sub-types to provide more information about why an event is delayed. This will help users of the results to better assess information without needing to dig into the notes and footnotes to understand why something was delayed.
The Configuration Change modifier was updated with two sub-types to provide more information about what was changed. This will help users of the results to better assess information without needing to dig into the notes and footnotes to understand what was changed and how it may have impacted the results.
We differentiate between types of detection to provide more context around the capabilities a vendor offers in a way that allows end users to weigh, score, or rank the types of detection against their needs. This approach allows end users of the results to determine what they value most in a detection (e.g. some organizations may want telemetry, while others would want Technique detection).