APT29 Evaluation: Detection Categories
The evaluation focuses on articulating how detections occur, rather than assigning scores to vendor capabilities.
For the evaluation, we categorize each detection and capture notes about how those detections occur. We organize detections according to each technique. A technique may have more than one detection if the capability detects it in different ways, and all detections we observe are included in the results. While we make every effort to capture different detections, vendor capabilities may be able to detect procedures in ways that we did not capture. For a detection to be included for a given technique, it must apply to that technique specifically (i.e., a detection that applies to one technique in a Step or Sub-Step does not necessarily apply to all techniques of that Step). For each category, we require that proof of detection be provided to us, but we may not include all detection details in public results, particularly when those details are sensitive.
To determine the appropriate category for a detection, we review the screenshot(s) provided, notes taken during the evaluation, results of follow-up questions to the vendor, and vendor feedback on draft results. We also independently test procedures in a separate lab environment and review open-source tool detections and forensic artifacts. This testing informs what is considered to be a detection for each technique.
After performing detection categorizations, we calibrate the categories across all vendors to look for discrepancies and ensure categories are applied consistently. The decision of what category to apply is ultimately based on human analysis and is therefore subject to discretion and biases inherent in all human analysis, although we do make efforts to hedge against these biases by structuring analysis as described above.
For the APT29 evaluation, there are 13 possible detection categories. These categories are divided into two types: “Main” and “Modifier.” Each detection receives one “Main” category designation and may optionally receive one or more “Modifier” category designations.
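As a rough illustration of this structure, the following Python sketch models a detection as exactly one Main category plus an optional set of Modifier categories. The class and field names (`Main`, `Modifier`, `Detection`) are hypothetical, and the enums list only categories that this section names explicitly, not all 13.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Main(Enum):
    # Subset of Main categories named in this section (illustrative only).
    NONE = "None"
    TELEMETRY = "Telemetry"
    MSSP = "MSSP"
    TACTIC = "Tactic"
    TECHNIQUE = "Technique"


class Modifier(Enum):
    # Subset of Modifier categories named in this section (illustrative only).
    DELAYED = "Delayed"
    HOST_INTERROGATION = "Host Interrogation"
    RESIDUAL_ARTIFACT = "Residual Artifact"
    CONFIGURATION_CHANGE = "Configuration Change"


@dataclass(frozen=True)
class Detection:
    technique: str                                # technique the detection applies to
    main: Main                                    # exactly one Main category
    modifiers: FrozenSet[Modifier] = frozenset()  # zero or more Modifier categories

    def __post_init__(self) -> None:
        if not isinstance(self.main, Main):
            raise TypeError("a detection needs exactly one Main category")
        if not all(isinstance(m, Modifier) for m in self.modifiers):
            raise TypeError("modifiers must be Modifier members")


# A technique may have several detections if it is detected in different ways.
results = [
    Detection("Credential Dumping", Main.TELEMETRY),
    Detection("Credential Dumping", Main.TECHNIQUE, frozenset({Modifier.DELAYED})),
]
```

Making `main` a single required field and `modifiers` a possibly empty set mirrors the "one Main, optional Modifiers" rule directly in the type, so a record without a Main category cannot be constructed.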
We break the two category types down as follows:
Main Detection Types
None: No data related to the behavior under test was automatically collected, processed, and made available within the capability. If data is available that is not directly relevant to the procedure tested, this will be categorized as “None.” In these cases, a vendor may receive a categorization of “None” with additional notes and screenshots about that data.
|No data is collected related to the particular action being performed.|
|An alert fires at the same time that the red team performs the procedure, but it is not related to the technique under test.|
Telemetry: Minimally processed data collected by the capability showing that event(s) occurred specific to the behavior under test (e.g., showing the procedure/command that was executed). Evidence must show definitively that the behavior occurred (did happen vs. may have happened) and be related to the execution mechanism. Evidence must be related to what caused the behavior. There is no evidence of complex logic or an advanced rule leading to the data output, and no labeling occurred other than simple field labeling.
|Command-line output is produced that shows a certain command was run on a workstation by a given username.|
MSSP: Data is presented from a managed security service provider (MSSP) or monitoring service, based on human analysis, indicating that an incident occurred. MSSP detections are inherently delayed due to the manual analysis necessary and will be marked as delayed to remain consistent with other delayed detections.
|An email was received from an analyst describing the context of actions related to data exfiltration.|
General: Processed data specifying that malicious/abnormal event(s) occurred, with relation to the behavior under test (e.g., a statement that cmd.exe /c copy cmd.exe sethc.exe is abnormal/malicious activity, or an identifier stating that "suspicious activity occurred"). No or limited details are provided as to why the action was performed (tactic) or how the action was performed (technique).
|An alert describing "cmd.exe /c copy cmd.exe sethc.exe" as abnormal/malicious activity, but not stating it's related to Accessibility Features or a more specific description of what occurred.|
|A “Suspicious File” alert triggered upon initial execution of the executable file.|
|An alert stating that "suspicious activity occurred" related to an action but did not provide detail.|
Tactic: Processed data specifying ATT&CK Tactic or equivalent level of enrichment to the data collected by the capability. Gives the analyst information on the potential intent of the activity, or helps answer the question "why would this be done" (e.g., Persistence was set up or there was a sequence of Discovery commands).
|An alert called “Malicious Discovery” is triggered on a series of discovery techniques. The alert has a score indicating the alert is likely malicious. The alert does not identify the specific type of discovery performed.|
|An alert describing that persistence occurred but not specifying how persistence was achieved.|
Technique: Processed data specifying ATT&CK Technique or equivalent level of enrichment to the data collected by the capability. Gives the analyst information on how the action was performed, or helps answer the question "what was done" (e.g., Accessibility Features or Credential Dumping).
|An alert called "Credential Dumping" is triggered with enough detail to show what process originated the behavior against lsass.exe and/or provides detail on what type of credential dumping occurred.|
|An alert for "Lateral Movement with Service Execution" is triggered describing what service launched and what system was targeted.|
Modifier Detection Types
Alert: Data is presented as a priority notification to the analyst, indicating a suspicious or malicious event that warrants further investigation (e.g., icon, queue, highlight, popup). Not a modifier of Telemetry.
|A visual notification occurred in a dashboard and/or alert queue that "Lateral Movement" occurred.|
|A recognizable identifier populated to a dashboard so that an analyst recognizes that a high severity event may have occurred.|
Delayed: Detection (alerts, telemetry, tactic, technique, etc.) is not immediately available due to some factor that slows or defers its presentation to the user, for example when subsequent or additional processing produces a detection for the activity. The Delayed category is not applied for normal automated data ingestion and routine processing that take minimal time for data to appear to the user, nor is it applied due to range or connectivity issues that are unrelated to the capability itself. The Delayed modifier will always be applied with modifiers describing more detail about the nature of the delay.
Delayed is subdivided into:
- Manual – Processing was triggered by human action and not initiated automatically. In the case of detections provided by an MSSP, human analysts reviewed the data and produced the outputs that were later presented to an analyst.
- Processing – Detection incurred a delay because additional data processing applied complex logic to the events, and the results only later became available to an analyst.
|The capability’s cloud service component uses machine learning algorithms that trigger a detection on credential dumping hours after the red team performs it. This detection would receive the Main detection category of Technique with a Modifier detection category of Delayed-Processing.|
|The capability sends data back to an analyst team, who manually analyze the raw data and create an alert called “Malicious Behavior Detected” that appears in the interface three hours after the red team performs the procedure. This detection will receive a Main detection category of MSSP and a Modifier detection category of Delayed-Manual.|
Host Interrogation: Data is manually pulled from an endpoint via the capability for analysis. This category represents possible behavior identified through manual analysis from data that is not automatically ingested and analyzed by the capability to show an analyst that event(s) occurred specific to the behavior under test. Though useful as a capability to some security teams, capturing data through these means may be difficult and/or depend on the skill level of the analyst to derive actionable information. Host Interrogation is a modifier that will only apply to the None category. It will be marked as delayed to remain consistent with other delayed detections.
|There is a remote shell component to the capability that can be used to pull native OS logs from a system suspected of being compromised for further analysis.|
Residual Artifact: Data, such as a binary or process memory, that requires additional analysis to determine what capabilities or behaviors may have been used. This category represents possible behavior identified through manual analysis from data that is not automatically ingested and analyzed by the capability to show an analyst that event(s) occurred specific to the behavior under test. The collected data is more a byproduct of adversary actions and less indicative of adversary behavior. Though useful as a capability to some security teams, capturing data through these means may be difficult and/or depend on the skill level of the analyst to derive actionable information. Residual Artifact is a modifier that will only apply to the None category. It will be marked as delayed to remain consistent with other delayed detections.
|Process memory of svchost.exe was collected for later analysis because it was identified as a suspicious process. Later strings analysis showed that there may have been a keylogger present on the system.|
|PowerShell scripts are collected automatically upon execution. Later analysis of a script shows it contains the functionality to capture a screenshot of the user's desktop.|
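The constraints stated so far for the Delayed, Host Interrogation, and Residual Artifact modifiers can be sketched as a small consistency check. This is a minimal illustration, not the evaluation team's tooling: the function name `check_modifiers`, the string labels, and the "Delayed-Manual" / "Delayed-Processing" spelling (taken from the examples above) are assumptions made for the sketch.

```python
from typing import List, Set

# Delayed is always applied together with a subtype describing the delay.
DELAYED_SUBTYPES = {"Delayed-Manual", "Delayed-Processing"}
# Modifiers that the text restricts to the None main category.
NONE_ONLY = {"Host Interrogation", "Residual Artifact"}


def check_modifiers(main: str, modifiers: Set[str]) -> List[str]:
    """Return a list of rule violations (an empty list means consistent)."""
    problems = []
    delayed = modifiers & DELAYED_SUBTYPES

    # A bare "Delayed" without Manual or Processing is incomplete.
    if "Delayed" in modifiers:
        problems.append("Delayed needs a subtype (Manual or Processing)")

    # MSSP detections are marked as delayed.
    if main == "MSSP" and not delayed:
        problems.append("MSSP must be marked as delayed")

    # Host Interrogation and Residual Artifact apply only to None
    # and are marked as delayed.
    for name in sorted(modifiers & NONE_ONLY):
        if main != "None":
            problems.append(f"{name} applies only to the None category")
        if not delayed:
            problems.append(f"{name} must be marked as delayed")

    return problems
```

For example, `check_modifiers("MSSP", {"Delayed-Manual"})` returns an empty list, while `check_modifiers("Telemetry", {"Host Interrogation", "Delayed-Manual"})` reports that Host Interrogation applies only to the None category. The text does not say which Delayed subtype accompanies Host Interrogation or Residual Artifact, so the sketch accepts either.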
Configuration Change: The configuration of the capability was changed after the start of the evaluation. This may be done to show that additional data can be collected and/or processed. The Configuration Change modifier may be applied with additional modifiers describing the nature of the change.
Configuration Change is subdivided into:
- UX – Change was to the user experience and not to the capability's ability to detect behavior. Changes could include display of a certain type of data that was already collected but not visible to the user.
- Detection – Change was to the capability's ability to capture or process information that impacts its ability to detect adversary behavior. Changes could include collecting a new type of information by the sensor or new processing logic that was deployed.
|Data showing account creation is collected on the backend but not displayed to the end user by default. The vendor changes a backend setting to allow Telemetry on account creation to be displayed in the user interface, so a detection of Telemetry and Configuration Change-UX would be given for the Create Account technique.|
|The vendor toggles a setting that would display an additional label of “Discovery” when the foo, foo1, and foo2 discovery commands are executed. A detection of Tactic and Configuration Change-Detection would be given (as opposed to a detection of Telemetry that would have been given before the change).|
|A rule or detection logic is created and applied retroactively or is later retested to show functionality that exists in the capability. This would be labeled with a modifier Configuration Change-Detection.|
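The distinction between the two Configuration Change subtypes can be sketched as a simple decision. The function and parameter names below are hypothetical; deciding whether a given change actually affects capture or processing remains a human judgment, which the sketch takes as input.

```python
from typing import Optional


def config_change_subtype(affects_capture_or_processing: bool,
                          affects_display: bool) -> Optional[str]:
    """Pick the Configuration Change subtype for a single change."""
    # Detection: the change impacts the capability's ability to capture or
    # process information (new sensor data, new processing logic).
    if affects_capture_or_processing:
        return "Configuration Change-Detection"
    # UX: the change only affects what the user sees, not what is detected.
    if affects_display:
        return "Configuration Change-UX"
    # Nothing relevant changed, so no Configuration Change modifier applies.
    return None


# Backend setting that merely displays already-collected account-creation data.
ux_case = config_change_subtype(False, True)
# Retroactively applied detection rule or newly deployed processing logic.
detection_case = config_change_subtype(True, False)
```

Checking the detection-affecting condition first follows the UX definition, which covers only changes that do not alter the capability's ability to detect behavior.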
Innovative: Designation applied to innovative and useful ways to detect a technique under test. Not every technique will have a vendor solution that earns this designation. It is meant to highlight accurate and robust approaches that bring value and deeper insight to consumers. This modifier will be applied at the Evaluation Team's discretion and will take into account data collected, method of detection, accuracy of detection, context provided to the end user, and display of information.