Methodology Overview: Adversary Emulation
While we aspire to test across the entirety of ATT&CK, the number of actions required to cover every technique, and the many possible variations in implementation (i.e., procedures), make exhaustive testing impractical. Additionally, certain techniques are complex and cannot be implemented in a lab environment. Since we need a way to select a subset of techniques to define test criteria, as well as to chain activity together, we focus our evaluations on techniques used by a known threat group, an approach we refer to as adversary emulation.
Adversary emulation uses techniques that have been publicly attributed to an adversary, chained together into a logical series of actions inspired by how that adversary has operated in the past. To generate our emulation plans, we identify public threat intelligence reporting, map the reported techniques to ATT&CK, chain the techniques together, and then determine how to replicate the behaviors.
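The mapping-and-chaining process above can be sketched as a simple data structure: techniques mapped from reporting, ordered by the phase in which the plan emulates them. The technique IDs and names below are real ATT&CK entries, but the report names and the plan structure itself are illustrative assumptions, not MITRE's actual tooling.

```python
"""Minimal sketch of an emulation plan: ATT&CK techniques mapped from
(hypothetical) public reporting, chained into a logical phase order."""

from collections import OrderedDict

# Phases in the order the emulated intrusion would execute them.
emulation_plan = OrderedDict([
    ("initial-access",   [("T1566.001", "Spearphishing Attachment")]),
    ("execution",        [("T1059.001", "PowerShell")]),
    ("discovery",        [("T1082", "System Information Discovery"),
                          ("T1016", "System Network Configuration Discovery")]),
    ("lateral-movement", [("T1021.002", "SMB/Windows Admin Shares")]),
])

# Walk the chain in order, as the red team would when executing the plan.
for phase, techniques in emulation_plan.items():
    for technique_id, name in techniques:
        print(f"{phase}: {technique_id} {name}")
```

The ordering encodes the logical dependencies discussed below: discovery must precede lateral movement, execution must precede discovery, and so on.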
When emulating an adversary, several factors differentiate the emulation from a direct copy of the adversary’s actions or a full replay of an actual intrusion. First, the red team tasked with emulating the adversary generally does not use the actual adversary tools; instead, they attempt to emulate the techniques as closely as possible using publicly available tools. To get “as close as possible,” the emulators analyze threat intelligence reports and malware reverse engineering reports to understand what the adversary (or specific pieces of the adversary’s malware) did at the lowest level. The emulators then map those observed functions to the closest analogous function of a publicly available tool, which may cause slight differences in functionality or implementation method.
Another limitation of adversary emulation is its reliance on publicly available threat reporting. Not all adversary activity is covered in public reporting, so emulations capture only a portion of an adversary’s behavior. Threat reporting also describes past activity, due to delays in the reporting cycle and the availability of information, and both the adversary and the intrusion environment may change after the reported activity occurs. An adversary’s current techniques may therefore differ from those previously reported, and emulators may have to substitute alternatives. For example, a new Windows patch may prevent a specific User Account Control (UAC) bypass technique; in that situation, the emulators may substitute another or newer UAC bypass that is available. We recognize that when emulating an adversary, we can only mimic their historical behavior.
To perform the evaluation using adversary emulation, we chain techniques together in a logical flow; for example, an adversary must discover that a host exists before they can move laterally to it. Pacing, i.e., how quickly we execute ATT&CK techniques, also matters for our evaluations: the techniques must be separable so we can identify distinct detections in vendor capabilities. We therefore organize techniques into what we call “Steps.” We recognize that adversaries often do not execute atomic actions, so this is another limitation of emulating real adversary behavior. We also recognize that vendors build capabilities to address real threats, which might include specific patterns or time constraints. As a result, we have sought to strike a balance between producing separable detections and maintaining some realism in what an adversary would do. In some steps we execute procedures separately and atomically, while in others we execute them in rapid sequence. One example of steps we have paired together is a series of discovery commands run in quick succession to emulate an adversary’s first access to a system.
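The pacing described above can be sketched as follows: each step carries its own commands and a pause setting, so some procedures run atomically with a delay between them while grouped discovery commands run back to back. The step names, commands, and delay values are assumptions for illustration only; the `execute` callable stands in for the red team’s actual execution mechanism.

```python
"""Illustrative sketch of paced "Steps": atomic procedures are separated
by a pause, while grouped discovery commands run in rapid sequence."""

import time

# (step name, commands to run, seconds to pause after each command)
STEPS = [
    # A single atomic procedure, separated from other activity by a pause.
    ("credential-access", ["procdump -ma lsass.exe"], 0.5),
    # Discovery commands grouped to mimic an adversary's first look at a host.
    ("initial-discovery", ["whoami", "hostname", "ipconfig /all", "net user"], 0.0),
]

executed = []  # record of what "ran", for demonstration

def run_step(name, commands, pause, execute=executed.append):
    """Run each command in a step, pausing between commands if requested."""
    for cmd in commands:
        execute(f"[{name}] {cmd}")
        if pause:
            time.sleep(pause)

for name, commands, pause in STEPS:
    run_step(name, commands, pause)

print("\n".join(executed))
```

Grouping the discovery commands under one step keeps them realistic as a burst of initial reconnaissance, while the pause after the atomic step keeps its detection separable.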
Another significant difference between our evaluations and the “real world” is that our lab environment contains no “user noise.” Our activity is concentrated into a two- to three-day event in an environment with no real or emulated background activity, so it is synthetically, unrealistically “loud.” We therefore encourage users of our results to perform additional testing in their own environments, which will have the noise necessary to determine whether detections are valuable there.
To remain consistent with Enterprise ATT&CK’s focus on adversary activity across the full lifecycle, our evaluations span an entire emulated intrusion. While the initial activity might in practice be prevented, detected, or remediated (either because of an initial alert or the volume of activity), it is important to know what “defense in depth” capabilities a tool provides in case an adversary is able to circumvent initial defenses.