Frequently Asked Questions

The following are answers to frequently asked questions we have received, provided to ensure everybody understands our transparent testing process:

The participating vendors are:
    • First cohort: Carbon Black, CrowdStrike, CounterTack, Endgame, Microsoft, RSA, SentinelOne
    • Rolling admissions: Cybereason, FireEye

Paid evaluations are new to MITRE. There has been significant demand for unbiased ATT&CK evaluations, and we needed a formal process to open evaluations to the security vendor market. Participating companies understand that all results will be publicly released, which is true to MITRE's mission of providing objective insight.

Vendors get a third-party evaluation of their ATT&CK detection capabilities. These evaluations are not ATT&CK certifications, nor are they a guarantee that you are protected against the adversary we are emulating (in this case APT3), because adversary behavior changes over time. The evaluations provide vendors with insight into, and confidence in, how their capabilities map to ATT&CK techniques. Equally important, because we are publicly releasing the results, we enable vendors' customers, and potential customers, to understand how to use these tools to detect ATT&CK-categorized behaviors.

ATT&CK evaluations are built on the publicly-available information captured by ATT&CK, but they are separate from the ongoing work to maintain the ATT&CK knowledge base. The team who maintains ATT&CK will continue to accept contributions from anyone in the community. The ATT&CK knowledge base will remain free and open to everyone, and vendor participation in the evaluations has no influence on that process.

The evaluations use adversary emulation, which is a way of testing "in the style of" a specific adversary that allows us to select a relevant subset of ATT&CK techniques to test. To generate our emulation plans, we use public threat intel reporting, map it to ATT&CK, and then determine a way to replicate the behaviors. The first emulated adversary is APT3. We plan to offer new emulations approximately every six months that will complement previous evaluations. The next group we plan to emulate has yet to be announced.
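The mapping described above (public threat intel → ATT&CK techniques → concrete procedures to replicate) can be sketched as a simple data structure. This is an illustrative model only, not MITRE's actual emulation-plan format: the technique IDs and names are real ATT&CK identifiers, but the report excerpts, procedures, and plan layout are hypothetical.

```python
# Illustrative sketch: collect techniques referenced in intel reports
# into one emulation plan, grouping procedures by ATT&CK technique ID.
# Structure and example data are hypothetical; only the ATT&CK IDs
# (e.g. T1057 Process Discovery) come from the real knowledge base.
from dataclasses import dataclass, field


@dataclass
class Technique:
    attack_id: str                      # ATT&CK technique ID, e.g. "T1057"
    name: str                           # technique name from the knowledge base
    procedures: list = field(default_factory=list)  # ways to execute it


def build_plan(intel_reports):
    """Merge (id, name, procedure) triples from reports into one plan."""
    plan = {}
    for report in intel_reports:
        for tid, name, proc in report:
            tech = plan.setdefault(tid, Technique(tid, name))
            tech.procedures.append(proc)
    return plan


# Two hypothetical report excerpts already mapped to ATT&CK techniques.
reports = [
    [("T1057", "Process Discovery", "tasklist /v"),
     ("T1003", "Credential Dumping", "dump lsass.exe process memory")],
    [("T1057", "Process Discovery", "enumerate processes via PowerShell")],
]
plan = build_plan(reports)
print(len(plan))                      # distinct techniques in the plan: 2
print(len(plan["T1057"].procedures))  # procedures for Process Discovery: 2
```

Grouping multiple procedures under one technique reflects the point made later in this FAQ: each technique is tested in a variety of ways, and detection may vary per procedure.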

The ATT&CK evaluations are based on a four-phased approach:
    1. Setup
      The vendor will install their tool on MITRE's cyber range. The tool will be deployed for detect/alert only -- preventions, protections, and responses will not be used for this phase.
    2. Evaluation
During a joint evaluation session, MITRE adversary emulators ("red team") will execute an emulation in the style of APT3, technique-by-technique. The vendor being tested will provide the personnel who review tool output to detect each technique ("blue team"). MITRE will also provide personnel to oversee the evaluation, facilitate communication between red and blue, and capture results ("white team"). For purposes of the evaluation, the red team will be granted access to the host, as though they had already gained initial access to the environment.
    3. Feedback
      Vendors are provided an opportunity to offer feedback on the preliminary results, but the feedback does not obligate MITRE to make any modification to the results.
    4. Release
      MITRE will publicly release the evaluation methodology and results of the tool evaluations.

Results are available now. Vendors who participate in the subsequent rolling admissions will have their results released as their evaluations are completed. We started with an initial cohort to maximize fairness, giving a group of vendors an equal opportunity to have their results released at the same time.

We aren't going to score, rank, or rate vendors. We are going to look at each vendor independently, evaluating their ability to detect ATT&CK techniques, and publishing our findings.

The stoplight chart, which uses red, yellow, and green to indicate level of detection, has been used since ATT&CK's creation, because it is a simple way to understand how ATT&CK is useful. While a stoplight chart may be useful to show coverage and gaps, we will not be using this visualization because it is not granular enough to convey our results. We will be testing techniques in a variety of ways (so-called procedures), and how a tool can detect each procedure may vary greatly. We are developing a new visualization to better convey the subtleties of detecting each individual technique.

While we understand the importance of minimizing false positives, they are often tied to environment noise. Without a good source of emulated noise in our testing environment, we won't address false positives directly, but rather address them indirectly in a couple of ways:
    1. Vendors are required to define how they configured their capabilities. With that provided configuration and the evaluation's results as a baseline, users can then customize detections to reduce false positives in their unique environment.
    2. Because we are not rating or scoring detections, there is no benefit to the vendors to fire an alert on every detection. Detections include alerts, as well as the existence of telemetry data that provides context to the adversary action. Given that many techniques in ATT&CK are actions sysadmins or users would perform, it may not be desirable to many end users for every technique execution to result in an alert.
We focus purely on articulating how the tool performs detection, and we'll leave it to each organization to determine how the tools operate in their specific environment.

To an extent. We will not make any judgments about one detection being better than another. However, we will distinguish between different types of detection and articulate those differences, including:
    • The vendor captured data that is accessible to an end user and would allow that user to identify the red team activity, but no alert was produced.
    • The vendor detected suspicious behavior but provided no context or detail explaining why the activity was flagged.
    • The vendor detected suspicious behavior and provided an ATT&CK technique-level description of the activity.
We will focus on articulating behavioral detection, consistent with the spirit of ATT&CK's focus on detecting adversary behavior rather than indicators of compromise that detect a specific tool or artifact. That said, if suspicious activity is identified by a hash, IP address, tool name, etc., we will capture that in our detection notes.
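The detection types above form a natural ordering, from raw telemetry up to a technique-level alert. A minimal sketch of how per-technique results and IOC-based detection notes might be recorded follows; the category names paraphrase this FAQ and are not an official MITRE taxonomy, and the `record` helper is hypothetical.

```python
# Sketch of the detection categories described in this FAQ, modeled as
# an ordered enum. Names are paraphrased from the FAQ, not official.
from enum import IntEnum


class DetectionType(IntEnum):
    NONE = 0             # no data captured for the red team activity
    TELEMETRY = 1        # data a user could use to identify activity; no alert
    GENERAL_ALERT = 2    # suspicious behavior flagged, but without context
    TECHNIQUE_ALERT = 3  # alert with an ATT&CK technique-level description


def record(detections, technique_id, dtype, note=""):
    """Capture a per-technique detection result; notes can flag when a
    detection keyed on an IOC (hash, IP address, tool name, etc.)."""
    detections.setdefault(technique_id, []).append((dtype, note))
    return detections


results = {}
record(results, "T1057", DetectionType.TELEMETRY, "process events logged")
record(results, "T1057", DetectionType.TECHNIQUE_ALERT,
       "alert labeled Process Discovery")
# The richest detection observed for the technique:
print(max(d for d, _ in results["T1057"]).name)  # TECHNIQUE_ALERT
```

Keeping each observation rather than a single score mirrors the FAQ's point that detections are articulated, not ranked: both the telemetry and the alert remain in the record.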

MITRE has a long history of evaluating technology, and using ATT&CK to guide evaluations is no different. Several vendors were previously evaluated against a different APT3 emulation. This round of evaluations has been substantially reworked to ensure the evaluation process is identical for every participant and beneficial to both vendors and consumers of the final reports. We have refined the methodology, emulation tooling, and reporting, making this test quite different from those performed previously.