Frequently Asked Questions

The following are answers to frequently asked questions we have received, provided so that everybody understands our transparent testing process:

The evaluations use adversary emulation, which is a way of testing "in the style of" a specific adversary. This allows us to select a relevant subset of ATT&CK techniques to test. To generate our emulation plans, we use public threat intel reporting, map it to ATT&CK, and then determine a way to replicate the behaviors. The Round 1 emulated adversary was APT3. APT29 is the Round 2 emulation. We plan to offer new emulations in subsequent rounds to complement previous evaluations.
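As a rough illustration of the mapping step described above, and not a representation of MITRE's actual tooling, the idea can be sketched in Python. The report names, behavior descriptions, and the `build_emulation_plan` helper are hypothetical placeholders; only the ATT&CK technique IDs refer to real knowledge base entries.

```python
from collections import defaultdict

# Hypothetical, hand-made mapping of reported behaviors to ATT&CK technique IDs.
# The report names and behavior descriptions are placeholders, not real intel.
REPORTED_BEHAVIORS = [
    ("report-A", "lists running processes", "T1057"),
    ("report-A", "enumerates files and directories", "T1083"),
    ("report-B", "lists running processes", "T1057"),
    ("report-B", "collects system information", "T1082"),
]


def build_emulation_plan(behaviors):
    """Group reported behaviors by technique ID, keeping the supporting reports."""
    plan = defaultdict(lambda: {"behaviors": set(), "sources": set()})
    for source, behavior, technique_id in behaviors:
        plan[technique_id]["behaviors"].add(behavior)
        plan[technique_id]["sources"].add(source)
    return dict(plan)


plan = build_emulation_plan(REPORTED_BEHAVIORS)
for technique_id in sorted(plan):
    entry = plan[technique_id]
    print(technique_id, sorted(entry["sources"]), sorted(entry["behaviors"]))
```

Grouping by technique ID keeps each selected technique tied to the public reporting that supports it, which is the property an emulation plan needs before anyone decides how to replicate the behavior.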

ATT&CK evaluations are built on the publicly available information captured by ATT&CK, but they are separate from the ongoing work to maintain the ATT&CK knowledge base. The team that maintains ATT&CK will continue to accept contributions from anyone in the community. The ATT&CK knowledge base will remain free and open to everyone, and vendor participation in the evaluations has no influence on that process.

Evaluations are expected to begin in Summer 2019. Contracts must be executed by July 31, 2019. Exact timelines for execution and release depend on the number of participants, but results are anticipated to be released in Q4 2019.

Round 1 was split between an initial cohort and subsequent rolling admissions. The first cohort's results were released in a single batch once all vendors in the cohort had completed their evaluation and the subsequent review process. Rolling admission results are released as each evaluation is completed.

First cohort participants: Carbon Black, CrowdStrike, CounterTack, Endgame, Microsoft, RSA, SentinelOne
Rolling admission participants: Cybereason, FireEye, Palo Alto Networks

Round 2 participation is independent of Round 1 participation.

We don’t limit participation based on market segment. Requirements include:
  • Technology must address the detection of post-compromise behaviors as described by ATT&CK
  • Protections/preventions/responses must be disabled to allow for execution of our emulation, but sensors that drive these actions can still be used as data sources to identify behavior
  • Technology must deploy into the Microsoft Azure environment
  • Sensors and data beyond those provided by default in Azure must be supplied by the vendor
Vendor participation is subject to applicable legal restrictions, available resources, and other factors.

Email for more information.

Yes. There was significant demand for unbiased ATT&CK evaluations and MITRE needed to create a mechanism to open up evaluations to the security vendor market. Participating companies understand that all results will be publicly released, which is true to MITRE's mission of providing objective insight.

Vendors get a third-party evaluation of their ATT&CK detection capabilities. These evaluations are not ATT&CK certifications, nor are they a guarantee that you are protected against the adversary we are emulating. Adversary behavior changes over time. The evaluations give vendors insight into, and confidence in, how their capabilities map to ATT&CK techniques. Equally important, because we publicly release the results, we enable their customers and potential customers to understand how to use their tools to detect ATT&CK-categorized behaviors.

All vendors received a copy of the techniques to be tested and the general evaluation process overview. The initial cohort, Cybereason, and FireEye did not have access to the detailed procedures or results prior to their evaluation. Cybereason’s and FireEye’s feedback period occurred after the launch of the ATT&CK Evaluations website. Palo Alto Networks had full access to the methodology and previously released results, as described on the ATT&CK Evaluations website, prior to their evaluation.

Let us know your needs and the limitations you see in our current methodology. This will help us shape our evaluation road map.

The ATT&CK evaluations are based on a four-phase approach:
    1. Setup
      The vendor installs their tool in a MITRE-provided cyber range. The tool is deployed in detect/alert-only mode; preventions, protections, and responses are prohibited.
    2. Evaluation
      During a joint evaluation session, MITRE adversary emulators ("red team") execute an emulation in the style of an adversary group, technique-by-technique. The vendor being tested will provide the personnel who review tool output to detect each technique ("blue team"). MITRE provides the personnel to oversee the evaluation and facilitate communication between red and blue, as well as capture results ("white team").
    3. Feedback
      Vendors are provided an opportunity to offer feedback on the preliminary results, but the feedback does not obligate MITRE to make any modification to the results.
    4. Release
      MITRE publicly releases the evaluation methodology and the results of the tool evaluations. For additional details, refer to our methodology.

No. All vendors that sign up for the evaluation agree to have their results publicly released upon conclusion of their test.

Public evaluations are the only vendor-paid evaluations offered at this time.

MITRE does not assign scores, rankings, or ratings. The evaluation results are available to the public, so other organizations may provide their own analysis and interpretation; these are not endorsed or validated by MITRE.

The stoplight chart (which uses red, yellow, and green to indicate level of confidence for detection of techniques) has been used since ATT&CK's creation because it is a simple yet powerful way to understand ATT&CK coverage. While a stoplight chart may be useful to show coverage and gaps, we do not use this visualization because it is not granular enough to convey our results.
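To make the granularity point concrete, here is a small Python sketch of what collapsing detection types into three colors can hide. It is an illustration under stated assumptions: the detection-type labels, the color assignments, and the per-technique results are all made up for this example and are not taken from the evaluation methodology or from any vendor's results.

```python
# Illustrative detection-type labels and color assignments (assumptions for
# this sketch; the real categories are defined in the evaluation methodology).
STOPLIGHT = {
    "none": "red",
    "telemetry": "yellow",
    "enrichment": "yellow",
    "general_behavior": "green",
    "specific_behavior": "green",
}

# Made-up results keyed by ATT&CK technique ID, for demonstration only.
results = {
    "T1057": "telemetry",
    "T1082": "enrichment",
    "T1083": "specific_behavior",
}


def to_stoplight(detections):
    """Collapse each technique's detection type into a stoplight color."""
    return {tid: STOPLIGHT[kind] for tid, kind in detections.items()}


colors = to_stoplight(results)
distinct_types = len(set(results.values()))
distinct_colors = len(set(colors.values()))
print(colors)
print(distinct_types, "detection types collapse into", distinct_colors, "colors")
```

In this toy data, two techniques that were detected in different ways end up with the same color, so the chart alone can no longer tell them apart.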

While we understand the importance of minimizing false positives, they are often tied to environmental noise. Because our testing environment lacks a good source of emulated noise, we don't address false positives directly; instead, we address them indirectly in two ways:
    1. Vendors are required to define how they configured their capabilities. With that provided configuration and the evaluation's results as a baseline, users can then customize detections to reduce false positives in their unique environment.
    2. We articulate how the tool can perform detection. By releasing how to detect, as well as our methodology, organizations can implement their own tests to determine how the tools operate in their specific environment.

We do not make any judgments about one detection being better than another. We distinguish between different types of detection, and describe our rationale for doing so in Part 1 and Part 2 of a blog series.

  • Feedback. We are always looking for feedback on what works and what doesn’t in our results and methodology. Learning how you use the results and what you want to get out of them helps us shape our work to help you and your peers.
  • Intel. We frame our evaluations in the context of the known threat to ensure our results are relevant and useful. These emulations are driven by available intel. If you share your insights, you can improve our plans.