Initial Thoughts From Round 2 of MITRE’s Enterprise ATT&CK Evaluation

What an amazing year it’s been for the ATT&CK Evals team, going from an initial cohort of seven vendors in round 1 to twenty-one vendors in round 2. The industry’s adoption of this evaluation has been nothing short of remarkable. I’m pleased to once again contribute my thoughts and analysis on the outputs of this evaluation to help clients make the right vendor selections based on the information available.

As I did last year, I’ve released source code on GitHub that breaks out some key metrics for understanding the strengths of these products and generates an Excel workbook to make it easier to parse the results. Admittedly, releasing code does feel like climbing into the dunk tank, knowing it’s going to be pored over by the same engineers whose products I evaluated in a Wave a few months ago. 🙂
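If you’re curious what that kind of tooling looks like under the hood, here’s a minimal sketch of the basic tallying involved: load a vendor’s results, count detections by type, and dump the summary to Excel. The file name and field names (`Detections`, `DetectionType`) are assumptions I’m using for illustration, not MITRE’s actual published schema, and this is not the code from the repository itself.

```python
import json
from collections import Counter

import pandas as pd

# Minimal sketch, not the actual GitHub code: tally detection types from a
# vendor's results file and write a one-sheet Excel summary. The file name
# and field names ("Detections", "DetectionType") are illustrative
# assumptions, not MITRE's published schema.
def tally_detection_types(path: str) -> Counter:
    with open(path) as f:
        steps = json.load(f)  # assume a list of evaluation steps
    counts = Counter()
    for step in steps:
        for detection in step.get("Detections", []):
            counts[detection["DetectionType"]] += 1
    return counts

counts = tally_detection_types("vendor_results.json")
df = pd.DataFrame(sorted(counts.items()), columns=["DetectionType", "Count"])
df.to_excel("detection_summary.xlsx", index=False)  # requires openpyxl
```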

Here are a few of my initial thoughts from the evaluation:

The evaluation contains a mini-game of whack-a-mole, but you should ignore all the MSSP stuff.

One of the first things that stood out to me in this evaluation was that some vendors saw it as a good opportunity to show how good their MSSP offerings are by demonstrating a human detection for everything. The MSSP detection type was added in response to some vendors leveraging hunt teams in the previous evaluation, to help differentiate between the human and the technology. I’m totally in favor of this, mostly because it exposed this silliness.

Having your MSSP or IR investigators look at a small, clean environment and tell you everything bad they find is closer to the certification exam they probably took to get the position than to any reality your organization is going to present. Looking at these results, it’s more of an embarrassment to see these vendors trying so hard and still not detecting everything than any victory they’ll gain by talking about it in their blogs. My advice is to just ignore this noise.

Sometimes the numbers don’t add up to what you expect.

This is a bit of a response to some of the questions I’ve gotten about my code, mostly people trying to figure out why the number of “None” detections reported by MITRE doesn’t add up to the number of “None” detections I’m scoring. The reasoning here is that “None” is the default condition for not having a detection, but in some cases it isn’t explicitly called out: for example, if there’s only an MSSP detection, or if all the detections involved configuration changes (which I ignore). In both cases, the default condition rises to the top, and there’s a “None” you wouldn’t have found by counting the published results.
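To make that bookkeeping concrete, here’s a sketch of the rule as I’ve described it, using assumed field names (`DetectionType`, `Modifiers`) rather than MITRE’s actual schema: a step falls back to “None” when nothing survives the filters, even if the published data never spells that out.

```python
# Sketch of the "None" default described above, under assumed field names.
def effective_detections(step: dict) -> list:
    kept = []
    for d in step.get("Detections", []):
        if d.get("DetectionType") == "MSSP":
            continue  # MSSP detections are ignored entirely
        if "Configuration Change" in d.get("Modifiers", []):
            continue  # mid-evaluation tuning is excluded from scoring
        kept.append(d)
    return kept

def scores_as_none(step: dict) -> bool:
    # "None" is the default condition: it applies even when the published
    # results list only an MSSP or configuration-change detection.
    return not effective_detections(step)
```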

Why do you hate configuration changes?

These evaluations need to be for the buyers and, therefore, reflect the buyers’ environments. I’m assuming these products are tuned going into the evaluation; you can’t get around that. But if a product then requires further tuning in the middle of the evaluation, each change takes it further away from what the customer experience is going to be. That said, I don’t discard this data point altogether: I calculate and expose it in a property I call ‘dfir’, for digital forensics and incident response, because the absolute visibility afforded by these configurations may be interesting to that buyer persona.
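As a rough illustration of that ‘dfir’ property, again under the same assumed field names: rather than throwing configuration-change detections away, count them separately so a DFIR-minded buyer can see the visibility a fully tuned deployment affords.

```python
# Illustrative only: surface configuration-change detections as a separate
# 'dfir' metric instead of folding them into the main score.
def dfir_metric(steps: list) -> int:
    return sum(
        1
        for step in steps
        for d in step.get("Detections", [])
        if "Configuration Change" in d.get("Modifiers", [])
    )
```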

Are you looking at X to decide whether or not to give a vendor credit for this step?

It’s not my job to second-guess the results that MITRE published. I’ve booked research calls with all the participants to discuss the results and process, but the evaluation happened months ago, and I’m certainly not going to make my own judgments on scores based on limited information compared with what MITRE has already reviewed and ruled on.

So who won?

The end user.

Seriously, one of my biggest regrets from the analysis I performed on round 1 was releasing a “simple score” that’s still being used to argue who has the best product. One truth about this industry is that every product has a vision and capability designed with a buyer persona in mind. My goal in any of the research I do is to help buyers figure out which product aligns with and delivers best for their needs.


