Performance and security clash yet again in “Collide+Power” attack

Another week, another BWAIN!

As you’ll know if you listened to last week’s podcast (hint, hint!), BWAIN is short for Bug With An Impressive Name:

It’s a nickname we apply when the finders of a new cybersecurity attack get so excited about their discovery that they give it a PR-friendly moniker, register a vanity domain name for it, build it a custom website, and design it a special logo.

This time, the name Collide+Power includes a pesky punctuation character, which is a plus in bug naming, but a minus when registering an internet domain. (Ironically, domain names are allowed to use -, but not +).

So, the domain name had to be abbreviated slightly to https://collidepower.com, but the website will give you an overview of the problem anyway, even with the addition sign subtracted.

Collide cached data, and measure the power required

The researchers behind this new paper are Andreas Kogler, Jonas Juffinger, Lukas Giner, Martin Schwarzl, Daniel Gruss and Stefan Mangard from Graz University in Austria, and Lukas Gerlach and Michael Schwarz of the CISPA Helmholtz Center for Information Security in Germany.

We’re not going to try to explain the various forms of this attack at any length, because the technical details of how to take the measurements, and the mathematical modelling used to make inferences from those measurements, are complex.

But the core of the problem, if you will pardon the partial pun, is that the cache memory that’s buried inside modern processor chips, intended to provide an invisible and automatic performance boost…

…isn’t always quite as invisible as you might think, and may sometimes leak some or all of its content, even to processes that shouldn’t be able to see it.

As the name suggests, cache memory (it’s pronounced cash, as in dollars and cents, not cachet, as in respect and prestige, if you’ve ever wondered), keeps special copies of data values from conventional RAM in hidden locations inside the CPU chip itself.

If the CPU keeps track of the RAM addresses (memory locations) that you’ve used recently, and can guess well enough which ones you’re likely to use again soon, it can keep them temporarily in its cache memory and thus greatly speed up your second access to those values, and the third access, the fourth, and so on.

For example, if you’re looking up a series of data values in a table to convert image pixels from one colour format to another, you might find that most of the time the lookup table tells you to visit either RAM address 0x06ABCC00 (which might be where the special code for “black pixel” is stored) or address 0x3E00A040 (which might be the location of the “transparent pixel” code).

By automatically keeping the values from those two commonly-needed memory addresses in its cache, the CPU can short-circuit (figuratively, not literally!) future attempts to access those addresses, so that there’s no need to send electrical signals outside the processor, across the motherboard, and into the actual RAM chips to read out the master copy of the data that’s stored there.

So, cached data is usually much faster to access than data in motherboard RAM.

Generally speaking, however, you don’t get to choose which cache registers get used to store which RAM addresses, and you don’t get to choose when the CPU decides to stop caching your “transparent pixel code” value and start caching another program’s “super-secret cryptograpic key” instead.

Indeed, the cache may contain a liberal mix of values, from a liberal mixture of RAM addresses, belonging to a liberal mixture of different user accounts and privilege levels, all at the same time.

For this reason, along with reasons of efficiency and performance, even admin-level programs can’t directly peek at the list of addresses currently being cached, or get at their values, to protect the cached data against external snooping.

As a programmer, you still use the machine code instruction “read out the transparent pixel code from address 0x3E00A040”, and the operating system still decides whether you’re supposed to have access to that data based on the numerical adddress 0x3E00A040, even if the data ultimately comes directly from the cache instead of from the true RAM address 0x3E00A040.

The price of a bit-flip

What the Collide+Power researchers discovered, very greatly simplified, is that although you can’t directly peek at the temporary data in cache storage, and therefore can’t sidestep the memory protection that would be applied if you went via its official RAM address…

…you can guess when specific data values are about to be written into specific cache storage registers.

And when one already-cached number is being replaced by another, you can make inferences about both values by measuring how much power the CPU uses in the process.

(Modern processors usually include special internal registers that provide power usage readings for you, so you don’t need to crack open the computer case and attach a physical probe wire somewhere on the motherboard.)

Intriguingly, the power consumption of the CPU itself, when it overwrites a cache value with a new one, depends on how many bits changed between the numbers.

If we simplify matters to individual bytes, then overwriting the binary value 0b00000000 with 0b11111111 (changing decimal 0 to decimal 255) requires flipping all the bits in the byte, which would consume the most power.

Overwriting the ASCII character A (65 in decimal) with Z (90 in decimal) means changing 0b01000001 into 0b01011010, where four bit-positions get flipped, thus consuming a middling amount of power

And if the numbers happen to be the same, no bits need flipping, which would consume the least power.

In general, if you XOR the two numbers together and count the number of 1-bits in the answer, you find the number of flips, because 0 XOR 0 = 0 and 1 XOR 1 = 0 (so zero denotes no flip), while 0 XOR 1 = 1 and 1 XOR 0 = 1 (denoting a flip).

In other words, if you can access a bunch of chosen addresses of your own in a way that primes a specific set of cache registers inside the CPU, and then monitor the power consumption accurately enough when someone else’s code gets its data assigned to those cache locations instead…

…then you can make inferences about how many bits flipped between the old cache contents and the new.

Of course, you get to choose the values stored in the addresses with which you primed the cache registers, so you don’t just know how many bits probably flipped, but you also know what the starting values of those bits were before the flips took place.

That gives you yet more statistical data with which to predict the likely new values in the cache, given that you know what was there before and the likely number of bits that are now different.

You might not be able to figure out exactly what data your victim’s process was using, but even if you can eliminate some bit patterns, you’ve just learned something that you’re not supposed to know.

And if that data were, say, an encryption key of some sort, you might be able to convert a unfeasible brute force attack into an attack where you might just succeed.

For example, if you can predict 70 bits in a 128-bit encryption key, then instead of trying out all combinations of 128 bits, which would be an impossible task, you’d need to try 2⁵⁸ different keys instead (128 – 70 = 58), which might very well be feasible.

No need to panic

Fortunately, this “vulnerability” (now dubbed CVE-2023-20583) is unlikely to be used against you any time soon.

It’s more of a theoretical matter that chip manufacturers need to take into account, on the basis of the truism that cybersecurity attacks “only ever get better and faster”, than an exploitable hole that could be used today.

In fact, the researchers admit, almost sheepishly, that “you do not need to worry.”

They really did write you in italics, and the imprecation not to worry in bold:

In the conclusion of the paper, the researchers ruefully note that some of their best real-world results with this attack, under ideal lab conditions, leaked just 5 bits an hour.

For one of their attack scenarios, in fact, they admitted that they encountered “practical limitations leading to leakage rates of more than [one] year per bit”.

Yes, you read that correctly – we checked it several time in the paper just to make sure we weren’t imagining it.

And that, of course, raises the question, “How long do you have to leave a collection of data transfer tests running before you can reliably measure transmission rates that low?”

By our calculations, one bit per year gives you about 125 bytes per millennium. At that rate, downloading the recently released three-hour blockbuster movie Oppenheimer in IMAX quality, which apparently takes up about half a terabyte, would take approximately 4 billion years. To put that bizarre factoid into perspective, Earth itself is only about 4.54 billion years old, give or take a few hundred million months.

What to do?

The simplest way to deal with CVE-2023-20538 right now is to do nothing, given that the researchers themselves have advised you not to worry.

If you feel the need to do something, both Intel and AMD processors have ways to reduce the accuracy of their power measurement tools on purpose, by adding random noise into the power readings.

This leaves your averages correct but varies individual readings sufficiently to make this already not-really-feasible attack even harder to pull off.

Intel’s power measurement mitigation is known as running average power limit (RAPL) filtering; AMD’s is referred to as performance determinism mode.

This article was originally published by Sophos.com. Read the original article here.