zheyuan.sh

Brief notes on GoFetch

The GoFetch exploit relies on a data memory-dependent prefetcher (DMP) present in Apple's M-series silicon.

  1. The DMP prefetches the data pointed to by pointer-like values that are filled into the L1 cache. Both DRAM-to-L1 and L2-to-L1 fills of pointer-like values trigger DMP activation.
  2. After a DMP activation, a do-not-scan hint is placed on the L1 cacheline containing the pointer. The DMP does not re-activate on this cacheline until the hint is cleared. Evicting the pointer's cacheline from both L1 and L2 clears the hint.
  3. Other heuristics, such as a history filter, also influence whether a prefetch is issued.
  4. Pointer-like values are scanned in 64-byte chunks.
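
The first, second and fourth points can be pictured with a small toy model in C. This is only a sketch of the behaviour described above, not of the real hardware: the line_model struct, the function names, and the rule that a value is "pointer-like" when it falls inside a known address range [lo, hi) are all assumptions made for illustration, and the history filter is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64
#define WORDS_PER_LINE (LINE_BYTES / sizeof(uint64_t))

/* Hypothetical model of one 64-byte cacheline as the DMP sees it. */
typedef struct {
    uint64_t words[WORDS_PER_LINE];
    bool do_not_scan; /* hint set after a DMP activation */
} line_model;

/* Assumption for this sketch: a value is pointer-like if it lies in [lo, hi). */
static bool looks_like_pointer(uint64_t v, uint64_t lo, uint64_t hi) {
    return v >= lo && v < hi;
}

/* Models a fill of the line into L1. Copies the values the DMP would
 * dereference into out and returns how many there are. */
static size_t dmp_on_fill(line_model *line, uint64_t lo, uint64_t hi,
                          uint64_t out[WORDS_PER_LINE]) {
    assert(line != NULL && out != NULL);
    if (line->do_not_scan)
        return 0; /* hint still set: no re-activation */
    size_t n = 0;
    for (size_t i = 0; i < WORDS_PER_LINE; i++)
        if (looks_like_pointer(line->words[i], lo, hi))
            out[n++] = line->words[i];
    if (n > 0)
        line->do_not_scan = true; /* activation places the hint */
    return n;
}

/* Evicting the line from both L1 and L2 clears the hint. */
static void evict_from_l1_and_l2(line_model *line) {
    assert(line != NULL);
    line->do_not_scan = false;
}
```

In this model a second fill of the same line returns 0 until evict_from_l1_and_l2 is called, which is why the attacker later needs to evict the victim's array before the DMP can fire again.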

Some Background

  1. In modern CPUs, a particular address maps to exactly one set of cachelines. When data is filled into the cache, the set it lands in depends on the address of the data. Many addresses map to the same set of cachelines.
  2. When new data must be filled into a set whose cachelines are all occupied, existing data is evicted. This is a conflict eviction.
  3. An eviction set (EV) is a set of addresses whose data, once accessed, evicts all existing data from a set of cachelines.
  4. In current M-series chips, the L2 cache is inclusive: its contents are a superset of L1's, so evicting data from the shared cache also evicts it from L1. (see this paper for more)
  5. Prime-and-probe is a general attack in which the attacker fills a set of cachelines with an EV (prime step) and, after the victim process runs, accesses the EV data again, checking whether any of those accesses became slower (probe step). A slower access means that the victim's data displaced one or more EV entries, causing an L2 cache miss. An inclusive cache policy is a prerequisite for the attack.
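
To make the address-to-set mapping and the idea of an eviction set concrete, here is a minimal sketch in C. The geometry (64-byte lines, 1024 sets) and the simple modulo mapping are toy assumptions, not the real M-series parameters, and the function names are made up for this example. A real attacker also does not know physical addresses and has to find congruent addresses by timing, which this sketch skips.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64u   /* toy cacheline size */
#define NUM_SETS   1024u /* toy number of sets, not the real L2 value */

/* Toy mapping: the set is chosen by the line number modulo the set count. */
static uint64_t set_index(uint64_t addr) {
    return (addr / LINE_BYTES) % NUM_SETS;
}

/* Fills out[0..n) with line-aligned addresses at or above base that map to
 * the same set as target. Assumes target lies outside the buffer that
 * starts at base. */
static void build_eviction_set(uint64_t base, uint64_t target,
                               uint64_t *out, size_t n) {
    assert(out != NULL);
    const uint64_t stride = (uint64_t)LINE_BYTES * NUM_SETS;
    /* Round base up to the next line boundary. */
    uint64_t first = (base + LINE_BYTES - 1) / LINE_BYTES * LINE_BYTES;
    /* Advance line by line until we reach the target's set. */
    uint64_t delta = (set_index(target) + NUM_SETS - set_index(first)) % NUM_SETS;
    first += delta * LINE_BYTES;
    /* Addresses one stride apart all share the same set. */
    for (size_t i = 0; i < n; i++)
        out[i] = first + (uint64_t)i * stride;
}
```

Accessing at least as many of these addresses as the set has ways is what forces a conflict eviction of the target's line in this model.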

The setup

  1. The victim process runs a constant-time swap function, ct_swap. It accepts two arrays of uint64_t from the attacker, while a secret is hidden from the attacker. The attacker's goal is to obtain this secret (0 or 1).
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

void ct_swap(bool secret, uint64_t *a, uint64_t *b, size_t len) {
    uint64_t delta;
    /* All ones when secret is 1, all zeros when secret is 0. */
    uint64_t mask = ~((uint64_t)secret - 1);
    for (size_t i = 0; i < len; i++) {
         delta = (a[i] ^ b[i]) & mask;  /* zero unless we are swapping */
         a[i] = a[i] ^ delta;
         b[i] = b[i] ^ delta;
    }
}
  2. The attacker process runs locally, sharing the L2 cache with the victim process (but not the L1 cache).

How it actually works

Attack

The attacker process first dynamically finds two eviction sets: EVptr, for the cachelines that ptr's content maps to, and EVa, for the cachelines that a's content maps to. With these two EVs, the attacker can then perform the side-channel attack.

Let's start with the attack itself, assuming the attacker has already found the two EVs. The attacker performs the following steps:

  1. Triggers ct_swap with b filled with an arbitrary pointer ptr; a can be all zeros.
  2. Immediately afterwards, primes the cachelines with EVptr.
  3. Concurrently with steps 1 and 2, continuously loops over EVa, accessing each element.
  4. Finally, probes EVptr to determine the secret. A slower access to any element of EVptr means the secret is 1; otherwise it is 0.
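
The choice of inputs in step 1 can be checked in isolation: with a all zeros and b filled with ptr, the array a ends up holding the pointer-like value only when the secret is 1. The sketch below repeats ct_swap so that it is self-contained; ptr_words_in_a is a hypothetical helper written for this note, and the array length of 8 (one 64-byte line) is an arbitrary choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static void ct_swap(bool secret, uint64_t *a, uint64_t *b, size_t len) {
    uint64_t delta;
    uint64_t mask = ~((uint64_t)secret - 1); /* all ones iff secret is 1 */
    for (size_t i = 0; i < len; i++) {
        delta = (a[i] ^ b[i]) & mask;
        a[i] = a[i] ^ delta;
        b[i] = b[i] ^ delta;
    }
}

/* Hypothetical helper: runs the attacker-chosen input from step 1 and
 * counts how many words of a hold ptr afterwards. */
static size_t ptr_words_in_a(bool secret, uint64_t ptr) {
    assert(ptr != 0); /* a starts as zeros, so ptr must differ from 0 */
    uint64_t a[8] = {0};
    uint64_t b[8];
    for (size_t i = 0; i < 8; i++)
        b[i] = ptr;
    ct_swap(secret, a, b, 8);
    size_t n = 0;
    for (size_t i = 0; i < 8; i++)
        n += (a[i] == ptr);
    return n;
}
```

ptr_words_in_a(true, ptr) returns 8 and ptr_words_in_a(false, ptr) returns 0 for any non-zero ptr. The instruction sequence is identical in both cases; only the data written to a differs, and that data is what the DMP later scans.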

Why attack works

To see why this works, let's consider the shared L2 cache during steps 1 to 3.

  1. Initially, the cache contains:

    • a's content
    • b's content
    • ptr's content (via DMP)
  2. After priming with EVptr, ptr's content has been evicted and the cache contains:

    • a's content
    • b's content
    • EVptr
  3. While looping over EVa, for a brief moment the cache contains:

    • EVa
    • b's content
    • EVptr

    Then, while the loop over EVa continues and ct_swap executes, a's content is reloaded. The L2 cache then contains:

    • Either
      • ptr's content (via the DMP, as the ptr value is refilled into L1) if the secret is 1 and the swap happened
      • or solely EVptr if the secret is 0 and the swap didn't happen

Preparation

Finding EVptr and EVa works in a similar way.

We claim that EVptr and EVa are both correct if and only if, after performing the steps below, we find that ptr's content is indeed in the shared L2 cache.

Thus, we can first gather all candidate eviction sets and then find the correct pair by iterating over the possibilities.

  1. Triggers ct_swap with both a and b filled with an arbitrary pointer ptr (e.g. a = {ptr, ptr, ptr, ptr} and b = {ptr, ptr, ptr, ptr}, len = 4).
  2. Immediately afterwards, primes the cachelines with the candidate EVptr.
  3. Concurrently with steps 1 and 2, continuously loops over the candidate EVa (which evicts a's content).
  4. Finally, probes EVptr to see whether ptr's content is in the L2 cache. If the two sets are correct, ptr's content will be in the L2 cache.
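
The search over candidates can be sketched as a plain double loop. Everything here is illustrative: pair_test_fn is a hypothetical callback that would run steps 1 to 4 for one candidate pair and report whether ptr's content showed up in L2, candidates are referred to by index, and fake_pair_test is only a stand-in oracle so the loop can be exercised without hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: runs steps 1 to 4 with the given candidates and
 * returns true if ptr's content was observed in the shared L2 cache. */
typedef bool (*pair_test_fn)(size_t ev_ptr_idx, size_t ev_a_idx, void *ctx);

/* Tries every pair of candidates. On success, stores the indices of the
 * pair that passed and returns true. */
static bool find_ev_pair(size_t n_candidates, pair_test_fn test, void *ctx,
                         size_t *ev_ptr_idx, size_t *ev_a_idx) {
    assert(test != NULL && ev_ptr_idx != NULL && ev_a_idx != NULL);
    for (size_t p = 0; p < n_candidates; p++) {
        for (size_t a = 0; a < n_candidates; a++) {
            if (test(p, a, ctx)) {
                *ev_ptr_idx = p;
                *ev_a_idx = a;
                return true;
            }
        }
    }
    return false;
}

/* Stand-in oracle: pretends exactly one pair is correct. */
typedef struct {
    size_t ptr_idx;
    size_t a_idx;
} known_pair;

static bool fake_pair_test(size_t p, size_t a, void *ctx) {
    const known_pair *k = ctx;
    return p == k->ptr_idx && a == k->a_idx;
}
```

With n candidates this costs up to n * n runs of the four steps. On real hardware each run is noisy, so a practical version would presumably repeat each test several times before accepting a pair.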

Why preparation works

Again, let's consider the caches.

  1. At first,

    Victim L1 cache contains:

    • a's content (which is the pointer-like ptr)

    Shared L2 cache contains:

    • a's content
    • ptr's content (via DMP)
  2. After priming with EVptr,

    Victim L1 cache contains:

    • a's content (which is the pointer-like ptr)

    Shared L2 cache contains:

    • a's content
    • EVptr (evicting ptr's content)
  3. While evicting with EVa, for a brief moment,

    Victim L1 cache contains nothing, as a's content has been evicted by EVa.
    Shared L2 cache contains:

    • EVptr (evicting ptr's content)

    While looping, as ct_swap is being executed,

    Victim L1 cache contains:

    • a's content (again as it is being used by ct_swap)

    Shared L2 cache contains:

    • ptr's content (via DMP as ptr value is being refilled to L1)

Finally to see our claim about EVptr and EVa is correct, consider that

Tags: #security