TIL that Go's atomic.Bool stores its value in a uint32 instead of a uint8 or a plain bool. At first glance, that seems counterintuitive, because a bool ideally needs only a single bit. Why do we need the additional 31 bits?

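The reason is that most CPUs can perform atomic operations efficiently on word-sized values (32-bit or 64-bit). Some CPUs can also perform atomic operations on a single byte, but that is not supported everywhere and may be less efficient. Go's sync/atomic package also exposes its integer operations only for 32-bit and 64-bit types, so a uint32 is the smallest field the existing primitives can work on.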

// A Bool is an atomic boolean value.
// The zero value is false.
//
// Bool must not be copied after first use.
type Bool struct {
	_ noCopy
	v uint32
}

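Let's see how CompareAndSwap works for a Bool. The b32 helper converts a bool to a uint32 (0 for false, 1 for true), so the whole operation reduces to a 32-bit compare-and-swap.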

// CompareAndSwap executes the compare-and-swap operation for the boolean value x.
func (x *Bool) CompareAndSwap(old, new bool) (swapped bool) {
	return CompareAndSwapUint32(&x.v, b32(old), b32(new))
}

Source: https://github.com/golang/go/blob/47a63a331daa96de55562fbe0fa0201757c7d155/src/sync/atomic/type.go#L27C1-L30C2
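From the caller's side, the useful property of CompareAndSwap is that exactly one caller can move the flag from false to true. The sketch below illustrates that with a made-up helper, countWinners (not from the Go source), which lets many goroutines race on one atomic.Bool and counts how many swaps succeed.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countWinners starts n goroutines that all try to flip the same flag
// from false to true, and returns how many of them succeeded.
// (Hypothetical helper for illustration.)
func countWinners(n int) int32 {
	var flag atomic.Bool
	var winners atomic.Int32
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Succeeds only for the goroutine that observes false
			// and installs true in the same atomic step.
			if flag.CompareAndSwap(false, true) {
				winners.Add(1)
			}
		}()
	}
	wg.Wait()
	return winners.Load()
}

func main() {
	fmt.Println(countWinners(100)) // 1
}
```

This is the usual pattern for a "run this only once" guard when sync.Once is not a good fit.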

CompareAndSwapUint32 is a thin Go-level wrapper around a lower-level atomic CAS operation in the runtime. It is implemented in assembly and simply jumps to the runtime's Cas routine.

TEXT ·CompareAndSwapUint32(SB),NOSPLIT,$0
	JMP	internal∕runtime∕atomic·Cas(SB)

On amd64, this 32-bit compare-and-swap (which is what the Bool swap reduces to) is implemented in assembly as shown below.

// bool Cas(int32 *val, int32 old, int32 new)
// Atomically:
// if(*val == old){
//     *val = new;
//     return 1;
// } else
//     return 0;
TEXT internal∕runtime∕atomic·Cas(SB),NOSPLIT,$0-17
    MOVQ ptr+0(FP), BX    // Load pointer address into BX
    MOVL old+8(FP), AX    // Load expected value into AX (required by CMPXCHG)
    MOVL new+12(FP), CX   // Load new value into CX
    LOCK                   // Lock prefix for atomicity
    CMPXCHGL CX, 0(BX)    // If [BX] == AX, store CX into [BX] and set ZF; otherwise load [BX] into AX and clear ZF
    SETEQ ret+16(FP)      // Set return value based on ZF (zero flag)
    RET

Source: https://github.com/golang/go/blob/47a63a331daa96de55562fbe0fa0201757c7d155/src/internal/runtime/atomic/atomic_amd64.s#L22-L37
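The pseudocode in the comment above can be modeled in plain Go. This is only a sequential sketch of what LOCK CMPXCHGL does in one indivisible step. The name casModel is made up, and the function is not atomic, so it is safe only when called from a single goroutine.

```go
package main

import "fmt"

// casModel mirrors the pseudocode for Cas. On real hardware the compare
// and the store happen as a single indivisible step; here they are
// ordinary statements, so this model is NOT safe for concurrent use.
func casModel(val *uint32, old, new uint32) bool {
	if *val == old {
		*val = new
		return true
	}
	return false
}

func main() {
	var v uint32 // 0 encodes false, 1 encodes true

	ok := casModel(&v, 0, 1) // false -> true succeeds
	fmt.Println(ok, v)       // true 1

	ok = casModel(&v, 0, 1) // value is already 1, so nothing changes
	fmt.Println(ok, v)      // false 1
}
```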