The Kyu project

January 25, 2023

Kyu networking -- Launch cores on the Allwinner H3

Why perform experiments with cores?

My idea was that perhaps every core needs to be running, with its D cache enable bit set, before the shared L2 cache will come alive. There is, after all, a bit to enable coherency between cores, and without all the cores up and running, who can say that the hardware isn't deadlocked, waiting for a handshake from some core that isn't running?

A conclusion up front: just spinning up all the cores has not yet made any difference. It is fun, though, to get multiple cores running and to avoid the strange race conditions that arise.

Run an old test and see if it still works (it does!)

I have some code from 2020 (2 years ago), let's try it:

Test 11: Multiple core test
Kyu (opi), ready> i 11
Kyu (opi), ready> Testing multicore startup
BANG!
BOOM!
*T*J CT orine  1mm fua_isleetd_t ttbor s, tapargte
able: 40040000
 set TTBCR to 0, read back: 00000000
 set TTBR0 to 40040059, read back: 40040059
 set TTBR1 to 40040059, read back: 40040059
 set DACR to ffffffff, read back: ffffffff
GIC cpu_init called
BONK!
BONK!
BONK!
BONK!
BONK!
BONK!
BONK!
BONK!
BONK!
BONK!

A look at the code shows us that this calls test_core() in board/multicore.c. This routine is just a stub on a single core system like the BBB. On the Orange Pi it calls a routine that I was endlessly using for experiments back in the day.

The BANG and BOOM just show core 0 sending interrupts to itself via the GIC. Then we launch core 1, which yields a mangled bunch of messages during startup while core 0 and core 1 are both accessing the uart. Core 1 then goes into a spin loop. After some delay, core 0 sends 10 interrupts to core 1, which yields the BONK messages.

Parking cores

There is a protocol for doing this, called ACPI or PSCI or some such. The idea is that a core starts running, then goes into a loop executing the WFI instruction. When and if it wakes up, it examines some memory locations (perhaps called "mailboxes") and can exit the loop if it finds that someone wants its services. Here is some code from the Fire3 bootrom that does exactly this:

     10c:       e3a00103        mov     r0, #-1073741824        ; 0xc0000000
     110:       e3800801        orr     r0, r0, #65536  ; 0x10000
     114:       e3e01000        mvn     r1, #0
     118:       e5801230        str     r1, [r0, #560]  ; 0x230
     11c:       e1a0000b        mov     r0, fp
     .......
     .......
     .......
     .......
; We come here if we are a secondary processor
; r0 = 0xc0010230
     184:       e3a00103        mov     r0, #-1073741824        ; 0xc0000000
     188:       e3800801        orr     r0, r0, #65536          ; 0x10000
     18c:       e3800e23        orr     r0, r0, #560            ; 0x230

; this is where secondary cores park until they get moved elsewhere.
; loop around WFI instruction
; monitors the location r0 is pointing to (0xc0010230)  
; this looks a lot like the PSCI scheme for starting secondary processors
; So, a secondary processor ends up here, spinning around a wfi
; and monitoring 0xc0010230
; Note that cmn adds the #1 to the value, discarding the result and setting flags
; So if the magic location is 0xffffffff, adding 1, wraps to zero,
; meaning to this loop, "nothing to do yet".
; Note that nothing here checks the cpu number, so it must be
; necessary to start the secondary processors one at a time,
; let them park here for a while, then move them elsewhere.

     190:       e320f003        wfi
     194:       e5901000        ldr     r1, [r0]
     198:       e3710001        cmn     r1, #1
     19c:       112fff11        bxne    r1
     1a0:       eafffffa        b       0x190
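
For readers who prefer C to ARM assembly, here is a rough rendering of one trip around that loop. This is only a sketch, not code from the bootrom or from Kyu. The names park_poll(), wait_for_interrupt(), PARK_IDLE and wfi_count are made up for illustration, and the WFI instruction is replaced by a counter so the code can be compiled and run on an ordinary hosted system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mailbox value that the Fire3 ROM treats as "nothing to do yet". */
#define PARK_IDLE 0xffffffffu

/* Hypothetical stand-in for the WFI instruction.  On the real chip this
 * would be:  __asm__ volatile ("wfi");
 * Here it only counts calls, so the sketch runs on any hosted system. */
unsigned int wfi_count;

void wait_for_interrupt(void)
{
    wfi_count++;
}

/* One trip around the loop at 0x190 .. 0x1a0.
 * Returns 1 and stores the posted address in *entry when the core
 * should leave the loop (the bxne), 0 when it should go around again. */
int park_poll(const volatile uint32_t *mailbox, uint32_t *entry)
{
    uint32_t value;

    assert(mailbox != NULL && entry != NULL);

    wait_for_interrupt();           /* wfi */
    value = *mailbox;               /* ldr r1, [r0] */

    /* cmn r1, #1 : the Z flag is set when r1 + 1 wraps to zero */
    if ((uint32_t)(value + 1u) == 0u)
        return 0;                   /* b 0x190 */

    *entry = value;                 /* bxne r1 */
    return 1;
}
```

A caller would loop on park_poll() until it returns 1 and then jump to the address in entry. Whoever wants the core to run stores anything other than PARK_IDLE in the mailbox and then arranges for the wfi to wake up.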

So all cores are watching the same magic location (as per the comment above). I have seen other twists on this scheme that have each core read its MPIDR register to figure out which core it is, and then look at a second word to see if it has been requested for action. Another scheme would be to assign a different mailbox to each core.
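
The per-core mailbox idea might look something like the following in C. Again this is a sketch under assumptions, not code from any bootrom: NCORES, core_mailbox, post_entry() and check_mailbox() are hypothetical names, and the MPIDR value is passed in as an argument rather than read from the coprocessor (on a 32 bit ARM that read would be "mrc p15, 0, r0, c0, c0, 5"). The one hard fact used is that the low 8 bits of MPIDR (Aff0) give the core number within its cluster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCORES    4             /* assumption: a quad core part like the H3 */
#define MBOX_IDLE 0xffffffffu   /* "nothing to do yet", as in the Fire3 ROM */

/* Hypothetical mailbox array, one word per core. */
volatile uint32_t core_mailbox[NCORES] = {
    MBOX_IDLE, MBOX_IDLE, MBOX_IDLE, MBOX_IDLE
};

/* Aff0, bits 7:0 of MPIDR, is the core number within a cluster. */
unsigned int mpidr_core_id(uint32_t mpidr)
{
    return mpidr & 0xffu;
}

/* Called by the boot core: ask one specific core to run at "entry". */
void post_entry(unsigned int core, uint32_t entry)
{
    assert(core < NCORES);
    core_mailbox[core] = entry;
}

/* Called by a parked core each time it wakes up.
 * Returns 1 and stores the address in *entry if this core was asked
 * for, 0 if it should go back to sleep. */
int check_mailbox(uint32_t mpidr, uint32_t *entry)
{
    unsigned int core = mpidr_core_id(mpidr);
    uint32_t value;

    assert(core < NCORES && entry != NULL);

    value = core_mailbox[core];
    if (value == MBOX_IDLE)
        return 0;

    core_mailbox[core] = MBOX_IDLE;   /* claim the request */
    *entry = value;
    return 1;
}
```

With this arrangement a wakeup that reaches every core is harmless. Only the core whose mailbox was written leaves the loop, and the others go back to sleep.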

Further down is similar code from the Rockchip RK3399 bootrom. Note a couple of differences, though. One is the magic value 0xdeadbeaf, which pulls the core out of the loop. Another is that the wfe instruction is used rather than wfi (either should do). Also note that this ROM runs in 64 bit aarch64 mode, whereas the Fire3 ROM ran in 32 bit mode.

Once again, all the cores get awakened, and will race together to the same posted branch location. Presumably there they will sort themselves out and return to a wait loop like this if not needed.

Note that this works in conjunction with the SEV instruction, which is described as a "hint" instruction. It signals an event to all processors in a multiprocessor system. It is not clear to me whether "sev" will awaken wfi as well as wfe. It will certainly awaken wfe. One person says:

WFE is used for communication between multiple cores (one core waits by executing WFE and other core wakes it up by executing SEV). WFI is for sleeping until interrupt.

I buy this. So WFI is more like what you would use for the idle loop where only one processor is concerned. However all of this needs careful study (and will SEV awaken the loop in the Fire3 rom?).

; We are some core other than cluster 0, core 0
; Cores other than cluster 0, core 0 will spin here.
; The mechanism for awakening one of these cores is as follows:
;
; Write the address to run to 0xff8c0008 (in SRAM)
; Write 0xdeadbeaf to 0xff8c0004 (in SRAM)
;
; Note that we write 0xdeadbeaf, not 0xdeadbeef
;
; It seems to me that this will wake up all the cores and
; take them to this other address, which will then have to
; sort them all out via the mpidr_el1 register.
; It is a little surprising that we cannot select a single core
; to bring out of this WFE spin loop, but it is what it is.
ffff002c:       58000760        ldr     x0, 0xffff0118  ;  ff8c0004
ffff0030:       52800021        mov     w1, #0x1        ; w1 = 1
ffff0034:       b9000001        str     w1, [x0]        ;  write the "1"
ffff0038:       d503205f        wfe
ffff003c:       18000762        ldr     w2, 0xffff0128  ;  w2 = 0xdeadbeaf
ffff0040:       580006c0        ldr     x0, 0xffff0118  ;  ff8c0004
ffff0044:       b9400001        ldr     w1, [x0]
ffff0048:       6b02003f        cmp     w1, w2
ffff004c:       54ffff61        b.ne    0xffff0038      ; go back to sleep

ffff0050:       58000681        ldr     x1, 0xffff0120  ;  ff8c0008
ffff0054:       b9400020        ldr     w0, [x1]
ffff0058:       d61f0000        br      x0              ; go run at that address
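
Here is how I read that protocol, written out in C from both sides. It is a sketch, not tested on an RK3399. The struct stands in for the two SRAM words at 0xff8c0004 and 0xff8c0008 so that the code runs on a hosted system, and rk_mailbox, rk_wake_cores(), rk_parked_check(), send_event() and sev_count are all names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RK_WAKE_MAGIC 0xdeadbeafu   /* "beaf", not "beef" */

/* Stand-in for the two SRAM words the ROM loop looks at.  Placed at
 * 0xff8c0004, flag lands on 0xff8c0004 and entry on 0xff8c0008. */
struct rk_mailbox {
    volatile uint32_t flag;
    volatile uint32_t entry;
};

/* Hypothetical stand-in for the SEV instruction.  On the real chip
 * this would likely be a barrier followed by sev, for example:
 *   __asm__ volatile ("dsb sy\n\tsev");
 * Here it only counts calls. */
unsigned int sev_count;

void send_event(void)
{
    sev_count++;
}

/* Waking side: post the address first, then the magic value, then
 * signal the event.  The entry is only 32 bits wide because the ROM
 * fetches it with "ldr w0, [x1]". */
void rk_wake_cores(struct rk_mailbox *mb, uint32_t entry)
{
    assert(mb != NULL);
    mb->entry = entry;
    mb->flag = RK_WAKE_MAGIC;
    send_event();
}

/* Parked side: what the ROM does after each wfe returns.
 * Returns 1 and stores the address in *entry if the magic value is
 * there, 0 if the core should go back to sleep. */
int rk_parked_check(const struct rk_mailbox *mb, uint32_t *entry)
{
    assert(mb != NULL && entry != NULL);
    if (mb->flag != RK_WAKE_MAGIC)
        return 0;               /* b.ne 0xffff0038 */
    *entry = mb->entry;         /* ldr w0, [x1] */
    return 1;                   /* br x0 */
}
```

The address is written before the magic value so that a core which happens to wake early never sees the magic value paired with a stale address.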

I am a bit surprised, but apparently I have not yet read out and disassembled the bootrom for the Allwinner H5. My bet is that it will be like the H3 -- I will need to turn on clocks and manipulate a reset register to start each core running.

Have any comments? Questions? Drop me a line!

Kyu / [email protected]