January 4, 2023

BSD TCP/IP for Kyu - Data aborts

After fixing the thr_unblock() bug by adding (and rearranging) locks in Kyu, I started a long test run and went to bed. It ran for 5194 transfers and then halted with a pair of data aborts.

Kyu is "all locked up".
The traceback from the first abort looks like:

data abort in thread tcp-input

pc : [<4001ce38>]	   lr : [<400245d0>]
sp : 40573e60  ip : 0000d8a6	 fp : 40573e7c
r10: 00000002  r9 : 5bf42ee8	 r8 : 5ffb4b70
r7 : 5bf452a0  r6 : 40000000	 r5 : 00000002  r4 : 4005c100
r3 : 60000000  r2 : 00000050	 r1 : 60000001  r0 : 5ffffff6
cpsr: 20000013  Flags: nzCv  IRQs on  FIQs on  Mode SVC_32

40573e60  01c82000 00000001 40163600 5ffffff6 
40573e70  4016360b 60000001 40573eac 40573eac 

PC = 4001ce38 ( memcpy+48 )
Leaf routine
Called from tcp_output+116c -- 4002a970
Called from tcp_input+1c54 -- 4002842c
Called from tcp_bsd_process+194 -- 4002451c
Called from tcp_thread+180 -- 40024258
Called from thr_exit -- 4000f89c

The traceback shows us in memcpy, and neither r2 nor r3 looks like an acceptable pointer. Also, there is no direct call to memcpy in tcp_output. Just above the address given is a call to ip_output, which does call memcpy (to copy into a Kyu netbuf).

The assembly code shows a store to [r3] where r3 (from above) is 0x60000000, which would certainly cause a data abort.

The traceback from the second abort looks like:

Send blocked, 9030 seconds
 *** unexpected emac Rx int status: 40000300
Unusual desc status: 314b504b

data abort in thread tcp-timer

pc : [<400166dc>]	   lr : [<4000550c>]
sp : 40577f68  ip : 0000001c	 fp : 40577f74
r10: 00000002  r9 : 5bf42ee8	 r8 : 5ffb4b70
r7 : 5bf452a0  r6 : 40000000	 r5 : 00000002  r4 : 4005c100
r3 : 314b504b  r2 : 65000000	 r1 : 503167a8  r0 : 400aba68
cpsr: 200001d3  Flags: nzCv  IRQs off  FIQs off  Mode SVC_32

40577f60  40577f74 400003b4 00003148 314b504b
40577f70  40577f94 4000550c 40577f8c 40000300
40577f80  50313631 00002500 ffc102d9 00000020
40577f90  40577fb4 40005734 40577fa4 00000000

PC = 400166dc ( netbuf_alloc_i+48 )
Called from rx_handler+d4 -- 4000550c
Called from emac_handler+bc -- 40005734
Called from do_irq+ac -- 40001164
Called from irq+34 -- 40000434
Called from thr_block_q+58 -- 4000fc60
Called from thr_exit+60 -- 4000f8fc
Called from sem_unblock+70 -- 40010e24

Notice the messages from the emac driver before we get the data abort. In particular, notice the odd value 314b504b, which pops up in several places.

A look through the code suggests that the netbuf free list has gotten corrupted.

Ideas

This sort of thing is (once again) tricky to debug. It arises only after an hour of intense testing, and the machine is locked up, so we cannot probe at will.

An important point, again: resist every temptation to make code changes while a long test like this is running (or save kyu.dump first if you do).

Orange Pi RAM addresses

This sort of thing will change from board to board, so a routine like the proposed "is_valid_ram" should live in board.c in the target board directory. It will then show up as missing when I move to a different (or new) board. For the Orange Pi, we can have from 1/2G to 2G of RAM. The following are defined in board.h:
#define BOARD_RAM_SIZE  0       /* Force probing */
#define BOARD_RAM_MAX   0x80000000      /* 2G */
#define BOARD_RAM_START 0x40000000
It turns out I have been down this road before and have the following routine in armv7/ram.c --
int valid_ram_address ( unsigned long addr );

So I just need to sprinkle calls to this in sensible places -- such as netbuf_alloc_i, netbuf_alloc, and in ip_output before the call to memcpy. Note that the call in ip_output uses a just-allocated netbuf, so suspicion starts to center on the netbuf allocator.
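
As a rough sketch of what such a check might look like (the struct netbuf fields and the free list handling below are my guesses, not the actual Kyu code), the guard in netbuf_alloc_i could be something like this:

/* Sketch of a sanity check in netbuf_alloc_i() -- the netbuf
 * structure and free list details are illustrative only.
 */
static struct netbuf *free_list;

struct netbuf *
netbuf_alloc_i ( void )
{
	struct netbuf *np;

	np = free_list;
	if ( ! np )
	    return (struct netbuf *) 0;

	/* Catch a corrupted free list before chasing a bogus pointer */
	if ( np->next && ! valid_ram_address ( (unsigned long) np->next ) )
	    panic ( "netbuf_alloc_i -- bad next address" );

	free_list = np->next;
	return np;
}

A check along these lines is what produces the "bad next address" panic that shows up below.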

The netbuf allocator

This is in net/net_main.c. I found netbuf_show() already in existence and augmented it a bit. Then I added a call to it so it is included in the information displayed by "t 1". Then I started a run limited to 500 transfers. It is now running considerably faster (due to a small amount of debug removed from the tcp-input thread?). Each transfer now takes 1.13 seconds. Everything looks fine at the end of the run:
User lock: idle
  mbuf: alloc =  1, free =   7, max =  8
mbufcl: alloc =  0, free = 256, max =  4
  sock: alloc =  1, free =   1, max =  2
 inpcb: alloc =  1, free =   1, max =  2
Kyu input queue size: 0
TCP input queue size: 0
Netbuf head: 406b4cb0
512 netbuf available
512 netbuf on free list
512 netbuf configured

No netbuf leak of the sort I half expected. So, we will let it run until it finds trouble -- and it doesn't take long. I get:
connection 2078 ..cksum: out of data
 *** *** *** tcp_input 2, bad cksum = 1015
m, tlen, len = 40168778, 12336, 12356
 finished (2078)
Wangdoodle connection 2079 ..netbuf_alloc rv, free = 406b4cb0, free
PANIC: netbuf_alloc_i -- bad next address

I add some more debug and try again. This leads to an "amusing" debug loop spewing out tracebacks forever and ever. I see this once I pull the power cord:
PC = 4002b5a8 ( tcp_fasttimo+4c )
Called from tcp_timer_func+6c -- 4002410c
Called from thr_exit -- 4000f89c
data abort in thread tcp-timer

CPU: 0
pc : [<4002b5a8>]	   lr : [<4002b56c>]
sp : 40577fe0  ip : 400240a0	 fp : 40577fec
r10: 00000002  r9 : 5bf42ee8	 r8 : 5ffb4b70
r7 : 5bf452a0  r6 : 40000000	 r5 : 00000002  r4 : 4005c100
r3 : abababab  r2 : 40168560	 r1 : 60000113  r0 : 60000113
cpsr: a0000013  Flags: NzCv  IRQs on  FIQs on  Mode SVC_32

40577fe0  abababab 40168560 40577ffc 4002410c
40577ff0  40577ffc 00000000 00000000 4000f89c

I find it sort of odd that I am getting these entirely new sorts of problems that look like memory corruption. But probably what I ought to do is buckle down and code up that output queue that I am sure I need. If nothing else, there can be contention between the tcp-timer thread and the tcp-input thread.

An output queue

First, the path from code in TCP to our IP routines needs to be inspected to ensure it is thread safe and reentrant. TCP code in tcp_subr.c (tcp_notify) and in tcp_output.c both call ip_output, which is a routine I wrote that copies the mbufs into a netbuf. It then passes the netbuf to the Kyu IP layer by calling ip_send() which is in net/net_ip.c.
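
Conceptually, that glue routine just flattens the mbuf chain into a netbuf and hands it off. The sketch below shows the idea; netbuf_alloc(), the netbuf fields, and the ip_send() usage are shown the way I imagine them, not copied from the actual sources.

/* Conceptual sketch of the mbuf-to-netbuf glue in ip_output --
 * the netbuf fields and return conventions are approximations.
 */
int
ip_output ( struct mbuf *m )
{
	struct netbuf *nbp;
	struct mbuf *mp;
	char *dest;
	int len = 0;

	nbp = netbuf_alloc ();
	if ( ! nbp )
	    return ENOBUFS;

	/* Flatten the mbuf chain into the netbuf data area.
	 * This is the memcpy suspected in the data abort above,
	 * which is why a valid_ram_address() check just before
	 * it is worthwhile.
	 */
	dest = nbp->data;
	for ( mp = m; mp; mp = mp->m_next ) {
	    memcpy ( dest, mtod ( mp, char * ), mp->m_len );
	    dest += mp->m_len;
	    len += mp->m_len;
	}
	nbp->len = len;

	ip_send ( nbp );
	return 0;
}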

My impression of things is that the path is reentrant all the way to calling net_send() in net/net_main.c. This routine is simple and is the place to set up the output queue. At this time it just calls board_net_send(), which is by no means guaranteed to be reentrant.
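
What I have in mind is the usual producer/consumer arrangement: net_send() only queues the netbuf, and a single output thread becomes the only caller of board_net_send(). Here is a rough sketch; the queue link field and the lock/semaphore calls are placeholders standing in for whatever Kyu primitives are appropriate, not the actual API.

/* Sketch of a serialized output path -- out_lock/out_unlock and
 * out_wait/out_post are placeholders for real Kyu primitives.
 */
extern void out_lock ( void );		/* placeholder mutual exclusion */
extern void out_unlock ( void );
extern void out_wait ( void );		/* placeholder "wait for work" */
extern void out_post ( void );		/* placeholder "work is available" */

static struct netbuf *out_head;
static struct netbuf *out_tail;

void
net_send ( struct netbuf *nbp )
{
	/* May be called from any thread (tcp-input, tcp-timer, ...).
	 * Just put the packet on the queue and poke the output thread.
	 */
	nbp->next = (struct netbuf *) 0;
	out_lock ();
	if ( out_tail )
	    out_tail->next = nbp;
	else
	    out_head = nbp;
	out_tail = nbp;
	out_unlock ();
	out_post ();
}

static void
net_output_thread ( void )
{
	struct netbuf *nbp;

	for ( ;; ) {
	    out_wait ();
	    out_lock ();
	    nbp = out_head;
	    if ( nbp ) {
		out_head = nbp->next;
		if ( ! out_head )
		    out_tail = (struct netbuf *) 0;
	    }
	    out_unlock ();
	    if ( nbp )
		board_net_send ( nbp );	/* only this thread ever calls this */
	}
}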

So, I coded this up (which was really no big deal). Now the big surprise is that things are worse! Unless I made some stupid coding error (entirely possible), this has somehow exacerbated some timing issue. Now I cannot even go 10 transfers before some kind of panic takes place.

MSS mysteries

Interestingly, Wireshark now shows the Kyu side sending 2 packets back to back before the Linux side sends an ACK. Why it doesn't just send one big packet is still a mystery (Linux advertises a 6418 byte window according to Wireshark). My guess is that with an output queue we can quickly get the first packet on its way and have time to send a second before Linux can respond with an ACK. All that is great, but why do I get "BSD panic: m_copym 2" now?
I do see this in tcp.h:

#define TCP_MSS 512
But then why am I seeing 500 bytes instead of 512?

At any rate, this MSS business involves options. And as I remember, I bypassed handling TCP options for one reason or another. Typically (I am reading) a Linux system will advertise an MSS of 1460 bytes. "The book" says that almost every system they investigated used a 1024 byte MSS. So the 500 (or 512) value is something to investigate someday, but a rabbit hole distraction at this time.

New debug tools

I rework my bsd_panic routine to display additional information, in particular the information that the "t 1" command would display. I also enhance my mbuf_show() routine and add calls to it before the calls to bsd_panic in the two cases I have seen thus far (the checksum failure in tcp_input() and the panic in m_copym()). I also make the checksum failure in tcp_input() call bsd_panic(). This will give me valuable information for now, since some bug in my own code is causing the packet corruption, not trouble over the network. This case should normally just drop the packet, but I want to jam a screwdriver in the spokes and stop everything. I also add some logic to panic() itself that limits the number of panics to just two. If panic() gets called a third time, it just spins. This will avoid an endless runaway that scrolls away the most interesting information from the start.
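
The panic-limiting logic is just a counter; a minimal sketch (the variable name and the rest of the panic body here are illustrative, not the real routine) looks like this:

static int panic_count = 0;

void
panic ( char *msg )
{
	if ( ++panic_count > 2 ) {
	    /* A third (or later) panic just spins, so the first two
	     * reports -- usually the interesting ones -- do not get
	     * scrolled away by an endless runaway.
	     */
	    for ( ;; )
		;
	}

	printf ( "PANIC: %s\n", msg );
	/* ... display the extra state here ("t 1" style info, mbuf_show, ...) ... */
}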

