Debugging with Adaptive Partitioning - A Better Mousetrap !
November 30, 2007 by fieldstudy
If you’ve ever tried to debug a process/task on an embedded target, you’ve probably hit this problem: the code you’re debugging goes off into an endless loop somewhere and eats all of the CPU, so that your repeated (and increasingly violent!) mouse clicks on the ‘Stop’ button are in vain. Typically, the rogue process is either higher priority than the debugger, or it’s the same priority and running in FIFO mode… The end result is usually a hard reset of the target board, which is a shame, because it’d be really handy to know where the process had got to…
Now that Adaptive Partitioning is included as part of the QNX development seat, there’s no excuse for not using it to address this problem. It’s also one of the best ways to understand and appreciate the value of time-partitioning, and spark your imagination for the possibilities that it might hold if applied to your real system.
Conceptually, adaptive partitioning is quite simple - a partition is an arbitrary group of threads (from one or more processes) that is assigned some percentage of the CPU budget. The total available CPU budget is always 100%, and the sum of the budgets of the partitions in a system always adds up to 100%.
When the system runs, the threads are scheduled using their existing priorities, exactly as they would be if adaptive partitioning were not being used. The one difference is that, as a thread runs, its running time (i.e. CPU consumption) is carefully calculated and subtracted from the current CPU budget of the partition to which it belongs. If that partition budget ever reaches 0 (i.e. the partition has consumed all of its budget), a ready thread in another partition (that has budget) will preempt the currently running thread, *EVEN* if the new thread is lower priority than the currently running thread. The clever (adaptive) part is that if there are no ready threads in other partitions, then the running thread (that now has no budget) will continue to run, thus using time from other partitions that those partitions did not need.
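The selection rule described above can be sketched in a few lines of C. This is only a toy model of the idea, not the QNX kernel's implementation: the toy_thread type, its fields, and the pick_next function are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not QNX kernel types. */
typedef struct {
    const char *name;
    int priority;    /* higher number = higher priority */
    int partition;   /* index into the remaining_budget array */
    bool ready;
} toy_thread;

/* Return the index of the thread to run next, or -1 if nothing is ready.
 * remaining_budget[p] is how much budget partition p has left. */
int pick_next(const toy_thread *t, size_t n, const int *remaining_budget)
{
    int best_with_budget = -1;   /* best ready thread whose partition has budget */
    int best_any = -1;           /* best ready thread overall */

    assert(t != NULL && remaining_budget != NULL);

    for (size_t i = 0; i < n; i++) {
        if (!t[i].ready)
            continue;
        if (best_any < 0 || t[i].priority > t[best_any].priority)
            best_any = (int)i;
        if (remaining_budget[t[i].partition] > 0 &&
            (best_with_budget < 0 ||
             t[i].priority > t[best_with_budget].priority))
            best_with_budget = (int)i;
    }

    /* Threads in partitions that still have budget win. If there are none,
     * fall back to plain priority, so an out-of-budget thread keeps running
     * on time nobody else wanted. */
    return best_with_budget >= 0 ? best_with_budget : best_any;
}
```

With a priority 63 thread in an exhausted partition and a ready priority 10 thread in a partition that still has budget, pick_next returns the priority 10 thread. Mark that thread as not ready and the priority 63 thread is chosen again, which is the "adaptive" part.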
So, the way this applies to our debugger case is quite straightforward - we simply ensure that the debugger and associated communications processes (typically just the io-net (or soon to be io-pkt) process in QNX) run in their own partition that has some (usually small) CPU budget associated with it. That way, when you click on the ‘Stop’ button on your host, the message that gets sent to the target will get processed by the network stack and delivered to the debugger, and it will have enough CPU to issue the right commands to manage the rogue process. When you’re not pressing the Stop button, if the process being debugged needs lots of CPU, it can exceed its own partition budget by using the debugger partition’s allotted time, but only until that partition has work to do (the “stop” button being pressed).
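To see why a small budget is enough, here is a rough simulation of the debugger case. It is a sketch under stated assumptions: time is counted in abstract ticks, CPU usage is measured over a sliding window of 100 ticks (the WINDOW constant is an assumption of this model), the rogue thread is always ready and always higher priority, and the function name ticks_until_stopped is hypothetical.

```c
#include <assert.h>

#define WINDOW 100   /* toy averaging window in ticks (an assumption) */

enum { SYSTEM = 0, DEBUGGING = 1, NPART = 2 };

/* Returns how many ticks pass between the Stop click and the debugger
 * finishing `work` ticks of processing, or -1 if it never finishes
 * within `horizon` ticks. A debug budget of 0 models plain priority
 * scheduling with no partition for the debugger. */
int ticks_until_stopped(int debug_budget_pct, int stop_tick, int work, int horizon)
{
    int budget[NPART];
    int used[NPART] = { 0, 0 };   /* ticks used inside the current window */
    int history[WINDOW];          /* which partition ran in each recent tick */

    assert(debug_budget_pct >= 0 && debug_budget_pct <= 100 && work > 0);

    budget[DEBUGGING] = debug_budget_pct;
    budget[SYSTEM] = 100 - debug_budget_pct;
    for (int i = 0; i < WINDOW; i++)
        history[i] = -1;

    for (int tick = 0; tick < horizon; tick++) {
        int slot = tick % WINDOW;

        /* Forget usage that has slid out of the window. */
        if (history[slot] >= 0)
            used[history[slot]]--;

        int debugger_ready = (tick >= stop_tick);
        int debugger_has_budget = used[DEBUGGING] * 100 < budget[DEBUGGING] * WINDOW;
        int rogue_has_budget = used[SYSTEM] * 100 < budget[SYSTEM] * WINDOW;

        /* The rogue thread wins on priority unless its partition is out of
         * budget while the debugger's partition still has some. */
        int run = SYSTEM;
        if (debugger_ready && debugger_has_budget && !rogue_has_budget)
            run = DEBUGGING;

        history[slot] = run;
        used[run]++;

        if (run == DEBUGGING && --work == 0)
            return tick - stop_tick + 1;
    }
    return -1;
}
```

In this model, ticks_until_stopped(20, 200, 5, 1000) returns 5: the rogue thread has long since used up the System partition's share of the window, so the debugger runs as soon as it becomes ready. With a debug budget of 0 the same call returns -1, which is the "hard reset" situation described at the start.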
So, how do you build a system that does this ?
Make sure APS is built into the kernel.
To do this, you need to change the build file that you use to create the boot image (known as the Initial File System or .ifs file).
This works for any target on any host, but I’ll just show it on a stock x86 self-hosted system for now, from the command line.
To add APS, log in as root, open a shell and type:
cd /boot/build
[Now, you are in the directory where the default build files live.]
As of the 6.3.2 install CD, we’ve thoughtfully added a build file all ready to do what we want - it’s called qnxbasedmaaps.build
Edit this file:
e.g. type:
vi qnxbasedmaaps.build
and you’ll see that line 12 is:
[module=aps] PATH=/proc/boot:/bin:/usr/bin:/opt/bin LD_LIBRARY_PATH=/proc/boot:/lib:/usr/lib:/lib/dll:/opt/lib procnto-instr
The APS part is just the leading:
[module=aps] - so if you add that to any other build file on the line that starts procnto-instr, that will enable APS.
This build file has some extra lines commented out that create and start up a partition, and launch qconn (the debugger agent that we care about) into its own partition. This gives the debugger process some guaranteed CPU time, even if the process being debugged goes into a while (1) ; loop at a higher priority than the debugger.
The lines are:
# Create an example scheduler partition
# Create a 20% partition named “Debugging”
#sched_aps Debugging 20
# Start qconn in the Debugging partition
#[sched_aps=Debugging]/usr/sbin/qconn
Uncomment them so that they look like this instead:
# Create an example scheduler partition
# Create a 20% partition named “Debugging”
sched_aps Debugging 20
# Start qconn in the Debugging partition
[sched_aps=Debugging]/usr/sbin/qconn
If you want, you can change the ‘20’ to any other integer percentage of CPU that you want the debugger to have.
I also like to print a message, if only to remind me that what I booted is a custom boot image:
e.g. I might add
display_msg "Running with a Debug APS partition"
Now we want to start the io-net process and ensure it is in the debug partition too.
You can do that in one of two ways:
1) Start it explicitly using the ‘on’ command (if you know what driver(s) and protocols you want, this is a good way)
e.g. Start io-net in the ‘Debugging’ partition, running the TCP/IP stack and the AMD Lance driver (used by VMware):
on -X aps=Debugging io-net -ptcpip -dlance
2) If you just want it to run with the default diskboot, it’s a bit more complicated.
First you need to create a small executable file:
Here’s the source code - I called it startnet.c
All it does is run the ‘on’ program, which then starts the io-net process in the ‘Debugging’ partition, and then waits for io-net to start, before moving itself into the background and waiting for 60 seconds.
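#include <stdio.h>
#include <stdlib.h>       // system(), exit()
#include <unistd.h>       // sleep()
#include <sys/procmgr.h>  // procmgr_daemon()

// start io-net for enumeration purposes, putting it into a partition
int main(int argc, char *argv[])
{
    // launch io-net in the 'Debugging' partition
    system("/bin/on -X aps=Debugging /sbin/io-net -ptcpip");
    // block until io-net has registered /dev/io-net
    system("/bin/waitfor /dev/io-net");
    // move to background
    procmgr_daemon(0, 0);
    sleep(60); // wait for 1 minute
    exit(0);
}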
Compile it with:
qcc startnet.c -o startnet
and then copy it to /sbin
cp startnet /sbin/startnet
Then you need to edit the file:
/etc/system/enum/include/net
This file contains the command to use to start io-net. By default it is:
“
#
# macro definitions for network
#
all
set(IONET_CMD, io-net -ptcpip)
“
If you change this to be:
“
#
# macro definitions for network
#
all
set(IONET_CMD, "startnet")
“
then the startnet program will get started instead, when the enumerator wants to start a new network interface.
This slightly convoluted operation is required because every time that the enumerator code decides to start a new network interface, it first looks to see if the program pointed to by IONET_CMD is running and starts it if it is not. If it is already running, then instead of starting a new instance, it uses ‘mount -Tio-net’ to add the new interface to the existing io-net process. If startnet did not wait around, then it would get started again, and start up multiple io-net processes, which is not what we want.
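The reason startnet has to linger can be shown with a small model of the enumerator's decision. This is a sketch of the logic as described above, not the enumerator's actual code: the function name ionet_instances and its parameters are hypothetical, and the assumption is that interfaces are detected one after another at a fixed interval.

```c
#include <assert.h>

/* Count how many io-net processes get launched when the enumerator finds
 * `nics` interfaces, one every `gap_s` seconds, and the IONET_CMD program
 * stays alive for `linger_s` seconds after it is started. */
int ionet_instances(int nics, int gap_s, int linger_s)
{
    int started = 0;
    int cmd_alive_until = -1;   /* time at which the IONET_CMD program exits */

    assert(nics >= 0 && gap_s >= 0 && linger_s >= 0);

    for (int i = 0; i < nics; i++) {
        int now = i * gap_s;
        if (now >= cmd_alive_until) {
            /* IONET_CMD is not running: the enumerator starts it,
             * which launches a fresh io-net. */
            started++;
            cmd_alive_until = now + linger_s;
        }
        /* Otherwise the enumerator uses 'mount -Tio-net' to add the
         * interface to the io-net that is already running. */
    }
    return started;
}
```

With three interfaces found one second apart, a 60 second linger gives a single io-net (ionet_instances(3, 1, 60) is 1), while a startnet that exits immediately gives three (ionet_instances(3, 1, 0) is 3). In this model, an interface that turns up after the 60 seconds have passed would also start a second io-net.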
The final thing you need to do is to move the line in the build file that creates the partition, so that it is created before io-net gets started by diskboot.
So, the bootfile lines become:
[+script] startup-script = {
# To save memory make everyone use the libc in the boot image!
# For speed (less symbolic lookups) we point to libc.so.2 instead of lib
procmgr_symlink ../../proc/boot/libc.so.2 /usr/lib/ldqnx.so.2
# Create an example scheduler partition
# Create a 20% partition named “Debugging”
sched_aps Debugging 20
# Default user programs to priorty 10, other scheduler (pri=10o)
# Tell “diskboot” this is a hard disk boot (-b1)
# Tell “diskboot” to use DMA on IDE drives (-D1)
# Start 4 text consoles by passing “-n4” to “devc-con” (-o)
# By adding “-e” linux ext2 filesystem will be mounted as well.
[pri=10o] PATH=/proc/boot diskboot -b1 -D1 -odevc-con,-n4
display_msg "Running with a debug APS partition"
# Start qconn in the Debugging partition
[sched_aps=Debugging]/usr/sbin/qconn
}
If you are running an SMP (aka multicore) system, you can change line 12 to use procnto-smp-instr instead of the default procnto-instr to make this run SMP.
Now, save the modified build file.
To create the boot image from this build file, run:
mkifs qnxbasedmaaps.build <filename>
You can either write the image to a separate file and copy it over the boot image later, or write it straight to the main boot image.
e.g.
mkifs qnxbasedmaaps.build debugaps.ifs
will create a bootable image called debugaps.ifs that you can use at a later date, but will not affect this machine’s default boot image,
or
mkifs qnxbasedmaaps.build /.boot
will change the default boot image to be what is specified in qnxbasedmaaps.build.
If you perform this latter operation, rebooting should see your image boot and run.
Once you’ve rebooted, if all is well, everything should appear to be the same as before, but if you log in and run ‘aps’, you should see output similar to:
# aps
                        +---- CPU Time ---+-- Critical Time --
Partition name       id | Budget |   Used | Budget |      Used
------------------------+-----------------+--------------------
System                0 |    80% | 19.70% |  200ms |   0.000ms
Debugging             1 |    20% |  0.17% |    0ms |   0.000ms
------------------------+-----------------+--------------------
Total                   |   100% | 19.88% |
#
This shows that APS is running and that we have two partitions: the default ‘System’ partition, and the one we created, called ‘Debugging’.
If you want to know which APS partition each process/thread is assigned to, you can run:
pidin sched
or for a given process
pidin -p <process name> sched
e.g. to check that qconn is in the Debugging partition:
# pidin -p qconn sched
     pid tid name               prio cpu ExtSched  STATE
  196625   1 usr/sbin/qconn      10r   0 Debugging SIGWAITINFO
  196625   2 usr/sbin/qconn      10r   0 Debugging CONDVAR
  196625   3 usr/sbin/qconn      10r   1 Debugging RECEIVE
  196625   4 usr/sbin/qconn      10r   0 Debugging RECEIVE
#
And there we have it.
Now, when you run the debugger from Momentics, the program being debugged will not stop the debugger and network stack from running (because their partition’s CPU budget is always available to them), and you should be able to stop even the most renegade of high priority, badly behaved, CPU-gobbling processes!