


#1 2010-12-11 09:02:01

PypeBros
Member
Registered: 2010-12-11
Posts: 16

gdbstub stalling randomly.

Hello, everyone.

I've got some issues with code I recently ported to devkitARM r32 + libnds 1.4.8. The only version of desmume that manages to run the code (I guess it has something to do with FIFO support) is one built from SVN revision 3601. My problem is that this version seems to stall randomly when I use the gdb stub and issue the "next" or "nexti" command.

I uploaded the troublesome .nds file at http://139.165.223.2/~martin/scene/Appl … -buggy.nds . The NDS program runs fairly well and only has "logical" errors (such as backgrounds not showing up with some files, and similar things that I'd like to investigate using DDD). I haven't yet tried to figure out when the problem first appeared, nor to what extent it affects other DS programs.

I launch it with, e.g., desmume-cli AppleAssault.nds --gbaslot-rom=AppleAssault.nds --arm9gdb=9999 (under Linux), and the GDB session then looks like:

Reading symbols from /beetle/hobby/DS/dsgametools/trunk/AppleAssault/AppleAssault.elf...done.
(gdb) target remote localhost:9999
Remote debugging using localhost:9999
0x02000000 in _start ()
(gdb) break main
Breakpoint 1 at 0x20005fc: file /beetle/hobby/DS/dsgametools/trunk/AppleAssault/source/main.cpp, line 411.
(gdb) c
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x020005fa in main ()
    at /beetle/hobby/DS/dsgametools/trunk/AppleAssault/source/main.cpp:410
410     {
(gdb) n
main () at /beetle/hobby/DS/dsgametools/trunk/AppleAssault/source/main.cpp:414
414       MetaWindow mw;
(gdb) n
411       ge.prepare();
(gdb) n
412       ge.reganim(0);
(gdb) n
413       printf("PPP Team's runbox v 0.3 lite\n (c) sylvain 'pype' martin \n");
(gdb) n
414       MetaWindow mw;
(gdb) n

Then the emulator apparently stalls and gdb stops responding.

I had already exchanged a few messages with zeromus before I found these forums. He suggested, for instance, that

you should have better luck by attaching the gdb stub and letting it start the rom instead of starting the rom by unpausing desmume.

I can only assume that he was speaking about how to inspect desmume state from a second GDB, as I don't see how I could "let the gdb stub start the rom" otherwise.

As far as I can tell, desmume seems trapped in the following loop (in NDSSystem.cpp):

            //trap the debug-stalled condition
            #ifdef DEVELOPER
               singleStep = false;
               //(gdb stub doesnt yet know how to trigger these immediately by calling reschedule)
               while((NDS_ARM9.stalled || NDS_ARM7.stalled) && execute)
               {
=>                driver->EMU_DebugIdleUpdate();
                  nds_debug_continuing[0] = nds_debug_continuing[1] = true;
               }
            #endif

Both NDS_ARM9 and NDS_ARM7 have stalled==true, and execute==true, if you ask. I did some more analysis and, as far as I understand, this is a regular situation in which NDS_ARM9.stalled should be cleared by the GDB stub thread, which runs independently of this code. Clearly, we could use a pthread condition or some other kind of IPC here so that there's no busy-waiting on the "emulation" thread. I suppose this is part of zeromus' plan when he says

the gdb stub is flaky and i will try to support it (...) but the code is a mess (...)
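
A minimal sketch of the condition-variable idea, written with the C++11 standard library rather than raw pthreads. `StallGate` and its methods are hypothetical names, not desmume code, and the sketch leaves out the `execute` check and the `driver->EMU_DebugIdleUpdate()` call that the real loop performs.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical replacement for the busy-wait on the "stalled" flags.
class StallGate {
public:
    // Called on the emulation thread when a breakpoint or step hits.
    void stall() {
        std::lock_guard<std::mutex> lock(mutex_);
        stalled_ = true;
    }

    // Called on the gdb stub thread when an 's' or 'c' packet arrives.
    void resume() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stalled_ = false;
        }
        cv_.notify_all();
    }

    // Called on the emulation thread; sleeps instead of spinning.
    // Returns immediately if resume() already ran.
    void waitWhileStalled() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !stalled_; });
    }

    bool isStalled() {
        std::lock_guard<std::mutex> lock(mutex_);
        return stalled_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stalled_ = false;
};
```

The emulation thread would call `waitWhileStalled()` where the busy loop is today, and the stub thread would call `resume()`. A frontend that relies on the idle update to pump its own events would probably need a timed wait instead.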

I still have to figure out what happens behind this "driver->EMU_DebugIdleUpdate()", since it doesn't appear to do anything here. I was afraid that the lack of "volatile" on the .stalled fields might have led the compiler to optimize the test away, but a disassembly of the generated desmume code shows that this isn't the root cause of the problem right now.

Any help, hints or guidance is welcome.


10 years of homebrew thanks to desmume


#2 2010-12-11 09:35:03

zeromus
Radical Ninja
Registered: 2009-01-05
Posts: 4,663

Re: gdbstub stalling randomly.

i mean let it start by you using it to issue a continue command. but thats moot since ive read your gdb session log.

problem is your gdb doesnt have control at a time when it needs to have control to issue the continue command or whatever kind of step command gets the equivalent job done.

that one part works the way it should. in windows, that thread suspends itself. at any rate, the gdb thread is still alive.

i suspect that most of the desmume gdb users arent using source level debugging and so they dont have this problem.


#3 2010-12-11 10:16:20

zeromus
Radical Ninja
Registered: 2009-01-05
Posts: 4,663

Re: gdbstub stalling randomly.

desmume seems to get stuck in that spot when it is receiving a huge amount of commands from gdb, stepping and pausing over and over and over, basically advancing one instruction at a time. this is exacerbated by the checked-in desmume GDB code doing lots of printfs. this overall problem may be caused by some quirk in how desmume is handling certain debugging commands which confuses gdb. nevertheless, emulation is proceeding.. very slowly.

I have a question: how can step possibly work without returning to the debugger over and over and seeing if it thinks it is in the next source line and wants to stop? any instruction may go to another source line and then the debugger needs to stop it there. if the debugger is very slow, then this may just take a huge amount of time and traffic to GDB

indeed--i see this thing running for a million years almost first thing after main runs, while it clears out memory between 0x06000000 and 0x06220000. after every single instruction it bounces out to gdb.

i need some gdb guru to tell me why this happens.

maybe it would be faster if it werent running in another thread. but ehh maybe not, since it needs to receive some asynchronous signals from gdb. however gdb has been crafted originally in such a way that the only real asynchronous signal is a SIGINT..


#4 2010-12-11 21:57:49

PypeBros
Member
Registered: 2010-12-11
Posts: 16

Re: gdbstub stalling randomly.

I have instrumented gdbstub so that it reports gdb requests and responses as well. A regular "step" request will look like:

<s
UNSTALL 0xa1054e0@200060c        # unstall_cpu, processPacket_gdb (gdb thread)
now resumed                # NDS_debug_continue(), processPacket_gdb (gdb thread)
un-stalled ^_^            # after loop (nds thread)
STALL 0xa1054e0@200060c            # stall_cpu() check_breaks_gdb|break_execution|step_instruction_watch|activateStub_gdb post-execution function (on NDS thread)
>S05
now breaking 9@200060c, 7@37ffab0    # NDS_debug_break() also in check_break ...
debug-stall condition met. driver@0x9da0348

but when things get locked, we instead have:

<s
UNSTALL 0xa1054e0@2000610
now resumed
un-stalled ^_^
STALL 0xa1054e0@200754c            # so the NDS thread has triggered the stub,
#  what I don't get here is that the reply is sent (indicateCPUStop sent) before NDS_debug_break() is also sent.

>S05                    # and the message has been sent.
# there is no "now breaking" message seen ... yet, it takes place in the same check_breaks_gdb
# (or the other one). So the reply is sent, but the
<g
>00000000 09710002 00407f02 24db0102 00000000 a83c000b 00000000 48960502
00407f02 24db0102 00000000 00000000 2d8a0202 103c000b 0f060002 0c060002
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003f000060
<s
UNSTALL 0xa1054e0@200754c            #armcpu :: unstall_cpu
now resumed                    # gdb thread : NDS_debug_continue
now breaking 9@200754c, 7@37ffab0        #NDSSystem :: NDS_debug_break
debug-stall condition met. driver@0x9da0348

My analysis here is that the gdbstub thread had the time to switch the "stalled" back to 0 before the "nds emulation" thread had the opportunity to enter the "wait stub" loop. It clearly only happens when the debugger replies quickly enough with a new "<s" request and it doesn't occur when a human has to type [ENTER] to proceed.
I don't yet have a clear idea of how to fix the situation.

PS : the output of svn diff of my repository and the elf file.

Last edited by PypeBros (2010-12-11 22:06:50)



#5 2010-12-11 22:38:51

zeromus
Radical Ninja
Registered: 2009-01-05
Posts: 4,663

Re: gdbstub stalling randomly.

if the debugger is switching it back to stalled, then desmume is doing what it was asked to do. why is the debugger asking for what its asking for?


#6 2010-12-12 08:33:41

PypeBros
Member
Registered: 2010-12-11
Posts: 16

Re: gdbstub stalling randomly.

I got confirmation of my analysis. The problem is due to a race condition between the two threads: replacing "ARM9.stalled = ARM7.stalled = 0" with a "sem_post(arm_unstalled)" in NDS_debug_continue() seems to fix it (at least, it does on Ubuntu here), although I suspect another race condition may make the emulator switch to a "continue" state in an uncontrolled fashion.



#7 2010-12-12 08:58:48

PypeBros
Member
Registered: 2010-12-11
Posts: 16

Re: gdbstub stalling randomly.

As far as I can tell from GDB's behaviour, by the way, a sequence of "s" commands is sent to proceed through the next line if there are just "regular" instructions to execute. The register state and one memory read at PC are used to determine whether the next instruction is a jump or a branch. That allows the stub to remain ignorant of line numbers, etc.

When a branch is seen, the debugger automatically adds a new breakpoint past that instruction and uses a "continue" command. Afaik, breakpoints aren't persistent and have to be reprogrammed every time a STOP response has been sent by the stub.
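
For reference, the "s" requests and "S05" replies in these traces travel on the wire as framed packets of the GDB remote serial protocol: a `$`, the payload, a `#`, and a two-digit hex checksum (the sum of the payload bytes modulo 256). A small sketch of that framing follows; `framePacket` is a name made up for illustration and is not taken from gdbstub.cpp.

```cpp
#include <cstdio>
#include <string>

// Wrap a payload as "$<payload>#<checksum>", where the checksum is the
// sum of the payload bytes modulo 256, written as two lowercase hex digits.
std::string framePacket(const std::string& payload) {
    unsigned sum = 0;
    for (unsigned char c : payload) {
        sum = (sum + c) & 0xffu;
    }
    char tail[4];  // '#', two hex digits, terminating NUL
    std::snprintf(tail, sizeof tail, "#%02x", sum);
    return "$" + payload + tail;
}
```

With this framing, a single-step request goes out as `$s#73` and the SIGTRAP stop reply comes back as `$S05#b8`.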



#8 2010-12-12 09:14:33

zeromus
Radical Ninja
Registered: 2009-01-05
Posts: 4,663

Re: gdbstub stalling randomly.

It is harmless (with respect to the loop) if the gdb stub thread sets stalled back to 0 before the emulation thread gets to the "wait stub" loop because it will immediately escape that loop. of course, if the gdb stub thread was doing multiple unstalls in a row then it may be confusing the emulator core somewhat and in that case i would expect it to do something like miss its breakpoint and run forever, but not get stuck in that loop.

i dont see what use that semaphore does when there is only one thread calling nds_debug_continue.

your comment is right:
#what I don't get here is that the reply is sent (indicateCPUStop sent) before NDS_debug_break() is also sent.

it is utterly wrong and the code should look like this:

static void
step_instruction_watch( void *data, uint32_t addr, UNUSED_PARM(int thunmb)) {
  struct gdb_stub_state *stub = (struct gdb_stub_state *)data;

  DEBUG_LOG("Step watch: waiting for %08x at %08x\n", stub->step_instr_address,
      addr);

  if ( addr == stub->step_instr_address) {
    DEBUG_LOG("Step hit -> %08x\n", stub->cpu_ctrl->read_reg( stub->cpu_ctrl->data, 15));
    /* stall the processor */
    stub->cpu_ctrl->stall( stub->cpu_ctrl->data);
    NDS_debug_break();

    /* remove the post execution function */
    stub->cpu_ctrl->remove_post_ex_fn( stub->cpu_ctrl->data);

    /* indicate the halt */
    stub->stop_type = STOP_STEP_BREAK;
    indicateCPUStop_gdb( stub);
  }
}
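
To see why the order of those two calls matters, here is a toy, single-threaded model that replays one possible interleaving of the two threads. It is only a sketch under assumptions: the function names are invented, and it assumes that NDS_debug_break() sets the stalled flags while an incoming 's' packet clears them (via unstall and NDS_debug_continue()).

```cpp
// Toy model of the state shared by the emulation thread and the stub thread.
struct Model {
    bool stalled = false;     // stands in for NDS_ARM9.stalled / NDS_ARM7.stalled
    int stopRepliesSent = 0;  // number of S05 replies handed to gdb
};

// Steps that run on the emulation thread.
void stallCpu(Model& m)      { m.stalled = true; }
void debugBreak(Model& m)    { m.stalled = true; }       // assumed effect of NDS_debug_break()
void sendStopReply(Model& m) { ++m.stopRepliesSent; }    // stands in for indicateCPUStop_gdb()

// Step that runs on the stub thread when gdb answers with 's'.
void gdbStep(Model& m)       { m.stalled = false; }

// Old order: the reply goes out before the break, so a fast gdb can
// answer in between and its unstall is overwritten afterwards.
bool stuckWithOldOrder() {
    Model m;
    stallCpu(m);
    sendStopReply(m);
    gdbStep(m);     // gdb reacts to S05 right away
    debugBreak(m);  // emulation thread stalls again; gdb is now waiting for a reply
    return m.stalled;
}

// New order: the break happens first, so the reply is the last thing the
// emulation thread does before gdb is allowed to react.
bool stuckWithNewOrder() {
    Model m;
    stallCpu(m);
    debugBreak(m);
    sendStopReply(m);
    gdbStep(m);
    return m.stalled;
}
```

In the old order the model ends with the flag set and nobody left to clear it, which matches a log where "now breaking" shows up after the next "UNSTALL".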

I have been unable to reproduce your exact problems. the only thing that ever happens to me, as ive said, is gdb creates a *TON* of chatter which makes the emulator run slow as molasses.  I wonder if your gdb is sending different commands than mine. In my experience, different GDB may behave quite differently.

That stalled flag really ought to be volatile. otherwise it makes me nervous.

It sounds like what you've found is that the gdb stub can spam lots of commands down at once without waiting for the emulator to indicate that it is stopped. Are we supposed to buffer those??

I tried to catch stalls and unstalls getting mismatched, but couldnt.

suppose the gdb program is responding at a superhuman rate. it would send a step command to the emulator, but it can't do anything until the emulator says to proceed. this would happen in one of the breakpoint handlers in gdbstub.cpp AFTER cpu.stalled = 1; is run. so now the gdb program is aware that the emulator is intending to stop, and superhumanly says step again. this will make cpu.stalled = 0 happen. now it has to wait for desmume to do something else. now tell me: where does desmume get confused and how does it get stuck in a loop checking for cpu.stalled == 1?


#9 2010-12-12 09:22:42

zeromus
Radical Ninja
Registered: 2009-01-05
Posts: 4,663

Re: gdbstub stalling randomly.

there are two breakpoints, the standard list of any number of breakpoints, and the "step_instr_addr". that may be reset automatically, but the user breakpoints couldnt be or else the debugger would be puny.

when you say a series of 's' commands are sent, do you mean they are sent as a batch in some large number? or that each one of them runs and waits for gdb to get control back? for me, it is the latter. there are synchronous and asynchronous gdb modes and this may make a difference here (we may not even be testing the same thing)

if you are using gdb in asynchronous mode, such that it can send several commands which cause the emulator to run without having to wait for the emulator to return control then this is _never_ going to work because desmume just can't handle it due to there being no internal serialization of commands and emu executions.

and theres no guarantee that anyone has ever used desmume GDB before in asynchronous mode.

and finally, it would be nice if you could confirm that the emulator core is not proceeding AT ALL (print out something each time an instruction executes) because i used to think it didnt proceed at all (since it seemed to spend 100% of time in that while-stalled loop) but then i noticed it actually just spent 99% of the time in that loop


#10 2010-12-12 09:43:25

zeromus
Radical Ninja
Registered: 2009-01-05
Posts: 4,663

Re: gdbstub stalling randomly.

I have received some confirmation that steps are implemented by sending a ton of 's' step instruction packets but theyre supposed to be individually dealt with by the emulator and waited for by the gdb program. it is possible that desmume's implementation of this is legendarily slow due to how it has worked out in another thread (seriously, this stuff is supposed to be done in the main thread)

I put the gdb processing in the main thread by making the main message receiving non-blocking and it didnt revolutionize the speed.

Finally, from looking at your instrumented gdb session, I conclude that since desmume is sending S05 and GDB is ignoring it, that this is GDB's fault.


#11 2010-12-12 12:58:19

PypeBros
Member
Registered: 2010-12-11
Posts: 16

Re: gdbstub stalling randomly.



#12 2010-12-12 13:33:06

PypeBros
Member
Registered: 2010-12-11
Posts: 16

Re: gdbstub stalling randomly.

I took some time yesterday to depict the whole picture... I take good note of your remarks, and I will try to come up with more convincing proof of what I believe to be the root cause. Just some quick point-by-point replies in the meantime:

I conclude that since desmume is sending S05 and GDB is ignoring it, that this is GDB's fault.

That might be jumping to conclusions: all the traces end up in a situation where 's' has been received, and 'S05' isn't emitted regardless of how long we wait.

i dont see what use that semaphore does when there is only one thread calling nds_debug_continue.

I'm not using the semaphore to provide a critical section here, but rather as a synchronisation mechanism: it replaces the busy-waiting loop with a sem_wait(arm9_unstalled); and the gdbstub "releases" the NDS main thread with a sem_post(arm9_unstalled). That may not be an appropriate approach for non-CLI builds of desmume, though, as the emulation thread will be suspended until the stub has received a *s*tep or *c*ontinue command.

when you say a series of 's' commands are sent, do you mean 1) they are sent as a batch in some large number? or 2) that each one of them runs and waits for gdb to get control back?

I mean 2). You could somehow stress your stub with a program that repeatedly sends "s" as soon as it received Sxx from the tested stub.

your comment is right:
#what I don't get here is that the reply is sent (indicateCPUStop sent) before NDS_debug_break() is also sent.
it is utterly wrong (...)

I see. I will try that fix to see whether it makes any other modification unnecessary.

if you are using gdb in asynchronous mode (...)

As far as I can tell, I'm not. My regular setup is to use DDD, but for the purpose of these very tests, I just used the bare arm-eabi-gdb without any tweaks or preference modifications.

it would be nice if you could confirm that the emulator core is not proceeding AT ALL (print out something each time an instruction executes)

The instrumented stall-catching loop in NDSSystem.cpp looks like:

    //trap the debug-stalled condition
    singleStep = false;
    if (NDS_ARM9.stalled || NDS_ARM7.stalled) {
      fprintf(stderr,"debug-stall condition met. driver@%p\n",driver);
      //(gdb stub doesnt yet know how to trigger these immediately by calling reschedule)
    #ifdef __UNSTALL_WITH_SEMAPHORES__
      if (NDS_ARM9.stalled) sem_wait(&arm9_unstalled);
      if (NDS_ARM7.stalled) sem_wait(&arm7_unstalled);
      NDS_ARM9.stalled = NDS_ARM7.stalled = 0;
      nds_debug_continuing[0] = nds_debug_continuing[1] = true;
    #else
      while((NDS_ARM9.stalled || NDS_ARM7.stalled) && execute)
      {
          driver->EMU_DebugIdleUpdate(); // noop anyway :P
          nds_debug_continuing[0] = nds_debug_continuing[1] = true;
      }
    #endif
      fprintf(stderr,"un-stalled ^_^\n");
    }

    nds.cpuloopIterationCount++;
    sequencer.execHardware();
    // ...

If anything had occurred out of the busy-waiting loop, I should have seen at least an "un-stalled" message on the trace. I used stderr exactly for that purpose.



#13 2010-12-12 15:40:29

PypeBros
Member
Registered: 2010-12-11
Posts: 16

Re: gdbstub stalling randomly.

there are two breakpoints, the standard list of any number of breakpoints, and the "step_instr_addr". that may be reset automatically, but the user breakpoints couldnt be or else the debugger would be puny.

That would make sense, but I only see "Z0,<address>,2" commands flowing between the debugger and the stub, regardless of whether the address is that of the main() function, where I set a breakpoint manually in GDB, or that of the next instruction, where the breakpoint has been inserted automatically to step over a function call.
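
For readers unfamiliar with these packets, here is a rough sketch of how a "Z0,<address>,2" payload breaks down. In the GDB remote protocol, `Z` inserts and `z` removes a breakpoint, type 0 is a software breakpoint, and on ARM a kind of 2 selects a 16-bit Thumb breakpoint. `BreakRequest` and `parseBreakRequest` are illustrative names only and do not come from the desmume stub.

```cpp
#include <cstdio>
#include <string>

// Fields of a "Z<type>,<addr>,<kind>" (insert) or "z..." (remove) payload.
struct BreakRequest {
    bool insert;    // true for 'Z', false for 'z'
    unsigned type;  // 0 = software breakpoint
    unsigned addr;  // target address, sent as hex
    unsigned kind;  // target-specific; on ARM, 2 = 16-bit Thumb breakpoint
};

// Returns true and fills 'out' when the payload is a well-formed request.
bool parseBreakRequest(const std::string& payload, BreakRequest& out) {
    char op = 0;
    unsigned type = 0, addr = 0, kind = 0;
    if (std::sscanf(payload.c_str(), "%c%x,%x,%x", &op, &type, &addr, &kind) != 4) {
        return false;
    }
    if (op != 'Z' && op != 'z') {
        return false;
    }
    out = BreakRequest{op == 'Z', type, addr, kind};
    return true;
}
```

Both the manual breakpoint on main() and the temporary one used to step over a call look the same at this level, which fits the observation that the stub cannot tell them apart.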



#14 2010-12-12 19:38:07

zeromus
Radical Ninja
Registered: 2009-01-05
Posts: 4,663

Re: gdbstub stalling randomly.

im pretty sure that code fix is all that its going to take because it is the only thing resembling a race condition i see in this entire system, and because this bug is happening to you and not to me on different OSes which would be the hallmark of an actively nondeterministic thing


#15 2010-12-13 19:54:05

PypeBros
Member
Registered: 2010-12-11
Posts: 16

Re: gdbstub stalling randomly.

Hello again.

I tried the fix you proposed for step_instruction_watch on a fresh checkout of the svn and it seems to fix the "hanging GDB stub" problem. It is very likely that the race condition only occurs on specific OSes, as it will clearly be sensitive to things such as priority inversion policies or the network stack implementation.

That being said, it seems that this freshly rebuilt desmume remains sensitive to a "reverse" issue that makes "step" behave like "cont" from time to time. Could there be a similar race condition with the "post_exec_function"? I haven't investigated that yet.

Thanks for your help anyway.



#16 2010-12-13 20:21:30

zeromus
Radical Ninja
Registered: 2009-01-05
Posts: 4,663

Re: gdbstub stalling randomly.

look for it. i broke this thing when i added nds_debug_break and nds_debug_continue. there could be other similar bugs.

the goal was to unify the operations of the gdb stub with non-gdb debugging capabilities (in preparation for an integrated debugger one day). as you can see, we havent got very far.


#17 2010-12-14 08:45:47

PypeBros
Member
Registered: 2010-12-11
Posts: 16

Re: gdbstub stalling randomly.

I noticed redundant unstalls in the handling of the "s" packet, for instance: arm_cpu->unstall is called, and then nds_debug_continue. It doesn't seem correct from my point of view, although chances are very *very* slim that it could trigger a fault.

I'll keep you updated if I find another oddity.

