Yes. But before it makes sense to make a commit or patch, I have to do this properly:
- Make it work on the Mac/Cocoa port as well
- Maybe test on Windows.
After those things and the commit (patch submission), next in line could be adding event/signal-based waiting to replace the current busy loop. I think the original idea might have been to put the emulator into some kind of idle mode while it's waiting, but it doesn't seem to be doing that.
I have a Windows box that has Visual Studio Express 2008 and 2010. I guess one of those will do? Or was it Visual C++ Express? It's the variety that produces plain old x86 code for "desktop" applications, without any .NET stuff.
Maybe this forum isn't a good way to do this, but here's an "svn diff" report from adding only the mutex fix on top of r5068. Now source-stepping and everything works with the GDB stub, without jamming the emulator.
I haven't tested this on the Mac, let alone Windows, and it will probably break on Mac, because the main program doesn't declare or init the mutex variable. Looking at it now I think it would have been better to place the mutex declaration and allocation in a central location like gdbstub.cpp, and make the main programs just call some functions. There was already a bunch of structure clean-ups between 5067-5068. I'm also not particularly fond of spreading platform-named #ifdefs all over the place, but at least this shows the main idea of how the fix works.
Index: src/gdbstub/gdbstub.cpp
===================================================================
--- src/gdbstub/gdbstub.cpp (revision 5068)
+++ src/gdbstub/gdbstub.cpp (working copy)
@@ -527,6 +527,9 @@
uint32_t send_size = 0;
DEBUG_LOG("Processing packet %c\n", packet[0]);
+ #ifndef HOST_WINDOWS
+ pthread_mutex_lock(&cpu_mutex);
+ #endif
switch( packet[0]) {
case 3:
@@ -899,6 +902,10 @@
break;
}
+ #ifndef HOST_WINDOWS
+ pthread_mutex_unlock(&cpu_mutex);
+ #endif
+
if ( send_reply) {
return putpacket( sock, out_packet, send_size);
}
Index: src/cli/main.cpp
===================================================================
--- src/cli/main.cpp (revision 5068)
+++ src/cli/main.cpp (working copy)
@@ -62,8 +62,17 @@
#ifdef GDB_STUB
#include "../armcpu.h"
#include "../gdbstub.h"
+
+#ifndef HOST_WINDOWS
+// Now both the GTK main and CLI main have this mutex variable defined, allocated
+// and destroyed, just like all mains (Cocoa and Windows included) also have code
+// for creation and destruction of the GDB stubs. It would be better to place all
+// that in a common location to avoid unnecessary duplication of logic.
+ pthread_mutex_t cpu_mutex; // to access the CPUs in any way, a thread has to get a lock on this first
#endif
+#endif
+
volatile bool execute = false;
static float nds_screen_size_ratio = 1.0f;
@@ -596,6 +605,11 @@
driver = new BaseDriver();
#ifdef GDB_STUB
+
+#ifndef HOST_WINDOWS
+ pthread_mutex_init(&cpu_mutex, NULL);
+#endif
+
/*
* Activate the GDB stubs
* This has to come after NDS_Init() where the CPUs are set up.
@@ -826,7 +840,12 @@
destroyStub_gdb( arm7_gdb_stub);
arm7_gdb_stub = NULL;
+
+#ifndef HOST_WINDOWS
+ pthread_mutex_destroy(&cpu_mutex);
#endif
+
+#endif
SDL_Quit();
NDS_DeInit();
Index: src/gdbstub.h
===================================================================
--- src/gdbstub.h (revision 5068)
+++ src/gdbstub.h (working copy)
@@ -19,12 +19,29 @@
#ifndef _GDBSTUB_H_
#define _GDBSTUB_H_ 1
+// For cpu_mutex
+#ifdef HOST_WINDOWS
+#include <windows.h>
+#else
+#include <pthread.h>
+#if defined HOST_LINUX
+#include <unistd.h>
+#elif defined HOST_BSD || defined HOST_DARWIN
+#include <sys/sysctl.h>
+#endif
+#endif // HOST_WINDOWS
+
+
#include "types.h"
typedef void *gdbstub_handle_t;
struct armcpu_t;
struct armcpu_memory_iface;
+#ifndef HOST_WINDOWS
+extern pthread_mutex_t cpu_mutex;
+#endif
+
/*
* The function interface
*/
Index: src/gtk/main.cpp
===================================================================
--- src/gtk/main.cpp (revision 5068)
+++ src/gtk/main.cpp (working copy)
@@ -67,8 +67,17 @@
#ifdef GDB_STUB
#include "armcpu.h"
#include "gdbstub.h"
+
+#ifndef HOST_WINDOWS
+// Now both the GTK main and CLI main have this mutex variable defined, allocated
+// and destroyed, just like all mains (Cocoa and Windows included) also have code
+// for creation and destruction of the GDB stubs. It would be better to place all
+// that in a common location to avoid unnecessary duplication of logic.
+ pthread_mutex_t cpu_mutex; // to access the CPUs in any way, a thread has to get a lock on this first
#endif
+#endif
+
#if defined(HAVE_LIBOSMESA) || defined(HAVE_GL_GLX)
#include <GL/gl.h>
#include <GL/glu.h>
@@ -2919,6 +2928,11 @@
* where the cpus are set up.
*/
#ifdef GDB_STUB
+
+#ifndef HOST_WINDOWS
+ pthread_mutex_init(&cpu_mutex, NULL);
+#endif
+
gdbstub_handle_t arm9_gdb_stub = NULL;
gdbstub_handle_t arm7_gdb_stub = NULL;
@@ -3277,8 +3291,13 @@
destroyStub_gdb( arm7_gdb_stub);
arm7_gdb_stub = NULL;
+
+#ifndef HOST_WINDOWS
+ pthread_mutex_destroy(&cpu_mutex);
#endif
+#endif
+
return EXIT_SUCCESS;
}
Index: src/NDSSystem.cpp
===================================================================
--- src/NDSSystem.cpp (revision 5068)
+++ src/NDSSystem.cpp (working copy)
@@ -55,6 +55,10 @@
#include "SPU.h"
#include "wifi.h"
+#ifdef GDB_STUB
+#include "gdbstub.h"
+#endif
+
//int xxctr=0;
//#define LOG_ARM9
//#define LOG_ARM7
@@ -1828,6 +1832,13 @@
template<bool FORCE>
void NDS_exec(s32 nb)
{
+ #ifdef GDB_STUB
+ #ifndef HOST_WINDOWS
+ pthread_mutex_lock(&cpu_mutex);
+ #endif
+ #endif
+
+
LagFrameFlag=1;
sequencer.nds_vblankEnded = false;
@@ -1860,7 +1871,17 @@
while((NDS_ARM9.stalled || NDS_ARM7.stalled) && execute)
{
+ #ifdef GDB_STUB
+ #ifndef HOST_WINDOWS
+ pthread_mutex_unlock(&cpu_mutex);
+ #endif
+ #endif
driver->EMU_DebugIdleUpdate();
+ #ifdef GDB_STUB
+ #ifndef HOST_WINDOWS
+ pthread_mutex_lock(&cpu_mutex);
+ #endif
+ #endif
nds_debug_continuing[0] = nds_debug_continuing[1] = true;
}
@@ -1961,6 +1982,12 @@
DEBUG_Notify.NextFrame();
if (cheats)
cheats->process();
+
+ #ifdef GDB_STUB
+ #ifndef HOST_WINDOWS
+ pthread_mutex_unlock(&cpu_mutex);
+ #endif
+ #endif
}
template<int PROCNUM> static void execHardware_interrupts_core()
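The idea mentioned before the diff, keeping the mutex in one central place and letting the front ends call a couple of functions, could look roughly like the sketch below. This is only an illustration under assumptions: the helper names gdbstub_mutex_init(), gdbstub_mutex_destroy() and the CpuLock guard class are hypothetical and do not exist in DeSmuME, and the sketch covers only the pthread case, so a Windows build would still need its own implementation behind the same functions.

```cpp
#include <cassert>
#include <pthread.h>

// Would live in a single translation unit (the post suggests gdbstub.cpp),
// so the front ends no longer define cpu_mutex themselves.
static pthread_mutex_t cpu_mutex;
static bool cpu_mutex_ready = false;

// Hypothetical helpers: each main() calls these once instead of touching
// the mutex directly.
bool gdbstub_mutex_init() {
    if (cpu_mutex_ready) return true;
    cpu_mutex_ready = (pthread_mutex_init(&cpu_mutex, NULL) == 0);
    return cpu_mutex_ready;
}

void gdbstub_mutex_destroy() {
    if (!cpu_mutex_ready) return;
    pthread_mutex_destroy(&cpu_mutex);
    cpu_mutex_ready = false;
}

// Hypothetical scope guard: locks in the constructor and unlocks in the
// destructor, so the unlock happens on every path out of the scope.
class CpuLock {
public:
    CpuLock() { pthread_mutex_lock(&cpu_mutex); }
    ~CpuLock() { pthread_mutex_unlock(&cpu_mutex); }
    CpuLock(const CpuLock&) = delete;
    CpuLock& operator=(const CpuLock&) = delete;
};

// Tiny demo: take the lock twice in a row. The second guard would block
// forever if the first one had not released the mutex.
int locked_increment_demo() {
    if (!gdbstub_mutex_init()) return -1;
    int counter = 0;
    { CpuLock lock; ++counter; }
    { CpuLock lock; ++counter; }
    assert(counter == 2);
    gdbstub_mutex_destroy();
    return counter;
}
```

With something like this, the platform-specific #ifdefs would sit in one file instead of being spread over NDSSystem.cpp, gdbstub.cpp and every main.cpp.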
LOL, now that I think of it, that was alternative C), because I didn't use a whole set of "well-defined" events, and I didn't make a full-fledged synchronization message/callback thingy either. At first I thought of a "raise your hand when you want to talk" system, but then I thought: why do something even that complicated? Just keep the GDB stub from performing any NDS CPU related operations while the emulation engine is in the middle of executing an instruction. Instructions run pretty fast, so the stub won't have to wait too long anyway.
Edit: well, NDS_exec() runs until the next NDS vblank (if I understood correctly), so I guess that sets some kind of a granularity boundary for the GDB stub: GDB commands are only handled once per frame. The exception is single-stepping, in which case the GDB stub is allowed to do something in the busy loop even once per NDS instruction, if it wants to. I guess this shouldn't be any kind of a problem. By default, GDB's time-out for remote command execution is two seconds, so one NDS frame is nothing.
Whoah! It seems that I managed to fix it with a mutex that governs which party is allowed to do something. I made a mutex "cpu_mutex", which the function NDS_exec() locks as the first thing it does, and unlocks when it's done, and also unlocks it in the waiting loop. Maybe it's pretty coarse-grained control from the GDB stub's point of view, but it seems to work.
template<bool FORCE>
void NDS_exec(s32 nb)
{
#ifndef HOST_WINDOWS
pthread_mutex_lock(&cpu_mutex);
#endif
...
for(;;)
{
//trap the debug-stalled condition
#ifdef DEVELOPER
singleStep = false;
//(gdb stub doesnt yet know how to trigger these immediately by calling reschedule)
if ((NDS_ARM9.stalled || NDS_ARM7.stalled) && execute)
{
driver->EMU_DebugIdleEnter();
while((NDS_ARM9.stalled || NDS_ARM7.stalled) && execute)
{
#ifndef HOST_WINDOWS
pthread_mutex_unlock(&cpu_mutex);
#endif
driver->EMU_DebugIdleUpdate();
#ifndef HOST_WINDOWS
pthread_mutex_lock(&cpu_mutex);
#endif
nds_debug_continuing[0] = nds_debug_continuing[1] = true;
}
driver->EMU_DebugIdleWakeUp();
}
#endif
...
if (cheats)
cheats->process();
#ifndef HOST_WINDOWS
pthread_mutex_unlock(&cpu_mutex);
#endif
}
Then in the GDB stub, I put lock/unlock around the processPacket_gdb() routine, like this:
/**
* Returns -1 if there is a socket error.
*/
static int
processPacket_gdb( SOCKET_TYPE sock, const uint8_t *packet,
struct gdb_stub_state *stub) {
// uint8_t remcomOutBuffer[BUFMAX_GDB];
struct debug_out_packet *out_packet = getOutPacket();
uint8_t *out_ptr = out_packet->start_ptr;
int send_reply = 1;
uint32_t send_size = 0;
DEBUG_LOG("Processing packet %c\n", packet[0]);
#ifndef HOST_WINDOWS
pthread_mutex_lock(&cpu_mutex);
#endif
switch( packet[0]) {
case 3:
...
#ifndef HOST_WINDOWS
pthread_mutex_unlock(&cpu_mutex);
#endif
if ( send_reply) {
return putpacket( sock, out_packet, send_size);
}
return 0;
}
There's still some code outside these two routines that might access the CPU structures, but even the code above seemed enough to separate the fighters. Now I'm able to source-step for as long as I want, and DeSmuME keeps running happily. I said 's' and held down Enter for several minutes and it just went on correctly. I'm able to set source breakpoints and everything, and it works. Set breakpoint to P$ALLOCATIONTEST_UPDATESPRITES, Continue and hold down Enter, and the sprites start moving slowly on the emulator screen. Great.
I think I'll check out a fresh SVN trunk and make a minimal fix-pack on top of that.
The mutex doesn't address the busy-loop waiting issue. If that's some sort of an issue... I guess it just wastes some watts, compared to a proper event signaling system.
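For comparison, here is a minimal sketch of what event-based waiting could look like in place of the busy loop. It assumes C++11 threading primitives (std::mutex, std::condition_variable) are acceptable, whereas the patch above uses raw pthreads, and the StallGate class with its method names is made up for illustration; it is not DeSmuME code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the emulator's "stalled" state.
class StallGate {
public:
    // Debugger-side thread: allow the emulation thread to continue.
    void unstall() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stalled_ = false;
        }
        cv_.notify_all();
    }

    // Emulation thread: sleep (no spinning) until someone calls unstall().
    void wait_until_unstalled() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !stalled_; });
    }

    bool stalled() {
        std::lock_guard<std::mutex> lock(m_);
        return stalled_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool stalled_ = true;  // start out stalled, as after a debug break
};

// Small demo: one thread waits at the gate, the other opens it.
// Returns true once the waiter has resumed.
bool run_stall_gate_demo() {
    StallGate gate;
    assert(gate.stalled());
    bool resumed = false;
    std::thread emu([&] {
        gate.wait_until_unstalled();
        resumed = true;
    });
    gate.unstall();
    emu.join();  // after join, reading 'resumed' is safe
    return resumed && !gate.stalled();
}
```

Since the existing idle loop also calls driver->EMU_DebugIdleUpdate() on every pass, a real version would probably need a timed wait (for example wait_for with a short timeout) rather than blocking indefinitely.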
Btw, like you said, it's easy to break a system like that if you don't understand the principles of how it's supposed to work, the governing rules and roles between components, or what the components and domains really are. Each function, file, variable, struct etc. has rules, and if the design principles aren't clearly written out and explained, it's easy to miss them. It took me quite a few hours of tinkering and trial and error to make that mutex duct-tape fix.
Another remark: if I step through DeSmuME's code in the debugger IDE Code::Blocks I'm using now, then occasionally it happens that DeSmuME misses the instruction step breakpoint and continues running forever. A bit like suggested by Zeromus in this post:
http://forums.desmume.org/viewtopic.php … 649#p13649
Anyway, I'm coming to the conclusion that the interaction system between the emulation main loop and the GDB stub threads is broken, because there's no real synchronization. There's just a bunch of flags that both threads read and write, and there are some waiting loops. I think that when it works, on the platforms where it does work, it is only by chance.
I see two alternative fixes:
A) with semaphores or events that have well-defined meanings, like PypeBros did in the thread linked above (if I understood correctly).
B) with a message queue system where the auxiliary threads (GDB stub in this case) ask the main thread (emulation main thread) to execute small pieces of code for them, while the auxiliary thread (sender of the message) waits until its request has been fulfilled by the main thread. And the GDB stub code must not access or touch the main thread's memory in any way outside the "please execute this piece of code in the main thread" functions.
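A rough sketch of alternative B, assuming C++11 is available. Everything named here (MainThreadQueue, run_on_main, drain) is hypothetical and only illustrates the principle: the auxiliary thread hands over a small job and blocks, and only the main thread ever touches the emulated state.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical request queue between a stub thread and the main thread.
class MainThreadQueue {
public:
    // Auxiliary thread: enqueue a job, then block until the main thread
    // has executed it, and return the job's result.
    int run_on_main(std::function<int()> job) {
        std::packaged_task<int()> task(std::move(job));
        std::future<int> result = task.get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(task));
        }
        return result.get();
    }

    // Main thread: call at a point where its own state is consistent.
    // Runs all pending jobs and returns how many were executed.
    int drain() {
        int count = 0;
        for (;;) {
            std::packaged_task<int()> task;
            {
                std::lock_guard<std::mutex> lock(m_);
                if (jobs_.empty()) break;
                task = std::move(jobs_.front());
                jobs_.pop();
            }
            task();  // executed outside the queue lock
            ++count;
        }
        return count;
    }

private:
    std::mutex m_;
    std::queue<std::packaged_task<int()>> jobs_;
};

// Demo: the "stub" thread asks the main thread to read a value that only
// the main thread is allowed to touch.
int run_queue_demo() {
    MainThreadQueue queue;
    int fake_pc = 0x02000000;  // stands in for emulated CPU state
    int seen = 0;
    std::thread stub([&] {
        seen = queue.run_on_main([&] { return fake_pc; });
    });
    int executed = 0;
    while (executed == 0) {  // the demo polls; a real loop would drain once per pass
        executed += queue.drain();
        std::this_thread::yield();
    }
    stub.join();
    assert(executed == 1);
    return seen;
}
```

In the emulator, the drain() call would presumably sit somewhere in the main loop where the CPUs are known to be between instructions, which is exactly the kind of "known location" the stub currently cannot rely on.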
Now I see that the first fix I already tried (without understanding even what little I do now), protecting the stalled flag with mutexes, didn't address the real problem. As I see it, the problem is that because threads poke at the same set of variables, the state in which e.g. the main thread is, is ambiguous. Inside each thread, the program code is supposed to be a set of transitions from one state to another, and now I think the main loop's code regarding e.g. the wait-for-unstall loop is based on some assumptions that aren't reliable. This cannot be fixed by thread-protecting individual variables, because the set of variables is a whole, i.e. the thread state, that's expected to be consistent.
For every piece of code in the GDB stub that reads any of the common memory, there should be a clearly visible explicit explanation for why it is sure and clear that the memory access can be done safely, and that the main thread's state is not messed up. For example, the main thread must be waiting at a known location - and I guess that was the intention with some of those "debug break" calls, but there's no actual synchronization. In gdbstub.cpp / break_execution(), for example, there's a call to NDS_debug_break(), but it doesn't actually wait for the NDS to break? Did I miss something, is there an inter-thread sync wait somewhere?
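To make the "state is a whole" point concrete, here is a small sketch, assuming C++11. The MiniCpu struct and its invariant are invented for illustration (the field name instruct_adr just echoes the thread; this is not DeSmuME's armcpu_t). One lock covers the whole transition and the whole read, so a reader can never see a half-updated state.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical miniature CPU state with an invariant tying two fields
// together: next_adr == instruct_adr + 4.
struct MiniCpu {
    std::uint32_t instruct_adr = 0x02000000;
    std::uint32_t next_adr = 0x02000004;
};

class GuardedCpu {
public:
    // One state transition; both fields change under the same lock.
    void step() {
        std::lock_guard<std::mutex> lock(m_);
        assert(cpu_.next_adr == cpu_.instruct_adr + 4);
        cpu_.instruct_adr = cpu_.next_adr;
        cpu_.next_adr += 4;
    }

    // Copy the whole state under the lock, so the caller gets a
    // consistent snapshot rather than fields read at different moments.
    MiniCpu snapshot() {
        std::lock_guard<std::mutex> lock(m_);
        return cpu_;
    }

private:
    std::mutex m_;
    MiniCpu cpu_;
};

// Runs a stepping thread and a reading thread concurrently and counts
// snapshots that violate the invariant.
int count_inconsistent_snapshots(int steps) {
    GuardedCpu cpu;
    int bad = 0;
    std::thread emu([&] {
        for (int i = 0; i < steps; ++i) cpu.step();
    });
    for (int i = 0; i < steps; ++i) {
        MiniCpu s = cpu.snapshot();
        if (s.next_adr != s.instruct_adr + 4) ++bad;
    }
    emu.join();
    return bad;
}
```

If each field had its own lock instead, a reader could pick up instruct_adr from before a step and next_adr from after it, which is the kind of ambiguity that protecting only the stalled flag leaves in place.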
After adding some more log messages, I was even able to run a source-step once. But then the next one jammed. Maybe it's a race condition on the CPU registers and instruct_adr. My earlier analysis log with Valgrind's DRD tool has stuff like the following:
==13042== Conflicting load by thread 3 at 0x082f5428 size 4
==13042== at 0x8053DD4: read_cpu_reg(void*, unsigned int) (armcpu.cpp:97)
==13042== Location 0x82f5428 is 0 bytes inside NDS_ARM9.instruct_adr,
==13042== a global variable declared at armcpu.cpp:46
==13042== Other segment start (thread 1)
==13042== at 0x402CD94: pthread_mutex_unlock (drd_pthread_intercepts.c:667)
==13042== by 0x7B6AED6: _nss_nis_endpwent (nis-pwd.c:137)
==13042== by 0x7B60060: internal_endpwent (compat-pwd.c:316)
==13042== by 0x7B60C13: _nss_compat_getpwnam_r (compat-pwd.c:875)
==13042== by 0x45C6E34: getpwnam_r@@GLIBC_2.1.2 (getXXbyYY_r.c:256)
==13042== by 0x425FE87: ??? (in /lib/i386-linux-gnu/libglib-2.0.so.0.3200.4)
==13042== by 0x444E4956: ???
==13042== Other segment end (thread 1)
==13042== at 0x402C597: pthread_mutex_lock (drd_pthread_intercepts.c:615)
==13042== by 0x812FF64: Task::Impl::finish() (task.cpp:296)
==13042== by 0xFFE43403: ???
==13042==
==13042== Conflicting load by thread 3 at 0x082f5470 size 4
==13042== at 0x8053DC4: read_cpu_reg(void*, unsigned int) (armcpu.cpp:101)
==13042== Location 0x82f5470 is 0 bytes inside NDS_ARM9.CPSR,
==13042== a global variable declared at armcpu.cpp:46
==13042== Other segment start (thread 1)
==13042== at 0x402CD94: pthread_mutex_unlock (drd_pthread_intercepts.c:667)
==13042== by 0x7B6AED6: _nss_nis_endpwent (nis-pwd.c:137)
==13042== by 0x7B60060: internal_endpwent (compat-pwd.c:316)
==13042== by 0x7B60C13: _nss_compat_getpwnam_r (compat-pwd.c:875)
==13042== by 0x45C6E34: getpwnam_r@@GLIBC_2.1.2 (getXXbyYY_r.c:256)
==13042== by 0x425FE87: ??? (in /lib/i386-linux-gnu/libglib-2.0.so.0.3200.4)
==13042== by 0x444E4956: ???
==13042== Other segment end (thread 1)
==13042== at 0x402C597: pthread_mutex_lock (drd_pthread_intercepts.c:615)
==13042== by 0x812FF64: Task::Impl::finish() (task.cpp:296)
==13042== by 0xFFE43403: ???
==13042==
==13042== Conflicting load by thread 3 at 0x082f5430 size 4
==13042== at 0x8053DB4: read_cpu_reg(void*, unsigned int) (armcpu.cpp:94)
==13042== Location 0x82f5430 is 0 bytes inside NDS_ARM9.R[0],
==13042== a global variable declared at armcpu.cpp:46
==13042== Other segment start (thread 1)
==13042== at 0x402CD94: pthread_mutex_unlock (drd_pthread_intercepts.c:667)
==13042== by 0x7B6AED6: _nss_nis_endpwent (nis-pwd.c:137)
==13042== by 0x7B60060: internal_endpwent (compat-pwd.c:316)
==13042== by 0x7B60C13: _nss_compat_getpwnam_r (compat-pwd.c:875)
==13042== by 0x45C6E34: getpwnam_r@@GLIBC_2.1.2 (getXXbyYY_r.c:256)
==13042== by 0x425FE87: ??? (in /lib/i386-linux-gnu/libglib-2.0.so.0.3200.4)
==13042== by 0x444E4956: ???
==13042== Other segment end (thread 1)
==13042== at 0x402C597: pthread_mutex_lock (drd_pthread_intercepts.c:615)
==13042== by 0x812FF64: Task::Impl::finish() (task.cpp:296)
==13042== by 0xFFE43403: ???
Add logging and bug disappears: http://en.wikipedia.org/wiki/Heisenbug
More info. For some reason, on Linux, DeSmuME basically tells itself to enter an endless loop waiting for someone to unstall it from the outside, but that someone has no idea about the situation. Now I'm trying to understand how the system is supposed to work as a whole, what the assumed unwritten preconditions, postconditions and invariants over all the variables are, and what states the emulator and the stub are supposed to have if you think of them as a pair of state machines. Here's a comparison of what happens on the Mac vs. Linux when giving the command 's' in the controlling GDB. On the Mac it works, and is able to run until the next source line, but on Linux it enters an endless waiting loop. All lines prior to these snippets are identical on both systems. The source code version is the same, and the GDB "arm-none-eabi-gdb" is taken from the newest DevkitPro package on both systems, printing version name "GNU gdb (GDB) 7.7.1".
Mac:
--- Mac ------------------ NOW GIVE COMMAND 's' IN GDB
Processing packet v
Processing packet H
Processing packet s
Stepping instruction at 02000000
UNSTALL
Step watch: waiting for 02000000 at 02000000
Step hit -> 02000000
STALL
Break from Emulation
Processing packet g
'g' command PC = 02000004
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000004
UNSTALL
Step watch: waiting for 02000004 at 02000004
Step hit -> 02000004
STALL
Break from Emulation
Processing packet g
'g' command PC = 02000008
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000008
UNSTALL
Step watch: waiting for 02000008 at 02000008
Step hit -> 02000008
STALL
Break from Emulation
Processing packet g
'g' command PC = 0200000c
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 0200000c
UNSTALL
Step watch: waiting for 0200000c at 0200000c
Step hit -> 0200000c
STALL
Break from Emulation
Processing packet g
'g' command PC = 02000010
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000010
UNSTALL
Step watch: waiting for 02000010 at 02000010
Step hit -> 02000010
STALL
Break from Emulation
Processing packet g
'g' command PC = 02000014
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000014
UNSTALL
Step watch: waiting for 02000014 at 02000014
Step hit -> 02000014
STALL
Break from Emulation
Processing packet g
'g' command PC = 02000018
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000018
UNSTALL
Step watch: waiting for 02000018 at 02000018
Step hit -> 02000018
STALL
Break from Emulation
Processing packet g
'g' command PC = 0200001c
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 0200001c
UNSTALL
Step watch: waiting for 0200001c at 0200001c
Step hit -> 0200001c
STALL
Break from Emulation
Processing packet g
'g' command PC = 02000020
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000020
UNSTALL
Step watch: waiting for 02000020 at 02000020
Step hit -> 02000020
STALL
Break from Emulation
Processing packet g
'g' command PC = 02000024
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000024
UNSTALL
Step watch: waiting for 02000024 at 02000024
Step hit -> 02000024
STALL
Break from Emulation
Processing packet g
'g' command PC = 02000028
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000028
UNSTALL
Step watch: waiting for 02000028 at 02000028
Step hit -> 02000028
STALL
Break from Emulation
Processing packet g
'g' command PC = 0200002c
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 0200002c
UNSTALL
Step watch: waiting for 0200002c at 0200002c
Step hit -> 0200002c
STALL
Break from Emulation
Processing packet g
'g' command PC = 02000030
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000030
UNSTALL
Step watch: waiting for 02000030 at 02000030
Step hit -> 02000030
STALL
Break from Emulation
Processing packet g
'g' command PC = 02003d68
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
On Linux we notice differences very soon:
--- Linux ------------------------ NOW GIVE COMMAND 's' IN GDB
Processing packet v
Processing packet H
Processing packet s
Stepping instruction at 02000000
UNSTALL
Step watch: waiting for 02000000 at 02000000
Step hit -> 02000000
STALL <--------- Here! After this something goes wrong
Break from Emulation
Processing packet g
'g' command PC = 02000000
Processing packet m
Processing packet s
Stepping instruction at 02000000
UNSTALL
For some reason, the program counter (PC) is still at 0x02000000, even though on the Mac it proceeded to 0x02000004 at that point. If I understand this correctly, DeSmuME is supposed to run instruction by instruction, reporting the current PC to GDB and letting it decide whether that PC location corresponds to a source code line; if it does, GDB breaks execution. And that's not happening, because the PC doesn't move anywhere.
By the way, even though the last log line is "UNSTALL", after printing that, DeSmuME internally calls NDS_debug_break(), which stalls both CPUs and after that it's jammed.
Edit: I added lots of debug logging (which isn't shown above yet) and changed the gdbstub.cpp / DEBUG_LOG macro to also always write out ARM9's program counter value like this:
#define DEBUG_LOG( fmt, ...) fprintf(stdout, "R[15]:%x, instruct_adr:%x ", NDS_ARM9.R[15], NDS_ARM9.instruct_adr); fprintf(stdout, fmt, ##__VA_ARGS__)
After that, the behaviour changed a little bit. Now the instruction-stepping proceeds a few times, until PC = 0x0200001c, and then it jams. I guess there's a timing-dependent race condition or something - one of the threads is sometimes able to write some value somewhere before a critical moment and sometimes it is not. At least it happens after the STALL log row I marked with an arrow above.
I'm now trying to find out what's wrong with source-stepping in Linux. On the Mac port it seems to work. I had a suspicion it's a thread-safety issue, and maybe the stalled flag is being written from two places simultaneously or something, so I made a test: I changed all writes of the stalled flag into Stall()/Unstall() function calls, and in the functions I protected the writing of the variable with mutexes. But no change in behaviour, it still jams like before. Then I replaced even all _reads_ of the stalled flag with mutex-protected functions, which shouldn't be needed, but still nothing. After that I made some quick tests running DeSmuME in a debugger (command-line GDB actually, which made the situation feel a bit weird, having GDB debug a GDB stub, controlled by another GDB), to see what the program is actually doing when it jams. Depending on how and when I set my breakpoints (on DeSmuME, not the emulated NDS CPUs), I got different phenomena. If I just let it run to the jam without breaking, then it seems that both emulated ARM CPUs are stalled and it runs in the NDS_exec() / for(;;) loop endlessly, waiting for the CPUs to become unstalled. I'll have to try and figure out the mechanism that's supposed to be triggered here: when and by which piece of code are the CPUs supposed to become unstalled. But I'll continue with this tomorrow.
FWIW, probably not much, I tried running the program in Valgrind's Data Race Detection tool DRD. Here's the command I used to start Valgrind:
valgrind --log-file=./valgrind_log2.txt --error-limit=no --tool=drd --read-var-info=yes desmume-cli --arm9gdb=20000 AllocationTest.nds
After DeSmuME had loaded up and displayed its graphics window, I started GDB in another terminal, with commands "target remote :20000" etc., and finally I gave the Step command.
As expected, it's giving tons of error messages (GDB-related excerpt from the log file):
...
==13042== Conflicting store by thread 1 at 0x082f54ec size 4
==13042== at 0x8053D73: stall_cpu(void*) (armcpu.cpp:62)
==13042== by 0x827300F: step_instruction_watch(void*, unsigned int, int) (gdbstub.cpp:214)
==13042== by 0x8055576: unsigned int armcpu_exec<0>() (armcpu.cpp:689)
==13042== by 0x80FFB94: std::pair<int, int> armInnerLoop<true, true, false>(unsigned long long, int, int, int) (NDSSystem.cpp:1725)
==13042== by 0x81073C4: void NDS_exec<false>(int) (NDSSystem.cpp:1878)
==13042== by 0x80526C4: desmume_cycle(ctrls_event_config*) (main.cpp:482)
==13042== by 0x453DE45: (below main) (libc-start.c:244)
==13042== Location 0x82f54ec is 0 bytes inside NDS_ARM9.stalled,
==13042== a global variable declared at armcpu.cpp:46
==13042== Other segment start (thread 3)
==13042== at 0x40309DA: sem_post@* (drd_pthread_intercepts.c:1059)
==13042== by 0x40AE26E: SDL_SemPost (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40644B7: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40ADD3A: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 3)
==13042== at 0x45F7D01: ??? (syscall-template.S:82)
==13042== Other segment start (thread 3)
==13042== at 0x40309DA: sem_post@* (drd_pthread_intercepts.c:1059)
==13042== by 0x40AE26E: SDL_SemPost (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40644B7: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40ADD3A: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 3)
==13042== at 0x45F7D01: ??? (syscall-template.S:82)
==13042==
==13042== Conflicting store by thread 1 at 0x082f54f4 size 4
==13042== at 0x8053CF4: remove_post_exec_fn(void*) (armcpu.cpp:86)
==13042== by 0x827301B: step_instruction_watch(void*, unsigned int, int) (gdbstub.cpp:217)
==13042== by 0x8055576: unsigned int armcpu_exec<0>() (armcpu.cpp:689)
==13042== by 0x80FFB94: std::pair<int, int> armInnerLoop<true, true, false>(unsigned long long, int, int, int) (NDSSystem.cpp:1725)
==13042== by 0x81073C4: void NDS_exec<false>(int) (NDSSystem.cpp:1878)
==13042== by 0x80526C4: desmume_cycle(ctrls_event_config*) (main.cpp:482)
==13042== by 0x453DE45: (below main) (libc-start.c:244)
==13042== Location 0x82f54f4 is 0 bytes inside NDS_ARM9.post_ex_fn,
==13042== a global variable declared at armcpu.cpp:46
==13042== Other segment start (thread 3)
==13042== at 0x40309DA: sem_post@* (drd_pthread_intercepts.c:1059)
==13042== by 0x40AE26E: SDL_SemPost (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40644B7: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40ADD3A: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 3)
==13042== at 0x45F7D01: ??? (syscall-template.S:82)
==13042==
==13042== Conflicting store by thread 1 at 0x04ffc8ec size 4
==13042== at 0x8273022: step_instruction_watch(void*, unsigned int, int) (gdbstub.cpp:220)
==13042== by 0x8055576: unsigned int armcpu_exec<0>() (armcpu.cpp:689)
==13042== by 0x80FFB94: std::pair<int, int> armInnerLoop<true, true, false>(unsigned long long, int, int, int) (NDSSystem.cpp:1725)
==13042== by 0x81073C4: void NDS_exec<false>(int) (NDSSystem.cpp:1878)
==13042== by 0x80526C4: desmume_cycle(ctrls_event_config*) (main.cpp:482)
==13042== by 0x453DE45: (below main) (libc-start.c:244)
==13042== Address 0x4ffc8ec is at offset 2196 from 0x4ffc058. Allocation context:
==13042== at 0x402A2B8: malloc (vg_replace_malloc.c:263)
==13042== by 0x8274B17: createStub_gdb(unsigned short, armcpu_t*, armcpu_memory_iface const*) (gdbstub.cpp:1456)
==13042== by 0x804EFDC: main (main.cpp:605)
==13042== Other segment start (thread 3)
==13042== at 0x40309DA: sem_post@* (drd_pthread_intercepts.c:1059)
==13042== by 0x40AE26E: SDL_SemPost (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40644B7: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40ADD3A: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 3)
==13042== at 0x45F7D01: ??? (syscall-template.S:82)
==13042==
==13042== Conflicting store by thread 1 at 0x082f560c size 4
==13042== at 0x8103577: NDS_debug_break() (NDSSystem.cpp:1781)
==13042== by 0x8055576: unsigned int armcpu_exec<0>() (armcpu.cpp:689)
==13042== by 0x80FFB94: std::pair<int, int> armInnerLoop<true, true, false>(unsigned long long, int, int, int) (NDSSystem.cpp:1725)
==13042== by 0x81073C4: void NDS_exec<false>(int) (NDSSystem.cpp:1878)
==13042== by 0x80526C4: desmume_cycle(ctrls_event_config*) (main.cpp:482)
==13042== by 0x453DE45: (below main) (libc-start.c:244)
==13042== Location 0x82f560c is 0 bytes inside NDS_ARM7.stalled,
==13042== a global variable declared at armcpu.cpp:45
==13042== Other segment start (thread 3)
==13042== at 0x40309DA: sem_post@* (drd_pthread_intercepts.c:1059)
==13042== by 0x40AE26E: SDL_SemPost (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40644B7: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40ADD3A: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 3)
==13042== at 0x45F7D01: ??? (syscall-template.S:82)
==13042==
==13042== Conflicting store by thread 1 at 0x082f54ec size 4
==13042== at 0x8103581: NDS_debug_break() (NDSSystem.cpp:1781)
==13042== by 0x8055576: unsigned int armcpu_exec<0>() (armcpu.cpp:689)
==13042== by 0x80FFB94: std::pair<int, int> armInnerLoop<true, true, false>(unsigned long long, int, int, int) (NDSSystem.cpp:1725)
==13042== by 0x81073C4: void NDS_exec<false>(int) (NDSSystem.cpp:1878)
==13042== by 0x80526C4: desmume_cycle(ctrls_event_config*) (main.cpp:482)
==13042== by 0x453DE45: (below main) (libc-start.c:244)
==13042== Location 0x82f54ec is 0 bytes inside NDS_ARM9.stalled,
==13042== a global variable declared at armcpu.cpp:46
==13042== Other segment start (thread 3)
==13042== at 0x40309DA: sem_post@* (drd_pthread_intercepts.c:1059)
==13042== by 0x40AE26E: SDL_SemPost (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40644B7: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40ADD3A: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 3)
==13042== at 0x45F7D01: ??? (syscall-template.S:82)
==13042== Other segment start (thread 3)
==13042== at 0x40309DA: sem_post@* (drd_pthread_intercepts.c:1059)
==13042== by 0x40AE26E: SDL_SemPost (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40644B7: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40ADD3A: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 3)
==13042== at 0x45F7D01: ??? (syscall-template.S:82)
==13042==
==13042== Conflicting store by thread 1 at 0x082f5428 size 4
==13042== at 0x80555A3: unsigned int armcpu_exec<0>() (armcpu.cpp:416)
==13042== by 0x80FFB94: std::pair<int, int> armInnerLoop<true, true, false>(unsigned long long, int, int, int) (NDSSystem.cpp:1725)
==13042== by 0x81073C4: void NDS_exec<false>(int) (NDSSystem.cpp:1878)
==13042== by 0x80526C4: desmume_cycle(ctrls_event_config*) (main.cpp:482)
==13042== by 0x453DE45: (below main) (libc-start.c:244)
==13042== Location 0x82f5428 is 0 bytes inside NDS_ARM9.instruct_adr,
==13042== a global variable declared at armcpu.cpp:46
==13042== Other segment start (thread 3)
==13042== at 0x40309DA: sem_post@* (drd_pthread_intercepts.c:1059)
==13042== by 0x40AE26E: SDL_SemPost (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40644B7: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40ADD3A: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 3)
==13042== at 0x45F7D01: ??? (syscall-template.S:82)
==13042== Other segment start (thread 3)
==13042== at 0x40309DA: sem_post@* (drd_pthread_intercepts.c:1059)
==13042== by 0x40AE26E: SDL_SemPost (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40644B7: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40ADD3A: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 3)
==13042== at 0x45F7D01: ??? (syscall-template.S:82)
==13042==
But it's really hard to say what, if anything, any of that actually means. If a multi-threaded application is 100% perfect, do you get no errors at all?
I don't know; I haven't used the tool that much. Many of the detected errors don't necessarily seem to have anything to do with the GDB stub functionality, like this one from the very beginning of the log:
==13042== Conflicting load by thread 1 at 0x04abcb48 size 4
==13042== at 0x4A872B5: pa_once_begin (in /usr/lib/i386-linux-gnu/pulseaudio/libpulsecommon-2.0.so)
==13042== by 0x4A87495: pa_run_once (in /usr/lib/i386-linux-gnu/pulseaudio/libpulsecommon-2.0.so)
==13042== by 0x4A9E115: pa_thread_self (in /usr/lib/i386-linux-gnu/pulseaudio/libpulsecommon-2.0.so)
==13042== by 0x46C9CB3: pa_threaded_mainloop_lock (in /usr/lib/i386-linux-gnu/libpulse.so.0.14.2)
==13042== by 0x4045F89: pulse_connect (in /usr/lib/i386-linux-gnu/alsa-lib/libasound_module_pcm_pulse.so)
==13042== by 0x40456EE: _snd_pcm_pulse_open (in /usr/lib/i386-linux-gnu/alsa-lib/libasound_module_pcm_pulse.so)
==13042== by 0x4138707: ??? (in /usr/lib/i386-linux-gnu/libasound.so.2.0.0)
==13042== by 0x4138D3D: ??? (in /usr/lib/i386-linux-gnu/libasound.so.2.0.0)
==13042== by 0x825598A: Mic_Init() (mic_alsa.cpp:48)
==13042== by 0x80F4EB4: MMU_Init() (MMU.cpp:932)
==13042== by 0x453DE45: (below main) (libc-start.c:244)
==13042== Allocation context: BSS section of /usr/lib/i386-linux-gnu/pulseaudio/libpulsecommon-2.0.so
==13042== Other segment start (thread 2)
==13042== at 0x402C597: pthread_mutex_lock (drd_pthread_intercepts.c:615)
==13042== by 0x4A9CF5E: pa_mutex_lock (in /usr/lib/i386-linux-gnu/pulseaudio/libpulsecommon-2.0.so)
==13042== by 0x4A872F1: pa_once_begin (in /usr/lib/i386-linux-gnu/pulseaudio/libpulsecommon-2.0.so)
==13042== by 0x4A87495: pa_run_once (in /usr/lib/i386-linux-gnu/pulseaudio/libpulsecommon-2.0.so)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 2)
==13042== at 0x402CD94: pthread_mutex_unlock (drd_pthread_intercepts.c:667)
==13042== by 0x4A9D12E: pa_mutex_unlock (in /usr/lib/i386-linux-gnu/pulseaudio/libpulsecommon-2.0.so)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042==
==13042== Conflicting load by thread 1 at 0x04abcb4c size 4
==13042== at 0x4A9E116: pa_thread_self (in /usr/lib/i386-linux-gnu/pulseaudio/libpulsecommon-2.0.so)
==13042== by 0x46C9CB3: pa_threaded_mainloop_lock (in /usr/lib/i386-linux-gnu/libpulse.so.0.14.2)
==13042== by 0x4045F89: pulse_connect (in /usr/lib/i386-linux-gnu/alsa-lib/libasound_module_pcm_pulse.so)
==13042== by 0x40456EE: _snd_pcm_pulse_open (in /usr/lib/i386-linux-gnu/alsa-lib/libasound_module_pcm_pulse.so)
==13042== by 0x4138707: ??? (in /usr/lib/i386-linux-gnu/libasound.so.2.0.0)
==13042== by 0x4138D3D: ??? (in /usr/lib/i386-linux-gnu/libasound.so.2.0.0)
==13042== by 0x825598A: Mic_Init() (mic_alsa.cpp:48)
==13042== by 0x80F4EB4: MMU_Init() (MMU.cpp:932)
==13042== by 0x453DE45: (below main) (libc-start.c:244)
==13042== Allocation context: BSS section of /usr/lib/i386-linux-gnu/pulseaudio/libpulsecommon-2.0.so
==13042== Other segment start (thread 2)
==13042== at 0x402C597: pthread_mutex_lock (drd_pthread_intercepts.c:615)
==13042== by 0x4A9CF5E: pa_mutex_lock (in /usr/lib/i386-linux-gnu/pulseaudio/libpulsecommon-2.0.so)
==13042== by 0x4A872F1: pa_once_begin (in /usr/lib/i386-linux-gnu/pulseaudio/libpulsecommon-2.0.so)
==13042== by 0x4A87495: pa_run_once (in /usr/lib/i386-linux-gnu/pulseaudio/libpulsecommon-2.0.so)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 2)
==13042== at 0x402CD94: pthread_mutex_unlock (drd_pthread_intercepts.c:667)
==13042== by 0x4A9D12E: pa_mutex_unlock (in /usr/lib/i386-linux-gnu/pulseaudio/libpulsecommon-2.0.so)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
...a bug in ALSA or PulseAudio or whatever, or in the way they are used in DeSmuME? Or just a false alarm, with DRD being trigger-happy?
More errors that come after the GDB excerpt:
==13042== Conflicting store by thread 1 at 0x082f5434 size 4
==13042== at 0x805B503: unsigned int OP_SUB_IMM_VAL<0>(unsigned int) (arm_instructions.cpp:620)
==13042== by 0x805554E: unsigned int armcpu_exec<0>() (armcpu.cpp:682)
==13042== by 0x80FFB94: std::pair<int, int> armInnerLoop<true, true, false>(unsigned long long, int, int, int) (NDSSystem.cpp:1725)
==13042== by 0x81073C4: void NDS_exec<false>(int) (NDSSystem.cpp:1878)
==13042== by 0x80526C4: desmume_cycle(ctrls_event_config*) (main.cpp:482)
==13042== by 0x453DE45: (below main) (libc-start.c:244)
==13042== Location 0x82f5434 is 0 bytes inside NDS_ARM9.R[1],
==13042== a global variable declared at armcpu.cpp:46
==13042== Other segment start (thread 3)
==13042== at 0x40309DA: sem_post@* (drd_pthread_intercepts.c:1059)
==13042== by 0x40AE26E: SDL_SemPost (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40644B7: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40ADD3A: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 3)
==13042== at 0x45F7D01: ??? (syscall-template.S:82)
==13042== Other segment start (thread 3)
==13042== at 0x40309DA: sem_post@* (drd_pthread_intercepts.c:1059)
==13042== by 0x40AE26E: SDL_SemPost (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40644B7: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40ADD3A: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 3)
==13042== at 0x45F7D01: ??? (syscall-template.S:82)
==13042==
==13042== Conflicting store by thread 1 at 0x082f5464 size 4
==13042== at 0x805AAA3: unsigned int OP_MOV_LSL_IMM<0>(unsigned int) (arm_instructions.cpp:1911)
==13042== by 0x805554E: unsigned int armcpu_exec<0>() (armcpu.cpp:682)
==13042== by 0x80FFB94: std::pair<int, int> armInnerLoop<true, true, false>(unsigned long long, int, int, int) (NDSSystem.cpp:1725)
==13042== by 0x81073C4: void NDS_exec<false>(int) (NDSSystem.cpp:1878)
==13042== by 0x80526C4: desmume_cycle(ctrls_event_config*) (main.cpp:482)
==13042== by 0x453DE45: (below main) (libc-start.c:244)
==13042== Location 0x82f5464 is 0 bytes inside NDS_ARM9.R[13],
==13042== a global variable declared at armcpu.cpp:46
==13042== Other segment start (thread 3)
==13042== at 0x40309DA: sem_post@* (drd_pthread_intercepts.c:1059)
==13042== by 0x40AE26E: SDL_SemPost (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40644B7: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40ADD3A: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 3)
==13042== at 0x45F7D01: ??? (syscall-template.S:82)
==13042== Other segment start (thread 3)
==13042== at 0x40309DA: sem_post@* (drd_pthread_intercepts.c:1059)
==13042== by 0x40AE26E: SDL_SemPost (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40644B7: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40ADD3A: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 3)
==13042== at 0x45F7D01: ??? (syscall-template.S:82)
==13042==
==13042== Conflicting store by thread 1 at 0x082f5464 size 4
==13042== at 0x80546A3: armcpu_switchMode(armcpu_t*, unsigned char) (armcpu.cpp:330)
==13042== by 0x8071F16: unsigned int OP_MSR_CPSR<0>(unsigned int) (arm_instructions.cpp:3047)
==13042== by 0x805554E: unsigned int armcpu_exec<0>() (armcpu.cpp:682)
==13042== by 0x80FFB94: std::pair<int, int> armInnerLoop<true, true, false>(unsigned long long, int, int, int) (NDSSystem.cpp:1725)
==13042== by 0x81073C4: void NDS_exec<false>(int) (NDSSystem.cpp:1878)
==13042== by 0x80526C4: desmume_cycle(ctrls_event_config*) (main.cpp:482)
==13042== by 0x453DE45: (below main) (libc-start.c:244)
==13042== Location 0x82f5464 is 0 bytes inside NDS_ARM9.R[13],
==13042== a global variable declared at armcpu.cpp:46
==13042== Other segment start (thread 3)
==13042== at 0x40309DA: sem_post@* (drd_pthread_intercepts.c:1059)
==13042== by 0x40AE26E: SDL_SemPost (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40644B7: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40ADD3A: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 3)
==13042== at 0x45F7D01: ??? (syscall-template.S:82)
==13042== Other segment start (thread 3)
==13042== at 0x40309DA: sem_post@* (drd_pthread_intercepts.c:1059)
==13042== by 0x40AE26E: SDL_SemPost (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40644B7: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x40ADD3A: ??? (in /usr/lib/i386-linux-gnu/libSDL-1.2.so.0.11.4)
==13042== by 0x4513C38: start_thread (pthread_create.c:304)
==13042== by 0x45FE9FD: clone (clone.S:130)
==13042== Other segment end (thread 3)
==13042== at 0x45F7D01: ??? (syscall-template.S:82)
DeSmuME didn't enter the jam state under Valgrind, maybe because things happen so slowly on its simulated CPU while the other side of the equation, the GDB client program, was running normally on the real CPU. In the time I waited, the AllocationTest program inside DeSmuME didn't even get far from the initial address. These are the last lines of DeSmuME's console output before I pressed Ctrl-C:
...
Break from Emulation
Processing packet g
'g' command PC = 02000020
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000020
UNSTALL
Step watch: waiting for 02000020 at 02000020
Step hit -> 02000020
STALL
Break from Emulation
Processing packet g
'g' command PC = 02000024
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000024
UNSTALL
Step watch: waiting for 02000024 at 02000024
Step hit -> 02000024
STALL
Break from Emulation
Processing packet g
'g' command PC = 02000028
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000028
UNSTALL
Step watch: waiting for 02000028 at 02000028
Step hit -> 02000028
STALL
Break from Emulation
Processing packet g
'g' command PC = 0200002c
Processing packet m
Processing packet m
...
I also tried building with optimization level -O1 in CFLAGS, but it didn't make any difference: source-level stepping still jams the emulator. On the Mac it works and is very fast; you can just type S and hold down Enter, and it will print new source lines at the key repeat rate.
Now it compiles from SVN trunk, and breakpoints work ... sort of, but e.g. source-level stepping puts it in a jam straight away, just like repeating the "si" command long enough. Missing mutex or something?
I checked that it's really the newest version
...
DeSmuME 0.9.11 svn5067 dev+ x86-JIT
Created GDB stub on port 20000
STALL
SoftRast Initialized with cores=2
...
CPU mode: Interpreter
Processing packet q
Processing packet H
Processing packet q
Processing packet ?
Processing packet H
Processing packet q
Processing packet q
Processing packet g
'g' command PC = 02000000
...
My build procedure is like this
svn checkout svn://svn.code.sf.net/p/desmume/code/trunk/desmume desmume-svn
cd desmume-svn
./autogen.sh
CFLAGS='-O2 -march=native' CXXFLAGS=$CFLAGS ./configure --enable-gdb-stub ; make ; sudo make install
Anyway, I'll try to find out what's happening. I learned my way around the source code pretty well already when trying to understand why breakpoints and source-line stepping don't work on the Mac port.
Thanks, now the Mac dev+ version seems to work with breakpoints and all, straight from svn trunk. Your most essential fix seems to be this bit in activateStub_gdb():
armcpu_t *theCPU = (armcpu_t *)stub->arm_cpu_object;
theCPU->SetCurrentMemoryInterface(&stub->gdb_memio);
The old code didn't make any changes to the actual CPU objects' function table pointers, so as far as I can see, it cannot have worked in any port, maybe for a long time.
However, now the Linux port seems to be a little bit broken, and it doesn't even compile. gtk/desmume.h still has the old definition of desmume_init(), which you had changed in r5061.
We must be looking at different code, and/or different definitions of the language syntax.
Any way I look at it, ARMPROC.mem_if (which is a pointer) has not been set up by the line you quote. The quoted line only fills in the stub's own sub-struct. A pointer into the sub-struct is passed to the local variable like you say, but ARMPROC.mem_if still points to the original memory interface, e.g. arm9_base_memory_iface. Only the local variable is changed, and the local variable is not passed anywhere, certainly not to NDS_Init(), which is called _before_ createStub_gdb() and doesn't even take parameters.
edit: I didn't see Rogerman's post, sorry.
Anyway, great that it's sorted out. I thought I was going crazy: how the heck could this work on Windows or anywhere? So never mind, maybe we actually were looking at different code.
Yes, the system has clearly been modified in several ways, and the current svn trunk version seems to contain both an old and a new architecture, and the new one isn't completely done - if I understand it correctly.
The "old" way - if I understand it correctly - was to use the GDB stub as a replacement memory interface, so that it calls the original one at each prefetchX call.
static uint32_t FASTCALL gdb_prefetch32( void *data, uint32_t adr) {
  struct gdb_stub_state *stub = (struct gdb_stub_state *)data;
  int breakpoint;
  breakpoint = check_breaks_gdb( stub, stub->instr_breakpoints, adr, 4,
                                 STOP_BREAKPOINT);
  //return stub->real_cpu_memio->prefetch32( stub->real_cpu_memio->data, adr);
  return 0;
}
It once used to call the original routine real_cpu_memio->prefetch32? At least now it just returns 0, and the call to the saved original prefetch32 is commented out.
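To make the interposition idea concrete, here is a minimal sketch of a prefetch hook that first checks a breakpoint list and then forwards to the saved original interface. All names in it (MemIface, StubState, hooked_prefetch32, real_prefetch32) are hypothetical stand-ins for illustration, not the real DeSmuME types, and the "real" memory is a toy function.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a CPU memory interface: a function pointer plus
// an opaque data pointer handed back to it on every call.
struct MemIface {
    uint32_t (*prefetch32)(void *data, uint32_t adr);
    void *data;
};

// Hypothetical stub state: the saved original interface and a breakpoint list.
struct StubState {
    MemIface *real_memio = nullptr;     // original interface, saved at hook time
    std::vector<uint32_t> breakpoints;  // instruction breakpoint addresses
    bool hit = false;                   // set when a fetch lands on a breakpoint
};

// The hook: note a breakpoint hit first, then return what the real interface
// returns, so the CPU still receives the actual instruction word.
static uint32_t hooked_prefetch32(void *data, uint32_t adr) {
    StubState *stub = static_cast<StubState *>(data);
    for (uint32_t bp : stub->breakpoints) {
        if (bp == adr) {
            stub->hit = true;
        }
    }
    return stub->real_memio->prefetch32(stub->real_memio->data, adr);
}

// Toy "real" memory: returns a value derived from the address.
static uint32_t real_prefetch32(void *, uint32_t adr) {
    return adr ^ 0xE1A00000u;
}
```

With a hook of this shape, returning a constant 0 instead of forwarding would hand the CPU a bogus instruction word, which is why the commented-out forwarding call looks significant.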
I tried to follow the code, and find out where the pointer to gdb_prefetch32() would end up being called, and I couldn't see that happening. The function createStub_gdb() tries to write its own interface as an "out" parameter to **cpu_memio
*cpu_memio = &stub->cpu_memio;
...but look at where that assignment is actually going - practically nowhere that would matter. activateStub_gdb() that's run after creating the stub only seems to manipulate the control interface, not memory interface.
I thought so too. I took a look at the Windows port's main.cpp, and the code around createStub_gdb() looks pretty much the same there. I did triple-check that GDB_STUB was globally defined. Maybe someone who knows more about the code, and can test the breakpoints as well, can find out what if anything might be wrong.
Zeromus: I didn't look at the Windows port, but the Mac port's call to createStub_gdb() is in a file called cocoa_core.mm, and I have a suspicion that Windows folks have something different.
Ok, I managed to get breakpoints and source line stepping to work with some ugly patchwork. In this example I set a breakpoint, continue, and the breakpoint is triggered, and then I can just Continue, hold down Enter, and eventually the sprite boxes are updated in the DeSmuME window.
(gdb) break P$ALLOCATIONTEST_UPDATESPRITES
Breakpoint 1 at 0x201559c
(gdb) c
Continuing.
Breakpoint 1, 0x0201559c in P$ALLOCATIONTEST_UPDATESPRITES ()
1: x/i $pc
=> 0x201559c <P$ALLOCATIONTEST_UPDATESPRITES+16>: ldr r0, [pc, #304] ; 0x20156d4 <P$ALLOCATIONTEST_UPDATESPRITES+328>
(gdb) c
Continuing.
...
I got it to work with a chewing gum hack, calling the GDB stub functions from MMU.cpp for the prefetchX functions directly, exposing global functions and variables from modules. I won't try to guess what someone had in mind when making the MMU etc. thing, but at least this kludge seems to give me working breakpoints and source line stepping. I didn't try to understand the design principles behind all this code: what sort of architecture someone had in mind, what was the old way, what is the new way, and so on.
In gdbstub.cpp I removed the "static" declarations from gdb_prefetchX so the linker can connect the things. Then in MMU.cpp, I added calls from the armN_prefetchX functions to the gdb_prefetchX functions, giving the stub pointers in the data parameter.
////////////////////////////////////////////////////////////
//function pointer handlers for gdb stub stuff
void *arm7_gdb_stub = NULL;
void *arm9_gdb_stub = NULL;
extern uint16_t FASTCALL gdb_prefetch16( void *data, uint32_t adr);
extern uint32_t FASTCALL gdb_prefetch32( void *data, uint32_t adr);
static u16 FASTCALL arm9_prefetch16( void *data, u32 adr) {
  if (arm9_gdb_stub)
    gdb_prefetch16(arm9_gdb_stub, adr);
  return _MMU_read16<ARMCPU_ARM9,MMU_AT_CODE>(adr);
}
static u32 FASTCALL arm9_prefetch32( void *data, u32 adr) {
  if (arm9_gdb_stub)
    gdb_prefetch32(arm9_gdb_stub, adr);
  return _MMU_read32<ARMCPU_ARM9,MMU_AT_CODE>(adr);
}
...
static u16 FASTCALL arm7_prefetch16( void *data, u32 adr) {
  if (arm7_gdb_stub)
    gdb_prefetch16(arm7_gdb_stub, adr);
  return _MMU_read16<ARMCPU_ARM7,MMU_AT_CODE>(adr);
}
static u32 FASTCALL arm7_prefetch32( void *data, u32 adr) {
  if (arm7_gdb_stub)
    gdb_prefetch32(arm7_gdb_stub, adr);
  return _MMU_read32<ARMCPU_ARM7,MMU_AT_CODE>(adr);
}
...
In cocoa_core.h, I declare the stub pointer variables from MMU.cpp as extern so I can write to them
extern void *arm7_gdb_stub;
extern void *arm9_gdb_stub;
In cocoa_core.mm, I write the created stub handles to the void pointers like this.
arm9_gdb_stub = (void *)gdbStubHandleARM9;
...
arm7_gdb_stub = (void *)gdbStubHandleARM7;
...
This is of course a very ugly spaghetti style way to pass things from one module to another. Someone who knows the actual design principles and transition plans can do it properly. I didn't even test the kludge with the ARM7 CPU at all, just with ARM9.
But now I can finally proceed trying to get the Lazarus IDE patched up with the emulator and arm-none-eabi-gdb etc.
What does createStub_gdb() actually try to accomplish with this line of code
*cpu_memio = &stub->cpu_memio;
You might think that it's redirecting/hooking some interface pointer to its own implementation, so that it can act as a middle-man between the caller and the original implementation, passing through the calls after doing something extra, and that's why it's also saving the original interface pointer. But it's not, because it's actually only changing a local temporary variable in cocoa_core.mm / setIsGdbStubStarted(), right? Writing NULL instead of &stub->cpu_memio works just the same.
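The effect can be reproduced in isolation. In this sketch, with hypothetical names throughout (MemIface, Cpu, Stub, create_stub and the two hook functions are made up for illustration), the out parameter receives the stub's interface, but the CPU's own pointer is a separate variable and keeps pointing at the original interface until someone assigns it explicitly.

```cpp
struct MemIface { int id; };

// Hypothetical CPU object holding its own pointer to a memory interface.
struct Cpu {
    MemIface *mem_if;
};

struct Stub {
    MemIface cpu_memio;
};

// Mimics the shape of the assignment discussed above: the stub's interface
// is written through the out parameter, and nothing else is touched.
void create_stub(Stub *stub, MemIface **cpu_memio) {
    stub->cpu_memio.id = 2;
    *cpu_memio = &stub->cpu_memio;
}

// Returns true if the CPU ends up using the stub's interface.
bool hook_via_local(Cpu &cpu, Stub &stub) {
    MemIface *local = cpu.mem_if;   // a copy of the pointer value
    create_stub(&stub, &local);     // only 'local' is changed
    return cpu.mem_if == &stub.cpu_memio;
}

// Same call, but the result is then stored into the CPU object.
bool hook_and_install(Cpu &cpu, Stub &stub) {
    MemIface *local = cpu.mem_if;
    create_stub(&stub, &local);
    cpu.mem_if = local;             // the step that actually redirects the CPU
    return cpu.mem_if == &stub.cpu_memio;
}
```

Here hook_via_local() leaves the CPU on its original interface, while hook_and_install() redirects it, which is the difference between filling in an out parameter and actually installing the result.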
MMU.cpp has a block of code titled like this:
////////////////////////////////////////////////////////////
//function pointer handlers for gdb stub stuff
But tracing the program's execution into the _MMU_readX functions I just can't understand how any of that stuff could currently result in checking the breakpoint lists.
Thanks for your trouble. I'm getting some progress now.
I was able to compile the Cocoa dev+ version from the svn trunk straight away, and get the GDB stub functionality working, after I realized I had the dynamic recompiler emulation switched on, and so the relevant version of armcpu_exec() was never called, and the post-exec function pointers that the GDB commands were setting had no effect.
I can now keep repeating the "step instruction" GDB command for a very long time, without getting a deadlock situation like what happens with the Linux version. Though to me the whole thing feels suspicious - several threads seem to be reading and manipulating the same variables without any proper synchronization mechanism, and without even declaring the variables as volatile. And setting the stalled flag is in many places coupled with seemingly redundant calls to functions that also set the same flag.
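As an illustration of the concern, a flag shared between an emulation thread and a stub thread can be given well-defined behaviour with std::atomic. This is only a sketch with hypothetical names (SharedCpu, run_steps, unstall), not DeSmuME code. Note that in C++ volatile by itself does not provide inter-thread synchronization, so declaring the flag volatile would not be enough.

```cpp
#include <atomic>
#include <thread>

// Hypothetical shared CPU state; 'stalled' is written by one thread and
// polled by another.
struct SharedCpu {
    std::atomic<int> stalled{1};
    std::atomic<unsigned> steps{0};
};

// Emulation side: execute until 'count' steps are done, but only advance
// while the CPU is not stalled.
void run_steps(SharedCpu &cpu, unsigned count) {
    while (cpu.steps.load() < count) {
        if (cpu.stalled.load(std::memory_order_acquire) == 0) {
            cpu.steps.fetch_add(1);
        } else {
            std::this_thread::yield();  // still polling, just politely
        }
    }
}

// Stub side: clear the flag so the emulation thread may proceed.
void unstall(SharedCpu &cpu) {
    cpu.stalled.store(0, std::memory_order_release);
}
```

An atomic flag only covers the flag itself. State that must change together, such as registers read by the stub while the CPU is executing, would still need a mutex or a similar mechanism.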
However, I still can't get breakpoints to do anything. It seems that the gdbstub.cpp / check_breaks_gdb() function is never actually called. I guess it _should_ be called because that's the place in the code that's reading the breakpoint lists created by the GDB command handling. And it seems that the only place that would call check_breaks_gdb() for the instruction breakpoint list are the gdb_prefetch32() and gdb_prefetch16() functions in gdbstub.cpp.
Is there some kind of half-done code structure change? Should MMU.h / CheckMemoryDebugEvent() call the GDB stuff, or should one of the _MMU_read32 calls actually go to the GDB prefetch functions, but some function pointer somewhere hasn't been set even though it should have?
In armcpu.cpp / armcpu_prefetch(), there are blocks of #ifdef GDB_STUB code that have been commented out, with no explanation why, or whether the _MMU_read32() calls are now supposed to handle both the GDB and non-GDB cases.
In MMU.h, there's this comment:
// Use this macros for reading/writing, so the GDB stub isn't broken
Does that have anything to do with this?
By the way, a proper CLI version would also be nice. My final goal is to be able to integrate starting the emulator from within an IDE environment, just by clicking "run". But for now I'll still try and understand what would be the correct way to get the breakpoints to break. The same bug apparently applies to the Linux version as well. Anyway, my Mac mini's i7 processor can actually run the emulation at full speed, unlike the Linux machine's poor old 1.2 GHz Celeron.
I wonder why both of these routines in armcpu.cpp print "UNSTALL"? I think it would feel kind of more logical if stall_cpu() printed "STALL".
static void
stall_cpu( void *instance) {
  armcpu_t *armcpu = (armcpu_t *)instance;
  printf("UNSTALL\n");
  armcpu->stalled = 1;
}
static void
unstall_cpu( void *instance) {
  armcpu_t *armcpu = (armcpu_t *)instance;
  printf("UNSTALL\n");
  armcpu->stalled = 0;
}
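A sketch of what the symmetric version might look like, assuming the only problem is the message text. The armcpu_t here is a reduced stand-in with a single field, not the real struct.

```cpp
#include <cstdio>

// Minimal stand-in for the real armcpu_t; only the field used here.
struct armcpu_t {
    int stalled;
};

static void stall_cpu(void *instance) {
    armcpu_t *armcpu = static_cast<armcpu_t *>(instance);
    std::printf("STALL\n");      // the quoted code prints "UNSTALL" here
    armcpu->stalled = 1;
}

static void unstall_cpu(void *instance) {
    armcpu_t *armcpu = static_cast<armcpu_t *>(instance);
    std::printf("UNSTALL\n");
    armcpu->stalled = 0;
}
```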
I'll try looking into the whole GDB debugging protocol. Maybe the things talked about in this other thread still exist
http://forums.desmume.org/viewtopic.php?id=6106
Without knowing pretty much anything about GDB or DeSmuME, this randomly occurring locking feels like a potential concurrency issue. Something that might require using a mutual exclusion mechanism.
Ok. I made a post about it in the technical area, hoping that some GDB stub user happens to notice it.
http://forums.desmume.org/viewtopic.php?pid=23827
I have been trying out DeSmuME, compiling the newest 0.9.11 trunk version as well as the 0.9.10 version from sources. I have devkitPro's devkitARM tools and all the libnds stuff. (devkitARM_r43-i686-linux, libnds-1.5.9, default_arm7-0.5.26, libfat-nds-1.0.13, dswifi-0.3.16, maxmod-nds-1.0.9, libfilesystem-0.9.11, nds-examples-2014-04-01) I compiled DeSmuME like this, just in case it makes a difference
CFLAGS='-O2 -march=native' CXXFLAGS=$CFLAGS ./configure --enable-gdb-stub ; make ; sudo make install
I'm able to compile and run the libnds examples just fine, but DeSmuME's GDB stub functionality seems to be a bit less sturdy than I'd like. For example, using /examples/Graphics/Sprites/allocation_test, I do this
In a shell, let's call it Shell 1, I start desmume-cli
$ desmume-cli --arm9gdb=20000 allocation_test.nds
Failed to set format: Invalid argument
Microphone init failed.
DeSmuME 0.9.10 svn0 dev+ x86-JIT
Created stub on port 20000
SoftRast Initialized with cores=2
ROM game code: ####
ROM developer: Homebrew
Slot1 auto-selected device type: Retail MC+ROM
Slot2 auto-selected device type: PassME (0x08)
CPU mode: Interpreter
UNSTALL
The DeSmuME SDL window opens, and then in another shell, let's call it Shell 2, I start GDB from the devkitARM toolset, and tell it to connect to the DeSmuME GDB stub on port 20000
$ arm-none-eabi-gdb allocation_test.elf
GNU gdb (GDB) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-none-eabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from allocation_test.elf...done.
(gdb) target remote :20000
Remote debugging using :20000
0x02000000 in _start ()
After giving the command "target remote :20000", the following lines are printed in Shell 1:
Processing packet q
Processing packet H
Processing packet q
Processing packet ?
Processing packet H
Processing packet q
Processing packet q
Processing packet q
Processing packet g
'g' command PC = 02000000
Processing packet m
Processing packet q
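For anyone reading the log without knowing the protocol: the letter after "Processing packet" is the first byte of each GDB remote serial protocol payload (g reads the registers, m reads memory, s single-steps, ? asks for the halt reason, q and H are queries and thread selection). On the wire each payload is framed as '$', the payload, '#', and a two-digit hex checksum. A minimal sketch of that framing follows; frame_packet is a hypothetical helper name, not the stub's own function, and escaping of special bytes in binary payloads is left out.

```cpp
#include <cstdio>
#include <string>

// Frame a payload as a GDB remote serial protocol packet: '$', the payload,
// '#', then a two-digit hex checksum (sum of the payload bytes modulo 256).
std::string frame_packet(const std::string &payload) {
    unsigned sum = 0;
    for (unsigned char c : payload) {
        sum = (sum + c) & 0xFFu;
    }
    char tail[4];  // '#', two hex digits, terminating NUL
    std::snprintf(tail, sizeof tail, "#%02x", sum);
    return "$" + payload + tail;
}
```

For example, the register read request shown in the log as "Processing packet g" travels over the socket as `$g#67`.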
Then in Shell 2, I set GDB to automatically show the instruction at the current PC position, and I start stepping instructions with the SI command. I only give the SI command once and after that I repeat it by just pressing Return repeatedly.
(gdb) display/i $pc
1: x/i $pc
=> 0x2000000 <_start>: mov r0, #67108864 ; 0x4000000
(gdb) si
0x02000000 in _start ()
1: x/i $pc
=> 0x2000000 <_start>: mov r0, #67108864 ; 0x4000000
(gdb)
0x02000004 in _start ()
1: x/i $pc
=> 0x2000004 <_start+4>: str r0, [r0, #520] ; 0x208
(gdb)
0x02000008 in _start ()
1: x/i $pc
=> 0x2000008 <_start+8>: mov r0, #19
(gdb)
0x0200000c in _start ()
1: x/i $pc
=> 0x200000c <_start+12>: msr CPSR_fc, r0
(gdb)
0x02000010 in _start ()
1: x/i $pc
=> 0x2000010 <_start+16>: mov r1, #50331648 ; 0x3000000
(gdb)
0x02000014 in _start ()
1: x/i $pc
=> 0x2000014 <_start+20>: sub r1, r1, #4096 ; 0x1000
(gdb)
0x02000018 in _start ()
1: x/i $pc
=> 0x2000018 <_start+24>: mov sp, r1
This results in the following output in Shell 1
Processing packet m
Processing packet v
Processing packet H
Processing packet s
Stepping instruction at 02000000
UNSTALL
Step watch: waiting for 02000000 at 02000000
Step hit -> 02000000
UNSTALL
Break from Emulation
Processing packet g
'g' command PC = 02000000
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000004
UNSTALL
Step watch: waiting for 02000004 at 02000004
Step hit -> 02000004
UNSTALL
Break from Emulation
Processing packet g
'g' command PC = 02000004
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000008
UNSTALL
Step watch: waiting for 02000008 at 02000008
Step hit -> 02000008
UNSTALL
Break from Emulation
Processing packet g
'g' command PC = 02000008
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 0200000c
UNSTALL
Step watch: waiting for 0200000c at 0200000c
Step hit -> 0200000c
UNSTALL
Break from Emulation
Processing packet g
'g' command PC = 0200000c
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000010
UNSTALL
Step watch: waiting for 02000010 at 02000010
Step hit -> 02000010
UNSTALL
Break from Emulation
Processing packet g
'g' command PC = 02000010
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000014
UNSTALL
Step watch: waiting for 02000014 at 02000014
Step hit -> 02000014
UNSTALL
Break from Emulation
Processing packet g
'g' command PC = 02000014
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02000018
UNSTALL
Step watch: waiting for 02000018 at 02000018
Step hit -> 02000018
UNSTALL
Break from Emulation
Processing packet g
'g' command PC = 02000018
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
Processing packet m
...and so on. So far so good. BUT the problem comes when I just press and hold Return in Shell 2. I won't copy-paste the output here, but eventually, when I keep stepping long enough, DeSmuME goes into some kind of deadlock state from which it doesn't recover. How long it takes for this to happen is different every time, but in this example, the situation is like this in Shell 2. I finally press Ctrl-C.
0x02014310 in build_argv ()
1: x/i $pc
=> 0x2014310 <build_argv+44>: beq.n 0x2014318 <build_argv+52>
(gdb)
^C
Shell 1 does show the text "Breaking execution", so it has received the signal from my pressing Ctrl-C, but aside from that, there's no sign of life anymore.
Processing packet m
Processing packet m
Processing packet s
Stepping instruction at 02014310
UNSTALL
Breaking execution
In Shell 2, pressing Ctrl-C again makes GDB ask if we want to give up waiting.
^C
^CInterrupted while waiting for the program.
Give up (and stop debugging it)? (y or n) y
Quit
(gdb)
If I now try to get GDB connected to port 20000 again, it will complain about packet errors
(gdb) target remote :20000
Remote debugging using :20000
Ignoring packet error, continuing...
warning: unrecognized item "timeout" in "qSupported" response
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Bogus trace status reply from target: timeout
(gdb) target remote :20000
Remote debugging using :20000
Ignoring packet error, continuing...
warning: unrecognized item "timeout" in "qSupported" response
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Bogus trace status reply from target: timeout
(gdb)
This reconnection attempt generates the following output in Shell 1
Processing packet q
Processing packet q
Processing packet H
Processing packet H
Processing packet q
Processing packet q
Processing packet q
Processing packet q
Processing packet H
Processing packet H
Processing packet q
Processing packet q
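For reference, the single letters in these "Processing packet" lines are the first byte of GDB remote serial protocol packets: g reads the registers, m reads memory, s single-steps, q is a general query and H selects a thread. On the wire each packet is framed as "$payload#xx", where xx is the sum of the payload bytes modulo 256 written as two hex digits. Here is a minimal sketch of that framing. The function names rsp_checksum and rsp_frame are my own for illustration and are not taken from gdbstub.cpp.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Checksum as the remote protocol defines it: the sum of all payload
// bytes, modulo 256. (Hypothetical helper name, not DeSmuME's.)
uint8_t rsp_checksum(const std::string& payload) {
    unsigned sum = 0;
    for (unsigned char c : payload) {
        sum += c;
    }
    return static_cast<uint8_t>(sum & 0xff);
}

// Wrap a payload as "$<payload>#<two lowercase hex digits>".
// (Hypothetical helper name, not DeSmuME's.)
std::string rsp_frame(const std::string& payload) {
    char tail[4];  // '#', two hex digits, terminating NUL
    std::snprintf(tail, sizeof tail, "#%02x",
                  static_cast<unsigned>(rsp_checksum(payload)));
    return "$" + payload + tail;
}
```

With this, rsp_frame("g") gives "$g#67" and rsp_frame("s") gives "$s#73", which is what GDB would send for a register read and a single step.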
I haven't found a way to get DeSmuME back on track again. I can break away from the situation by suspending desmume-cli by pressing Ctrl-Z, and killing it with "killall -9 desmume-cli".
My question is, is this how it works for everybody, or am I doing something wrong?
I won't go to source-level stepping with GDB's S command at all, because it doesn't work even as well as instruction stepping. I also cannot get breakpoints to trigger at all. I can set breakpoints at functions, and e.g. tab-completion for function names works (which has nothing to do with DeSmuME as such), but the program execution just passes through the breakpoints.
I'm not saying this has to be a bug in DeSmuME at all. Maybe GDB is sending it commands too fast? I don't know.
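To make the "too fast" guess a bit more concrete: if the stub and the emulator core run in separate threads, a step request has to be fully finished before the next packet is handled, otherwise the two sides can get out of sync. Below is a minimal sketch of such a handshake with a mutex and a condition variable. This is NOT DeSmuME's code and I don't know that this is what actually goes wrong; StepGate, request_step and emulator_serve_one_step are hypothetical names, and the "pc += 4" just stands in for executing one ARM instruction.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical shared state between the stub thread and the emulator thread.
struct StepGate {
    std::mutex m;
    std::condition_variable cv;
    bool step_requested = false;
    bool step_done = false;
    uint32_t pc = 0x02000008;  // stand-in for the emulated CPU's PC
};

// Stub side: ask for one step and block until the emulator has finished it.
// Returns the PC after the step.
uint32_t request_step(StepGate& g) {
    std::unique_lock<std::mutex> lock(g.m);
    g.step_done = false;
    g.step_requested = true;
    g.cv.notify_all();
    g.cv.wait(lock, [&] { return g.step_done; });
    return g.pc;
}

// Emulator side: sleep until a step is requested, execute one
// "instruction", then report completion.
void emulator_serve_one_step(StepGate& g) {
    std::unique_lock<std::mutex> lock(g.m);
    g.cv.wait(lock, [&] { return g.step_requested; });
    g.step_requested = false;
    g.pc += 4;  // placeholder for running one ARM instruction
    g.step_done = true;
    g.cv.notify_all();
}
```

Running emulator_serve_one_step in its own std::thread and calling request_step from the main thread returns 0x0200000c, and neither side spins in a busy loop while it waits.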
Update. I dug into the FPC source code, added debug printouts and compared it to how devkitARM's g++ linker phase does it, and guess what... the ".nef", or "not-executable-file" file that it's producing _is_ actually an ELF file. devkitARM's gdb is able to open it and load the symbols. Doh! Now it's supposed to work like this:
I have a "hello world" Pascal program like this
program HelloWorld;
{$mode objfpc}
uses
nds9;
procedure MyProcedure;
begin
printf('Hello World');
end;
procedure Jam;
begin
while true do ;
end;
begin
consoleDemoInit();
MyProcedure;
Jam;
end.
I compile the program and load the resulting ELF file "hello.nef" into GDB like this
ppcrossarm -g hello.pp
arm-none-eabi-gdb hello.nef
Then I give these commands in GDB (if you wonder why MyProcedure is written in ALL CAPS, that's because of FPC's name mangling - maybe there's a way to get rid of that style)
...
(gdb) break MYPROCEDURE
Breakpoint 1 at 0x2013c88
(gdb) target remote :20000
After that, GDB is waiting for a GDB server to show up at port 20000. Then I launch DeSmuME with a command like this:
desmume-cli --arm9gdb=20000 hello.nds
When DeSmuME is up and running, GDB will go on and let me give more commands. Or at least that's the theory. Right now I'm trying to find out if the breakpoint is actually being triggered like I hoped.
EDIT: I am able to set breakpoints, and DeSmuME recognizes the commands, but it doesn't seem to actually break the execution. I tried this with the graphics/Sprites/allocation_test program, and the random sprite boxes just keep flying in the emulator window. If I press Ctrl-C in GDB, it will pause the emulator and show where it was going. But the breakpoints will not get triggered. In the snippet below, I tried to give the complete function name P$ALLOCATIONTEST_UPDATESPRITES instead of just UPDATESPRITES, but no luck.
(gdb) break UPDATESPRITES
Breakpoint 2 at 0x20153f0
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x02015434 in P$ALLOCATIONTEST_UPDATESPRITES ()
(gdb) break P$ALLOCATIONTEST_UPDATESPRITES
Note: breakpoint 2 also set at pc 0x20153f0.
Breakpoint 3 at 0x20153f0
(gdb) c
Continuing.
Anyway, this has now gone off topic.
EDIT2: for comparison, I tried the same thing with the C versions of the libnds examples, and the GDB functionality seems to behave the same way as on the Pascal side. I can set breakpoints, but they don't get triggered. Source line stepping with the S command usually gets the whole thing into some sort of deadlock, and the only way I could find to get DeSmuME out of this jam was to press Ctrl-Z and "killall -9 desmume-cli". Instruction-level stepping with the SI command seems to work without getting stuck, but what good is it if I can't even get breakpoints working... I wonder if anyone's actively using the GDB debugging, and with what kind of workflow it works without getting stuck. Would it help if I had debug versions of all libnds library routines with source-line-level debug information?
Update. After starting to build DeSmuME for Linux, I found a bunch of "how to" instructions that I probably should have read prior to trying to build the OSX version. Or at least now I understand more about the Legacy stuff. It may have been possible to get further with the Legacy version, but it would have required the old Xcode 3 version... I don't know. At least the Linux build went super smooth compared to OSX, and just following the instructions, building from the newest svn trunk gave me a seemingly working "desmume-cli" binary. It looks like it doesn't support the --arm9gdb parameter. I went back and did "./configure --enable-gdb-stub ; make ; make install" to get a version that does. Great.
I also realized that I should probably try and build the libnds stuff from sources, because now the debug symbols link to files like "c:/Users/davem_000/projects/devkitPro/libnds-master/" etc.
But first I'll just try to get debugging to work for FPC programs.
Of course I am using devkitPro. I was able to compile FPC for ARM-NDS, together with the libnds support libraries and the example programs, and run the produced .nds ROM images in the DeSmuME emulator. This required over a dozen fixes all over the FPC 2.6.4 source tree, and quite a few hours of problem-solving.
Perhaps you mean I should be using the C programming language on Windows? Sorry, I want to use Object Pascal on Mac and Linux.
Btw, right now I'm trying to build the development system on a Debian Linux computer. My idea is to try and get the kids interested in programming, and Nintendo DS seems like a good platform to do interesting stuff. I thought Object Pascal might be a reasonable language for that. Certainly much better than C. And it's fun for me, because I've been programming in Pascal for over 25 years. The ultimate goal is to get this whole thing running with Lazarus, with integrated debugging, so I could show how the program runs line by line, even at the assembly instruction level.
I don't know what everyone else might be doing with these NDS development things. For me it's mostly about having fun. I'm not trying to make money with this or anything.