#1 2007-07-10 14:43:15

Xoff
Member
From: France
Registered: 2007-07-01
Posts: 22

Multi-core

I have a MacBook Pro with an Intel processor (2.33 GHz Core 2 Duo).
I tried to compile DeSmuME with some multithreading added.

First, I tried to add one thread for the ARM9 and one for the ARM7, but I didn't
manage to get a working patch, as the code is too intricate (or maybe I made mistakes?)...

Then I tried to patch desmume-cli. I added one thread for graphics and sound (SPU and Draw)
and left the main emulation in the main thread. It's very easy, as the code
is already well split into separate functions (fewer than 10 lines to add, and a few others to move).
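A minimal sketch of the kind of split described above, assuming C++11 threads: the emulation thread calls `submit()` once per frame, and a worker thread runs the graphics/sound step for that frame while emulation continues. `FrameWorker`, `submit()` and `drain()` are hypothetical names, not DeSmuME functions, and the per-frame callback only stands in for whatever the SPU and Draw code actually do.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: runs the per-frame graphics/sound step (a stand-in for
// the SPU and Draw code) on a second thread while the calling thread keeps
// emulating. At most one frame is queued at a time.
class FrameWorker {
public:
    explicit FrameWorker(std::function<void(int)> work)
        : work_(std::move(work)), thread_([this] { run(); }) {}

    ~FrameWorker() {
        {
            std::lock_guard<std::mutex> lock(m_);
            quit_ = true;
        }
        cv_.notify_all();
        thread_.join();  // any frame still queued is processed first
    }

    // Emulation thread, end of frame: hand the frame over. Blocks only if the
    // worker has not yet picked up the previously submitted frame.
    void submit(int frame) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !pending_; });
        frame_ = frame;
        pending_ = true;
        cv_.notify_all();
    }

    // Blocks until every submitted frame has been fully processed.
    void drain() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !pending_ && !busy_; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return pending_ || quit_; });
            if (!pending_) return;  // quit requested and nothing queued
            int frame = frame_;
            pending_ = false;
            busy_ = true;
            cv_.notify_all();       // the next frame may now be queued
            lock.unlock();
            work_(frame);           // draw/mix without holding the lock
            lock.lock();
            busy_ = false;
            cv_.notify_all();
        }
    }

    std::function<void(int)> work_;
    std::mutex m_;
    std::condition_variable cv_;
    int frame_ = 0;
    bool pending_ = false;
    bool busy_ = false;
    bool quit_ = false;
    std::thread thread_;  // declared last so it starts after the state above
};
```

Because the worker may still be drawing frame N while the emulation thread is already running frame N+1, whatever screen or register data the callback reads must not be modified in the meantime; otherwise you get exactly the kind of mid-frame glitches discussed in this thread.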

For Bomberman I was at ~42 fps, and with the thread I get ~54 fps, but only 115% CPU use
(remember that I have two cores, so I could theoretically reach 200%). I have a few graphical
glitches that should be easy to fix by adding some simple buffering code for the screen.
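One way to do the screen buffering mentioned above is classic double buffering, sketched here under assumptions: two full frame buffers (both 256x192 DS screens stacked, 16 bits per pixel), the emulation thread draws into the back one and swaps pointers once the frame is complete, and the display thread only ever copies the front one. `ScreenBuffers`, `back()`, `publish()` and `read_latest()` are hypothetical names.

```cpp
#include <array>
#include <cstdint>
#include <mutex>
#include <utility>

// Both DS screens stacked vertically (assumed layout for this sketch).
constexpr int kWidth = 256;
constexpr int kHeight = 192 * 2;

// One full frame, 15-bit colour stored in 16-bit pixels.
using Frame = std::array<std::uint16_t, kWidth * kHeight>;

// Hypothetical double buffer: the display thread never sees a half-drawn frame.
class ScreenBuffers {
public:
    // Emulation thread only: the buffer to draw the next frame into.
    Frame& back() { return *back_; }

    // Emulation thread: make the finished back buffer visible (pointer swap).
    void publish() {
        std::lock_guard<std::mutex> lock(m_);
        std::swap(back_, front_);
        ++published_;
    }

    // Display thread: copy out the most recently published frame and return
    // how many frames had been published at that moment.
    unsigned long read_latest(Frame& out) {
        std::lock_guard<std::mutex> lock(m_);
        out = *front_;
        return published_;
    }

private:
    Frame a_{};
    Frame b_{};
    Frame* back_ = &a_;
    Frame* front_ = &b_;
    std::mutex m_;
    unsigned long published_ = 0;
};
```

Publishing is just a pointer exchange under a mutex, so it costs almost nothing on the emulation side; the display side pays for one copy of about 192 KB per read.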

So if I could split the core emulation, I could get better speed, but I don't know how...


#2 2007-07-10 15:00:19

shash
Administrator
Registered: 2007-03-17
Posts: 897

Re: Multi-core

Does your new threading code introduce new bugs? If not, feel free to send a patch, and whoever is able to test it will commit it to the CVS :)

In my humble opinion, it's a bit early to add threading, as it'll probably make the code a mess, especially considering there's so much basic stuff still left to do.


#3 2007-07-11 11:20:34

masscat
Member
From: UK
Registered: 2007-03-17
Posts: 73

Re: Multi-core

A while back I did some timings on DeSmuME (before the 3D core went in) and found the following to be a common split of where it spends its time (it depends on the .nds file):

40% ARM9
40% 2D Graphics emulation (rendering)
20% other stuff
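As a rough illustration of what this profile allows, assume the ideal case: the 2D rendering overlaps completely with everything else and synchronization costs nothing. Moving the 2D share to its own thread then bounds the speed-up at:

```latex
\begin{aligned}
T_{\text{serial}}   &= T_{\text{ARM9}} + T_{\text{2D}} + T_{\text{other}} = 0.4 + 0.4 + 0.2 = 1 \\
T_{\text{parallel}} &\ge \max\left(T_{\text{ARM9}} + T_{\text{other}},\; T_{\text{2D}}\right) = \max(0.6,\; 0.4) = 0.6 \\
S_{\max}            &= \frac{T_{\text{serial}}}{T_{\text{parallel}}} \le \frac{1}{0.6} \approx 1.67
\end{aligned}
```

For comparison, Xoff's Bomberman figures (about 42 fps to about 54 fps) work out to roughly 54/42 ≈ 1.29, but that was measured on a different build and with the SPU also moved to the second thread, so the two numbers are not directly comparable.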

Multithreading is something that is definitely worth doing at some point, especially with 2- and 4-core CPUs becoming more popular. Maybe DeSmuME could be split into threads something like this:

1. 2D graphics emulation
2. 3D graphics emulation
3. ARMs
4. Other stuff
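A split like this could be kept in lockstep with one thread per subsystem and a barrier at the end of each emulated frame. The sketch below is a minimal C++11 illustration; `FrameBarrier` and `run_subsystems()` are hypothetical names, and the per-frame step functions only stand in for the real 2D, 3D, ARM and other code. (C++20's `std::barrier` does the same job as the hand-written barrier here.)

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Minimal reusable barrier: every participant calls arrive_and_wait() at the
// end of a frame, and nobody starts frame N+1 until all have finished frame N.
class FrameBarrier {
public:
    explicit FrameBarrier(std::size_t count) : count_(count) {}

    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(m_);
        std::size_t gen = generation_;
        if (++arrived_ == count_) {
            arrived_ = 0;
            ++generation_;  // last one in releases everyone for the next frame
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t count_;
    std::size_t arrived_ = 0;
    std::size_t generation_ = 0;
};

// Hypothetical layout: one thread per subsystem, each running its own
// per-frame step in lockstep with the others for `frames` frames.
void run_subsystems(const std::vector<std::function<void(int)>>& steps, int frames) {
    FrameBarrier barrier(steps.size());
    std::vector<std::thread> threads;
    for (const auto& step : steps) {
        threads.emplace_back([&barrier, &step, frames] {
            for (int f = 0; f < frames; ++f) {
                step(f);
                barrier.arrive_and_wait();
            }
        });
    }
    for (auto& t : threads) t.join();
}
```

Per-frame lockstep keeps the threads from drifting apart, but every thread still waits for the slowest one once per frame, so the achievable speed-up is limited by the largest per-frame share.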


Also a while back, I did a quick test of running the 2D graphics emulation in another thread, and it did produce a good speed improvement. Originally it was coordinated line by line. Although this produced correct render results, it did not give a good speed-up: there was little overlap between the graphics thread and the "other stuff" thread, which spent a lot of time waiting for the graphics thread to finish before sending it another line to render.
Next I rendered the entire screen in one go in the graphics thread, coordinating only once per frame. This gave a good improvement in speed but produced render errors, for example graphics changing halfway through the render.
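The difference between the two coordination schemes is largely the number of handoffs. Assuming one handoff per visible scanline (the DS screens are 192 lines tall) and roughly 60 frames per second:

```latex
\begin{aligned}
N_{\text{line}}  &= 192\ \tfrac{\text{handoffs}}{\text{frame}} \times 60\ \tfrac{\text{frames}}{\text{s}} = 11\,520\ \tfrac{\text{handoffs}}{\text{s}} \\
N_{\text{frame}} &= 1\ \tfrac{\text{handoff}}{\text{frame}} \times 60\ \tfrac{\text{frames}}{\text{s}} = 60\ \tfrac{\text{handoffs}}{\text{s}}
\end{aligned}
```

That is 192 times as many points per second at which one thread may stall waiting for the other.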


From this, I feel that parts of DeSmuME will need to be redesigned to get both good performance and correct emulation with multithreading.
Maybe keep two copies of the graphics render state, one for the current screen render and another for the ARMs to update, but this would add the complication of keeping the states coordinated and the overhead of copying the ARMs' state to the render state.
Splitting the ARM9 and ARM7 may not be of much benefit, as in most cases the ARM7 does not do much, and there could be problems coordinating the ARMs' execution. A better split may be to separate the ARM memory access from the instruction execution.
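The two-copies idea above can be sketched like this, assuming a mutex-protected live state that the ARM side writes and a private snapshot the render thread takes once per frame. `RenderState`, `SharedRenderState` and the register fields are illustrative only, not DeSmuME's actual structures.

```cpp
#include <array>
#include <cstdint>
#include <mutex>

// Hypothetical subset of 2D engine state; field names are illustrative.
struct RenderState {
    std::uint32_t dispcnt = 0;                 // display control
    std::array<std::uint16_t, 4> bg_hofs{};    // per-background horizontal scroll
    std::array<std::uint16_t, 4> bg_vofs{};    // per-background vertical scroll
    std::array<std::uint16_t, 256> palette{};  // palette entries
};

// The ARM side writes the live copy; the render thread draws from a snapshot
// taken once per frame, so writes made during rendering land in the next frame.
class SharedRenderState {
public:
    // ARM/emulation thread: apply a register write to the live copy.
    template <typename Fn>
    void update(Fn&& fn) {
        std::lock_guard<std::mutex> lock(m_);
        fn(live_);
    }

    // Render thread: take the snapshot to draw the next frame from.
    RenderState snapshot() {
        std::lock_guard<std::mutex> lock(m_);
        return live_;  // this copy is the per-frame overhead of the two-copy design
    }

private:
    std::mutex m_;
    RenderState live_;
};
```

A per-frame snapshot removes the "graphics changing halfway through the render" problem, but it also hides legitimate mid-frame changes: games that rewrite scroll or other registers during HBlank for raster effects would need state captured per scanline rather than per frame, which multiplies the copying cost.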

These are just a couple of ideas; my main point is that it will require some thought and design to get right.


Just for information, the GDB stubs, if active, run in their own threads. This was done to make them easier to write (they sit blocked on a couple of sockets waiting for messages) rather than for performance, so they will hardly stretch a Core 2 Duo.


#4 2007-07-11 13:18:17

shash
Administrator
Registered: 2007-03-17
Posts: 897

Re: Multi-core

masscat wrote:

A while back I did some timings on DeSmuME (before the 3D core went in) and found the following to be a common split of where it spends its time (it depends on the .nds file):

40% ARM9
40% 2D Graphics emulation (rendering)
20% other stuff

If my profiler isn't failing, this has changed quite a lot due to the changes in the 2D core, and it isn't much of a bottleneck anymore (I think it went down to 5-15% at most the last time I profiled, which was a bit before 0.7.1).


#5 2007-07-11 16:07:26

Xoff
Member
From: France
Registered: 2007-07-01
Posts: 22

Re: Multi-core

shash wrote:

Does your new threading code introduce new bugs? If not, feel free to send a patch, and whoever is able to test it will commit it to the CVS :)

In my humble opinion, it's a bit early to add threading, as it'll probably make the code a mess, especially considering there's so much basic stuff still left to do.

Yes. It is not perfect now, and it would add some complexity to some "unfinished" code...
But it is important to keep it in mind so that threading can be added easily later!


#6 2007-07-11 16:48:45

shash
Administrator
Registered: 2007-03-17
Posts: 897

Re: Multi-core

Xoff wrote:

Yes. It is not perfect now, and it would add some complexity to some "unfinished" code...

Yep, that's my only concern; nothing is more annoying than modifying optimized code :)

Xoff wrote:

But it is important to keep it in mind so that threading can be added easily later!

I couldn't agree more


#7 2007-07-17 22:53:01

XTra KrazzY
Member
Registered: 2007-03-23
Posts: 108

Re: Multi-core

I had my own multi-threaded version of DeSmuME on my computer, and I even tried to build a second one after the data got corrupted for some Micro$ofty reason... It gave something like a 40% increase in total speed.

Xoff wrote:

So if I could split the core emulation, I could get better speed, but I don't know how...

Though I don't think it would be a problem (I think it'd even be easier), programming something similar for Linux wouldn't be easy. My old program used the Windows 64 logical_processor API, and as far as multi-core goes, it sometimes failed and ran everything on the same core. If I only knew the correct syscalls, I might have been able to help in this area. (After all, this IS the original question.)
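On Linux, the usual way to pin a thread to a core is `sched_setaffinity()` (or `pthread_setaffinity_np()` when you hold another thread's `pthread_t`, e.g. from `std::thread::native_handle()`); the Windows counterpart is `SetThreadAffinityMask()`. A minimal Linux-only sketch with glibc, assuming you want to pin the calling thread; `pin_current_thread_to()` and `first_allowed_cpu()` are hypothetical helper names.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for cpu_set_t and the CPU_* macros (g++ defines it already)
#endif
#include <sched.h>

// Pin the calling thread to a single CPU. With pid 0, sched_setaffinity()
// applies to the calling thread only. Returns 0 on success, -1 on failure
// (errno is set by the call).
int pin_current_thread_to(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

// Lowest-numbered CPU the calling thread is currently allowed to run on,
// or -1 if the affinity mask cannot be read.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set)) return cpu;
    return -1;
}
```

Pinning is usually not required just to use both cores, since the scheduler normally spreads runnable threads across CPUs by itself; it mainly helps when you want predictable placement, such as keeping the emulation thread and the render thread on different cores.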


In the meantime, I've been experimenting with a VERY VERY simple dynarec system. Not very helpful.

Last edited by XTra KrazzY (2007-07-17 22:53:33)


If you are reading this signature, you SERIOUSLY need to get a life.


#8 2007-07-17 23:47:56

snkmad
Member
Registered: 2007-03-17
Posts: 141

Re: Multi-core

XTra, if I'm not asking too much, could you make a patch file of your multi-threaded implementation, but for x86 instead of x64?
Just wanted to see how it would perform here.

If you don't feel OK with it, don't worry, I'll understand.


Athlon 64 X2 3800+ / 2Gb DDR2 800Mhz
Geforce 8600GT 256MB / Windows XP PRO SP3
http://desmume.org/compatibility-list/


#9 2007-07-18 22:01:54

XTra KrazzY
Member
Registered: 2007-03-23
Posts: 108

Re: Multi-core

x86, eh? Most of the speed came from the efficiency of the 64-bit registers.
Even if you had an x64 machine you wouldn't notice any difference, especially with the new 2D engine optimizations (which I haven't tried or adapted my version to).

Plus, I don't have the first version, nor the second, because I bought a new hard drive.

Sorry (I guess)... I'll build something as soon as I have the time.




#10 2007-07-19 00:42:05

snkmad
Member
Registered: 2007-03-17
Posts: 141

Re: Multi-core

Oh, don't bother, I was just curious.



