#1 2017-08-27 01:07:29

mgitkun
Member
Registered: 2017-06-06
Posts: 43

Current State of MSAA?

Since commit 3b354a00, MSAA no longer smooths out the scene as effectively as it previously did; it's still doing something, just not as much. I thought that it might be fixed by commit 6acf7818, but it wasn't.
As of commit e6d5a8fb there's been no change in this regard.

Just wondering whether this is a known issue that will be fixed eventually or if the previous behavior of MSAA was "incorrect" and it is now working as intended.

d2b25360 - MSAA ON
e6d5a8fb - MSAA ON
MSAA OFF

Here are the settings used. Builds were downloaded from the AppVeyor page.

Last edited by mgitkun (2017-08-27 01:09:04)


i5 2500K - CPU | GTX 480 - GPU | 8GB - RAM | TX650 - PSU | Windows 7 x64 - OS

#2 2017-08-27 02:15:55

rogerman
Member
Registered: 2011-06-04
Posts: 380

Re: Current State of MSAA?

MSAA was supposed to be limited to 8xMSAA as of commit f50f9d3, but there was a bug that caused MSAA to always be set to the max supported by the GPU. In commit 3b354a0, there was an undocumented bug fix that limited MSAA to 8xMSAA as originally intended. This may be the difference that you're seeing here.

To note, there are no plans for supporting user-defined MSAA sample sizes. MSAA itself is kind of a hack as it is, and setting MSAA to very high sample sizes causes massive slowdowns at higher resolutions. The 8x limit was chosen at the time as the best compromise between visual quality and performance.

I suppose the limiter can be modified to allow higher MSAA limits at the lower resolutions, and then lower the MSAA limits as the resolution increases. I'll look into that.

#3 2017-08-27 02:22:50

rogerman
Member
Registered: 2011-06-04
Posts: 380

Re: Current State of MSAA?

Now that I look closer at the images you posted, it looks less like an MSAA difference, and more like you have Texture Smoothing enabled in the first image, and Texture Smoothing disabled in the second image.

Ensure that the Texture Smoothing option is exactly the same for your MSAA comparisons, and try again.

#4 2017-08-27 04:36:56

mgitkun
Member
Registered: 2017-06-06
Posts: 43

Re: Current State of MSAA?

rogerman wrote:

Now that I look closer at the images you posted, it looks less like an MSAA difference, and more like you have Texture Smoothing enabled in the first image, and Texture Smoothing disabled in the second image.

Ensure that the Texture Smoothing option is exactly the same for your MSAA comparisons, and try again.

I thought so too and double checked before posting.
Here's MSAA + Texture Smoothing for both:

e6d5a8fb MSAA + Texture Smoothing
d2b25360 MSAA + Texture Smoothing

Texture Smoothing picks up most of the slack at lower resolutions, but there is still a difference...whatever it is.
To better illustrate that difference here's 2x gpu scaling with MSAA, Texture Smoothing, 24 bit color depth, Texture Deposterization enabled, 4x Texture Scaling, Display Method Filter Enabled, and Filter set to Normal.

e6d5a8fb 2x
d2b25360 2x

rogerman wrote:

MSAA was supposed to be limited to 8xMSAA as of commit f50f9d3, but there was a bug that caused MSAA to always be set to the max supported by the GPU. In commit 3b354a0, there was an undocumented bug fix that limited MSAA to 8xMSAA as originally intended. This may be the difference that you're seeing here.

Ah, okay; that seems like what I'm seeing.

rogerman wrote:

To note, there are no plans for supporting user-defined MSAA sample sizes. MSAA itself is kind of a hack as it is, and setting MSAA to very high sample sizes causes massive slowdowns at higher resolutions. The 8x limit was chosen at the time as the best compromise between visual quality and performance.

Is there any particular reason that you couldn't just add a warning/disclaimer like you did with Advanced Bus Timing, JIT, and ROM Loading, or is it just that you would prefer not to?


#5 2017-08-27 05:11:22

rogerman
Member
Registered: 2011-06-04
Posts: 380

Re: Current State of MSAA?

8xMSAA is considered a lot of smoothing, so I'm frankly shocked to see such a massive difference.

What kind of GPU are you using? I have a feeling that your GPU supports something like 32xMSAA, which would explain the difference from 8xMSAA. Alright, I'll look at increasing the MSAA cap at the lower resolutions, which need MSAA the most. I'm thinking about capping MSAA at these values:

1x Native Resolution - 32xMSAA
2x Native Resolution - 16xMSAA
3x-4x Native Resolution - 8xMSAA
5x and greater Native Resolution - 4xMSAA

No warning/disclaimer needs to be provided. Gamers should know what MSAA does and the performance hit that will result from enabling it. And even if people don't know beforehand what MSAA does, the effects should be immediate and allow users to instantly make a judgement call whether to keep it enabled or not. Finally, MSAA only becomes a performance issue at the higher resolutions. However, higher resolutions also mitigate the need for MSAA anyway.

#6 2017-08-27 05:49:20

mgitkun
Member
Registered: 2017-06-06
Posts: 43

Re: Current State of MSAA?

rogerman wrote:

8xMSAA is considered a lot of smoothing, so I'm frankly shocked to see such a massive difference.

What kind of GPU are you using? I have a feeling that your GPU supports something like 32xMSAA, which would explain the difference from 8xMSAA. Alright, I'll look at increasing the MSAA cap at the lower resolutions, which need MSAA the most. I'm thinking about capping MSAA at these values:

I'm using pretty dated hardware, relatively speaking (2500K and GTX 480); it shouldn't support MSAA higher than 8x, unless you mean CSAA.

rogerman wrote:

1x Native Resolution - 32xMSAA
2x Native Resolution - 16xMSAA
3x-4x Native Resolution - 8xMSAA
5x and greater Native Resolution - 4xMSAA

No warning/disclaimer needs to be provided. Gamers should know what MSAA does and the performance hit that will result from enabling it. And even if people don't know beforehand what MSAA does, the effects should be immediate and allow users to instantly make a judgement call whether to keep it enabled or not. Finally, MSAA only becomes a performance issue at the higher resolutions. However, higher resolutions also mitigate the need for MSAA anyway.

Is there any chance of leaving a setting in some form, maybe not even exposed in the GUI, that lets you choose the sample size? That way, if you're recording gameplay from a DSM and not actually playing, you can have it entirely smooth.
I can't go past 6x scaling without the emulator hanging, and at 6x scaling there's still some minor aliasing with the current level of MSAA that wouldn't be there with the previous level of MSAA. :(

Last edited by mgitkun (2017-08-27 05:56:34)


#7 2017-08-27 06:13:54

rogerman
Member
Registered: 2011-06-04
Posts: 380

Re: Current State of MSAA?

You are misunderstanding a bunch of things here.
1. The MSAA sample size and the GPU Scaling Factor are two separate things and are completely independent of one another.
2. The MSAA sample size is automatically set, and is limited by both your GPU's capabilities and the internal limit that DeSmuME uses. For example, your Nvidia GeForce GTX 480 reports a GL_MAX_SAMPLES value of 32. This means that DeSmuME's internal MSAA limit will limit you, not your GPU. However, if your GPU only supported 2xMSAA, then the GPU is what would limit you, not DeSmuME.
3. If the emulator hangs at 6x scaling, then that is most likely a problem with the Windows port and not the core graphics code. The Cocoa port can run with up to a 16x GPU Scaling Factor without any crashing.
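Point 2 above boils down to taking the smaller of two caps. A minimal sketch in Python (DeSmuME itself is not written in Python, and the function name here is hypothetical, not taken from the DeSmuME source):

```python
def effective_msaa_samples(gpu_max_samples: int, internal_limit: int) -> int:
    """Return the sample count actually used: the smaller of the two caps.

    gpu_max_samples: the maximum the GPU driver reports (assumed input).
    internal_limit:  the emulator's own cap (assumed input).
    """
    return min(gpu_max_samples, internal_limit)


# GPU reports 32, emulator caps at 8: the emulator's limit wins.
print(effective_msaa_samples(32, 8))
# GPU only supports 2x, emulator caps at 8: the GPU's limit wins.
print(effective_msaa_samples(2, 8))
```

With the GTX 480 values from the post (32 reported by the GPU, 8 as the internal limit), the result is 8, which matches the reduced smoothing described earlier in the thread.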

Try commit 2379dc1. I've implemented the new limits I mentioned, except now they are as follows:
1x Native Resolution - 32xMSAA
2x Native Resolution - 16xMSAA
3x-8x Native Resolution - 8xMSAA
9x and greater Native Resolution - 4xMSAA

And no, I don't see any need to add a user-defined setting for the MSAA sample size. There is no need, especially now with the tiered MSAA limits. The performance problems from MSAA are due to MSAA consuming too much GPU memory bandwidth. At smaller resolutions, this is not a problem, since smaller resolutions consume less bandwidth. But higher resolutions consume more bandwidth, and MSAA multiplies this bandwidth cost. With the tiered MSAA limits, we keep the bandwidth usage more balanced.
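The tiers listed for commit 2379dc1 and the bandwidth argument can be sketched together. This is an illustration only: the function names are hypothetical, and "scale squared times samples" is an assumed rough proxy for bandwidth cost, not a measurement or a formula from the DeSmuME source.

```python
def tiered_msaa_limit(scale: int) -> int:
    """Internal MSAA cap for a GPU Scaling Factor, per the tiers listed above."""
    if scale < 1:
        raise ValueError("scale must be at least 1")
    if scale == 1:
        return 32
    if scale == 2:
        return 16
    if scale <= 8:
        return 8
    return 4


def relative_sample_cost(scale: int, gpu_max_samples: int = 32) -> int:
    """Samples stored per native-resolution pixel: pixel count grows with
    scale squared, and MSAA multiplies that by the sample count."""
    samples = min(gpu_max_samples, tiered_msaa_limit(scale))
    return scale * scale * samples


for scale in (1, 2, 4, 8, 9, 16):
    print(scale, tiered_msaa_limit(scale), relative_sample_cost(scale))
```

Under this proxy, 16x scaling with the 4x tier costs 1024 samples per native pixel, whereas an uncapped 32xMSAA at 16x would cost 8192, which is the kind of growth the tiers are meant to avoid.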

#8 2017-08-27 08:12:52

mgitkun
Member
Registered: 2017-06-06
Posts: 43

Re: Current State of MSAA?

rogerman wrote:

You are misunderstanding a bunch of things here.
1. The MSAA sample size and the GPU Scaling Factor are two separate things and are completely independent of one another.
2. The MSAA sample size is automatically set, and is limited by both your GPU's capabilities and the internal limit that DeSmuME uses. For example, your Nvidia GeForce GTX 480 reports a GL_MAX_SAMPLES value of 32. This means that DeSmuME's internal MSAA limit will limit you, not your GPU. However, if your GPU only supported 2xMSAA, then the GPU is what would limit you, not DeSmuME.
3. If the emulator hangs at 6x scaling, then that is most likely a problem with the Windows port and not the core graphics code. The Cocoa port can run with up to a 16x GPU Scaling Factor without any crashing.

1. I understand that.
2. I realize that it can't go higher magically because DeSmuME tells it to, I just didn't know that the maximum sample size was 32.
3. That's interesting, how does it run?
I mean in the sense that on the Windows port, when running BRZ or HQ4x filters in combination with a 3x or higher GPU Scaling Factor, it starts frame skipping even with frame skipping disabled. But it technically runs.
That's just a comparison. I've read your replies about performance with filters and scaling being more of an issue for the Windows port; I'm just asking if anything funky happens when using 16x GPU Scaling on the Cocoa port.
Does 16x GPU Scaling on the Cocoa port run like that, or does it actually run without dropping frames (obviously not at full speed)?

Oh, and I tried 2379dc1e. Oddly enough, I can go up to 7x now with everything enabled, whereas on e6d5a8fb I couldn't do 7x even with everything, including MSAA, disabled.


rogerman wrote:

Try commit 2379dc1. I've implemented the new limits I mentioned, except now they are as follows:
1x Native Resolution - 32xMSAA
2x Native Resolution - 16xMSAA
3x-8x Native Resolution - 8xMSAA
9x and greater Native Resolution - 4xMSAA

And no, I don't see any need to add a user-defined setting for the MSAA sample size. There is no need, especially now with the tiered MSAA limits. The performance problems from MSAA are due to MSAA consuming too much GPU memory bandwidth. At smaller resolutions, this is not a problem, since smaller resolutions consume less bandwidth. But higher resolutions consume more bandwidth, and MSAA multiplies this bandwidth cost. With the tiered MSAA limits, we keep the bandwidth usage more balanced.

Oh well, that's a shame. Either way, this is definitely better than it was before for 2x.

Here's the same frame with the new build
2379dc1e 2x
d2b25360 2x

It still has very minor differences but nothing that should be noticeable when not comparing them side by side.

You might want to consider bumping 3x-4x GPU Scaling to tier 2 (16xMSAA) as well; 5x scaling is probably fine at tier 3 (8xMSAA).
2379dc1e 3x
d2b25360 3x

2379dc1e 4x
d2b25360 4x

2379dc1e 5x
d2b25360 5x

Last edited by mgitkun (2017-08-27 08:58:47)


#9 2017-08-27 21:09:46

rogerman
Member
Registered: 2011-06-04
Posts: 380

Re: Current State of MSAA?

Try commit a05e03e.

#10 2017-08-27 22:07:54

rogerman
Member
Registered: 2011-06-04
Posts: 380

Re: Current State of MSAA?

mgitkun wrote:

3. That's interesting, how does it run?
I mean in the sense that on the Windows port when running BRZ or HQ4x filters in combination with 3x+ GPU Scaling factor it starts frame skipping...with frame skipping disabled. But it technically runs.
That's just a comparison, I've read your replies about performance with filters and scaling being more so an issue for the Windows port , I'm just asking if anything funky happens when using 16x GPU Scaling on the Cocoa Port.
Does 16x GPU Scaling on the Cocoa port run like that or does it actually run without dropping frames (obviously not at full speed)

16x GPU scaling works on all Macs, from the highest-spec iMac w/ Intel i7-7700K all the way down to a Power Mac w/ PowerPC G5. Obviously, the Power Mac will be completely unusable at 16x, but the iMac i7-7700K can run 16x at full speed, albeit with so much frame skip that games become unplayable. I haven't yet seen any PC that can run DeSmuME at 16x at full speed without dropping any frames, but I'm keeping the idea of running 16x at full speed without dropping frames as a sort of performance "holy grail" to reach in the future.

A tip: Don't run magnification filters, like xBRZ or HQnx, while running any resolution above native. It's buggy, it doesn't produce the intended results, and is a complete waste of CPU cycles. In reality, magnification filters should be completely disabled on the Windows port when running non-native resolutions, until the time when the entire video system on Windows can be completely reworked.

Ideally, the two NDS screens should be processed separately. If an NDS screen comes in at the native resolution, then run the magnification filter on it. However, if the NDS screen comes in at a custom resolution, then use the screen as-is without magnification filters. The Cocoa port does this, and it looks something like this:

In this image, the main screen is at the native resolution, and was therefore upscaled using 4xBRZ. The touch screen is at a custom resolution, so 4xBRZ was not used on it.
SCo8XjU.jpg
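The per-screen rule described above can be sketched as a simple check. This is a hypothetical illustration, not the Cocoa port's actual code; the function name is invented, and the only fixed fact used is the NDS native screen size of 256x192.

```python
NATIVE_WIDTH = 256   # native NDS screen width in pixels
NATIVE_HEIGHT = 192  # native NDS screen height in pixels


def should_apply_magnification_filter(width: int, height: int) -> bool:
    """Apply a magnification filter (xBRZ, HQnx, ...) only when the screen
    arrives at the native resolution; custom-resolution screens are used as-is."""
    return width == NATIVE_WIDTH and height == NATIVE_HEIGHT


# Main screen at native resolution: filter it.
print(should_apply_magnification_filter(256, 192))
# Touch screen rendered at a 4x custom resolution: leave it alone.
print(should_apply_magnification_filter(1024, 768))
```

Each screen is checked independently, so one screen can be filtered while the other is passed through untouched, as in the screenshot described above.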

#11 2017-08-28 01:41:58

mgitkun
Member
Registered: 2017-06-06
Posts: 43

Re: Current State of MSAA?

rogerman wrote:

Try commit a05e03e.

Yeah, much better.
The difference is nearly indistinguishable at 3x and 4x now.

a05e03e2 3x
d2b25360 3x

a05e03e2 4x
d2b25360 4x

rogerman wrote:

16x GPU scaling works on all Macs, from the highest-spec iMac w/ Intel i7-7700K all the way down to a Power Mac w/ PowerPC G5. Obviously, the Power Mac will be completely unusable at 16x, but the iMac i7-7700K can run 16x at full speed, albeit with so much frame skip that games become unplayable. I haven't yet seen any PC that can run DeSmuME at 16x at full speed without dropping any frames, but I'm keeping the idea of running 16x at full speed without dropping frames as a sort of performance "holy grail" to reach in the future.

Yeah, that's why I said not at full speed; in other words if you have it set to "0 (never skip)" will it adhere to that setting and actually slow down game speed as it should, and as it does normally, or will it just randomly start skipping because of some shortcoming?

rogerman wrote:

A tip: Don't run magnification filters, like xBRZ or HQnx, while running any resolution above native. It's buggy, it doesn't produce the intended results, and is a complete waste of CPU cycles. In reality, magnification filters should be completely disabled on the Windows port when running non-native resolutions, until the time when the entire video system on Windows can be completely reworked.

Other than trying every filter out to see what it does, I haven't used anything but the Normal filter since GPU Scaling was introduced, and I only use the Normal filter to smooth out any jagged edges that MSAA may have missed.

rogerman wrote:

Ideally, the two NDS screens should be processed separately. If an NDS screen comes in at the native resolution, then run the magnification filter on it. However, if the NDS screen comes in at a custom resolution, then use the screen as-is without magnification filters. The Cocoa port does this, and it looks something like this:
In this image, the main screen is at the native resolution, and was therefore upscaled using 4xBRZ. The touch screen is at a custom resolution, so 4xBRZ was not used on it.

That's interesting, seems like it'd be very useful for games that put 2d sprites or graphics that don't benefit much from scaling on one screen and 3d on the other.

Last edited by mgitkun (2017-08-28 01:46:20)


#12 2017-08-28 03:11:47

rogerman
Member
Registered: 2011-06-04
Posts: 380

Re: Current State of MSAA?

In the Cocoa port, if frameskip is disabled, then the overall execution rate will slow down in order to display every single frame. This is true, regardless of hardware used or GPU Scaling Factor used. For example, a Power Mac G5 running 16x will produce and show every single frame if frameskip is disabled, but the overall execution rate will become unbearably slow.

Ah, I think I see the issue on the Windows port now. If frameskip is disabled, then you would expect that every single frame would show, regardless of execution rate. But at the higher scaling factors, frames are still skipped, despite frameskip being disabled. This is clearly a bug in the Windows port.

#13 2017-08-28 06:38:21

zeromus
Radical Ninja
Registered: 2009-01-05
Posts: 6,169

Re: Current State of MSAA?

Frames can probably be skipped if the thread that handles displaying the frames goes slower than the thread that generates them. If you're scaling giant screens with software magnifying manglers, that would do the trick. I don't care.
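A toy model of that hypothesis, assuming a display step that is slower than frame generation. It is a deliberately simplified simulation (no real threads, hypothetical names and timings), not DeSmuME's threading code. With `block_when_busy=True` the generator waits for the display, so every frame is shown but execution slows down; with `False` a frame that arrives while the display is busy is dropped.

```python
def simulate(frame_count, produce_ticks, display_ticks, block_when_busy):
    """Return (list of displayed frame numbers, total elapsed ticks)."""
    displayed = []
    display_free_at = 0  # tick at which the display can accept a new frame
    now = 0
    for frame in range(frame_count):
        now += produce_ticks  # the emulation finishes generating this frame
        if block_when_busy:
            # Wait for the display, so no frame is lost but time stretches.
            now = max(now, display_free_at)
            displayed.append(frame)
            display_free_at = now + display_ticks
        elif now >= display_free_at:
            # Display is idle: show the frame.
            displayed.append(frame)
            display_free_at = now + display_ticks
        # Otherwise the display is still busy and this frame is dropped.
    return displayed, max(now, display_free_at)


print(simulate(6, 1, 3, block_when_busy=True))
print(simulate(6, 1, 3, block_when_busy=False))
```

In this model, the blocking variant shows all six frames but takes far longer, while the non-blocking variant finishes sooner and shows only some frames, which resembles the frame skipping reported with frameskip disabled.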

#14 2017-08-28 21:14:28

mgitkun
Member
Registered: 2017-06-06
Posts: 43

Re: Current State of MSAA?

rogerman wrote:

In the Cocoa port, if frameskip is disabled, then the overall execution rate will slow down in order to display every single frame. This is true, regardless of hardware used or GPU Scaling Factor used. For example, a Power Mac G5 running 16x will produce and show every single frame if frameskip is disabled, but the overall execution rate will become unbearably slow.

Ah, I think I see the issue on the Windows port now. If frameskip is disabled, then you would expect that every single frame would show, regardless of execution rate. But at the higher scaling factors, frames are still skipped, despite frameskip being disabled. This is clearly a bug in the Windows port.

Good to know that it doesn't happen at very high resolutions on the Cocoa port.

