Weird config results in optimal performance on my PC

Hello. I wanted to comment on the unusual experience I’ve had configuring RetroArch to run optimally on my machine.

Using the suggested settings in this guide (http://www.libretro.com/index.php/wiki/ … ows-guide/), I get sub-optimal performance. Those settings are:

In RetroArch:
Vsync: on
Hard GPU sync: on
GPU sync frames: 1
Vsync swap interval: 1
Threaded driver: off

In my graphics card control panel:
Triple buffer: off
Vsync: use application setting
Threaded driver optimization: auto
Maximum pre-rendered frames: use application setting
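For reference, the RetroArch side of those suggested settings looks roughly like this in retroarch.cfg (key names as I understand them for my build; double-check against your own config file):

```
# RetroArch-side settings from the guide (approximate retroarch.cfg keys)
video_vsync = "true"            # Vsync on
video_hard_sync = "true"        # Hard GPU sync on
video_hard_sync_frames = "1"    # GPU sync frames 1
video_swap_interval = "1"       # Vsync swap interval 1
video_threaded = "false"        # Threaded driver off
```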

With these settings, I get skippy video/audio, and every so often my frame rate plummets (maybe to 30) for just a second or so, then returns to normal. It’s quite distracting and takes one out of the experience when it occurs.

I’ve tried adjusting triple buffering, vsync, and the threaded driver settings in my graphics card control panel, but I get the same results.

I’ve tried basically every combination of the above settings and only found one that resulted in no screen tearing, no input lag, no a/v jitters and no horrendous frame rate drops. That combination of settings is:

In RetroArch:
Vsync: OFF
Hard GPU sync: on
GPU sync frames: 0
Vsync swap interval: 1
Threaded driver: off

In graphics card CP:
Triple buffer: ON
Vsync: “adaptive.” I also tried “ON” and got nearly the same results, but I think adaptive was slightly better (it’s advertised as a more advanced form of vsync by Nvidia).
Threaded driver optimization: auto or off
Maximum pre-rendered frames: use application setting
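For anyone editing retroarch.cfg directly, the RetroArch-side keys that differ from the first set would be roughly this (key names as I understand them; the triple buffer and adaptive vsync parts are driver settings, not retroarch.cfg entries):

```
video_vsync = "false"           # vsync handled by the driver instead
video_hard_sync = "true"
video_hard_sync_frames = "0"
```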

My system specs are as follows:

Intel dual core T4400 @ 2.2 GHz
RAM: 4 GB
Nvidia GeForce 9300
Windows 7 64-bit

So, does anyone have any ideas as to why I’m unable to use the suggested settings without running into problems, and why RetroArch is awesome when I use the second set of settings? I’m assuming this is hardware specific, but what would I need to upgrade? Does my CPU just suck? I thought a 2.2 GHz dual core would be enough to handle any game up to the late 90s…

I imagine it’s hard sync. Do you get good performance with the first set of settings, but with hard sync off?

Your second set of settings probably works well because by turning off RetroArch’s vsync you’ve also disabled hard sync and are instead using your video card’s triple buffered adaptive vsync method.

Hard sync is really CPU intensive. I have to disable it on my older dual-core systems, or even less demanding cores like Nestopia and Genesis Plus GX run terribly. It works quite well on my i5 2500k 3.3 GHz, GTX 570 system, though. I have most of my Nvidia control panel settings at their defaults except maximum pre-rendered frames, which I set to 1; that seemed to give a very small reduction in display lag.

With hard sync frames set to 0 I also have to set my Windows 7 power plan from balanced to high performance for certain demanding cores like BSNES Balanced and Mednafen PSX so they don’t get audio crackling. And if I add the crt-royale filter on top of those I’ve found some games (Kirby Super Star would be one) in those cores will crackle occasionally even with the high performance plan. So if I want to use that filter without crackles I have to set hard sync frames to 1 for those cores.

I also have a 120hz monitor, so I have swap interval set to 2 to prevent stuttering that happens every few seconds at that refresh rate. Display lag is halved by playing at that refresh so I recommend monitors with high refresh rates if you care about reducing lag and are looking for a new monitor. The motion clarity of PC games that can run higher than 60 fps looks amazing at high refresh rates too.
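If it helps, the relevant retroarch.cfg keys for that are roughly the following (names as I understand them; the 120 value is just my monitor’s refresh rate):

```
video_refresh_rate = "120.0"    # monitor refresh rate
video_swap_interval = "2"       # swap every 2nd refresh, i.e. effectively 60 fps on a 120 Hz display
```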

Ah, I see. That makes sense; I didn’t realize that hard GPU sync was tied to vsync.

With the original suggested settings but hard GPU sync turned off, performance is good, but input lag is slightly worse; it improves when I turn triple buffering on. Apparently, there are two kinds of triple buffering: the kind that increases input lag and the kind that decreases it. I have the kind that decreases vsync input lag:

“In other words, with triple buffering we get the same high actual performance and similar decreased input lag of a vsync disabled setup while achieving the visual quality and smoothness of leaving vsync enabled… Some game developers implement a short render ahead queue and call it triple buffering (because it uses three total buffers). They certainly cannot be faulted for this, as there has been a lot of confusion on the subject and under certain circumstances this setup will perform the same as triple buffering as we have described it (but definitely not when framerate is higher than refresh rate).”

http://www.anandtech.com/show/2794/2

So basically I just have a crappy old CPU and can’t use hard GPU sync. :stuck_out_tongue: Well, that’s kind of disappointing, but I’m getting great results without it, honestly. I’m just glad to have no tearing, a/v jitters, framerate drops or input lag. I think triple buffering plus adaptive vsync results in a maximum of half a frame of input lag (about 8 ms at 60 Hz), and my display adds 25 ms of input lag, for a total of roughly 33 ms.

I think the single thing that would have the biggest impact on my experience would be to get a better gaming display; ASUS makes a few that have 10 ms of input lag. Nvidia also just came out with G-SYNC, which is advertised to eliminate tearing, stutter, and input lag, but the cheapest I’ve seen one go for is $500 for a modded monitor.

http://www.digitalstormonline.com/nvidia-g-sync.asp http://www.geforce.com/hardware/technol … technology

Thanks for your reply!

EDIT:

Okay, so I found the power mode setting and set it from “auto” to “always prefer maximum performance.” Now it appears I can use the suggested settings without issue. I guess the random frame rate drops were the result of the GPU throttling up or down. Although, I notice that my computer sounds like it’s working a lot harder under these settings. What is the actual advantage/disadvantage of hard GPU sync at 1 frame vs. leaving it off?

Hard GPU sync is quite heavy indeed. Hard sync attempts to hard-synchronize the CPU and GPU; the number lets you set how many frames the CPU can run ahead of the GPU when using video_hard_sync.
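In retroarch.cfg, that maps to something like this (a minimal sketch; key names as I understand them for current builds):

```
video_hard_sync = "true"        # hard-synchronize CPU and GPU
video_hard_sync_frames = "1"    # CPU may run at most 1 frame ahead of the GPU (0 = strictest, most CPU-heavy)
```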

I’ve been reading a lot into this after your original question on the blog.

From my research, triple buffering most definitely introduces ONE frame of display lag. Here is a good explanation I found:

In computer graphics, triple buffering is similar to double buffering but provides a speed improvement. In double buffering the program must wait until the finished drawing is copied or swapped before starting the next drawing. This waiting period could be several milliseconds during which neither buffer can be touched.

In triple buffering the program has two back buffers and can immediately start drawing in the one that is not involved in such copying. The third buffer, the front buffer, is read by the graphics card to display the image on the monitor. Once the monitor has been drawn, the front buffer is flipped with (or copied from) the back buffer holding the last complete screen. Since one of the back buffers is always complete, the graphics card never has to wait for the software to complete. Consequently, the software and the graphics card are completely independent, and can run at their own pace. Finally, the displayed image was started without waiting for synchronization and thus with minimum lag.[1]

Due to the software algorithm not having to poll the graphics hardware for monitor refresh events, the algorithm is free to run as fast as possible. This can mean that several drawings that are never displayed are written to the back buffers. This is not the only method of triple buffering available, but is the most prevalent on the PC architecture where the speed of the target machine is highly variable.

Another method of triple buffering involves synchronizing with the monitor frame rate. Drawing is not done if both back buffers contain finished images that have not been displayed yet. This avoids wasting CPU drawing undisplayed images and also results in a more constant frame rate (smoother movement of moving objects), but with increased latency.[1] This is the case when using triple buffering in DirectX, where a chain of 3 buffers are rendered and always displayed.

Triple buffering implies three buffers, but the method can be extended to as many buffers as is practical for the application. Usually, there is no advantage to using more than three buffers.

Triple buffering allows the game to run “faster” than the monitor refresh rate. Even in sub-60 fps situations it might help (with plain vsync, small frame drops lead to a halved framerate). From my understanding, triple buffering produces a placebo effect regarding lag, since the game is actually running ahead of what is being displayed.

Anyway, it’s too confusing; there are lots of counter-arguments, and it seems the details vary with each implementation…

Anyway, more buffers = bad :p. Also, it seems there are some sound corruption and desync issues with Nestopia, so try something else for a change.

It is really complicated. Check out the third image down: http://www.anandtech.com/show/2794/2

Also, the update in italics: http://www.anandtech.com/show/2794/4

That helped me to understand triple buffering a little better.

Is FCEUX generally considered the better emulator? I seem to get slightly less input lag using it, but that could be in my head. I guess my main aversion to using it prior to RetroArch was that I couldn’t figure out how to get scanlines with it. I think the FCEUX colors are a little whack, though. Is there a way to get NTSC-like colors with FCEUX using RA?

fceu* has support for more mappers (mostly weird Chinese bootlegs), but Nestopia is generally considered more “accurate” (whatever that means). Basically, there’s not enough difference between the two in most cases to draw any meaningful distinctions. Just use whichever works better for your use-case.

For NTSC colors, you can use maister’s NTSC shader, but that brings along some artifacting and whatnot, since it reproduces the signal effects from a composite or S-video connection, or you can use the NTSC softfilter, which is based on blargg’s NTSC filter. The softfilter stuff isn’t available in 1.0.0.2 and may be a bit finicky in git builds, while the shader is mature and straightforward (just choose the ntsc.cgp preset).
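Assuming a standard install layout (the path below is just illustrative; adjust it to wherever your shaders live), enabling the preset in retroarch.cfg looks roughly like:

```
video_shader_enable = "true"
video_shader = "shaders/ntsc.cgp"    # maister's NTSC Cg preset; path is an example
```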

I don’t think my system can handle the NTSC shader :frowning:

That’s weird, because I’m pretty sure I was able to use an NTSC shader with the S-video effects on another emulator (Nestopia or FCEUX, maybe).

Do I need to change something else in my video settings?

Oh well, I’ll just stick with Nestopia and the single pass scanline.cg shader.

BTW, scanline.cg looks sweet, way better than the scanline options I’ve tried in other emulators, which usually just result in thick black lines across the entire screen and/or unevenly spaced lines (yuck).

You were probably using blargg’s CPU filter rather than maister’s GPU shader. The shader is pretty resource-intensive and requires a decent GPU (HD4000 is the weakest integrated solution that can handle it, I think). blargg’s filter is super-optimized and works on even weak CPUs, but is less flexible with resolutions/cores than the shader.

Generally with scanlines, you need to use integer scaling to keep it from looking like garbage. Scanline.cg looks surprisingly good at non-integer scales, as does crt-easymode.cg, which uses the same strategy to draw the darker lines. cgwg’s crt-geom.cg also looks good with non-integer scaling, but it may be too heavy for your GPU.
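If you want to try integer scaling with one of the simple scanline shaders, the retroarch.cfg side is roughly this (again, key names as I understand them, and the shader path is just an example):

```
video_scale_integer = "true"            # snap the image to integer multiples of the core's resolution
video_shader_enable = "true"
video_shader = "shaders/scanline.cg"    # or crt-easymode.cg; path is illustrative
```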

Honestly, I think it’s a little weird to want to reproduce all the artifacts from CRTs with S-video or component or whatever. I think the graphics look fine in HD with scanlines, so I’m happy just using one of the simple scanline shaders. :slight_smile:

It’d be cool to be able to get NTSC colors with FCEUX without all the GPU-intensive stuff, but it’s no biggie as Nestopia seems fine.

Thanks!

I don’t know, guys; I’m fairly confident that you can enable hard GPU sync without vsync being turned on in RA. Earlier, Awakened had suggested that vsync and hard GPU sync were tied together.

Example: I’m playing Castlevania SOTN with Mednafen PSX. I have vsync disabled in RA but set to ON in my Nvidia CP, and I have hard GPU sync set to ON and set to 1 frame in RA.

Under these settings, SOTN runs smoothly. If I set hard GPU sync to 0 frames, my framerate drops horribly and the game becomes unplayable. If I set it to 1, the game runs smoothly with no lag. This proves that hard GPU sync is doing something despite Vsync not being activated through RA.

A second test involved Super Mario Bros. I’m able to set hard GPU sync to 0 with Nestopia and the game runs perfectly with no perceptible input lag. If I set hard GPU sync to 1, I get almost a full frame of additional input lag. I’m very familiar with how Super Mario Bros is supposed to respond and can detect differences in input lag from playing hundreds of hours of twitch video games. This second test might be called “subjective,” but you’d have a much harder time saying that about the first test. Taken together, I think these tests show pretty conclusively that hard GPU sync can be enabled in RetroArch without turning vsync on in RA.

Anyway, if someone is having performance issues using vsync + hard GPU sync in RA, then they might want to try disabling vsync in RA and forcing it on through their graphics card, although I honestly don’t know if it makes any difference. On my old dual-core system I use the following settings for everything up to and including PSX and N64, and everything works great. These are the settings suggested by AndreSM for hardware-limited systems. I also followed Awakened’s advice re: the power mode and maximum pre-rendered frames.

RetroArch:
vsync: on
hard GPU sync: on
GPU sync frames: 1
threaded driver: off
fullscreen: on
Under shader options: default filter nearest, shader passes 1, shader #0: crt-easymode.cg or scanline.cg

Nvidia CP:
vsync: use application setting
triple buffer: on
threaded optimization: auto
maximum pre-rendered frames: 1
power mode: prefer maximum performance
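For anyone who’d rather edit the config file directly, the RetroArch side of that is approximately the following (shader paths are illustrative; key names as I understand them):

```
video_vsync = "true"
video_hard_sync = "true"
video_hard_sync_frames = "1"
video_threaded = "false"
video_fullscreen = "true"
video_smooth = "false"                    # "default filter nearest" = bilinear filtering off
video_shader_enable = "true"
video_shader = "shaders/crt-easymode.cg"  # or shaders/scanline.cg; paths are examples
```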

[QUOTE=PatrickM;15971]On my old dual-core system I use the following settings for everything up to and including PSX and N64, and everything works great.

RetroArch:
vsync: on
hard GPU sync: on
GPU sync frames: 1
threaded driver: off
fullscreen: on
Under shader options: default filter nearest, shader passes 1, shader #0: crt-easymode.cg or scanline.cg

Nvidia CP:
vsync: use application setting
triple buffer: on
threaded optimization: auto
maximum pre-rendered frames: 1
power mode: prefer maximum performance[/QUOTE]

Not to necro a really old post, but I found what you said here to be true too… I have been fighting with trying to get smooth emulation and less input lag, and found that using the exact settings above seemed to give me the best results. The downside is you lose the fast forward function with the max pre-rendered frames option, but it seems to help.

I’m necroing because I spent hours googling for options before I found this post, so it could help someone else. I didn’t use your shader settings, though; I am using CRT shaders.