Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NV low latency support #3690

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

esullivan-nvidia
Copy link
Contributor

This pull request implements a new device interface called ID3DLowLatencyDevice using the VK_NV_low_latency2 extension. The purpose of this interface, is to give dxvk-nvapi a way to implement the nvapi Reflex interface.

m_frameId += 1;
if (!Repeat) {
m_frameId = m_presenter->lowLatencyEnabled() ?
m_device->getLatencyMarkers().present :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this work with games that are not using latency markers, but only the sleep API? (e.g. Apex Legends)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was fixed in a previous update to this PR to only use the present latency marker if it is valid.

@adamnv
Copy link
Contributor

adamnv commented Oct 17, 2023

Support for the low-latency extensions just got released in the 545.23.06 NVIDIA driver so this is usable/testable now. (Propagating this note to all three project PRs, apologies for any duplication! 😸 )

@@ -274,6 +275,7 @@ namespace dxvk {
DxvkSubmitStatus* status) {
DxvkSubmitInfo submitInfo = { };
submitInfo.cmdList = commandList;
submitInfo.frameId = m_latencyMarkers.render;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that resource initialization (zeroing) submissions can happen independently from the immediate context or CS thread. submitCommandList is used by every submission including init submissions, and this can cause slightly awkward consequences such as submission tied with a particular frame ID after present.

It might be better to exclude InitContext from low latency tracking completely to avoid edge cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to disable passing frame ids at vkQueueSubmit time for work submitted from the initializer context.

}

D3D11SwapChain* GetLowLatencySwapChain() {
return (m_swapchains.size()) == 1 ? m_swapchains[0] : nullptr;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an inherent race condition here if the app destroys the swap chain while also calling any of the LowLatencyDevice methods that access it, in which case we're likely to just crash. Is this something we should be robust against?

std::remove(m_swapchains.begin(), m_swapchains.end(), swapchain);
}

UINT GetSwapchainCount() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

src/d3d11/d3d11_interfaces.h Show resolved Hide resolved
loathingKernel added a commit to loathingKernel/Proton that referenced this pull request Nov 22, 2023
Requires at least Nvidia 545.xx drivers.

The patches are from the following pull requests:
[wine](ValveSoftware/wine#200)
[dxvk-nvapi](jp7677/dxvk-nvapi#147)
[vkd3d-proton](HansKristian-Work/vkd3d-proton#1739)
[dxvk](doitsujin/dxvk#3690)

Thanks also goes to @ptr1337 for initially building and testing the patchset.
loathingKernel added a commit to loathingKernel/Proton that referenced this pull request Nov 22, 2023
Requires at least Nvidia 545.xx drivers.

The patches are from the following pull requests:
[wine](ValveSoftware/wine#200)
[dxvk-nvapi](jp7677/dxvk-nvapi#147)
[vkd3d-proton](HansKristian-Work/vkd3d-proton#1739)
[dxvk](doitsujin/dxvk#3690)

Thanks also goes to @ptr1337 for initially building and testing the patchset.
@@ -48,6 +57,7 @@ namespace dxvk {


VkResult Presenter::acquireNextImage(PresenterSync& sync, uint32_t& index) {
std::lock_guard<dxvk::mutex> lock(m_lowLatencyMutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AcquireNextImage can block. I think holding a mutex for the entire duration of acquire can cause potential trouble.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the same concern was also raised in vkd3d-proton review, so I suppose this can be addressed in a similar way (only hold the lock in case of recreation).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the locking in acquire and present removed my layer-based LL2 implementation works with significantly less stutters — so this is probably that has to be fixed. I think the locks aren't really necessary here anyway since access to swapchain is externally synchronized (which implies it can't be destroyed for the duration of call).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock should only be taken when the swapchain is destroyed now. So I think this issue can be considered resolved.

@@ -68,11 +78,13 @@ namespace dxvk {
VkResult Presenter::presentImage(
VkPresentModeKHR mode,
uint64_t frameId) {
std::lock_guard<dxvk::mutex> lock(m_lowLatencyMutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QueuePresent is also allowed to block, although this might be less of a concern if the driver is implemented in a way that doesn't block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this to only take the lock when destroying the swapchain.

@@ -151,6 +163,8 @@ namespace dxvk {


VkResult Presenter::recreateSwapChain(const PresenterDesc& desc) {
std::lock_guard<dxvk::mutex> lock(m_lowLatencyMutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recreateSurface can also result in the swapchain getting destroyed. I think we currently lack locking there (and this might be a tricky one since recreateSurface will keep m_swapchain NULL until the next call to recreateSwapchain).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this to only take the lock when destroying the swapchain. Whenever dxvk calls one of the other LL2 entry points it holds the low latency lock, and confirms the swapchain handle is valid. This approach should prevent any usage of the destroyed swapchain handle.

Comment on lines 504 to 513
uint32_t timingCount = 0;

std::lock_guard<dxvk::mutex> lock(m_lowLatencyMutex);
m_vkd->vkGetLatencyTimingsNV(m_vkd->device(), m_swapchain, &timingCount, &markerInfo);

if (timingCount != 0) {
frameReports.resize(timingCount, { VK_STRUCTURE_TYPE_GET_LATENCY_MARKER_INFO_NV });
markerInfo.pTimings = frameReports.data();

m_vkd->vkGetLatencyTimingsNV(m_vkd->device(), m_swapchain, &timingCount, &markerInfo);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think VkStructureType passed to resize is wrong here (VK_STRUCTURE_TYPE_GET_LATENCY_MARKER_INFO_NVVK_STRUCTURE_TYPE_LATENCY_TIMINGS_FRAME_REPORT_NV)? So together with updated Vulkan headers for spec 2 version of NV_ll2 this should look more like:

Suggested change
uint32_t timingCount = 0;
std::lock_guard<dxvk::mutex> lock(m_lowLatencyMutex);
m_vkd->vkGetLatencyTimingsNV(m_vkd->device(), m_swapchain, &timingCount, &markerInfo);
if (timingCount != 0) {
frameReports.resize(timingCount, { VK_STRUCTURE_TYPE_GET_LATENCY_MARKER_INFO_NV });
markerInfo.pTimings = frameReports.data();
m_vkd->vkGetLatencyTimingsNV(m_vkd->device(), m_swapchain, &timingCount, &markerInfo);
std::lock_guard<dxvk::mutex> lock(m_lowLatencyMutex);
m_vkd->vkGetLatencyTimingsNV(m_vkd->device(), m_swapchain, &markerInfo);
if (markerInfo.timingCount != 0) {
frameReports.resize(markerInfo.timingCount, { VK_STRUCTURE_TYPE_LATENCY_TIMINGS_FRAME_REPORT_NV });
markerInfo.pTimings = frameReports.data();
m_vkd->vkGetLatencyTimingsNV(m_vkd->device(), m_swapchain, &markerInfo);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this should be fixed now.

}

void RemoveSwapchain(D3D11SwapChain* swapchain) {
std::remove(m_swapchains.begin(), m_swapchains.end(), swapchain);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::remove(m_swapchains.begin(), m_swapchains.end(), swapchain);
m_swapchains.erase(std::remove(m_swapchains.begin(), m_swapchains.end(), swapchain));

Found by clang-cl.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks for running this change through clang-cl!



if (m_device->GetDXVKDevice()->features().nvLowLatency2) {
m_device->AddSwapchain(presenter.ref());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to miss a matching RemoveSwapchain call.

Jan200101 pushed a commit to Jan200101/NorthstarProton that referenced this pull request Jan 28, 2024
Requires at least Nvidia 545.xx drivers.

The patches are from the following pull requests:
[wine](ValveSoftware/wine#200)
[dxvk-nvapi](jp7677/dxvk-nvapi#147)
[vkd3d-proton](HansKristian-Work/vkd3d-proton#1739)
[dxvk](doitsujin/dxvk#3690)

Thanks also goes to @ptr1337 for initially building and testing the patchset.
@esullivan-nvidia esullivan-nvidia force-pushed the nv_low_latency2 branch 2 times, most recently from a7cdb7b to 88b323b Compare February 6, 2024 08:53
This commit add support for the VK_NV_low_latency2 extension, and
implements the ID3DLowLatencyDevice interface.
@esullivan-nvidia
Copy link
Contributor Author

Just like I mentioned in the vkd3d-proton PR, thank you for your patience in regards to the slow updates. Feel free to provide any additional feedback, or concerns and I will address them as quickly as possible. I am aware there are a couple of outstanding issues. I will get to those tomorrow. I think most of the major concerns have been resolved though.

if (!Repeat)
m_frameId += 1;
if (!Repeat) {
m_frameId = (m_presenter->lowLatencyEnabled() && m_device->getLatencyMarkers().present) ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things mandates that m_frameId is monotonically increasing:

  • VK_KHR_present_id: A non-zero presentId must be greater than any non-zero presentId passed previously by the application for the same swapchain.
  • CallbackFence: The helper works similarly to a timeline semaphore so the timeline value must be increasing.

I think we need to ensure that the value increase here. For present IDs, recreating the swapchain will do the job when we see a smaller ID. CallbackFence will need some more modification: I'll describe my idea below.

  • Make D3D9Swapchain use CallbackFence and remove Fence; the latter seems to be a superset of the former.
  • Move CallbackFence into Presenter, so that it can be easily recreated by the Presenter. Change accesses to e.g. D3D11Swapchain::m_frameLatencySignal to something like m_presenter->frameLatencySignal().
  • Make Presenter::destroySwapchain destroy the signal too, then also recreate the signal in Presenter::recreateSwapchain. (destroySwapchain waits for the current frame to be signaled.) Add a new parameter to Presenter::recreateSwapchain for the frame ID the signal should be initialized with. Also reset m_lastFrameId to the specified value.

cc @doitsujin in case they have feedback for the proposed refactor.

@ptr1337
Copy link

ptr1337 commented Feb 16, 2024

Hi,

We have updated on proton-cachyos the nvidia-reflex patches. One user is following a issue in apex legends.
The issue happens as soon Reflex + Boost gets enabled. Reflex only is okay.
The error message is:
There was a problem processing game logic

I have attached a Screenshot.

Reflex without boost is working without problems. 545 drivers.

Changes compared to last version:
CachyOS/CachyOS-PKGBUILDS@c09fd8f

If you need any further informations or testing, feel free to hit me or @A1RM4X

image

Edit: No issue.

@ishitatsuyuki
Copy link
Contributor

We have updated on proton-cachyos the nvidia-reflex patches. One user is following a issue in apex legends. The issue happens as soon Reflex + Boost gets enabled. Reflex only is okay. The error message is: There was a problem processing game logic

I have attached a Screenshot.

Reflex without boost is working without problems. 545 drivers.

Toggling Reflex causing game logic crash feels pretty unlikely so it might be good to test this multiple times to confirm that it's not a fluke.

If it's reproducible, I think attaching logs with PROTON_LOG=1 would be useful for confirming or ruling out exceptions within the driver.

@A1RM4X
Copy link

A1RM4X commented Feb 16, 2024

Fluke confirmed.

It is working now. It just crashed the first time for whatever reason.
Screenshot_20240216_14325-Reflex+Boost

@A1RM4X
Copy link

A1RM4X commented Feb 23, 2024

Tested with the latest 550 drivers: Cyberpunk 77 show the nvidia reflex option.

The option was not showing with the 545 drivers.
Screenshot_20240223_163929-CP77

@oscarbg
Copy link

oscarbg commented Mar 6, 2024

curious on the state of merging this PR.. as equivalent VKD3D and DXVK-NVAPI Reflex patches got merged already..

@ishitatsuyuki
Copy link
Contributor

Please don't make "state of merging" type of comments. The PR will be merged once all the concerns are addressed. Currently, the frame ID interaction with swapchain and CallbackFence needs to be fixed.

@ptr1337
Copy link

ptr1337 commented Mar 6, 2024

Please don't make "state of merging" type of comments. The PR will be merged once all the concerns are addressed. Currently, the frame ID interaction with swapchain and CallbackFence needs to be fixed.

We have also a user, which reports problems with this MR, if VRR is used together with Reflex in Overwatch. i can provide later the day some logs from him.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants