[fix] export failure with CUDA driver < 526 and pynvml>=11.5.0 #1537

Closed
CoderHam wants to merge 1 commit


@CoderHam CoderHam commented May 3, 2024

  • There is a driver-side bug that was fixed in the 526 driver release. For older driver versions, the recommendation is to downgrade pynvml to 11.4.0 and to use 11.5.0 only with driver 526 or later.

This PR uses the legacy pynvml memory usage function, even with pynvml 11.5.0, when the driver version is older than 526.

Mentioned in the issue as well: #808 (comment)
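
For readers who want to see the shape of the check, here is a minimal sketch of the version gate described above. It is not the code from this PR: the helper names `version_tuple`, `supports_v2_memory_info`, and `device_memory_info` are hypothetical, and the sketch assumes that the installed pynvml exposes `__version__` and that pynvml 11.5.0 accepts `version=pynvml.nvmlMemory_v2` in `nvmlDeviceGetMemoryInfo`.

```python
import re


def version_tuple(version):
    """Parse a dotted version such as '525.105.17' into a tuple of ints.

    Accepts bytes as well, since some pynvml releases return the driver
    version as bytes. Parsing stops at the first non-numeric component.
    """
    if isinstance(version, bytes):
        version = version.decode()
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def supports_v2_memory_info(pynvml_version, driver_version):
    """Return True only when both pynvml and the driver are new enough.

    Thresholds follow the PR description: pynvml >= 11.5.0 and driver
    major version >= 526.
    """
    new_pynvml = version_tuple(pynvml_version) >= (11, 5, 0)
    new_driver = version_tuple(driver_version)[:1] >= (526,)
    return new_pynvml and new_driver


def device_memory_info(handle):
    """Query memory info, falling back to the legacy call on old drivers.

    Hypothetical wrapper: pynvml is imported lazily, and the caller is
    assumed to have already called pynvml.nvmlInit().
    """
    import pynvml

    driver = pynvml.nvmlSystemGetDriverVersion()
    if supports_v2_memory_info(pynvml.__version__, driver):
        # Assumption: the v2 struct is selected via the `version` keyword.
        return pynvml.nvmlDeviceGetMemoryInfo(handle, version=pynvml.nvmlMemory_v2)
    return pynvml.nvmlDeviceGetMemoryInfo(handle)
```

Comparing parsed integer tuples rather than raw strings avoids surprises with lexicographic ordering, for example `'11.10.0' < '11.5.0'` when compared as strings.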

@jaedeok-nvidia

Thanks for addressing the pynvml issue related to the driver version. @CoderHam, could you share which doc (or link) you used to determine the driver version (526)?

@CoderHam
Author

@jaedeok-nvidia It took a while to dig through, but I followed the thread from https://forums.developer.nvidia.com/t/nvml-bug-nvmldevicegetcomputerunningprocesses-returns-compute-processes-for-all-gpu-devices/222337/2 and NVIDIA/k8s-device-plugin#331 (comment).

This confirmed that missing symbols in the underlying NVML libraries prevent us from using the v2 API prior to driver 526.

@kaiyux kaiyux mentioned this pull request May 28, 2024
@kaiyux
Member

kaiyux commented May 28, 2024

Hi @CoderHam, the changes are integrated in #1688 and we've credited you as co-author, so I'm closing this PR now. Thanks a lot!

@kaiyux kaiyux closed this May 28, 2024