LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 07-23-2019, 04:28 PM   #1
ehereth
LQ Newbie
 
Registered: Jul 2017
Location: Chattanooga TN
Posts: 15

Rep: Reputation: Disabled
NVIDIA driver working on CentOS 7.2, but not for all users


Good day LQ!!

I have a confusing issue that I've wasted most of a day trying to debug.

I have a server (really, a cluster of servers, but I do not think that is relevant to the question) with an NVIDIA P100 installed in it. We have groups of researchers who run GPU-enabled codes on these servers. One group has been running NAMD on them successfully. Recently, however, they added a few users, and some of those users cannot run the code; it fails with errors like:

... CUDA driver version is insufficient for CUDA runtime version

However, other users are able to run the code (using the GPU) without any trouble.
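For context, this message is the text the CUDA runtime attaches to cudaErrorInsufficientDriver, which is normally reported when the installed driver supports an older CUDA version than the runtime the application was built against. That check depends on the system-wide driver, not on the user, which is part of why the error is so confusing here. As a minimal sketch (the function names are made up for illustration), CUDA reports versions as integers encoded as 1000*major + 10*minor, and the comparison amounts to:

```python
# Hypothetical helpers illustrating CUDA's integer version encoding
# (1000 * major + 10 * minor, as returned by cudaDriverGetVersion and
# cudaRuntimeGetVersion). Not part of any NVIDIA tool.

def decode_cuda_version(version):
    """Split an encoded CUDA version such as 8000 into (major, minor)."""
    return version // 1000, (version % 1000) // 10


def driver_is_sufficient(driver_version, runtime_version):
    """True if the driver's supported CUDA version covers the runtime's.

    Simplification: a plain >= comparison, ignoring any compatibility
    packages that may relax this rule on some systems.
    """
    return driver_version >= runtime_version
```

If the driver check passes for one user on a machine, it should pass for every user on that machine, so a per-user failure points at something else that differs between the accounts.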

Now, I've looked at length at pretty much anything that I can think of that might be different about these users:
  1. The users are using the exact same executable, options, and inputs
  2. Their environments are functionally identical ($SHELL, $PATH, $LD_LIBRARY_PATH, etc.); the only differences are user-specific things like $HOME
  3. Their permissions/groups are correct
  4. Running modinfo nvidia results in the exact same output for each user (the version of the driver is 361.93.03)
  5. The permissions of /dev/nvidia* are such that all users can see/use them
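One way to make the environment comparison in item 2 repeatable is to save the output of env for a working user and a failing user and diff the two. The following is only a sketch: the helper names and the list of ignored variables are assumptions, and it assumes every variable fits on one NAME=value line.

```python
# Hypothetical helper: compare two saved `env` dumps, one per user.
# Variables that are expected to differ between users are skipped.
USER_SPECIFIC = {"HOME", "USER", "LOGNAME", "MAIL", "PWD", "OLDPWD"}


def parse_env(text):
    """Parse `env` output into a dict (assumes one NAME=value per line)."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:
            env[name] = value
    return env


def env_diff(text_a, text_b, ignore=USER_SPECIFIC):
    """Return {name: (value_a, value_b)} for variables that differ.

    None in either position means the variable is unset for that user.
    """
    a, b = parse_env(text_a), parse_env(text_b)
    diff = {}
    for name in sorted(set(a) | set(b)):
        if name in ignore:
            continue
        if a.get(name) != b.get(name):
            diff[name] = (a.get(name), b.get(name))
    return diff
```

An empty result only says the exported variables match; it says nothing about files the application reads on its own.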

I've simplified their use case as much as I can. They normally access these servers through a job scheduler (SGE, which can be complicated and confusing), but I've logged into one of the target servers as several of the users and verified that some of them can run the code directly on the server and others cannot.

I'm at a loss and have run out of ideas; I would very much appreciate any help you can give me that might point me to the reason why certain users cannot use the GPUs while others can. Please give me ideas!

Thank you all very much for your support and time!

Last edited by ehereth; 07-24-2019 at 05:43 PM.
 
Old 07-24-2019, 05:45 PM   #2
ehereth
LQ Newbie
 
Registered: Jul 2017
Location: Chattanooga TN
Posts: 15

Original Poster
Rep: Reputation: Disabled
ping

All, I'm not trying to be a pain, but I really need help with this. Does anybody have any ideas, or perhaps a recommendation about where else I might post this question?
 
Old 07-24-2019, 06:32 PM   #3
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,725

Rep: Reputation: 2211
Compare the .bashrc / .profile / .bash_profile, etc. files?
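A rough sketch of that comparison, run as a user who can read both home directories (root, most likely). The function names are hypothetical, and it only looks at regular hidden files directly inside each home directory, not at dot directories:

```python
import filecmp
import os


def dotfiles(home):
    """Names of regular hidden files directly inside `home`."""
    return {
        name
        for name in os.listdir(home)
        if name.startswith(".") and os.path.isfile(os.path.join(home, name))
    }


def compare_dotfiles(home_a, home_b):
    """Report dot files present in only one home, or differing in content."""
    a, b = dotfiles(home_a), dotfiles(home_b)
    changed = sorted(
        name
        for name in a & b
        # shallow=False forces a byte-for-byte comparison of the contents
        if not filecmp.cmp(
            os.path.join(home_a, name), os.path.join(home_b, name), shallow=False
        )
    )
    return {"only_a": sorted(a - b), "only_b": sorted(b - a), "changed": changed}
```

Calling compare_dotfiles with the home directory of a working user and a failing user lists files that exist for only one of them as well as files whose contents differ, which covers both a changed startup file and a file that is missing altogether.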
 
Old 07-25-2019, 08:37 AM   #4
ehereth
LQ Newbie
 
Registered: Jul 2017
Location: Chattanooga TN
Posts: 15

Original Poster
Rep: Reputation: Disabled
scasey, thank you for your reply. While your suggestion didn't directly fix my problem, it did help me find that the application we're trying to run loads a hidden dot file if one exists, which I'd completely forgotten about. Once all users have this file, they can run the application.

The application did nothing at all to hint that this was the problem and the errors didn't indicate anything helpful either. Very frustrating and a crappy way to waste time!

Thanks again for helping me find this solution!

Cheers!
 
  



Tags
gpu, nvidia driver



All times are GMT -5. The time now is 09:22 AM.
