LinuxQuestions.org
Slackware: This forum is for the discussion of Slackware Linux.

12-20-2023, 02:25 PM   #1
h2-1
Member
Registered: Mar 2018
Distribution: Debian Testing
Posts: 562
Reputation: 320
Data needed for AMD Threadripper 2950X 16-core CPU


Someone filed an issue about inxi reporting incorrect CPU core counts on an AMD Threadripper 2950X 16-core CPU, but then decided not to provide the data required to debug it. Rather than wait, I figured I'd see if anyone here can provide it.

This CPU has 2 dies. To fix the issue, I need these two files:

Code:
for i in $(find /sys/devices/system/cpu/ -type f); do echo ${i}::$(cat $i 2>/dev/null); done | sort > cpu-data-sys.txt
cp -f /proc/cpuinfo cpu-data-cpuinfo.txt
That crudely produces the data in a format inxi can digest to reproduce the issue. Just upload the two files somewhere and post links to them.
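For anyone curious what the relevant part of the dump looks like before uploading it, here is a minimal sketch (bash plus awk) that pulls the per-CPU topology fields out of cpu-data-sys.txt. It assumes the one-entry-per-line `path::value` format produced by the command above and the standard sysfs file names `physical_package_id`, `die_id`, and `core_id`; `topology_from_dump` is a hypothetical helper for illustration, not part of inxi.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "cpuN package die core" for each logical CPU
# found in a cpu-data-sys.txt dump (one "path::value" entry per line).
# A field missing from the dump is printed as "-" (for example, older
# kernels may not expose topology/die_id).
topology_from_dump() {
  awk '
    {
      # Split each line on the first "::": the left side is the sysfs path,
      # the right side is the (whitespace-collapsed) file content.
      sep = index($0, "::")
      if (sep == 0) next
      path = substr($0, 1, sep - 1)
      value = substr($0, sep + 2)
    }
    # Keep only .../cpuN/topology/<field> entries.
    match(path, /\/cpu[0-9]+\/topology\/[a-z_]+$/) {
      n = split(path, parts, "/")
      cpu = parts[n - 2]
      field = parts[n]
      seen[cpu] = 1
      if (field == "physical_package_id") pkg[cpu] = value
      else if (field == "die_id") die[cpu] = value
      else if (field == "core_id") core[cpu] = value
    }
    END {
      for (cpu in seen)
        printf "%s %s %s %s\n", cpu,
          ((cpu in pkg) ? pkg[cpu] : "-"),
          ((cpu in die) ? die[cpu] : "-"),
          ((cpu in core) ? core[cpu] : "-")
    }
  ' "$1" | sort -V
}
```

Usage: `topology_from_dump cpu-data-sys.txt`. If `core_id` values repeat across different `die_id` values within the same package, that is the kind of numbering worth a closer look.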

Note that it has to be a Zen+ CPU with 16 or more cores. The Zen 2 3990X does not appear to have this issue: I have a data set from that 64-core model, tested it, and it's fine. AMD clearly made a subtle change to how the die is reported, because the 3990X shows only 1 die when it should have 4, but keeping up with how these are actually made is hard.

If anyone has this CPU and can provide the data, it would be much appreciated.

You can check whether your CPU has the issue by running current inxi: inxi -Cxxx

If it shows half the actual CPU core count, it does. Note that it will show the correct speed for each core, listing 2x the actual physical cores, but will report the core count as 1/2 the actual physical cores, with 2 dies (or more, if it has more than 16 cores).
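As a rough cross-check that does not depend on inxi, the sketch below counts logical CPUs and distinct physical cores straight from sysfs. It assumes a physical core is identified by the `(physical_package_id, die_id, core_id)` triple and treats `die_id` as 0 when the kernel does not expose it; it also assumes the kernel gives the two dies distinct `die_id` values, which is not verified here for the 2950X. `count_cores` is a hypothetical helper (bash 4+ for the associative array), not how inxi itself counts cores.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count logical CPUs and distinct physical cores from
# sysfs. ROOT defaults to the live /sys/devices/system/cpu tree.
# Assumption: a physical core is identified by the triple
# (physical_package_id, die_id, core_id); die_id defaults to 0 when the
# kernel does not expose it.
count_cores() {
  local root=${1:-/sys/devices/system/cpu}
  local topo pkg die core logical=0
  local -A cores=()
  for topo in "$root"/cpu[0-9]*/topology; do
    # Skip an unmatched glob and CPUs without topology data.
    [ -r "$topo/core_id" ] || continue
    pkg=$(cat "$topo/physical_package_id")
    die=0
    if [ -r "$topo/die_id" ]; then
      die=$(cat "$topo/die_id")
    fi
    core=$(cat "$topo/core_id")
    cores["$pkg:$die:$core"]=1
    logical=$((logical + 1))
  done
  echo "logical=$logical physical=${#cores[@]}"
}
```

Run `count_cores` with no argument to read the live tree. On a 16-core, 32-thread part, physical should come out at half of logical; if inxi -Cxxx reports fewer cores than this, that matches the symptom described above.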

This is a corner case, but if you happen to have this exact CPU, I'd appreciate the data.

This is a scenario that may have slipped by in the initial big CPU refactor we did here a while back.

Last edited by h2-1; 12-20-2023 at 02:31 PM.
 
12-21-2023, 05:54 AM   #2
guanx
Senior Member
Registered: Dec 2008
Posts: 1,183
Reputation: 237
I'm not even sure the reported CPU topology has to match the actual hardware. Some BIOSes may offer topology options to assist HPC application placement.

Did you try getting this kind of information from hwloc?
 
12-21-2023, 05:25 PM   #3
h2-1
Member
Registered: Mar 2018
Distribution: Debian Testing
Posts: 562
Original Poster
Reputation: 320
I need the data above, which I did not get, and there's no way for me to get it myself since I do not have this CPU. It's strictly empirical: inxi needs the raw data to run through to find what odd behavior is tripping this. It's obviously a corner case, given how many CPU data samples I got here when doing the main CPU feature a few years back, but this one was never exposed, probably because nobody had this exact 16-core version of that CPU, which clearly has 2 dies.

If worst comes to worst, I will have to check the logic without the data, but it's almost impossible to figure out without data that trips the issue.

I doubt this is HPC; it's almost certainly a consumer desktop running the 16-core Threadripper 2950X.

The fact that it showed 2 dies means inxi got that data, so maybe I'll just have to patiently review the code and see if I can find a place where the die fails to register properly. My guess is that, for some reason, it's getting only the cores per die for this specific CPU, not the core total. Why that is escapes me, since it's been tested on other multi-die AMD CPUs of that era, like Epyc, with no issue. It could be some oddity with only this CPU release, or with how it reports itself to the kernel; I just don't know.
 
12-21-2023, 06:00 PM   #4
guanx
Senior Member
Registered: Dec 2008
Posts: 1,183
Reputation: 237
That's fair. It's nice to have an alternative codebase to hwloc for the sake of divergence.

I just think it's not worth the time to look for someone here with the exact same hardware, firmware, and operating system as the bug reporter, and even if they are here, it will be impossible to check that they match, because the bug reporter will not disclose this information.
 
12-21-2023, 06:03 PM   #5
h2-1
Member
Registered: Mar 2018
Distribution: Debian Testing
Posts: 562
Original Poster
Reputation: 320
Reviewing the code, it looks quite possible that the issue is that inxi counts core IDs, which, as far as I know, are numbered 0-xx on every other CPU regardless of the dies, restarting only for a new physical CPU. Looking very carefully, I believe that because no samples where core ID numbering restarts per die ever appeared, inxi does not handle the numbering for that case.

No Intel CPUs did this either, but it looks like I will have to extend the overall logic to include one more layer, dies, which will be difficult and non-trivial if that's the cause.
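To make the hypothesis concrete, here is a toy illustration on synthetic data (not real 2950X output): if `core_id` restarts at 0 on each die, counting distinct `core_id` values yields only the per-die count, while counting distinct `(die, core_id)` pairs, i.e. adding the die layer, yields the full total. The layout (2 dies, 8 cores each, 2 threads per core) and the function names are assumptions for illustration only; this is not inxi's code.

```shell
#!/usr/bin/env bash
# Toy illustration of the per-die core_id hypothesis, on synthetic data.
# Input for both counters: one "die core_id" pair per line, one line per
# logical CPU.

# Count distinct core_id values only (ignores the die).
count_by_core_id() { awk '!seen[$2]++ { n++ } END { print n + 0 }'; }

# Count distinct (die, core_id) pairs.
count_by_die_core() { awk '!seen[$1 ":" $2]++ { n++ } END { print n + 0 }'; }

# Hypothetical layout: 2 dies, core_id 0-7 restarting on each die,
# 2 threads (logical CPUs) per core, so 32 lines in total.
sample() {
  local die core
  for die in 0 1; do
    for core in 0 1 2 3 4 5 6 7; do
      echo "$die $core"
      echo "$die $core"
    done
  done
}

sample | count_by_core_id    # prints 8: half the real core count
sample | count_by_die_core   # prints 16
```

A real fix would also need the package layer on multi-socket systems; the sketch only shows why ignoring the die halves the count in this scenario.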

However, I need data from a system that ran the inxi debugger after I added that debugging data type during the CPU refactor, which I think was in 2021; I'll check.

It's very hard to do, and risky without verifying that it doesn't break anything else that currently works.

But maybe necessary.

There aren't a lot of variables involved: it only requires the Linux kernel and that CPU, which somebody either has or doesn't. It's just a shot, but not a bad one, since all I need is one person on one forum to provide the data. If I don't get it, I'm in the same place I was before asking, except that maybe I've looked at it a bit more. Theorizing about stuff like this isn't a great idea, but I do see a possible issue. All I need is to find at least one multi-die Ryzen of the Zen/Zen+ era that manifests this result, but that's hit or miss too: zgrep only takes me so far when going through datasets. I will probably focus on debugger datasets from the right time frame, and on AMD Zen CPUs.

However, it does not appear they did this in Zen 2.

Last edited by h2-1; 12-22-2023 at 12:29 AM.
 
  


All times are GMT -5.