[SOLVED] Where do initial task/kernel/cpu scheduler values after cold boot come from?
To make a long story short: I have two physical servers hosting two almost identical VMs, but one of them scales very badly in some workloads. Only some workloads, not all, and even the problematic ones always work again right after a restart; things only start to go wrong after some uptime. I'm unable to reproduce the problem in the other VM and am now comparing differences in the output of "sysctl -a". One set of those differences concerns the task/kernel/cpu scheduler.
So I'm wondering where those differing values come from initially?
E.g. whether they are calculated, which factors they depend on (maybe the VM host?), whether they change automatically at runtime, etc. Estimates of whether these concrete differences in my values could have any reasonable impact on the overall scaling of a system are welcome as well.
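For completeness, this is roughly how I'm comparing the two guests; a minimal sketch, and the host names are of course placeholders:
Code:
# Dump only the scheduler-related sysctls on both VMs, then diff the results.
ssh vm-ok  'sysctl -a 2>/dev/null | grep "^kernel\.sched"' > vm-ok.txt
ssh vm-bad 'sysctl -a 2>/dev/null | grep "^kernel\.sched"' > vm-bad.txt
diff vm-ok.txt vm-bad.txt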
The initial values are based on your hardware and suggested default values. If the servers are different, i.e. different memory or processors, the default values can be different. None of what I see here is going to make much real-world difference. I know the numbers look very different, but in real-world terms they're not.
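If you want to see which scaling policy is in effect on each guest, you can check the knobs directly (as far as I know these are only exposed when the kernel is built with CONFIG_SCHED_DEBUG, which the big distros enable):
Code:
sysctl kernel.sched_tunable_scaling   # 0 = none, 1 = logarithmic (default), 2 = linear
sysctl kernel.sched_latency_ns        # one of the values that gets scaled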
In theory the VM hosts are completely identical: CPUs, memory, HDDs, etc.; only the load and the number of VMs differ. I guess that has an influence as well, at least on some of the differing numbers, and is taken into account when a VM starts?
Besides that, I now know for sure that most of the differences simply come from the fact that one VM had 2 and the other 8 vCPUs at the moment I executed "sysctl -a". There's most likely no wrong global setting or the like, as I had assumed. I see exactly the same differing values for e.g. "*_interval" and "sched_*_ns" on an Ubuntu 14.04 installation I had on my desktop in VMware Workstation with 2 and 8 vCPUs. Completely different hardware, VMs, etc., same numbers.
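If I understand the kernel sources of this era correctly (get_update_sysctl_factor() in kernel/sched/fair.c), that is expected: with the default logarithmic tunable scaling, several sched_*_ns values are multiplied at boot by a factor of 1 + log2(min(ncpus, 8)). A small sketch of what that predicts, assuming the common 6 ms base latency; the actual base values depend on kernel version and configuration:
Code:
#!/bin/bash
# Sketch: predicted kernel.sched_latency_ns per vCPU count under the default
# logarithmic scaling, factor = 1 + log2(min(ncpus, 8)). The 6 ms base is an
# assumption; real base values vary with kernel version/configuration.
base_ns=6000000
for ncpus in 1 2 4 8 16; do
    n=$(( ncpus > 8 ? 8 : ncpus ))
    factor=1
    while (( n >= 2 )); do n=$(( n / 2 )); factor=$(( factor + 1 )); done
    echo "vCPUs=$ncpus factor=$factor sched_latency_ns=$(( base_ns * factor ))"
done
That gives a factor of 2 for 2 vCPUs and 4 for 8 vCPUs, which would explain seeing the same pairs of numbers on completely different hardware.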
Huh?
You said the guests were "almost identical"! I was going to ask if that was like "almost pregnant".
300% more (nominal) compute power is not almost identical.
What about memory? Swapping, I/O contention... You need to look (initially) at the macro level, not micro knobs like sysctls.
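For a first pass at the macro level, something like this (a sketch; vmstat and iostat come from the procps and sysstat packages):
Code:
free -h          # memory and swap headroom
vmstat 1 5       # run queue length, si/so (swapping), cs (context switches)
iostat -x 1 5    # per-device utilisation and await times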
I had tested the VMs with exactly the same hardware and wasn't able to reproduce my problem. Afterwards I reduced the hardware of the test VM on purpose to 2 vCPUs, its default RAM, etc., to see what happens if I put the same load on it as with 8 vCPUs. Nothing happened: the same strange "slowness" I see in the production VM didn't occur, even with far less computing power. Things simply took longer, of course, but the system responded as expected. Because of that I decided to compare the settings I had at that moment; I really thought it would make spotting important differences easier, since the problem doesn't seem to depend on raw computing power.
htop, sar, iostat etc. didn't reveal any obvious bottleneck. No swapping occurred, and plenty of RAM was available: free, or used for caches and buffers. The only thing that sometimes looks somewhat strange is the number of context switches on both VM hosts under load when the problem occurs, but that's exactly why I took a look at sysctl and compared things.
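For the record, this is roughly how I'm watching those context switches (sar and pidstat are from the sysstat package):
Code:
sar -w 1 10        # system-wide context switches per second (cswch/s)
pidstat -w 1 10    # voluntary/involuntary context switches per process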
All that info should have been in the initial post.
No hard data, so it's impossible to hazard a guess as to what is happening. Abnormal context switch counts I usually suspect to be driver/interrupt handler issues, but it can be CPU cache misses, TCP queuing, who knows.
But I don't use hypervisors unless I have to, and you haven't even indicated which one you use.
I wasn't asking for general debugging help on purpose, because it's very likely that such an unspecific question would get me nowhere. I prefer to collect data on my own for now and ask very specific questions about things I don't understand or that seem strange.