WSL 2 uses half the number of cores on AMD Threadripper 3990X #5423

Open
AAlMutairi opened this issue Jun 16, 2020 · 145 comments

@AAlMutairi

AAlMutairi commented Jun 16, 2020

Environment

Windows build number: Microsoft Windows [Version 10.0.19041.329]
Your Distribution version: Ubuntu: 20.04
WSL 2

Steps to reproduce

I am using an AMD Threadripper 3990X in my PC. When I run lscpu, I get the following:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              2
Core(s) per socket:              32
Socket(s):                       1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD Ryzen Threadripper 3990X 64-Core Processor
.
.
.

Also when I use the command nproc, I get 64.
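To cross-check these numbers from inside the distribution, here is a minimal C sketch (an illustration, not part of the original report). It counts the CPUs named by a kernel cpulist string such as the 0-63 printed by lscpu, and asks the C library how many CPUs are online. The helper names count_cpu_list and online_cpus are made up for this example; sysconf(_SC_NPROCESSORS_ONLN) is a glibc extension rather than strict POSIX.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Count the CPUs named by a kernel "cpulist" string such as "0-63" or
   "0-3,8-11". Returns -1 if the string is malformed. */
int count_cpu_list(const char *list)
{
    int count = 0;
    const char *p = list;

    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        long last = first;

        if (end == p)
            return -1;                 /* expected a CPU number here */
        if (*end == '-') {             /* a range "first-last" */
            p = end + 1;
            last = strtol(p, &end, 10);
            if (end == p || last < first)
                return -1;
        }
        count += (int)(last - first + 1);
        p = (*end == ',') ? end + 1 : end;
    }
    return count;
}

/* CPUs currently online according to the C library (glibc extension). */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

count_cpu_list("0-63") returns 64, matching the CPU(s) line above. Printing online_cpus() from a small main() inside WSL 2 shows what a process there can actually see.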

However, when I run a parallel job with either Open MPI or MPICH, MPI uses only 32 cores (half of the physical cores). For this test I used the following code (copied from https://mpitutorial.com/tutorials/mpi-hello-world/):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}

Expected behavior

.
.
.
Hello world from processor Ubuntu, rank 10 out of 64 processors
Hello world from processor Ubuntu, rank 18 out of 64 processors
Hello world from processor Ubuntu, rank 23 out of 64 processors
.
.
.

Actual behavior

.
.
.
Hello world from processor Ubuntu, rank 10 out of 32 processors
Hello world from processor Ubuntu, rank 18 out of 32 processors
Hello world from processor Ubuntu, rank 23 out of 32 processors
.
.
.
@AAlMutairi
Author

Not sure if it is relevant, but I am experiencing the same issue in Hyper-V too.

@WSLUser

WSLUser commented Jun 19, 2020

It's the kernel config. Look at https://github.com/microsoft/WSL2-Linux-Kernel/tree/master/Microsoft. In the config for x86_64 you will see it's set to 64. This is standard from the Linux kernel. What you can do is update the config to match the number of cores you have. Ideally WSL would do a check on the number of CPU cores and update the config appropriately in .wslconfig. For now this is a manual process.

@AAlMutairi
Author

@WSLUser, thanks for the answer. Just to confirm: did you mean updating config-wsl? I couldn't find .wslconfig. If so, I believe the part of interest is the following:

CONFIG_NR_CPUS_RANGE_BEGIN=2
CONFIG_NR_CPUS_RANGE_END=512
CONFIG_NR_CPUS_DEFAULT=64
CONFIG_NR_CPUS=256

Are you suggesting that despite the range, WSL 2 uses the default value as a max value?

My apologies if I misunderstood your suggestion.
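One way to check whether the kernel's compiled CPU limit is the bottleneck: Linux exposes the highest CPU index the running kernel allows in /sys/devices/system/cpu/kernel_max, which equals NR_CPUS - 1. The C sketch below is an illustration and not something posted in the thread; kernel_nr_cpus is a hypothetical helper name. If it reports 256 inside WSL 2, then CONFIG_NR_CPUS=256 is the ceiling in effect, and CONFIG_NR_CPUS_DEFAULT=64 is only the value Kconfig proposes when none is chosen.

```c
#include <stdio.h>

/* Returns NR_CPUS of the running kernel, or -1 if it cannot be read.
   The sysfs file holds the highest allowed CPU index, i.e. NR_CPUS - 1. */
int kernel_nr_cpus(const char *path)
{
    FILE *f = fopen(path, "r");
    int max_index;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &max_index) != 1) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return max_index + 1;
}
```

A typical call would be kernel_nr_cpus("/sys/devices/system/cpu/kernel_max"). The path is a parameter only so the helper can be tried against a test file.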

@WSLUser

WSLUser commented Jun 19, 2020

You have to create .wslconfig. https://docs.microsoft.com/en-us/windows/wsl/release-notes#build-18945
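For reference, a minimal .wslconfig sketch. It assumes the file is saved as %UserProfile%\.wslconfig and uses the processors key of the [wsl2] section from the WSL documentation; the value 64 is only an example.

```ini
# %UserProfile%\.wslconfig
[wsl2]
# Number of logical processors to assign to the WSL 2 VM (example value)
processors=64
```

Run wsl --shutdown afterwards so the setting is picked up on the next start. Later comments in this thread report that this setting can lower the CPU count but does not raise it beyond the cap discussed here.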

@AAlMutairi
Author

AAlMutairi commented Jun 19, 2020

@WSLUser thank you again for the suggestion, and sorry for the misunderstanding on my part. I tried your method to change the number of processors. It works when I decrease the number of processors, but unfortunately it doesn't work past 64 processors (which is equivalent to 32 physical cores). It still seems to limit me to half the number of physical cores (64/2 = 32).

@sanastasiou

@WSLUser Does this also work for multiple CPUs, e.g. a dual Xeon setup?

@WSLUser

WSLUser commented Jun 19, 2020

Not sure. @craigloewen-msft would probably know better. In your case it appears the kernel config itself needs updating. You should be able to override the original value in .wslconfig as well. You should see the option in the release notes. And yes (sorry I didn't answer before), it's using the default value. So you'll overwrite it. I don't recommend going above 256.

@AAlMutairi
Author

@WSLUser Thanks for all the help. I guess I will wait for the kernel to be updated.
@benhillis @therealkenc, would you be able to let us know if such a fix to the kernel will be added to the next build?
@sanastasiou Did you have the chance to try the .wslconfig method?

@sanastasiou

@AAlMutairi Not yet; I'm not sure whether it applies to dual-CPU setups as well. If it does, I'll try.

@AAlMutairi
Author

Any updates or fixes to test?

@mozram

mozram commented Jun 30, 2020

It also affects compiling with make -j: only half of the CPUs are used, whereas WSL 1 does not have this issue. Ryzen 2600, Ubuntu 20.04, WSL 2.

@sanastasiou

This basically blocks any usage of WSL 2. Even if I check out my repo there, I lose 50% of my processing power. That's simply a no-go.

@AAlMutairi
Author

@mozram, it is surprising that it was working for you in WSL 1. Unfortunately, neither works for me.

@AAlMutairi
Author

@sanastasiou, hopefully any fix will work for both WSL 2 and Hyper-V, since the issue persists in both.

@AAlMutairi
Author

Interestingly, even when I used MPICH on Windows, it only sees 32 physical cores. I guess this issue isn't limited to WSL or Hyper-V.

@AAlMutairi
Author

Any updates?

@sanastasiou

Changing WSL config has 0 effect whatsoever. 2nd CPU is not recognized.

@AAlMutairi
Author

AAlMutairi commented Jul 13, 2020

I contacted AMD customer support about the issue and asked whether they have any fixes, but to no avail.

@onomatopellan

Ben said "I am already looking into this, AMD brought this to my attention as well."
So be patient.

@AAlMutairi
Author

Just to help narrow this down: the issue seems to affect the 3990X alone, since John from the AMD community tested WSL 2 on his 3970X and got the following results:
[image]

It shows that all 32 physical cores (next to "Core(s) per socket") and all 64 logical cores (next to "CPU(s)") were detected. Not sure how helpful this is, but I thought it might help.

@sanastasiou

Not quite true. I have a dual Xeon setup and it only detects one of the CPUs, so it doesn't affect only the 3990X.

@AAlMutairi
Author

@sanastasiou, my apologies, I meant that within the AMD Threadripper line only the 3990X is affected. By the way, did you test whether the same issue persists when you use Hyper-V? It does for me.

@ykim362
Member

ykim362 commented Jul 22, 2020

I have the same issue with Intel Xeon. I have two 6242R CPUs (2 sockets), and only 1 socket is available from WSL 2.

@AAlMutairi
Author

@ykim362 Which Windows version are you using? Do you have the same issue with Hyper-V?

@sanastasiou

@AAlMutairi how do I enable/how can I check this with Hyper-V?

@AAlMutairi
Author

@sanastasiou It is similar to WSL: you enable it through "Turn Windows features on or off", as shown here:
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/quick-start/enable-hyper-v#enable-the-hyper-v-role-through-settings

Then use "Hyper-V Quick Create", as shown here (based on an older Windows version, but it is still the same):
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/quick-start/quick-create-virtual-machine

I guess you can install an Ubuntu VM for now.

If you right-click the VM, you can open its settings and change the number of cores and sockets you want. Then you can run and test.

@ykim362
Member

ykim362 commented Jul 22, 2020

@AAlMutairi I was able to configure the number of virtual cores (2 x physical cores) with Hyper-V (on Windows 10 Enterprise). But I am not sure whether it's really using all CPUs or just virtually showing 2x more CPUs. It was 40 logical cores (20 physical cores) by default, and even after I increased the number to 80 logical cores, it still shows only 1 socket.

lscpu

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 40
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz
Stepping: 7
CPU MHz: 3092.733
BogoMIPS: 6185.46
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-79

@AAlMutairi
Author

@alice-comfy, would it be okay for you to share which version of Windows you are using? And regarding Hyper-V, have you tested that it actually uses all 96 physical cores you allocated? Windows can be a bit weird when it comes to virtualisation.

@alice-comfy

alice-comfy commented Nov 15, 2023

Windows 11 22h2 Build 22621.2715.

Ubuntu detects 96 threads when I send them through. I changed it to 192 and that works too.

Hyper-V:
192 CPUs
Threads per core: 2
Cores per socket: 96
WSL:
64 CPUs
Threads per core: 2
Cores per socket: 32

@alice-comfy

alice-comfy commented Nov 15, 2023

[image] Task Manager, while running y-cruncher on Linux, shows a bit over 50% utilization, which implies it's actually using more than 32 cores, despite WSL reporting 32 cores / 64 threads. I will test this again later with SMT off, but this should show that the scheduler is at least smart enough to give it 64 physical cores rather than 32 cores / 64 threads.

That matches the score difference between Windows and Linux: 56%. So the scheduler seems smart enough to avoid a negative SMT hit.

@mistergitj

alice-comfy, I got involved very early in this issue. Since you are running an AMD processor, I suggest you install and run Ryzen Master. It will tell you exactly what cores/threads are doing what. Enjoy, John.

@alice-comfy

Can't use Ryzen Master on EPYC (that ES is a 9654 96-core chip; I use it as a workstation CPU). I also tried Canary builds to see whether later versions of Windows had it fixed, and they hit the same snag. Windows Sandbox is also limited to 64 threads by default, but as shown above, Hyper-V works great with any number of cores.

From testing with VMs, it will use either 64 threads if you have >64 threads on a single socket, or the number of threads a single CPU has if you have a dual socket system.

paging @craigloewen-msft in case he has any insights here.
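The rule of thumb above can be written down as a small C function. This is only a rough model fitted to the configurations reported in this thread, not a description of how WSL is implemented; reported_wsl_threads and GROUP_LIMIT are names invented for the sketch, and the "whole machine fits in 64" branch is an assumption added to match the dual E5-2680 v4 report further down (56 CPUs visible).

```c
/* Windows places at most 64 logical processors in one processor group. */
#define GROUP_LIMIT 64

/* Thread count WSL 2 was reported to expose, as a rough model: everything
   if the whole machine fits in one group, otherwise one socket's threads,
   capped at the group limit. */
int reported_wsl_threads(int sockets, int threads_per_socket)
{
    int total = sockets * threads_per_socket;

    if (total <= GROUP_LIMIT)
        return total;
    return threads_per_socket < GROUP_LIMIT ? threads_per_socket : GROUP_LIMIT;
}
```

With the machines mentioned here: reported_wsl_threads(1, 128) gives 64 (Threadripper 3990X), reported_wsl_threads(2, 40) gives 40 (dual Xeon 6242R), and reported_wsl_threads(1, 192) gives 64 (EPYC 9654). Other topologies, such as four sockets, are not covered by any report in the thread.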

@alice-comfy

Hoping with this new release we might see more work on this issue.

https://www.phoronix.com/review/threadripper-7995wx-windows-linux

@uniartisan

Device name: Server
Processor: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz 2.40GHz (2 processors)
Installed RAM: 192 GB
Edition: Windows 11 Pro for Workstations
Version: 23H2
Installed on: 2023/12/5
OS build: 22631.2792
Experience: Windows Feature Experience Pack 1000.22681.1000.0
wsl --version
WSL version: 2.0.9.0
Kernel version: 5.15.133.1-1
WSLg version: 1.0.59
MSRDC version: 1.2.4677
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22631.2792

lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  56
  On-line CPU(s) list:   0-55
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
    CPU family:          6
    Model:               79
    Thread(s) per core:  2
    Core(s) per socket:  28
    Socket(s):           1
    Stepping:            1
    BogoMIPS:            4788.90
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi ept vpid ept_ad fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt flush_l1d arch_capabilities
Virtualization features:
  Virtualization:        VT-x
  Hypervisor vendor:     Microsoft
  Virtualization type:   full
Caches (sum of all):
  L1d:                   896 KiB (28 instances)
  L1i:                   896 KiB (28 instances)
  L2:                    7 MiB (28 instances)
  L3:                    35 MiB (1 instance)
Vulnerabilities:
  Gather data sampling:  Not affected
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown

htop

image

I have 2 sockets (2 CPUs), each with 14 cores and 28 threads. WSL 2 now recognizes all the cores, but it does not recognize the second NUMA node. As a result, all cores of the second NUMA node are identified as hyperthreads of the first one, which has a significant impact on performance. I think this is a big problem: although I can manually specify make -j56 to use all the threads, a program like PyTorch can only fully utilize the first processor, because the second one is treated as hyperthreads. Is there a way to fix this?

@xieyubo

xieyubo commented Dec 16, 2023

I think this is a WSL bug. I suspect wslservice.exe invokes the GetSystemInfo() or GetLogicalProcessorInformation() API to get the number of cores and passes that number to the HcsCreateComputeSystem() API. GetSystemInfo()/GetLogicalProcessorInformation() can only report the cores in the current CPU group. In Windows, a single CPU group holds at most 64 cores, so the VM created by HcsCreateComputeSystem() for WSL has at most 64 cores.

It would be better to invoke GetLogicalProcessorInformationEx() to get the number of cores. This API can report all cores across CPU groups. I have a hack to resolve this issue: xieyubo/WSL2@9bdce81

Now all cores work:

image
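The difference between a per-group count and an all-groups count can be seen with a short C sketch. It is an illustration only: whether wslservice.exe really makes these calls is the commenter's guess, and for brevity the sketch uses GetActiveProcessorCount(ALL_PROCESSOR_GROUPS) instead of the GetLogicalProcessorInformationEx() call suggested above. The function names are invented here, and the non-Windows branch exists only so the file compiles elsewhere.

```c
#ifdef _WIN32
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0601   /* GetActiveProcessorCount needs Windows 7+ */
#endif
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Logical processors in the current processor group only (at most 64),
   which is what GetSystemInfo() is documented to report. */
unsigned long processors_in_current_group(void)
{
#ifdef _WIN32
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    return info.dwNumberOfProcessors;
#else
    /* Fallback outside Windows: there are no processor groups. */
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (unsigned long)n : 1;
#endif
}

/* Logical processors summed over every processor group. */
unsigned long processors_in_all_groups(void)
{
#ifdef _WIN32
    return GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
#else
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (unsigned long)n : 1;
#endif
}
```

On a Windows host with more than 64 logical processors, the two functions would be expected to return different values, and the second one is the figure a VM would need in order to see every core.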

@alice-comfy

Absolutely amazing! How'd you get the code for the exe, I thought that was closed source?

@xieyubo

xieyubo commented Dec 16, 2023

> Absolutely amazing! How'd you get the code for the exe, I thought that was closed source?

I don't have the source code. That exe just invokes Windows APIs to create the VM. These APIs live in computecore.dll. I created a DLL with the same name and put it beside that exe, so the exe loads my DLL instead of the system DLL. My DLL in turn loads the system DLL, which gives me the chance to change the parameters before passing them on to the system DLL.

@Vincentzyx

> > Absolutely amazing! How'd you get the code for the exe, I thought that was closed source?
>
> I don't have the source code. That exe just invokes windows api to create VM. These apis exist in computecore.dll. I just create a dll with the same name and put it besides that exe. So that exe will load my dll instead of the system dll. In my dll, I will load system dll. So I have the chance to change the parameter and pass to the system dll.

Hi, have you tried stressing the CPU to check whether it can truly utilize all the cores? And maybe consider @-mentioning someone from the WSL team?

@xieyubo

xieyubo commented Dec 18, 2023

Yes, I believe it can truly utilize all the cores, but you need to change the Hyper-V scheduler type to "Core". I added instructions on how to do it: xieyubo/WSL2@41511c9

image

@AAlMutairi
Author

Amazing work @xieyubo. I do not have time to test it now, but I will share an update once I do. Looking forward to others testing this workaround.

@benhillis we would appreciate your feedback on this workaround.

@Vincentzyx

Vincentzyx commented Dec 18, 2023

[image]
[image]
I tested @xieyubo's solution on my 2x Intel Xeon E5-2696 v4: all 88 threads are detected.
@AAlMutairi

@AAlMutairi
Author

@Vincentzyx thank you for testing it. So it clearly works. I am excited to test it on my setup. Might test the performance of my simulations with this vs when I disable SMT.

@alice-comfy

[image] Can validate it works with the AMD EPYC as well. [image]

@Vincentzyx

@therealkenc Hi, someone seems to have solved this issue; see the messages above.

@geth03

geth03 commented Jan 25, 2024

Hi @xieyubo, I have been trying to follow your instructions. However, my WSL installation does not have a wslservice.exe, nor do I have a directory called WSL anywhere. I installed Ubuntu from the Windows Store, so the installation data is in Program Files\WindowsApps, but I cannot find a WSL directory or wslservice.exe anywhere.
It would be great if you could help me with this.

@xieyubo

xieyubo commented Feb 6, 2024

@geth03 maybe you can try to delete the WSL which you installed from the windows store, and re-install it from "Control Panel -> Programs -> Programs and Features -> Turn Windows features on or off -> Windows Subsystem for Linux"

@nguyentrangiabao05

@xieyubo Hi. I ran cmake for your hack (xieyubo/WSL2@9bdce81) and no errors occurred. However, I can't find computecore.dll in the build directory.

@xieyubo

xieyubo commented Feb 21, 2024

> cmake and no errors have occurred. However, I cant find the computecore.dll i

Did you do step 2? "Open build/Project.sln and generate a x64 release build." :)

@nguyentrangiabao05

I’m sorry, I feel so dumb because I can’t find the Project.sln file in the build folder. :(( It would be great if you can help me in this regard.
image

@xieyubo

xieyubo commented Feb 21, 2024

> I’m sorry, I feel so dumb because I can’t find the Project.sln file in the build folder. :(( It would be great if you can help me in this regard. image

You need to run cmake under Windows, and you need to have Visual Studio installed. The WSL service is a Windows process.

@nguyentrangiabao05

> > I’m sorry, I feel so dumb because I can’t find the Project.sln file in the build folder. :(( It would be great if you can help me in this regard. image
>
> You need run cmake udner widnows and you need have visual studio installed. WSL service is a windows process.

Thank you for your help. I have been running cmake in a Windows terminal. However, I can't find how to generate an x64 release build from Project.sln. Would you mind giving more instructions for this step?

@xieyubo

xieyubo commented Feb 22, 2024

> Thank you for your helping. I have been running the cmake under the Windows terminal. However i can find the method to generate a x64 release build from the Project.sln. Would you mind to give more instruction about this action?

You can open Project.sln in Visual Studio, choose Release and x64 on the toolbar, then select Build Solution from the Build menu.

image

@aramor

aramor commented Feb 29, 2024

> > Thank you for your helping. I have been running the cmake under the Windows terminal. However i can find the method to generate a x64 release build from the Project.sln. Would you mind to give more instruction about this action?
>
> You can launch Project.sln by Visual Studio, Chose Release and x64 on the toolbar, then click Build menu select Build Solution
>
> image

Can you please share the compiled DLL? I can't figure out how to compile it myself.

@niltecedu

Hey guys, I've got a dual Intel Xeon Gold 6430 with the same problem: it only utilises 64 out of my 128 cores. Any chance that this will be pushed upstream? We can't really compile from scratch as it's a corporate environment, so it is a slightly bigger issue for us, as it won't be approved in a package push.

@alice-comfy

I'm hoping for some positive news given David Cutler mentioned in his interview with Dave's Garage that he's rocking a 96 core PC as his personal system. Given the need for either a Hyper-V change, or to override the scheduler (from root to core), I don't think this will happen before the next major version (25H2? or 12)

@niltecedu

It's not just overriding the scheduler that's the issue; you also need that custom DLL to capture all cores. Hoping they fix it soon.
