Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Add x86_64 archspec support for Windows #3803

Merged
merged 4 commits into from
Feb 24, 2025

Conversation

jjerphan
Copy link
Member

@jjerphan jjerphan commented Feb 5, 2025

Fix #3795.

Verified

This commit was signed with the committer’s verified signature.
jjerphan Julien Jerphanion
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
@jjerphan jjerphan added the release::bug_fixes For PRs fixing bugs label Feb 5, 2025
@jjerphan jjerphan requested a review from Klaim February 5, 2025 12:37
@jjerphan jjerphan marked this pull request as ready for review February 5, 2025 12:37
@intentionally-left-nil
Copy link

Hi Mamba folks, is there any particular testing that would be helpful to get this PR over the line? Or is it just waiting for someone to review it

@jjerphan
Copy link
Member Author

@intentionally-left-nil: we need someone to test and review it, ideally on several Windows machines with different architectures.

We welcome your feedback greatly! 🙂

Copy link
Member

@JohanMabille JohanMabille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM although having feedback from people who can test it would be nice.

The nitpicking comments can be ignored if you prefer to keep the exhaustive check list for each version of x86_64.

bool avx512cd = cpu_info[1] & (1 << 28);
bool avx512bw = cpu_info[1] & (1 << 30);
bool avx512vl = cpu_info[1] & (1 << 31);
if (avx512f && avx512dq && avx512cd && avx512bw && avx512vl)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking: avx512 dq/cd/bw/vl are all implemented on top of avx512f, so no need to check for this value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but I propose to keep the logic in line with the one for unix system as currently implemented a few lines above because it is cheap and it does not make assumption (also, I would take time to make sure that the above is correctly implemented, because more instructions seem to be present for each architecture).

bool bmi2 = cpu_info[1] & (1 << 8);
bool fma_ = cpu_info[1] & (1 << 12);
bool avx = cpu_info[1] & (1 << 28);
if (bmi && avx2 && bmi2 && fma_ && avx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar remark here: avx2 is implmeented on top of avx, so no need to check for avx.

bool sse41 = cpu_info[1] & (1 << 19);
bool sse42 = cpu_info[1] & (1 << 20);
bool popcnt = cpu_info[1] & (1 << 23);
if (sse3 && ssse3 && sse41 && sse42 && popcnt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar remark here: sse42 implies sse41 which implies ssse3 which implies sse3,so checking sse42 should be enough (I haven't checked for popcnt).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark here, the support for all these instructions should be read from cpu_info[2] after call to cpuid with eax=1.

Verified

This commit was signed with the committer’s verified signature.
jjerphan Julien Jerphanion
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>

Co-authored-by: Johan Mabille <johan.mabille@gmail.com>
@Klaim
Copy link
Member

Klaim commented Feb 13, 2025

I tested on my Windows machine, it has an AMD Ryzen 9 5900X , running Windows 11.

I built this branch locally using Visual Studio (preview) v17.13.0-pre.2.1, compiler version 19.43.34618 for x64.
I obtained:

mamba info -v

       libmamba version : 2.0.6
          mamba version : 2.0.6
           curl version : libcurl/8.11.1 Schannel zlib/1.3.1 libssh2/1.11.1
     libarchive version : libarchive 3.7.7 zlib/1.3.1 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
       envs directories : C:\Users\Klaim\micromamba\envs
          package cache : C:\Users\Klaim\micromamba\pkgs
                          C:\Users\Klaim\.mamba\pkgs
                          C:\Users\Klaim\AppData\Roaming\.mamba\pkgs
            environment : mamba-pr-testing (active)
           env location : C:\Users\Klaim\micromamba\envs\mamba-pr-testing
      user config files : C:\Users\Klaim\.mambarc
 populated config files :
       virtual packages : __win=10.0.22631=0
                          __archspec=1=x86_64_v2
                          __cuda=12.7=0
               channels : https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/conda-forge/win-64
       base environment : C:\Users\Klaim\micromamba
               platform : win-64

I was told __archspec should be v3 for this case? (I have no idea what is the correct value for this cpu)

@jjerphan
Copy link
Member Author

AMD Ryzen 9 5900X qualifies for x86_64_v3.

@nbeerbower
Copy link

Perhaps I compiled something wrong but my results on Windows 11 with an i9-13950HX:


       libmamba version : 2.0.6.rc2
          mamba version : 2.0.6.rc2
           curl version : libcurl/8.11.1 Schannel zlib/1.3.1 libssh2/1.11.1
     libarchive version : libarchive 3.7.7 zlib/1.3.1 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
       envs directories : C:\Users\Nick\.local\share\mamba\envs
          package cache : C:\Users\Nick\.local\share\mamba\pkgs
                          C:\Users\Nick\.mamba\pkgs
                          C:\Users\Nick\AppData\Roaming\.mamba\pkgs
            environment : build_env (active)
           env location : C:\Users\Nick\.local\share\mamba\envs\build_env
      user config files : C:\Users\Nick\.mambarc
 populated config files :
       virtual packages : __win=10.0.26100=0
                          __archspec=1=x86_64
                          __cuda=12.4=0
               channels : https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/conda-forge/win-64
       base environment : C:\Users\Nick\.local\share\mamba
               platform : win-64

Compiler: Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30154 for x64

@JohanMabille
Copy link
Member

@nbeerbower I think you're testing the wrong binary, the output version is 2.0.6.rc2, while it should be 2.0.6.

if (cpu_info[0] < 7)
{
return "x86_64";
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of this, eax>7 is required to get supported instructions for v4, but instructions for lower versions (such as v3) are checked with a lower value of eax (see my other comment below).

bool avx2 = cpu_info[1] & (1 << 5);
bool bmi2 = cpu_info[1] & (1 << 8);
bool fma_ = cpu_info[1] & (1 << 12);
bool avx = cpu_info[1] & (1 << 28);
Copy link
Member

@JohanMabille JohanMabille Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The support for fma and avx should be checked after a call to cpuid with eax = 1, and read from ecx (i.e. cpu_info[2]), see https://en.wikipedia.org/wiki/CPUID#EAX=1:_Processor_Info_and_Feature_Bits

Sorry for not catching it during the first review.

bool sse41 = cpu_info[1] & (1 << 19);
bool sse42 = cpu_info[1] & (1 << 20);
bool popcnt = cpu_info[1] & (1 << 23);
if (sse3 && ssse3 && sse41 && sse42 && popcnt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark here, the support for all these instructions should be read from cpu_info[2] after call to cpuid with eax=1.

jjerphan and others added 2 commits February 14, 2025 10:17

Verified

This commit was signed with the committer’s verified signature.
jjerphan Julien Jerphanion
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>

Co-authored-by: Johan Mabille <johan.mabille@gmail.com>

Verified

This commit was signed with the committer’s verified signature.
jjerphan Julien Jerphanion
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
@nbeerbower
Copy link

My bad! I cloned the fork repo but didn't checkout the PR branch.

       libmamba version : 2.0.6
          mamba version : 2.0.6
           curl version : libcurl/8.11.1 Schannel zlib/1.3.1 libssh2/1.11.1
     libarchive version : libarchive 3.7.7 zlib/1.3.1 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
       envs directories : C:\Users\Nick\.local\share\mamba\envs
          package cache : C:\Users\Nick\.local\share\mamba\pkgs
                          C:\Users\Nick\.mamba\pkgs
                          C:\Users\Nick\AppData\Roaming\.mamba\pkgs
            environment : build_env (active)
           env location : C:\Users\Nick\.local\share\mamba\envs\build_env
      user config files : C:\Users\Nick\.mambarc
 populated config files :
       virtual packages : __win=10.0.26100=0
                          __archspec=1=x86_64_v3
                          __cuda=12.4=0
               channels : https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/conda-forge/win-64
       base environment : C:\Users\Nick\.local\share\mamba
               platform : win-64

Looks like it works perfectly now! Let me know if you'd like me to test anything else.

@jjerphan
Copy link
Member Author

Thank you for testing, @nbeerbower.

I would at least wait for another review from Joël who also uses Windows (it might take some time because this PR is not the most urgent one regarding work on Windows).

@intentionally-left-nil
Copy link

@jjerphan thanks for the updates. If the main reason is just for extra hardware coverage, is there any scenario in particular? We can try to dig up more machines that match the desired test case.

Otherwise if your main goal is to have more PR reviews, then we'll just patiently wait :)

@jjerphan
Copy link
Member Author

Is there any scenario in particular?

Ideally all the four x84_64 versions must be tested.

@Klaim
Copy link
Member

Klaim commented Feb 19, 2025

Here is my result with the current version:

$ mamba info -v

       libmamba version : 2.0.6
          mamba version : 2.0.6
           curl version : libcurl/8.11.1 Schannel zlib/1.3.1 libssh2/1.11.1
     libarchive version : libarchive 3.7.7 zlib/1.3.1 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
       envs directories : C:\Users\Klaim\micromamba\envs
          package cache : C:\Users\Klaim\micromamba\pkgs
                          C:\Users\Klaim\.mamba\pkgs
                          C:\Users\Klaim\AppData\Roaming\.mamba\pkgs
            environment : mamba-pr-testing (active)
           env location : C:\Users\Klaim\micromamba\envs\mamba-pr-testing
      user config files : C:\Users\Klaim\.mambarc
 populated config files :
       virtual packages : __win=10.0.22631=0
                          __archspec=1=x86_64_v3
                          __cuda=12.7=0
               channels : https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/conda-forge/win-64
       base environment : C:\Users\Klaim\micromamba
               platform : win-64

If my understanding is correct this is the expected output? You'll have to confirm as I dont know much on the cpu capability flags subjects.

@jjerphan
Copy link
Member Author

If my understanding is correct this is the expected output? You'll have to confirm as I dont know much on the cpu capability flags subjects.

Yes, it is correct.

@jjerphan jjerphan merged commit cdcec32 into mamba-org:main Feb 24, 2025
34 checks passed
@jjerphan jjerphan deleted the fix/windows-x86_64-archspec branch February 24, 2025 11:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release::bug_fixes For PRs fixing bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Archspec not implemented for Windows
5 participants