CUDA 11.6 added the `nvcc -arch=native` flag to automatically compile
for the host GPUs' architectures. Add support for specifying this
special `native` value in `CMAKE_CUDA_ARCHITECTURES` and
`CUDA_ARCHITECTURES`. During the compiler ABI detection step,
detect the native architectures so we can pass them explicitly
when using Clang or older versions of nvcc.
Fixes: #22375
Skip the architecture verification check for these values on Visual
Studio. It cannot be implemented correctly until future work delays the
check to the main compiler test step.
Issue: #23164, #23161
Changes in commit 8f64df0a7c (CUDA: Generic all and all-major support,
2021-12-19, v3.23.0-rc1~23^2) broke our architecture verification checks
when using `-arch={all,all-major}` with NVCC 11.5+. If we test the
compiler with `-arch={all,all-major}`, we have no expected list of
architectures, so skip the check.
Fixes: #23278
Refactor the logic checking `CMAKE_CUDA_ARCHITECTURES` special values.
Switch on the value first, and then make other decisions for each case.
This makes room for other special values to be added later.
Commit 14d8a276 (CUDA: Support nvcc 11.5 new -arch=all|all-major flags,
2021-08-17) added all and all-major options to CUDA_ARCHITECTURES. These are
fairly generic and likely to see real-world use by distributors. Thus it's
desirable to support these also for Clang and older NVCC versions.
The supported architectures are dependent on the toolkit version. We determine
the toolkit version prior to compiler detection. For NVCC we get the version
from the vendor identification output, but for Clang we need to invoke NVCC
separately.
The architecture information is mostly based on the Wikipedia list with the
earliest supported version being CUDA 7.0. This could be documented and
expanded in the future to allow projects to query CUDA toolkit version and
architecture information.
For Clang we additionally constrain based on its support.
Additionally the architecture mismatch detection logic is fixed, improved and
updated for generic support:
* Commit 01428c55 (CUDA: Fail fast if CMAKE_CUDA_ARCHITECTURES doesn't work
during detection, 2020-08-29) enabled CMAKE_CUDA_COMPILER_ID_REQUIRE_SUCCESS
if CMAKE_CUDA_ARCHITECTURES is specified. This results in
CMakeDetermineCompilerID.cmake printing the compiler error and our code for
presenting the mismatch in a user-friendly way being useless. The custom
logic seems preferable so go back to not enabling it.
* Commit 14d8a276 (CUDA: Support nvcc 11.5 new -arch=all|all-major flags,
2021-08-17) tried to support CMP0054 but forgot to add x to the interpolated
result. Thus the conditions would always evaluate to false. This is fixed as
a byproduct of removing NVIDIA specific checks, improving the error message
and replacing architectures_mode with a simpler architectures_explicit.
Visual Studio support omits testing the flags during detection due to
complexities in determining the toolkit version when using it.
A long-term proper implementation would be #23161.
Implements #22860.
We require CUDA_ARCHITECTURES to be set for targets (see CMP0104). If not set
anything after compiler detection such as ABI detection will fail to generate.
This means we need to error if CMAKE_CUDA_ARCHITECTURES is not set to a valid
value as a result of compiler detection.
Currently we fail to issue the error if compiler detection failed and the ID is
unset. In such a case we won't define detected_architecture making the code
responsible for the error unreachable.
Simplify the detection of architectures used during compiler detection by
always detecting all of them, which enables us to simply the check in the
"default to compiler" path if CMAKE_CUDA_ARCHITECTURES is empty.
As a result we need to move the error checking and CMAKE_CUDA_ARCHITECTURES=OFF
handling fully into the default path thus simplifying the code and unifying the
code paths for NVCC and CUDA.
This also happens to fix:
1. CMAKE_CUDA_ARCHITECTURES=OFF on Clang.
2. A theoretical issue of a compiler defaulting to multiple architectures.
I've additionally added printing of the compiler output along the error to
better reveal possible underlying compiler/system configuration issues.
Fixes#23010.
If empty we otherwise treat it the same as unset in most places, but still end
up failing eventually with a confusing "Failed to find a working CUDA
architecture".
This also detects some other basic invalid ones (e.g. "al").
The code now matches what the comment describes.
This mistake seems to have been present since the initial introduction in
commit 5df21adf (CUDA: Add support for Clang compiler, 2020-05-07).
Depending on the CUDA installation the nvvm/libdevice folder can
be in different locations. Since `nvcc` outputs the expected location use that when possible.
The original regular expression was greedy and would match any
environment variable ending with `TOP` (like `DESKTOP`). This is an
issue on windows where `nvcc -v` would output all environment variables
before the compiler's verbose output.
To resolve this issue we use a tighter match algorithm that looks
for `#$ TOP=` instead of `TOP=`.
Fixes: #22158
Previously, nvcc needed to be present in PATH or specified by CUDACXX.
On Windows with vcpkg, the PATH is heavily modified, which lead to
nvcc not being found with the Ninja generator.
Since commit fb2afef620 (CUDA: Support nvcc symlinking to ccache,
2021-01-07) and commit 3cef91a321 (CUDA: Always extract CUDA Toolkit
root from nvcc verbose output, 2021-02-03) we always run the command
`nvcc -v __cmake_determine_cuda` to look for the toolkit root in its
stderr. On Windows, that command may print to stdout instead, so
capture that as well.
Fixes#21750, #21763
Given that NVCC can be provided by multiple different sources (NVIDIA HPC SDK, CUDA Toolkit, distro)
each of which has a different layout, we need to extract the CUDA toolkit root from the compiler
itself, allowing us to support numerious different scattered toolkit layouts.
The NVIDIA HPC SDK specifically ships two copies of nvcc one in
`compilers/bin/` and one in `cuda/bin`. Thus when using
`compilers/bin/nvcc` the Toolkit root logic fails.
NVCC's default architecture may be newer than the one supported by the
machine's GPU.
In such cases it's useful to have an environment variable for initializing
CMAKE_CUDA_ARCHITECTURES to avoid specifying it for every invocation.
Finding the toolkit is required for Clang and is assumed to have been correctly
found by FindCUDAToolkit if the CUDA language is found.
Error out early with an useful error instead of failing later on due to the
path not being set.
version.txt is gone from CUDA 11.1 installations, but the rest is the same.
Instead of looking for version.txt look for <CUDA path>/nvvm/libdevice, which
is the main thing that Clang requires (though it also checks for the existence
of bin and include).
Fixes#21353.
For NVCC the compiler takes care of device linking when passed the "-dlink"
flag.
Clang doesn't support such magic and requires the buildsystem to do the work
that NVCC does behind the scenes.
The implementation is based on Bazel's device linking documentation:
7cabcdf073/third_party/nccl/build_defs.bzl.tpl (L259)Closes: #20726
Previously when CMAKE_CROSSCOMPILING was ON we'd end up not setting the target
directory if the non-scattered one didn't exist.
Fix this by assuming a scattered installation if the target directory isn't set
after the crosscompiling logic.
If an user specified a host compiler we should fail if we are unable to perform
compiler detection with it.
Previously we would try without and likely succeed and continue. Then we'd fail
during ABI detection and compiler testing since we'd still try to use it.
This is particularly problematic when crosscompiling since we extract the host
linker from the compiler detection link line. This would result in the wrong
host linker being used and a linking error due to architecture mismatch during
ABI detection where other necessary flags may already be present to make the
host compiler work. See #21076 for an example.
Fix this by adding CMAKE_<LANG>_COMPILER_ID_REQUIRE_SUCCESS to
CMakeDetermineCompilerId, which throws a fatal error if executing the compiler
results in a non-zero exit code.
Fixes#21120.
Capture CMAKE_<LANG>_COMPILER_ARG1 from CMAKE_<LANG>_COMPILER in the
same fashion that it is from $ENV{<LANG>}.
Since get_filename_component() returns a single string of all the
arguments from $ENV{<LANG>}, a single string of arguments will be
constructed from the items contained in CMAKE_<LANG>_COMPILER.
Fixes#20089
When using a Visual Studio generator, we do not know the path to
`CMAKE_CUDA_COMPILER` until after the compiler id detection project
extracts it. However, based on the VS toolchain configuration we know
that the compiler id will be NVIDIA, so use that to drive the CUDA
architecture flag logic. After the main compiler id step we know
`CMAKE_CUDA_COMPILER`, so `CMAKE_CUDA_COMPILER_TOOLKIT_ROOT` can then be
computed.
The result variable was `CUDAToolkit_NVCC_EXECUTABLE` instead of
`_CUDA_NVCC_EXECUTABLE`, which the rest of the code used. Also, we can
simply `set()` the `_CUDA_NVCC_EXECUTABLE` value without searching.
Fixes: #20847
The ability to disable adding architectures completely for packaging purposes
and cases requiring passing the architectures flags explicitly has been
requested.
Support a false value for CUDA_ARCHITECTURES and CMAKE_CUDA_ARCHITECTURES
for this purpose.
Implements #20821.
Clang isn't very good at finding the installed CUDA toolkit.
The upstream recommendation is that we should pass the toolkit explicitly.
Additionally:
* Avoids Clang having to search for the toolkit on every invocation.
* Allows the user to use a toolkit from a non-standard location by simply
setting CUDAToolkit_ROOT. The same way as with FindCUDAToolkit.
Clang wants the directory containing the device library and version.txt as the
toolkit path.
We thus pass the newly introduced CUDAToolkit_LIBRARY_ROOT as the toolkit path.
We save CUDAToolkit_ROOT_DIR and CUDAToolkit_LIBRARY_ROOT on Clang to have them
available in try_compile() and avoid unnecessary re-searching or a possibly
different installation being found in FindCUDAToolkit.
This however means that the selected toolkit can't be changed after the initial
language enablement.
We now determine CUDA compiler ID before doing actual detection, as we don't
want to spend time finding the CUDA toolkit for NVIDIA.
Implements #20754.
We can avoid searching for this since CUDAToolkit_TARGET_DIR always contains
the include/ directory. But add a warning just in case.
Also apply this in CMakeDetermineCUDACompiler for Clang code.
On scattered installations version.txt and nvvm are located at this location.
This may be useful to users and will allow us in the future to parse
version.txt instead of invoking nvcc to figure out the CUDA toolkit version.
We also add it to CMakeDetermineCUDACompiler in preparation for future use by
Clang code.