When CUDA is enabled, and a pure non-CUDA target has
CMAKE_CUDA_SEPARABLE_COMPILATION enabled, don't actually perform
the device linking step, as it will fail. A target that has
CMAKE_CUDA_SEPARABLE_COMPILATION enabled must also have CUDA
usage (either itself, or something it links to).
Fixes: #20182
Checks added in commit 81b4d10d8f (CUDA: More exhaustive checks to
determine when to do device linking, 2019-05-09, v3.15.0-rc1~82^2)
assumed that CUDA properties would be set only if CUDA is enabled.
We cannot do a device link step if we do not have the CUDA language
enabled. This was discovered as some projects unconditionally set CUDA
properties such as `CUDA_RESOLVE_DEVICE_SYMBOLS` even when the CUDA
language has not been enabled.
Fixes: #19432
Convert the `CudaOnly.LinkSystemDeviceLibraries` test to a new
`Cuda.ProperDeviceLibraries` test. The former covered only the
`cublas_device` library which is removed by CUDA 10. Extend the new
test to also cover various cases of using threads.
Issue: #18008
Run the `clang-format.bash` script to update all our C and C++ code to a
new style defined by `.clang-format`. Use `clang-format` version 6.0.
* If you reached this commit for a line in `git blame`, re-run the blame
operation starting at the parent of this commit to see older history
for the content.
* See the parent commit for instructions to rebase a change across this
style transition commit.
When C++ features require a certain C++/CUDA level, verify or update the
standard level target property for each language independently.
While at it, add missing rejection of invalid `CUDA_STANDARD` property
values.
Co-Author: Brad King <brad.king@kitware.com>
Fixes: #17519
On Windows with MSVC-like host compilers we must honor the standard
libraries chosen by the `Platform/Windows-MSVC` module. Otherwise C
code linked into the CUDA binary that expects to have these libraries
available may not link.
Add `.clang-format` configuration files for Cuda test directories that
use `Standard: Cpp11`. Otherwise clang-format splits the triple angle
brackets used for CUDA kernels.
As kernel launches are asynchronous, a `cudaGetLastError()` right after
the kernel launch might be executed while the kernel is still running.
Synchronizing the device will ensure that all the work is completed
before progressing further on, and allows to catch errors that were
previously missed.
The `cudaGetLastError()` after the `cudaDeviceSynchronize()` is there
to reset the error variable to `cudaSuccess`.