The previous version had two bugs that caused the JIT runtime errors:
1. It built the executable without separable compilation enabled.
2. All kernel launches fail if any kernel is missing a symbol, which
is why the call to file2_launch_kernel had to be removed.
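
As a rough illustration of the first fix, separable compilation can be
enabled on an executable through the CUDA_SEPARABLE_COMPILATION target
property. The project, target, and file names below are hypothetical
and are not taken from the actual test:

```cmake
cmake_minimum_required(VERSION 3.9)
project(SeparableExample LANGUAGES CUDA)

# Hypothetical target and source names, for illustration only.
add_executable(CudaApp main.cu file1.cu file2.cu)

# Compile device code as relocatable objects so that a kernel in one
# file can reference device symbols defined in another file.
set_target_properties(CudaApp PROPERTIES
  CUDA_SEPARABLE_COMPILATION ON)
```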
If a static library has the CUDA_RESOLVE_DEVICE_SYMBOLS property
enabled, it now performs the device link step itself. The default
behavior is to delay the device link step until the static library is
consumed by a shared library or an executable.
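
A minimal sketch of how this might be used, assuming a static library
built with separable compilation. The project, target, and file names
are hypothetical:

```cmake
cmake_minimum_required(VERSION 3.9)
project(ResolveExample LANGUAGES CXX CUDA)

# Hypothetical static library containing device code.
add_library(CudaStatic STATIC file1.cu file2.cu)

set_target_properties(CudaStatic PROPERTIES
  CUDA_SEPARABLE_COMPILATION ON
  # Request the device link step when the static library itself is
  # built, instead of deferring it to the consuming target.
  CUDA_RESOLVE_DEVICE_SYMBOLS ON)

# Hypothetical consumer of the static library.
add_executable(CudaApp main.cpp)
target_link_libraries(CudaApp PRIVATE CudaStatic)
```

Leaving CUDA_RESOLVE_DEVICE_SYMBOLS unset on the static library keeps
the default, deferred device link described above.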