mirror of
https://github.com/Kitware/CMake.git
synced 2026-03-19 02:10:27 -05:00
Help: Add documentation for CTest hardware allocation
@@ -17,6 +17,7 @@ Perform the :ref:`CTest Test Step` as a :ref:`Dashboard Client`.
|
|||||||
[EXCLUDE_FIXTURE_SETUP <regex>]
|
[EXCLUDE_FIXTURE_SETUP <regex>]
|
||||||
[EXCLUDE_FIXTURE_CLEANUP <regex>]
|
[EXCLUDE_FIXTURE_CLEANUP <regex>]
|
||||||
[PARALLEL_LEVEL <level>]
|
[PARALLEL_LEVEL <level>]
|
||||||
|
[HARDWARE_SPEC_FILE <file>]
|
||||||
[TEST_LOAD <threshold>]
|
[TEST_LOAD <threshold>]
|
||||||
[SCHEDULE_RANDOM <ON|OFF>]
|
[SCHEDULE_RANDOM <ON|OFF>]
|
||||||
[STOP_TIME <time-of-day>]
|
[STOP_TIME <time-of-day>]
|
||||||
@@ -82,6 +83,11 @@ The options are:
|
|||||||
Specify a positive number representing the number of tests to
|
Specify a positive number representing the number of tests to
|
||||||
be run in parallel.
|
be run in parallel.
|
||||||
|
|
||||||
|
``HARDWARE_SPEC_FILE <file>``
|
||||||
|
Specify a
|
||||||
|
:ref:`hardware specification file <ctest-hardware-specification-file>`. See
|
||||||
|
:ref:`ctest-hardware-allocation` for more information.
|
||||||
|
|
||||||
``TEST_LOAD <threshold>``
|
``TEST_LOAD <threshold>``
|
||||||
While running tests in parallel, try not to start tests when they
|
While running tests in parallel, try not to start tests when they
|
||||||
may cause the CPU load to pass above a given threshold. If not
|
may cause the CPU load to pass above a given threshold. If not
|
||||||
|
|||||||
@@ -414,6 +414,7 @@ Properties on Tests
|
|||||||
/prop_test/LABELS
|
/prop_test/LABELS
|
||||||
/prop_test/MEASUREMENT
|
/prop_test/MEASUREMENT
|
||||||
/prop_test/PASS_REGULAR_EXPRESSION
|
/prop_test/PASS_REGULAR_EXPRESSION
|
||||||
|
/prop_test/PROCESSES
|
||||||
/prop_test/PROCESSOR_AFFINITY
|
/prop_test/PROCESSOR_AFFINITY
|
||||||
/prop_test/PROCESSORS
|
/prop_test/PROCESSORS
|
||||||
/prop_test/REQUIRED_FILES
|
/prop_test/REQUIRED_FILES
|
||||||
|
|||||||
@@ -90,6 +90,15 @@ Options
|
|||||||
|
|
||||||
See `Label and Subproject Summary`_.
|
See `Label and Subproject Summary`_.
|
||||||
|
|
||||||
|
``--hardware-spec-file <file>``
|
||||||
|
Run CTest with :ref:`hardware allocation <ctest-hardware-allocation>` enabled,
|
||||||
|
using the
|
||||||
|
:ref:`hardware specification file <ctest-hardware-specification-file>`
|
||||||
|
specified in ``<file>``.
|
||||||
|
|
||||||
|
When ``ctest`` is run as a `Dashboard Client`_ this sets the
|
||||||
|
``HardwareSpecFile`` option of the `CTest Test Step`_.
|
||||||
|
|
||||||
``--test-load <level>``
|
``--test-load <level>``
|
||||||
While running tests in parallel (e.g. with ``-j``), try not to start
|
While running tests in parallel (e.g. with ``-j``), try not to start
|
||||||
tests when they may cause the CPU load to pass above a given threshold.
|
tests when they may cause the CPU load to pass above a given threshold.
|
||||||
@@ -958,6 +967,11 @@ Arguments to the command may specify some of the step settings.
|
|||||||
|
|
||||||
Configuration settings include:
|
Configuration settings include:
|
||||||
|
|
||||||
|
``HardwareSpecFile``
|
||||||
|
Specify a
|
||||||
|
:ref:`hardware specification file <ctest-hardware-specification-file>`. See
|
||||||
|
:ref:`ctest-hardware-allocation` for more information.
|
||||||
|
|
||||||
``LabelsForSubprojects``
|
``LabelsForSubprojects``
|
||||||
Specify a semicolon-separated list of labels that will be treated as
|
Specify a semicolon-separated list of labels that will be treated as
|
||||||
subprojects. This mapping will be passed on to CDash when configure, test or
|
subprojects. This mapping will be passed on to CDash when configure, test or
|
||||||
@@ -1267,6 +1281,221 @@ model is defined as follows:
|
|||||||
Test properties.
|
Test properties.
|
||||||
Can contain keys for each of the supported test properties.
|
Can contain keys for each of the supported test properties.
|
||||||
|
|
||||||
|
.. _`ctest-hardware-allocation`:
|
||||||
|
|
||||||
|
Hardware Allocation
|
||||||
|
===================
|
||||||
|
|
||||||
|
CTest provides a mechanism for tests to specify the hardware that they need and
|
||||||
|
how much of it they need, and for users to specify the hardware available on
|
||||||
|
the running machine. This allows CTest to internally keep track of which
|
||||||
|
hardware is in use and which is free, scheduling tests in a way that prevents
|
||||||
|
them from trying to claim hardware that is not available.
|
||||||
|
|
||||||
|
A common use case for this feature is for tests that require the use of a GPU.
|
||||||
|
Multiple tests can simultaneously allocate memory from a GPU, but if too many
|
||||||
|
tests try to do this at once, some of them will fail to allocate, resulting in
|
||||||
|
a failed test, even though the test would have succeeded if it had the memory
|
||||||
|
it needed. By using the hardware allocation feature, each test can specify how
|
||||||
|
much memory it requires from a GPU, allowing CTest to schedule tests in a way
|
||||||
|
that running several of these tests at once does not exhaust the GPU's memory
|
||||||
|
pool.
|
||||||
|
|
||||||
|
Please note that CTest has no concept of what a GPU is or how much memory it
|
||||||
|
has, nor does it have any way of communicating with a GPU to retrieve this
|
||||||
|
information or perform any memory management. CTest simply keeps track of a
|
||||||
|
list of abstract resource types, each of which has a certain number of slots
|
||||||
|
available for tests to use. Each test specifies the number of slots that it
|
||||||
|
requires from a certain resource, and CTest then schedules them in a way that
|
||||||
|
prevents the total number of slots in use from exceeding the listed capacity.
|
||||||
|
When a test is executed, and slots from a resource are allocated to that test,
|
||||||
|
tests may assume that they have exclusive use of those slots for the duration
|
||||||
|
of the test's process.
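
The slot bookkeeping described above can be illustrated with a short Python
sketch. This is not CTest's implementation, and the source does not describe
the algorithm CTest actually uses; the ``allocate`` function, the backtracking
search, and the capacities below are assumptions made purely for illustration.
Each requirement asks for some number of slots from a single resource of a
given type.

.. code-block:: python

  # Hypothetical model of slot accounting; not CTest's actual algorithm.
  def allocate(free, requirements):
      """Try to satisfy every (type, slots) requirement from one resource each.

      ``free`` maps (type, id) to the number of free slots and is updated in
      place on success. Returns a list of (type, id, slots) or None.
      """
      if not requirements:
          return []
      (rtype, need), rest = requirements[0], requirements[1:]
      for (candidate_type, rid), available in free.items():
          if candidate_type == rtype and available >= need:
              free[(candidate_type, rid)] -= need      # claim the slots
              tail = allocate(free, rest)
              if tail is not None:
                  return [(rtype, rid, need)] + tail
              free[(candidate_type, rid)] += need      # backtrack
      return None

  # Assumed capacities: four GPUs and one cryptography chip.
  free = {
      ("gpus", "0"): 2,
      ("gpus", "1"): 4,
      ("gpus", "2"): 2,
      ("gpus", "3"): 1,
      ("crypto_chips", "card0"): 4,
  }
  requirements = [("gpus", 2), ("gpus", 2), ("gpus", 4), ("gpus", 1),
                  ("crypto_chips", 2)]
  result = allocate(free, requirements)
  print(result)

A purely greedy first-fit choice would place the second ``gpus:2`` request on
the 4-slot GPU and then fail to place ``gpus:4``, which is why this sketch
backtracks.
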
|
||||||
|
|
||||||
|
The CTest hardware allocation feature consists of two inputs:
|
||||||
|
|
||||||
|
* The :ref:`hardware specification file <ctest-hardware-specification-file>`,
|
||||||
|
described below, which describes the hardware resources available on the
|
||||||
|
system, and
|
||||||
|
* The :prop_test:`PROCESSES` property of tests, which describes the resources
|
||||||
|
required by the test
|
||||||
|
|
||||||
|
When CTest runs a test, the hardware allocated to that test is passed in the
|
||||||
|
form of a set of
|
||||||
|
:ref:`environment variables <ctest-hardware-environment-variables>` as
|
||||||
|
described below. Using this information to decide which resource to connect to
|
||||||
|
is left to the test writer.
|
||||||
|
|
||||||
|
Please note that these processes are not spawned by CTest. The ``PROCESSES``
|
||||||
|
property merely tells CTest what processes the test expects to launch. It is up
|
||||||
|
to the test itself to do this process spawning, and read the :ref:`environment
|
||||||
|
variables <ctest-hardware-environment-variables>` to determine which resources
|
||||||
|
each process has been allocated.
|
||||||
|
|
||||||
|
.. _`ctest-hardware-specification-file`:
|
||||||
|
|
||||||
|
Hardware Specification File
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
The hardware specification file is a JSON file which is passed to CTest, either
|
||||||
|
on the :manual:`ctest(1)` command line as ``--hardware-spec-file``, or as the
|
||||||
|
``HARDWARE_SPEC_FILE`` argument of :command:`ctest_test`. The hardware
|
||||||
|
specification file must be a JSON object. All examples in this document assume
|
||||||
|
the following hardware specification file:
|
||||||
|
|
||||||
|
.. code-block:: json
|
||||||
|
|
||||||
|
{
|
||||||
|
"local": [
|
||||||
|
{
|
||||||
|
"gpus": [
|
||||||
|
{
|
||||||
|
"id": "0",
|
||||||
|
"slots": 2
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "1",
|
||||||
|
"slots": 4
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "2",
|
||||||
|
"slots": 2
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "3"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"crypto_chips": [
|
||||||
|
{
|
||||||
|
"id": "card0",
|
||||||
|
"slots": 4
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
|
||||||
|
The members are:
|
||||||
|
|
||||||
|
``local``
|
||||||
|
A JSON array consisting of CPU sockets present on the system. Currently, only
|
||||||
|
one socket is supported.
|
||||||
|
|
||||||
|
Each socket is a JSON object with members whose names are equal to the
|
||||||
|
desired resource types, such as ``gpus``. These names must start with a
|
||||||
|
lowercase letter or an underscore, and subsequent characters can be a
|
||||||
|
lowercase letter, a digit, or an underscore. Uppercase letters are not
|
||||||
|
allowed, because certain platforms have case-insensitive environment
|
||||||
|
variables. See the `Environment Variables`_ section below for
|
||||||
|
more information. It is recommended that the resource type name be the plural
|
||||||
|
of a noun, such as ``gpus`` or ``crypto_chips`` (and not ``gpu`` or
|
||||||
|
``crypto_chip``).
|
||||||
|
|
||||||
|
Please note that the names ``gpus`` and ``crypto_chips`` are just examples,
|
||||||
|
and CTest does not interpret them in any way. You are free to make up any
|
||||||
|
resource type you want to meet your own requirements.
|
||||||
|
|
||||||
|
The value for each resource type is a JSON array consisting of JSON objects,
|
||||||
|
each of which describes a specific instance of the specified resource. These
|
||||||
|
objects have the following members:
|
||||||
|
|
||||||
|
``id``
|
||||||
|
A string consisting of an identifier for the resource. Each character in
|
||||||
|
the identifier can be a lowercase letter, a digit, or an underscore.
|
||||||
|
Uppercase letters are not allowed.
|
||||||
|
|
||||||
|
Identifiers must be unique within a resource type. However, they do not
|
||||||
|
have to be unique across resource types. For example, it is valid to have a
|
||||||
|
``gpus`` resource named ``0`` and a ``crypto_chips`` resource named ``0``,
|
||||||
|
but not two ``gpus`` resources both named ``0``.
|
||||||
|
|
||||||
|
Please note that the IDs ``0``, ``1``, ``2``, ``3``, and ``card0`` are just
|
||||||
|
examples, and CTest does not interpret them in any way. You are free to
|
||||||
|
make up any IDs you want to meet your own requirements.
|
||||||
|
|
||||||
|
``slots``
|
||||||
|
An optional unsigned number specifying the number of slots available on the
|
||||||
|
resource. For example, this could be megabytes of RAM on a GPU, or
|
||||||
|
cryptography units available on a cryptography chip. If ``slots`` is not
|
||||||
|
specified, a default value of ``1`` is assumed.
|
||||||
|
|
||||||
|
In the example file above, there are four GPUs with IDs 0 through 3. GPU 0 has
|
||||||
|
2 slots, GPU 1 has 4, GPU 2 has 2, and GPU 3 has a default of 1 slot. There is
|
||||||
|
also one cryptography chip with 4 slots.
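
As a rough illustration of the rules in this section, the following Python
sketch loads a hardware specification file, checks the naming rules for
resource types and IDs, and applies the default of ``1`` slot. The function
name ``slots_by_resource`` and the regular expressions are assumptions derived
from the prose above, not part of CTest.

.. code-block:: python

  import json
  import re

  # The example hardware specification file from this section.
  SPEC_TEXT = """
  {
    "local": [
      {
        "gpus": [
          {"id": "0", "slots": 2},
          {"id": "1", "slots": 4},
          {"id": "2", "slots": 2},
          {"id": "3"}
        ],
        "crypto_chips": [
          {"id": "card0", "slots": 4}
        ]
      }
    ]
  }
  """

  # Naming rules as described in the text (assumed regex encodings).
  TYPE_NAME = re.compile(r"[a-z_][a-z0-9_]*")
  RESOURCE_ID = re.compile(r"[a-z0-9_]+")

  def slots_by_resource(spec):
      """Return {type: {id: slots}} for the single supported socket."""
      (socket,) = spec["local"]  # only one socket is currently supported
      summary = {}
      for type_name, resources in socket.items():
          if not TYPE_NAME.fullmatch(type_name):
              raise ValueError(f"invalid resource type name: {type_name!r}")
          slots = {}
          for resource in resources:
              rid = resource["id"]
              if not RESOURCE_ID.fullmatch(rid) or rid in slots:
                  raise ValueError(f"invalid or duplicate id: {rid!r}")
              slots[rid] = resource.get("slots", 1)  # default is 1 slot
          summary[type_name] = slots
      return summary

  summary = slots_by_resource(json.loads(SPEC_TEXT))
  print(summary)
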
|
||||||
|
|
||||||
|
``PROCESSES`` Property
|
||||||
|
----------------------
|
||||||
|
|
||||||
|
See :prop_test:`PROCESSES` for a description of this property.
|
||||||
|
|
||||||
|
.. _`ctest-hardware-environment-variables`:
|
||||||
|
|
||||||
|
Environment Variables
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
Once CTest has decided which resources to allocate to a test, it passes this
|
||||||
|
information to the test executable as a series of environment variables. For
|
||||||
|
each example below, we will assume that the test in question has a
|
||||||
|
:prop_test:`PROCESSES` property of ``2,gpus:2;gpus:4,gpus:1,crypto_chips:2``.
|
||||||
|
|
||||||
|
The following variables are passed to the test process:
|
||||||
|
|
||||||
|
.. envvar:: CTEST_PROCESS_COUNT
|
||||||
|
|
||||||
|
The total number of processes specified by the :prop_test:`PROCESSES`
|
||||||
|
property. For example:
|
||||||
|
|
||||||
|
* ``CTEST_PROCESS_COUNT=3``
|
||||||
|
|
||||||
|
This variable will only be defined if :manual:`ctest(1)` has been given a
|
||||||
|
``--hardware-spec-file``, or if :command:`ctest_test` has been given a
|
||||||
|
``HARDWARE_SPEC_FILE``. If no hardware specification file has been given,
|
||||||
|
this variable will not be defined.
|
||||||
|
|
||||||
|
.. envvar:: CTEST_PROCESS_<num>
|
||||||
|
|
||||||
|
The list of resource types allocated to each process, with each item
|
||||||
|
separated by a comma. ``<num>`` is a number from zero to
|
||||||
|
``CTEST_PROCESS_COUNT`` minus one. ``CTEST_PROCESS_<num>`` is defined for
|
||||||
|
each ``<num>`` in this range. For example:
|
||||||
|
|
||||||
|
* ``CTEST_PROCESS_0=gpus``
|
||||||
|
* ``CTEST_PROCESS_1=gpus``
|
||||||
|
* ``CTEST_PROCESS_2=crypto_chips,gpus``
|
||||||
|
|
||||||
|
.. envvar:: CTEST_PROCESS_<num>_<resource-type>
|
||||||
|
|
||||||
|
The list of resource IDs and number of slots from each ID allocated to each
|
||||||
|
process for a given resource type. This variable consists of a series of
|
||||||
|
pairs, each pair separated by a semicolon, and with the two items in the pair
|
||||||
|
separated by a comma. The first item in each pair is ``id:`` followed by the
|
||||||
|
ID of a resource of type ``<resource-type>``, and the second item is
|
||||||
|
``slots:`` followed by the number of slots from that resource allocated to
|
||||||
|
the given process. For example:
|
||||||
|
|
||||||
|
* ``CTEST_PROCESS_0_GPUS=id:0,slots:2``
|
||||||
|
* ``CTEST_PROCESS_1_GPUS=id:2,slots:2``
|
||||||
|
* ``CTEST_PROCESS_2_GPUS=id:1,slots:4;id:3,slots:1``
|
||||||
|
* ``CTEST_PROCESS_2_CRYPTO_CHIPS=id:card0,slots:2``
|
||||||
|
|
||||||
|
In this example, process 0 gets 2 slots from GPU ``0``, process 1 gets 2 slots
|
||||||
|
from GPU ``2``, and process 2 gets 4 slots from GPU ``1``, 1 slot from GPU ``3``, and 2 slots from
|
||||||
|
cryptography chip ``card0``.
|
||||||
|
|
||||||
|
``<num>`` is a number from zero to ``CTEST_PROCESS_COUNT`` minus one.
|
||||||
|
``<resource-type>`` is the name of a resource type, converted to uppercase.
|
||||||
|
``CTEST_PROCESS_<num>_<resource-type>`` is defined for the product of each
|
||||||
|
``<num>`` in the range listed above and each resource type listed in
|
||||||
|
``CTEST_PROCESS_<num>``.
|
||||||
|
|
||||||
|
Because some platforms have case-insensitive names for environment variables,
|
||||||
|
the names of resource types may not clash in a case-insensitive environment.
|
||||||
|
Because of this, for the sake of simplicity, all resource types must be
|
||||||
|
listed in all lowercase in the
|
||||||
|
:ref:`hardware specification file <ctest-hardware-specification-file>` and in
|
||||||
|
the :prop_test:`PROCESSES` property, and they are converted to all uppercase
|
||||||
|
in the ``CTEST_PROCESS_<num>_<resource-type>`` environment variable.
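
A test program might decode these variables along the following lines. This is
a minimal Python sketch that assumes only the variable format documented above;
the ``read_allocations`` helper is hypothetical and is not provided by CTest.

.. code-block:: python

  import os

  def read_allocations(environ=None):
      """Decode the CTEST_PROCESS_* variables into one dict per process.

      Returns None when no hardware specification file was given, in which
      case CTEST_PROCESS_COUNT is not defined.
      """
      env = os.environ if environ is None else environ
      if "CTEST_PROCESS_COUNT" not in env:
          return None
      processes = []
      for num in range(int(env["CTEST_PROCESS_COUNT"])):
          allocation = {}
          for rtype in env[f"CTEST_PROCESS_{num}"].split(","):
              # Resource type names are uppercased in the variable name.
              value = env[f"CTEST_PROCESS_{num}_{rtype.upper()}"]
              pairs = []
              for pair in value.split(";"):
                  id_part, slots_part = pair.split(",")
                  resource_id = id_part.split(":", 1)[1]       # after "id:"
                  slots = int(slots_part.split(":", 1)[1])     # after "slots:"
                  pairs.append((resource_id, slots))
              allocation[rtype] = pairs
          processes.append(allocation)
      return processes

  # The example values from this section, passed explicitly for illustration.
  example_env = {
      "CTEST_PROCESS_COUNT": "3",
      "CTEST_PROCESS_0": "gpus",
      "CTEST_PROCESS_1": "gpus",
      "CTEST_PROCESS_2": "crypto_chips,gpus",
      "CTEST_PROCESS_0_GPUS": "id:0,slots:2",
      "CTEST_PROCESS_1_GPUS": "id:2,slots:2",
      "CTEST_PROCESS_2_GPUS": "id:1,slots:4;id:3,slots:1",
      "CTEST_PROCESS_2_CRYPTO_CHIPS": "id:card0,slots:2",
  }
  allocations = read_allocations(example_env)
  print(allocations)

In a real test, calling ``read_allocations()`` with no argument would read the
process environment instead of the example dictionary.
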
|
||||||
|
|
||||||
See Also
|
See Also
|
||||||
========
|
========
|
||||||
|
|
||||||
|
|||||||
54
Help/prop_test/PROCESSES.rst
Normal file
@@ -0,0 +1,54 @@
|
|||||||
|
PROCESSES
|
||||||
|
----------
|
||||||
|
|
||||||
|
Set to specify the number of processes spawned by a test, and the resources
|
||||||
|
that they require. See :ref:`hardware allocation <ctest-hardware-allocation>`
|
||||||
|
for more information on how this property integrates into the CTest hardware
|
||||||
|
allocation feature.
|
||||||
|
|
||||||
|
The ``PROCESSES`` property is a :ref:`semicolon-separated list <CMake Language
|
||||||
|
Lists>` of process descriptions. Each process description consists of an
|
||||||
|
optional number of processes for the description followed by a series of
|
||||||
|
resource requirements for those processes. These requirements (and the number
|
||||||
|
of processes) are separated by commas. The resource requirements consist of the
|
||||||
|
name of a resource type, followed by a colon, followed by an unsigned integer
|
||||||
|
specifying the number of slots required on one resource of the given type.
|
||||||
|
|
||||||
|
Please note that these processes are not spawned by CTest. The ``PROCESSES``
|
||||||
|
property merely tells CTest what processes the test expects to launch. It is up
|
||||||
|
to the test itself to do this process spawning, and read the :ref:`environment
|
||||||
|
variables <ctest-hardware-environment-variables>` to determine which resources
|
||||||
|
each process has been allocated.
|
||||||
|
|
||||||
|
Consider the following example:
|
||||||
|
|
||||||
|
.. code-block:: cmake
|
||||||
|
|
||||||
|
add_test(NAME MyTest COMMAND MyExe)
|
||||||
|
set_property(TEST MyTest PROPERTY PROCESSES
|
||||||
|
"2,gpus:2"
|
||||||
|
"gpus:4,crypto_chips:2")
|
||||||
|
|
||||||
|
In this example, there are two process descriptions (implicitly separated by a
|
||||||
|
semicolon). The content of the first description is ``2,gpus:2``. This
|
||||||
|
description spawns 2 processes, each of which requires 2 slots from a single
|
||||||
|
GPU. The content of the second description is ``gpus:4,crypto_chips:2``. This
|
||||||
|
description does not specify a process count, so a default of 1 is assumed.
|
||||||
|
This single process requires 4 slots from a single GPU and 2 slots from a
|
||||||
|
single cryptography chip. In total, 3 processes are spawned from this test,
|
||||||
|
each with their own unique requirements.
|
||||||
|
|
||||||
|
When CTest sets the :ref:`environment variables
|
||||||
|
<ctest-hardware-environment-variables>` for a test, it assigns a process number
|
||||||
|
based on the process description, numbering from 0 on the left to the number of
|
||||||
|
processes minus 1 on the right. In the example above, the two
|
||||||
|
processes in the first description would have IDs of 0 and 1, and the single
|
||||||
|
process in the second description would have an ID of 2.
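
The format and numbering described above can be sketched in Python as follows.
The ``expand_processes`` helper is hypothetical and only mirrors the syntax
documented here; it is not how CTest parses the property, and it performs no
error handling.

.. code-block:: python

  def expand_processes(value):
      """Expand a PROCESSES value into one requirement list per process.

      The index of each entry in the returned list corresponds to the
      process number used in the environment variables.
      """
      processes = []
      for description in value.split(";"):
          items = description.split(",")
          count = 1                       # default when no count is given
          if items[0].isdecimal():
              count = int(items[0])
              items = items[1:]
          requirements = []
          for item in items:
              rtype, slots = item.split(":")
              requirements.append((rtype, int(slots)))
          for _ in range(count):
              processes.append(list(requirements))
      return processes

  # The property value from the example above, as a semicolon-separated list.
  expanded = expand_processes("2,gpus:2;gpus:4,crypto_chips:2")
  for number, requirements in enumerate(expanded):
      print(number, requirements)
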
|
||||||
|
|
||||||
|
Both the ``PROCESSES`` and :prop_test:`RESOURCE_LOCK` properties serve similar
|
||||||
|
purposes, but they are distinct and orthogonal. Resources specified by
|
||||||
|
``PROCESSES`` do not affect :prop_test:`RESOURCE_LOCK`, and vice versa. Whereas
|
||||||
|
:prop_test:`RESOURCE_LOCK` is a simpler property that is used for locking one
|
||||||
|
global resource, ``PROCESSES`` is a more advanced property that allows multiple
|
||||||
|
tests to simultaneously use multiple resources of the same type, specifying
|
||||||
|
their requirements in a fine-grained manner.
|
||||||
@@ -8,3 +8,11 @@ not to run concurrently.
|
|||||||
|
|
||||||
See also :prop_test:`FIXTURES_REQUIRED` if the resource requires any setup or
|
See also :prop_test:`FIXTURES_REQUIRED` if the resource requires any setup or
|
||||||
cleanup steps.
|
cleanup steps.
|
||||||
|
|
||||||
|
Both the :prop_test:`PROCESSES` and ``RESOURCE_LOCK`` properties serve similar
|
||||||
|
purposes, but they are distinct and orthogonal. Resources specified by
|
||||||
|
:prop_test:`PROCESSES` do not affect ``RESOURCE_LOCK``, and vice versa. Whereas
|
||||||
|
``RESOURCE_LOCK`` is a simpler property that is used for locking one global
|
||||||
|
resource, :prop_test:`PROCESSES` is a more advanced property that allows
|
||||||
|
multiple tests to simultaneously use multiple resources of the same type,
|
||||||
|
specifying their requirements in a fine-grained manner.
|
||||||
|
|||||||
6
Help/release/dev/ctest-hardware-allocation.rst
Normal file
@@ -0,0 +1,6 @@
|
|||||||
|
ctest-hardware-allocation
|
||||||
|
-------------------------
|
||||||
|
|
||||||
|
* :manual:`ctest(1)` now has the ability to serialize tests based on hardware
|
||||||
|
requirements for each test. See :ref:`ctest-hardware-allocation` for
|
||||||
|
details.
|
||||||