Help: Add documentation for CTest hardware allocation

This commit is contained in:
Kyle Edwards
2019-07-11 17:14:01 -04:00
committed by Brad King
parent d1f100a415
commit e9500271a3
6 changed files with 304 additions and 0 deletions


@@ -17,6 +17,7 @@ Perform the :ref:`CTest Test Step` as a :ref:`Dashboard Client`.
[EXCLUDE_FIXTURE_SETUP <regex>]
[EXCLUDE_FIXTURE_CLEANUP <regex>]
[PARALLEL_LEVEL <level>]
[HARDWARE_SPEC_FILE <file>]
[TEST_LOAD <threshold>]
[SCHEDULE_RANDOM <ON|OFF>]
[STOP_TIME <time-of-day>]
@@ -82,6 +83,11 @@ The options are:
Specify a positive number representing the number of tests to
be run in parallel.
``HARDWARE_SPEC_FILE <file>``
Specify a
:ref:`hardware specification file <ctest-hardware-specification-file>`. See
:ref:`ctest-hardware-allocation` for more information.
``TEST_LOAD <threshold>``
While running tests in parallel, try not to start tests when they
may cause the CPU load to pass above a given threshold. If not


@@ -414,6 +414,7 @@ Properties on Tests
/prop_test/LABELS
/prop_test/MEASUREMENT
/prop_test/PASS_REGULAR_EXPRESSION
/prop_test/PROCESSES
/prop_test/PROCESSOR_AFFINITY
/prop_test/PROCESSORS
/prop_test/REQUIRED_FILES


@@ -90,6 +90,15 @@ Options
See `Label and Subproject Summary`_.
``--hardware-spec-file <file>``
Run CTest with :ref:`hardware allocation <ctest-hardware-allocation>` enabled,
using the
:ref:`hardware specification file <ctest-hardware-specification-file>`
specified in ``<file>``.
When ``ctest`` is run as a `Dashboard Client`_ this sets the
``HardwareSpecFile`` option of the `CTest Test Step`_.
``--test-load <level>``
While running tests in parallel (e.g. with ``-j``), try not to start
tests when they may cause the CPU load to pass above a given threshold.
@@ -958,6 +967,11 @@ Arguments to the command may specify some of the step settings.
Configuration settings include:
``HardwareSpecFile``
Specify a
:ref:`hardware specification file <ctest-hardware-specification-file>`. See
:ref:`ctest-hardware-allocation` for more information.
``LabelsForSubprojects``
Specify a semicolon-separated list of labels that will be treated as
subprojects. This mapping will be passed on to CDash when configure, test or
@@ -1267,6 +1281,221 @@ model is defined as follows:
Test properties.
Can contain keys for each of the supported test properties.
.. _`ctest-hardware-allocation`:
Hardware Allocation
===================
CTest provides a mechanism for tests to specify the hardware that they need and
how much of it they need, and for users to specify the hardware available on
the running machine. This allows CTest to internally keep track of which
hardware is in use and which is free, scheduling tests in a way that prevents
them from trying to claim hardware that is not available.
A common use case for this feature is for tests that require the use of a GPU.
Multiple tests can simultaneously allocate memory from a GPU, but if too many
tests try to do this at once, some of them will fail to allocate, resulting in
a failed test, even though the test would have succeeded if it had the memory
it needed. By using the hardware allocation feature, each test can specify how
much memory it requires from a GPU, allowing CTest to schedule tests in a way
that running several of these tests at once does not exhaust the GPU's memory
pool.
Please note that CTest has no concept of what a GPU is or how much memory it
has, nor does it have any way of communicating with a GPU to retrieve this
information or perform any memory management. CTest simply keeps track of a
list of abstract resource types, each of which has a certain number of slots
available for tests to use. Each test specifies the number of slots that it
requires from a certain resource, and CTest then schedules them in a way that
prevents the total number of slots in use from exceeding the listed capacity.
When a test is executed and slots from a resource are allocated to it, the
test may assume that it has exclusive use of those slots for the duration of
its process.
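The slot bookkeeping described above can be illustrated with a minimal sketch.
This is not CTest's actual scheduling algorithm: the function name
``try_allocate``, its data layout, and the greedy first-fit strategy are
assumptions made purely for illustration.

.. code-block:: python

  def try_allocate(free_slots, requirements):
      """Try to reserve slots for one test (hypothetical helper).

      free_slots: {resource_type: {resource_id: free_slot_count}}
      requirements: list of (resource_type, slots_needed) pairs; each pair
      must be satisfied by a single resource of that type.
      Returns a list of (resource_type, resource_id, slots), or None if the
      requirements cannot be met right now.
      """
      # Work on a copy so that a failed attempt leaves free_slots untouched.
      remaining = {rtype: dict(ids) for rtype, ids in free_slots.items()}
      allocation = []
      for rtype, needed in requirements:
          candidates = remaining.get(rtype, {})
          chosen = next(
              (rid for rid, free in candidates.items() if free >= needed), None)
          if chosen is None:
              return None  # not enough free slots; the test has to wait
          candidates[chosen] -= needed
          allocation.append((rtype, chosen, needed))
      # Commit only after every requirement was satisfied.
      for rtype, rid, needed in allocation:
          free_slots[rtype][rid] -= needed
      return allocation

A scheduler built on this idea would call ``try_allocate`` before starting a
test and return the slots to ``free_slots`` when the test finishes. A greedy
first-fit search can miss allocations that a more thorough search would find,
so treat this only as a model of the accounting, not of the scheduling policy.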
The CTest hardware allocation feature consists of two inputs:
* The :ref:`hardware specification file <ctest-hardware-specification-file>`,
described below, which describes the hardware resources available on the
system, and
* The :prop_test:`PROCESSES` property of tests, which describes the resources
required by the test.
When CTest runs a test, the hardware allocated to that test is passed in the
form of a set of
:ref:`environment variables <ctest-hardware-environment-variables>` as
described below. Using this information to decide which resource to connect to
is left to the test writer.
Please note that these processes are not spawned by CTest. The ``PROCESSES``
property merely tells CTest what processes the test expects to launch. It is up
to the test itself to do this process spawning, and read the :ref:`environment
variables <ctest-hardware-environment-variables>` to determine which resources
each process has been allocated.
.. _`ctest-hardware-specification-file`:
Hardware Specification File
---------------------------
The hardware specification file is a JSON file which is passed to CTest, either
on the :manual:`ctest(1)` command line as ``--hardware-spec-file``, or as the
``HARDWARE_SPEC_FILE`` argument of :command:`ctest_test`. The hardware
specification file must be a JSON object. All examples in this document assume
the following hardware specification file:
.. code-block:: json

  {
    "local": [
      {
        "gpus": [
          {
            "id": "0",
            "slots": 2
          },
          {
            "id": "1",
            "slots": 4
          },
          {
            "id": "2",
            "slots": 2
          },
          {
            "id": "3"
          }
        ],
        "crypto_chips": [
          {
            "id": "card0",
            "slots": 4
          }
        ]
      }
    ]
  }

The members are:
``local``
A JSON array consisting of CPU sockets present on the system. Currently, only
one socket is supported.
Each socket is a JSON object with members whose names are equal to the
desired resource types, such as ``gpus``. These names must start with a
lowercase letter or an underscore, and subsequent characters can be a
lowercase letter, a digit, or an underscore. Uppercase letters are not
allowed, because certain platforms have case-insensitive environment
variables. See the `Environment Variables`_ section below for
more information. It is recommended that the resource type name be the plural
of a noun, such as ``gpus`` or ``crypto_chips`` (and not ``gpu`` or
``crypto_chip``).
Please note that the names ``gpus`` and ``crypto_chips`` are just examples,
and CTest does not interpret them in any way. You are free to make up any
resource type you want to meet your own requirements.
The value for each resource type is a JSON array consisting of JSON objects,
each of which describes a specific instance of the specified resource. These
objects have the following members:
``id``
A string consisting of an identifier for the resource. Each character in
the identifier can be a lowercase letter, a digit, or an underscore.
Uppercase letters are not allowed.
Identifiers must be unique within a resource type. However, they do not
have to be unique across resource types. For example, it is valid to have a
``gpus`` resource named ``0`` and a ``crypto_chips`` resource named ``0``,
but not two ``gpus`` resources both named ``0``.
Please note that the IDs ``0``, ``1``, ``2``, ``3``, and ``card0`` are just
examples, and CTest does not interpret them in any way. You are free to
make up any IDs you want to meet your own requirements.
``slots``
An optional unsigned number specifying the number of slots available on the
resource. For example, this could be megabytes of RAM on a GPU, or
cryptography units available on a cryptography chip. If ``slots`` is not
specified, a default value of ``1`` is assumed.
In the example file above, there are four GPUs with IDs 0 through 3. GPU 0 has
2 slots, GPU 1 has 4, GPU 2 has 2, and GPU 3 has a default of 1 slot. There is
also one cryptography chip with 4 slots.
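A test harness or helper script may want to read the same file. The following
is a minimal sketch, assuming only the file layout described above; the
function name ``load_slots`` and the constant ``EXAMPLE_SPEC`` are
hypothetical and are not part of CTest.

.. code-block:: python

  import json

  # A reduced version of the example file, kept inline so the sketch is
  # self-contained.
  EXAMPLE_SPEC = """
  {
    "local": [
      {
        "gpus": [{"id": "0", "slots": 2}, {"id": "3"}],
        "crypto_chips": [{"id": "card0", "slots": 4}]
      }
    ]
  }
  """

  def load_slots(spec_text):
      """Return {resource_type: {resource_id: slots}} for the first socket."""
      spec = json.loads(spec_text)
      socket = spec["local"][0]  # only one socket is currently supported
      return {
          rtype: {res["id"]: res.get("slots", 1) for res in resources}
          for rtype, resources in socket.items()
      }

  SLOTS = load_slots(EXAMPLE_SPEC)
  # SLOTS["gpus"]["3"] is 1 because "slots" was omitted for that resource.

The call to ``res.get("slots", 1)`` mirrors the documented default of one slot
when ``slots`` is not specified.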
``PROCESSES`` Property
----------------------
See :prop_test:`PROCESSES` for a description of this property.
.. _`ctest-hardware-environment-variables`:
Environment Variables
---------------------
Once CTest has decided which resources to allocate to a test, it passes this
information to the test executable as a series of environment variables. For
each example below, we will assume that the test in question has a
:prop_test:`PROCESSES` property of ``2,gpus:2;gpus:4,gpus:1,crypto_chips:2``.
The following variables are passed to the test process:
.. envvar:: CTEST_PROCESS_COUNT
The total number of processes specified by the :prop_test:`PROCESSES`
property. For example:
* ``CTEST_PROCESS_COUNT=3``
This variable will only be defined if :manual:`ctest(1)` has been given a
``--hardware-spec-file``, or if :command:`ctest_test` has been given a
``HARDWARE_SPEC_FILE``. If no hardware specification file has been given,
this variable will not be defined.
.. envvar:: CTEST_PROCESS_<num>
The list of resource types allocated to each process, with each item
separated by a comma. ``<num>`` is a number from zero to
``CTEST_PROCESS_COUNT`` minus one. ``CTEST_PROCESS_<num>`` is defined for
each ``<num>`` in this range. For example:
* ``CTEST_PROCESS_0=gpus``
* ``CTEST_PROCESS_1=gpus``
* ``CTEST_PROCESS_2=crypto_chips,gpus``
.. envvar:: CTEST_PROCESS_<num>_<resource-type>
The list of resource IDs and number of slots from each ID allocated to each
process for a given resource type. This variable consists of a series of
pairs, each pair separated by a semicolon, and with the two items in the pair
separated by a comma. The first item in each pair is ``id:`` followed by the
ID of a resource of type ``<resource-type>``, and the second item is
``slots:`` followed by the number of slots from that resource allocated to
the given process. For example:
* ``CTEST_PROCESS_0_GPUS=id:0,slots:2``
* ``CTEST_PROCESS_1_GPUS=id:2,slots:2``
* ``CTEST_PROCESS_2_GPUS=id:1,slots:4;id:3,slots:1``
* ``CTEST_PROCESS_2_CRYPTO_CHIPS=id:card0,slots:2``
In this example, process 0 gets 2 slots from GPU ``0``, process 1 gets 2 slots
from GPU ``2``, and process 2 gets 4 slots from GPU ``1``, 1 slot from GPU
``3``, and 2 slots from cryptography chip ``card0``.
``<num>`` is a number from zero to ``CTEST_PROCESS_COUNT`` minus one.
``<resource-type>`` is the name of a resource type, converted to uppercase.
``CTEST_PROCESS_<num>_<resource-type>`` is defined for the product of each
``<num>`` in the range listed above and each resource type listed in
``CTEST_PROCESS_<num>``.
Because some platforms have case-insensitive names for environment variables,
the names of resource types may not clash in a case-insensitive environment.
Because of this, for the sake of simplicity, all resource types must be
listed in all lowercase in the
:ref:`hardware specification file <ctest-hardware-specification-file>` and in
the :prop_test:`PROCESSES` property, and they are converted to all uppercase
in the ``CTEST_PROCESS_<num>_<resource-type>`` environment variable.
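Since interpreting these variables is left to the test writer, a test might
decode them along the following lines. This is a minimal sketch based only on
the variable formats described in this section; the function name
``read_allocation`` and its return layout are assumptions for illustration.

.. code-block:: python

  import os

  def read_allocation(environ=os.environ):
      """Return one dict per process: {resource_type: [(id, slots), ...]}.

      Returns None when CTEST_PROCESS_COUNT is not defined, which means no
      hardware specification file was given.
      """
      count = environ.get("CTEST_PROCESS_COUNT")
      if count is None:
          return None
      processes = []
      for num in range(int(count)):
          resources = {}
          # CTEST_PROCESS_<num> lists resource types, separated by commas.
          for rtype in environ[f"CTEST_PROCESS_{num}"].split(","):
              # The resource type is uppercased in the variable name.
              value = environ[f"CTEST_PROCESS_{num}_{rtype.upper()}"]
              pairs = []
              for pair in value.split(";"):
                  id_part, slots_part = pair.split(",")
                  pairs.append((id_part[len("id:"):],
                                int(slots_part[len("slots:"):])))
              resources[rtype] = pairs
          processes.append(resources)
      return processes

With the example values shown above, the entry for process 2 would contain
``("1", 4)`` and ``("3", 1)`` under ``gpus`` and ``("card0", 2)`` under
``crypto_chips``. Passing a plain dictionary as ``environ`` makes the function
easy to exercise outside of CTest.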
See Also
========


@@ -0,0 +1,54 @@
PROCESSES
----------
Set to specify the number of processes spawned by a test, and the resources
that they require. See :ref:`hardware allocation <ctest-hardware-allocation>`
for more information on how this property integrates into the CTest hardware
allocation feature.
The ``PROCESSES`` property is a :ref:`semicolon-separated list <CMake Language
Lists>` of process descriptions. Each process description consists of an
optional number of processes for the description followed by a series of
resource requirements for those processes. These requirements (and the number
of processes) are separated by commas. The resource requirements consist of the
name of a resource type, followed by a colon, followed by an unsigned integer
specifying the number of slots required on one resource of the given type.
Please note that these processes are not spawned by CTest. The ``PROCESSES``
property merely tells CTest what processes the test expects to launch. It is up
to the test itself to do this process spawning, and read the :ref:`environment
variables <ctest-hardware-environment-variables>` to determine which resources
each process has been allocated.
Consider the following example:
.. code-block:: cmake

  add_test(NAME MyTest COMMAND MyExe)
  set_property(TEST MyTest PROPERTY PROCESSES
    "2,gpus:2"
    "gpus:4,crypto_chips:2")

In this example, there are two process descriptions (implicitly separated by a
semicolon). The content of the first description is ``2,gpus:2``. This
description specifies 2 processes, each of which requires 2 slots from a single
GPU. The content of the second description is ``gpus:4,crypto_chips:2``. This
description does not specify a process count, so a default of 1 is assumed.
This single process requires 4 slots from a single GPU and 2 slots from a
single cryptography chip. In total, the test is expected to launch 3 processes,
each with its own requirements.
When CTest sets the :ref:`environment variables
<ctest-hardware-environment-variables>` for a test, it assigns a process number
based on the process description, starting at 0 on the left and ending at the
number of processes minus 1 on the right. For example, in the example above,
the two processes in the first description would have IDs of 0 and 1, and the
single process in the second description would have an ID of 2.
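The expansion and numbering of process descriptions can be sketched as
follows. This is an illustration of the format described above, not CTest's
own parser; the function name ``parse_processes`` is hypothetical, and the
sketch performs no validation of resource type names.

.. code-block:: python

  def parse_processes(value):
      """Expand a PROCESSES value into one requirement list per process.

      The index of each entry in the returned list corresponds to the
      process number described above.
      """
      processes = []
      for description in value.split(";"):
          parts = description.split(",")
          count = 1  # default when no process count is given
          if parts[0].isdigit():
              count = int(parts.pop(0))
          requirements = []
          for part in parts:
              rtype, slots = part.split(":")
              requirements.append((rtype, int(slots)))
          for _ in range(count):
              processes.append(list(requirements))
      return processes

For the value ``2,gpus:2;gpus:4,crypto_chips:2``, this yields three entries:
entries 0 and 1 each require 2 slots from ``gpus``, and entry 2 requires 4
slots from ``gpus`` and 2 slots from ``crypto_chips``.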
Both the ``PROCESSES`` and :prop_test:`RESOURCE_LOCK` properties serve similar
purposes, but they are distinct and orthogonal. Resources specified by
``PROCESSES`` do not affect :prop_test:`RESOURCE_LOCK`, and vice versa. Whereas
:prop_test:`RESOURCE_LOCK` is a simpler property that is used for locking one
global resource, ``PROCESSES`` is a more advanced property that allows multiple
tests to simultaneously use multiple resources of the same type, specifying
their requirements in a fine-grained manner.


@@ -8,3 +8,11 @@ not to run concurrently.
See also :prop_test:`FIXTURES_REQUIRED` if the resource requires any setup or
cleanup steps.
Both the :prop_test:`PROCESSES` and ``RESOURCE_LOCK`` properties serve similar
purposes, but they are distinct and orthogonal. Resources specified by
:prop_test:`PROCESSES` do not affect ``RESOURCE_LOCK``, and vice versa. Whereas
``RESOURCE_LOCK`` is a simpler property that is used for locking one global
resource, :prop_test:`PROCESSES` is a more advanced property that allows
multiple tests to simultaneously use multiple resources of the same type,
specifying their requirements in a fine-grained manner.


@@ -0,0 +1,6 @@
ctest-hardware-allocation
-------------------------
* :manual:`ctest(1)` now has the ability to serialize tests based on hardware
requirements for each test. See :ref:`ctest-hardware-allocation` for
details.