performance_test

The performance_test tool tests latency and other performance metrics of various middleware implementations that support a pub/sub pattern. It is used to simulate non-functional performance of your application.

The performance_test tool allows you to quickly set up a pub/sub configuration, e.g. the number of publishers and subscribers, message size, QoS settings, and middleware. The following metrics are automatically recorded while the application is running:

latency: the time a message takes to travel from a publisher to a subscriber. The sample is timestamped when it is published, and that timestamp is subtracted from the time measured when the sample arrives at the subscriber (only logged when a subscriber is created)
CPU usage: percentage of the total system-wide CPU usage (logged separately for each instance of perf_test)
resident memory: heap allocations, shared memory segments, stack (used for system’s internal work) (logged separately for each instance of perf_test)
sample statistics: number of samples received, sent, and lost per experiment run.
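
The latency measurement described above can be sketched in a few lines. This is only an illustration of the idea, not the tool's implementation: the `Sample` class and the `publish` and `receive` helpers are hypothetical names, and a monotonic clock is assumed so that the difference is unaffected by wall-clock adjustments.

```python
import time
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical message: a payload plus the publish timestamp."""
    payload: bytes
    sent_ns: int = 0


def publish(payload: bytes) -> Sample:
    # Stamp the sample as the last step before it leaves the publisher.
    sample = Sample(payload)
    sample.sent_ns = time.monotonic_ns()
    return sample


def receive(sample: Sample) -> int:
    # Read the clock on arrival and subtract the timestamp carried by the sample.
    received_ns = time.monotonic_ns()
    return received_ns - sample.sent_ns


sample = publish(b"\x00" * 1024)
latency_ns = receive(sample)
print(f"latency: {latency_ns} ns")
```

Both timestamps must come from the same clock for the subtraction to be meaningful.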

This master branch is compatible with the following ROS 2 versions:

rolling
jazzy
iron
humble
galactic
foxy
eloquent
dashing
Apex.OS

How to use this document

Start here for a quick example of building and running the performance_test tool with the Cyclone DDS plugin.
If needed, find more detailed information about building and running.
Or, if the quick example is good enough, skip ahead to the list of supported middleware plugins to learn how to test a specific middleware implementation.
Check out the tools for visualizing the results.
If desired, read about the design and architecture of the tool.

Example

This example shows how to test the non-functional performance of the following configuration:

Option	Value
Plugin	Cyclone DDS
Message type	Array1k
Publishing rate	100Hz
Topic name	test_topic
Duration of the experiment	30s
Number of publisher(s)	1 (default)
Number of subscriber(s)	1 (default)

Install ROS 2
Install Cyclone DDS to /opt/cyclonedds
Build performance_test with the CMake build flag for Cyclone DDS:

    source /opt/ros/rolling/setup.bash
    cd ~/perf_test_ws
    colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
    source ./install/setup.bash
    

Run with the communication plugin option for Cyclone DDS:

mkdir experiment
./install/performance_test/lib/performance_test/perf_test --communicator CycloneDDS \
                                                          --msg Array1k \
                                                          --rate 100 \
                                                          --topic test_topic \
                                                          --max-runtime 30 \
                                                          --logfile experiment/log.csv

At the end of the experiment, a CSV log file will be generated in the experiment folder with a name that starts with log.

Building the performance_test tool

For a simple example, see Dockerfile.rclcpp.

The performance_test tool is structured as a ROS 2 package, so colcon is used to build it. Therefore, you must source a ROS 2 installation:

source /opt/ros/rolling/setup.bash

Select a middleware plugin from this list. Then build the performance_test tool with the selected middleware:

mkdir -p ~/perf_test_ws/src
cd ~/perf_test_ws/src
git clone https://gitlab.com/ApexAI/performance_test.git
cd ..
# At this stage, you need to choose which middleware you want to use
# The list of available flags is described in the middleware plugins section
# Replace <plugin> with the value for the selected middleware, e.g. CYCLONEDDS
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release -DPERFORMANCE_TEST_PLUGIN=<plugin>
source install/setup.bash

Running an experiment

The performance_test experiments are run through the perf_test executable. To find the available settings, run with --help (note the required and default arguments):

~/perf_test_ws$ ./install/performance_test/lib/performance_test/perf_test --help

The -c argument should match the selected middleware plugin from the build phase.
The --msg argument should be one of the supported message types, which are shown in the --help output.

Single machine or distributed system?

Based on the configuration you want to test, the usage of the performance_test tool differs. The different possibilities are explained below.

For running tests on a single machine, you can choose between the following options:

Intraprocess means that the publisher and subscriber threads are in the same process.

    perf_test <options> --num-sub-threads 1 --num-pub-threads 1

Interprocess means that the publisher and subscriber are in different processes. To test interprocess communication, two instances of perf_test must be run, e.g.

    # Start the subscriber first
    perf_test <options> --num-sub-threads 1 --num-pub-threads 0 &
    sleep 1  # give the subscriber time to finish initializing
    perf_test <options> --num-sub-threads 0 --num-pub-threads 1
    

On a distributed system, testing latency is difficult, because the clocks are probably not perfectly synchronized between the two devices. To work around this, the performance_test tool supports relay mode, which allows for a round-trip style of communication:

# On the main machine
perf_test <options> --roundtrip-mode Main

# On the relay machine:
perf_test <options> --roundtrip-mode Relay

In relay mode, the Main machine sends messages to the Relay machine, which immediately sends them back. The Main machine receives the relayed message and reports the round-trip latency. Therefore, the reported latency will be roughly double the latency reported in non-relay mode.
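
The reason relay mode sidesteps clock synchronization can be shown with a small simulation. This is a sketch under assumptions, not the tool's code: two in-process queues stand in for the network, and `relay` and `main_side` are hypothetical names. The point is that only the Main side ever reads a clock.

```python
import queue
import threading
import time

to_relay = queue.Queue()  # Main -> Relay
to_main = queue.Queue()   # Relay -> Main


def relay(num_messages: int) -> None:
    # The relay never reads a clock; it only echoes each message back.
    for _ in range(num_messages):
        to_main.put(to_relay.get())


def main_side(num_messages: int) -> list:
    round_trips_ns = []
    for _ in range(num_messages):
        to_relay.put(time.monotonic_ns())   # timestamp taken on Main
        echoed_sent_ns = to_main.get()      # the relayed message comes back
        round_trips_ns.append(time.monotonic_ns() - echoed_sent_ns)
    return round_trips_ns


relay_thread = threading.Thread(target=relay, args=(5,))
relay_thread.start()
round_trips_ns = main_side(5)
relay_thread.join()
print(round_trips_ns)
```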

Single machine, single thread

An intra-thread configuration is experimentally supported, in which a publisher and subscriber both operate in the same thread. The publisher writes a message, and the subscriber immediately takes it.

perf_test <options> -e INTRA_THREAD

Notes:

This is only available when zero copy transfer is enabled
This requires exactly one publisher and one subscriber
This is not compatible with roundtrip mode
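
The write-then-take pattern of the intra-thread configuration can be pictured as follows. This is a toy sketch: a `deque` stands in for the middleware transport, and nothing here reflects the tool's actual classes.

```python
import time
from collections import deque

channel = deque()  # stands in for the middleware's transport
latencies_ns = []

for _ in range(3):
    channel.append(time.monotonic_ns())   # publisher writes a message
    sent_ns = channel.popleft()           # subscriber immediately takes it
    latencies_ns.append(time.monotonic_ns() - sent_ns)

print(latencies_ns)
```

Because both steps run back to back in one thread, the measurement excludes any thread wake-up or scheduling delay.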

Middleware plugins

The performance_test tool can measure the performance of a variety of communication solutions from different vendors. In this case, the publisher and subscriber routines carry no rclcpp or rmw layer overhead.

The performance_test tool implements an executor that runs the publisher(s) and/or the subscriber(s) in their own thread.

The following plugins are currently implemented:

Eclipse Cyclone DDS

Eclipse Cyclone DDS 0.9.0b1
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
Communication plugin: -c CycloneDDS
Docker file: Dockerfile.CycloneDDS
Available transports:
- Cyclone DDS zero copy requires RouDi to be running.

Pub/sub in same process	Pub/sub in different processes on same machine	Pub/sub in different machines
INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy)	UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy)	UDP

Eclipse Cyclone DDS C++ binding

Eclipse Cyclone DDS C++ bindings 0.9.0b1
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS_CXX
Communication plugin: -c CycloneDDS-CXX
Docker file: Dockerfile.CycloneDDS-CXX
Available transports:
- Cyclone DDS zero copy requires RouDi to be running.

Pub/sub in same process	Pub/sub in different processes on same machine	Pub/sub in different machines
INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy)	UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy)	UDP

Eclipse iceoryx

iceoryx (latest master as of Feb 13)
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ICEORYX
Communication plugin: -c iceoryx
Docker file: Dockerfile.iceoryx
The iceoryx plugin is not a DDS implementation.
- The DDS-specific options (such as domain ID, durability, and reliability) do not apply.
To run with the iceoryx plugin, RouDi must be running.
Available transports:

Pub/sub in same process	Pub/sub in different processes on same machine	Pub/sub in different machines
LoanedSamples	LoanedSamples	Not supported by performance_test

eProsima Fast DDS

FastDDS 2.6.2
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=FASTDDS
Communication plugin: -c FastRTPS
Docker file: Dockerfile.FastDDS
Available transports:

Pub/sub in same process	Pub/sub in different processes on same machine	Pub/sub in different machines
INTRA (default), LoanedSamples (--zero-copy)	SHMEM (default), LoanedSamples (--zero-copy)	UDP

OCI OpenDDS

OpenDDS 3.13.2
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=OPENDDS
Communication plugin: -c OpenDDS
Docker file: Dockerfile.OpenDDS
Available transports:

Pub/sub in same process	Pub/sub in different processes on same machine	Pub/sub in different machines
TCP	TCP	TCP

RTI Connext DDS

RTI Connext DDS 5.3.1+
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDS
Communication plugin: -c ConnextDDS
Docker file: Not available
A license is required
You need to source an RTI Connext DDS environment.
- If RTI Connext DDS was installed with ROS 2 (Linux only):
  - source /opt/rti.com/rti_connext_dds-5.3.1/setenv_ros2rti.bash
- If RTI Connext DDS is installed separately, you can source the following script to set the environment:
  - source <connextdds_install_path>/resource/scripts/rtisetenv_<arch>.bash
Available transports:

Pub/sub in same process	Pub/sub in different processes on same machine	Pub/sub in different machines
INTRA	SHMEM	UDP

RTI Connext DDS Micro

Connext DDS Micro 3.0.3
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDSMICRO
Communication plugin: -c ConnextDDSMicro
Docker file: Not available
A license is required
Available transports:

Pub/sub in same process	Pub/sub in different processes on same machine	Pub/sub in different machines
INTRA	SHMEM	UDP

Framework plugins

The performance_test tool can also measure the end-to-end latency of a framework. In this case, the executor of the framework is used to run the publisher(s) and/or the subscriber(s). The potential overhead of the rclcpp or rmw layer is measured.

ROS 2

The performance_test tool can also measure the performance of a variety of RMW implementations, through the ROS 2 rclcpp::Publisher and rclcpp::Subscription API.

ROS 2 rclcpp::Publisher and rclcpp::Subscription
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ROS2 (default)
Communication plugin:
- Callback with Single Threaded Executor: -c rclcpp-single-threaded-executor
- Callback with Static Single Threaded Executor: -c rclcpp-static-single-threaded-executor
- rclcpp::WaitSet: -c rclcpp-waitset
Docker file: Dockerfile.rclcpp
Available underlying RMW implementations:
- ROS 2 Rolling is pre-configured to use rmw_fastrtps_cpp
- Follow these instructions to use a different RMW implementation
Available transports: depends on underlying RMW implementation
- LoanedSamples are available (--zero-copy) for ROS_DISTRO = foxy and above

Apex.OS

Apex.OS
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=APEX_OS
- It is also required to source /opt/ApexOS/setup.bash instead of a ROS 2 distribution
Communication plugin: -c ApexOSPollingSubscription
Docker file: Not available
Available underlying RMW implementations: rmw_apex_middleware
Available transports:

Pub/sub in same process	Pub/sub in different processes on same machine	Pub/sub in different machines
UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy)	UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy)	UDP

Analyze the results

After an experiment is run with the -l flag, a log file is recorded. Both CSV and JSON formats are supported. It is possible to add custom data to the log file by setting the APEX_PERFORMANCE_TEST environment variable before running an experiment, e.g.

# JSON format
export APEX_PERFORMANCE_TEST="
{
\"My Version\": \"1.0.4\",
\"My Image Version\": \"5.2\",
\"My OS Version\": \"Ubuntu 16.04\"
}
"
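
When experiments are launched from a script, the same variable can be built programmatically, which avoids escaping the quotes by hand. The sketch below reuses the values from the shell example; how perf_test is then started (for instance with `subprocess.run`) is left out and is up to the caller.

```python
import json
import os

# Same custom data as in the shell example; any JSON object works.
metadata = {
    "My Version": "1.0.4",
    "My Image Version": "5.2",
    "My OS Version": "Ubuntu 16.04",
}

# json.dumps handles the quoting, so no manual backslash escapes are needed.
os.environ["APEX_PERFORMANCE_TEST"] = json.dumps(metadata)

# Any child process started from here inherits the variable.
print(os.environ["APEX_PERFORMANCE_TEST"])
```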

Plot the results

To plot the results in the JSON or CSV log files, see the plotter README.

Architecture

Apex.AI’s Performance Testing in ROS 2 white paper (available here) describes how to design a fair and unbiased performance test, and is the basis for this project.

Each middleware has a different API. Thanks to the Plugin abstraction, the core logic of setting up and running an experiment is completely decoupled from the implementation details of sending and receiving individual messages.

Exactly one Plugin implementation is selected at build time. The design is similar to the Abstract Factory pattern. performance_test declares, but does not define, a static factory method in the PluginFactory class. Each middleware provides a definition for this factory method to create a concrete Plugin implementation, and perf_test calls this factory method directly.
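
The shape of this design can be sketched in Python as a loose analogy. In the real tool the factory method is resolved when the C++ binary is linked; here a runtime assignment plays that role. All method names (`publish`, `take`, `create_plugin`) and the `LoopbackPlugin` class are hypothetical and do not reflect the actual Plugin interface.

```python
from abc import ABC, abstractmethod


class Plugin(ABC):
    """Hypothetical stand-in for the Plugin abstraction."""

    @abstractmethod
    def publish(self, data: bytes) -> None: ...

    @abstractmethod
    def take(self) -> bytes: ...


class PluginFactory:
    """Core side: the factory method is declared, but the core supplies no real body."""

    @staticmethod
    def create_plugin() -> Plugin:
        raise NotImplementedError("no plugin was selected")


# --- plugin side: exactly one implementation like this is selected ---
class LoopbackPlugin(Plugin):
    def __init__(self) -> None:
        self._buffer = []

    def publish(self, data: bytes) -> None:
        self._buffer.append(data)

    def take(self) -> bytes:
        return self._buffer.pop(0)


# The selected plugin supplies the definition of the factory method.
PluginFactory.create_plugin = staticmethod(lambda: LoopbackPlugin())


# --- core side: runs an experiment without knowing the concrete type ---
def run_once(payload: bytes) -> bytes:
    plugin = PluginFactory.create_plugin()
    plugin.publish(payload)
    return plugin.take()


result = run_once(b"hello")
print(result)
```

The core function `run_once` only depends on `Plugin` and `PluginFactory`, which is the decoupling the paragraph above describes.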

An example plugin is available here.

Performance optimizations

On Linux-based platforms, when run with --prevent-cpu-idle, perf_test writes 0 to /dev/cpu_dma_latency and holds the file handle open, which prevents the CPU from entering any idle states for the duration of the experiment. This should result in lower message latency and lower variance in that latency.
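
For readers unfamiliar with this kernel interface, the mechanism can be sketched as follows. This is an illustration, not the tool's code: `prevent_cpu_idle` is a hypothetical helper, and it assumes the device accepts a native 32-bit integer latency target in microseconds. Opening the device usually requires root.

```python
import os
import struct
from contextlib import contextmanager


@contextmanager
def prevent_cpu_idle(path: str = "/dev/cpu_dma_latency", max_latency_us: int = 0):
    # O_WRONLY without O_CREAT: fail rather than create a file if the device is missing.
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, struct.pack("i", max_latency_us))  # native 32-bit signed integer
        yield fd  # the request is in effect while the descriptor stays open
    finally:
        os.close(fd)  # closing the descriptor withdraws the request


try:
    with prevent_cpu_idle():
        pass  # the experiment would run here
except OSError as error:
    print(f"cpu_dma_latency not available: {error}")
```

Holding the descriptor open for the whole experiment matters, because the kernel drops the latency request as soon as it is closed.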

Future extensions and limitations

Communication frameworks like DDS have a huge number of settings. This tool only allows the most common QoS settings to be configured. The other QoS settings are hardcoded in the application.
Only one publisher per topic is allowed, because the data verification logic does not support matching data to the different publishers.
Some communication plugins can get stuck in their internal loops if too much data is received. Figuring out ways around such issues is one of the goals of this tool.
The FastRTPS wait-set does not support timeouts, which can prevent the receiving side from aborting. In that case, the performance test must be killed manually.
Connext DDS Micro does not support the INTRA transport with reliable QoS and the history kind set to keep_all. Always set the history kind to keep-last when using reliable QoS.

Possible additional communication plugins that could be implemented:

Raw UDP communication

Building with limited resources

When building this tool, the compiler must perform a lot of template expansion. This can be overwhelming for a system with a low-power CPU or limited RAM. There are some additional CMake options which can reduce the system load during compilation:

This tool includes many different message types, each with many different sizes. Reduce the number of messages, and thus the compilation load, by disabling one or more message types. For example, to build without PointCloud messages, add -DENABLE_MSGS_POINT_CLOUD=OFF to the --cmake-args. The message types, and their options for enabling/disabling, can be found here.

CHANGELOG

Changelog for package performance_test

X.Y.Z (YYYY/MM/DD)

2.3.0 (2024/09/24)

Removed

Moved apex_performance_plotter to its own package here

2.2.0 (2024/05/15)

Added

performance_test can be built with ROS 2 Iron and Jazzy

Changed

Renamed the --dds-domain_id CLI arg to --dds-domain-id
When --dds-domain-id is unspecified, fall back to the ROS_DOMAIN_ID environment variable
--zero-copy has been separated into two flags:
- --shared-memory: Enable shared-memory transfer in the plugin. This is meant to replace the need to manually set runtime flags via CYCLONEDDS_URI, APEX_MIDDLEWARE_SETTINGS, etc.
- --loaned-samples: When publishing messages in the plugin, borrow loaned samples instead of publishing by copy
- --zero-copy is now an alias for --shared-memory --loaned-samples
- Supported plugins include:
  - -c CycloneDDS
  - -c CycloneDDS-CXX
  - -c ApexOSPollingSubscription
  - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
  - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_fastrtps_cpp

2.1.0 (2024/04/17)

Added

Add new function prepare() to the Publisher and Subscriber API, intended to allow participant discovery without blocking the main thread

Changed

Change the default --history arg from KEEP_ALL to KEEP_LAST
Change the default --history-depth arg from 1000 to 16
If --expected-num-pubs is unspecified, set it to the same value as -p
If --expected-num-subs is unspecified, set it to the same value as -s

Fixed

Removed an unused variable to fix a Clang build
Remove unused variable names in the Plugin abstract class
Fix a potential lockup in PublisherTask on QNX

2.0.0 (2024/03/19)

Added

Add experimental bazel support
- bazel build //performance_test --//:plugin_implementation=//path/to/a/plugin
Add a rudimentary socket-based plugin for testing the bazel support
- bazel run //performance_test --//:plugin_implementation=//performance_test/plugins/demo:demo_plugin -- --help

Changed

Instead of enabling/disabling each plugin, you select exactly one with a CMake string option, for example:
- colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=ROS2
Renamed the --communication CLI arg to --communicator. The short -c is unchanged.

Removed

Removed the deprecated CLI flags for QOS settings:
- Instead of --reliable, use --reliability RELIABLE
- Instead of --transient, use --durability TRANSIENT_LOCAL
- Instead of --keep-last, use --history KEEP_LAST
Removed the obsolete BoundedSequenceFlat messages
Removed the superfluous --msg-list CLI flag. The --help message already lists the available messages.

Fixed

Update the Apex.OS Runner to use executor_runner::deferred instead of executor_runner::deferred_tag()
Ensure that the first few published samples are sent at the expected rate

1.5.2 (YYYY/MM/DD)

Added

--prevent-cpu-idle is available on QNX

Changed

JSON log files will contain all values in the APEX_PERFORMANCE_TEST dictionary, instead of the five specific values used previously
Switch to build as C++17 by default

Fixed

Zero copy transfer is again enabled for the rclcpp publisher

1.5.0 (2023/06/14)

Added

New CLI switch --prevent-cpu-idle (linux only). When specified, perf_test will use /dev/cpu_dma_latency to request that the CPU not enter any sleep states, to potentially give more consistent results
Some smaller Array messages, down to 32 bits
Added support to the FastDDS plugin for bounded and unbounded sequences

Changed

Update the README to better explain how to use this tool with Apex.OS
In the Runner, allocate the AnalysisResults on the stack instead of using shared_ptr
Subscriber methods accept a callback parameter, instead of returning a vector of results, to reduce heap usage
Refactored the interaction between SubscriberStats and AnalysisResult to remove the need for a std::vector of latency samples, to reduce heap usage
Adjusted the Array message sizes to make the name match the contents
Updated apex_os_communicator to use the new zero-copy API

1.4.2 (2023/03/15)

Added

Added perfplot support for JSON log files

Changed

Migrate the Apex.OS target to use rosidl_get_typesupport_target
Preallocate the JSON logger’s string buffer to prevent reallocations after the experiment begins

1.4.1 (2023/02/23)

Changed

Updated the iceoryx plugin to the latest master as of Feb 13

1.4.0 (2023/02/20)

Added

New message type BoundedSequenceFlat
- This is a BoundedSequence with the @flat annotation
- Sizes range from 1kB to 8MB, like Array and BoundedSequence

Changed

Messages of different types can be optionally included via CMake args:
- -DENABLE_MSGS_ARRAY (default ON)
- -DENABLE_MSGS_STRUCT (default ON)
- -DENABLE_MSGS_POINT_CLOUD (default ON)
- -DENABLE_MSGS_BOUNDED_SEQUENCE (default OFF)
- -DENABLE_MSGS_BOUNDED_SEQUENCE_FLAT (default OFF)
- -DENABLE_MSGS_UNBOUNDED_SEQUENCE (default OFF)
- -DENABLE_MSGS_ALL (default OFF)
  - when ON, overrides the other defaults to ON
  - you can still optionally exclude some messages by explicitly setting them to OFF

Removed

Removed a few messages:
- Range
- RadarTrack
- RadarDetection
- NavSatFix

Fixed

In all cases, including loaned messages, capture the timestamp as the last step of initializing the message

1.3.7 (2023/01/04)

1.3.6 (2023/01/03)

Fixed

Set the correct IDL_GEN_ROOT for rclcpp plugins

1.3.5 (2022/12/05)

Fixed

Exit cleanly when a publisher process terminates before a subscriber process

1.3.4 (2022/11/28)

Changed

Updated Apex.OS plugins to use the unified LoanedSample::data()

1.3.3 (2022/11/28)

Fixed

Implement the missing take() method in ApexOSPollingSubscriptionSubscriber

1.3.2 (2022/11/21)

Fixed

Capture the this pointer in the lambda in the iceoryx publisher

1.3.1 (2022/11/21)

Added

New Apex.OS plugin, compatible with the ThreadedRunners
- The INTER_THREAD and INTRA_THREAD execution strategies, combined with -c ApexOSPollingSubscription, will use the ThreadedRunner instances
- The new APEX_SINGLE_EXECUTOR execution strategy will add all publishers and subscribers to a single Apex.OS Executor
- The new APEX_EXECUTOR_PER_COMMUNICATOR execution strategy will add each publisher and each subscriber to its own Apex.OS Executor instance
- The new APEX_CHAIN execution strategy will add a publisher and subscriber as a chain of nodes to an Apex.OS Executor

Changed

Refactored FastRTPS communicator plugin:
- Uses DDS compliant API
- Code generator updated
- Implementation for publish_loaned()
- Dockerfile improvements

Removed

CLI arg --disable-async. Synchronous / asynchronous publishing should be configured externally, depending on the communication means used.

1.3.0 (2022/08/25)

Added

New execution strategy option:
- The default -e INTER_THREAD runs each publisher and subscriber in its own separate thread, which matches the previous behavior
- A new -e INTRA_THREAD, which runs a single publisher and subscriber in the same thread. The publisher writes, and the subscriber immediately takes it
- For Apex.OS specifically, some optimized execution strategies which use the proprietary Apex.OS executor

Changed

Significantly refactored the communicator plugins:
- Each plugin is split into an implementation of a Publisher and a Subscriber, instead of a single Communicator
- The plugin is no longer responsible for managing the metrics, such as sample count, lost samples, and latency
- The plugin does not require any special logic to support roundtrip mode
- It is safe for the plugins to initialize their data writers and readers at construction time, instead of delaying the initialization to the first call of publish() or update_subscription()
- Split publish() into publish_copy() and publish_loaned()
Significantly refactored the runner framework:
- The runner framework is responsible for the experiment metrics
- It manages the roundtrip mode logic
- It is extensible for different execution strategies or thread configurations
The iceoryx plugin now uses the untyped API, for improved performance

1.2.1 (2022/06/30)

Fixed

Capture the timestamp as soon as a message is received, instead of just before storing the metrics, to reduce the reported latency to a more correct value

1.2.0 (2022/06/28)

Changed

The CLI arguments for specifying the output type have changed:
- For console output, updated every second, add --print-to-console
- For file output, use --logfile my_file.csv or --logfile my_file.json
  - The type will be deduced from the file name
- If neither of these options is specified, then a warning will print, and the experiment will still run
The linter configurations are now configured locally. This means that the output of colcon test should be the same no matter the installed ROS distribution.
The --zero-copy arg is now valid even if the publisher and subscriber(s) are in the same process

Removed

The publisher and subscriber loop reserve metrics are no longer recorded or reported

Fixed

CPU usage will no longer be stuck at 0

1.1.2 (2022/06/08)

Changed

Use steady_clock for all platforms, including QNX QOS

1.1.1 (2022/06/07)

Changed

Significant refactor to simplify the analysis pipeline

Fixed

Add some missing definitions when Apex.OS is enabled, but the rclcpp plugins are disabled

1.1.0 (2022/06/02)

Added

New Apex.OS Polling Subscription plugin
Compatibility with ROS 2 Humble

1.0.0 (2022/05/12)

Added

More expressive perf_test CLI args for QOS settings
A plugin for Cyclone DDS with C++ bindings v0.9.0b1

Changed

CLI args for QOS settings:
- --reliability <RELIABLE|BEST_EFFORT>
- --durability <TRANSIENT_LOCAL|VOLATILE>
- --history <KEEP_LAST|KEEP_ALL>
master branch is compatible with many ROS 2 distributions:
- dashing
- eloquent
- foxy
- galactic
- rolling

Deprecated

CLI flags for QOS settings:
- --reliable
- --transient
- --keep-last

Removed

The branches for specific ROS 2 distributions have been deleted

Fixed

CI jobs and Dockerfiles are decoupled from the middleware bundled with the ROS 2 distribution

Package Dependencies

Deps	Name
	rclcpp
	ros_environment
	ament_cmake
	rosidl_default_generators
	rmw_implementation
	rosidl_default_runtime
	ament_cmake_gtest
	ament_lint_auto
	ament_lint_common

System Dependencies

Name
git

Dependant Packages

Name	Deps
performance_report

Package Summary

Tags	No category tags.
Version	2.3.0
License	Apache 2.0
Build type	AMENT_CMAKE
Use	RECOMMENDED

Repository Summary

Description
Checkout URI	https://gitlab.com/ApexAI/performance_test.git
VCS Type	git
VCS Version	master
Last Updated	2025-03-25
Dev Status	MAINTAINED
CI status	No Continuous Integration
Released	RELEASED
Tags	No category tags.
Contributing	Help Wanted (0) Good First Issues (0) Pull Requests to Review (0)

Package Description

Tool to test performance of ROS 2 and DDS data layers and communication.

Additional Links

No additional links.

Maintainers

Apex.AI, Inc.

Authors

No additional authors.

performance_test

[TOC]

The performance_test tool tests latency and other performance metrics of various middleware implementations that support a pub/sub pattern. It is used to simulate non-functional performance of your application.

The performance_test tool allows you to quickly set up a pub/sub configuration, e.g. number of publisher/subscribers, message size, QOS settings, middleware. The following metrics are automatically recorded when the application is running:

latency: corresponds to the time a message takes to travel from a publisher to subscriber. The latency is measured by timestamping the sample when it’s published and subtracting the timestamp (from the sample) from the measured time when the sample arrives at the subscriber (only logged when a subscriber is created)
CPU usage: percentage of the total system wide CPU usage (logged separately for each instance of perf_test)
resident memory: heap allocations, shared memory segments, stack (used for system’s internal work) (logged separately for each instance of perf_test)
sample statistics: number of samples received, sent, and lost per experiment run.

This master branch is compatible with the following ROS 2 versions

rolling
jazzy
iron
humble
galactic
foxy
eloquent
dashing
Apex.OS

How to use this document

Start here for a quick example of building and running the performance_test tool with the Cyclone DDS plugin.
If needed, find more detailed information about building and running
Or, if the quick example is good enough, skip ahead to the list of supported middleware plugins to learn how to test a specific middleware implementation.
Check out the tools for visualizing the results
If desired, read about the design and architecture of the tool.

Example

This example shows how to test the non-functional performance of the following configuration:

Option	Value
Plugin	Cyclone DDS
Message type	Array1k
Publishing rate	100Hz
Topic name	test_topic
Duration of the experiment	30s
Number of publisher(s)	1 (default)
Number of subscriber(s)	1 (default)

Install ROS 2
Install Cyclone DDS to /opt/cyclonedds
Build performance_test with the CMake build flag for Cyclone DDS:

    source /opt/ros/rolling/setup.bash
    cd ~/perf_test_ws
    colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
    source ./install/setup.bash
    

Run with the communication plugin option for Cyclone DDS:

mkdir experiment
./install/performance_test/lib/performance_test/perf_test --communication CycloneDDS
                                                          --msg Array1k
                                                          --rate 100
                                                          --topic test_topic
                                                          --max-runtime 30
                                                          --logfile experiment/log.csv

At the end of the experiment, a CSV log file will be generated in the experiment folder with a name that starts with log.

Building the performance_test tool

For a simple example, see Dockerfile.rclcpp.

The performance_test tool is structured as a ROS 2 package, so colcon is used to build it. Therefore, you must source a ROS 2 installation:

source /opt/ros/rolling/setup.bash

Select a middleware plugin from this list. Then build the performance_test tool with the selected middleware:

mkdir -p ~/perf_test_ws/src
cd ~/perf_test_ws/src
git clone https://gitlab.com/ApexAI/performance_test.git
cd ..
# At this stage, you need to choose which middleware you want to use
# The list of available flags is described in the middleware plugins section
# Square brackets denote optional arguments, like in the Python documentation.
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release -DPERFORMANCE_TEST_PLUGIN=<plugin>
source install/setup.bash

Running an experiment

The performance_test experiments are run through the perf_test executable. To find the available settings, run with --help (note the required and default arguments):

~/perf_test_ws$ ./install/performance_test/lib/performance_test/perf_test --help

The -c argument should match the selected middleware plugin from the build phase.
The --msg argument should be one of the supported message types, which are shown in the --help output.

Single machine or distributed system?

Based on the configuration you want to test, the usage of the performance_test tool differs. The different possibilities are explained below.

For running tests on a single machine, you can choose between the following options:

Intraprocess means that the publisher and subscriber threads are in the same process.

    perf_test <options> --num-sub-threads 1 --num-pub-threads 1

Interprocess means that the publisher and subscriber are in different processes. To test interprocess communication, two instances of the performance_test must be run, e.g.

    # Start the subscriber first
    perf_test <options> --num-sub-threads 1 --num-pub-threads 0 &
    sleep 1  # give the subscriber time to finish initializing
    perf_test <options> --num-sub-threads 0 --num-pub-threads 1
    

On a distributed system, testing latency is difficult, because the clocks are probably not perfectly synchronized between the two devices. To work around this, the performance_test tool supports relay mode, which allows for a round-trip style of communication:

# On the main machine
perf_test <options> --roundtrip-mode Main

# On the relay machine:
perf_test <options> --roundtrip-mode Relay

In relay mode, the Main machine sends messages to the Relay machine, which immediately sends them back. The Main machine receives the relayed message and reports the round-trip latency. Therefore, the reported latency will be roughly double the latency reported in non-relay mode.
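
If a one-way figure is needed, a rough estimate is half of the reported round-trip value. The sketch below shows that arithmetic; one_way_latency is a hypothetical helper, and the result is only an approximation that assumes both directions take equally long and the relay adds no noticeable delay.

```shell
# Hypothetical helper: estimate one-way latency from a measured round trip.
# $1 = round-trip latency, in whatever unit the log file uses.
one_way_latency() {
  awk -v rtt="$1" 'BEGIN { printf "%.3f\n", rtt / 2 }'
}

one_way_latency 1.5   # prints 0.750
```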

Single machine, single thread

An intra-thread configuration is experimentally supported, in which a publisher and subscriber both operate in the same thread. The publisher writes a message, and the subscriber immediately takes it.

perf_test <options> -e INTRA_THREAD

Notes:

This is only available when zero copy transfer is enabled
This requires exactly one publisher and one subscriber
This is not compatible with roundtrip mode
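
The three notes above can be expressed as a small pre-flight check in a wrapper script. This is only a sketch: can_use_intra_thread, its yes/no argument, and the value None for "no roundtrip mode" are hypothetical conventions of this example, not options of perf_test.

```shell
# Hypothetical pre-flight check mirroring the three notes above.
# $1 = number of publishers, $2 = number of subscribers,
# $3 = zero copy enabled (yes/no), $4 = roundtrip mode (None/Main/Relay)
can_use_intra_thread() {
  [ "$1" -eq 1 ] && [ "$2" -eq 1 ] && [ "$3" = yes ] && [ "$4" = None ]
}

if can_use_intra_thread 1 1 yes None; then
  echo "perf_test <options> -e INTRA_THREAD --zero-copy"
fi
```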

Middleware plugins

The performance test tool can measure the performance of a variety of communication solutions from different vendors. In this case there is no rclcpp or rmw layer overhead over the publisher and subscriber routines.

The performance_test tool implements an executor that runs the publisher(s) and/or the subscriber(s) in their own thread.

The following plugins are currently implemented:

Eclipse Cyclone DDS

Eclipse Cyclone DDS 0.9.0b1
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
Communication plugin: -c CycloneDDS
Docker file: Dockerfile.CycloneDDS
Available transports:
- Cyclone DDS zero copy requires RouDi to be running.

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|-------|---------------------|--------------------|
| INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |

Eclipse Cyclone DDS C++ binding

Eclipse Cyclone DDS C++ bindings 0.9.0b1
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS_CXX
Communication plugin: -c CycloneDDS-CXX
Docker file: Dockerfile.CycloneDDS-CXX
Available transports:
- Cyclone DDS zero copy requires RouDi to be running.

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|-------|---------------------|--------------------|
| INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |

Eclipse iceoryx

iceoryx (latest master as of Feb 13, 2023)
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ICEORYX
Communication plugin: -c iceoryx
Docker file: Dockerfile.iceoryx
The iceoryx plugin is not a DDS implementation.
- The DDS-specific options (such as domain ID, durability, and reliability) do not apply.
To run with the iceoryx plugin, RouDi must be running.
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|-----------|---------------------|-----------------------------------|
| LoanedSamples | LoanedSamples | Not supported by performance_test |

eProsima Fast DDS

FastDDS 2.6.2
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=FASTDDS
Communication plugin: -c FastRTPS
Docker file: Dockerfile.FastDDS
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|-------|---------------------|--------------------|
| INTRA (default), LoanedSamples (--zero-copy) | SHMEM (default), LoanedSamples (--zero-copy) | UDP |

OCI OpenDDS

OpenDDS 3.13.2
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=OPENDDS
Communication plugin: -c OpenDDS
Docker file: Dockerfile.OpenDDS
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|-------|---------------------|--------------------|
| TCP | TCP | TCP |

RTI Connext DDS

RTI Connext DDS 5.3.1+
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDS
Communication plugin: -c ConnextDDS
Docker file: Not available
A license is required
You need to source an RTI Connext DDS environment.
- If RTI Connext DDS was installed with ROS 2 (Linux only):
  - source /opt/rti.com/rti_connext_dds-5.3.1/setenv_ros2rti.bash
- If RTI Connext DDS is installed separately, you can source the following script to set the environment:
  - source <connextdds_install_path>/resource/scripts/rtisetenv_<arch>.bash
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|-------|---------------------|--------------------|
| INTRA | SHMEM | UDP |

RTI Connext DDS Micro

Connext DDS Micro 3.0.3
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDSMICRO
Communication plugin: -c ConnextDDSMicro
Docker file: Not available
A license is required
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|-------|---------------------|--------------------|
| INTRA | SHMEM | UDP |

Framework plugins

The performance_test tool can also measure the end-to-end latency of a framework. In this case, the executor of the framework is used to run the publisher(s) and/or the subscriber(s). The potential overhead of the rclcpp or rmw layer is measured.

ROS 2

The performance_test tool can also measure the performance of a variety of RMW implementations, through the ROS 2 rclcpp::Publisher and rclcpp::Subscription API.

ROS 2 rclcpp::Publisher and rclcpp::Subscription
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ROS2 (default)
Communication plugin:
- Callback with Single Threaded Executor: -c rclcpp-single-threaded-executor
- Callback with Static Single Threaded Executor: -c rclcpp-static-single-threaded-executor
- rclcpp::WaitSet: -c rclcpp-waitset
Docker file: Dockerfile.rclcpp
Available underlying RMW implementations:
- ROS 2 Rolling is pre-configured to use rmw_fastrtps_cpp
- Follow these instructions to use a different RMW implementation
Available transports: depends on underlying RMW implementation
- LoanedSamples are available (--zero-copy) for ROS_DISTRO = foxy and above
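
In ROS 2, the RMW implementation is normally chosen at run time through the RMW_IMPLEMENTATION environment variable. The sketch below wraps that in a small function; with_rmw is a hypothetical helper name, and the chosen RMW package must already be installed in the sourced environment.

```shell
# Hypothetical wrapper: run a command with a specific RMW implementation.
# $1 = RMW implementation name; remaining arguments = command to run.
with_rmw() {
  rmw="$1"
  shift
  env RMW_IMPLEMENTATION="$rmw" "$@"
}

# Example (not executed here; requires rmw_cyclonedds_cpp to be installed):
#   with_rmw rmw_cyclonedds_cpp perf_test -c rclcpp-waitset --msg Array1k
```

Setting the variable per command, rather than exporting it, keeps consecutive experiments with different RMW implementations from affecting each other.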

Apex.OS

Apex.OS
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=APEX_OS
- It is also required to source /opt/ApexOS/setup.bash instead of a ROS 2 distribution
Communication plugin: -c ApexOSPollingSubscription
Docker file: Not available
Available underlying RMW implementations: rmw_apex_middleware
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|-------|---------------------|--------------------|
| UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |
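
Because the build flag value and the -c value are spelled differently for most plugins, a lookup can prevent mismatches in scripts. The sketch below collects the pairs listed in the plugin sections; communicator_for is a hypothetical helper, not something shipped with the tool.

```shell
# Map a PERFORMANCE_TEST_PLUGIN build value to a matching -c communicator,
# as listed in the plugin sections of this document.
communicator_for() {
  case "$1" in
    CYCLONEDDS)      echo CycloneDDS ;;
    CYCLONEDDS_CXX)  echo CycloneDDS-CXX ;;
    ICEORYX)         echo iceoryx ;;
    FASTDDS)         echo FastRTPS ;;
    OPENDDS)         echo OpenDDS ;;
    CONNEXTDDS)      echo ConnextDDS ;;
    CONNEXTDDSMICRO) echo ConnextDDSMicro ;;
    # ROS2 offers three communicators; the first one listed is returned here.
    ROS2)            echo rclcpp-single-threaded-executor ;;
    APEX_OS)         echo ApexOSPollingSubscription ;;
    *)               echo "unknown plugin: $1" >&2; return 1 ;;
  esac
}

communicator_for FASTDDS   # prints FastRTPS
```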

Analyze the results

After an experiment is run with the -l flag, a log file is recorded. Both CSV and JSON formats are supported. It is possible to add custom data to the log file by setting the APEX_PERFORMANCE_TEST environment variable before running an experiment, e.g.

# JSON format
export APEX_PERFORMANCE_TEST="
{
\"My Version\": \"1.0.4\",
\"My Image Version\": \"5.2\",
\"My OS Version\": \"Ubuntu 16.04\"
}
"

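An equivalent way to set the variable, shown here as a sketch, is a quoted here-document, which avoids escaping every double quote. The keys and values are the same illustrative ones as above.

```shell
# Read the JSON from a quoted here-document so that no escaping is needed.
APEX_PERFORMANCE_TEST=$(cat <<'EOF'
{
  "My Version": "1.0.4",
  "My Image Version": "5.2",
  "My OS Version": "Ubuntu 16.04"
}
EOF
)
export APEX_PERFORMANCE_TEST
```

If python3 is available, the value can be checked for valid JSON before the experiment with printf '%s' "$APEX_PERFORMANCE_TEST" | python3 -m json.tool.
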
Plot the results

To plot the results in the JSON or CSV log files, see the plotter README.

Architecture

Apex.AI’s Performance Testing in ROS 2 white paper (available here) describes how to design a fair and unbiased performance test, and is the basis for this project.

Each middleware has a different API. Thanks to the Plugin abstraction, the core logic of setting up and running an experiment is completely decoupled from the implementation details of sending and receiving individual messages.

Exactly one Plugin implementation is selected at build time. The design is similar to the Abstract Factory pattern. performance_test declares, but does not define, a static factory method in the PluginFactory class. Each middleware provides a definition for this factory method to create a concrete Plugin implementation, and perf_test calls this factory method directly.

An example plugin is available here.

Performance optimizations

On Linux-based platforms, when run with --prevent-cpu-idle, perf_test writes 0 to /dev/cpu_dma_latency and holds the file handle open, which prevents the CPU from entering any idle states for the duration of the experiment. This should result in lower message latency and lower variance in that latency.
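
Writing to /dev/cpu_dma_latency usually requires elevated privileges. The sketch below adds the --prevent-cpu-idle switch (named in the changelog) only when the device is writable by the current user; prevent_idle_flag is a hypothetical helper, and how perf_test itself reacts to an unwritable device is not covered here.

```shell
# Hypothetical helper: print the flag only if the device can be opened for writing.
# $1 = device path (defaults to /dev/cpu_dma_latency)
prevent_idle_flag() {
  dev="${1:-/dev/cpu_dma_latency}"
  if [ -w "$dev" ]; then
    printf '%s\n' '--prevent-cpu-idle'
  fi
}

# Example (not executed here):
#   perf_test <options> $(prevent_idle_flag)
```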

Future extensions and limitations

Communication frameworks like DDS have a huge number of settings. This tool only allows the most common QoS settings to be configured. The other QoS settings are hardcoded in the application.
Only one publisher per topic is allowed, because the data verification logic does not support matching data to the different publishers.
Some communication plugins can get stuck in their internal loops if too much data is received. Figuring out ways around such issues is one of the goals of this tool.
The FastRTPS wait-set does not support timeouts, which can prevent the receiver from aborting. In that case the performance test must be killed manually.
Connext DDS Micro does not support the INTRA transport with reliable QoS and the history kind set to keep_all. Always set the QoS history kind to keep-last when using reliable.

Possible additional communication mechanisms that could be implemented:

Raw UDP communication

Building with limited resources

When building this tool, the compiler must perform a lot of template expansion. This can be overwhelming for a system with a low-power CPU or limited RAM. There are some additional CMake options which can reduce the system load during compilation:

This tool includes many different message types, each with many different sizes. Reduce the number of messages, and thus the compilation load, by disabling one or more message types. For example, to build without PointCloud messages, add -DENABLE_MSGS_POINT_CLOUD=OFF to the --cmake-args. The message types, and their options for enabling/disabling, can be found here.
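
When several message types are disabled at once, the flags can be generated instead of typed. This is a minimal sketch; msgs_off is a hypothetical helper, and the option names follow the -DENABLE_MSGS_<TYPE> pattern listed in the changelog for version 1.4.0.

```shell
# Hypothetical helper: turn message type names into -DENABLE_MSGS_<TYPE>=OFF flags.
msgs_off() {
  args=""
  for name in "$@"; do
    args="${args:+$args }-DENABLE_MSGS_${name}=OFF"
  done
  printf '%s\n' "$args"
}

# Example (not executed here):
#   colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release \
#     -DPERFORMANCE_TEST_PLUGIN=<plugin> $(msgs_off POINT_CLOUD STRUCT)
```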

CHANGELOG

Changelog for package performance_test

X.Y.Z (YYYY/MM/DD)

2.3.0 (2024/09/24)

Removed

Moved apex_performance_plotter to its own package here

2.2.0 (2024/05/15)

Added

performance_test can be built with ROS 2 Iron and Jazzy

Changed

Renamed the --dds-domain_id CLI arg to --dds-domain-id
When --dds-domain-id is unspecified, fall back to the ROS_DOMAIN_ID environment variable
--zero-copy has been separated into two flags:
- --shared-memory: Enable shared-memory transfer in the plugin. This is meant to replace the need to manually set runtime flags via CYCLONEDDS_URI, APEX_MIDDLEWARE_SETTINGS, etc.
- --loaned-samples: When publishing messages in the plugin, borrow loaned samples instead of publishing by copy
- --zero-copy is now an alias for --shared-memory --loaned-samples
- Supported plugins include:
  - -c CycloneDDS
  - -c CycloneDDS-CXX
  - -c ApexOSPollingSubscription
  - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
  - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_fastrtps_cpp

2.1.0 (2024/04/17)

Added

Add new function prepare() to the Publisher and Subscriber API, intended to allow participant discovery without blocking the main thread

Changed

Change the default --history arg from KEEP_ALL to KEEP_LAST
Change the default --history-depth arg from 1000 to 16
If --expected-num-pubs is unspecified, set it to the same value as -p
If --expected-num-subs is unspecified, set it to the same value as -s

Fixed

Removed an unused variable to fix a Clang build
Remove unused variable names in the Plugin abstract class
Fix a potential lockup in PublisherTask on QNX

2.0.0 (2024/03/19)

Added

Add experimental bazel support
- bazel build //performance_test --//:plugin_implementation=//path/to/a/plugin
Add a rudimentary socket-based plugin for testing the bazel support
- bazel run //performance_test --//:plugin_implementation=//performance_test/plugins/demo:demo_plugin -- --help

Changed

Instead of enabling/disabling each plugin, you select exactly one with a CMake string option, for example:
- colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=ROS2
Renamed the --communication CLI arg to --communicator. The short -c is unchanged.

Removed

Removed the deprecated CLI flags for QOS settings:
- Instead of --reliable, use --reliability RELIABLE
- Instead of --transient, use --durability TRANSIENT_LOCAL
- Instead of --keep-last, use --history KEEP_LAST
Removed the obsolete BoundedSequenceFlat messages
Removed the superfluous --msg-list CLI flag. The --help message already lists the available messages.

Fixed

Update the Apex.OS Runner to use executor_runner::deferred instead of executor_runner::deferred_tag()
Ensure that the first few published samples are sent at the expected rate

1.5.2 (YYYY/MM/DD)

Added

--prevent-cpu-idle is available on QNX

Changed

JSON log files will contain all values in the APEX_PERFORMANCE_TEST dictionary, instead of the five specific values used previously
Switch to build as C++17 by default

Fixed

Zero copy transfer is again enabled for the rclcpp publisher

1.5.0 (2023/06/14)

Added

New CLI switch --prevent-cpu-idle (linux only). When specified, perf_test will use /dev/cpu_dma_latency to request that the CPU not enter any sleep states, to potentially give more consistent results
Some smaller Array messages, down to 32 bits
Added support to the FastDDS plugin for bounded and unbounded sequences

Changed

Update the README to better explain how to use this tool with Apex.OS
In the Runner, allocate the AnalysisResults on the stack instead of using shared_ptr
Subscriber methods accept a callback parameter, instead of returning a vector of results, to reduce heap usage
Refactored the interaction between SubscriberStats and AnalysisResult to remove the need for a std::vector of latency samples, to reduce heap usage
Adjusted the Array message sizes to make the name match the contents
Updated apex_os_communicator to use the new zero-copy API

1.4.2 (2023/03/15)

Added

Added perfplot support for JSON log files

Changed

Migrate the Apex.OS target to use rosidl_get_typesupport_target
Preallocate the JSON logger’s string buffer to prevent reallocations after the experiment begins

1.4.1 (2023/02/23)

Changed

Updated the iceoryx plugin to the latest master as of Feb 13

1.4.0 (2023/02/20)

Added

New message type BoundedSequenceFlat
- This is a BoundedSequence with the @flat annotation
- Sizes range from 1kB to 8MB, like Array and BoundedSequence

Changed

Messages of different types can be optionally included via CMake args:
- -DENABLE_MSGS_ARRAY (default ON)
- -DENABLE_MSGS_STRUCT (default ON)
- -DENABLE_MSGS_POINT_CLOUD (default ON)
- -DENABLE_MSGS_BOUNDED_SEQUENCE (default OFF)
- -DENABLE_MSGS_BOUNDED_SEQUENCE_FLAT (default OFF)
- -DENABLE_MSGS_UNBOUNDED_SEQUENCE (default OFF)
- -DENABLE_MSGS_ALL (default OFF)
  - when ON, overrides the other defaults to ON
  - you can still optionally exclude some messages by explicitly setting them to OFF

Removed

Removed a few messages:
- Range
- RadarTrack
- RadarDetection
- NavSatFix

Fixed

In all cases, including loaned messages, capture the timestamp as the last step of initializing the message

1.3.7 (2023/01/04)

1.3.6 (2023/01/03)

Fixed

Set the correct IDL_GEN_ROOT for rclcpp plugins

1.3.5 (2022/12/05)

Fixed

Exit cleanly when a publisher process terminates before a subscriber process

1.3.4 (2022/11/28)

Changed

Updated Apex.OS plugins to use the unified LoanedSample::data()

1.3.3 (2022/11/28)

Fixed

Implement the missing take() method in ApexOSPollingSubscriptionSubscriber

1.3.2 (2022/11/21)

Fixed

Capture the this pointer in the lambda in the iceoryx publisher

1.3.1 (2022/11/21)

Added

New Apex.OS plugin, compatible with the ThreadedRunners
- The INTER_THREAD and INTRA_THREAD execution strategies, combined with -c ApexOSPollingSubscription, will use the ThreadedRunner instances
- The new APEX_SINGLE_EXECUTOR execution strategy will add all publishers and subscribers to a single Apex.OS Executor
- The new APEX_EXECUTOR_PER_COMMUNICATOR execution strategy will add each publisher and each subscriber to its own Apex.OS Executor instance
- The new APEX_CHAIN execution strategy will add a publisher and subscriber as a chain of nodes to an Apex.OS Executor

Changed

Refactored FastRTPS communicator plugin:
- Uses DDS compliant API
- Code generator updated
- Implementation for publish_loaned()
- Dockerfile improvements

Removed

Removed the CLI arg --disable-async. Synchronous or asynchronous publishing should be configured externally, depending on the communication means used.

1.3.0 (2022/08/25)

Added

New execution strategy option:
- The default -e INTER_THREAD runs each publisher and subscriber in its own separate thread, which matches the previous behavior
- A new -e INTRA_THREAD, which runs a single publisher and subscriber in the same thread. The publisher writes, and the subscriber immediately takes it
- For Apex.OS specifically, some optimized execution strategies which use the proprietary Apex.OS executor

Changed

Significantly refactored the communicator plugins:
- Each plugin is split into an implementation of a Publisher and a Subscriber, instead of a single Communicator
- The plugin is no longer responsible for managing the metrics, such as sample count, lost samples, and latency
- The plugin does not require any special logic to support roundtrip mode
- It is safe for the plugins to initialize their data writers and readers at construction time, instead of delaying the initialization to the first call of publish() or update_subscription()
- Split publish() into publish_copy() and publish_loaned()
Significantly refactored the runner framework:
- The runner framework is responsible for the experiment metrics
- It manages the roundtrip mode logic
- It is extensible for different execution strategies or thread configurations
The iceoryx plugin now uses the untyped API, for improved performance

1.2.1 (2022/06/30)

Fixed

Capture the timestamp as soon as a message is received, instead of just before storing the metrics, to reduce the reported latency to a more correct value

1.2.0 (2022/06/28)

Changed

The CLI arguments for specifying the output type have changed:
- For console output, updated every second, add --print-to-console
- For file output, use --logfile my_file.csv or --logfile my_file.json
  - The type will be deduced from the file name
- If neither of these options is specified, then a warning will print, and the experiment will still run
The linter configurations are now configured locally. This means that the output of colcon test should be the same no matter the installed ROS distribution.
The --zero-copy arg is now valid even if the publisher and subscriber(s) are in the same process

Removed

The publisher and subscriber loop reserve metrics are no longer recorded or reported

Fixed

CPU usage will no longer be stuck at 0

Removed

The pub/sub loop reserve time metrics

1.1.2 (2022/06/08)

Changed

Use steady_clock for all platforms, including QNX QOS

1.1.1 (2022/06/07)

Changed

Significant refactor to simplify the analysis pipeline

Fixed

Add some missing definitions when Apex.OS is enabled, but the rclcpp plugins are disabled

1.1.0 (2022/06/02)

Added

New Apex.OS Polling Subscription plugin
Compatibility with ROS 2 Humble

1.0.0 (2022/05/12)

Added

More expressive perf_test CLI args for QOS settings
A plugin for Cyclone DDS with C++ bindings v0.9.0b1

Changed

CLI args for QOS settings:
- --reliability <RELIABLE|BEST_EFFORT>
- --durability <TRANSIENT_LOCAL|VOLATILE>
- --history <KEEP_LAST|KEEP_ALL>
master branch is compatible with many ROS 2 distributions:
- dashing
- eloquent
- foxy
- galactic
- rolling

Deprecated

CLI flags for QOS settings:
- --reliable
- --transient
- --keep-last

Removed

The branches for specific ROS 2 distributions have been deleted

Fixed

CI jobs and Dockerfiles are decoupled from the middleware bundled with the ROS 2 distribution

Package Dependencies

Deps	Name
	rclcpp
	ros_environment
	ament_cmake
	rosidl_default_generators
	rmw_implementation
	rosidl_default_runtime
	ament_cmake_gtest
	ament_lint_auto
	ament_lint_common

System Dependencies

Name
git

Dependant Packages

Name	Deps
performance_report

Package Summary

Tags	No category tags.
Version	2.3.0
License	Apache 2.0
Build type	AMENT_CMAKE
Use	RECOMMENDED

Repository Summary

Description
Checkout URI	https://gitlab.com/ApexAI/performance_test.git
VCS Type	git
VCS Version	master
Last Updated	2025-03-25
Dev Status	MAINTAINED
CI status	No Continuous Integration
Released	RELEASED
Tags	No category tags.
Contributing	Help Wanted (0) Good First Issues (0) Pull Requests to Review (0)

Package Description

Tool to test performance of ROS 2 and DDS data layers and communication.

Additional Links

No additional links.

Maintainers

Apex.AI, Inc.

Authors

No additional authors.

performance_test

[TOC]

The performance_test tool tests latency and other performance metrics of various middleware implementations that support a pub/sub pattern. It is used to simulate non-functional performance of your application.

The performance_test tool allows you to quickly set up a pub/sub configuration, e.g. number of publisher/subscribers, message size, QOS settings, middleware. The following metrics are automatically recorded when the application is running:

latency: corresponds to the time a message takes to travel from a publisher to subscriber. The latency is measured by timestamping the sample when it’s published and subtracting the timestamp (from the sample) from the measured time when the sample arrives at the subscriber (only logged when a subscriber is created)
CPU usage: percentage of the total system wide CPU usage (logged separately for each instance of perf_test)
resident memory: heap allocations, shared memory segments, stack (used for system’s internal work) (logged separately for each instance of perf_test)
sample statistics: number of samples received, sent, and lost per experiment run.

This master branch is compatible with the following ROS 2 versions

rolling
jazzy
iron
humble
galactic
foxy
eloquent
dashing
Apex.OS

How to use this document

Start here for a quick example of building and running the performance_test tool with the Cyclone DDS plugin.
If needed, find more detailed information about building and running
Or, if the quick example is good enough, skip ahead to the list of supported middleware plugins to learn how to test a specific middleware implementation.
Check out the tools for visualizing the results
If desired, read about the design and architecture of the tool.

Example

This example shows how to test the non-functional performance of the following configuration:

Option	Value
Plugin	Cyclone DDS
Message type	Array1k
Publishing rate	100Hz
Topic name	test_topic
Duration of the experiment	30s
Number of publisher(s)	1 (default)
Number of subscriber(s)	1 (default)

Install ROS 2
Install Cyclone DDS to /opt/cyclonedds
Build performance_test with the CMake build flag for Cyclone DDS:

    source /opt/ros/rolling/setup.bash
    cd ~/perf_test_ws
    colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
    source ./install/setup.bash
    

Run with the communication plugin option for Cyclone DDS:

mkdir experiment
./install/performance_test/lib/performance_test/perf_test --communication CycloneDDS
                                                          --msg Array1k
                                                          --rate 100
                                                          --topic test_topic
                                                          --max-runtime 30
                                                          --logfile experiment/log.csv

At the end of the experiment, a CSV log file will be generated in the experiment folder with a name that starts with log.

Building the performance_test tool

For a simple example, see Dockerfile.rclcpp.

The performance_test tool is structured as a ROS 2 package, so colcon is used to build it. Therefore, you must source a ROS 2 installation:

source /opt/ros/rolling/setup.bash

Select a middleware plugin from this list. Then build the performance_test tool with the selected middleware:

mkdir -p ~/perf_test_ws/src
cd ~/perf_test_ws/src
git clone https://gitlab.com/ApexAI/performance_test.git
cd ..
# At this stage, you need to choose which middleware you want to use
# The list of available flags is described in the middleware plugins section
# Square brackets denote optional arguments, like in the Python documentation.
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release -DPERFORMANCE_TEST_PLUGIN=<plugin>
source install/setup.bash

Running an experiment

The performance_test experiments are run through the perf_test executable. To find the available settings, run with --help (note the required and default arguments):

~/perf_test_ws$ ./install/performance_test/lib/performance_test/perf_test --help

The -c argument should match the selected middleware plugin from the build phase.
The --msg argument should be one of the supported message types, which are shown in the --help output.

Single machine or distributed system?

Based on the configuration you want to test, the usage of the performance_test tool differs. The different possibilities are explained below.

For running tests on a single machine, you can choose between the following options:

Intraprocess means that the publisher and subscriber threads are in the same process.

    perf_test <options> --num-sub-threads 1 --num-pub-threads 1

Interprocess means that the publisher and subscriber are in different processes. To test interprocess communication, two instances of the performance_test must be run, e.g.

    # Start the subscriber first
    perf_test <options> --num-sub-threads 1 --num-pub-threads 0 &
    sleep 1  # give the subscriber time to finish initializing
    perf_test <options> --num-sub-threads 0 --num-pub-threads 1
    

On a distributed system, testing latency is difficult, because the clocks are probably not perfectly synchronized between the two devices. To work around this, the performance_test tool supports relay mode, which allows for a round-trip style of communication:

# On the main machine
perf_test <options> --roundtrip-mode Main

# On the relay machine:
perf_test <options> --roundtrip-mode Relay

In relay mode, the Main machine sends messages to the Relay machine, which immediately sends the messages back. The Main machine receives the relayed message, and reports the round-trip latency. Therefore, the reported latency will be roughly double the latency compared to the latency reported in non-relay mode.

Single machine, single thread

An intra-thread configuration is experimentally supported, in which a publisher and subscriber both operate in the same thread. The publisher writes a messages, and the subscriber immediately takes it.

perf_test <options> -e INTRA_THREAD

Notes:

This is only available when zero copy transfer is enabled
This requires exactly one publisher and one subscriber
This is not compatible with roundtrip mode

Middleware plugins

The performance test tool can measure the performance of a variety of communication solutions from different vendors. In this case there is no rclcpp or rmw layer overhead over the publisher and subscriber routines.

The performance_test tool implements an executor that runs the publisher(s) and/or the subscriber(s) in their own thread.

The following plugins are currently implemented:

Eclipse Cyclone DDS

Eclipse Cyclone DDS 0.9.0b1
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
Communication plugin: -c CycloneDDS
Docker file: Dockerfile.CycloneDDS
Available transports:
- Cyclone DDS zero copy requires RouDi to be running. - | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |

Eclipse Cyclone DDS C++ binding

Eclipse Cyclone DDS C++ bindings 0.9.0b1
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS_CXX
Communication plugin: -c CycloneDDS-CXX
Docker file: Dockerfile.CycloneDDS-CXX
Available transports:
- Cyclone DDS zero copy requires the RouDi to be running. - | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |

Eclipse iceoryx

iceoryx (latest master as of Feb 13)
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ICEORYX
Communication plugin: -c iceoryx
Docker file: Dockerfile.iceoryx
The iceoryx plugin is not a DDS implementation.
- The DDS-specific options (such as domain ID, durability, and reliability) do not apply.
To run with the iceoryx plugin, RouDi must be running.
Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |———–|———————|———————————–| | LoanedSamples | LoanedSamples | Not supported by performance_test |

eProsima Fast DDS

FastDDS 2.6.2
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=FASTDDS
Communication plugin: -c FastRTPS
Docker file: Dockerfile.FastDDS
Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | INTRA (default), LoanedSamples (--zero-copy) | SHMEM (default), LoanedSamples (--zero-copy) | UDP |

OCI OpenDDS

OpenDDS 3.13.2
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=OPENDDS
Communication plugin: -c OpenDDS
Docker file: Dockerfile.OpenDDS
Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | TCP | TCP | TCP |

RTI Connext DDS

RTI Connext DDS 5.3.1+
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDS
Communication plugin: -c ConnextDDS
Docker file: Not available
A license is required
You need to source an RTI Connext DDS environment.
- If RTI Connext DDS was installed with ROS 2 (Linux only):
  - source /opt/rti.com/rti_connext_dds-5.3.1/setenv_ros2rti.bash
- If RTI Connext DDS is installed separately, you can source the following script to set the environment:
  - source <connextdds_install_path>/resource/scripts/rtisetenv_<arch>.bash
Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | INTRA | SHMEM | UDP |

RTI Connext DDS Micro

Connext DDS Micro 3.0.3
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDSMICRO
Communication plugin: -c ConnextDDSMicro
Docker file: Not available
A license is required
Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | INTRA | SHMEM | UDP |

Framework plugins

The performance_test tool can also measure the end-to-end latency of a framework. In this case, the executor of the framework is used to run the publisher(s) and/or the subscriber(s). The potential overhead of the rclcpp or rmw layer is measured.

ROS 2

The performance test tool can also measure the performance of a variety of RMW implementations, through the ROS 2 rclcpp::publisher and rclcpp::subscriber API.

ROS 2 rclcpp::publisher and rclcpp::subscriber
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ROS2 (default)
Communication plugin:
- Callback with Single Threaded Executor: -c rclcpp-single-threaded-executor
- Callback with Static Single Threaded Executor: -c rclcpp-static-single-threaded-executor
- rclcpp::WaitSet: -c rclcpp-waitset
Docker file: Dockerfile.rclcpp
Available underlying RMW implementations:
- ROS 2 Rolling is pre-configured to use rmw_fastrtps_cpp
- Follow these instructions to use a different RMW implementation
Available transports: depends on underlying RMW implementation
- LoanedSamples are available (--zero-copy) for ROS_DISTRO = foxy and above

Apex.OS

Apex.OS
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=APEX_OS
- It is also required to source /opt/ApexOS/setup.bash instead of a ROS 2 distribution
Communication plugin: -c ApexOSPollingSubscription
Docker file: Not available
Available underlying RMW implementations: rmw_apex_middleware
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |

Analyze the results

After an experiment is run with the -l flag, a log file is recorded. Both CSV and JSON formats are supported. Custom data can be added to the log file by setting the APEX_PERFORMANCE_TEST environment variable before running an experiment, e.g.

# JSON format
export APEX_PERFORMANCE_TEST="
{
\"My Version\": \"1.0.4\",
\"My Image Version\": \"5.2\",
\"My OS Version\": \"Ubuntu 16.04\"
}
"

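The same dictionary can also be assembled programmatically, which avoids escaping the quotes by hand. This is a minimal Python sketch, assuming only that the variable must hold a JSON object as in the shell example above; the keys and values are the illustrative ones from that example.

```python
import json
import os

# Custom metadata to attach to the log file (illustrative values only).
custom_data = {
    "My Version": "1.0.4",
    "My Image Version": "5.2",
    "My OS Version": "Ubuntu 16.04",
}

# Serialize to JSON and export it, so that child processes such as
# perf_test inherit the variable.
os.environ["APEX_PERFORMANCE_TEST"] = json.dumps(custom_data)

# Round-trip check: the exported value parses back to the same dictionary.
assert json.loads(os.environ["APEX_PERFORMANCE_TEST"]) == custom_data
```

A process started afterwards from the same Python session, for example with subprocess.run, inherits the variable.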
Plot the results

To plot the results in the JSON or CSV log files, see the plotter README.

Architecture

Apex.AI’s Performance Testing in ROS 2 white paper (available here) describes how to design a fair and unbiased performance test, and is the basis for this project.

Each middleware has a different API. Thanks to the Plugin abstraction, the core logic of setting up and running an experiment is completely decoupled from the implementation details of sending and receiving individual messages.

Exactly one Plugin implementation is selected at build time. The design is similar to the Abstract Factory pattern. performance_test declares, but does not define, a static factory method in the PluginFactory class. Each middleware provides a definition for this factory method to create a concrete Plugin implementation, and perf_test calls this factory method directly.
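The shape of this design can be illustrated with a small Python analogy. This is only a sketch: the real tool is C++ and selects the plugin at build time, whereas the sketch emulates "declared but not defined" with a hook that exactly one plugin may fill in. All class and method names below, including the in-memory plugin, are hypothetical and chosen for illustration.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Callable, Optional


class Publisher(ABC):
    @abstractmethod
    def publish(self, sample: bytes) -> None: ...


class Subscriber(ABC):
    @abstractmethod
    def take(self) -> Optional[bytes]: ...


class Plugin(ABC):
    """The only interface the core experiment logic depends on."""

    @abstractmethod
    def create_publisher(self, topic: str) -> Publisher: ...

    @abstractmethod
    def create_subscriber(self, topic: str) -> Subscriber: ...


class PluginFactory:
    # The core only declares this hook; one plugin supplies the definition.
    _create: Optional[Callable[[], Plugin]] = None

    @classmethod
    def define(cls, create: Callable[[], Plugin]) -> None:
        if cls._create is not None:
            raise RuntimeError("exactly one plugin may be selected")
        cls._create = create

    @classmethod
    def create_plugin(cls) -> Plugin:
        if cls._create is None:
            raise RuntimeError("no plugin was selected")
        return cls._create()


# A hypothetical in-memory "middleware" standing in for a real plugin.
class _QueuePublisher(Publisher):
    def __init__(self, queue: deque) -> None:
        self._queue = queue

    def publish(self, sample: bytes) -> None:
        self._queue.append(sample)


class _QueueSubscriber(Subscriber):
    def __init__(self, queue: deque) -> None:
        self._queue = queue

    def take(self) -> Optional[bytes]:
        return self._queue.popleft() if self._queue else None


class InMemoryPlugin(Plugin):
    def __init__(self) -> None:
        self._topics: dict = {}

    def create_publisher(self, topic: str) -> Publisher:
        return _QueuePublisher(self._topics.setdefault(topic, deque()))

    def create_subscriber(self, topic: str) -> Subscriber:
        return _QueueSubscriber(self._topics.setdefault(topic, deque()))


PluginFactory.define(InMemoryPlugin)


def run_experiment(num_samples: int) -> int:
    """Core logic, written only against the abstractions above."""
    plugin = PluginFactory.create_plugin()
    pub = plugin.create_publisher("test_topic")
    sub = plugin.create_subscriber("test_topic")
    received = 0
    for i in range(num_samples):
        pub.publish(i.to_bytes(4, "little"))
        if sub.take() is not None:
            received += 1
    return received
```

Note that run_experiment never names InMemoryPlugin; swapping in a different plugin changes only the PluginFactory.define call.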

An example plugin is available here.

Performance optimizations

On Linux-based platforms, when the --prevent-cpu-idle switch is specified, perf_test writes 0 to /dev/cpu_dma_latency and holds the file handle open, which prevents the CPU from entering any idle states for the duration of the experiment. This should result in lower message latency and lower variance in that latency.
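The mechanism can be tried outside of perf_test. The following Python sketch is not the tool's implementation; it only illustrates the kernel interface, which takes the target latency as a 32-bit integer and keeps the request active while the file stays open. The function name and its parameters are hypothetical, and writing to the device usually requires root privileges.

```python
import os
import struct
from contextlib import contextmanager


@contextmanager
def prevent_cpu_idle(path: str = "/dev/cpu_dma_latency", max_latency_us: int = 0):
    """Hold a CPU latency request for as long as the context is active."""
    fd = os.open(path, os.O_WRONLY)
    try:
        # The target latency is written as a native 32-bit integer
        # (microseconds); 0 asks the CPU to stay out of idle states.
        os.write(fd, struct.pack("=i", max_latency_us))
        yield fd
    finally:
        # Closing the descriptor releases the request.
        os.close(fd)
```

Wrapping a latency-sensitive section in `with prevent_cpu_idle():` keeps the request in place only for that section. The path parameter exists so that the function can be exercised against an ordinary file.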

Future extensions and limitations

Communication frameworks like DDS have a huge number of settings. This tool only allows the most common QOS settings to be configured. The other QOS settings are hardcoded in the application.
Only one publisher per topic is allowed, because the data verification logic does not support matching data to the different publishers.
Some communication plugins can get stuck in their internal loops if too much data is received. Figuring out ways around such issues is one of the goals of this tool.
The FastRTPS wait-set does not support timeouts, so the receiving side may never abort. In that case, the performance test must be killed manually.
Connext DDS Micro does not support the INTRA transport with reliable QoS and the history kind set to keep_all. When using reliable QoS, always set the history kind to keep-last.

Additional communication mechanisms that could be implemented include:

Raw UDP communication

Building with limited resources

When building this tool, the compiler must perform a lot of template expansion. This can be overwhelming for a system with a low-power CPU or limited RAM. There are some additional CMake options which can reduce the system load during compilation:

This tool includes many different message types, each with many different sizes. Reduce the number of messages, and thus the compilation load, by disabling one or more message types. For example, to build without PointCloud messages, add -DENABLE_MSGS_POINT_CLOUD=OFF to the --cmake-args. The message types, and their options for enabling/disabling, can be found here.

CHANGELOG

Changelog for package performance_test

X.Y.Z (YYYY/MM/DD)

2.3.0 (2024/09/24)

Removed

Moved apex_performance_plotter to its own package here

2.2.0 (2024/05/15)

Added

performance_test can be built with ROS 2 Iron and Jazzy
Changed
Renamed the --dds-domain_id CLI arg to --dds-domain-id
When --dds-domain-id is unspecified, fall back to the ROS_DOMAIN_ID environment variable
--zero-copy has been separated into two flags:
- --shared-memory: Enable shared-memory transfer in the plugin. This is meant to replace the need to manually set runtime flags via CYCLONEDDS_URI, APEX_MIDDLEWARE_SETTINGS, etc.
- --loaned-samples: When publishing messages in the plugin, borrow loaned samples instead of publishing by copy
- --zero-copy is now an alias for --shared-memory --loaned-samples
- Supported plugins include:
  - -c CycloneDDS
  - -c CycloneDDS-CXX
  - -c ApexOSPollingSubscription
  - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
  - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_fastrtps_cpp

2.1.0 (2024/04/17)

Added

Add new function prepare() to the Publisher and Subscriber API, intended to allow participant discovery without blocking the main thread
Changed
Change the default --history arg from KEEP_ALL to KEEP_LAST
Change the default --history-depth arg from 1000 to 16
If --expected-num-pubs is unspecified, set it to the same value as -p
If --expected-num-subs is unspecified, set it to the same value as -s
Fixed
Removed an unused variable to fix a Clang build
Remove unused variable names in the Plugin abstract class
Fix a potential lockup in PublisherTask on QNX

2.0.0 (2024/03/19)

Added

Add experimental bazel support
- bazel build //performance_test --//:plugin_implementation=//path/to/a/plugin
Add a rudimentary socket-based plugin for testing the bazel support
- bazel run //performance_test --//:plugin_implementation=//performance_test/plugins/demo:demo_plugin -- --help
Changed
Instead of enabling/disabling each plugin, you select exactly one with a CMake string option, for example:
- colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=ROS2
Renamed the --communication CLI arg to --communicator. The short -c is unchanged.
Removed
Removed the deprecated CLI flags for QOS settings:
- Instead of --reliable, use --reliability RELIABLE
- Instead of --transient, use --durability TRANSIENT_LOCAL
- Instead of --keep-last, use --history KEEP_LAST
Removed the obsolete BoundedSequenceFlat messages
Removed the superfluous --msg-list CLI flag. The --help message already lists the available messages.
Fixed
Update the Apex.OS Runner to use executor_runner::deferred instead of executor_runner::deferred_tag()
Ensure that the first few published samples are sent at the expected rate

1.5.2 (YYYY/MM/DD)

Added

--prevent-cpu-idle is available on QNX
Changed
JSON log files will contain all values in the APEX_PERFORMANCE_TEST dictionary, instead of the five specific values used previously
Switch to build as C++17 by default
Fixed
Zero copy transfer is again enabled for the rclcpp publisher

1.5.0 (2023/06/14)

Added

New CLI switch --prevent-cpu-idle (linux only). When specified, perf_test will use /dev/cpu_dma_latency to request that the CPU not enter any sleep states, to potentially give more consistent results
Some smaller Array messages, down to 32 bits
Added support to the FastDDS plugin for bounded and unbounded sequences
Changed
Update the README to better explain how to use this tool with Apex.OS
In the Runner, allocate the AnalysisResults on the stack instead of using shared_ptr
Subscriber methods accept a callback parameter, instead of returning a vector of results, to reduce heap usage
Refactored the interaction between SubscriberStats and AnalysisResult to remove the need for a std::vector of latency samples, to reduce heap usage
Adjusted the Array message sizes to make the name match the contents
Updated apex_os_communicator to use the new zero-copy API

1.4.2 (2023/03/15)

Added

Added perfplot support for JSON log files
Changed
Migrate the Apex.OS target to use rosidl_get_typesupport_target
Preallocate the JSON logger’s string buffer to prevent reallocations after the experiment begins

1.4.1 (2023/02/23)

Changed

Updated the iceoryx plugin to the latest master as of Feb 13

1.4.0 (2023/02/20)

Added

New message type BoundedSequenceFlat
- This is a BoundedSequence with the @flat annotation
- Sizes range from 1kB to 8MB, like Array and BoundedSequence
Changed
Messages of different types can be optionally included via CMake args:
- -DENABLE_MSGS_ARRAY (default ON)
- -DENABLE_MSGS_STRUCT (default ON)
- -DENABLE_MSGS_POINT_CLOUD (default ON)
- -DENABLE_MSGS_BOUNDED_SEQUENCE (default OFF)
- -DENABLE_MSGS_BOUNDED_SEQUENCE_FLAT (default OFF)
- -DENABLE_MSGS_UNBOUNDED_SEQUENCE (default OFF)
- -DENABLE_MSGS_ALL (default OFF)
  - when ON, overrides the other defaults to ON
  - you can still optionally exclude some messages by explicitly setting them to OFF
Removed
Removed a few messages:
- Range
- RadarTrack
- RadarDetection
- NavSatFix
Fixed
In all cases, including loaned messages, capture the timestamp as the last step of initializing the message

1.3.7 (2023/01/04)

1.3.6 (2023/01/03)

Fixed

Set the correct IDL_GEN_ROOT for rclcpp plugins

1.3.5 (2022/12/05)

Fixed

Exit cleanly when a publisher process terminates before a subscriber process

1.3.4 (2022/11/28)

Changed

Updated Apex.OS plugins to use the unified LoanedSample::data()

1.3.3 (2022/11/28)

Fixed

Implement the missing take() method in ApexOSPollingSubscriptionSubscriber

1.3.2 (2022/11/21)

Fixed

Capture the this pointer in the lambda in the iceoryx publisher

1.3.1 (2022/11/21)

Added

New Apex.OS plugin, compatible with the ThreadedRunners
- The INTER_THREAD and INTRA_THREAD execution strategies, combined with -c ApexOSPollingSubscription, will use the ThreadedRunner instances
- The new APEX_SINGLE_EXECUTOR execution strategy will add all publishers and subscribers to a single Apex.OS Executor
- The new APEX_EXECUTOR_PER_COMMUNICATOR execution strategy will add each publisher and each subscriber to its own Apex.OS Executor instance
- The new APEX_CHAIN execution strategy will add a publisher and subscriber as a chain of nodes to an Apex.OS Executor
Changed
Refactored FastRTPS communicator plugin:
- Uses DDS compliant API
- Code generator updated
- Implementation for publish_loaned()
- Dockerfile improvements
Removed
CLI arg --disable-async. Synchronous / asynchronous publishing should be configured externally depending on the communication mean used.

1.3.0 (2022/08/25)

Added

New execution strategy option:
- The default -e INTER_THREAD runs each publisher and subscriber in its own separate thread, which matches the previous behavior
- A new -e INTRA_THREAD, which runs a single publisher and subscriber in the same thread. The publisher writes, and the subscriber immediately takes it
- For Apex.OS specifically, some optimized execution strategies which use the proprietary Apex.OS executor
Changed
Significantly refactored the communicator plugins:
- Each plugin is split into an implementation of a Publisher and a Subscriber, instead of a single Communicator
- The plugin is no longer responsible for managing the metrics, such as sample count, lost samples, and latency
- The plugin does not require any special logic to support roundtrip mode
- It is safe for the plugins to initialize their data writers and readers at construction time, instead of delaying the initialization to the first call of publish() or update_subscription()
- Split publish() into publish_copy() and publish_loaned()
Significantly refactored the runner framework:
- The runner framework is responsible for the experiment metrics
- It manages the roundtrip mode logic
- It is extensible for different execution strategies or thread configurations
The iceoryx plugin now uses the untyped API, for improved performance

1.2.1 (2022/06/30)

Fixed

Capture the timestamp as soon as a message is received, instead of just before storing the metrics, to reduce the reported latency to a more correct value

1.2.0 (2022/06/28)

Changed

The CLI arguments for specifying the output type have changed:
- For console output, updated every second, add --print-to-console
- For file output, use --logfile my_file.csv or --logfile my_file.json
  - The type will be deduced from the file name
- If neither of these options is specified, then a warning will print, and the experiment will still run
The linter configurations are now configured locally. This means that the output of colcon test should be the same no matter the installed ROS distribution.
The --zero-copy arg is now valid even if the publisher and subscriber(s) are in the same process
Removed
The publisher and subscriber loop reserve metrics are no longer recorded or reported
Fixed
CPU usage will no longer be stuck at 0

Removed

The pub/sub loop reserve time metrics

1.1.2 (2022/06/08)

Changed

Use steady_clock for all platforms, including QNX QOS

1.1.1 (2022/06/07)

Changed

Significant refactor to simplify the analysis pipeline
Fixed
Add some missing definitions when Apex.OS is enabled, but the rclcpp plugins are disabled

1.1.0 (2022/06/02)

Added

New Apex.OS Polling Subscription plugin
Compatibility with ROS 2 Humble

1.0.0 (2022/05/12)

Added

More expressive perf_test CLI args for QOS settings
A plugin for Cyclone DDS with C++ bindings v0.9.0b1
Changed
CLI args for QOS settings:
- --reliability <RELIABLE|BEST_EFFORT>
- --durability <TRANSIENT_LOCAL|VOLATILE>
- --history <KEEP_LAST|KEEP_ALL>
master branch is compatible with many ROS 2 distributions:
- dashing
- eloquent
- foxy
- galactic
- rolling
Deprecated
CLI flags for QOS settings:
- --reliable
- --transient
- --keep-last
Removed
The branches for specific ROS 2 distributions have been deleted
Fixed
CI jobs and Dockerfiles are decoupled from the middleware bundled with the ROS 2 distribution

Package Dependencies

Deps	Name
	rclcpp
	ros_environment
	ament_cmake
	rosidl_default_generators
	rmw_implementation
	rosidl_default_runtime
	ament_cmake_gtest
	ament_lint_auto
	ament_lint_common

System Dependencies

Name
git

Dependant Packages

Name	Deps
performance_report

performance_test package from ros2-performance repo

composition_benchmark irobot_benchmark irobot_interfaces_plugin memory_benchmark performance_metrics performance_test performance_test_examples performance_test_factory performance_test_msgs performance_test_plugin_cmake

Package Summary

Tags	No category tags.
Version	0.2.0
License	BSD 3.0
Build type	AMENT_CMAKE
Use	RECOMMENDED

Repository Summary

Description	Framework to evaluate performance of ROS 2
Checkout URI	https://github.com/irobot-ros/ros2-performance.git
VCS Type	git
VCS Version	rolling
Last Updated	2025-04-10
Dev Status	UNKNOWN
CI status	No Continuous Integration
Released	UNRELEASED
Tags	benchmark performance cpp ros2

Package Description

Classes for generating ROS2 nodes and measuring their performance

Additional Links

No additional links.

Maintainers

Alberto Soragna
Juan Oxoby

Authors

Alberto Soragna

ROS2 Performance Test

Create a sample application

This package defines the performance_test::PerformanceNode class. This class inherits from rclcpp::Node and provides APIs for easily adding any number of publishers, subscriptions, clients and servers to the node.

Each of the PerformanceNode::add_periodic_publisher, PerformanceNode::add_subscriber, etc. methods is a template, which allows the creation of systems with different types of messages.

#include "performance_test/performance_node_base.hpp"
auto sub_node = std::make_shared<performance_test::PerformanceNode<rclcpp::Node>>("my_sub_node");
sub_node->add_subscriber<performance_test_msgs::msg::Stamped10b>("my_topic_name");

This snippet creates a new node called my_sub_node with a subscriber on my_topic_name. The messages published on that topic are of type Stamped10b, i.e. a message with a header field and a statically allocated array of 10 bytes.

Similarly you can create a second node which publishes data periodically.

#include "performance_test/performance_node_base.hpp"
auto pub_node = std::make_shared<performance_test::PerformanceNode<rclcpp::Node>>("my_pub_node");
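// The message type is a template argument, as for add_subscriber
pub_node->add_periodic_publisher<performance_test_msgs::msg::Stamped10b>("my_topic_name", std::chrono::milliseconds(100));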

Note: you can create nodes with any number of publishers/subscribers/clients/servers, with no constraints on their types.

Once you have defined all your nodes, you have to start them. We provide the performance_test::System class for managing the nodes' execution.

#include "performance_test/system.hpp"
auto experiment_duration = std::chrono::seconds(10);
// No parentheses: "System ros2_system();" would declare a function, not an object
performance_test::System ros2_system;
ros2_system.add_node(pub_node);
ros2_system.add_node(sub_node);
ros2_system.spin(experiment_duration);

This is enough for running your nodes. While they communicate, they will internally record latency and reliability statistics. At the end of the experiment, you can use the performance_test::System API to print them.

ros2_system.print_latency_stats();
ros2_system.save_latency_stats_to_file("my_output_file.txt");

Additionally, you can monitor other types of statistics: resource usage and events. Resource usage consists of CPU and several RAM metrics (heap, RSS, VRT). Events are, for example, the end of the discovery phase and late or lost messages.

To enable monitoring of resource usage, add the following snippet to your code:

#include "performance_metrics/resource_usage_logger.hpp"
performance_metrics::ResourceUsageLogger ru_logger("resource_usage_output.txt");
auto sampling_period = std::chrono::milliseconds(500);
ru_logger.start(sampling_period);

/**
 * your code to monitor goes here
 */

ru_logger.stop();

Visualizing the results

This repository contains Python scripts useful for plotting and aggregating data from multiple CSV files. You can find them under scripts/plot_scripts.

Note that these scripts require Python 3 and are meant to be run on your laptop, after you have copied the experiment results there from the embedded platform.

There are two different types of CSV currently produced by our experiments. For this reason we have two different scripts to plot them.

Latency and reliability: scripts/plot_scripts/latency_reliability_plot.py
CPU and Memory: scripts/plot_scripts/cpu_ram_plot.py

The two scripts share most of the code, located under scripts/plot_scripts/plot_common.py. So let’s first describe some common features.

You have to use command line arguments to tell the script which CSV files you want to load and which data you want to plot.

For a full description of the command line options, as well as an up-to-date list of the accepted values, use

python3 scripts/plot_scripts/cpu_ram_plot.py --help

dir_paths: This is a mandatory positional argument which requires one or more paths to the CSV files. Note that it is possible to pass paths of files, directories or a mixture of them.
x: the metric to show on the X axis.
y: the metric(s) to show on the Y axis.
y2: the metric(s) to show on the second Y axis (optional).
separator: a value according to which you want to separate your data creating different plot lines on the same axes (optional).

Starting from the easy things, you may want to plot the average CPU usage of running a ROS2 system.

python3 scripts/plot_scripts/cpu_ram_plot.py path_to_a_csv_file --x time --y cpu

You can add a second metric on a second Y axis using the --y2 option. For example, you may also want to check the physical memory usage.

python3 scripts/plot_scripts/cpu_ram_plot.py path_to_a_csv_file --x time --y cpu --y2 rss

It is possible to have more than a single value plotted against the same axis. For example you may want to check both virtual as well as physical memory usage. Note that this is just an example, the two metrics should have values with a similar magnitude or the resulting plot will be difficult to understand.

python3 scripts/plot_scripts/cpu_ram_plot.py path_to_a_csv_file --x time --y cpu --y2 rss vsz

In all these examples, you could have also specified more than a single CSV file. The results would have been averaged.

python3 scripts/plot_scripts/cpu_ram_plot.py path_to_csv_file1 path_to_csv_file2 --x time --y cpu --y2 rss vsz

Now let’s assume that you want to compare data coming from different experiments, i.e. for different values of number of nodes, frequencies or message sizes. This can be done in the same plot using the --separator option.

export MAX_PUBLISHERS=1
export MAX_SUBSCRIBERS=5
export NUM_EXPERIMENTS=5
export MSG_TYPES=stamped10b
export PUBLISH_FREQUENCIES=100
export DIR_PATH=my_experiment
bash scripts/pub_sub_ros2.sh

python3 scripts/plot_scripts/cpu_ram_plot.py results/my_experiment/cpu_ram_* --x time --y rss --separator subs

The output will be a plot with 5 different “separated” lines, one for each possible number of subscribers. Each line will be the average of the 5 requested experiments.
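The grouping and averaging described here can be sketched in a few lines of Python. This is not the code of the plot scripts; the function name, the column names and the sample data are hypothetical and only illustrate how rows from several experiments collapse into one averaged line per separator value.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def average_lines(rows, x, y, separator):
    """Return {separator value: {x value: mean of y across experiments}}."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[(row[separator], float(row[x]))].append(float(row[y]))

    lines = defaultdict(dict)
    for (sep_value, x_value), y_values in buckets.items():
        lines[sep_value][x_value] = mean(y_values)

    # Sort each line by its x value so it can be plotted left to right.
    return {sep: dict(sorted(points.items())) for sep, points in lines.items()}


# Hypothetical data: two experiments with 1 subscriber, one with 5.
raw = """subs,time,rss
1,0,100
1,1,120
1,0,110
1,1,130
5,0,300
5,1,320
"""
rows = list(csv.DictReader(io.StringIO(raw)))
for subs, points in average_lines(rows, "time", "rss", "subs").items():
    print(f"subs={subs}: {points}")
```

Each key of the returned dictionary corresponds to one "separated" line in the plot.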

Sometimes you may want to compare the content of different experiment directories, for example runs with different DDS or ROS2 distributions. This can still be done using the --separator option and setting it to directory. This means that CSV files will be divided according to the directory in which they are stored.

python3 scripts/plot_scripts/cpu_ram_plot.py results/my_experiment1/cpu_ram_* results/my_experiment2/cpu_ram_* --x time --y rss --separator directory

You can also specify more than one value for the --separator option, however, the resulting plot may become quite clogged.

NOTE: The --x time value is the only one which is meaningful to use if you want to plot a single csv file. This value can be used only with the scripts/plot_scripts/cpu_ram_plot.py script because latency and reliability are measured once for each node at the end of the execution so you don’t have instantaneous values.

NOTE: When using the scripts/plot_scripts/cpu_ram_plot.py script, if --x is set to something different than time, you will get only 1 value out of each CSV, i.e. the average of all the lines. Otherwise you will get one value for each line.

Some examples for using the scripts/plot_scripts/latency_reliability_plot.py script.

Plot the average latency for different numbers of subscriber nodes

python3 scripts/plot_scripts/latency_reliability_plot.py path_to_a_csv_directory --x subs --y latency

Separate the values according to the number of publishers

python3 scripts/plot_scripts/latency_reliability_plot.py path_to_a_csv_directory --x subs --y latency --separator pubs

Package Dependencies

Deps	Name
	ament_cmake
	ament_lint_common
	ament_lint_auto
	ament_cmake_gtest
	ament_cmake_pytest
	ament_cmake_cppcheck
	ament_cmake_cpplint
	ament_cmake_lint_cmake
	ament_cmake_uncrustify
	ament_cmake_xmllint
	rclcpp_lifecycle
	rclcpp
	rclcpp_action
	performance_metrics
	performance_test_msgs

System Dependencies

No direct system dependencies.

Dependant Packages

Name	Deps
performance_report
composition_benchmark
irobot_benchmark
performance_test_examples
performance_test_factory
performance_test_plugin_cmake
buildfarm_perf_tests

performance_test package from performance_test repo

performance_report performance_test performance_test_ros1_msgs performance_test_ros1_publisher

Package Summary

Tags	No category tags.
Version	2.3.0
License	Apache 2.0
Build type	AMENT_CMAKE
Use	RECOMMENDED

Repository Summary

Description
Checkout URI	https://gitlab.com/ApexAI/performance_test.git
VCS Type	git
VCS Version	master
Last Updated	2025-03-25
Dev Status	MAINTAINED
CI status	No Continuous Integration
Released	RELEASED
Tags	No category tags.

Package Description

Tool to test performance of ROS 2 and DDS data layers and communication.

Additional Links

No additional links.

Maintainers

Apex.AI, Inc.

Authors

No additional authors.

performance_test

[TOC]

The performance_test tool tests latency and other performance metrics of various middleware implementations that support a pub/sub pattern. It is used to simulate non-functional performance of your application.

The performance_test tool allows you to quickly set up a pub/sub configuration, e.g. number of publisher/subscribers, message size, QOS settings, middleware. The following metrics are automatically recorded when the application is running:

latency: corresponds to the time a message takes to travel from a publisher to subscriber. The latency is measured by timestamping the sample when it’s published and subtracting the timestamp (from the sample) from the measured time when the sample arrives at the subscriber (only logged when a subscriber is created)
CPU usage: percentage of the total system wide CPU usage (logged separately for each instance of perf_test)
resident memory: heap allocations, shared memory segments, stack (used for system’s internal work) (logged separately for each instance of perf_test)
sample statistics: number of samples received, sent, and lost per experiment run.
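The latency calculation described above can be sketched as follows. This is an illustration in Python, not the tool's C++ implementation; the Sample type and the function names are hypothetical. It assumes that publisher and subscriber read the same monotonic clock, which holds on a single machine.

```python
import time
from dataclasses import dataclass


@dataclass
class Sample:
    payload: bytes
    publish_time_ns: int  # timestamp carried inside the sample


def publish(payload: bytes) -> Sample:
    # Stamp the sample as the last step before handing it over.
    return Sample(payload=payload, publish_time_ns=time.monotonic_ns())


def latency_ns(sample: Sample, receive_time_ns: int) -> int:
    # Latency = arrival time at the subscriber - timestamp in the sample.
    return receive_time_ns - sample.publish_time_ns


sample = publish(b"\x00" * 1024)
received_at = time.monotonic_ns()  # captured as soon as the sample arrives
print(f"latency: {latency_ns(sample, received_at)} ns")
```

A monotonic clock is used so that wall-clock adjustments during the experiment cannot produce negative or inflated values.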

This master branch is compatible with the following ROS 2 versions

rolling
jazzy
iron
humble
galactic
foxy
eloquent
dashing
Apex.OS

How to use this document

Start here for a quick example of building and running the performance_test tool with the Cyclone DDS plugin.
If needed, find more detailed information about building and running
Or, if the quick example is good enough, skip ahead to the list of supported middleware plugins to learn how to test a specific middleware implementation.
Check out the tools for visualizing the results
If desired, read about the design and architecture of the tool.

Example

This example shows how to test the non-functional performance of the following configuration:

Option	Value
Plugin	Cyclone DDS
Message type	Array1k
Publishing rate	100Hz
Topic name	test_topic
Duration of the experiment	30s
Number of publisher(s)	1 (default)
Number of subscriber(s)	1 (default)

Install ROS 2
Install Cyclone DDS to /opt/cyclonedds
Build performance_test with the CMake build flag for Cyclone DDS:

    source /opt/ros/rolling/setup.bash
    cd ~/perf_test_ws
    colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
    source ./install/setup.bash
    

Run with the communication plugin option for Cyclone DDS:

mkdir experiment
./install/performance_test/lib/performance_test/perf_test --communicator CycloneDDS \
                                                          --msg Array1k \
                                                          --rate 100 \
                                                          --topic test_topic \
                                                          --max-runtime 30 \
                                                          --logfile experiment/log.csv

At the end of the experiment, a CSV log file will be generated in the experiment folder with a name that starts with log.
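When many configurations need to be run, it can help to assemble the command line programmatically. The Python sketch below uses only the options shown in the example above, with the short -c form for the plugin; the helper names build_perf_test_command and run are hypothetical, and the executable path assumes the workspace layout used in this example.

```python
import shlex
import subprocess
from pathlib import Path

# Path of the executable relative to the workspace root (as in the example).
PERF_TEST = "./install/performance_test/lib/performance_test/perf_test"


def build_perf_test_command(plugin, msg, rate_hz, topic, runtime_s, logfile):
    """Return the argument list for one experiment."""
    return [
        PERF_TEST,
        "-c", plugin,
        "--msg", msg,
        "--rate", str(rate_hz),
        "--topic", topic,
        "--max-runtime", str(runtime_s),
        "--logfile", str(logfile),
    ]


def run(cmd):
    """Create the log directory, then run the experiment (needs a built workspace)."""
    Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)


cmd = build_perf_test_command(
    "CycloneDDS", "Array1k", 100, "test_topic", 30, Path("experiment") / "log.csv"
)
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting issues; calling run(cmd) from the workspace root starts the experiment.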

Building the performance_test tool

For a simple example, see Dockerfile.rclcpp.

The performance_test tool is structured as a ROS 2 package, so colcon is used to build it. Therefore, you must source a ROS 2 installation:

source /opt/ros/rolling/setup.bash

Select a middleware plugin from this list. Then build the performance_test tool with the selected middleware:

mkdir -p ~/perf_test_ws/src
cd ~/perf_test_ws/src
git clone https://gitlab.com/ApexAI/performance_test.git
cd ..
# At this stage, you need to choose which middleware you want to use
# The list of available flags is described in the middleware plugins section
# Square brackets denote optional arguments, like in the Python documentation.
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release -DPERFORMANCE_TEST_PLUGIN=<plugin>
source install/setup.bash

Running an experiment

The performance_test experiments are run through the perf_test executable. To find the available settings, run with --help (note the required and default arguments):

~/perf_test_ws$ ./install/performance_test/lib/performance_test/perf_test --help

The -c argument should match the selected middleware plugin from the build phase.
The --msg argument should be one of the supported message types, which are shown in the --help output.

Single machine or distributed system?

Based on the configuration you want to test, the usage of the performance_test tool differs. The different possibilities are explained below.

For running tests on a single machine, you can choose between the following options:

Intraprocess means that the publisher and subscriber threads are in the same process.

    perf_test <options> --num-sub-threads 1 --num-pub-threads 1

Interprocess means that the publisher and subscriber are in different processes. To test interprocess communication, two instances of the performance_test must be run, e.g.

    # Start the subscriber first
    perf_test <options> --num-sub-threads 1 --num-pub-threads 0 &
    sleep 1  # give the subscriber time to finish initializing
    perf_test <options> --num-sub-threads 0 --num-pub-threads 1
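The same sequence (subscriber first, short pause, then publisher) can be driven from a script. This Python sketch reuses only the two options shown above; the function names and the startup_delay_s parameter are hypothetical.

```python
import subprocess
import time


def interprocess_commands(perf_test, options):
    """Build the subscriber-only and publisher-only command lines."""
    sub_cmd = [perf_test, *options, "--num-sub-threads", "1", "--num-pub-threads", "0"]
    pub_cmd = [perf_test, *options, "--num-sub-threads", "0", "--num-pub-threads", "1"]
    return sub_cmd, pub_cmd


def run_interprocess(perf_test, options, startup_delay_s=1.0):
    """Run both processes and return (publisher, subscriber) exit codes."""
    sub_cmd, pub_cmd = interprocess_commands(perf_test, options)
    subscriber = subprocess.Popen(sub_cmd)  # start the subscriber first
    try:
        time.sleep(startup_delay_s)  # give the subscriber time to initialize
        publisher_rc = subprocess.run(pub_cmd).returncode
        subscriber_rc = subscriber.wait()
    finally:
        # Do not leave a stray subscriber behind if something went wrong.
        if subscriber.poll() is None:
            subscriber.terminate()
            subscriber.wait()
    return publisher_rc, subscriber_rc
```

Both processes should be given the same options, including a maximum runtime, so that the subscriber exits on its own.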
    

On a distributed system, testing latency is difficult, because the clocks are probably not perfectly synchronized between the two devices. To work around this, the performance_test tool supports relay mode, which allows for a round-trip style of communication:

# On the main machine
perf_test <options> --roundtrip-mode Main

# On the relay machine:
perf_test <options> --roundtrip-mode Relay

In relay mode, the Main machine sends messages to the Relay machine, which immediately sends the messages back. The Main machine receives the relayed message and reports the round-trip latency. Therefore, the reported latency will be roughly double the latency reported in non-relay mode.
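The reason this works without synchronized clocks is that both timestamps are taken on the Main machine. The following Python sketch illustrates the idea with a stand-in relay function; all names are hypothetical, and halving the round trip is only a rough estimate that assumes both directions take about the same time.

```python
import time


def relay(message):
    # Stand-in for the Relay machine: echo the message back unchanged.
    return message


def measure_round_trip_ns(payload: bytes) -> int:
    sent_at = time.monotonic_ns()        # Main's clock
    echoed = relay((sent_at, payload))   # out to the relay and back
    received_at = time.monotonic_ns()    # Main's clock again
    return received_at - echoed[0]


def estimate_one_way_ns(round_trip_ns: float) -> float:
    # Rough estimate, assuming a symmetric path in both directions.
    return round_trip_ns / 2


round_trip = measure_round_trip_ns(b"\x00" * 1024)
print(f"round trip: {round_trip} ns, one way (est.): {estimate_one_way_ns(round_trip)} ns")
```

Because only one clock is involved, any offset between the two machines cancels out of the measurement.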

Single machine, single thread

An intra-thread configuration is experimentally supported, in which a publisher and subscriber both operate in the same thread. The publisher writes a message, and the subscriber immediately takes it.

perf_test <options> -e INTRA_THREAD

Notes:

This is only available when zero copy transfer is enabled
This requires exactly one publisher and one subscriber
This is not compatible with roundtrip mode

Middleware plugins

The performance test tool can measure the performance of a variety of communication solutions from different vendors. In this case there is no rclcpp or rmw layer overhead over the publisher and subscriber routines.

The performance_test tool implements an executor that runs the publisher(s) and/or the subscriber(s) in their own thread.

The following plugins are currently implemented:

Eclipse Cyclone DDS

Eclipse Cyclone DDS 0.9.0b1
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
Communication plugin: -c CycloneDDS
Docker file: Dockerfile.CycloneDDS
Available transports:
- Cyclone DDS zero copy requires RouDi to be running.

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |

Eclipse Cyclone DDS C++ binding

Eclipse Cyclone DDS C++ bindings 0.9.0b1
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS_CXX
Communication plugin: -c CycloneDDS-CXX
Docker file: Dockerfile.CycloneDDS-CXX
Available transports:
- Cyclone DDS zero copy requires RouDi to be running.

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |

Eclipse iceoryx

iceoryx (latest master as of Feb 13, 2023)
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ICEORYX
Communication plugin: -c iceoryx
Docker file: Dockerfile.iceoryx
The iceoryx plugin is not a DDS implementation.
- The DDS-specific options (such as domain ID, durability, and reliability) do not apply.
To run with the iceoryx plugin, RouDi must be running.
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| LoanedSamples | LoanedSamples | Not supported by performance_test |

eProsima Fast DDS

FastDDS 2.6.2
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=FASTDDS
Communication plugin: -c FastRTPS
Docker file: Dockerfile.FastDDS
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA (default), LoanedSamples (--zero-copy) | SHMEM (default), LoanedSamples (--zero-copy) | UDP |

OCI OpenDDS

OpenDDS 3.13.2
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=OPENDDS
Communication plugin: -c OpenDDS
Docker file: Dockerfile.OpenDDS
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| TCP | TCP | TCP |

RTI Connext DDS

RTI Connext DDS 5.3.1+
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDS
Communication plugin: -c ConnextDDS
Docker file: Not available
A license is required
You need to source an RTI Connext DDS environment.
- If RTI Connext DDS was installed with ROS 2 (Linux only):
  - source /opt/rti.com/rti_connext_dds-5.3.1/setenv_ros2rti.bash
- If RTI Connext DDS is installed separately, you can source the following script to set the environment:
  - source <connextdds_install_path>/resource/scripts/rtisetenv_<arch>.bash
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA | SHMEM | UDP |

RTI Connext DDS Micro

Connext DDS Micro 3.0.3
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDSMICRO
Communication plugin: -c ConnextDDSMicro
Docker file: Not available
A license is required
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA | SHMEM | UDP |

Framework plugins

The performance_test tool can also measure the end-to-end latency of a framework. In this case, the executor of the framework is used to run the publisher(s) and/or the subscriber(s). The potential overhead of the rclcpp or rmw layer is measured.

ROS 2

The performance test tool can also measure the performance of a variety of RMW implementations, through the ROS 2 rclcpp::publisher and rclcpp::subscriber API.

ROS 2 rclcpp::publisher and rclcpp::subscriber
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ROS2 (default)
Communication plugin:
- Callback with Single Threaded Executor: -c rclcpp-single-threaded-executor
- Callback with Static Single Threaded Executor: -c rclcpp-static-single-threaded-executor
- rclcpp::WaitSet: -c rclcpp-waitset
Docker file: Dockerfile.rclcpp
Available underlying RMW implementations:
- ROS 2 Rolling is pre-configured to use rmw_fastrtps_cpp
- Follow these instructions to use a different RMW implementation
Available transports: depends on underlying RMW implementation
- LoanedSamples are available (--zero-copy) for ROS_DISTRO = foxy and above

Apex.OS

Apex.OS
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=APEX_OS
- It is also required to source /opt/ApexOS/setup.bash instead of a ROS 2 distribution
Communication plugin: -c ApexOSPollingSubscription
Docker file: Not available
Available underlying RMW implementations: rmw_apex_middleware
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |
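
The CMake build flag and the -c value have to agree. As an illustration only, the pairs listed in the plugin sections above can be collected in a small Python lookup that builds both command lines and rejects mismatches. The function names are hypothetical, and `perf_test` is assumed to be on the `PATH`.

```python
# Build-time plugin name -> communicator names accepted by -c,
# copied from the plugin sections of this README.
COMMUNICATORS = {
    "CYCLONEDDS": ["CycloneDDS"],
    "CYCLONEDDS_CXX": ["CycloneDDS-CXX"],
    "ICEORYX": ["iceoryx"],
    "FASTDDS": ["FastRTPS"],
    "OPENDDS": ["OpenDDS"],
    "CONNEXTDDS": ["ConnextDDS"],
    "CONNEXTDDSMICRO": ["ConnextDDSMicro"],
    "ROS2": [
        "rclcpp-single-threaded-executor",
        "rclcpp-static-single-threaded-executor",
        "rclcpp-waitset",
    ],
    "APEX_OS": ["ApexOSPollingSubscription"],
}


def build_command(plugin):
    """Return the colcon command that builds perf_test with one plugin."""
    if plugin not in COMMUNICATORS:
        raise ValueError(f"unknown plugin: {plugin}")
    return [
        "colcon", "build", "--cmake-args",
        "-DCMAKE_BUILD_TYPE=Release",
        f"-DPERFORMANCE_TEST_PLUGIN={plugin}",
    ]


def run_command(plugin, communicator, extra_args=()):
    """Return a perf_test command, checking that -c matches the built plugin."""
    if communicator not in COMMUNICATORS.get(plugin, []):
        raise ValueError(f"{communicator} is not provided by plugin {plugin}")
    return ["perf_test", "-c", communicator, *extra_args]
```

For example, `run_command("ROS2", "rclcpp-waitset", ["--msg", "Array1k"])` returns a command line for the rclcpp wait-set communicator, while `run_command("ICEORYX", "CycloneDDS")` raises a `ValueError`.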

Analyze the results

After an experiment is run with the -l flag, a log file is recorded. Both CSV and JSON formats are supported. It is possible to add custom data to the log file by setting the APEX_PERFORMANCE_TEST environment variable before running an experiment, e.g.

# JSON format
export APEX_PERFORMANCE_TEST="
{
\"My Version\": \"1.0.4\",
\"My Image Version\": \"5.2\",
\"My OS Version\": \"Ubuntu 16.04\"
}
"

Plot the results

To plot the results in the JSON or CSV log files, see the plotter README.

Architecture

Apex.AI’s Performance Testing in ROS 2 white paper (available here) describes how to design a fair and unbiased performance test, and is the basis for this project.

Each middleware has a different API. Thanks to the Plugin abstraction, the core logic of setting up and running an experiment is completely decoupled from the implementation details of sending and receiving individual messages.

Exactly one Plugin implementation is selected at build time. The design is similar to the Abstract Factory pattern. performance_test declares, but does not define, a static factory method in the PluginFactory class. Each middleware provides a definition for this factory method to create a concrete Plugin implementation, and perf_test calls this factory method directly.

An example plugin is available here.
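
The decoupling can be pictured with a rough Python analogue. This is a conceptual sketch only: the real tool is written in C++ and binds the factory definition at build time, whereas here a single module-level function plays that role. The method names and the in-memory plugin are hypothetical.

```python
from abc import ABC, abstractmethod
from collections import deque


class Publisher(ABC):
    @abstractmethod
    def publish(self, sample):
        """Send one sample."""


class Subscriber(ABC):
    @abstractmethod
    def take(self):
        """Return a list of the samples received since the last call."""


class Plugin(ABC):
    @abstractmethod
    def create_publisher(self):
        """Return a Publisher for this middleware."""

    @abstractmethod
    def create_subscriber(self):
        """Return a Subscriber for this middleware."""


class _QueuePublisher(Publisher):
    def __init__(self, samples):
        self._samples = samples

    def publish(self, sample):
        self._samples.append(sample)


class _QueueSubscriber(Subscriber):
    def __init__(self, samples):
        self._samples = samples

    def take(self):
        received = list(self._samples)
        self._samples.clear()
        return received


class InMemoryPlugin(Plugin):
    """Toy 'middleware': a deque shared by publisher and subscriber."""

    def __init__(self):
        self._samples = deque()

    def create_publisher(self):
        return _QueuePublisher(self._samples)

    def create_subscriber(self):
        return _QueueSubscriber(self._samples)


def create_plugin():
    """The one factory the core calls; each build would supply one definition."""
    return InMemoryPlugin()


def run_experiment(num_samples):
    """Core logic: knows only the abstract interfaces, never the middleware."""
    plugin = create_plugin()
    publisher = plugin.create_publisher()
    subscriber = plugin.create_subscriber()
    received = 0
    for sample in range(num_samples):
        publisher.publish(sample)
        received += len(subscriber.take())
    return received
```

Swapping the middleware means changing only what `create_plugin` returns. `run_experiment` stays untouched.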

Performance optimizations

On Linux-based platforms, when started with the --prevent-cpu-idle switch, perf_test writes 0 to /dev/cpu_dma_latency and holds the file handle open, which prevents the CPU from entering any idle states for the duration of the experiment. This should result in lower message latency and lower variance in that latency.
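
For reference, the same kernel mechanism can be exercised outside the tool. This Python sketch is not the perf_test implementation. It assumes a Linux system and sufficient privileges to open the device, and the helper name is hypothetical. The request stays active only while the file descriptor is open.

```python
import contextlib
import os
import struct


@contextlib.contextmanager
def hold_cpu_dma_latency(target_us=0, path="/dev/cpu_dma_latency"):
    """Request a maximum CPU wake-up latency for the duration of the block."""
    fd = os.open(path, os.O_WRONLY)
    try:
        # The kernel's PM QoS interface accepts a 32-bit integer value.
        os.write(fd, struct.pack("i", target_us))
        yield
    finally:
        # Closing the descriptor withdraws the request.
        os.close(fd)
```

A latency-sensitive measurement would then run inside `with hold_cpu_dma_latency(): ...`. The `path` parameter exists only so the helper can be tried against an ordinary file.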

Future extensions and limitations

Communication frameworks like DDS have a huge number of settings. This tool only allows the most common QOS settings to be configured. The other QOS settings are hardcoded in the application.
Only one publisher per topic is allowed, because the data verification logic does not support matching data to the different publishers.
Some communication plugins can get stuck in their internal loops if too much data is received. Figuring out ways around such issues is one of the goals of this tool.
The FastRTPS wait-set does not support timeouts, which can prevent the receive loop from aborting. In that case, the performance test must be killed manually.
The Connext DDS Micro INTRA transport does not support reliable QoS combined with a keep_all history kind. When using reliable QoS with Connext DDS Micro, always set the history kind to keep-last.

Possible additional communication mechanisms that could be implemented:

Raw UDP communication

Building with limited resources

When building this tool, the compiler must perform a lot of template expansion. This can be overwhelming for a system with a low-power CPU or limited RAM. There are some additional CMake options which can reduce the system load during compilation:

This tool includes many different message types, each with many different sizes. Reduce the number of messages, and thus the compilation load, by disabling one or more message types. For example, to build without PointCloud messages, add -DENABLE_MSGS_POINT_CLOUD=OFF to the --cmake-args. The message types, and their options for enabling/disabling, can be found here.

CHANGELOG

Changelog for package performance_test

X.Y.Z (YYYY/MM/DD)

2.3.0 (2024/09/24)

Removed

Moved apex_performance_plotter to its own package here

2.2.0 (2024/05/15)

Added

performance_test can be built with ROS 2 Iron and Jazzy

Changed

Renamed the --dds-domain_id CLI arg to --dds-domain-id
When --dds-domain-id is unspecified, fall back to the ROS_DOMAIN_ID environment variable
--zero-copy has been separated into two flags:
- --shared-memory: Enable shared-memory transfer in the plugin. This is meant to replace the need to manually set runtime flags via CYCLONEDDS_URI, APEX_MIDDLEWARE_SETTINGS, etc.
- --loaned-samples: When publishing messages in the plugin, borrow loaned samples instead of publishing by copy
- --zero-copy is now an alias for --shared-memory --loaned-samples
- Supported plugins include:
  - -c CycloneDDS
  - -c CycloneDDS-CXX
  - -c ApexOSPollingSubscription
  - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
  - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_fastrtps_cpp

2.1.0 (2024/04/17)

Added

Add new function prepare() to the Publisher and Subscriber API, intended to allow participant discovery without blocking the main thread

Changed

Change the default --history arg from KEEP_ALL to KEEP_LAST
Change the default --history-depth arg from 1000 to 16
If --expected-num-pubs is unspecified, set it to the same value as -p
If --expected-num-subs is unspecified, set it to the same value as -s

Fixed

Removed an unused variable to fix a Clang build
Remove unused variable names in the Plugin abstract class
Fix a potential lockup in PublisherTask on QNX

2.0.0 (2024/03/19)

Added

Add experimental bazel support
- bazel build //performance_test --//:plugin_implementation=//path/to/a/plugin
Add a rudimentary socket-based plugin for testing the bazel support
- bazel run //performance_test --//:plugin_implementation=//performance_test/plugins/demo:demo_plugin -- --help

Changed

Instead of enabling/disabling each plugin, you select exactly one with a CMake string option, for example:
- colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=ROS2
Renamed the --communication CLI arg to --communicator. The short -c is unchanged.

Removed

Removed the deprecated CLI flags for QOS settings:
- Instead of --reliable, use --reliability RELIABLE
- Instead of --transient, use --durability TRANSIENT_LOCAL
- Instead of --keep-last, use --history KEEP_LAST
Removed the obsolete BoundedSequenceFlat messages
Removed the superfluous --msg-list CLI flag. The --help message already lists the available messages.

Fixed

Update the Apex.OS Runner to use executor_runner::deferred instead of executor_runner::deferred_tag()
Ensure that the first few published samples are sent at the expected rate

1.5.2 (YYYY/MM/DD)

Added

--prevent-cpu-idle is available on QNX

Changed

JSON log files will contain all values in the APEX_PERFORMANCE_TEST dictionary, instead of the five specific values used previously
Switch to build as C++17 by default

Fixed

Zero copy transfer is again enabled for the rclcpp publisher

1.5.0 (2023/06/14)

Added

New CLI switch --prevent-cpu-idle (linux only). When specified, perf_test will use /dev/cpu_dma_latency to request that the CPU not enter any sleep states, to potentially give more consistent results
Some smaller Array messages, down to 32 bits
Added support to the FastDDS plugin for bounded and unbounded sequences

Changed

Update the README to better explain how to use this tool with Apex.OS
In the Runner, allocate the AnalysisResults on the stack instead of using shared_ptr
Subscriber methods accept a callback parameter, instead of returning a vector of results, to reduce heap usage
Refactored the interaction between SubscriberStats and AnalysisResult to remove the need for a std::vector of latency samples, to reduce heap usage
Adjusted the Array message sizes to make the name match the contents
Updated apex_os_communicator to use the new zero-copy API

1.4.2 (2023/03/15)

Added

Added perfplot support for JSON log files

Changed

Migrate the Apex.OS target to use rosidl_get_typesupport_target
Preallocate the JSON logger’s string buffer to prevent reallocations after the experiment begins

1.4.1 (2023/02/23)

Changed

Updated the iceoryx plugin to the latest master as of Feb 13

1.4.0 (2023/02/20)

Added

New message type BoundedSequenceFlat
- This is a BoundedSequence with the @flat annotation
- Sizes range from 1kB to 8MB, like Array and BoundedSequence

Changed

Messages of different types can be optionally included via CMake args:
- -DENABLE_MSGS_ARRAY (default ON)
- -DENABLE_MSGS_STRUCT (default ON)
- -DENABLE_MSGS_POINT_CLOUD (default ON)
- -DENABLE_MSGS_BOUNDED_SEQUENCE (default OFF)
- -DENABLE_MSGS_BOUNDED_SEQUENCE_FLAT (default OFF)
- -DENABLE_MSGS_UNBOUNDED_SEQUENCE (default OFF)
- -DENABLE_MSGS_ALL (default OFF)
  - when ON, overrides the other defaults to ON
  - you can still optionally exclude some messages by explicitly setting them to OFF

Removed

Removed a few messages:
- Range
- RadarTrack
- RadarDetection
- NavSatFix

Fixed

In all cases, including loaned messages, capture the timestamp as the last step of initializing the message

1.3.7 (2023/01/04)

1.3.6 (2023/01/03)

Fixed

Set the correct IDL_GEN_ROOT for rclcpp plugins

1.3.5 (2022/12/05)

Fixed

Exit cleanly when a publisher process terminates before a subscriber process

1.3.4 (2022/11/28)

Changed

Updated Apex.OS plugins to use the unified LoanedSample::data()

1.3.3 (2022/11/28)

Fixed

Implement the missing take() method in ApexOSPollingSubscriptionSubscriber

1.3.2 (2022/11/21)

Fixed

Capture the this pointer in the lambda in the iceoryx publisher

1.3.1 (2022/11/21)

Added

New Apex.OS plugin, compatible with the ThreadedRunners
- The INTER_THREAD and INTRA_THREAD execution strategies, combined with -c ApexOSPollingSubscription, will use the ThreadedRunner instances
- The new APEX_SINGLE_EXECUTOR execution strategy will add all publishers and subscribers to a single Apex.OS Executor
- The new APEX_EXECUTOR_PER_COMMUNICATOR execution strategy will add each publisher and each subscriber to its own Apex.OS Executor instance
- The new APEX_CHAIN execution strategy will add a publisher and subscriber as a chain of nodes to an Apex.OS Executor

Changed

Refactored FastRTPS communicator plugin:
- Uses DDS compliant API
- Code generator updated
- Implementation for publish_loaned()
- Dockerfile improvements

Removed

CLI arg --disable-async. Synchronous / asynchronous publishing should be configured externally, depending on the communication means used.

1.3.0 (2022/08/25)

Added

New execution strategy option:
- The default -e INTER_THREAD runs each publisher and subscriber in its own separate thread, which matches the previous behavior
- A new -e INTRA_THREAD, which runs a single publisher and subscriber in the same thread. The publisher writes, and the subscriber immediately takes it
- For Apex.OS specifically, some optimized execution strategies which use the proprietary Apex.OS executor

Changed

Significantly refactored the communicator plugins:
- Each plugin is split into an implementation of a Publisher and a Subscriber, instead of a single Communicator
- The plugin is no longer responsible for managing the metrics, such as sample count, lost samples, and latency
- The plugin does not require any special logic to support roundtrip mode
- It is safe for the plugins to initialize their data writers and readers at construction time, instead of delaying the initialization to the first call of publish() or update_subscription()
- Split publish() into publish_copy() and publish_loaned()
Significantly refactored the runner framework:
- The runner framework is responsible for the experiment metrics
- It manages the roundtrip mode logic
- It is extensible for different execution strategies or thread configurations
The iceoryx plugin now uses the untyped API, for improved performance

1.2.1 (2022/06/30)

Fixed

Capture the timestamp as soon as a message is received, instead of just before storing the metrics, to reduce the reported latency to a more correct value

1.2.0 (2022/06/28)

Changed

The CLI arguments for specifying the output type have changed:
- For console output, updated every second, add --print-to-console
- For file output, use --logfile my_file.csv or --logfile my_file.json
  - The type will be deduced from the file name
- If neither of these options is specified, then a warning will print, and the experiment will still run
The linter configurations are now configured locally. This means that the output of colcon test should be the same no matter the installed ROS distribution.
The --zero-copy arg is now valid even if the publisher and subscriber(s) are in the same process

Removed

The publisher and subscriber loop reserve metrics are no longer recorded or reported

Fixed

CPU usage will no longer be stuck at 0

Removed

The pub/sub loop reserve time metrics

1.1.2 (2022/06/08)

Changed

Use steady_clock for all platforms, including QNX QOS

1.1.1 (2022/06/07)

Changed

Significant refactor to simplify the analysis pipeline

Fixed

Add some missing definitions when Apex.OS is enabled, but the rclcpp plugins are disabled

1.1.0 (2022/06/02)

Added

New Apex.OS Polling Subscription plugin
Compatibility with ROS 2 Humble

1.0.0 (2022/05/12)

Added

More expressive perf_test CLI args for QOS settings
A plugin for Cyclone DDS with C++ bindings v0.9.0b1

Changed

CLI args for QOS settings:
- --reliability <RELIABLE|BEST_EFFORT>
- --durability <TRANSIENT_LOCAL|VOLATILE>
- --history <KEEP_LAST|KEEP_ALL>
master branch is compatible with many ROS 2 distributions:
- dashing
- eloquent
- foxy
- galactic
- rolling

Deprecated

CLI flags for QOS settings:
- --reliable
- --transient
- --keep-last

Removed

The branches for specific ROS 2 distributions have been deleted

Fixed

CI jobs and Dockerfiles are decoupled from the middleware bundled with the ROS 2 distribution

Package Dependencies

Deps	Name
	rclcpp
	ros_environment
	ament_cmake
	rosidl_default_generators
	rmw_implementation
	rosidl_default_runtime
	ament_cmake_gtest
	ament_lint_auto
	ament_lint_common

System Dependencies

Name
git

Dependant Packages

Name	Deps
performance_report

Package Summary

Tags	No category tags.
Version	2.3.0
License	Apache 2.0
Build type	AMENT_CMAKE
Use	RECOMMENDED

Repository Summary

Description
Checkout URI	https://gitlab.com/ApexAI/performance_test.git
VCS Type	git
VCS Version	master
Last Updated	2025-03-25
Dev Status	MAINTAINED
CI status	No Continuous Integration
Released	RELEASED
Tags	No category tags.
Contributing	Help Wanted (0) Good First Issues (0) Pull Requests to Review (0)

Package Description

Tool to test performance of ROS 2 and DDS data layers and communication.

Additional Links

No additional links.

Maintainers

Apex.AI, Inc.

Authors

No additional authors.

performance_test

[TOC]

The performance_test tool tests latency and other performance metrics of various middleware implementations that support a pub/sub pattern. It is used to simulate non-functional performance of your application.

The performance_test tool allows you to quickly set up a pub/sub configuration, e.g. number of publisher/subscribers, message size, QOS settings, middleware. The following metrics are automatically recorded when the application is running:

latency: corresponds to the time a message takes to travel from a publisher to subscriber. The latency is measured by timestamping the sample when it’s published and subtracting the timestamp (from the sample) from the measured time when the sample arrives at the subscriber (only logged when a subscriber is created)
CPU usage: percentage of the total system wide CPU usage (logged separately for each instance of perf_test)
resident memory: heap allocations, shared memory segments, stack (used for system’s internal work) (logged separately for each instance of perf_test)
sample statistics: number of samples received, sent, and lost per experiment run.

This master branch is compatible with the following ROS 2 versions

rolling
jazzy
iron
humble
galactic
foxy
eloquent
dashing
Apex.OS

How to use this document

Start here for a quick example of building and running the performance_test tool with the Cyclone DDS plugin.
If needed, find more detailed information about building and running
Or, if the quick example is good enough, skip ahead to the list of supported middleware plugins to learn how to test a specific middleware implementation.
Check out the tools for visualizing the results
If desired, read about the design and architecture of the tool.

Example

This example shows how to test the non-functional performance of the following configuration:

Option	Value
Plugin	Cyclone DDS
Message type	Array1k
Publishing rate	100Hz
Topic name	test_topic
Duration of the experiment	30s
Number of publisher(s)	1 (default)
Number of subscriber(s)	1 (default)

Install ROS 2
Install Cyclone DDS to /opt/cyclonedds
Build performance_test with the CMake build flag for Cyclone DDS:

    source /opt/ros/rolling/setup.bash
    cd ~/perf_test_ws
    colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
    source ./install/setup.bash
    

Run with the communication plugin option for Cyclone DDS:

mkdir experiment
./install/performance_test/lib/performance_test/perf_test --communication CycloneDDS
                                                          --msg Array1k
                                                          --rate 100
                                                          --topic test_topic
                                                          --max-runtime 30
                                                          --logfile experiment/log.csv

At the end of the experiment, a CSV log file will be generated in the experiment folder with a name that starts with log.

Building the performance_test tool

For a simple example, see Dockerfile.rclcpp.

The performance_test tool is structured as a ROS 2 package, so colcon is used to build it. Therefore, you must source a ROS 2 installation:

source /opt/ros/rolling/setup.bash

Select a middleware plugin from this list. Then build the performance_test tool with the selected middleware:

mkdir -p ~/perf_test_ws/src
cd ~/perf_test_ws/src
git clone https://gitlab.com/ApexAI/performance_test.git
cd ..
# At this stage, you need to choose which middleware you want to use
# The list of available flags is described in the middleware plugins section
# Square brackets denote optional arguments, like in the Python documentation.
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release -DPERFORMANCE_TEST_PLUGIN=<plugin>
source install/setup.bash

Running an experiment

The performance_test experiments are run through the perf_test executable. To find the available settings, run with --help (note the required and default arguments):

~/perf_test_ws$ ./install/performance_test/lib/performance_test/perf_test --help

The -c argument should match the selected middleware plugin from the build phase.
The --msg argument should be one of the supported message types, which are shown in the --help output.

Single machine or distributed system?

Based on the configuration you want to test, the usage of the performance_test tool differs. The different possibilities are explained below.

For running tests on a single machine, you can choose between the following options:

Intraprocess means that the publisher and subscriber threads are in the same process.

    perf_test <options> --num-sub-threads 1 --num-pub-threads 1

Interprocess means that the publisher and subscriber are in different processes. To test interprocess communication, two instances of the performance_test must be run, e.g.

    # Start the subscriber first
    perf_test <options> --num-sub-threads 1 --num-pub-threads 0 &
    sleep 1  # give the subscriber time to finish initializing
    perf_test <options> --num-sub-threads 0 --num-pub-threads 1
    

On a distributed system, testing latency is difficult, because the clocks are probably not perfectly synchronized between the two devices. To work around this, the performance_test tool supports relay mode, which allows for a round-trip style of communication:

# On the main machine
perf_test <options> --roundtrip-mode Main

# On the relay machine:
perf_test <options> --roundtrip-mode Relay

In relay mode, the Main machine sends messages to the Relay machine, which immediately sends the messages back. The Main machine receives the relayed message, and reports the round-trip latency. Therefore, the reported latency will be roughly double the latency compared to the latency reported in non-relay mode.

Single machine, single thread

An intra-thread configuration is experimentally supported, in which a publisher and subscriber both operate in the same thread. The publisher writes a messages, and the subscriber immediately takes it.

perf_test <options> -e INTRA_THREAD

Notes:

This is only available when zero copy transfer is enabled
This requires exactly one publisher and one subscriber
This is not compatible with roundtrip mode

Middleware plugins

The performance test tool can measure the performance of a variety of communication solutions from different vendors. In this case there is no rclcpp or rmw layer overhead over the publisher and subscriber routines.

The performance_test tool implements an executor that runs the publisher(s) and/or the subscriber(s) in their own thread.

The following plugins are currently implemented:

Eclipse Cyclone DDS

Eclipse Cyclone DDS 0.9.0b1
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
Communication plugin: -c CycloneDDS
Docker file: Dockerfile.CycloneDDS
Available transports:
- Cyclone DDS zero copy requires RouDi to be running. - | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |

Eclipse Cyclone DDS C++ binding

Eclipse Cyclone DDS C++ bindings 0.9.0b1
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS_CXX
Communication plugin: -c CycloneDDS-CXX
Docker file: Dockerfile.CycloneDDS-CXX
Available transports:
- Cyclone DDS zero copy requires the RouDi to be running. - | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |

Eclipse iceoryx

iceoryx (latest master as of Feb 13)
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ICEORYX
Communication plugin: -c iceoryx
Docker file: Dockerfile.iceoryx
The iceoryx plugin is not a DDS implementation.
- The DDS-specific options (such as domain ID, durability, and reliability) do not apply.
To run with the iceoryx plugin, RouDi must be running.
Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |———–|———————|———————————–| | LoanedSamples | LoanedSamples | Not supported by performance_test |

eProsima Fast DDS

FastDDS 2.6.2
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=FASTDDS
Communication plugin: -c FastRTPS
Docker file: Dockerfile.FastDDS
Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | INTRA (default), LoanedSamples (--zero-copy) | SHMEM (default), LoanedSamples (--zero-copy) | UDP |

OCI OpenDDS

OpenDDS 3.13.2
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=OPENDDS
Communication plugin: -c OpenDDS
Docker file: Dockerfile.OpenDDS
Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | TCP | TCP | TCP |

RTI Connext DDS

RTI Connext DDS 5.3.1+
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDS
Communication plugin: -c ConnextDDS
Docker file: Not available
A license is required
You need to source an RTI Connext DDS environment.
- If RTI Connext DDS was installed with ROS 2 (Linux only):
  - source /opt/rti.com/rti_connext_dds-5.3.1/setenv_ros2rti.bash
- If RTI Connext DDS is installed separately, you can source the following script to set the environment:
  - source <connextdds_install_path>/resource/scripts/rtisetenv_<arch>.bash
Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | INTRA | SHMEM | UDP |

RTI Connext DDS Micro

Connext DDS Micro 3.0.3
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDSMICRO
Communication plugin: -c ConnextDDSMicro
Docker file: Not available
A license is required
Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | INTRA | SHMEM | UDP |

Framework plugins

The performance_test tool can also measure the end-to-end latency of a framework. In this case, the executor of the framework is used to run the publisher(s) and/or the subscriber(s). The potential overhead of the rclcpp or rmw layer is measured.

ROS 2

The performance test tool can also measure the performance of a variety of RMW implementations, through the ROS 2 rclcpp::publisher and rclcpp::subscriber API.

ROS 2 rclcpp::publisher and rclcpp::subscriber
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ROS2 (default)
Communication plugin:
- Callback with Single Threaded Executor: -c rclcpp-single-threaded-executor
- Callback with Static Single Threaded Executor: -c rclcpp-static-single-threaded-executor
- rclcpp::WaitSet: -c rclcpp-waitset
Docker file: Dockerfile.rclcpp
Available underlying RMW implementations:
- ROS 2 Rolling is pre-configured to use rmw_fastrtps_cpp
- Follow these instructions to use a different RMW implementation
Available transports: depends on underlying RMW implementation
- LoanedSamples are available (--zero-copy) for ROS_DISTRO = foxy and above

Apex.OS

Apex.OS
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=APEX_OS
- It is also required to source /opt/ApexOS/setup.bash instead of a ROS 2 distribution
Communication plugin: -c ApexOSPollingSubscription
Docker file: Not available
Available underlying RMW implementations: rmw_apex_middleware
Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero_copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero_copy) | UDP |

Analyze the results

After an experiment is run with the -l flag, a log file is recorded. Both CSV and JSON formats are supported. It is possible to add custom data to the log file by setting theAPEX_PERFORMANCE_TEST environment variable before running an experiment, e.g.

# JSON format
export APEX_PERFORMANCE_TEST="
{
\"My Version\": \"1.0.4\",
\"My Image Version\": \"5.2\",
\"My OS Version\": \"Ubuntu 16.04\"
}
"

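The same metadata can be written with single quotes, which avoids escaping every double quote inside the JSON. The sketch below also checks that the value parses as JSON before an experiment is started. The check assumes python3 is installed and is skipped otherwise.

```shell
# Single quotes keep the JSON readable; no backslashes are needed.
APEX_PERFORMANCE_TEST='{"My Version": "1.0.4", "My Image Version": "5.2", "My OS Version": "Ubuntu 16.04"}'
export APEX_PERFORMANCE_TEST

# Optional sanity check (assumes python3 is available): report malformed JSON early.
if command -v python3 >/dev/null 2>&1; then
  printf '%s' "$APEX_PERFORMANCE_TEST" | python3 -m json.tool >/dev/null \
    || echo "APEX_PERFORMANCE_TEST is not valid JSON" >&2
fi
```
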
Plot the results

To plot the results in the JSON or CSV log files, see the plotter README.

Architecture

Apex.AI’s Performance Testing in ROS 2 white paper (available here) describes how to design a fair and unbiased performance test, and is the basis for this project.

Each middleware has a different API. Thanks to the Plugin abstraction, the core logic of setting up and running an experiment is completely decoupled from the implementation details of sending and receiving individual messages.

Exactly one Plugin implementation is selected at build time. The design is similar to the Abstract Factory pattern. performance_test declares, but does not define, a static factory method in the PluginFactory class. Each middleware provides a definition for this factory method to create a concrete Plugin implementation, and perf_test calls this factory method directly.

An example plugin is available here.

Performance optimizations

On Linux-based platforms, when run with --prevent-cpu-idle, perf_test writes 0 to /dev/cpu_dma_latency and holds the file handle open, which prevents the CPU from entering any idle states for the duration of the experiment. This should result in lower message latency and lower variance in that latency.
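
A minimal sketch of enabling this from a wrapper script, using the --prevent-cpu-idle switch listed in the changelog. It assumes that writing to /dev/cpu_dma_latency needs elevated privileges on most systems, so the switch is only added when the device node is writable. The variable names and the CycloneDDS communicator are illustrative assumptions.

```shell
PERF_TEST="${PERF_TEST:-./install/performance_test/lib/performance_test/perf_test}"

# Only request the optimization when the current user can write the device node.
if [ -w /dev/cpu_dma_latency ]; then
  IDLE_OPTS="--prevent-cpu-idle"
else
  IDLE_OPTS=""
  echo "cannot write /dev/cpu_dma_latency; running without --prevent-cpu-idle" >&2
fi

if [ -x "$PERF_TEST" ]; then
  # IDLE_OPTS is intentionally unquoted so that an empty value adds no argument.
  "$PERF_TEST" -c CycloneDDS --msg Array1k --topic test_topic --rate 100 --max-runtime 30 $IDLE_OPTS
fi
```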

Future extensions and limitations

Communication frameworks like DDS have a huge number of settings. This tool only allows the most common QOS settings to be configured. The other QOS settings are hardcoded in the application.
Only one publisher per topic is allowed, because the data verification logic does not support matching data to the different publishers.
Some communication plugins can get stuck in their internal loops if too much data is received. Figuring out ways around such issues is one of the goals of this tool.
The FastRTPS wait-set does not support timeouts, so the receiving side may never abort. In that case, the performance test must be killed manually.
Connext DDS Micro does not support the INTRA transport with reliable QoS and the history kind set to keep_all. Always set the history kind to keep-last when using reliable QoS.

Possible additional communication mechanisms that could be implemented:

Raw UDP communication

Building with limited resources

When building this tool, the compiler must perform a lot of template expansion. This can be overwhelming for a system with a low-power CPU or limited RAM. There are some additional CMake options which can reduce the system load during compilation:

This tool includes many different message types, each with many different sizes. Reduce the number of messages, and thus the compilation load, by disabling one or more message types. For example, to build without PointCloud messages, add -DENABLE_MSGS_POINT_CLOUD=OFF to the --cmake-args. The message types, and their options for enabling/disabling, can be found here.
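
As an illustration, the following sketch combines the message-type switches with a sequential build. The ENABLE_MSGS_* names are taken from the changelog. The build_small helper is a hypothetical name, and MAKEFLAGS=-j1 only has an effect when CMake generates Makefiles, which is assumed here.

```shell
# CMake arguments: keep the Array messages, drop the Struct and PointCloud messages.
CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Release -DPERFORMANCE_TEST_PLUGIN=ROS2"
CMAKE_ARGS="$CMAKE_ARGS -DENABLE_MSGS_STRUCT=OFF -DENABLE_MSGS_POINT_CLOUD=OFF"

# build_small: hypothetical helper; run it from the workspace root (e.g. ~/perf_test_ws).
build_small() {
  # MAKEFLAGS=-j1 limits each package to one compile job, and --executor sequential
  # makes colcon build one package at a time.
  # CMAKE_ARGS is unquoted on purpose so that it splits into separate arguments.
  MAKEFLAGS="-j1" colcon build --executor sequential --cmake-args $CMAKE_ARGS
}
```

Call build_small after sourcing a ROS 2 installation. The build takes longer, but peak memory use should be lower.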

CHANGELOG

Changelog for package performance_test

X.Y.Z (YYYY/MM/DD)

2.3.0 (2024/09/24)

Removed

Moved apex_performance_plotter to its own package here

2.2.0 (2024/05/15)

Added

performance_test can be built with ROS 2 Iron and Jazzy

Changed

Renamed the --dds-domain_id CLI arg to --dds-domain-id
When --dds-domain-id is unspecified, fall back to the ROS_DOMAIN_ID environment variable
--zero-copy has been separated into two flags:
- --shared-memory: Enable shared-memory transfer in the plugin. This is meant to replace the need to manually set runtime flags via CYCLONEDDS_URI, APEX_MIDDLEWARE_SETTINGS, etc.
- --loaned-samples: When publishing messages in the plugin, borrow loaned samples instead of publishing by copy
- --zero-copy is now an alias for --shared-memory --loaned-samples
- Supported plugins include:
  - -c CycloneDDS
  - -c CycloneDDS-CXX
  - -c ApexOSPollingSubscription
  - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
  - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_fastrtps_cpp

2.1.0 (2024/04/17)

Added

Add new function prepare() to the Publisher and Subscriber API, intended to allow participant discovery without blocking the main thread

Changed

Change the default --history arg from KEEP_ALL to KEEP_LAST
Change the default --history-depth arg from 1000 to 16
If --expected-num-pubs is unspecified, set it to the same value as -p
If --expected-num-subs is unspecified, set it to the same value as -s

Fixed

Removed an unused variable to fix a Clang build
Remove unused variable names in the Plugin abstract class
Fix a potential lockup in PublisherTask on QNX

2.0.0 (2024/03/19)

Added

Add experimental bazel support
- bazel build //performance_test --//:plugin_implementation=//path/to/a/plugin
Add a rudimentary socket-based plugin for testing the bazel support
- bazel run //performance_test --//:plugin_implementation=//performance_test/plugins/demo:demo_plugin -- --help

Changed

Instead of enabling/disabling each plugin, you select exactly one with a CMake string option, for example:
- colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=ROS2
Renamed the --communication CLI arg to --communicator. The short -c is unchanged.

Removed

Removed the deprecated CLI flags for QOS settings:
- Instead of --reliable, use --reliability RELIABLE
- Instead of --transient, use --durability TRANSIENT_LOCAL
- Instead of --keep-last, use --history KEEP_LAST
Removed the obsolete BoundedSequenceFlat messages
Removed the superfluous --msg-list CLI flag. The --help message already lists the available messages.

Fixed

Update the Apex.OS Runner to use executor_runner::deferred instead of executor_runner::deferred_tag()
Ensure that the first few published samples are sent at the expected rate
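
For scripts written against 1.x, the flag renames and removals of this release can be applied mechanically. The sketch below is one possible way to do so. The migrate_flags helper is a hypothetical name and is not part of performance_test. It only covers the four flags named in this release and passes everything else through unchanged.

```shell
# migrate_flags <old perf_test arguments...>
# Prints the argument list with the 1.x flags replaced by their 2.0.0 equivalents.
migrate_flags() {
  out=""
  for arg in "$@"; do
    case "$arg" in
      --reliable)      out="$out --reliability RELIABLE" ;;
      --transient)     out="$out --durability TRANSIENT_LOCAL" ;;
      --keep-last)     out="$out --history KEEP_LAST" ;;
      --communication) out="$out --communicator" ;;
      *)               out="$out $arg" ;;
    esac
  done
  # Drop the leading space added by the first append.
  printf '%s\n' "${out# }"
}

migrate_flags --communication CycloneDDS --reliable --keep-last
```

Arguments that contain spaces are not preserved by this simple string join, so it is only meant for plain flag lists.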

1.5.2 (YYYY/MM/DD)

Added

--prevent-cpu-idle is available on QNX

Changed

JSON log files will contain all values in the APEX_PERFORMANCE_TEST dictionary, instead of the five specific values used previously
Switch to build as C++17 by default

Fixed

Zero copy transfer is again enabled for the rclcpp publisher

1.5.0 (2023/06/14)

Added

New CLI switch --prevent-cpu-idle (linux only). When specified, perf_test will use /dev/cpu_dma_latency to request that the CPU not enter any sleep states, to potentially give more consistent results
Some smaller Array messages, down to 32 bits
Added support to the FastDDS plugin for bounded and unbounded sequences

Changed

Update the README to better explain how to use this tool with Apex.OS
In the Runner, allocate the AnalysisResults on the stack instead of using shared_ptr
Subscriber methods accept a callback parameter, instead of returning a vector of results, to reduce heap usage
Refactored the interaction between SubscriberStats and AnalysisResult to remove the need for a std::vector of latency samples, to reduce heap usage
Adjusted the Array message sizes to make the name match the contents
Updated apex_os_communicator to use the new zero-copy API

1.4.2 (2023/03/15)

Added

Added perfplot support for JSON log files

Changed

Migrate the Apex.OS target to use rosidl_get_typesupport_target
Preallocate the JSON logger’s string buffer to prevent reallocations after the experiment begins

1.4.1 (2023/02/23)

Changed

Updated the iceoryx plugin to the latest master as of Feb 13

1.4.0 (2023/02/20)

Added

New message type BoundedSequenceFlat
- This is a BoundedSequence with the @flat annotation
- Sizes range from 1kB to 8MB, like Array and BoundedSequence

Changed

Messages of different types can be optionally included via CMake args:
- -DENABLE_MSGS_ARRAY (default ON)
- -DENABLE_MSGS_STRUCT (default ON)
- -DENABLE_MSGS_POINT_CLOUD (default ON)
- -DENABLE_MSGS_BOUNDED_SEQUENCE (default OFF)
- -DENABLE_MSGS_BOUNDED_SEQUENCE_FLAT (default OFF)
- -DENABLE_MSGS_UNBOUNDED_SEQUENCE (default OFF)
- -DENABLE_MSGS_ALL (default OFF)
  - when ON, overrides the other defaults to ON
  - you can still optionally exclude some messages by explicitly setting them to OFF

Removed

Removed a few messages:
- Range
- RadarTrack
- RadarDetection
- NavSatFix

Fixed

In all cases, including loaned messages, capture the timestamp as the last step of initializing the message

1.3.7 (2023/01/04)

1.3.6 (2023/01/03)

Fixed

Set the correct IDL_GEN_ROOT for rclcpp plugins

1.3.5 (2022/12/05)

Fixed

Exit cleanly when a publisher process terminates before a subscriber process

1.3.4 (2022/11/28)

Changed

Updated Apex.OS plugins to use the unified LoanedSample::data()

1.3.3 (2022/11/28)

Fixed

Implement the missing take() method in ApexOSPollingSubscriptionSubscriber

1.3.2 (2022/11/21)

Fixed

Capture the this pointer in the lambda in the iceoryx publisher

1.3.1 (2022/11/21)

Added

New Apex.OS plugin, compatible with the ThreadedRunners
- The INTER_THREAD and INTRA_THREAD execution strategies, combined with -c ApexOSPollingSubscription, will use the ThreadedRunner instances
- The new APEX_SINGLE_EXECUTOR execution strategy will add all publishers and subscribers to a single Apex.OS Executor
- The new APEX_EXECUTOR_PER_COMMUNICATOR execution strategy will add each publisher and each subscriber to its own Apex.OS Executor instance
- The new APEX_CHAIN execution strategy will add a publisher and subscriber as a chain of nodes to an Apex.OS Executor

Changed

Refactored FastRTPS communicator plugin:
- Uses DDS compliant API
- Code generator updated
- Implementation for publish_loaned()
- Dockerfile improvements

Removed

CLI arg --disable-async. Synchronous / asynchronous publishing should be configured externally depending on the communication means used.

1.3.0 (2022/08/25)

Added

New execution strategy option:
- The default -e INTER_THREAD runs each publisher and subscriber in its own separate thread, which matches the previous behavior
- A new -e INTRA_THREAD, which runs a single publisher and subscriber in the same thread. The publisher writes, and the subscriber immediately takes it
- For Apex.OS specifically, some optimized execution strategies which use the proprietary Apex.OS executor

Changed

Significantly refactored the communicator plugins:
- Each plugin is split into an implementation of a Publisher and a Subscriber, instead of a single Communicator
- The plugin is no longer responsible for managing the metrics, such as sample count, lost samples, and latency
- The plugin does not require any special logic to support roundtrip mode
- It is safe for the plugins to initialize their data writers and readers at construction time, instead of delaying the initialization to the first call of publish() or update_subscription()
- Split publish() into publish_copy() and publish_loaned()
Significantly refactored the runner framework:
- The runner framework is responsible for the experiment metrics
- It manages the roundtrip mode logic
- It is extensible for different execution strategies or thread configurations
The iceoryx plugin now uses the untyped API, for improved performance

1.2.1 (2022/06/30)

Fixed

Capture the timestamp as soon as a message is received, instead of just before storing the metrics, to reduce the reported latency to a more correct value

1.2.0 (2022/06/28)

Changed

The CLI arguments for specifying the output type have changed:
- For console output, updated every second, add --print-to-console
- For file output, use --logfile my_file.csv or --logfile my_file.json
  - The type will be deduced from the file name
- If neither of these options is specified, then a warning will print, and the experiment will still run
The linter configurations are now configured locally. This means that the output of colcon test should be the same no matter the installed ROS distribution.
The --zero-copy arg is now valid even if the publisher and subscriber(s) are in the same process

Removed

The publisher and subscriber loop reserve metrics are no longer recorded or reported

Fixed

CPU usage will no longer be stuck at 0

Removed

The pub/sub loop reserve time metrics

1.1.2 (2022/06/08)

Changed

Use steady_clock for all platforms, including QNX QOS

1.1.1 (2022/06/07)

Changed

Significant refactor to simplify the analysis pipeline

Fixed

Add some missing definitions when Apex.OS is enabled, but the rclcpp plugins are disabled

1.1.0 (2022/06/02)

Added

New Apex.OS Polling Subscription plugin
Compatibility with ROS 2 Humble

1.0.0 (2022/05/12)

Added

More expressive perf_test CLI args for QOS settings
A plugin for Cyclone DDS with C++ bindings v0.9.0b1

Changed

CLI args for QOS settings:
- --reliability <RELIABLE|BEST_EFFORT>
- --durability <TRANSIENT_LOCAL|VOLATILE>
- --history <KEEP_LAST|KEEP_ALL>
master branch is compatible with many ROS 2 distributions:
- dashing
- eloquent
- foxy
- galactic
- rolling

Deprecated

CLI flags for QOS settings:
- --reliable
- --transient
- --keep-last

Removed

The branches for specific ROS 2 distributions have been deleted

Fixed

CI jobs and Dockerfiles are decoupled from the middleware bundled with the ROS 2 distribution

Package Dependencies

Deps	Name
	rclcpp
	ros_environment
	ament_cmake
	rosidl_default_generators
	rmw_implementation
	rosidl_default_runtime
	ament_cmake_gtest
	ament_lint_auto
	ament_lint_common

System Dependencies

Name
git

Dependent Packages

Name	Deps
performance_report

Package Summary

Tags	No category tags.
Version	2.3.0
License	Apache 2.0
Build type	AMENT_CMAKE
Use	RECOMMENDED

Repository Summary

Description
Checkout URI	https://gitlab.com/ApexAI/performance_test.git
VCS Type	git
VCS Version	master
Last Updated	2025-03-25
Dev Status	MAINTAINED
CI status	No Continuous Integration
Released	RELEASED
Tags	No category tags.
Contributing	Help Wanted (0) Good First Issues (0) Pull Requests to Review (0)

Package Description

Tool to test performance of ROS 2 and DDS data layers and communication.

Maintainers

Apex.AI, Inc.

Authors

No additional authors.

performance_test

The performance_test tool tests latency and other performance metrics of various middleware implementations that support a pub/sub pattern. It is used to simulate non-functional performance of your application.

The performance_test tool allows you to quickly set up a pub/sub configuration, e.g. number of publisher/subscribers, message size, QOS settings, middleware. The following metrics are automatically recorded when the application is running:

latency: corresponds to the time a message takes to travel from a publisher to subscriber. The latency is measured by timestamping the sample when it’s published and subtracting the timestamp (from the sample) from the measured time when the sample arrives at the subscriber (only logged when a subscriber is created)
CPU usage: percentage of the total system wide CPU usage (logged separately for each instance of perf_test)
resident memory: heap allocations, shared memory segments, stack (used for system’s internal work) (logged separately for each instance of perf_test)
sample statistics: number of samples received, sent, and lost per experiment run.

This master branch is compatible with the following ROS 2 versions

rolling
jazzy
iron
humble
galactic
foxy
eloquent
dashing
Apex.OS

How to use this document

Start here for a quick example of building and running the performance_test tool with the Cyclone DDS plugin.
If needed, find more detailed information about building and running
Or, if the quick example is good enough, skip ahead to the list of supported middleware plugins to learn how to test a specific middleware implementation.
Check out the tools for visualizing the results
If desired, read about the design and architecture of the tool.

Example

This example shows how to test the non-functional performance of the following configuration:

Option	Value
Plugin	Cyclone DDS
Message type	Array1k
Publishing rate	100Hz
Topic name	test_topic
Duration of the experiment	30s
Number of publisher(s)	1 (default)
Number of subscriber(s)	1 (default)

Install ROS 2
Install Cyclone DDS to /opt/cyclonedds
Build performance_test with the CMake build flag for Cyclone DDS:

    source /opt/ros/rolling/setup.bash
    cd ~/perf_test_ws
    colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
    source ./install/setup.bash
    

Run with the communication plugin option for Cyclone DDS:

mkdir experiment
./install/performance_test/lib/performance_test/perf_test --communicator CycloneDDS \
                                                          --msg Array1k \
                                                          --rate 100 \
                                                          --topic test_topic \
                                                          --max-runtime 30 \
                                                          --logfile experiment/log.csv

At the end of the experiment, a CSV log file will be generated in the experiment folder with a name that starts with log.

Building the performance_test tool

For a simple example, see Dockerfile.rclcpp.

The performance_test tool is structured as a ROS 2 package, so colcon is used to build it. Therefore, you must source a ROS 2 installation:

source /opt/ros/rolling/setup.bash

Select a middleware plugin from this list. Then build the performance_test tool with the selected middleware:

mkdir -p ~/perf_test_ws/src
cd ~/perf_test_ws/src
git clone https://gitlab.com/ApexAI/performance_test.git
cd ..
# At this stage, you need to choose which middleware you want to use
# The list of available flags is described in the middleware plugins section
# Replace <plugin> with the value for the selected middleware, e.g. ROS2 or CYCLONEDDS.
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release -DPERFORMANCE_TEST_PLUGIN=<plugin>
source install/setup.bash

Running an experiment

The performance_test experiments are run through the perf_test executable. To find the available settings, run with --help (note the required and default arguments):

~/perf_test_ws$ ./install/performance_test/lib/performance_test/perf_test --help

The -c argument should match the selected middleware plugin from the build phase.
The --msg argument should be one of the supported message types, which are shown in the --help output.

Single machine or distributed system?

Based on the configuration you want to test, the usage of the performance_test tool differs. The different possibilities are explained below.

For running tests on a single machine, you can choose between the following options:

Intraprocess means that the publisher and subscriber threads are in the same process.

    perf_test <options> --num-sub-threads 1 --num-pub-threads 1

Interprocess means that the publisher and subscriber are in different processes. To test interprocess communication, two instances of perf_test must be run, e.g.

    # Start the subscriber first
    perf_test <options> --num-sub-threads 1 --num-pub-threads 0 &
    sleep 1  # give the subscriber time to finish initializing
    perf_test <options> --num-sub-threads 0 --num-pub-threads 1
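
The two-process pattern above can be wrapped in a small script that also waits for the background subscriber to exit, so that its log is complete before the script returns. This is only a sketch: PERF_TEST, COMMON_OPTS and run_interprocess are hypothetical names, and the option values mirror the quick example.

```shell
PERF_TEST="${PERF_TEST:-./install/performance_test/lib/performance_test/perf_test}"
# Options shared by both processes.
COMMON_OPTS="-c CycloneDDS --msg Array1k --topic test_topic --rate 100 --max-runtime 30"

run_interprocess() {
  # COMMON_OPTS is unquoted on purpose so that it splits into separate arguments.
  "$PERF_TEST" $COMMON_OPTS --num-sub-threads 1 --num-pub-threads 0 &
  sub_pid=$!
  sleep 1  # give the subscriber time to finish initializing

  pub_status=0
  "$PERF_TEST" $COMMON_OPTS --num-sub-threads 0 --num-pub-threads 1 || pub_status=$?

  # Wait for the background subscriber before reporting the overall result.
  sub_status=0
  wait "$sub_pid" || sub_status=$?
  [ "$pub_status" -eq 0 ] && [ "$sub_status" -eq 0 ]
}

# Executed only if the binary has been built.
if [ -x "$PERF_TEST" ]; then
  run_interprocess
fi
```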
    

On a distributed system, testing latency is difficult, because the clocks are probably not perfectly synchronized between the two devices. To work around this, the performance_test tool supports relay mode, which allows for a round-trip style of communication:

# On the main machine
perf_test <options> --roundtrip-mode Main

# On the relay machine:
perf_test <options> --roundtrip-mode Relay

In relay mode, the Main machine sends messages to the Relay machine, which immediately sends them back. The Main machine receives the relayed message and reports the round-trip latency. The reported latency will therefore be roughly double the latency reported in non-relay mode.

Single machine, single thread

An intra-thread configuration is experimentally supported, in which a publisher and subscriber both operate in the same thread. The publisher writes a message, and the subscriber immediately takes it.

perf_test <options> -e INTRA_THREAD

Notes:

This is only available when zero copy transfer is enabled
This requires exactly one publisher and one subscriber
This is not compatible with roundtrip mode
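
Taken together, the notes above imply an invocation along the following lines. This is a sketch only: intra_thread_opts is a hypothetical helper, and it assumes a plugin that supports --zero-copy (CycloneDDS is used as the example).

```shell
PERF_TEST="${PERF_TEST:-./install/performance_test/lib/performance_test/perf_test}"

# intra_thread_opts <communicator> <message type>
# Prints the options for an intra-thread run: zero copy, one publisher, one subscriber.
intra_thread_opts() {
  printf '%s\n' "-c $1 --msg $2 -e INTRA_THREAD --zero-copy --num-pub-threads 1 --num-sub-threads 1"
}

if [ -x "$PERF_TEST" ]; then
  # The command substitution is unquoted on purpose so that the options split into arguments.
  "$PERF_TEST" $(intra_thread_opts CycloneDDS Array1k) --topic test_topic --rate 100 --max-runtime 30
fi
```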

Middleware plugins

The performance test tool can measure the performance of a variety of communication solutions from different vendors. In this case there is no rclcpp or rmw layer overhead over the publisher and subscriber routines.

The performance_test tool implements an executor that runs the publisher(s) and/or the subscriber(s) in their own thread.

The following plugins are currently implemented:

Eclipse Cyclone DDS

Eclipse Cyclone DDS 0.9.0b1
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
Communication plugin: -c CycloneDDS
Docker file: Dockerfile.CycloneDDS
Available transports:
- Cyclone DDS zero copy requires RouDi to be running.

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |

Eclipse Cyclone DDS C++ binding

Eclipse Cyclone DDS C++ bindings 0.9.0b1
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS_CXX
Communication plugin: -c CycloneDDS-CXX
Docker file: Dockerfile.CycloneDDS-CXX
Available transports:
- Cyclone DDS zero copy requires RouDi to be running.

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |

Eclipse iceoryx

iceoryx (latest master as of Feb 13)
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ICEORYX
Communication plugin: -c iceoryx
Docker file: Dockerfile.iceoryx
The iceoryx plugin is not a DDS implementation.
- The DDS-specific options (such as domain ID, durability, and reliability) do not apply.
To run with the iceoryx plugin, RouDi must be running.
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| LoanedSamples | LoanedSamples | Not supported by performance_test |
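
Since RouDi has to be running before perf_test starts, a wrapper script can start and stop it around the experiment. The following is a sketch under assumptions: iox-roudi is the name under which iceoryx usually installs the RouDi daemon, but the name and location depend on your installation. ROUDI, PERF_TEST and run_with_roudi are hypothetical names, and the two-second start-up delay is a guess.

```shell
ROUDI="${ROUDI:-iox-roudi}"
PERF_TEST="${PERF_TEST:-./install/performance_test/lib/performance_test/perf_test}"

# run_with_roudi [extra perf_test options...]
run_with_roudi() {
  "$ROUDI" &
  roudi_pid=$!
  sleep 2  # assumed to be long enough for RouDi to finish starting up

  status=0
  "$PERF_TEST" -c iceoryx --msg Array1k --rate 100 --max-runtime 30 "$@" || status=$?

  # Stop RouDi again once the experiment has finished.
  kill "$roudi_pid" 2>/dev/null || true
  wait "$roudi_pid" 2>/dev/null || true
  return "$status"
}

# Executed only if both binaries can be found.
if command -v "$ROUDI" >/dev/null 2>&1 && [ -x "$PERF_TEST" ]; then
  run_with_roudi --topic test_topic
fi
```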

eProsima Fast DDS

FastDDS 2.6.2
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=FASTDDS
Communication plugin: -c FastRTPS
Docker file: Dockerfile.FastDDS
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA (default), LoanedSamples (--zero-copy) | SHMEM (default), LoanedSamples (--zero-copy) | UDP |

OCI OpenDDS

OpenDDS 3.13.2
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=OPENDDS
Communication plugin: -c OpenDDS
Docker file: Dockerfile.OpenDDS
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| TCP | TCP | TCP |

RTI Connext DDS

RTI Connext DDS 5.3.1+
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDS
Communication plugin: -c ConnextDDS
Docker file: Not available
A license is required
You need to source an RTI Connext DDS environment.
- If RTI Connext DDS was installed with ROS 2 (Linux only):
  - source /opt/rti.com/rti_connext_dds-5.3.1/setenv_ros2rti.bash
- If RTI Connext DDS is installed separately, you can source the following script to set the environment:
  - source <connextdds_install_path>/resource/scripts/rtisetenv_<arch>.bash
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA | SHMEM | UDP |

RTI Connext DDS Micro

Connext DDS Micro 3.0.3
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDSMICRO
Communication plugin: -c ConnextDDSMicro
Docker file: Not available
A license is required
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA | SHMEM | UDP |

Framework plugins

The performance_test tool can also measure the end-to-end latency of a framework. In this case, the executor of the framework is used to run the publisher(s) and/or the subscriber(s). The potential overhead of the rclcpp or rmw layer is measured.

ROS 2

The performance_test tool can measure the performance of a variety of RMW implementations through the ROS 2 rclcpp::Publisher and rclcpp::Subscription API.

ROS 2 rclcpp::Publisher and rclcpp::Subscription
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ROS2 (default)
Communication plugin:
- Callback with Single Threaded Executor: -c rclcpp-single-threaded-executor
- Callback with Static Single Threaded Executor: -c rclcpp-static-single-threaded-executor
- rclcpp::WaitSet: -c rclcpp-waitset
Docker file: Dockerfile.rclcpp
Available underlying RMW implementations:
- ROS 2 Rolling is pre-configured to use rmw_fastrtps_cpp
- Follow these instructions to use a different RMW implementation
Available transports: depends on underlying RMW implementation
- LoanedSamples are available (--zero-copy) for ROS_DISTRO = foxy and above

Apex.OS

Apex.OS
CMake build flag: -DPERFORMANCE_TEST_PLUGIN=APEX_OS
- It is also required to source /opt/ApexOS/setup.bash instead of a ROS 2 distribution
Communication plugin: -c ApexOSPollingSubscription
Docker file: Not available
Available underlying RMW implementations: rmw_apex_middleware
Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |

Analyze the results

After an experiment is run with the -l flag, a log file is recorded. Both CSV and JSON formats are supported. It is possible to add custom data to the log file by setting the APEX_PERFORMANCE_TEST environment variable before running an experiment, e.g.

# JSON format
export APEX_PERFORMANCE_TEST="
{
\"My Version\": \"1.0.4\",
\"My Image Version\": \"5.2\",
\"My OS Version\": \"Ubuntu 16.04\"
}
"

Plot the results

To plot the results in the JSON or CSV log files, see the plotter README.

Architecture

Apex.AI’s Performance Testing in ROS 2 white paper (available here) describes how to design a fair and unbiased performance test, and is the basis for this project.

Each middleware has a different API. Thanks to the Plugin abstraction, the core logic of setting up and running an experiment is completely decoupled from the implementation details of sending and receiving individual messages.

Exactly one Plugin implementation is selected at build time. The design is similar to the Abstract Factory pattern. performance_test declares, but does not define, a static factory method in the PluginFactory class. Each middleware provides a definition for this factory method to create a concrete Plugin implementation, and perf_test calls this factory method directly.

An example plugin is available here.

Performance optimizations

On Linux-based platforms, when run with --prevent-cpu-idle, perf_test writes 0 to /dev/cpu_dma_latency and holds the file handle open, which prevents the CPU from entering any idle states for the duration of the experiment. This should result in lower message latency and lower variance in that latency.

Future extensions and limitations

Communication frameworks like DDS have a huge number of settings. This tool only allows the most common QOS settings to be configured. The other QOS settings are hardcoded in the application.
Only one publisher per topic is allowed, because the data verification logic does not support matching data to the different publishers.
Some communication plugins can get stuck in their internal loops if too much data is received. Figuring out ways around such issues is one of the goals of this tool.
The FastRTPS wait-set does not support timeouts, so the receiving side may never abort. In that case, the performance test must be killed manually.
Connext DDS Micro does not support the INTRA transport with reliable QoS and the history kind set to keep_all. Always set the history kind to keep-last when using reliable QoS.

Possible additional communication mechanisms that could be implemented:

Raw UDP communication

Building with limited resources

When building this tool, the compiler must perform a lot of template expansion. This can be overwhelming for a system with a low-power CPU or limited RAM. There are some additional CMake options which can reduce the system load during compilation:

This tool includes many different message types, each with many different sizes. Reduce the number of messages, and thus the compilation load, by disabling one or more message types. For example, to build without PointCloud messages, add -DENABLE_MSGS_POINT_CLOUD=OFF to the --cmake-args. The message types, and their options for enabling/disabling, can be found here.

CHANGELOG

Changelog for package performance_test

X.Y.Z (YYYY/MM/DD)

2.3.0 (2024/09/24)

Removed

Moved apex_performance_plotter to its own package here

2.2.0 (2024/05/15)

Added

performance_test can be built with ROS 2 Iron and Jazzy

Changed

Renamed the --dds-domain_id CLI arg to --dds-domain-id
When --dds-domain-id is unspecified, fall back to the ROS_DOMAIN_ID environment variable
--zero-copy has been separated into two flags:
- --shared-memory: Enable shared-memory transfer in the plugin. This is meant to replace the need to manually set runtime flags via CYCLONEDDS_URI, APEX_MIDDLEWARE_SETTINGS, etc.
- --loaned-samples: When publishing messages in the plugin, borrow loaned samples instead of publishing by copy
- --zero-copy is now an alias for --shared-memory --loaned-samples
- Supported plugins include:
  - -c CycloneDDS
  - -c CycloneDDS-CXX
  - -c ApexOSPollingSubscription
  - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
  - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_fastrtps_cpp

2.1.0 (2024/04/17)

Added

Add new function prepare() to the Publisher and Subscriber API, intended to allow participant discovery without blocking the main thread

Changed

Change the default --history arg from KEEP_ALL to KEEP_LAST
Change the default --history-depth arg from 1000 to 16
If --expected-num-pubs is unspecified, set it to the same value as -p
If --expected-num-subs is unspecified, set it to the same value as -s

Fixed

Removed an unused variable to fix a Clang build
Remove unused variable names in the Plugin abstract class
Fix a potential lockup in PublisherTask on QNX

2.0.0 (2024/03/19)

Added

Add experimental bazel support
- bazel build //performance_test --//:plugin_implementation=//path/to/a/plugin
Add a rudimentary socket-based plugin for testing the bazel support
- bazel run //performance_test --//:plugin_implementation=//performance_test/plugins/demo:demo_plugin -- --help

Changed

Instead of enabling/disabling each plugin, you select exactly one with a CMake string option, for example:
- colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=ROS2
Renamed the --communication CLI arg to --communicator. The short -c is unchanged.

Removed

Removed the deprecated CLI flags for QOS settings:
- Instead of --reliable, use --reliability RELIABLE
- Instead of --transient, use --durability TRANSIENT_LOCAL
- Instead of --keep-last, use --history KEEP_LAST
Removed the obsolete BoundedSequenceFlat messages
Removed the superfluous --msg-list CLI flag. The --help message already lists the available messages.
Fixed
Update the Apex.OS Runner to use executor_runner::deferred instead of executor_runner::deferred_tag()
Ensure that the first few published samples are sent at the expected rate
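Scripts written for 1.x may still pass the flags removed or renamed in this release. The following Python sketch rewrites an old argument list using the replacements listed above. The helper `migrate_cli_args` is hypothetical and not part of performance_test.

```python
# Old flag -> replacement tokens, taken from the 2.0.0 notes above
REPLACEMENTS = {
    "--communication": ["--communicator"],
    "--reliable": ["--reliability", "RELIABLE"],
    "--transient": ["--durability", "TRANSIENT_LOCAL"],
    "--keep-last": ["--history", "KEEP_LAST"],
}


def migrate_cli_args(args):
    """Rewrite 1.x-style perf_test arguments into their 2.0.0 equivalents."""
    migrated = []
    for arg in args:
        # Unknown arguments pass through unchanged
        migrated.extend(REPLACEMENTS.get(arg, [arg]))
    return migrated


if __name__ == "__main__":
    old = ["--communication", "CycloneDDS", "--reliable", "--keep-last"]
    print(" ".join(migrate_cli_args(old)))
```

The sketch does not handle --msg-list, which has no replacement flag; the list of messages is available through --help instead.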

1.5.2 (YYYY/MM/DD)

Added

--prevent-cpu-idle is available on QNX

Changed

JSON log files will contain all values in the APEX_PERFORMANCE_TEST dictionary, instead of the five specific values used previously
Switch to build as C++17 by default

Fixed

Zero copy transfer is again enabled for the rclcpp publisher

1.5.0 (2023/06/14)

Added

New CLI switch --prevent-cpu-idle (Linux only). When specified, perf_test uses /dev/cpu_dma_latency to request that the CPU not enter any sleep states, which may give more consistent results
Some smaller Array messages, down to 32 bits
Added support to the FastDDS plugin for bounded and unbounded sequences

Changed

Update the README to better explain how to use this tool with Apex.OS
In the Runner, allocate the AnalysisResults on the stack instead of using shared_ptr
Subscriber methods accept a callback parameter, instead of returning a vector of results, to reduce heap usage
Refactored the interaction between SubscriberStats and AnalysisResult to remove the need for a std::vector of latency samples, to reduce heap usage
Adjusted the Array message sizes to make the name match the contents
Updated apex_os_communicator to use the new zero-copy API

1.4.2 (2023/03/15)

Added

Added perfplot support for JSON log files

Changed

Migrate the Apex.OS target to use rosidl_get_typesupport_target
Preallocate the JSON logger’s string buffer to prevent reallocations after the experiment begins

1.4.1 (2023/02/23)

Changed

Updated the iceoryx plugin to the latest master as of Feb 13

1.4.0 (2023/02/20)

Added

New message type BoundedSequenceFlat
- This is a BoundedSequence with the @flat annotation
- Sizes range from 1kB to 8MB, like Array and BoundedSequence

Changed

Messages of different types can be optionally included via CMake args:
- -DENABLE_MSGS_ARRAY (default ON)
- -DENABLE_MSGS_STRUCT (default ON)
- -DENABLE_MSGS_POINT_CLOUD (default ON)
- -DENABLE_MSGS_BOUNDED_SEQUENCE (default OFF)
- -DENABLE_MSGS_BOUNDED_SEQUENCE_FLAT (default OFF)
- -DENABLE_MSGS_UNBOUNDED_SEQUENCE (default OFF)
- -DENABLE_MSGS_ALL (default OFF)
  - when ON, overrides the other defaults to ON
  - you can still optionally exclude some messages by explicitly setting them to OFF

Removed

Removed a few messages:
- Range
- RadarTrack
- RadarDetection
- NavSatFix

Fixed

In all cases, including loaned messages, capture the timestamp as the last step of initializing the message
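The interaction between -DENABLE_MSGS_ALL and the per-type options in this release can be modeled with a small Python sketch. It only mirrors the rules stated in the list above and is not the project's CMake logic; `resolve_message_options` is a hypothetical name.

```python
# Defaults as listed in the 1.4.0 notes (True = ON, False = OFF)
DEFAULTS = {
    "ENABLE_MSGS_ARRAY": True,
    "ENABLE_MSGS_STRUCT": True,
    "ENABLE_MSGS_POINT_CLOUD": True,
    "ENABLE_MSGS_BOUNDED_SEQUENCE": False,
    "ENABLE_MSGS_BOUNDED_SEQUENCE_FLAT": False,
    "ENABLE_MSGS_UNBOUNDED_SEQUENCE": False,
}


def resolve_message_options(explicit):
    """Combine explicitly set options with the defaults.

    `explicit` maps option names to booleans for every option the user
    passed with -D. Explicit values always win; ENABLE_MSGS_ALL only
    changes the default of the options that were left unset.
    """
    enable_all = explicit.get("ENABLE_MSGS_ALL", False)
    resolved = {}
    for name, default in DEFAULTS.items():
        if name in explicit:
            resolved[name] = explicit[name]
        else:
            resolved[name] = True if enable_all else default
    return resolved


if __name__ == "__main__":
    print(resolve_message_options(
        {"ENABLE_MSGS_ALL": True, "ENABLE_MSGS_STRUCT": False}
    ))
```

With the arguments shown, every message group is enabled except the struct messages, which were explicitly set to OFF.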

1.3.7 (2023/01/04)

1.3.6 (2023/01/03)

Fixed

Set the correct IDL_GEN_ROOT for rclcpp plugins

1.3.5 (2022/12/05)

Fixed

Exit cleanly when a publisher process terminates before a subscriber process

1.3.4 (2022/11/28)

Changed

Updated Apex.OS plugins to use the unified LoanedSample::data()

1.3.3 (2022/11/28)

Fixed

Implement the missing take() method in ApexOSPollingSubscriptionSubscriber

1.3.2 (2022/11/21)

Fixed

Capture the this pointer in the lambda in the iceoryx publisher

1.3.1 (2022/11/21)

Added

New Apex.OS plugin, compatible with the ThreadedRunners
- The INTER_THREAD and INTRA_THREAD execution strategies, combined with -c ApexOSPollingSubscription, will use the ThreadedRunner instances
- The new APEX_SINGLE_EXECUTOR execution strategy will add all publishers and subscribers to a single Apex.OS Executor
- The new APEX_EXECUTOR_PER_COMMUNICATOR execution strategy will add each publisher and each subscriber to its own Apex.OS Executor instance
- The new APEX_CHAIN execution strategy will add a publisher and subscriber as a chain of nodes to an Apex.OS Executor

Changed

Refactored FastRTPS communicator plugin:
- Uses DDS compliant API
- Code generator updated
- Implementation for publish_loaned()
- Dockerfile improvements

Removed

CLI arg --disable-async. Synchronous / asynchronous publishing should be configured externally depending on the communication means used.

1.3.0 (2022/08/25)

Added

New execution strategy option:
- The default -e INTER_THREAD runs each publisher and subscriber in its own separate thread, which matches the previous behavior
- A new -e INTRA_THREAD, which runs a single publisher and subscriber in the same thread. The publisher writes, and the subscriber immediately takes it
- For Apex.OS specifically, some optimized execution strategies which use the proprietary Apex.OS executor

Changed

Significantly refactored the communicator plugins:
- Each plugin is split into an implementation of a Publisher and a Subscriber, instead of a single Communicator
- The plugin is no longer responsible for managing the metrics, such as sample count, lost samples, and latency
- The plugin does not require any special logic to support roundtrip mode
- It is safe for the plugins to initialize their data writers and readers at construction time, instead of delaying the initialization to the first call of publish() or update_subscription()
- Split publish() into publish_copy() and publish_loaned()
Significantly refactored the runner framework:
- The runner framework is responsible for the experiment metrics
- It manages the roundtrip mode logic
- It is extensible for different execution strategies or thread configurations
The iceoryx plugin now uses the untyped API, for improved performance

1.2.1 (2022/06/30)

Fixed

Capture the timestamp as soon as a message is received, instead of just before storing the metrics, to reduce the reported latency to a more correct value
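The ordering that this fix describes can be sketched in Python. The sketch assumes the sample carries its publish timestamp in nanoseconds from the same monotonic clock; `handle_sample` and `record_metrics` are hypothetical names, not the tool's C++ API.

```python
import time


def handle_sample(sample_timestamp_ns, record_metrics, clock=time.monotonic_ns):
    """Compute the latency of one received sample.

    The receive time is taken first, so that the bookkeeping done in
    record_metrics does not inflate the reported latency.
    """
    received_ns = clock()  # capture immediately on receipt
    latency_ns = received_ns - sample_timestamp_ns
    record_metrics(latency_ns)  # storing the metrics happens afterwards
    return latency_ns


if __name__ == "__main__":
    latencies = []
    sent_ns = time.monotonic_ns()  # stands in for the publisher's timestamp
    handle_sample(sent_ns, latencies.append)
    print(latencies)
```

The `clock` parameter exists only so the function can be exercised with a fake clock; a monotonic clock is used because wall-clock time can jump during an experiment.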

1.2.0 (2022/06/28)

Changed

The CLI arguments for specifying the output type have changed:
- For console output, updated every second, add --print-to-console
- For file output, use --logfile my_file.csv or --logfile my_file.json
  - The type will be deduced from the file name
- If neither of these options is specified, a warning is printed and the experiment still runs
The linters are now configured locally. This means that the output of colcon test should be the same no matter the installed ROS distribution.
The --zero-copy arg is now valid even if the publisher and subscriber(s) are in the same process

Removed

The publisher and subscriber loop reserve metrics are no longer recorded or reported

Fixed

CPU usage will no longer be stuck at 0
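Deducing the log type from the --logfile name, as introduced in this release, might look like the following Python sketch. It is an illustration only: the function name is hypothetical, and raising an error for other extensions is an assumption, since the notes do not say how the tool treats them.

```python
from pathlib import Path


def deduce_log_type(logfile):
    """Map a --logfile name to a log format based on its extension."""
    suffix = Path(logfile).suffix
    if suffix == ".csv":
        return "csv"
    if suffix == ".json":
        return "json"
    # Assumption: anything else is rejected; the real tool may differ
    raise ValueError(f"cannot deduce log type from {logfile!r}")


if __name__ == "__main__":
    for name in ("my_file.csv", "my_file.json"):
        print(name, "->", deduce_log_type(name))
```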

1.1.2 (2022/06/08)

Changed

Use steady_clock for all platforms, including QNX QOS

1.1.1 (2022/06/07)

Changed

Significant refactor to simplify the analysis pipeline

Fixed

Add some missing definitions when Apex.OS is enabled, but the rclcpp plugins are disabled

1.1.0 (2022/06/02)

Added

New Apex.OS Polling Subscription plugin
Compatibility with ROS 2 Humble

1.0.0 (2022/05/12)

Added

More expressive perf_test CLI args for QOS settings
A plugin for Cyclone DDS with C++ bindings v0.9.0b1

Changed

CLI args for QOS settings:
- --reliability <RELIABLE|BEST_EFFORT>
- --durability <TRANSIENT_LOCAL|VOLATILE>
- --history <KEEP_LAST|KEEP_ALL>
master branch is compatible with many ROS 2 distributions:
- dashing
- eloquent
- foxy
- galactic
- rolling

Deprecated

CLI flags for QOS settings:
- --reliable
- --transient
- --keep-last

Removed

The branches for specific ROS 2 distributions have been deleted

Fixed

CI jobs and Dockerfiles are decoupled from the middleware bundled with the ROS 2 distribution

Package Dependencies

Name
rclcpp
ros_environment
ament_cmake
rosidl_default_generators
rmw_implementation
rosidl_default_runtime
ament_cmake_gtest
ament_lint_auto
ament_lint_common

System Dependencies

Name
git

Dependent Packages

Name	Deps
performance_report
