audio_utils repository

audio_utils audio_utils_msgs

Repository Summary

Description	ROS node and utilities for audio streams.
Checkout URI	https://github.com/introlab/audio_utils.git
VCS Type	git
VCS Version	ros2
Last Updated	2025-01-22
Dev Status	UNKNOWN
CI status	No Continuous Integration
Released	UNRELEASED
Tags	No category tags.

Packages

Name	Version
audio_utils	0.0.0
audio_utils_msgs	0.0.0

README

audio_utils

ROS2 nodes and utilities for audio streams.

For ROS1, please see the ros1 branch.

Author(s): Marc-Antoine Maheux

Setup (Ubuntu)

The following subsections explain how to set up the library on Ubuntu.

Install Dependencies

sudo apt-get install cmake build-essential gfortran texinfo libasound2-dev libpulse-dev libgfortran-*-dev

Install Python Dependencies

sudo pip install -r requirements.txt

or

sudo pip3 install -r requirements.txt

Setup Submodules

git submodule update --init --recursive

Nodes

`capture_node`

This node captures the sound from an ALSA or PulseAudio device and publishes it to a topic.

Parameters

backend (string): The backend to use (alsa or pulse_audio). The default value is alsa.
device (string): The device to capture (e.g., hw:CARD=1,DEV=0 or default for ALSA, or alsa_input.usb-IntRoLab_16SoundsUSB_Audio_2.0-00.multichannel-input for PulseAudio). The default value is default.
format (string): The audio format (see audio_utils_msgs/AudioFrame). The default value is signed_16.
channel_count (int): The device channel count. The default value is 1.
sampling_frequency (int): The device sampling frequency. The default value is 16000.
frame_sample_count (int): The number of samples in each frame. The default value is 1024.
merge (bool): Indicates whether to merge the channels. The default value is false.
gain (double): The gain to apply. The default value is 1.0.
latency_us (int): The capture latency in microseconds. The default value is 64000.
channel_map (Array of string): The PulseAudio channel mapping. If empty or omitted, the default mapping is used. This parameter must be set only with the PulseAudio backend. The default value is [].
queue_size (int): The publisher queue size. The default value is 1.

Published Topics

audio_out (audio_utils_msgs/AudioFrame) The captured sound.
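
With the default parameters, each frame covers frame_sample_count / sampling_frequency seconds of audio, which works out to the same value as the default latency_us. The following is a minimal sketch of that arithmetic; the helper names are illustrative and are not part of audio_utils.

```python
def frame_duration_us(frame_sample_count: int, sampling_frequency: int) -> float:
    """Return the duration of one audio frame in microseconds."""
    if frame_sample_count <= 0 or sampling_frequency <= 0:
        raise ValueError("frame_sample_count and sampling_frequency must be positive")
    return frame_sample_count * 1_000_000 / sampling_frequency


def frames_per_second(frame_sample_count: int, sampling_frequency: int) -> float:
    """Return how many frames are needed to carry one second of audio."""
    if frame_sample_count <= 0 or sampling_frequency <= 0:
        raise ValueError("frame_sample_count and sampling_frequency must be positive")
    return sampling_frequency / frame_sample_count


# capture_node defaults: 1024 samples per frame at 16000 Hz.
print(frame_duration_us(1024, 16000))  # 64000.0 microseconds
print(frames_per_second(1024, 16000))  # 15.625 frames per second
```

A smaller frame_sample_count shortens each frame and raises the message rate on audio_out proportionally.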

`playback_node`

This node receives the sound from a topic and plays it on an ALSA or PulseAudio device.

Parameters

backend (string): The backend to use (alsa or pulse_audio). The default value is alsa.
device (string): The device to play the sound on (e.g., hw:CARD=1,DEV=0 or default for ALSA, or alsa_input.usb-IntRoLab_16SoundsUSB_Audio_2.0-00.multichannel-input for PulseAudio). The default value is default.
format (string): The audio format (see audio_utils_msgs/AudioFrame). The default value is signed_16.
channel_count (int): The device channel count. The default value is 1.
sampling_frequency (int): The device sampling frequency. The default value is 16000.
frame_sample_count (int): The number of samples in each frame. The default value is 1024.
latency_us (int): The playback latency in microseconds. The default value is 64000.
channel_map (Array of string): The PulseAudio channel mapping. If empty or omitted, the default mapping is used. This parameter must be set only with the PulseAudio backend. The default value is [].
queue_size (int): The subscriber queue size. The default value is 1.

Subscribed Topics

audio_in (audio_utils_msgs/AudioFrame) The sound to play.

`beat_detector_node`

This node estimates the song tempo and detects if the beat is in the current frame.

Parameters

sampling_frequency (int): The device sampling frequency. The default value is 44100.
frame_sample_count (int): The number of samples in each analysed frame. It must be a multiple of oss_fft_window_size. The default value is 128.
oss_fft_window_size (int): The onset strength signal window size. It must be greater than or equal to frame_sample_count. The default value is 1024.
flux_hamming_size (int): The flux hamming window size to calculate the onset strength signal. The default value is 15.
oss_bpm_window_size (int): The onset strength signal window size to calculate the BPM value. The default value is 1024.
min_bpm (double): The minimum valid BPM value. The default value is 50.
max_bpm (double): The maximum valid BPM value. The default value is 180.
bpm_candidate_count (int): The number of cross-correlations to perform to find the best BPM. The default value is 10.

Subscribed Topics

audio_in (audio_utils_msgs/AudioFrame) The sound to analyze. The channel count must be 1.

Published Topics

bpm (std_msgs/Float32): The tempo in bpm (beats per minute) for each frame.
beat (std_msgs/Bool): Indicates whether a beat is in the current frame.
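
To relate the bpm topic to the frame rate, the number of analysed frames between two consecutive beats can be derived from the tempo, sampling_frequency and frame_sample_count. The function below is an illustrative calculation with a hypothetical name; it is not code from the node.

```python
def frames_per_beat(bpm: float,
                    sampling_frequency: int = 44100,
                    frame_sample_count: int = 128) -> float:
    """Return the number of analysed frames between two consecutive beats."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    # One beat lasts 60 / bpm seconds, i.e. 60 * sampling_frequency / bpm samples.
    samples_per_beat = 60.0 * sampling_frequency / bpm
    return samples_per_beat / frame_sample_count


# Default parameters, at the tempo limits min_bpm = 50 and max_bpm = 180.
print(frames_per_beat(50.0))   # 413.4375
print(frames_per_beat(180.0))  # 114.84375
```

With the default parameters, a beat would therefore be expected roughly once every 115 to 413 frames, depending on the tempo.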

`vad_node`

This node performs voice activity detection with Silero VAD. The models folder contains the model trained by Silero VAD. The license of the model is MIT.

Parameters

silence_to_voice_threshold (double): The threshold to detect voice activity when silence was previously detected. The default value is 0.5.
voice_to_silence_threshold (double): The threshold to detect silence when voice activity was previously detected. It must be lower than silence_to_voice_threshold. The default value is 0.4.
min_silence_duration_ms (double): The minimum silence duration in ms. The default value is 500.

Subscribed Topics

audio_in (audio_utils_msgs/AudioFrame) The sound to analyze. The channel count must be 1. The sampling frequency must be 16000 Hz. The frame sample count must be a multiple of 512.

Published Topics

voice_activity (audio_utils_msgs/VoiceActivity) The voice activity detection result.
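
The two thresholds form a hysteresis: a higher score is needed to enter the voice state than to stay in it. The sketch below shows one plausible way the three parameters could interact, assuming the model yields one speech probability per 512-sample chunk (32 ms at 16000 Hz). The class name, the comparison operators and the way min_silence_duration_ms is applied are assumptions for illustration, not the node's actual implementation.

```python
class HysteresisVad:
    """Two-threshold voice activity state machine (illustrative sketch)."""

    def __init__(self,
                 silence_to_voice_threshold: float = 0.5,
                 voice_to_silence_threshold: float = 0.4,
                 min_silence_duration_ms: float = 500.0,
                 chunk_duration_ms: float = 32.0):
        if voice_to_silence_threshold >= silence_to_voice_threshold:
            raise ValueError(
                "voice_to_silence_threshold must be lower than silence_to_voice_threshold")
        self.silence_to_voice_threshold = silence_to_voice_threshold
        self.voice_to_silence_threshold = voice_to_silence_threshold
        self.min_silence_duration_ms = min_silence_duration_ms
        self.chunk_duration_ms = chunk_duration_ms
        self.is_voice = False
        self._silence_ms = 0.0

    def update(self, probability: float) -> bool:
        """Consume one speech probability and return the current voice state."""
        if not self.is_voice:
            # In silence: only a score above the higher threshold starts voice.
            if probability >= self.silence_to_voice_threshold:
                self.is_voice = True
                self._silence_ms = 0.0
        elif probability < self.voice_to_silence_threshold:
            # In voice: accumulate silence until it has lasted long enough.
            self._silence_ms += self.chunk_duration_ms
            if self._silence_ms >= self.min_silence_duration_ms:
                self.is_voice = False
                self._silence_ms = 0.0
        else:
            # Still voice: any pending silence run is cancelled.
            self._silence_ms = 0.0
        return self.is_voice


vad = HysteresisVad()
print([vad.update(p) for p in (0.45, 0.6, 0.45)])  # [False, True, True]
```

A score of 0.45 does not start voice activity, but the same score keeps it alive once it has started, which avoids rapid toggling around a single threshold.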

`format_conversion_node.py`

This node converts the format of an audio topic.

Parameters

input_format (string): The input audio format (see audio_utils_msgs/AudioFrame).
output_format (string): The output audio format (see audio_utils_msgs/AudioFrame).

Subscribed Topics

audio_in (audio_utils_msgs/AudioFrame) The sound topic to convert.

Published Topics

audio_out (audio_utils_msgs/AudioFrame) The converted sound.
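
As an illustration of what a format conversion involves, the sketch below converts little-endian signed 16-bit samples to 32-bit floats in [-1, 1). The byte order, the scaling convention and the function name are assumptions made for this example; refer to audio_utils_msgs/AudioFrame for the formats the node actually supports.

```python
import struct


def signed_16_to_float(data: bytes) -> bytes:
    """Convert packed little-endian int16 samples to packed float32 samples."""
    if len(data) % 2 != 0:
        raise ValueError("signed 16-bit data must contain an even number of bytes")
    count = len(data) // 2
    samples = struct.unpack(f"<{count}h", data)
    # Dividing by 32768 maps the int16 range [-32768, 32767] onto [-1.0, 1.0).
    return struct.pack(f"<{count}f", *(sample / 32768.0 for sample in samples))


raw = struct.pack("<3h", 0, 16384, -32768)
print(struct.unpack("<3f", signed_16_to_float(raw)))  # (0.0, 0.5, -1.0)
```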

`resampling_node.py`

This node resamples an audio topic.

Parameters

input_format (string): The input audio format (see audio_utils_msgs/AudioFrame).
output_format (string): The output audio format (see audio_utils_msgs/AudioFrame).
channel_count (int): The device channel count.
input_sampling_frequency (int): The input sampling frequency.
output_sampling_frequency (int): The output sampling frequency.
input_frame_sample_count (int): The number of samples in each frame of the input.
dynamic_input_resampling (bool): If true, the node dynamically adjusts the input sampling information (format, sampling frequency and frame sample count) to match the received frames. In this mode, input_format, input_sampling_frequency and input_frame_sample_count are not required, but they can be set to avoid a recomputation when the initial input sampling information is known. The default value is false.

Subscribed Topics

audio_in (audio_utils_msgs/AudioFrame) The sound topic to resample.

Published Topics

audio_out (audio_utils_msgs/AudioFrame) The resampled sound.
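
Resampling changes the number of samples per frame by the ratio of the two sampling frequencies. The sketch below computes that ratio exactly; the function name is hypothetical, and how the node handles a fractional result is not described here.

```python
from fractions import Fraction


def output_frame_sample_count(input_frame_sample_count: int,
                              input_sampling_frequency: int,
                              output_sampling_frequency: int) -> Fraction:
    """Return the exact number of output samples produced per input frame."""
    if input_sampling_frequency <= 0 or output_sampling_frequency <= 0:
        raise ValueError("sampling frequencies must be positive")
    ratio = Fraction(output_sampling_frequency, input_sampling_frequency)
    return input_frame_sample_count * ratio


print(output_frame_sample_count(1024, 16000, 48000))         # 3072
print(float(output_frame_sample_count(1024, 16000, 44100)))  # 2822.4
```

Going from 16000 Hz to 48000 Hz gives a whole number of samples per frame, whereas 16000 Hz to 44100 Hz does not, so the output frame size cannot be constant in that case unless samples are buffered across frames.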

`split_channel_node.py`

This node splits a multichannel audio topic into several mono audio topics.
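
Conceptually, splitting amounts to de-interleaving the samples. The sketch below assumes that the samples of a multichannel frame are interleaved (channel 0, channel 1, ..., then the next sample of each channel); both that layout and the function name are assumptions for illustration, not a description of the node's code.

```python
from typing import List, Sequence


def split_channels(samples: Sequence[int], channel_count: int) -> List[List[int]]:
    """De-interleave a multichannel frame into one sample list per channel."""
    if channel_count <= 0:
        raise ValueError("channel_count must be positive")
    if len(samples) % channel_count != 0:
        raise ValueError("the sample count must be a multiple of channel_count")
    # Channel c owns every channel_count-th sample, starting at index c.
    return [list(samples[c::channel_count]) for c in range(channel_count)]


# Two channels, three samples each, interleaved.
print(split_channels([1, 10, 2, 20, 3, 30], 2))  # [[1, 2, 3], [10, 20, 30]]
```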

Parameters

input_format (string): The input audio format (see audio_utils_msgs/AudioFrame).
output_format (string): The output audio format (see audio_utils_msgs/AudioFrame).
channel_count (int): The device channel count.