Repository Summary
Description | ROS node and utilities for audio streams. |
Checkout URI | https://github.com/introlab/audio_utils.git |
VCS Type | git |
VCS Version | ros2 |
Last Updated | 2025-01-22 |
Dev Status | UNKNOWN |
CI status | No Continuous Integration |
Released | UNRELEASED |
Tags | No category tags. |
Packages
Name | Version |
---|---|
audio_utils | 0.0.0 |
audio_utils_msgs | 0.0.0 |
README
audio_utils
ROS2 nodes and utilities for audio streams.
For ROS1, please see the ros1 branch.
Author(s): Marc-Antoine Maheux
Setup (Ubuntu)
The following subsections explain how to use the library on Ubuntu.
Install Dependencies
sudo apt-get install cmake build-essential gfortran texinfo libasound2-dev libpulse-dev libgfortran-*-dev
Install Python Dependencies
sudo pip install -r requirements.txt
or
sudo pip3 install -r requirements.txt
Setup Submodules
git submodule update --init --recursive
Nodes
capture_node
This node captures the sound from an ALSA or PulseAudio device and publishes it to a topic.
Parameters
- backend (string): The backend to use (alsa or pulse_audio). The default value is alsa.
- device (string): The device to capture (e.g., hw:CARD=1,DEV=0 or default for ALSA, or alsa_input.usb-IntRoLab_16SoundsUSB_Audio_2.0-00.multichannel-input for PulseAudio). The default value is default.
- format (string): The audio format (see audio_utils_msgs/AudioFrame). The default value is signed_16.
- channel_count (int): The device channel count. The default value is 1.
- sampling_frequency (int): The device sampling frequency. The default value is 16000.
- frame_sample_count (int): The number of samples in each frame. The default value is 1024.
- merge (bool): Whether to merge the channels. The default value is false.
- gain (double): The gain to apply. The default value is 1.0.
- latency_us (int): The capture latency in microseconds. The default value is 64000.
- channel_map (Array of string): The PulseAudio channel mapping. If empty or omitted, the default mapping is used. This parameter must be set only with the PulseAudio backend. The default value is [].
- queue_size (int): The publisher queue size. The default value is 1.
Published Topics
- audio_out (audio_utils_msgs/AudioFrame): The captured sound.
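As a rough sanity check on the capture_node parameters, the duration and size of one frame follow from frame_sample_count, sampling_frequency, channel_count and format. The sketch below is plain Python and is not part of the package; the helper names are hypothetical, and the figure of 2 bytes per sample for signed_16 is an assumption made for illustration.

```python
# Illustrative helpers (hypothetical names, not part of audio_utils).


def frame_duration_us(frame_sample_count: int, sampling_frequency: int) -> float:
    """Duration of one frame in microseconds."""
    return frame_sample_count * 1_000_000 / sampling_frequency


def frame_size_bytes(frame_sample_count: int, channel_count: int, bytes_per_sample: int) -> int:
    """Size of one frame's sample data, assuming fixed-width samples."""
    return frame_sample_count * channel_count * bytes_per_sample


if __name__ == "__main__":
    # Default capture_node values: 1024 samples per frame at 16000 Hz, 1 channel.
    print(frame_duration_us(1024, 16000))  # 64000.0
    # Assumption: a signed_16 sample occupies 2 bytes.
    print(frame_size_bytes(1024, 1, 2))  # 2048
```

With the default values, one frame lasts 64000 microseconds, which is the same number as the default latency_us.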
playback_node
This node receives the sound from a topic and plays it on an ALSA or PulseAudio device.
Parameters
- backend (string): The backend to use (alsa or pulse_audio). The default value is alsa.
- device (string): The playback device (e.g., hw:CARD=1,DEV=0 or default for ALSA, or alsa_input.usb-IntRoLab_16SoundsUSB_Audio_2.0-00.multichannel-input for PulseAudio). The default value is default.
- format (string): The audio format (see audio_utils_msgs/AudioFrame). The default value is signed_16.
- channel_count (int): The device channel count. The default value is 1.
- sampling_frequency (int): The device sampling frequency. The default value is 16000.
- frame_sample_count (int): The number of samples in each frame. The default value is 1024.
- latency_us (int): The playback latency in microseconds. The default value is 64000.
- channel_map (Array of string): The PulseAudio channel mapping. If empty or omitted, the default mapping is used. This parameter must be set only with the PulseAudio backend. The default value is [].
- queue_size (int): The subscriber queue size. The default value is 1.
Subscribed Topics
- audio_in (audio_utils_msgs/AudioFrame): The sound to play.
beat_detector_node
This node estimates the song tempo and detects if the beat is in the current frame.
Parameters
- sampling_frequency (int): The device sampling frequency. The default value is 44100.
- frame_sample_count (int): The number of samples in each analyzed frame. oss_fft_window_size must be a multiple of this value. The default value is 128.
- oss_fft_window_size (int): The onset strength signal window size. It must be greater than or equal to frame_sample_count. The default value is 1024.
- flux_hamming_size (int): The flux Hamming window size used to calculate the onset strength signal. The default value is 15.
- oss_bpm_window_size (int): The onset strength signal window size used to calculate the BPM value. The default value is 1024.
- min_bpm (double): The minimum valid BPM value. The default value is 50.
- max_bpm (double): The maximum valid BPM value. The default value is 180.
- bpm_candidate_count (int): The number of cross-correlations to perform to find the best BPM. The default value is 10.
Subscribed Topics
- audio_in (audio_utils_msgs/AudioFrame): The sound to analyze. The channel count must be 1.
Published Topics
- bpm (std_msgs/Float32): The tempo in BPM (beats per minute) for each frame.
- beat (std_msgs/Bool): Indicates whether the beat is in the current frame.
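Since bpm and beat are reported per frame, it can help to know how many frames separate two beats at a given tempo. The following is a minimal arithmetic sketch, not the node's algorithm; the function names are hypothetical, and it assumes one beat message per analyzed frame.

```python
# Illustrative arithmetic only (hypothetical names, not part of audio_utils).


def frames_per_beat(bpm: float, sampling_frequency: int, frame_sample_count: int) -> float:
    """Average number of analyzed frames between two consecutive beats."""
    seconds_per_beat = 60.0 / bpm
    frames_per_second = sampling_frequency / frame_sample_count
    return seconds_per_beat * frames_per_second


def is_valid_bpm(bpm: float, min_bpm: float = 50.0, max_bpm: float = 180.0) -> bool:
    """Check a tempo against the min_bpm and max_bpm defaults listed above."""
    return min_bpm <= bpm <= max_bpm


if __name__ == "__main__":
    # Default sampling_frequency (44100) and frame_sample_count (128) at 120 BPM.
    print(frames_per_beat(120.0, 44100, 128))  # 172.265625
    print(is_valid_bpm(120.0))  # True
    print(is_valid_bpm(200.0))  # False
```

Under these assumptions, at 120 BPM roughly one frame in every 172 would carry a beat.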
vad_node
This node performs voice activity detection with Silero VAD. The models folder contains the model trained by Silero VAD. The license of the model is MIT.
Parameters
- silence_to_voice_threshold (double): The threshold to detect voice activity when silence was previously detected. The default value is 0.5.
- voice_to_silence_threshold (double): The threshold to detect silence when voice activity was previously detected. It must be lower than silence_to_voice_threshold. The default value is 0.4.
- min_silence_duration_ms (double): The minimum silence duration in ms. The default value is 500.
Subscribed Topics
- audio_in (audio_utils_msgs/AudioFrame): The sound to analyze. The channel count must be 1. The sampling frequency must be 16000 Hz. The frame sample count must be a multiple of 512.
Published Topics
- voice_activity (audio_utils_msgs/VoiceActivity): The voice activity detection result.
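The two thresholds describe a hysteresis: a higher probability is needed to enter the voice state than to stay in it. The sketch below illustrates that idea in plain Python. It is not the node's implementation; the class name is hypothetical, the per-frame probability is assumed to come from the model, and the way min_silence_duration_ms is applied (silence must persist that long before the state switches back) is an assumption.

```python
# Illustrative hysteresis state machine (hypothetical, not the vad_node code).


class HysteresisVad:
    def __init__(
        self,
        silence_to_voice_threshold: float = 0.5,
        voice_to_silence_threshold: float = 0.4,
        min_silence_duration_ms: float = 500.0,
    ) -> None:
        if voice_to_silence_threshold >= silence_to_voice_threshold:
            raise ValueError("voice_to_silence_threshold must be lower than silence_to_voice_threshold")
        self.silence_to_voice_threshold = silence_to_voice_threshold
        self.voice_to_silence_threshold = voice_to_silence_threshold
        self.min_silence_duration_ms = min_silence_duration_ms
        self.is_voice = False
        self._silence_ms = 0.0

    def update(self, probability: float, frame_duration_ms: float) -> bool:
        """Feed one frame's voice probability and return the current state."""
        if not self.is_voice:
            if probability >= self.silence_to_voice_threshold:
                self.is_voice = True
                self._silence_ms = 0.0
        elif probability < self.voice_to_silence_threshold:
            # Assumption: silence must last min_silence_duration_ms before switching.
            self._silence_ms += frame_duration_ms
            if self._silence_ms >= self.min_silence_duration_ms:
                self.is_voice = False
                self._silence_ms = 0.0
        else:
            self._silence_ms = 0.0
        return self.is_voice


if __name__ == "__main__":
    # 512 samples at 16000 Hz last 32 ms; a short silence duration keeps the demo small.
    vad = HysteresisVad(min_silence_duration_ms=64.0)
    states = [vad.update(p, 32.0) for p in (0.45, 0.6, 0.45, 0.3, 0.3)]
    print(states)  # [False, True, True, True, False]
```

A probability of 0.45 does not trigger voice from silence, but it also does not end an ongoing voice segment, which is the point of using two thresholds.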
format_conversion_node.py
This node converts the format of an audio topic.
Parameters
- input_format (string): The input audio format (see audio_utils_msgs/AudioFrame).
- output_format (string): The output audio format (see audio_utils_msgs/AudioFrame).
Subscribed Topics
- audio_in (audio_utils_msgs/AudioFrame): The sound topic to convert.
Published Topics
- audio_out (audio_utils_msgs/AudioFrame): The converted sound.
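To make the idea of a format conversion concrete, here is a minimal sketch that converts signed 16-bit samples to floating-point values and back using only the Python standard library. It does not use the package's code; the function names are hypothetical, and the little-endian byte order, the scaling by 32768 and the clipping behaviour are assumptions.

```python
# Illustrative conversion between 16-bit integers and floats (hypothetical helpers).
import struct
from typing import List, Sequence


def signed_16_to_float(data: bytes) -> List[float]:
    """Decode little-endian signed 16-bit samples into floats in [-1.0, 1.0)."""
    count = len(data) // 2
    samples = struct.unpack(f"<{count}h", data[: count * 2])
    return [sample / 32768.0 for sample in samples]


def float_to_signed_16(samples: Sequence[float]) -> bytes:
    """Encode floats as little-endian signed 16-bit samples, clipping out-of-range values."""
    clipped = [max(-32768, min(32767, round(sample * 32768.0))) for sample in samples]
    return struct.pack(f"<{len(clipped)}h", *clipped)


if __name__ == "__main__":
    raw = struct.pack("<3h", 0, 16384, -32768)
    print(signed_16_to_float(raw))  # [0.0, 0.5, -1.0]
    print(float_to_signed_16([0.0, 0.5, -1.0]) == raw)  # True
```

Note that 1.0 cannot be represented exactly in signed 16-bit samples with this scaling, so the sketch clips it to 32767.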
resampling_node.py
This node resamples an audio topic.
Parameters
- input_format (string): The input audio format (see audio_utils_msgs/AudioFrame).
- output_format (string): The output audio format (see audio_utils_msgs/AudioFrame).
- channel_count (int): The device channel count.
- input_sampling_frequency (int): The input sampling frequency.
- output_sampling_frequency (int): The output sampling frequency.
- input_frame_sample_count (int): The number of samples in each frame of the input.
- dynamic_input_resampling (bool): If true, always adjust the input sampling information (format, sampling frequency and frame sample count) dynamically to match the received frames. In this mode, input_format, input_sampling_frequency and input_frame_sample_count are not required, but they can be set to avoid a recomputation when the initial input sampling information is known. The default value is false.
Subscribed Topics
- audio_in (audio_utils_msgs/AudioFrame): The sound topic to resample.
Published Topics
- audio_out (audio_utils_msgs/AudioFrame): The resampled sound.
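When choosing input_frame_sample_count and the two sampling frequencies, it can be useful to estimate how many samples one input frame becomes after resampling. The sketch below only shows this arithmetic with exact fractions. The function name is hypothetical, and how the node handles ratios that do not give a whole number of samples is not covered here.

```python
# Illustrative arithmetic only (hypothetical helper, not part of audio_utils).
from fractions import Fraction


def output_frame_sample_count(
    input_frame_sample_count: int,
    input_sampling_frequency: int,
    output_sampling_frequency: int,
) -> Fraction:
    """Exact number of output samples corresponding to one input frame."""
    return Fraction(
        input_frame_sample_count * output_sampling_frequency,
        input_sampling_frequency,
    )


if __name__ == "__main__":
    # 1024 samples at 32000 Hz map to a whole number of samples at 16000 Hz.
    print(output_frame_sample_count(1024, 32000, 16000))  # 512
    # 1024 samples at 48000 Hz do not.
    print(output_frame_sample_count(1024, 48000, 16000))  # 1024/3
```

For example, a 32000 Hz stream with 1024-sample frames would correspond to 512 samples per frame at 16000 Hz, which is compatible with the input requirements listed for vad_node.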
split_channel_node.py
This node splits a multichannel audio topic into several mono audio topics.
Parameters
- input_format (string): The input audio format (see audio_utils_msgs/AudioFrame).
- output_format (string): The output audio format (see audio_utils_msgs/AudioFrame).
- channel_count (int): The device channel count.
Subscribed Topics
- audio_in (audio_utils_msgs/AudioFrame): The sound topic to split.
Published Topics
- audio_out_0 (audio_utils_msgs/AudioFrame): The first channel sound.
- audio_out_1 (audio_utils_msgs/AudioFrame): The second channel sound.
- …
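Splitting a multichannel frame into mono frames can be sketched as follows. This is an illustration in plain Python, not the node's code; the function name is hypothetical, and it assumes the samples of the input frame are interleaved (channel 0, channel 1, ..., then the next sample of each channel).

```python
# Illustrative de-interleaving (hypothetical helper, assumes interleaved samples).
from typing import List, Sequence


def split_channels(samples: Sequence[int], channel_count: int) -> List[List[int]]:
    """Return one list of samples per channel from an interleaved sequence."""
    if channel_count <= 0:
        raise ValueError("channel_count must be positive")
    if len(samples) % channel_count != 0:
        raise ValueError("the sample count must be a multiple of channel_count")
    return [list(samples[channel::channel_count]) for channel in range(channel_count)]


if __name__ == "__main__":
    interleaved = [1, 10, 2, 20, 3, 30]  # two channels, three samples each
    print(split_channels(interleaved, 2))  # [[1, 2, 3], [10, 20, 30]]
```

In this picture, the first inner list would correspond to audio_out_0 and the second to audio_out_1.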
raw_file_writer_node.py
This node writes the raw sound data to a file.
Parameters
- output_path (string): The output file path.
Subscribed Topics
- audio_in (audio_utils_msgs/AudioFrame): The sound topic to write.
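A raw file carries no header, so the format, channel count and sampling frequency must be known when the file is read back. The sketch below shows one way to load such a file in Python. It assumes the file contains only little-endian signed_16 samples written back to back, which is an assumption about the file layout rather than something stated above. The function names are hypothetical.

```python
# Illustrative reader for headerless 16-bit audio (hypothetical helpers).
import os
import struct
import tempfile
from typing import List


def read_raw_signed_16(path: str) -> List[int]:
    """Read a headerless file of little-endian signed 16-bit samples."""
    with open(path, "rb") as raw_file:
        data = raw_file.read()
    count = len(data) // 2
    return list(struct.unpack(f"<{count}h", data[: count * 2]))


def round_trip_example() -> List[int]:
    """Write a few samples to a temporary raw file and read them back."""
    samples = [0, 1, -1, 32767]
    with tempfile.NamedTemporaryFile(suffix=".raw", delete=False) as raw_file:
        raw_file.write(struct.pack(f"<{len(samples)}h", *samples))
        path = raw_file.name
    try:
        return read_raw_signed_16(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(round_trip_example())  # [0, 1, -1, 32767]
```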
License
Sponsor
IntRoLab - Intelligent / Interactive / Integrated / Interdisciplinary Robot Lab