audio_utils repository

Repository Summary

Description ROS node and utilities for audio streams.
Checkout URI https://github.com/introlab/audio_utils.git
VCS Type git
VCS Version ros2
Last Updated 2025-01-22
Dev Status UNKNOWN
CI status No Continuous Integration
Released UNRELEASED
Tags No category tags.
Contributing Help Wanted (0)
Good First Issues (0)
Pull Requests to Review (0)

Packages

Name Version
audio_utils 0.0.0
audio_utils_msgs 0.0.0

README

audio_utils

ROS2 nodes and utilities for audio streams.

For ROS1, please see the ros1 branch.

Author(s): Marc-Antoine Maheux

Setup (Ubuntu)

The following subsections explain how to set up the library on Ubuntu.

Install Dependencies

sudo apt-get install cmake build-essential gfortran texinfo libasound2-dev libpulse-dev libgfortran-*-dev

Install Python Dependencies

sudo pip install -r requirements.txt

or

sudo pip3 install -r requirements.txt

Setup Submodules

git submodule update --init --recursive

Nodes

capture_node

This node captures the sound from an ALSA or PulseAudio device and publishes it to a topic.

Parameters

  • backend (string): The backend to use (alsa or pulse_audio). The default value is alsa.
  • device (string): The device to capture (ex: hw:CARD=1,DEV=0 or default for ALSA, or alsa_input.usb-IntRoLab_16SoundsUSB_Audio_2.0-00.multichannel-input for PulseAudio). The default value is default.
  • format (string): The audio format (see audio_utils_msgs/AudioFrame). The default value is signed_16.
  • channel_count (int): The device channel count. The default value is 1.
  • sampling_frequency (int): The device sampling frequency. The default value is 16000.
  • frame_sample_count (int): The number of samples in each frame. The default value is 1024.
  • merge (bool): Whether to merge the channels. The default value is false.
  • gain (double): The gain to apply. The default value is 1.0.
  • latency_us (int): The capture latency in microseconds. The default value is 64000.
  • channel_map (Array of string): The PulseAudio channel mapping. If empty or omitted, the default mapping is used. This parameter must be set only with the PulseAudio backend. The default value is [].
  • queue_size (int): The publisher queue size. The default value is 1.
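The default values above are related: one frame of 1024 samples at 16000 Hz lasts 64000 microseconds, which equals the default latency_us. The sketch below works through that arithmetic. The function names are hypothetical and are not part of audio_utils, and the 2 bytes per sample figure assumes the signed_16 format.

```python
def frame_duration_us(frame_sample_count: int, sampling_frequency: int) -> float:
    """Return the duration of one frame in microseconds."""
    return frame_sample_count * 1_000_000 / sampling_frequency


def frame_size_bytes(frame_sample_count: int, channel_count: int,
                     bytes_per_sample: int = 2) -> int:
    """Return the payload size of one frame (2 bytes per sample for signed_16)."""
    return frame_sample_count * channel_count * bytes_per_sample


if __name__ == "__main__":
    # capture_node defaults: 1024 samples, 16000 Hz, 1 channel, signed_16
    print(frame_duration_us(1024, 16000))
    print(frame_size_bytes(1024, 1))
```

The same helpers can be used to pick a latency_us value that covers at least one frame when frame_sample_count or sampling_frequency is changed.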

Published Topics

playback_node

This node receives the sound from a topic and plays it on an ALSA or PulseAudio device.

Parameters

  • backend (string): The backend to use (alsa or pulse_audio). The default value is alsa.
  • device (string): The device to play to (ex: hw:CARD=1,DEV=0 or default for ALSA, or alsa_input.usb-IntRoLab_16SoundsUSB_Audio_2.0-00.multichannel-input for PulseAudio). The default value is default.
  • format (string): The audio format (see audio_utils_msgs/AudioFrame). The default value is signed_16.
  • channel_count (int): The device channel count. The default value is 1.
  • sampling_frequency (int): The device sampling frequency. The default value is 16000.
  • frame_sample_count (int): The number of samples in each frame. The default value is 1024.
  • latency_us (int): The playback latency in microseconds. The default value is 64000.
  • channel_map (Array of string): The PulseAudio channel mapping. If empty or omitted, the default mapping is used. This parameter must be set only with the PulseAudio backend. The default value is [].
  • queue_size (int): The subscriber queue size. The default value is 1.

Subscribed Topics

beat_detector_node

This node estimates the song tempo and detects if the beat is in the current frame.

Parameters

  • sampling_frequency (int): The device sampling frequency. The default value is 44100.
  • frame_sample_count (int): The number of samples in each analysed frame. oss_fft_window_size must be a multiple of this value. The default value is 128.
  • oss_fft_window_size (int): The onset strength signal window size. It must be greater than or equal to frame_sample_count. The default value is 1024.
  • flux_hamming_size (int): The flux hamming window size to calculate the onset strength signal. The default value is 15.
  • oss_bpm_window_size (int): The onset strength signal window size to calculate the BPM value. The default value is 1024.
  • min_bpm (double): The minimum valid BPM value. The default value is 50.
  • max_bpm (double): The maximum valid BPM value. The default value is 180.
  • bpm_candidate_count (int): The number of cross-correlations to perform to find the best BPM. The default value is 10.

Subscribed Topics

Published Topics

  • bpm (std_msgs/Float32): The tempo in bpm (beats per minute) for each frame.
  • beat (std_msgs/Bool): Indicates whether the beat is in the current frame.
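Because bpm and beat are published once per frame, it can help to know how many frames separate two beats at a given tempo. The following is a minimal sketch of that conversion, using the default sampling_frequency and frame_sample_count listed above. The function name is hypothetical and the node itself may compute things differently.

```python
def beat_period_frames(bpm: float, sampling_frequency: int = 44100,
                       frame_sample_count: int = 128) -> float:
    """Return the number of analysed frames between two consecutive beats."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    # 60 / bpm is the beat period in seconds; one frame lasts
    # frame_sample_count / sampling_frequency seconds.
    return 60.0 * sampling_frequency / (bpm * frame_sample_count)


if __name__ == "__main__":
    # Beat spacing, in frames, at the default min_bpm and max_bpm bounds.
    for bpm in (50.0, 180.0):
        print(bpm, beat_period_frames(bpm))
```

With the defaults, a subscriber should expect the beat topic to be true in only a small fraction of the frames.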

vad_node

This node performs voice activity detection with Silero VAD. The models folder contains the model trained by Silero VAD. The license of the model is MIT.

Parameters

  • silence_to_voice_threshold (double): The threshold to detect voice activity when silence was previously detected. The default value is 0.5.
  • voice_to_silence_threshold (double): The threshold to detect silence when voice activity was previously detected. It must be lower than silence_to_voice_threshold. The default value is 0.4.
  • min_silence_duration_ms (double): The minimum silence duration in ms. The default value is 500.

Subscribed Topics

  • audio_in (audio_utils_msgs/AudioFrame): The sound to analyze. The channel count must be 1. The sampling frequency must be 16000 Hz. The frame sample count must be a multiple of 512.
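The two thresholds and the minimum silence duration describe a hysteresis: a higher probability is needed to enter the voice state than to stay in it. The sketch below illustrates that idea on a stream of speech probabilities. It is an illustration under assumptions, not the node's actual implementation: the class name, the update method and the 32 ms step (512 samples at 16000 Hz) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HysteresisVad:
    """Two-threshold voice activity state machine (illustrative only)."""
    silence_to_voice_threshold: float = 0.5
    voice_to_silence_threshold: float = 0.4
    min_silence_duration_ms: float = 500.0
    frame_duration_ms: float = 32.0  # assumption: 512 samples at 16000 Hz
    is_voice: bool = False
    _silence_ms: float = 0.0

    def update(self, speech_probability: float) -> bool:
        """Consume one speech probability and return the current voice state."""
        if not self.is_voice:
            # In silence: only the higher threshold switches to voice.
            if speech_probability >= self.silence_to_voice_threshold:
                self.is_voice = True
                self._silence_ms = 0.0
        elif speech_probability < self.voice_to_silence_threshold:
            # In voice: accumulate silence until it has lasted long enough.
            self._silence_ms += self.frame_duration_ms
            if self._silence_ms >= self.min_silence_duration_ms:
                self.is_voice = False
                self._silence_ms = 0.0
        else:
            # Still above the lower threshold: reset the silence timer.
            self._silence_ms = 0.0
        return self.is_voice
```

A probability between the two thresholds keeps whatever state was active, which avoids rapid toggling when the model output hovers around a single threshold.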

Published Topics

format_conversion_node.py

This node converts the format of an audio topic.
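As an illustration of what a format conversion involves, the sketch below turns signed_16 samples into floating-point values in the range [-1.0, 1.0). It assumes little-endian sample data and a hypothetical function name. It is not the node's code, and the set of supported formats is defined by audio_utils_msgs/AudioFrame.

```python
import struct
from typing import List


def signed_16_to_float(data: bytes) -> List[float]:
    """Convert little-endian signed 16-bit samples to floats in [-1.0, 1.0)."""
    if len(data) % 2 != 0:
        raise ValueError("data length must be a multiple of 2 bytes")
    samples = struct.unpack(f"<{len(data) // 2}h", data)
    # 32768 is the magnitude of the most negative 16-bit value.
    return [sample / 32768.0 for sample in samples]
```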

Parameters

Subscribed Topics

Published Topics

resampling_node.py

This node resamples an audio topic.

Parameters

  • input_format (string): The input audio format (see audio_utils_msgs/AudioFrame).
  • output_format (string): The output audio format (see audio_utils_msgs/AudioFrame).
  • channel_count (int): The device channel count.
  • input_sampling_frequency (int): The input sampling frequency.
  • output_sampling_frequency (int): The output sampling frequency.
  • input_frame_sample_count (int): The number of samples in each frame of the input.
  • dynamic_input_resampling (bool): If true, the node continuously adjusts the input sampling information (format, sampling frequency and frame sample count) to match the received frames. In this mode, input_format, input_sampling_frequency and input_frame_sample_count are not required, but setting them avoids a recomputation when the initial input sampling information is known. The default value is false.
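The input and output parameters determine how many samples each output frame should contain. The sketch below computes that count exactly, which makes it easy to see whether a given combination yields a whole number of samples. The function name is hypothetical, and how the node handles a fractional result is not stated here.

```python
from fractions import Fraction


def output_frame_sample_count(input_frame_sample_count: int,
                              input_sampling_frequency: int,
                              output_sampling_frequency: int) -> Fraction:
    """Return the exact number of output samples per input frame."""
    return Fraction(input_frame_sample_count * output_sampling_frequency,
                    input_sampling_frequency)


if __name__ == "__main__":
    # A whole number of output samples: 48000 Hz to 16000 Hz.
    print(output_frame_sample_count(960, 48000, 16000))
    # A fractional number of output samples: 44100 Hz to 16000 Hz.
    print(output_frame_sample_count(1024, 44100, 16000))
```

Choosing input_frame_sample_count so that the result has a denominator of 1 keeps the output frames a constant size.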

Subscribed Topics

Published Topics

split_channel_node.py

This node splits a multichannel audio topic into several mono audio topics.
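Conceptually, splitting a multichannel frame means de-interleaving its samples. The sketch below shows this for signed_16 data. It assumes the samples are interleaved (channel 0, channel 1, ..., then the next sample of channel 0) and little-endian, which is not stated above. The function name is hypothetical and this is not the node's code.

```python
import struct
from typing import List


def split_channels(data: bytes, channel_count: int) -> List[bytes]:
    """De-interleave signed 16-bit audio into one byte string per channel."""
    sample_size = 2  # bytes per signed_16 sample
    if channel_count <= 0:
        raise ValueError("channel_count must be positive")
    if len(data) % (sample_size * channel_count) != 0:
        raise ValueError("data does not contain a whole number of samples per channel")
    samples = struct.unpack(f"<{len(data) // sample_size}h", data)
    channels = []
    for channel_index in range(channel_count):
        # Every channel_count-th sample belongs to the same channel.
        channel_samples = samples[channel_index::channel_count]
        channels.append(struct.pack(f"<{len(channel_samples)}h", *channel_samples))
    return channels
```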

Parameters

Subscribed Topics

Published Topics

raw_file_writer_node.py

This node writes the raw sound data to a file.

Parameters

  • output_path (string): The output file path.
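A raw file has no header, so the format, channel count and sampling frequency must be known to play it back. The sketch below wraps raw signed_16 data in a WAV container with Python's standard wave module. It assumes the recording used the signed_16 format on a little-endian host. The function name is hypothetical.

```python
import io
import wave


def raw_signed_16_to_wav(raw: bytes, channel_count: int,
                         sampling_frequency: int) -> bytes:
    """Wrap headerless signed 16-bit PCM data in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channel_count)
        wav_file.setsampwidth(2)  # 2 bytes per signed_16 sample
        wav_file.setframerate(sampling_frequency)
        wav_file.writeframes(raw)
    return buffer.getvalue()
```

To convert a recording, read the bytes from the file given as output_path, pass them to the function with the channel count and sampling frequency of the recorded topic, and write the result to a .wav file.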

Subscribed Topics

License

Sponsor

IntRoLab

IntRoLab - Intelligent / Interactive / Integrated / Interdisciplinary Robot Lab

CONTRIBUTING

No CONTRIBUTING.md found.
