lightnet_trt package from lightnet-trt repo
Package Summary

| | |
|---|---|
| Tags | No category tags. |
| Version | 0.0.0 |
| License | Apache License 2.0 |
| Build type | AMENT_CMAKE |
| Use | RECOMMENDED |

Repository Summary

| | |
|---|---|
| Checkout URI | https://github.com/daniel89710/lightnet-trt.git |
| VCS Type | git |
| VCS Version | main |
| Last Updated | 2023-10-03 |
| Dev Status | UNMAINTAINED |
| CI status | No Continuous Integration |
| Released | UNRELEASED |
| Tags | No category tags. |
Maintainers
- Koji Minoda
Authors
- Koji Minoda
LightNet-TRT: High-Efficiency and Real-Time CNN Implementation on Edge AI
LightNet-TRT is a CNN implementation optimized for edge AI devices that combines the advantages of LightNet [1] and TensorRT [2]. LightNet is a lightweight and high-performance neural network framework designed for edge devices, while TensorRT is a high-performance deep learning inference engine developed by NVIDIA for optimizing and running deep learning models on GPUs. LightNet-TRT uses the Network Definition API provided by TensorRT to integrate LightNet into TensorRT, allowing it to run efficiently and in real-time on edge devices.
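For orientation, the sketch below shows what adding one layer through the Network Definition API looks like. This is not the actual LightNet-TRT code; the helper name and layer parameters are illustrative only.

```cpp
#include <NvInfer.h>

// Minimal sketch: adding one 3x3 convolution to a TensorRT network via
// the Network Definition API, the same mechanism LightNet-TRT uses to
// translate LightNet layers. Helper name and parameters are illustrative.
nvinfer1::ILayer* addConv3x3(nvinfer1::INetworkDefinition& net,
                             nvinfer1::ITensor& input,
                             nvinfer1::Weights kernel,
                             nvinfer1::Weights bias,
                             int32_t nbOutputMaps)
{
  auto* conv = net.addConvolutionNd(
      input, nbOutputMaps, nvinfer1::Dims{2, {3, 3}}, kernel, bias);
  conv->setStrideNd(nvinfer1::Dims{2, {1, 1}});
  conv->setPaddingNd(nvinfer1::Dims{2, {1, 1}});  // "same" padding for 3x3
  return conv;
}
```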
Key Improvements
2:4 Structured Sparsity
LightNet-TRT utilizes 2:4 structured sparsity [3] to further optimize the network. 2:4 structured sparsity means that two values must be zero in each contiguous block of four values, resulting in a 50% reduction in the number of weights. This technique allows the network to use fewer weights and computations while maintaining accuracy.
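As a concrete illustration (not code from the LightNet-TRT sources), magnitude-based 2:4 pruning can be sketched as follows:

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

// Magnitude-based 2:4 pruning: in every contiguous block of four
// weights, zero the two with the smallest absolute values, leaving
// exactly 50% of the values non-zero.
void prune2of4(std::vector<float>& w)
{
  for (std::size_t i = 0; i + 4 <= w.size(); i += 4) {
    std::array<std::size_t, 4> idx = {i, i + 1, i + 2, i + 3};
    std::sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) {
      return std::fabs(w[a]) < std::fabs(w[b]);
    });
    w[idx[0]] = 0.0f;
    w[idx[1]] = 0.0f;
  }
}

int main()
{
  std::vector<float> w = {0.9f, -0.1f, 0.05f, -0.7f, 0.3f, 0.2f, -0.8f, 0.01f};
  prune2of4(w);
  for (float v : w) std::printf("% .2f ", v);
  std::printf("\n");  // prints: 0.90 0.00 0.00 -0.70 0.30 0.00 -0.80 0.00
}
```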
NVDLA Execution
LightNet-TRT also supports the execution of the neural network on the NVIDIA Deep Learning Accelerator (NVDLA) [4], a free and open architecture that provides high performance and low power consumption for deep learning inference on edge devices. By using NVDLA, LightNet-TRT can further improve the efficiency and performance of the network on edge devices.
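In TensorRT, DLA execution is typically requested through the builder configuration. The sketch below shows the usual pattern; it is an assumption about typical IBuilderConfig usage, not the exact LightNet-TRT code.

```cpp
#include <NvInfer.h>

// Sketch: steer a TensorRT engine build onto a DLA core (assumed
// typical IBuilderConfig usage, not the exact LightNet-TRT code).
void enableDla(nvinfer1::IBuilderConfig& config, int32_t dlaCore)
{
  // DLA runs FP16/INT8 only; let unsupported layers fall back to the GPU.
  config.setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);
  config.setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
  config.setDLACore(dlaCore);  // 0 or 1, matching the --dla option below
}
```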
Multi-Precision Quantization
In addition to post-training quantization [5], LightNet-TRT also supports multi-precision quantization, which allows the network to use different precisions for weights and activations. By using mixed precision, LightNet-TRT can further reduce the memory usage and computational requirements of the network while maintaining accuracy. The precision of each layer of the CNN can be set in the CFG file, as illustrated below.
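As an illustration, a per-layer precision annotation in a darknet-style CFG might look like the following snippet; the `precision` key and its values are assumptions made for illustration, not taken from the LightNet sources.

```
[convolutional]
filters=64
size=3
activation=leaky
precision=INT8   # hypothetical key: quantize this layer aggressively

[convolutional]
filters=128
size=3
activation=leaky
precision=FP16   # hypothetical key: keep an accuracy-sensitive layer at FP16
```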
Multitask Execution (Detection/Segmentation)
LightNet-TRT also supports multitask execution, allowing the network to perform both object detection and segmentation tasks simultaneously. This enables the network to perform multiple tasks efficiently on edge devices, saving computational resources and power.
Installation
Requirements
- CUDA 11.0 or higher
- TensorRT 8.0 or higher
- OpenCV 3.0 or higher
Steps
- Clone the repository.
$ git clone https://github.com/daniel89710/lightNet-TRT.git
$ cd lightNet-TRT
- Install libraries.
$ sudo apt update
$ sudo apt install libgflags-dev
$ sudo apt install libboost-all-dev
- Compile the TensorRT implementation.
$ mkdir build
$ cd build
$ cmake ../
$ make -j
Model
| Model | Resolutions | GFLOPS | Params | Precision | Sparsity | DNN time on RTX 3080 | DNN time on Jetson Orin NX 16GB GPU | DNN time on Jetson Orin NX 16GB DLA | DNN time on Jetson Orin Nano 4GB GPU | cfg | weights |
|---|---|---|---|---|---|---|---|---|---|---|---|
| lightNet | 1280x960 | 58.01 | 9.0M | int8 | 49.8% | 1.30ms | 7.6ms | 14.2ms | 14.9ms | github | GoogleDrive |
| LightNet+Semseg | 1280x960 | 76.61 | 9.7M | int8 | 49.8% | 2.06ms | 15.3ms | 23.2ms | 26.2ms | github | GoogleDrive |
| lightNet pruning | 1280x960 | 35.03 | 4.3M | int8 | 49.8% | 1.21ms | 8.89ms | 11.68ms | 13.75ms | github | GoogleDrive |
| LightNet pruning+SemsegLight | 1280x960 | 44.35 | 4.9M | int8 | 49.8% | 1.80ms | 9.89ms | 15.26ms | 23.35ms | github | GoogleDrive |
- “DNN time” refers to the time measured by IProfiler during the enqueueV2 operation, excluding pre-process and post-process times.
- Orin NX has three independent AI processors, allowing lightNet to be parallelized across a GPU and two DLAs.
- Orin Nano 4GB has only an iGPU with 512 CUDA cores.
Usage
Converting a LightNet model to a TensorRT engine
Build FP32 engine
$ ./lightNet-TRT --flagfile ../configs/lightNet-BDD100K-det-semaseg-1280x960.txt --precision kFLOAT
Build FP16(HALF) engine
$ ./lightNet-TRT --flagfile ../configs/lightNet-BDD100K-det-semaseg-1280x960.txt --precision kHALF
Build INT8 engine (you need to prepare a list of calibration images in “configs/calibration_images.txt”; see the example after the command)
$ ./lightNet-TRT --flagfile ../configs/lightNet-BDD100K-det-semaseg-1280x960.txt --precision kINT8
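The calibration list is presumably a plain-text file with one image path per line; both the format and the paths below are illustrative assumptions.

```
/data/bdd100k/images/val/0001.jpg
/data/bdd100k/images/val/0002.jpg
/data/bdd100k/images/val/0003.jpg
```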
Build DLA engine (supported only on Xavier and Orin)
$ ./lightNet-TRT --flagfile ../configs/lightNet-BDD100K-det-semaseg-1280x960.txt --precision kINT8 --dla [0/1]
Inference with the TensorRT engine
Inference from images
$ ./lightNet-TRT --flagfile ../configs/lightNet-BDD100K-det-semaseg-1280x960.txt --precision [kFLOAT/kHALF/kINT8] {--dla [0/1]} --d DIRECTORY
Inference from a video
$ ./lightNet-TRT --flagfile ../configs/lightNet-BDD100K-det-semaseg-1280x960.txt --precision [kFLOAT/kHALF/kINT8] {--dla [0/1]} --d VIDEO
Inference with ROS 2 node
You can build it as follows. Note that the current implementation requires Autoware to be installed and sourced.
mkdir lightnet_trt_ws/src -p
cd lightnet_trt_ws/src
git clone https://github.com/kminoda/lightNet-TRT-ROS2.git
cd ..
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release -DBUILD_ROS2_LIGHTNET_TRT=ON
Then, you can run the node as follows.
ros2 launch lightnet_trt lightnet_trt.launch.xml
Implementation
LightNet-TRT is built on the LightNet framework and integrates with TensorRT using the Network Definition API. The implementation is based on the following repositories:
- LightNet: https://github.com/daniel89710/lightNet
- TensorRT: https://github.com/NVIDIA/TensorRT
- NVIDIA DeepStream SDK: https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/tree/restructure
- YOLO-TensorRT: https://github.com/enazoe/yolo-tensorrt
Conclusion
LightNet-TRT is a powerful and efficient CNN implementation for edge AI. With its advanced features and integration with TensorRT, it is an excellent choice for real-time object detection and semantic segmentation applications on edge devices.
References
[1] LightNet
[2] TensorRT
[3] Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT
[4] NVDLA
[5] Achieving FP32 Accuracy for INT8 Inference Using Quantization Aware Training with NVIDIA TensorRT
Package Dependencies

| Name |
|---|
| ament_cmake_auto |
| autoware_cmake |
| ament_cmake_ros |
| ament_lint_auto |
| autoware_lint_common |
| sensor_msgs |
| rclcpp |
| cv_bridge |
| image_transport |
| rclcpp_components |
| tier4_autoware_utils |
| tf2_eigen |
Launch files
- launch/lightnet_trt.launch.xml (arguments and defaults below; an example invocation follows the list)
- input/image [default: /sensing/camera/camera0/image_rect_color]
- input/camera_info [default: /sensing/camera/camera0/camera_info]
- output/objects [default: /perception/object_recognition/detection/rois0]
- model_cfg [default: $(find-pkg-share lightnet_trt)/configs/yolov8x-BDD100K-640x640-T4-lane4cls.cfg]
- model_weights [default: $(find-pkg-share lightnet_trt)/configs/yolov8x-BDD100K-640x640-T4-lane4cls-scratch-pseudo-finetune_last.weights]
- width [default: 640]
- height [default: 640]
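For example, the defaults can be overridden on the command line when launching (the topic names here are illustrative):

```
$ ros2 launch lightnet_trt lightnet_trt.launch.xml \
    input/image:=/my_camera/image_rect_color \
    input/camera_info:=/my_camera/camera_info \
    width:=640 height:=640
```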