A Guide to Mixing ROS2 and Conda (CUDA/Torch) on WSL2

Environment and Goals

Environment

  • WSL2 kernel: Linux-6.6.87.2-microsoft-standard-WSL2
  • ROS2: Humble (Ubuntu 22.04)
  • Conda env: audio2exp (torch nightly, CUDA available)
  • GPU: NVIDIA GeForce RTX 5080 Laptop GPU, compute capability 12.0
  • onnxruntime providers: TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider

Target Topology

  • Node A: publishes audio on /audio/pcm
  • Node B: subscribes to the audio, runs inference, and publishes /face/arkit52

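Core Concepts

ROS_DOMAIN_ID
The DDS "logical isolation number". Within the same network:

  • Different domains: nodes are completely invisible to each other
  • Same domain: discovery and communication become possible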
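
As a small illustration of the rule above, here is a hedged sketch of how the effective domain ID could be compared between two shells. The helper names effective_domain_id and same_domain are hypothetical (they are not part of ROS2); the only ROS2 behavior assumed is that an unset or empty ROS_DOMAIN_ID falls back to domain 0.

```python
import os

DEFAULT_DOMAIN_ID = 0  # assumption: ROS 2 falls back to domain 0 when the variable is unset


def effective_domain_id(env=None):
    """Return the domain ID a process would use for the given environment mapping."""
    env = os.environ if env is None else env
    raw = env.get("ROS_DOMAIN_ID", "").strip()
    if not raw:
        return DEFAULT_DOMAIN_ID
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 0:
        raise ValueError(f"ROS_DOMAIN_ID must be non-negative, got {value}")
    return value


def same_domain(env_a, env_b):
    """True when two environments would land in the same DDS domain."""
    return effective_domain_id(env_a) == effective_domain_id(env_b)
```

Calling effective_domain_id() with no argument in each terminal is a quick way to confirm that the publisher and subscriber shells agree before looking for deeper discovery problems.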

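RMW_IMPLEMENTATION
Selects the ROS2 middleware abstraction layer (RMW) implementation, which determines the underlying DDS implementation:

  • Humble default: rmw_fastrtps_cpp (FastDDS)
  • Common alternative: rmw_cyclonedds_cpp (CycloneDDS, which was more stable under WSL2 in this setup)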

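Why can't ros2 run find Torch inside Conda?
ros2 run invokes the entry-point script in the workspace install directory. The shebang on that script's first line is fixed to the Python used at build time (usually /usr/bin/python3). So even with a Conda environment activated, ros2 run still runs the node with the system Python, which cannot import the Torch installed in Conda.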
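
To see this on a real workspace, the shebang of an installed entry script can be read directly. The following is a minimal sketch; read_shebang and uses_system_python are hypothetical helper names, and the example path in the usage note assumes the usual ament_python install layout.

```python
def read_shebang(script_path):
    """Return the interpreter line (without '#!') of a script, or None if absent."""
    with open(script_path, "rb") as fh:
        first = fh.readline().decode("utf-8", errors="replace").strip()
    if not first.startswith("#!"):
        return None
    return first[2:].strip()


def uses_system_python(script_path):
    """True when the script is pinned to /usr/bin/python3 by its shebang."""
    interpreter = read_shebang(script_path)
    if not interpreter:
        return False
    return interpreter.split()[0] == "/usr/bin/python3"
```

For example, read_shebang on something like ~/ws_audio2exp/install/audio_demo_nodes/lib/audio_demo_nodes/audio_sub (an assumed path, adjust to your package and executable names) would be expected to show the system interpreter rather than the Conda one.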

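Pitfalls and Solutions

Problem 1: ROS2 on WSL2 "publishes but nothing is received / nodes are not discovered"
How to diagnose: first test whether DDS multicast works. Run the receiver in one terminal and the sender in another: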

ros2 multicast receive
ros2 multicast send

Steps:

  1. If receive prints Hello World!, the network layer is OK;
  2. Make sure both sides use the same ROS_DOMAIN_ID;
  3. Try switching the RMW to rmw_cyclonedds_cpp (WSL2 + FastDDS occasionally has discovery compatibility issues).

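Problem 2: The terminal crashes after setting RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
Typical causes:

  1. The CycloneDDS RMW package is not installed (on Humble: ros-humble-rmw-cyclonedds-cpp);
  2. Conda has taken over PYTHONPATH/LD_LIBRARY_PATH, so the ROS2 Python package paths are broken;
  3. The script uses set -e: any command returning non-zero makes the whole script exit immediately. When the script is sourced, that exit closes the terminal itself.

Fixes:

  • Script order: source ROS2 first, then Conda, and finally re-add the ROS2 PYTHONPATH;
  • Do not force the RMW at first; enable it only after confirming it is installed and working;
  • While debugging, temporarily remove set -e, or append || true to commands that may fail.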
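
One way to confirm that the sourcing order worked is to check whether any ROS2 directories survived in PYTHONPATH after Conda was activated. This is a rough sketch under the assumption that ROS2 lives under /opt/ros/<distro>; ros_entries and ros_python_visible are hypothetical names.

```python
import os


def ros_entries(pythonpath, distro="humble"):
    """Return the PYTHONPATH entries located under /opt/ros/<distro>."""
    prefix = f"/opt/ros/{distro}/"
    # ':' is the path separator on Linux/WSL2
    return [p for p in pythonpath.split(":") if p.startswith(prefix)]


def ros_python_visible(env=None, distro="humble"):
    """True when at least one ROS 2 Python directory is on PYTHONPATH."""
    env = os.environ if env is None else env
    return bool(ros_entries(env.get("PYTHONPATH", ""), distro))
```

If ros_python_visible() returns False in the shell where the node is launched, importing rclpy is likely to fail regardless of which RMW is selected.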

Problem 3: colcon build of a custom msg package fails: missing catkin_pkg / missing em
Recommended fix (most reliable): build with the system Python

conda deactivate
cd ~/ws_audio2exp
rm -rf build/ install/ log/

unset PYTHON_EXECUTABLE
unset Python3_EXECUTABLE
unset CMAKE_PREFIX_PATH
hash -r

source /opt/ros/humble/setup.bash
colcon build
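
If the build still fails, it can help to ask the interpreter that is actually first on PATH whether it can see the two modules named in the error. The sketch below is only a diagnostic aid; missing_modules is a hypothetical helper, and em is the module provided by the empy package.

```python
import importlib.util
import sys

# Modules the message-generation build step complained about in Problem 3.
BUILD_MODULES = ("catkin_pkg", "em")


def missing_modules(names=BUILD_MODULES):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("missing build modules:", missing_modules() or "none")
```

Run it with plain python3 after conda deactivate: if the interpreter printed is still inside the Conda tree, the build is not using the system Python yet.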

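Problem 4: The message package fails with: package.xml needs member_of_group
Fix: add the following to audio_msgs/package.xml. Note that member_of_group is a direct child of <package>, not of <export>:

<member_of_group>rosidl_interface_packages</member_of_group>
<export>
  <build_type>ament_cmake</build_type>
</export>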
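
A quick way to verify the manifest is to parse it and look for the group tag. This is a minimal sketch that assumes the tag belongs at the top level of <package>, as in package format 3; declares_interface_group is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

GROUP = "rosidl_interface_packages"


def declares_interface_group(package_xml_text):
    """True when <member_of_group> with the rosidl group is a direct child of <package>."""
    root = ET.fromstring(package_xml_text)
    # findall with a bare tag name only matches direct children of the root element
    return any((el.text or "").strip() == GROUP for el in root.findall("member_of_group"))
```

Feeding it the contents of audio_msgs/package.xml returns False when the tag is missing or nested inside another element.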

Problem 5: RCLError on Ctrl+C exit (double shutdown)
Fix: use try_shutdown(), which only shuts down the context if it is still active

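import rclpy

# AudioSub is the subscriber node class defined elsewhere in this module.
def main():
    rclpy.init()
    node = AudioSub()
    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass  # Ctrl+C: fall through to cleanup
    finally:
        node.destroy_node()
        # Safe even if the context was already shut down on SIGINT.
        rclpy.try_shutdown()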

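Problem 6: Running the node inside the Conda environment still gives "No module named" for Torch/ONNX Runtime
Root cause: the shebang pins the interpreter (the system Python used at build time), so ros2 run always uses the system Python.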

Final Stable Setup

Setup S: build with the system Python; run the inference node with the Conda Python via python -m
Build (one time)

cd ~/ws_audio2exp
conda deactivate || true
source /opt/ros/humble/setup.bash
colcon build
source install/setup.bash

Run (in every new terminal)
Terminal A: subscriber / inference node

source /opt/ros/humble/setup.bash
cd ~/ws_audio2exp
source install/setup.bash
conda activate audio2exp
python -m audio_demo_nodes.audio_sub

Terminal B: publisher

source /opt/ros/humble/setup.bash
cd ~/ws_audio2exp
source install/setup.bash
conda activate audio2exp
python -m audio_demo_nodes.audio_pub_1hz
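
Because python -m resolves the module through the active interpreter's import path, it can be worth checking where each module would come from before launching. The following is a hedged sketch; module_origin is a hypothetical helper, and the module names in the demo loop are taken from the commands above.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file that `python -m name` would run, or None if it cannot be resolved."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package of a dotted name is not importable.
        return None
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for mod in ("rclpy", "torch", "audio_demo_nodes.audio_sub"):
        print(f"{mod}: {module_origin(mod) or 'NOT FOUND'}")
```

In a correctly prepared terminal one would expect the interpreter to be the Conda one, rclpy to resolve under /opt/ros, torch under the Conda environment, and the node module under the workspace install directory. Note that python -m only starts the node if the module calls main() under an if __name__ == "__main__": guard.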

Key Benefits

  • The ROS2 build toolchain is not polluted by Conda;
  • Torch/CUDA/ONNX Runtime are fully usable;
  • Avoids the pitfall of ros2 run pinning the system Python.
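
This setup relies on the Conda Python being able to load rclpy's compiled extensions, which on Humble (Ubuntu 22.04) are built for Python 3.10, the same value the script below stores in ROS_PY_VER. A minimal sketch of that compatibility check follows; matches_ros_python is a hypothetical helper name.

```python
import sys

ROS_PY_VER = "3.10"  # Python version that ROS 2 Humble targets on Ubuntu 22.04


def matches_ros_python(version_info=None, ros_py_ver=ROS_PY_VER):
    """True when the interpreter's major.minor equals the ROS 2 Python version."""
    version_info = sys.version_info if version_info is None else version_info
    major, minor = (int(part) for part in ros_py_ver.split("."))
    return (version_info[0], version_info[1]) == (major, minor)
```

If matches_ros_python() is False inside the activated environment, import rclpy will most likely fail even when PYTHONPATH is set correctly, and the environment would need to be recreated with a matching Python.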

Script

#!/usr/bin/env bash
# Usage: source ~/ros2_conda_audio2exp_env.sh

# =========================
# 1) Basic environment
# =========================
ROS_DISTRO="humble"         # humble / jazzy
CONDA_ENV_NAME="audio2exp"  # Conda environment name
WS_DIR="$HOME/ws_audio2exp" # ROS2 workspace
ROS_PY_VER="3.10"

# =========================
# 2) DDS / ROS parameters
# =========================
# 0 = do not pin the RMW (system default)  1 = pin CycloneDDS
USE_CYCLONEDDS=1

# Domain ID (must match on the same machine / network)
ROS_DOMAIN_ID_VALUE=0

# =========================
# 3) Self-check switches
# =========================
# --- Switch: print self-check output after startup ---
ENABLE_SELF_CHECK=1
# --- Switch: check torch/cuda during the self-check ---
ENABLE_TORCH_CHECK=1
# --- Switch: check onnxruntime during the self-check ---
ENABLE_ONNXRUNTIME_CHECK=1

# =========================
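# Internal functions
# =========================
_is_sourced() {
    [[ "${BASH_SOURCE[0]}" != "${0}" ]]
}

# Exits when the script is executed directly. When it is sourced, `return`
# only leaves this function, so callers must follow up with their own `return`.
_fail() {
    local code="${1:-1}"
    if _is_sourced; then
        return "$code"
    else
        exit "$code"
    fi
}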

_log() {
    echo "[ROS2+Conda] $*"
}

_warn() {
    echo "[ROS2+Conda][WARN] $*" >&2
}

# =========================
# 4) source ROS2
# =========================
ROS_SETUP="/opt/ros/${ROS_DISTRO}/setup.bash"
if [[ ! -f "$ROS_SETUP" ]]; then
    _warn "ROS2 setup script not found: $ROS_SETUP"
    _fail 1; return 1
fi
# shellcheck disable=SC1090
source "$ROS_SETUP" || { _fail 1; return 1; }

# =========================
# 5) source conda + activate env
# =========================
CONDA_SH="$HOME/miniconda3/etc/profile.d/conda.sh"
if [[ ! -f "$CONDA_SH" ]]; then
    _warn "conda.sh not found: $CONDA_SH"
    _fail 1; return 1
fi
# shellcheck disable=SC1090
source "$CONDA_SH" || { _fail 1; return 1; }
conda activate "$CONDA_ENV_NAME" || { _fail 1; return 1; }

# =========================
# 6) source workspace (if already built)
# =========================
WS_SETUP="${WS_DIR}/install/setup.bash"
if [[ -f "$WS_SETUP" ]]; then
    # shellcheck disable=SC1090
    source "$WS_SETUP" || { _fail 1; return 1; }
else
    _warn "Workspace setup not found: $WS_SETUP (safe to ignore if you have not run colcon build yet)"
fi

# =========================
# 7) PYTHONPATH fallback (keep conda from shadowing rclpy)
# =========================
ROS_PY_PATH="/opt/ros/${ROS_DISTRO}/lib/python${ROS_PY_VER}/site-packages"
if [[ -d "$ROS_PY_PATH" ]]; then
    # Append the old value only if it is non-empty, to avoid a trailing ':'
    export PYTHONPATH="${ROS_PY_PATH}${PYTHONPATH:+:${PYTHONPATH}}"
else
    _warn "ROS Python path does not exist: $ROS_PY_PATH (check ROS_PY_VER)"
fi

# =========================
# 8) Set ROS_DOMAIN_ID / RMW
# =========================
export ROS_DOMAIN_ID="${ROS_DOMAIN_ID_VALUE}"

if [[ "$USE_CYCLONEDDS" == "1" ]]; then
    export RMW_IMPLEMENTATION="rmw_cyclonedds_cpp"
else
    # Do not pin the RMW; unset it so the system default applies
    unset RMW_IMPLEMENTATION
fi

# =========================
# 9) Self-check (optional)
# =========================
if [[ "$ENABLE_SELF_CHECK" == "1" ]]; then
    export ENABLE_TORCH_CHECK
    export ENABLE_ONNXRUNTIME_CHECK

    _log "Environment ready"
    python - << 'PY'
import os, sys, platform

print("==== Runtime Self Check ====")
print("sys.executable:", sys.executable)
print("python:", sys.version.replace("\n", " "))
print("platform:", platform.platform())
print("cwd:", os.getcwd())
print("CONDA_PREFIX:", os.getenv("CONDA_PREFIX"))
print("ROS_DOMAIN_ID:", os.getenv("ROS_DOMAIN_ID"))
print("RMW_IMPLEMENTATION:", os.getenv("RMW_IMPLEMENTATION"))
print("PYTHONPATH:", os.getenv("PYTHONPATH", ""))

try:
    import rclpy
    print("rclpy: OK")
except Exception as e:
    print("rclpy: FAIL", repr(e))

if os.getenv("ENABLE_TORCH_CHECK", "1") == "1":
    try:
        import torch
        print("torch:", torch.__version__)
        print("torch.cuda.is_available:", torch.cuda.is_available())
        if torch.cuda.is_available():
            print("torch.version.cuda:", torch.version.cuda)
            print("torch.backends.cudnn.enabled:", torch.backends.cudnn.enabled)
            try:
                print("torch.backends.cudnn.version:", torch.backends.cudnn.version())
            except Exception:
                pass
            try:
                print("cuda device[0]:", torch.cuda.get_device_name(0))
                print("capability:", torch.cuda.get_device_capability(0))
                print("total_mem(MB):", round(torch.cuda.get_device_properties(0).total_memory / 1024 / 1024, 1))
                a = torch.randn((256, 256), device="cuda")
                b = torch.randn((256, 256), device="cuda")
                c = a @ b
                _ = c.mean().item()
                print("torch CUDA matmul: OK")
            except Exception as e:
                print("torch CUDA runtime test: FAIL", repr(e))
    except Exception as e:
        print("torch: FAIL", repr(e))
else:
    print("torch check: SKIPPED")

if os.getenv("ENABLE_ONNXRUNTIME_CHECK", "1") == "1":
    try:
        import onnxruntime as ort
        print("onnxruntime:", ort.__version__)
        print("onnxruntime providers:", ort.get_available_providers())
    except Exception as e:
        print("onnxruntime: FAIL", repr(e))
else:
    print("onnxruntime check: SKIPPED")

print("==== Self Check Done ====")
PY
fi

# 1) Make sure the conda command is available first
_conda_sh="$HOME/miniconda3/etc/profile.d/conda.sh"
if [[ -f "$_conda_sh" ]]; then
    # shellcheck disable=SC1090
    source "$_conda_sh" || return 1
else
    echo "[a2eenv][ERR] missing: $_conda_sh" >&2
    return 1
fi

# 2) Activate the conda environment (a positional argument takes precedence)
_env="${1:-$CONDA_ENV_NAME}"
conda activate "$_env" || return 1

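Alternative

Route 1: let the system Python "see" Conda's site-packages (not recommended, but workable)
Add Conda's site-packages to PYTHONPATH in the environment script. This can cause conflicts between the system Python and the Conda environment.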
