UMI（Universal Manipulation Interface）复现过程问题及解决方法记录

我们在复现UMI的过程中遇到了一系列问题，解决后在此记录问题及解决方案，以方便其他想要复现的朋友。

zeros1703

1945人浏览 · 2025-01-17 18:23:02

zeros1703 · 2025-01-17 18:23:02 发布

UMI（Universal Manipulation Interface），是斯坦福、哥伦比亚大学、丰田研究院共同提出的用于机器人操作的通用数据采集接口。项目开源了数据采集装置的结构件3D图纸、采购清单、组装指南、数据采集指南、数据集制作、训练和部署的代码（其中模仿学习的策略使用的是Diffusion Policy）。我们在复现的过程中遇到了一系列问题，解决后在此记录问题及解决方案，以方便其他想要复现的朋友。

0.采购

1.3M的TB641在国内很难买到，可以使用其他类型的防滑胶带。随便买的磨砂防滑胶带用起来就还可以。

2.GoPro：注意可能需要采购GoPro的媒体组件，尽管在网上看到可以直接连接的教程，但是我们只用GoPro 9通过USB-C连接到电脑没试成功。（更新：又试了另一台GoPro，用gopro_as_webcam_on_linux仓库成功了，表现是GoPro显示为webcam模式。）

1.数据采集

1.1.Tag打印质量的影响

使用普通打印纸的Tag和一般的打印机，喷墨有很大问题，导致打印出的Tag黑色区域有非常多白色噪点，效果不佳。后续改为在打印店打印带背胶直接粘贴的Tag，效果好了很多。

1.2.盖板的影响

我们使用的原版UMI盖板和FastUMI的盖板都进行了尝试，结果发现用FastUMI的盖板采集的数据失败了，猜想是因为后者的Tag距离相机太近，畸变太大导致没有很好识别到。

后续FastUMI也给出了新版的图片，在里面看到Tag往指尖的方向移动了。

1.3.SLAM失败的可能原因

1.背景缺乏纹理。

2.运动过快。

2.训练过程

2.1.HuggingFace版本问题

ImportError: cannot import name 'cached_download' from 'huggingface_hub'

原因：huggingface 0.26.0移除了cached_download，需要降级

解决：

pip install huggingface_hub==0.25.2

2.2.HuggingFace连接问题

SSLError(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443)

原因：连不上huggingface

解决：

export HF_ENDPOINT=https://hf-mirror.com

2.3.wandb注册登录

注意需要注册登录wandb账号，获得key后填到对应位置。

3.部署

3.1.安装

安装时注意机械臂末端TCP坐标朝向与GoPro相机坐标系朝向保持一致。

1.相机：手持相机拍摄，右侧为x正方向，下方为y正方向，前方为z正方向；

2.TCP：面朝机械臂工具端法兰，圆柱形连接器的方位为y负方向，机械臂末端法兰向外为z正方向。

因此，相机的上方应与机械臂工具端法兰连接器的方向保持一致。

3.2.无法读取相机数据

Unable to read camera feed from GoPro · Issue #58 · real-stanford/universal_manipulation_interface · GitHub中提到了这个问题。解决方案如下：

将umi/real_world/uvc_camera.py做如下修改：

examples['camera_capture_timestamp'] = 0.0
examples['camera_receive_timestamp'] = 0.0
examples['timestamp'] = 0.0
examples['step_idx'] = 0
 
vis_ring_buffer = SharedMemoryRingBuffer.create_from_examples(
    shm_manager=shm_manager,
    examples=examples if vis_transform is None
        else vis_transform(dict(examples)),
    get_max_k=1,
    get_time_budget=0.2,
    put_desired_frequency=capture_fps
)

改为：

examples['camera_capture_timestamp'] = 0.0
examples['camera_receive_timestamp'] = 0.0
examples['timestamp'] = 0.0
examples['step_idx'] = 0
 
vis_examples = copy.deepcopy(examples)
 
if vis_transform is not None:
    vis_shape = (720, 960, 3)
else:
    vis_shape = shape+(3,)
vis_examples['color'] = np.empty(shape=vis_shape, dtype=np.uint8)
vis_ring_buffer = SharedMemoryRingBuffer.create_from_examples(
    shm_manager=shm_manager,
    examples=vis_examples,
    get_max_k=1,
    get_time_budget=0.2,
    put_desired_frequency=capture_fps
)

3.3.启动等待

启动时候start_wait()会等待GoPro、机械臂、夹爪和屏幕可视化初始化完成，任何一个start_wait()没有通过都无法开始运行。

3.4.机械臂连接问题

ur_rtde-issue-235

主机与机械臂的网络连接不佳会导致Exception: End of file问题，在出现问题后，实际运行时会获取不到正确的数据，导致在对齐环境观测时一直使用旧的数据帧，进而导致机器人原地抽搐。建议使用可靠的有线网络而非WiFi与机器人进行通讯。

3.5.夹爪

我们的Robotiq夹爪在部署过程中意外弄坏送修了，WSG50太贵了，选用了大寰PGI-140-80夹爪。

此处注意大寰PGI-140-80夹爪的最大行程为80mm，因此寄存器读写位置（占最大位置的千分比数值，int）都应进行换算。

选用夹爪后，由于我们使用的是485接口，所以修改了wsg_binary_driver.py，用python的serial包进行数据收发。

3.6.数据传输延迟（Latency）标定

确保上面的其他硬件没有问题后，使用官方仓库scripts下的calibrate_xx_latency.py代码进行标定。标定后填写到example/eval_robots_config.yaml中。

我们得到的数据是UR5 Latency为0.023，PGI Latency为0.05

关于Latency对齐的机制，后续会专门写一篇来讲。

3.7.屏蔽SpaceMouse及跳出human control loop

由于我们没有space mouse，也懒得买，所以注释了eval_real.py的以下内容：

        # with Spacemouse(shm_manager=shm_manager) as sm

和以下内容：

                    # # get teleop command
                    # sm_state = sm.get_motion_state_transformed()
                    # # print(sm_state)
                    # dpos = sm_state[:3] * (0.5 / frequency)
                    # drot_xyz = sm_state[3:] * (1.5 / frequency)

                    # drot = st.Rotation.from_euler('xyz', drot_xyz)
                    # for robot_idx in control_robot_idx_list:
                    #     target_pose[robot_idx, :3] += dpos
                    #     target_pose[robot_idx, 3:] = (drot * st.Rotation.from_rotvec(
                    #         target_pose[robot_idx, 3:])).as_rotvec()

                    # dpos = 0
                    # if sm.is_button_pressed(0):
                    #     # close gripper
                    #     dpos = -gripper_speed / frequency
                    # if sm.is_button_pressed(1):
                    #     dpos = gripper_speed / frequency
                    # for robot_idx in control_robot_idx_list:
                    #     gripper_target_pos[robot_idx] = np.clip(gripper_target_pos[robot_idx] + dpos, 0, max_gripper_width)

                    # # solve collision with table
                    # for robot_idx in control_robot_idx_list:
                    #     solve_table_collision(
                    #         ee_pose=target_pose[robot_idx],
                    #         gripper_width=gripper_target_pos[robot_idx],
                    #         height_threshold=robots_config[robot_idx]['height_threshold'])
                    
                    # # solve collison between two robots
                    # solve_sphere_collision(
                    #     ee_poses=target_pose,
                    #     robots_config=robots_config
                    # )

                    # action = np.zeros((7 * target_pose.shape[0],))

                    # for robot_idx in range(target_pose.shape[0]):
                    #     action[7 * robot_idx + 0: 7 * robot_idx + 6] = target_pose[robot_idx]
                    #     action[7 * robot_idx + 6] = gripper_target_pos[robot_idx]


                    # # execute teleop command
                    # env.exec_actions(
                    #     actions=[action], 
                    #     timestamps=[t_command_target-time.monotonic()+time.time()],
                    #     compensate_latency=False)
                    # precise_wait(t_cycle_end)
                    # iter_idx += 1

并且把for key_stroke in press_events:前的start_policy = False改为True，跳出human control loop，进入Policy control loop。