OpenShift AI - Using Hardware profiles to assign available hardware specifications to runtime environments

Note: this article has been verified on OpenShift 4.19 + OpenShift AI 2.22.

dawnsky.liu

dawnsky.liu · Published 2025-08-01 17:08:44

"OpenShift / RHEL / DevSecOps Index"

Contents

Enabling the Hardware profiles feature
Creating a Hardware profile
Using a Hardware profile
Migrating existing Accelerator profiles to Hardware profiles
References

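Enabling the Hardware profiles feature

As of OpenShift AI 2.22, Hardware profiles is still a Technology Preview feature. However, because the existing Accelerator profiles feature is about to be phased out, this article covers the more capable Hardware profiles.

Edit the OdhDashboardConfig object with the command below and add the line disableHardwareProfiles: false to enable the Hardware profiles feature.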

$ oc edit OdhDashboardConfig odh-dashboard-config
...
spec:
  dashboardConfig:
    disableAcceleratorProfiles: false
    disableBYONImageStream: false
    disableClusterManager: false
    disableCustomServingRuntimes: false
    disableDistributedWorkloads: false
    disableHardwareProfiles: false ### add this line and save ###
...

Once the change is saved, the Settings -> Hardware profiles menu becomes available in the OpenShift AI console. Note: after the Hardware profiles menu appears, the original Accelerator profiles menu disappears.
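For scripted setups, the same flag can probably be set without opening an editor. The sketch below wraps oc patch with a JSON merge patch. The function name enable_hardware_profiles is made up for illustration, and the default namespace redhat-ods-applications is an assumption about a standard OpenShift AI installation; adjust it if your odh-dashboard-config object lives elsewhere.

```shell
# JSON merge patch that sets the same field as the manual edit shown earlier.
PATCH='{"spec":{"dashboardConfig":{"disableHardwareProfiles":false}}}'

# Hypothetical helper: apply the patch to the dashboard config.
# $1 = namespace holding odh-dashboard-config (the default below is an assumption).
enable_hardware_profiles() {
  ns="${1:-redhat-ods-applications}"
  oc patch odhdashboardconfig odh-dashboard-config -n "$ns" --type merge -p "$PATCH"
}
```

Calling enable_hardware_profiles with no argument targets the assumed default namespace; passing a namespace as the first argument overrides it.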

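Creating a Hardware profile

On the Hardware profiles page, click the Create new hardware profile button.
On the Create hardware profile page, set Name to Small with L4, meaning a small runtime environment with an NVIDIA L4.
Visibility can be set so that the profile is visible only to Workbenches.
In the Resource requests and limits section, use the Add resource button to add an Accelerator resource type, then set the default, minimum, and maximum amounts for each resource.
In the Node selectors section, click the Add node selector button and add a selector based on the nvidia.com/gpu.product: NVIDIA-L4 label carried by the GPU nodes.
Finally, click Create hardware profile.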
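The node selector in the steps above only matches if the GPU nodes actually carry that label, so it can be worth checking from the command line first. The following is a minimal sketch: the helper names list_gpu_nodes and show_gpu_products are hypothetical, and it assumes the nvidia.com/gpu.product label has already been applied to the nodes (typically by the NVIDIA GPU Operator's feature discovery).

```shell
# Label key and value used for the node selector in this article.
GPU_LABEL_KEY='nvidia.com/gpu.product'
GPU_LABEL_VALUE='NVIDIA-L4'

# Hypothetical helper: list nodes whose GPU product label matches.
# $1 = optional label value, defaults to the L4 value above.
list_gpu_nodes() {
  oc get nodes -l "${GPU_LABEL_KEY}=${1:-$GPU_LABEL_VALUE}"
}

# Hypothetical helper: list all nodes with the GPU product label as an extra column.
show_gpu_products() {
  oc get nodes -L "$GPU_LABEL_KEY"
}
```

If list_gpu_nodes returns no nodes, a workbench using this profile would likely stay unschedulable, so the selector value should be adjusted to whatever show_gpu_products reports.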

Using a Hardware profile

On the Workbench configuration page, select the Small with L4 Hardware profile; the CPU, memory, and GPU amounts can then be adjusted further.
Confirm that the Workbench starts and runs.
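Beyond the status shown in the console, the workbench pod itself can be inspected to see whether the profile took effect. This is a sketch only: inspect_workbench_pod is a hypothetical helper, and the pod name and project are placeholders that the caller has to supply.

```shell
# Hypothetical helper: print the node selector, the assigned node,
# and the container resource limits of a workbench pod, one per line.
# $1 = pod name, $2 = project (namespace); both are caller-supplied placeholders.
inspect_workbench_pod() {
  oc get pod "$1" -n "$2" \
    -o jsonpath='{.spec.nodeSelector}{"\n"}{.spec.nodeName}{"\n"}{.spec.containers[*].resources.limits}{"\n"}'
}
```

If the Small with L4 profile was applied, one would expect the node selector to include the nvidia.com/gpu.product label and the limits to include a GPU resource alongside CPU and memory.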

Migrating existing Accelerator profiles to Hardware profiles

OpenShift AI previously used the OdhDashboardConfig object to store the CPU and memory sizes available to Workbench notebooks and ModelServing.

$ oc get OdhDashboardConfigs odh-dashboard-config -o yaml
...
spec:
  modelServerSizes:
    - name: Small
      resources:
        limits:
          cpu: '2'
          memory: 8Gi
        requests:
          cpu: '1'
          memory: 4Gi
    - name: Medium
      resources:
        limits:
          cpu: '8'
          memory: 10Gi
        requests:
          cpu: '4'
          memory: 8Gi
    - name: Large
      resources:
        limits:
          cpu: '10'
          memory: 20Gi
        requests:
          cpu: '6'
          memory: 16Gi
  notebookSizes:
    - name: Small
      resources:
        limits:
          cpu: '2'
          memory: 8Gi
        requests:
          cpu: '1'
          memory: 8Gi
    - name: Medium
      resources:
        limits:
          cpu: '6'
          memory: 24Gi
        requests:
          cpu: '3'
          memory: 24Gi
    - name: Large
      resources:
        limits:
          cpu: '14'
          memory: 56Gi
        requests:
          cpu: '7'
          memory: 56Gi
    - name: X Large
      resources:
        limits:
          cpu: '30'
          memory: 120Gi
        requests:
          cpu: '15'
          memory: 120Gi
...

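The mapping between GPUs and nodes, by contrast, was defined in Accelerator profiles.

After the new Hardware profiles feature is enabled for OpenShift AI, the Hide legacy profiles area of that feature lists the configurations currently defined in OdhDashboardConfig and Accelerator profiles. These configurations can be migrated to the new Hardware profiles through the Migrate menu.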
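Before migrating, it may be prudent to keep a copy of the legacy definitions. The sketch below is one way to do that under stated assumptions: the helper names are hypothetical, the default namespace redhat-ods-applications is assumed, and the resource names acceleratorprofiles and hardwareprofiles assume the dashboard CRDs are registered under those plural names on your cluster.

```shell
# Hypothetical helper: save the legacy size definitions and accelerator profiles to files.
# $1 = namespace (the default below is an assumption), $2 = output directory.
backup_legacy_profiles() {
  ns="${1:-redhat-ods-applications}"
  out="${2:-.}"
  oc get odhdashboardconfig odh-dashboard-config -n "$ns" -o yaml > "$out/odh-dashboard-config.yaml"
  oc get acceleratorprofiles -n "$ns" -o yaml > "$out/accelerator-profiles.yaml"
}

# Hypothetical helper: list the hardware profiles that exist after migration.
list_hardware_profiles() {
  oc get hardwareprofiles -n "${1:-redhat-ods-applications}"
}
```

Running backup_legacy_profiles before using the Migrate menu and list_hardware_profiles afterwards gives a simple before-and-after record of the migration.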

References

https://ai-on-openshift.io/odh-rhoai/configuration/
https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.22/html/working_with_accelerators/working-with-hardware-profiles_accelerators
https://medium.com/@roeywer/optimized-users-workload-resources-with-openshift-ai-hardware-profiles-22efc018ef9d
https://github.com/rh-aiservices-bu/accelerator-profiles-guide/tree/main
https://blog.csdn.net/weixin_43220532/article/details/111051773