Offline Model Deployment on Ascend 910B NPUs with openEuler (AArch64)
This article documents the complete workflow for installing and configuring Ascend 910B NPUs on openEuler 22.03 LTS. It covers: 1) installing and upgrading the driver, firmware, and MCU; 2) creating a logical volume and mounting storage; 3) setting up Docker and Docker Compose; and 4) deploying two LLM inference services, vLLM-Ascend and MindIE. Detailed command-line steps and configuration files are provided, spanning everything from hardware driver installation to AI model deployment.
I. Driver and Firmware Installation, and MCU Upgrade
1. Check the server's OS architecture and version
uname -m && cat /etc/*release
aarch64
openEuler release 22.03 LTS
NAME="openEuler"
VERSION="22.03 LTS"
ID="openEuler"
VERSION_ID="22.03"
PRETTY_NAME="openEuler 22.03 LTS"
ANSI_COLOR="0;31"
openEuler release 22.03 LTS
2. Check CPU information
lscpu
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Vendor ID: HiSilicon
BIOS Vendor ID: HiSilicon
Model name: Kunpeng-920
BIOS Model name: HUAWEI Kunpeng 920 5250
Model: 0
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 4
Stepping: 0x1
Frequency boost: disabled
CPU max MHz: 2600.0000
CPU min MHz: 200.0000
BogoMIPS: 200.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
Caches (sum of all):
L1d: 12 MiB (192 instances)
L1i: 12 MiB (192 instances)
L2: 96 MiB (192 instances)
L3: 192 MiB (8 instances)
NUMA:
NUMA node(s): 8
NUMA node0 CPU(s): 0-23
NUMA node1 CPU(s): 24-47
NUMA node2 CPU(s): 48-71
NUMA node3 CPU(s): 72-95
NUMA node4 CPU(s): 96-119
NUMA node5 CPU(s): 120-143
NUMA node6 CPU(s): 144-167
NUMA node7 CPU(s): 168-191
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Not affected
Srbds: Not affected
Tsx async abort: Not affected
3. Download the packages
Source: the Modelers (modelers.cn) community
Driver: Ascend-hdk-910b-npu-driver_25.0.rc1.1_linux-aarch64.run
Firmware: Ascend-hdk-910b-npu-firmware_7.7.0.1.231.run
MCU: Ascend-hdk-910b-mcu_25.50.10.zip
4. Create a running user
Terminology
- The installing user is the account used to install the driver and firmware.
- The running user is the account under which the driver and firmware run when inference or training workloads are started later.
Note:
- If the user and group you create are HwHiAiUser, there is no need to specify a running user when installing the packages; HwHiAiUser is the default.
- If the user and group are anything other than HwHiAiUser (including root), you must specify the running user at install time via the --install-username=username --install-usergroup=usergroup parameters. If you have no special naming requirements, HwHiAiUser is therefore recommended.
Create the user
groupadd usergroup
useradd -g usergroup -d /home/username -m username -s /bin/bash
5. Installation
Check the environment
lsmod | grep drv_pcie_host
Check for the chips
lspci | grep d802
5.1 Driver installation
Grant execute permission
chmod +x Ascend-hdk-910b-npu-driver_25.0.rc1.1_linux-aarch64.run
Verify package consistency and integrity
./Ascend-hdk-910b-npu-driver_25.0.rc1.1_linux-aarch64.run --check
Makeself logfile: /root/log/makeself/makeself.log
Verifying archive integrity... 100% SHA256 checksums are OK. All good.
Verifying archive integrity... 100% SHA256 checksums are OK. All good.
Uncompressing ASCEND DRIVER RUN PACKAGE 100%
[Driver] [2025-07-17 10:31:56] [INFO]Start time: 2025-07-17 10:31:56
[Driver] [2025-07-17 10:31:56] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log
[Driver] [2025-07-17 10:31:56] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log
[Driver] [2025-07-17 10:31:56] [INFO]End time: 2025-07-17 10:31:56
Run the installation
The default install path is /usr/local/Ascend
./Ascend-hdk-910b-npu-driver_25.0.rc1.1_linux-aarch64.run --full --install-username=root --install-usergroup=root --install-for-all
Verifying archive integrity... 100% SHA256 checksums are OK. All good.
Uncompressing ASCEND DRIVER RUN PACKAGE 100%
[Driver] [2025-07-17 10:35:21] [INFO]Start time: 2025-07-17 10:35:21
[Driver] [2025-07-17 10:35:21] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log
[Driver] [2025-07-17 10:35:21] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log
[Driver] [2025-07-17 10:35:21] [WARNING]Do not power off or restart the system during the installation/upgrade
[Driver] [2025-07-17 10:35:21] [INFO]set username and usergroup, root:root
[Driver] [2025-07-17 10:35:22] [INFO]driver install type: DKMS
[Driver] [2025-07-17 10:35:22] [INFO]upgradePercentage:10%
[Driver] [2025-07-17 10:35:30] [INFO]upgradePercentage:30%
[Driver] [2025-07-17 10:35:30] [INFO]upgradePercentage:40%
[Driver] [2025-07-17 10:35:44] [INFO]upgradePercentage:90%
[Driver] [2025-07-17 10:35:45] [INFO]Waiting for device startup...
[Driver] [2025-07-17 10:35:49] [INFO]Device startup success
[Driver] [2025-07-17 10:36:06] [INFO]upgradePercentage:100%
[Driver] [2025-07-17 10:36:24] [INFO]Driver package installed successfully! The new version takes effect immediately.
[Driver] [2025-07-17 10:36:24] [INFO]End time: 2025-07-17 10:36:24
Note:
When installing the driver as root, the --install-username=root --install-usergroup=root --install-for-all parameters are required.
Verify that the driver is loaded
npu-smi info
5.2 Firmware installation
Grant execute permission
chmod +x Ascend-hdk-910b-npu-firmware_7.7.0.1.231.run
Verify package consistency and integrity
./Ascend-hdk-910b-npu-firmware_7.7.0.1.231.run --check
Makeself logfile: /root/log/makeself/makeself.log
Verifying archive integrity... 100% SHA256 checksums are OK. All good.
Verifying archive integrity... 100% SHA256 checksums are OK. All good.
Uncompressing ASCEND-HDK-910B-NPU FIRMWARE RUN PACKAGE 100%
[Firmware] [2025-07-17 10:42:55] [INFO]Start time: 2025-07-17 10:42:55
[Firmware] [2025-07-17 10:42:55] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log
[Firmware] [2025-07-17 10:42:55] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log
[Firmware] [2025-07-17 10:42:56] [INFO]End time: 2025-07-17 10:42:56
Run the installation
./Ascend-hdk-910b-npu-firmware_7.7.0.1.231.run --full
Verifying archive integrity... 100% SHA256 checksums are OK. All good.
Uncompressing ASCEND-HDK-910B-NPU FIRMWARE RUN PACKAGE 100%
[Firmware] [2025-07-17 10:43:18] [INFO]Start time: 2025-07-17 10:43:18
[Firmware] [2025-07-17 10:43:18] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log
[Firmware] [2025-07-17 10:43:18] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log
[Firmware] [2025-07-17 10:43:18] [WARNING]Do not power off or restart the system during the installation/upgrade
[Firmware] [2025-07-17 10:43:20] [INFO]upgradePercentage: 0%
[Firmware] [2025-07-17 10:43:32] [INFO]upgradePercentage: 90%
[Firmware] [2025-07-17 10:43:42] [INFO]upgradePercentage: 90%
[Firmware] [2025-07-17 10:43:52] [INFO]upgradePercentage: 90%
[Firmware] [2025-07-17 10:43:53] [INFO]upgradePercentage: 100%
[Firmware] [2025-07-17 10:43:53] [INFO]The firmware of [8] chips are successfully upgraded.
[Firmware] [2025-07-17 10:43:54] [INFO]Firmware package installed successfully! Reboot now or after driver installation for the installation/upgrade to take effect.
[Firmware] [2025-07-17 10:43:54] [INFO]End time: 2025-07-17 10:43:54
5.3 MCU upgrade
Unpack
unzip Ascend-hdk-910b-mcu_25.50.10.zip
Archive: Ascend-hdk-910b-mcu_25.50.10.zip
inflating: Ascend-hdk-910b-mcu_25.50.10.hpm
inflating: Ascend-hdk-910b-mcu_25.50.10.hpm.cms
inflating: crldata.crl
inflating: version.xml
inflating: version.xml.cms
List device mapping information for all devices
npu-smi info -m
Show the topology of all devices
npu-smi info -l
Query the MCU version
npu-smi upgrade -b mcu -i 0
Upgrade the MCU of a specific NPU
npu-smi upgrade -t mcu -i 0 -f Ascend-hdk-910b-mcu_25.50.10.hpm
[WARNING]: Do not power off or restart the system during the upgrade.
Validity : success
file_len(554991)--offset(554991) [100].
transfile : successfully
Status : start to upgrade
Start upgrade [100].
Status : OK
Message : Start device upgrade successfully
Message : need active mcu
Activate the new version
npu-smi upgrade -a mcu -i 0
Status : OK
Message : The upgrade has taken effect after performed reboot successfully.
Query the MCU version again
npu-smi upgrade -b mcu -i 0
Version : 25.50.10
MCU batch-upgrade script
Single NPU_ID:
./upgrade_mcu.sh 3
./upgrade_mcu.sh 5 Ascend-hdk-910b-mcu_25.50.10.hpm
Multiple NPU_IDs:
./upgrade_mcu.sh 2,4,6
./upgrade_mcu.sh 1,3 Ascend-hdk-910b-mcu_25.50.10.hpm
All NPU_IDs:
./upgrade_mcu.sh all
./upgrade_mcu.sh all Ascend-hdk-910b-mcu_25.50.10.hpm
Save the script as upgrade_mcu.sh and make it executable:
chmod +x upgrade_mcu.sh
The upgrade_mcu.sh script:
#!/bin/bash
# upgrade_mcu.sh <NPU_ID|ID1,ID2,...|all> [Ascend-hdk-xxx-mcu_Y.hpm]
usage() {
echo "Usage:"
echo " Single: $0 <0-7> [fw.hpm]"
echo " Multi: $0 <id1,id2,...> [fw.hpm]"
echo " All: $0 all [fw.hpm]"
exit 1
}
#---------- Argument validation ----------
[[ $# -eq 0 ]] && usage
# Parse the NPU_ID list
case "$1" in
all|ALL)
NPU_LIST=(0 1 2 3 4 5 6 7)
;;
*)
# Try splitting on commas
IFS=',' read -ra NPU_LIST <<< "$1"
for id in "${NPU_LIST[@]}"; do
[[ "$id" =~ ^[0-7]$ ]] || usage
done
;;
esac
# Firmware file
FW_FILE=${2:-"Ascend-hdk-910b-mcu_25.50.10.hpm"}
#---------- Upgrade loop ----------
for NPU_ID in "${NPU_LIST[@]}"; do
echo "=== MCU upgrade for NPU_ID $NPU_ID ==="
echo "Firmware: $FW_FILE"
npu-smi upgrade -b mcu -i "$NPU_ID"
npu-smi upgrade -t mcu -i "$NPU_ID" -f "$FW_FILE"
npu-smi upgrade -a mcu -i "$NPU_ID"
npu-smi upgrade -b mcu -i "$NPU_ID"
echo "NPU_ID $NPU_ID done."
echo
done
echo "All requested NPUs upgraded."
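The argument-parsing logic of the script above can be exercised on its own, without touching any hardware. The sketch below extracts just the ID-list handling into a standalone function (parse_ids is a hypothetical helper name, not part of the original script):

```shell
#!/bin/bash
# Standalone sketch of upgrade_mcu.sh's ID parsing; no npu-smi calls.
parse_ids() {
    case "$1" in
        all|ALL)
            NPU_LIST=(0 1 2 3 4 5 6 7)
            ;;
        *)
            # Split on commas, then require each ID to be a single digit 0-7
            IFS=',' read -ra NPU_LIST <<< "$1"
            for id in "${NPU_LIST[@]}"; do
                [[ "$id" =~ ^[0-7]$ ]] || return 1
            done
            ;;
    esac
}

parse_ids all && echo "all -> ${NPU_LIST[*]}"   # all -> 0 1 2 3 4 5 6 7
parse_ids 1,3 && echo "1,3 -> ${NPU_LIST[*]}"   # 1,3 -> 1 3
parse_ids 9   || echo "9 rejected"              # 9 rejected
```

Anything outside 0-7, or a malformed list, falls through to the usage message in the real script.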
II. Creating a Logical Volume
1. Check disk information
lsblk
2. Create physical volumes (PV)
for d in /dev/nvme{0..3}n1; do
pvcreate "$d"
done
Physical volume "/dev/nvme0n1" successfully created.
Physical volume "/dev/nvme1n1" successfully created.
Physical volume "/dev/nvme2n1" successfully created.
Physical volume "/dev/nvme3n1" successfully created.
List physical volumes
pvs
3. Create a volume group (VG)
vgcreate vg_data /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
Volume group "vg_data" successfully created
List volume groups
vgs
4. Create a logical volume (LV)
lvcreate -l 100%VG -n lv_data vg_data
Logical volume "lv_data" created.
List logical volumes
lvs
5. Create the filesystem
mkfs.xfs /dev/vg_data/lv_data || mkfs.ext4 /dev/vg_data/lv_data
meta-data=/dev/vg_data/lv_data isize=512 agcount=14, agsize=268435455 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1 bigtime=0 inobtcount=0
data = bsize=4096 blocks=3750735872, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=521728, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Discarding blocks...Done.
6. Mount temporarily
mkdir -p /data
mount /dev/vg_data/lv_data /data
7. Mount automatically at boot
blkid /dev/vg_data/lv_data
/dev/vg_data/lv_data: UUID="eee7da5b-7d61-4a57-8b4d-b409da39d551" BLOCK_SIZE="512" TYPE="xfs"
Add the following line to /etc/fstab:
UUID=eee7da5b-7d61-4a57-8b4d-b409da39d551 /data xfs defaults 0 0
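With plain `defaults`, a volume that fails to assemble can block the boot process. Adding the `nofail` option is a common hardening (shown here with the same UUID; adjust to taste):

```
UUID=eee7da5b-7d61-4a57-8b4d-b409da39d551 /data xfs defaults,nofail 0 0
```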
8. Verify the mount
mount -a # success if it produces no errors
9. Create a symlink
mkdir -p /data/models
ln -s /data/models /models
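The resulting layout can be sanity-checked. The sketch below reproduces the /models -> /data/models relationship under a throwaway temp directory rather than touching the real mount:

```shell
#!/bin/bash
# Recreate the symlink layout under a temporary root for illustration.
root=$(mktemp -d)
mkdir -p "$root/data/models"
ln -s "$root/data/models" "$root/models"

# The link resolves to the real directory, and files placed in either
# path are visible through both.
touch "$root/data/models/weights.bin"
readlink "$root/models"   # prints $root/data/models
ls "$root/models"         # shows weights.bin
```

On the real system, `readlink /models` should print /data/models and `ls /models` should list the downloaded model directories.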
III. Model Repositories
1. Hugging Face
Hugging Face: the AI community building the future, advancing and democratizing artificial intelligence through open source and open science.
https://huggingface.sucp.cn/
2. HF-Mirror
HF-Mirror is a community mirror portal that provides stable, fast access to Hugging Face resources for users in mainland China.
https://hf-mirror.com/
3. ModelScope
ModelScope gathers state-of-the-art machine learning models across domains and offers one-stop services for exploring, running inference on, training, deploying, and applying models.
https://modelscope.cn/
4. Modelers (openMind)
Modelers.cn is a neutral, non-profit AI community that hosts and showcases AI tools, models, and datasets, and provides an open platform for AI developers and enthusiasts.
https://modelers.cn/models
IV. Installing Docker
1. Upload the offline Docker package
Docker: docker-28.3.3.tgz
2. Unpack
tar -zxvf docker-28.3.3.tgz
3. Grant execute permission
sudo chmod +x /opt/docker/*
4. Copy the binaries
cp /opt/docker/* /usr/bin/
5. Register the systemd service
vim /etc/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd --default-ulimit nofile=65535:65535
ExecReload=/bin/kill -s HUP $MAINPID
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
#TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
# restart the docker process if it exits prematurely
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
[Install]
WantedBy=multi-user.target
6. Set permissions on docker.service
sudo chmod 755 /etc/systemd/system/docker.service
7. Start the service
systemctl start docker
8. Enable it at boot
systemctl enable docker
9. Check the service status
systemctl status docker
V. Installing Docker Compose
1. Upload the docker-compose binary
docker-compose: docker-compose-linux-aarch64
2. Rename it
mv docker-compose-linux-aarch64 docker-compose
3. Grant execute permission
sudo chmod +x docker-compose
4. Move it into the PATH
sudo cp docker-compose /usr/local/bin/
5. Verify the installation
docker-compose version
VI. Model Deployment
6.1 Deployment with vLLM-Ascend
Download and export the image on an internet-connected machine
docker pull quay.io/ascend/vllm-ascend:v0.10.0rc1-openeuler
docker save -o vllm-ascend-openeuler.tar quay.io/ascend/vllm-ascend:v0.10.0rc1-openeuler
ls -lh vllm-ascend-openeuler.tar
Upload the vLLM-Ascend archive
vLLM-Ascend: vllm-ascend-openeuler.tar
Load the image
docker load -i vllm-ascend-openeuler.tar
6.1.1 Deploying the vLLM-Ascend service with Docker
Set Docker environment variables
# Load models from ModelScope to speed up downloads
export VLLM_USE_MODELSCOPE=True
# Set max_split_size_mb to reduce memory fragmentation and avoid out-of-memory errors
export PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256
Run the container
docker run -it --rm \
--name vllm \
--network host \
--shm-size=1g \
--device /dev/davinci_manager \
--device /dev/hisi_hdc \
--device /dev/devmm_svm \
--device /dev/davinci0 \
--device /dev/davinci1 \
--device /dev/davinci2 \
--device /dev/davinci3 \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root/.cache:/root/.cache \
-v /models:/models \
--entrypoint "vllm" \
quay.io/ascend/vllm-ascend:v0.10.0rc1-openeuler \
serve /models/Qwen/Qwen2.5-7B-Instruct \
--served-model-name Qwen/Qwen2.5-7B-Instruct \
--tensor-parallel-size 4 \
--port 8000 \
--max-model-len 26240
Test (the service above listens on port 8000 and serves Qwen/Qwen2.5-7B-Instruct)
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-7B-Instruct",
"messages": [
{"role": "user", "content": "Who are you?"}
]
}'
6.1.2 Deploying the vLLM-Ascend service with Docker Compose
Write docker-compose.yml
services:
  vllm-instance1:
    image: quay.io/ascend/vllm-ascend:v0.10.0rc1-openeuler
    container_name: vllm1
    restart: unless-stopped
    init: true
    # Network & devices
    network_mode: host
    shm_size: 1g
    devices:
      - /dev/davinci_manager
      - /dev/hisi_hdc
      - /dev/devmm_svm
      - /dev/davinci0
      - /dev/davinci1
      - /dev/davinci2
      - /dev/davinci3
    # Volume mappings
    volumes:
      - /usr/local/dcmi:/usr/local/dcmi
      - /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
      - /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/
      - /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info
      - /etc/ascend_install.info:/etc/ascend_install.info
      - /root/.cache:/root/.cache
      - /models:/models
    environment:
      - VLLM_USE_MODELSCOPE=True
      - PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256
    # Command
    entrypoint: ["vllm"]
    command: >
      serve /models/Qwen/Qwen3-30B-A3B
      --served-model-name Qwen/Qwen3-30B-A3B
      --tensor-parallel-size 4
      --enable_expert_parallel
      --port 8001
      --max-model-len 32768
  vllm-instance2:
    image: quay.io/ascend/vllm-ascend:v0.10.0rc1-openeuler
    container_name: vllm2
    restart: unless-stopped
    init: true
    # Network & devices
    network_mode: host
    shm_size: 1g
    devices:
      - /dev/davinci_manager
      - /dev/hisi_hdc
      - /dev/devmm_svm
      - /dev/davinci4
      - /dev/davinci5
      - /dev/davinci6
      - /dev/davinci7
    # Volume mappings
    volumes:
      - /usr/local/dcmi:/usr/local/dcmi
      - /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
      - /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/
      - /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info
      - /etc/ascend_install.info:/etc/ascend_install.info
      - /root/.cache:/root/.cache
      - /models:/models
    environment:
      - VLLM_USE_MODELSCOPE=True
      - PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256
    # Command
    entrypoint: ["vllm"]
    command: >
      serve /models/Qwen/Qwen2.5-Coder-32B-Instruct
      --served-model-name Qwen/Qwen2.5-Coder-32B-Instruct
      --tensor-parallel-size 4
      --port 8002
      --max-model-len 32768
Start the services with docker-compose up -d, then test:
curl http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3-30B-A3B",
"messages": [
{"role": "user", "content": "Who are you?"}
]
}'
6.2 Deployment with MindIE
Upload the MindIE archive
MindIE: 2.0.RC1-800I-A2-py311-openeuler24.03-lts.tar.gz
Load the image
docker load -i 2.0.RC1-800I-A2-py311-openeuler24.03-lts.tar.gz
6.2.1 Deploying the MindIE service with Docker
Start the container
docker run --net=host --ipc=host -it \
--name mindie \
--device=/dev/davinci_manager \
--device=/dev/hisi_hdc \
--device=/dev/devmm_svm \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
-v /usr/local/sbin:/usr/local/sbin:ro \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /models:/models \
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.RC1-800I-A2-py311-openeuler24.03-lts /bin/bash
Notes:
--device: maps a device into the container; one or more devices can be mounted.
The devices that need to be mounted are:
/dev/davinci_manager: the Davinci management device.
/dev/hisi_hdc: the HDC management device.
/dev/devmm_svm: the memory-management device.
/dev/davinci0: the NPU card(s) to mount, by number.
Configure the MindIE service
vim /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
{
"Version" : "1.0.0",
"ServerConfig" :
{
"ipAddress" : "127.0.0.1",
"managementIpAddress" : "127.0.0.2",
"port" : 1025,
"managementPort" : 1026,
"metricsPort" : 1027,
"allowAllZeroIpListening" : false,
"maxLinkNum" : 1000,
"httpsEnabled" : false,
"fullTextEnabled" : false,
"tlsCaPath" : "security/ca/",
"tlsCaFile" : ["ca.pem"],
"tlsCert" : "security/certs/server.pem",
"tlsPk" : "security/keys/server.key.pem",
"tlsPkPwd" : "security/pass/key_pwd.txt",
"tlsCrlPath" : "security/certs/",
"tlsCrlFiles" : ["server_crl.pem"],
"managementTlsCaFile" : ["management_ca.pem"],
"managementTlsCert" : "security/certs/management/server.pem",
"managementTlsPk" : "security/keys/management/server.key.pem",
"managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
"managementTlsCrlPath" : "security/management/certs/",
"managementTlsCrlFiles" : ["server_crl.pem"],
"kmcKsfMaster" : "tools/pmt/master/ksfa",
"kmcKsfStandby" : "tools/pmt/standby/ksfb",
"inferMode" : "standard",
"interCommTLSEnabled" : true,
"interCommPort" : 1121,
"interCommTlsCaPath" : "security/grpc/ca/",
"interCommTlsCaFiles" : ["ca.pem"],
"interCommTlsCert" : "security/grpc/certs/server.pem",
"interCommPk" : "security/grpc/keys/server.key.pem",
"interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
"interCommTlsCrlPath" : "security/grpc/certs/",
"interCommTlsCrlFiles" : ["server_crl.pem"],
"openAiSupport" : "vllm",
"tokenTimeout" : 600,
"e2eTimeout" : 600,
"distDPServerEnabled":false
},
"BackendConfig" : {
"backendName" : "mindieservice_llm_engine",
"modelInstanceNumber" : 1,
"npuDeviceIds" : [[0,1,2,3]],
"tokenizerProcessNumber" : 8,
"multiNodesInferEnabled" : false,
"multiNodesInferPort" : 1120,
"interNodeTLSEnabled" : true,
"interNodeTlsCaPath" : "security/grpc/ca/",
"interNodeTlsCaFiles" : ["ca.pem"],
"interNodeTlsCert" : "security/grpc/certs/server.pem",
"interNodeTlsPk" : "security/grpc/keys/server.key.pem",
"interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
"interNodeTlsCrlPath" : "security/grpc/certs/",
"interNodeTlsCrlFiles" : ["server_crl.pem"],
"interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
"interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
"ModelDeployConfig" :
{
"maxSeqLen" : 2560,
"maxInputTokenLen" : 2048,
"truncation" : false,
"ModelConfig" : [
{
"modelInstanceType" : "Standard",
"modelName" : "deepseek-r1",
"modelWeightPath" : "/models/DeepSeek-R1-Distill-Qwen-7B",
"worldSize" : 4,
"cpuMemSize" : 5,
"npuMemSize" : -1,
"backendType" : "atb",
"trustRemoteCode" : false
}
]
},
"ScheduleConfig" :
{
"templateType" : "Standard",
"templateName" : "Standard_LLM",
"cacheBlockSize" : 128,
"maxPrefillBatchSize" : 50,
"maxPrefillTokens" : 8192,
"prefillTimeMsPerReq" : 150,
"prefillPolicyType" : 0,
"decodeTimeMsPerReq" : 50,
"decodePolicyType" : 0,
"maxBatchSize" : 200,
"maxIterTimes" : 512,
"maxPreemptCount" : 0,
"supportSelectBatch" : false,
"maxQueueDelayMicroseconds" : 5000
}
}
}
Notes:
"httpsEnabled" : false: HTTPS is disabled.
"modelName" : "deepseek-r1": the model name.
"modelWeightPath" : "/models/DeepSeek-R1-Distill-Qwen-7B": the model weight path.
"worldSize" : 4: the number of NPU cards.
"maxSeqLen" : 2560: the maximum sequence length, equal to maxInputTokenLen + maxIterTimes.
"maxInputTokenLen" : 2048: the maximum input token length.
"maxIterTimes" : 512: the maximum output token length.
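The sequence-length relation is easy to sanity-check before editing the config; a minimal shell check using the values from this config:

```shell
#!/bin/bash
# maxSeqLen must equal maxInputTokenLen + maxIterTimes
maxInputTokenLen=2048
maxIterTimes=512
maxSeqLen=$((maxInputTokenLen + maxIterTimes))
echo "$maxSeqLen"   # prints 2560, matching "maxSeqLen" : 2560
```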
Start the MindIE service
cd /usr/local/Ascend/mindie/latest/mindie-service/bin
./mindieservice_daemon
Test the MindIE service
curl 'http://localhost:1025/v1/chat/completions' -H "Content-Type: application/json" -d '{
"model": "deepseek-r1",
"messages": [
{ "role": "system", "content": "You are an artificial intelligence expert." },
{ "role": "user", "content": "Explain artificial intelligence" }
]
}'
{"id":"endpoint_common_1","object":"chat.completion","created":1752806878,"model":"deepseek-r1","choices":[{"index":0,"message":{"role":"assistant","tool_calls":null,"content":"..."},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":12,"completion_tokens":512,"total_tokens":524},"prefill_time":625,"decode_time_arr":[52,10,10,...]}
(The assistant's 512-token Chinese answer and the full per-token decode-time array are truncated above for brevity.)
6.2.2 Deploying the MindIE service with Docker Compose
Create the working directory and files
mkdir -p template
cd template
touch config.json.template compose.yml entrypoint.sh
mkdir -p logs
chmod 750 logs
chmod +x entrypoint.sh
The config.json.template file:
{
"Version" : "1.0.0",
"ServerConfig" :
{
"ipAddress" : "127.0.0.1",
"managementIpAddress" : "127.0.0.2",
"port" : ${MINDIE_PORT},
"managementPort" : ${MINDIE_MANAGEMENT_PORT},
"metricsPort" : ${MINDIE_METRICS_PORT},
"allowAllZeroIpListening" : false,
"maxLinkNum" : 1000,
"httpsEnabled" : false,
"fullTextEnabled" : false,
"tlsCaPath" : "security/ca/",
"tlsCaFile" : ["ca.pem"],
"tlsCert" : "security/certs/server.pem",
"tlsPk" : "security/keys/server.key.pem",
"tlsPkPwd" : "security/pass/key_pwd.txt",
"tlsCrlPath" : "security/certs/",
"tlsCrlFiles" : ["server_crl.pem"],
"managementTlsCaFile" : ["management_ca.pem"],
"managementTlsCert" : "security/certs/management/server.pem",
"managementTlsPk" : "security/keys/management/server.key.pem",
"managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
"managementTlsCrlPath" : "security/management/certs/",
"managementTlsCrlFiles" : ["server_crl.pem"],
"kmcKsfMaster" : "tools/pmt/master/ksfa",
"kmcKsfStandby" : "tools/pmt/standby/ksfb",
"inferMode" : "standard",
"interCommTLSEnabled" : true,
"interCommPort" : 1121,
"interCommTlsCaPath" : "security/grpc/ca/",
"interCommTlsCaFiles" : ["ca.pem"],
"interCommTlsCert" : "security/grpc/certs/server.pem",
"interCommPk" : "security/grpc/keys/server.key.pem",
"interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
"interCommTlsCrlPath" : "security/grpc/certs/",
"interCommTlsCrlFiles" : ["server_crl.pem"],
"openAiSupport" : "vllm",
"tokenTimeout" : 600,
"e2eTimeout" : 600,
"distDPServerEnabled":false
},
"BackendConfig" : {
"backendName" : "mindieservice_llm_engine",
"modelInstanceNumber" : 1,
"npuDeviceIds" : [[${NPU_DEVICE_IDS}]],
"tokenizerProcessNumber" : 8,
"multiNodesInferEnabled" : false,
"multiNodesInferPort" : 1120,
"interNodeTLSEnabled" : true,
"interNodeTlsCaPath" : "security/grpc/ca/",
"interNodeTlsCaFiles" : ["ca.pem"],
"interNodeTlsCert" : "security/grpc/certs/server.pem",
"interNodeTlsPk" : "security/grpc/keys/server.key.pem",
"interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
"interNodeTlsCrlPath" : "security/grpc/certs/",
"interNodeTlsCrlFiles" : ["server_crl.pem"],
"interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
"interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
"ModelDeployConfig" :
{
"maxSeqLen" : ${MAX_TOKEN_LEN},
"maxInputTokenLen" : ${MAX_INPUT_TOKEN_LEN},
"truncation" : false,
"ModelConfig" : [
{
"modelInstanceType" : "Standard",
"modelName" : "${MODEL_NAME}",
"modelWeightPath" : "${MODEL_WEIGHT_PATH}",
"worldSize" : ${NPU_DEVICE_COUNT},
"cpuMemSize" : 5,
"npuMemSize" : -1,
"backendType" : "atb",
"trustRemoteCode" : false
}
]
},
"ScheduleConfig" :
{
"templateType" : "Standard",
"templateName" : "Standard_LLM",
"cacheBlockSize" : 128,
"maxPrefillBatchSize" : 50,
"maxPrefillTokens" : 8192,
"prefillTimeMsPerReq" : 150,
"prefillPolicyType" : 0,
"decodeTimeMsPerReq" : 50,
"decodePolicyType" : 0,
"maxBatchSize" : 200,
"maxIterTimes" : ${MAX_OUTPUT_TOKEN_LEN},
"maxPreemptCount" : 0,
"supportSelectBatch" : false,
"maxQueueDelayMicroseconds" : 5000
}
}
}
${MINDIE_PORT}: MindIE service port
${MINDIE_MANAGEMENT_PORT}: MindIE management port
${MINDIE_METRICS_PORT}: MindIE metrics port
${MODEL_NAME}: model name
${MODEL_WEIGHT_PATH}: model weight path
${NPU_DEVICE_IDS}: list of NPU device IDs
${NPU_DEVICE_COUNT}: number of NPU devices
${MAX_TOKEN_LEN}: maximum sequence length (MAX_TOKEN_LEN = MAX_INPUT_TOKEN_LEN + MAX_OUTPUT_TOKEN_LEN)
${MAX_INPUT_TOKEN_LEN}: maximum input token length
${MAX_OUTPUT_TOKEN_LEN}: maximum output token length
Note:
To access the MindIE service from outside the host, set:
ipAddress: 0.0.0.0 (default: 127.0.0.1)
managementIpAddress: 0.0.0.0 (default: 127.0.0.2)
allowAllZeroIpListening: true
The entrypoint.sh entry script:
#!/bin/bash
# Check whether envsubst exists; if not, try to install it (depends on the base image)
# On Debian/Ubuntu-based images, envsubst ships in the gettext package
if ! command -v envsubst &> /dev/null
then
echo "envsubst not found, attempting to install gettext..."
# Assume the container has apt-get or yum; this depends on the MindIE base image
# If the MindIE image has no package manager, build a custom image that includes envsubst
if command -v apt-get &> /dev/null; then
apt-get update && apt-get install -y gettext-base
elif command -v yum &> /dev/null; then
yum install -y gettext
else
echo "Could not install envsubst. Please ensure 'gettext' is available in your image."
exit 1
fi
fi
echo "##############################################################################################"
env
# Ensure all required environment variables are set, falling back to defaults
: "${MINDIE_PORT:=1025}"
: "${MINDIE_MANAGEMENT_PORT:=1026}"
: "${MINDIE_METRICS_PORT:=1027}"
: "${MODEL_NAME:=qwen2.5}"
: "${MODEL_WEIGHT_PATH:=/models/Qwen/Qwen2.5-0.5B-Instruct}"
: "${NPU_DEVICE_IDS:=0,1,2,3}"
: "${NPU_DEVICE_COUNT:=4}"
: "${MAX_TOKEN_LEN:=8192}"
: "${MAX_INPUT_TOKEN_LEN:=4096}"
: "${MAX_OUTPUT_TOKEN_LEN:=4096}"
# Use envsubst to substitute variables in the template and write the result to the target path
envsubst < /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json.template \
> /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
echo "Generated config.json:"
cat /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
# Exec the original MindIE service daemon
exec /usr/local/Ascend/mindie/latest/mindie-service/bin/mindieservice_daemon
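The `: "${VAR:=default}"` idiom used above assigns the default only when the variable is unset or empty, so values injected from compose.yml always win. A quick standalone demonstration:

```shell
#!/bin/bash
# Unset variable: the default is assigned.
unset MINDIE_PORT
: "${MINDIE_PORT:=1025}"
echo "$MINDIE_PORT"    # prints 1025

# Preset variable: the default is ignored.
MINDIE_PORT=2025
: "${MINDIE_PORT:=1025}"
echo "$MINDIE_PORT"    # prints 2025
```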
Deploy the models
cp -r template/ deepseek-r1_qwen2.5/
Change into the directory
cd deepseek-r1_qwen2.5/
Create the log directories
mkdir -p logs/mindie1 logs/mindie2
chmod 750 logs/mindie1 logs/mindie2
Install the envsubst tool via a Dockerfile
FROM swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.RC1-800I-A2-py311-openeuler24.03-lts
# Use root (the base image usually already does; declared again to be safe)
USER root
# openEuler 24.03 LTS uses dnf by default, so installing gettext directly works
RUN dnf install -y gettext && \
dnf clean all && \
rm -rf /var/cache/dnf
# Set the working directory
WORKDIR /usr/local/Ascend/mindie/latest/mindie-service/bin
ENTRYPOINT ["./mindieservice_daemon"]
Build the image
docker build -t mindie:2.0.RC1-800I-A2-py311-openeuler24.03-lts .
Edit the compose.yml file
name: mindie
services:
  mindie-instance-1:
    # image: swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.RC1-800I-A2-py311-openeuler24.03-lts
    image: mindie:2.0.RC1-800I-A2-py311-openeuler24.03-lts
    container_name: mindie1
    restart: unless-stopped
    init: true
    network_mode: host
    shm_size: "1g"
    # Device mappings
    devices:
      - /dev/davinci_manager
      - /dev/hisi_hdc
      - /dev/devmm_svm
      - /dev/davinci0
      - /dev/davinci1
      - /dev/davinci2
      - /dev/davinci3
    # Volume mappings
    volumes:
      - /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro
      - /usr/local/sbin:/usr/local/sbin:ro
      - /usr/local/dcmi:/usr/local/dcmi
      - /models:/models
      - ./config.json.template:/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json.template:ro
      - ./entrypoint.sh:/usr/local/Ascend/mindie/latest/mindie-service/entrypoint.sh:ro
      - ./logs/mindie1:/root/mindie
    # Environment variables for this instance
    environment:
      - MINDIE_PORT=1025
      - MINDIE_MANAGEMENT_PORT=1026
      - MINDIE_METRICS_PORT=1027
      - MODEL_NAME=deepseek-r1
      - MODEL_WEIGHT_PATH=/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
      - NPU_DEVICE_IDS=0,1,2,3
      - NPU_DEVICE_COUNT=4
      - MAX_TOKEN_LEN=8192
      - MAX_INPUT_TOKEN_LEN=4096
      - MAX_OUTPUT_TOKEN_LEN=4096
    entrypoint: ["/usr/local/Ascend/mindie/latest/mindie-service/entrypoint.sh"]
  mindie-instance-2:
    # image: swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.RC1-800I-A2-py311-openeuler24.03-lts
    image: mindie:2.0.RC1-800I-A2-py311-openeuler24.03-lts
    container_name: mindie2
    restart: unless-stopped
    init: true
    network_mode: host
    shm_size: "1g"
    # Device mappings
    devices:
      - /dev/davinci_manager
      - /dev/hisi_hdc
      - /dev/devmm_svm
      - /dev/davinci4
      - /dev/davinci5
      - /dev/davinci6
      - /dev/davinci7
    # Volume mappings
    volumes:
      - /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro
      - /usr/local/sbin:/usr/local/sbin:ro
      - /usr/local/dcmi:/usr/local/dcmi
      - /models:/models
      - ./config.json.template:/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json.template:ro
      - ./entrypoint.sh:/usr/local/Ascend/mindie/latest/mindie-service/entrypoint.sh:ro
      - ./logs/mindie2:/root/mindie
    # Environment variables for this instance
    environment:
      - MINDIE_PORT=2025
      - MINDIE_MANAGEMENT_PORT=2026
      - MINDIE_METRICS_PORT=2027
      - MODEL_NAME=qwen2.5
      - MODEL_WEIGHT_PATH=/models/Qwen/Qwen2.5-7B-Instruct
      - NPU_DEVICE_IDS=0,1,2,3
      - NPU_DEVICE_COUNT=4
      - MAX_TOKEN_LEN=8192
      - MAX_INPUT_TOKEN_LEN=4096
      - MAX_OUTPUT_TOKEN_LEN=4096
    entrypoint: ["/usr/local/Ascend/mindie/latest/mindie-service/entrypoint.sh"]
Start the MindIE services
docker compose up -d
Test the MindIE services (instance 1 on port 1025; instance 2 listens on port 2025)
curl 'http://localhost:1025/v1/chat/completions' -H "Content-Type: application/json" -d '{
"model": "deepseek-r1",
"messages": [
{ "role": "system", "content": "You are an AI assistant that helps users solve problems." },
{ "role": "user", "content": "Who are you?" }
]
}'