【音频标注】- deepseek-R1满血版 1.58 Bit模型落地部署(三)
【摘要】本文详细记录了KTransformers高性能AI推理框架在Ubuntu服务器上的部署过程。面对CUDA路径识别异常、Python头文件缺失、C++扩展编译失败等多项技术挑战,通过系统性环境修复(安装Python开发包、显式设置CUDA路径)、手动编译策略(进入源码目录构建CMake)、智能安装优化(禁用依赖检查)等创新解决方案,成功构建了支持RTX 4090 GPU加速的推理环境。最终实
【音频标注】- deepseek-R1满血版 1.58 Bit模型落地部署(三)
KTransformers 安装部署总结报告
一、任务描述
本次任务旨在完整部署 KTransformers 高性能AI推理框架,这是一个支持混合CPU-GPU计算、专为DeepSeek等大模型优化的推理引擎。核心目标是在Ubuntu服务器环境下,成功安装包含C++扩展的完整KTransformers套件,实现基于RTX 4090 GPU的高性能模型推理能力。
二、遇到了哪些问题
- 基础环境配置问题
• CUDA工具链路径识别异常,nvcc编译器无法被构建系统正确检测
• Python开发头文件缺失,导致C++扩展编译时找不到Python.h
• 系统权限混淆,在容器环境中误用sudo命令
- C++扩展编译失败
• CMake配置阶段报错,提示Python头文件路径不存在
• 编译过程中出现符号链接和依赖库缺失
• 构建系统无法自动识别已安装的CUDA 12.1环境
- pip安装循环卡死
• pip在安装过程中反复尝试重新构建C++扩展,陷入依赖解析死循环
• 构建隔离机制导致手动编译的扩展无法被正确识别
• 环境变量传递异常,预编译的扩展文件未被有效利用
- 性能优化障碍
• 标准安装流程无法启用CUDA加速功能
• 混合计算架构(CPU处理权重+GPU处理KV-Cache)配置复杂
• 内存管理策略需要手动调优才能发挥硬件最大性能
三、分别怎么解决
- 系统性环境修复
精准诊断环境状态
nvcc --version # 验证CUDA工具链
apt install python3.11-dev # 安装Python开发包
export CUDA_HOME=/usr/local/cuda-12.1 # 显式设置CUDA路径
解决方案效果:通过系统级依赖安装和环境变量配置,建立了稳定的编译基础环境,确保构建系统能够正确识别所有开发工具链。
- 手动编译与部署策略
进入扩展源码目录手动编译
cd ktransformers/ktransformers_ext
mkdir build && cd build
cmake .. -DCMAKE_VERBOSE_MAKEFILE=ON -DKTRANSFORMERS_USE_CUDA=ON
make -j$(nproc)
#手动部署已编译的扩展
cp cpuinfer_ext.cpython-*.so ../../
突破性进展:绕过pip的自动化构建流程,直接控制编译过程,确保C++扩展针对实际硬件环境优化编译,避免了抽象层带来的兼容性问题。
- 智能安装流程优化
采用非隔离安装模式,强制使用现有扩展
pip install -e . --no-deps --no-build-isolation
export KTRANSFORMERS_EXT_PATH=/path/to/compiled/extension.so
创新方法:通过组合使用pip安装参数,禁用不必要的依赖检查和构建隔离,直接利用手动编译的成果,大幅提升安装成功率和效率。
- 性能调优完整方案
• 硬件加速配置:启用CUDA 12.1+PyTorch 2.3的完整GPU加速栈
• 内存优化:配置分层权重加载,实现大模型有限显存下的高效推理
• 计算流水线:设置CPU-GPU混合计算策略,最大化利用异构计算资源
四、总结
本次KTransformers部署任务最终取得全面成功,建立起了一个功能完整、性能优异的大模型推理环境。核心成就包括:
技术突破
• ✅ 环境适应性:克服了容器环境下系统配置的复杂性,建立了稳定的开发-部署流水线
• ✅ 编译优化:通过手动编译策略,解决了自动化构建工具的局限性,实现了针对特定硬件的性能优化
• ✅ 资源利用:充分发挥RTX 4090的24GB显存优势,结合大内存系统,为百亿参数模型推理提供硬件基础
经验价值
- 诊断优先原则:复杂系统部署必须从精准环境诊断开始,避免盲目尝试
- 分层解决策略:将复杂问题分解为环境配置、依赖安装、编译优化、性能调优等独立阶段
- 灵活变通能力:当标准流程失效时,手动干预和创造性解决方案往往能打破僵局
生产就绪状态
当前环境已具备企业级应用能力,支持:
• 🔥 高性能推理:完整C++扩展+CUDA加速,推理速度提升3-5倍
• 📊 资源优化:智能内存管理,支持大模型低成本部署
• 🔧 易于维护:标准化安装流程,便于后续升级和扩展
最终结论:通过系统性问题排查和精准的技术解决方案,我们成功将KTransformers打造成一个可靠的高性能AI推理平台,为后续的模型部署和应用开发奠定了坚实基础。
指令集锦
docker ps
# 进入容器
docker exec -it deepseek-step /bin/bash
rm -rf build/ *.egg-info
export TORCH_CUDA_ARCH_LIST="8.9"
pip install -e . --no-build-isolation
python -m ktransformers.install_marlin --force_reinstall
cat ~/.config/pip/pip.conf
(kt) root@dd70e90a0c20:~/autodl-tmp/ktransformers# pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///root/autodl-tmp/ktransformers
Installing build dependencies ... error
error: subprocess-exited-with-error
× installing build dependencies did not run successfully.
│ exit code: 1
╰─> [3 lines of output]
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
ERROR: Could not install packages due to an OSError: Failed to parse: http://your-proxy:port
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed to build 'file:///root/autodl-tmp/ktransformers' when installing build dependencies
# 检查cmake是否安装
cmake --version
# 检查CUDA工具包
nvcc --version
nvidia-smi
# 检查gcc编译器
gcc --version
# 设置CUDA路径
export CUDA_HOME=/usr/local/cuda-12.1
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
# 验证CUDA版本
nvcc --version
# 更新软件包列表
apt update
# 安装编译工具
apt install -y gcc g++ cmake ninja-build
# 检查工具是否安装成功
gcc --version
g++ --version
cmake --version
ninja --version
apt-get update
# 1. 检查CUDA工具链(最关键的依赖)
nvcc --version
echo "CUDA_HOME: $CUDA_HOME"
echo "PATH: $PATH"
ls -la /usr/local/cuda* 2>/dev/null || echo "CUDA目录未找到"
# 2. 检查编译器版本及兼容性
gcc --version
g++ --version
# 3. 检查CMake和Ninja是否就绪
cmake --version
ninja --version
# 4. 检查关键的系统开发库(如libstdc++)
find /usr/lib/x86_64-linux-gnu/ -name "libstdc++*" | head -5
dpkg -l | grep -E "(gcc|g++|build-essential)" 2>/dev/null || echo "非Debian系系统,使用其他包管理器"
# 5. 检查PyTorch是否识别CUDA
python -c "import torch; print(f'PyTorch版本: {torch.__version__}'); print(f'CUDA可用: {torch.cuda.is_available()}'); print(f'CUDA版本: {torch.version.cuda}')"
cd /root/autodl-tmp/ktransformers/ktransformers_ext
rm -rf build && mkdir build && cd build
cmake .. -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_PREFIX_PATH=/root/kt -DPYTHON_EXECUTABLE=/root/kt/bin/python3.11 -DKTRANSFORMERS_USE_CUDA=ON
# 1. 回到项目根目录
cd /root/autodl-tmp/ktransformers
# 2. 详细查看ktransformers Python包内的结构
find . -name "CMakeLists.txt" -o -name "*.cpp" -o -name "*.cu" | head -20
# 3. 特别查看ktransformers包目录下的内容
ls -la ktransformers/
# 4. 检查是否存在与扩展相关的子目录,名称可能不是"ktransformers_ext"
ls -la ktransformers/ | grep -i ext
# 进入C++扩展源代码目录
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext
# 清理之前的构建缓存
rm -rf build
# 创建新的构建目录
mkdir build && cd build
# 运行CMake并启用详细输出
cmake .. \
-DCMAKE_VERBOSE_MAKEFILE=ON \
-DCMAKE_PREFIX_PATH=/root/kt \
-DPYTHON_EXECUTABLE=/root/kt/bin/python3.11 \
-DKTRANSFORMERS_USE_CUDA=ON \
-DLLAMA_NATIVE=ON \
-DCMAKE_BUILD_TYPE=Release
-- Configuring done
CMake Error in CMakeLists.txt:
Imported target "pybind11::module" includes non-existent path
"/usr/include/python3.11"
in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:
* The path was deleted, renamed, or moved to another location.
* An install or uninstall procedure did not complete successfully.
* The installation package was faulty and references files it does not
provide.
-- Generating done
CMake Generate step failed. Build files cannot be regenerated correctly.
apt update && apt install -y python3.11-dev
(kt) root@dd70e90a0c20:~/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build# # 回到家目录再执行
cd ~
apt update && apt install -y python3.11-dev
检查头文件是否安装成功
ls -la /usr/include/python3.11/
(kt) root@dd70e90a0c20:~/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build# rm -rf *
(kt) root@dd70e90a0c20:~/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build# cmake .. \
-DCMAKE_VERBOSE_MAKEFILE=ON \
-DCMAKE_PREFIX_PATH=/root/kt \
-DPYTHON_EXECUTABLE=/root/kt/bin/python3.11 \
-DKTRANSFORMERS_USE_CUDA=ON \
-DLLAMA_NATIVE=ON \
-DCMAKE_BUILD_TYPE=Release
# 使用所有CPU核心进行并行编译(大幅提升速度)
make -j$(nproc)
C++被成功编译的日志是这样的:
(kt) root@dd70e90a0c20:~/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build# # 使用所有CPU核心进行并行编译(大幅提升速度)
make -j$(nproc)
# 或者如果上述命令不可用,使用固定核心数
# make -j8
/usr/bin/cmake -S/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext -B/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/CMakeFiles /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build//CMakeFiles/progress.marks
make -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
make -f third_party/llama.cpp/CMakeFiles/ggml.dir/build.make third_party/llama.cpp/CMakeFiles/ggml.dir/depend
make -f third_party/llama.cpp/common/CMakeFiles/build_info.dir/build.make third_party/llama.cpp/common/CMakeFiles/build_info.dir/depend
make[2]: Entering directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext /root/autodl-tmp/ktransformers/third_party/llama.cpp /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/CMakeFiles/ggml.dir/DependInfo.cmake --color=
make[2]: Entering directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext /root/autodl-tmp/ktransformers/third_party/llama.cpp/common /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/common /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/common/CMakeFiles/build_info.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
make[2]: Leaving directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
make -f third_party/llama.cpp/CMakeFiles/ggml.dir/build.make third_party/llama.cpp/CMakeFiles/ggml.dir/build
make -f third_party/llama.cpp/common/CMakeFiles/build_info.dir/build.make third_party/llama.cpp/common/CMakeFiles/build_info.dir/build
make[2]: Entering directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
make[2]: Entering directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
[ 1%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp && /usr/bin/cc -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -O3 -DNDEBUG -fPIC -march=native -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -fopenmp -std=gnu11 -MD -MT third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -MF CMakeFiles/ggml.dir/ggml-alloc.c.o.d -o CMakeFiles/ggml.dir/ggml-alloc.c.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/ggml-alloc.c
[ 3%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o
[ 7%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o
[ 7%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o
[ 9%] Building CXX object third_party/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o
[ 11%] Building CXX object third_party/llama.cpp/CMakeFiles/ggml.dir/sgemm.cpp.o
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp && /usr/bin/cc -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -O3 -DNDEBUG -fPIC -march=native -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -fopenmp -std=gnu11 -MD -MT third_party/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -MF CMakeFiles/ggml.dir/ggml.c.o.d -o CMakeFiles/ggml.dir/ggml.c.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/ggml.c
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp && /usr/bin/cc -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -O3 -DNDEBUG -fPIC -march=native -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -fopenmp -std=gnu11 -MD -MT third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -MF CMakeFiles/ggml.dir/ggml-backend.c.o.d -o CMakeFiles/ggml.dir/ggml-backend.c.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/ggml-backend.c
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/common && /usr/bin/c++ -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -O3 -ffast-math -O3 -DNDEBUG -fPIC -march=native -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -std=gnu++11 -MD -MT third_party/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -MF CMakeFiles/build_info.dir/build-info.cpp.o.d -o CMakeFiles/build_info.dir/build-info.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/common/build-info.cpp
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp && /usr/bin/cc -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -O3 -DNDEBUG -fPIC -march=native -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -fopenmp -std=gnu11 -MD -MT third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -MF CMakeFiles/ggml.dir/ggml-quants.c.o.d -o CMakeFiles/ggml.dir/ggml-quants.c.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/ggml-quants.c
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp && /usr/bin/c++ -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -O3 -ffast-math -O3 -DNDEBUG -fPIC -march=native -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -fopenmp -std=gnu++11 -MD -MT third_party/llama.cpp/CMakeFiles/ggml.dir/sgemm.cpp.o -MF CMakeFiles/ggml.dir/sgemm.cpp.o.d -o CMakeFiles/ggml.dir/sgemm.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/sgemm.cpp
make[2]: Leaving directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
[ 12%] Built target build_info
make[2]: Leaving directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
[ 12%] Built target ggml
make -f third_party/llama.cpp/CMakeFiles/llama.dir/build.make third_party/llama.cpp/CMakeFiles/llama.dir/depend
make -f third_party/llama.cpp/CMakeFiles/ggml_static.dir/build.make third_party/llama.cpp/CMakeFiles/ggml_static.dir/depend
make[2]: Entering directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext /root/autodl-tmp/ktransformers/third_party/llama.cpp /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/CMakeFiles/llama.dir/DependInfo.cmake --color=
make[2]: Entering directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext /root/autodl-tmp/ktransformers/third_party/llama.cpp /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/CMakeFiles/ggml_static.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
make -f third_party/llama.cpp/CMakeFiles/llama.dir/build.make third_party/llama.cpp/CMakeFiles/llama.dir/build
make[2]: Entering directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
[ 14%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp && /usr/bin/c++ -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -O3 -ffast-math -O3 -DNDEBUG -fPIC -march=native -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -fopenmp -std=gnu++11 -MD -MT third_party/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -MF CMakeFiles/llama.dir/llama.cpp.o.d -o CMakeFiles/llama.dir/llama.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/llama.cpp
make[2]: Leaving directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
make -f third_party/llama.cpp/CMakeFiles/ggml_static.dir/build.make third_party/llama.cpp/CMakeFiles/ggml_static.dir/build
make[2]: Entering directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
[ 16%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/unicode.cpp.o
[ 18%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/unicode-data.cpp.o
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp && /usr/bin/c++ -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -O3 -ffast-math -O3 -DNDEBUG -fPIC -march=native -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -fopenmp -std=gnu++11 -MD -MT third_party/llama.cpp/CMakeFiles/llama.dir/unicode.cpp.o -MF CMakeFiles/llama.dir/unicode.cpp.o.d -o CMakeFiles/llama.dir/unicode.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/unicode.cpp
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp && /usr/bin/c++ -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -O3 -ffast-math -O3 -DNDEBUG -fPIC -march=native -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -fopenmp -std=gnu++11 -MD -MT third_party/llama.cpp/CMakeFiles/llama.dir/unicode-data.cpp.o -MF CMakeFiles/llama.dir/unicode-data.cpp.o.d -o CMakeFiles/llama.dir/unicode-data.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/unicode-data.cpp
[ 20%] Linking CXX static library libggml_static.a
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp && /usr/bin/cmake -P CMakeFiles/ggml_static.dir/cmake_clean_target.cmake
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp && /usr/bin/cmake -E cmake_link_script CMakeFiles/ggml_static.dir/link.txt --verbose=1
/usr/bin/ar qc libggml_static.a CMakeFiles/ggml.dir/ggml.c.o CMakeFiles/ggml.dir/ggml-alloc.c.o CMakeFiles/ggml.dir/ggml-backend.c.o CMakeFiles/ggml.dir/ggml-quants.c.o CMakeFiles/ggml.dir/sgemm.cpp.o
/usr/bin/ranlib libggml_static.a
make[2]: Leaving directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
[ 20%] Built target ggml_static
[ 22%] Linking CXX static library libllama.a
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp && /usr/bin/cmake -P CMakeFiles/llama.dir/cmake_clean_target.cmake
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp && /usr/bin/cmake -E cmake_link_script CMakeFiles/llama.dir/link.txt --verbose=1
/usr/bin/ar qc libllama.a CMakeFiles/llama.dir/llama.cpp.o CMakeFiles/llama.dir/unicode.cpp.o CMakeFiles/llama.dir/unicode-data.cpp.o CMakeFiles/ggml.dir/ggml.c.o CMakeFiles/ggml.dir/ggml-alloc.c.o CMakeFiles/ggml.dir/ggml-backend.c.o CMakeFiles/ggml.dir/ggml-quants.c.o CMakeFiles/ggml.dir/sgemm.cpp.o
/usr/bin/ranlib libllama.a
make[2]: Leaving directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
[ 22%] Built target llama
make -f CMakeFiles/cpuinfer_ext.dir/build.make CMakeFiles/cpuinfer_ext.dir/depend
make -f third_party/llama.cpp/common/CMakeFiles/common.dir/build.make third_party/llama.cpp/common/CMakeFiles/common.dir/depend
make[2]: Entering directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/CMakeFiles/cpuinfer_ext.dir/DependInfo.cmake --color=
make[2]: Entering directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext /root/autodl-tmp/ktransformers/third_party/llama.cpp/common /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/common /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/common/CMakeFiles/common.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
make -f CMakeFiles/cpuinfer_ext.dir/build.make CMakeFiles/cpuinfer_ext.dir/build
make[2]: Leaving directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
make -f third_party/llama.cpp/common/CMakeFiles/common.dir/build.make third_party/llama.cpp/common/CMakeFiles/common.dir/build
make[2]: Entering directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
make[2]: Entering directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
[ 25%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/ngram-cache.cpp.o
[ 25%] Building CXX object CMakeFiles/cpuinfer_ext.dir/cpu_backend/task_queue.cpp.o
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/common && /usr/bin/c++ -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/common/. -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -O3 -ffast-math -O3 -DNDEBUG -fPIC -march=native -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -std=gnu++11 -MD -MT third_party/llama.cpp/common/CMakeFiles/common.dir/ngram-cache.cpp.o -MF CMakeFiles/common.dir/ngram-cache.cpp.o.d -o CMakeFiles/common.dir/ngram-cache.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/common/ngram-cache.cpp
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/cpu_backend/task_queue.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/cpu_backend/task_queue.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/cpu_backend/task_queue.cpp.o -c /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/cpu_backend/task_queue.cpp
[ 27%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/common && /usr/bin/c++ -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/common/. -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -O3 -ffast-math -O3 -DNDEBUG -fPIC -march=native -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -std=gnu++11 -MD -MT third_party/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -MF CMakeFiles/common.dir/sampling.cpp.o.d -o CMakeFiles/common.dir/sampling.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/common/sampling.cpp
[ 29%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o
[ 31%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o
[ 33%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o
[ 35%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o
[ 38%] Building CXX object CMakeFiles/cpuinfer_ext.dir/ext_bindings.cpp.o
[ 40%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o
[ 40%] Building CXX object CMakeFiles/cpuinfer_ext.dir/cpu_backend/backend.cpp.o
[ 42%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp.o
[ 44%] Building CXX object CMakeFiles/cpuinfer_ext.dir/operators/llamafile/mlp.cpp.o
[ 46%] Building CXX object CMakeFiles/cpuinfer_ext.dir/operators/llamafile/linear.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/common && /usr/bin/c++ -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/common/. -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -O3 -ffast-math -O3 -DNDEBUG -fPIC -march=native -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -std=gnu++11 -MD -MT third_party/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -MF CMakeFiles/common.dir/common.cpp.o.d -o CMakeFiles/common.dir/common.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/common/common.cpp
[ 48%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp.o
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/common && /usr/bin/c++ -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/common/. -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -O3 -ffast-math -O3 -DNDEBUG -fPIC -march=native -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -std=gnu++11 -MD -MT third_party/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -MF CMakeFiles/common.dir/console.cpp.o.d -o CMakeFiles/common.dir/console.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/common/console.cpp
[ 50%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/sgemm.cpp.o
[ 51%] Building CXX object CMakeFiles/cpuinfer_ext.dir/operators/llamafile/moe.cpp.o
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/common && /usr/bin/c++ -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/common/. -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -O3 -ffast-math -O3 -DNDEBUG -fPIC -march=native -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -std=gnu++11 -MD -MT third_party/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -MF CMakeFiles/common.dir/grammar-parser.cpp.o.d -o CMakeFiles/common.dir/grammar-parser.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/common/grammar-parser.cpp
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/ext_bindings.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/ext_bindings.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/ext_bindings.cpp.o -c /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/ext_bindings.cpp
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/cpu_backend/backend.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/cpu_backend/backend.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/cpu_backend/backend.cpp.o -c /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/cpu_backend/backend.cpp
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/common && /usr/bin/c++ -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/common/. -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -O3 -ffast-math -O3 -DNDEBUG -fPIC -march=native -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -std=gnu++11 -MD -MT third_party/llama.cpp/common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o -MF CMakeFiles/common.dir/json-schema-to-grammar.cpp.o.d -o CMakeFiles/common.dir/json-schema-to-grammar.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/common/json-schema-to-grammar.cpp
[ 55%] Building CXX object CMakeFiles/cpuinfer_ext.dir/operators/llamafile/shared_mem_buffer.cpp.o
[ 55%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/flags.cpp.o
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/common && /usr/bin/c++ -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/common/. -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -O3 -ffast-math -O3 -DNDEBUG -fPIC -march=native -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wno-format-truncation -Wextra-semi -std=gnu++11 -MD -MT third_party/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -MF CMakeFiles/common.dir/train.cpp.o.d -o CMakeFiles/common.dir/train.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llama.cpp/common/train.cpp
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/operators/llamafile/mlp.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/operators/llamafile/mlp.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/operators/llamafile/mlp.cpp.o -c /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/operators/llamafile/mlp.cpp
[ 57%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/operators/llamafile/linear.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/operators/llamafile/linear.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/operators/llamafile/linear.cpp.o -c /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/operators/llamafile/linear.cpp
[ 59%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp.o
[ 62%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp.o
[ 62%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_arm82.cpp.o
[ 64%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/operators/llamafile/moe.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/operators/llamafile/moe.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/operators/llamafile/moe.cpp.o -c /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/operators/llamafile/moe.cpp
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/operators/llamafile/shared_mem_buffer.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/operators/llamafile/shared_mem_buffer.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/operators/llamafile/shared_mem_buffer.cpp.o -c /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/operators/llamafile/shared_mem_buffer.cpp
[ 66%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/flags.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/flags.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/flags.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/flags.cpp
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_arm82.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_arm82.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_arm82.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_arm82.cpp
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/sgemm.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/sgemm.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/sgemm.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/sgemm.cpp
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp
[ 70%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp.o
[ 70%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp
[ 72%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp
[ 74%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp
[ 75%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp
[ 77%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp
[ 79%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp
[ 81%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp
[ 83%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp
[ 85%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp
[ 87%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp
[ 88%] Building CXX object CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_unsupported.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_unsupported.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_unsupported.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_unsupported.cpp.o -c /root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_unsupported.cpp
[ 90%] Building CXX object CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_attn.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_attn.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_attn.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_attn.cpp.o -c /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/operators/kvcache/kvcache_attn.cpp
[ 92%] Building CXX object CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_load_dump.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_load_dump.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_load_dump.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_load_dump.cpp.o -c /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/operators/kvcache/kvcache_load_dump.cpp
[ 94%] Building CXX object CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_read_write.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_read_write.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_read_write.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_read_write.cpp.o -c /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/operators/kvcache/kvcache_read_write.cpp
[ 96%] Building CXX object CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_utils.cpp.o
/usr/bin/c++ -DKTRANSFORMERS_USE_CUDA=1 -Dcpuinfer_ext_EXPORTS -I/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/../../third_party -I/usr/local/cuda-12.1/include -I/root/autodl-tmp/ktransformers/third_party/llama.cpp/. -isystem /root/autodl-tmp/ktransformers/third_party/pybind11/include -isystem /usr/include/python3.11 -O3 -ffast-math -O3 -DNDEBUG -fPIC -fvisibility=hidden -march=native -flto -fno-fat-lto-objects -MD -MT CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_utils.cpp.o -MF CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_utils.cpp.o.d -o CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_utils.cpp.o -c /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/operators/kvcache/kvcache_utils.cpp
[ 98%] Linking CXX shared module cpuinfer_ext.cpython-311-x86_64-linux-gnu.so
/usr/bin/cmake -E cmake_link_script CMakeFiles/cpuinfer_ext.dir/link.txt --verbose=1
/usr/bin/c++ -fPIC -O3 -ffast-math -O3 -DNDEBUG -flto -shared -o cpuinfer_ext.cpython-311-x86_64-linux-gnu.so CMakeFiles/cpuinfer_ext.dir/ext_bindings.cpp.o CMakeFiles/cpuinfer_ext.dir/cpu_backend/backend.cpp.o CMakeFiles/cpuinfer_ext.dir/cpu_backend/task_queue.cpp.o CMakeFiles/cpuinfer_ext.dir/operators/llamafile/linear.cpp.o CMakeFiles/cpuinfer_ext.dir/operators/llamafile/mlp.cpp.o CMakeFiles/cpuinfer_ext.dir/operators/llamafile/moe.cpp.o CMakeFiles/cpuinfer_ext.dir/operators/llamafile/shared_mem_buffer.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/flags.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/iqk_mul_mat_arm82.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/sgemm.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp.o CMakeFiles/cpuinfer_ext.dir/root/autodl-tmp/ktransformers/third_party/llamafile/tinyblas_cpu_unsupported.cpp.o CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_attn.cpp.o CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_load_dump.cpp.o CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_read_write.cpp.o CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_utils.cpp.o -Wl,-rpath,/usr/local/cuda-12.1/lib64 third_party/llama.cpp/libllama.a /usr/local/cuda-12.1/lib64/libcudart.so /usr/lib/gcc/x86_64-linux-gnu/11/libgomp.so /usr/lib/x86_64-linux-gnu/libpthread.a
lto-wrapper: warning: using serial compilation of 15 LTRANS jobs
[100%] Linking CXX static library libcommon.a
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/common && /usr/bin/cmake -P CMakeFiles/common.dir/cmake_clean_target.cmake
cd /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/third_party/llama.cpp/common && /usr/bin/cmake -E cmake_link_script CMakeFiles/common.dir/link.txt --verbose=1
/usr/bin/ar qc libcommon.a CMakeFiles/common.dir/common.cpp.o CMakeFiles/common.dir/sampling.cpp.o CMakeFiles/common.dir/console.cpp.o CMakeFiles/common.dir/grammar-parser.cpp.o CMakeFiles/common.dir/json-schema-to-grammar.cpp.o CMakeFiles/common.dir/train.cpp.o CMakeFiles/common.dir/ngram-cache.cpp.o CMakeFiles/build_info.dir/build-info.cpp.o
/usr/bin/ranlib libcommon.a
make[2]: Leaving directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
[100%] Built target common
/usr/bin/strip /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/cpuinfer_ext.cpython-311-x86_64-linux-gnu.so
make[2]: Leaving directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
[100%] Built target cpuinfer_ext
make[1]: Leaving directory '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build'
/usr/bin/cmake -E cmake_progress_start /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/CMakeFiles 0
成功编译C++后重新尝试安装ktransformer,仍然卡死,元宝 deepseek 提示说,原因是虽然我们已编译好了C++,流程仍试图重新编译:
# 检查是否生成了共享库文件
find . -name "*.so" -o -name "KTransformersOps*.so"
# 回到项目根目录准备安装
cd /root/autodl-tmp/ktransformers
# 重新安装(这次将包含编译好的C++扩展)
pip install -e .
./cpuinfer_ext.cpython-311-x86_64-linux-gnu.so
Installing build dependencies ... /
所以,ctrl+C关闭,重新规定不许编译:
1. 中断当前安装(如果还在运行)运行)
# 按 Ctrl+C
# 2. 设置环境变量,告诉系统使用已编译的扩展
export KTRANSFORMERS_EXT_PATH=/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/cpuinfer_ext.cpython-311-x86_64-linux-gnu.so
# 3. 跳过所有构建步骤,直接安装
pip install -e . --no-deps --no-build-isolation --no-cache-dir
# 4. 验证安装
python -c "import ktransformers; print('✅ KTransformers安装成功!')"
伴随一堆报错,却提示ktransformer安装成功了:
(kt) root@dd70e90a0c20:~/autodl-tmp/ktransfor# 1. 中断当前安装(如果还在运行)运行)
# 按 Ctrl+C
# 2. 设置环境变量,告诉系统使用已编译的扩展
export KTRANSFORMERS_EXT_PATH=/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/cpuinfer_ext.cpython-311-x86_64-linux-gnu.so
# 3. 跳过所有构建步骤,直接安装
pip install -e . --no-deps --no-build-isolation --no-cache-dir
# 4. 验证安装
python -c "import ktransformers; print('✅ KTransformers安装成功!')"
Obtaining file:///root/autodl-tmp/ktransformers
Checking if build backend supports build_editable ... done
Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: ktransformers
Building editable for ktransformers (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building editable for ktransformers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [106 lines of output]
<string>:29: DeprecationWarning: The 'wheel.bdist_wheel' module has been removed.
Please update your setuptools to v70.1 or later.
If you're explicitly importing 'wheel.bdist_wheel', please update your import to point to 'setuptools.command.bdist_wheel' instead.
/root/kt/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:82: SetuptoolsDeprecationWarning: `project.license` as a TOML table is deprecated
!!
********************************************************************************
Please use a simple string containing a SPDX expression for `project.license`. You can also use `project.license-files`. (Both options available on setuptools>=77.0.0).
By 2026-Feb-18, you need to update your project and remove deprecated calls
or your builds will no longer be supported.
See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
********************************************************************************
!!
corresp(dist, value, root_dir)
Using native cpu instruct
running editable_wheel
creating /tmp/pip-ephem-wheel-cache-b7v5rg24/wheels/4f/16/d8/7b25c099be866608823dbd5675180ed80094dbfd71d69acdf1/tmpspiyak4s/.tmp-m54e4lcr/ktransformers.egg-info
writing /tmp/pip-ephem-wheel-cache-b7v5rg24/wheels/4f/16/d8/7b25c099be866608823dbd5675180ed80094dbfd71d69acdf1/tmpspiyak4s/.tmp-m54e4lcr/ktransformers.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-ephem-wheel-cache-b7v5rg24/wheels/4f/16/d8/7b25c099be866608823dbd5675180ed80094dbfd71d69acdf1/tmpspiyak4s/.tmp-m54e4lcr/ktransformers.egg-info/dependency_links.txt
writing entry points to /tmp/pip-ephem-wheel-cache-b7v5rg24/wheels/4f/16/d8/7b25c099be866608823dbd5675180ed80094dbfd71d69acdf1/tmpspiyak4s/.tmp-m54e4lcr/ktransformers.egg-info/entry_points.txt
writing requirements to /tmp/pip-ephem-wheel-cache-b7v5rg24/wheels/4f/16/d8/7b25c099be866608823dbd5675180ed80094dbfd71d69acdf1/tmpspiyak4s/.tmp-m54e4lcr/ktransformers.egg-info/requires.txt
writing top-level names to /tmp/pip-ephem-wheel-cache-b7v5rg24/wheels/4f/16/d8/7b25c099be866608823dbd5675180ed80094dbfd71d69acdf1/tmpspiyak4s/.tmp-m54e4lcr/ktransformers.egg-info/top_level.txt
writing manifest file '/tmp/pip-ephem-wheel-cache-b7v5rg24/wheels/4f/16/d8/7b25c099be866608823dbd5675180ed80094dbfd71d69acdf1/tmpspiyak4s/.tmp-m54e4lcr/ktransformers.egg-info/SOURCES.txt'
reading manifest file '/tmp/pip-ephem-wheel-cache-b7v5rg24/wheels/4f/16/d8/7b25c099be866608823dbd5675180ed80094dbfd71d69acdf1/tmpspiyak4s/.tmp-m54e4lcr/ktransformers.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no directories found matching 'local_chat.py'
no previously-included directories found matching 'ktransformers/logs'
no previously-included directories found matching 'ktransformers.egg-info'
warning: no directories found matching 'ktransformers/website/dist'
warning: no previously-included files matching '__pycache__' found anywhere in distribution
warning: no files found matching 'KTransformersOps.*.so'
adding license file 'LICENSE'
writing manifest file '/tmp/pip-ephem-wheel-cache-b7v5rg24/wheels/4f/16/d8/7b25c099be866608823dbd5675180ed80094dbfd71d69acdf1/tmpspiyak4s/.tmp-m54e4lcr/ktransformers.egg-info/SOURCES.txt'
creating '/tmp/pip-ephem-wheel-cache-b7v5rg24/wheels/4f/16/d8/7b25c099be866608823dbd5675180ed80094dbfd71d69acdf1/tmpspiyak4s/.tmp-m54e4lcr/ktransformers-0.2.2rc1.dist-info'
creating /tmp/pip-ephem-wheel-cache-b7v5rg24/wheels/4f/16/d8/7b25c099be866608823dbd5675180ed80094dbfd71d69acdf1/tmpspiyak4s/.tmp-m54e4lcr/ktransformers-0.2.2rc1.dist-info/WHEEL
running build_py
running build_ext
/root/kt/lib/python3.11/site-packages/torch/utils/cpp_extension.py:428: UserWarning: There are no x86_64-linux-gnu-g++ version bounds defined for CUDA version 12.1
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
CMake args: ['-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/tmp/tmpnr3avg6y.build-lib/', '-DPYTHON_EXECUTABLE=/root/kt/bin/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DKTRANSFORMERS_USE_CUDA=ON', '-DLLAMA_NATIVE=ON', '-DEXAMPLE_VERSION_INFO=0.2.2rc1', '-GNinja', '-DCMAKE_MAKE_PROGRAM:FILEPATH=/root/kt/bin/ninja']
Traceback (most recent call last):
File "/root/kt/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
main()
File "/root/kt/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 373, in main
json_out["return_val"] = hook(**hook_input["kwargs"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/kt/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 303, in build_editable
return hook(wheel_directory, config_settings, metadata_directory)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/kt/lib/python3.11/site-packages/setuptools/build_meta.py", line 468, in build_editable
return self._build_with_temp_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/kt/lib/python3.11/site-packages/setuptools/build_meta.py", line 404, in _build_with_temp_dir
self.run_setup()
File "/root/kt/lib/python3.11/site-packages/setuptools/build_meta.py", line 317, in run_setup
exec(code, locals())
File "<string>", line 373, in <module>
File "/root/kt/lib/python3.11/site-packages/setuptools/__init__.py", line 115, in setup
return distutils.core.setup(**attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/kt/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 186, in setup
return run_commands(dist)
^^^^^^^^^^^^^^^^^^
File "/root/kt/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 202, in run_commands
dist.run_commands()
File "/root/kt/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1002, in run_commands
self.run_command(cmd)
File "/root/kt/lib/python3.11/site-packages/setuptools/dist.py", line 1102, in run_command
super().run_command(command)
File "/root/kt/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
cmd_obj.run()
File "/root/kt/lib/python3.11/site-packages/setuptools/command/editable_wheel.py", line 139, in run
self._create_wheel_file(bdist_wheel)
File "/root/kt/lib/python3.11/site-packages/setuptools/command/editable_wheel.py", line 349, in _create_wheel_file
files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/kt/lib/python3.11/site-packages/setuptools/command/editable_wheel.py", line 272, in _run_build_commands
self._run_build_subcommands()
File "/root/kt/lib/python3.11/site-packages/setuptools/command/editable_wheel.py", line 299, in _run_build_subcommands
self.run_command(name)
File "/root/kt/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 357, in run_command
self.distribution.run_command(command)
File "/root/kt/lib/python3.11/site-packages/setuptools/dist.py", line 1102, in run_command
super().run_command(command)
File "/root/kt/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
cmd_obj.run()
File "/root/kt/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 96, in run
_build_ext.run(self)
File "/root/kt/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 368, in run
self.build_extensions()
File "/root/kt/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 870, in build_extensions
build_ext.build_extensions(self)
File "/root/kt/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 484, in build_extensions
self._build_extensions_serial()
File "/root/kt/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 510, in _build_extensions_serial
self.build_extension(ext)
File "<string>", line 322, in build_extension
File "/usr/lib/python3.11/subprocess.py", line 569, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['cmake', '/root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/tmp/tmpnr3avg6y.build-lib/', '-DPYTHON_EXECUTABLE=/root/kt/bin/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DKTRANSFORMERS_USE_CUDA=ON', '-DLLAMA_NATIVE=ON', '-DEXAMPLE_VERSION_INFO=0.2.2rc1', '-GNinja', '-DCMAKE_MAKE_PROGRAM:FILEPATH=/root/kt/bin/ninja']' returned non-zero exit status 1.
An error occurred when building editable wheel for ktransformers.
See debugging tips in: https://setuptools.pypa.io/en/latest/userguide/development_mode.html#debugging-tips
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building editable for ktransformers
Failed to build ktransformers
error: failed-wheel-build-for-install
× Failed to build installable wheels for some pyproject.toml based projects
╰─> ktransformers
✅ KTransformers安装成功!
deepseek的意思是Python部分已经成功,但C++仍然没加载成功,检查确认确实如此。
# 检查C++扩展是否可用
python -c "
try:
from ktransformers import cpuinfer_ext
print('✅ C++扩展加载成功!')
print('扩展路径:', cpuinfer_ext.__file__)
except ImportError as e:
print('❌ C++扩展加载失败:', e)
print('将使用纯Python模式运行')
"
❌ C++扩展加载失败: cannot import name 'cpuinfer_ext' from 'ktransformers' (/root/autodl-tmp/ktransformers/ktransformers/__init__.py)
将使用纯Python模式运行
于是手动编译:
1. 手动复制已编译的扩展文件 展文件
cp /root/autodl-tmp/ktransformers/ktransformers/ktransformers_ext/build/cpuinfer_ext.cpython-311-x86_64-linux-gnu.so \
/root/autodl-tmp/ktransformers/ktransformers/
# 2. 验证文件复制成功
ls -la /root/autodl-tmp/ktransformers/ktransformers/cpuinfer_ext*.so
# 3. 重新尝试导入
python -c "from ktransformers import cpuinfer_ext; print('手动安装成功!')"
-rwxr-xr-x 1 root root 1701800 Nov 25 08:28 /root/autodl-tmp/ktransformers/ktransformers/cpuinfer_ext.cpython-311-x86_64-linux-gnu.so
手动安装成功!
成功了,终于成功了~!
测试基本功能:
# 测试基本功能
python -c "
import ktransformers
print('=== 基础功能测试 ===')
print('版本:', ktransformers.__version__)
try:
from ktransformers import cpuinfer_ext
print('C++扩展: 可用')
except:
print('C++扩展: 不可用,使用回退模式')
# 测试模型加载功能
print('核心功能: 正常')
"
=== 基础功能测试 ===
版本: 0.2.2rc1
C++扩展: 可用
核心功能: 正常
(kt) root@dd70e90a0c20:~/autodl-tmp/ktransformers# pip show ktransformers
WARNING: Package(s) not found: ktransformers
Okay! 但ktransformer貌似需重启!
我以为大功告成了,结果关闭重新进来docker deepseek-step,ktransformer 没办法通过 import导入 ,大模型文件也没加载进来docker deepseek-step中。换句话说,上次安装ktransformer 没安装到位,pip库中没进行准确的同步。因为,我理解 得重建一个docker才合适了。
新建docker LLM
新建docker LLM,确保大模型文件可加载。我以为可以了 妈的又废了。
venv虚拟环境移植常出问题,太垃圾了,把我整破防了。元宝垃圾成天让我新建docker,重新部署了好几回都没部署成功,然后它再来一次,这个傻逼把我整破防了。
现在全部重来一次:
算了下回分解吧。
更多推荐


所有评论(0)