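My company wants to deploy a chatbot on a domestic (Chinese-made) hardware and software platform.
I developed the bot on the usual x86 + CUDA + NVIDIA GPU + Ubuntu setup, where every dependency installed without much trouble.
Deployment, on the other hand, was exhausting, good grief. Just nationalize America already, then nobody would have to bother with going domestic.
I am writing down the pitfalls I hit along the way for reference.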
OS: Kylin V10
CPU: Phytium S2500
GPU: none
rasa
torch
transformers
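On Kylin V10 it is best not to upgrade Docker: the upgrade pulls in a glibc upgrade, which crashes the server (the only way out is to reinstall the OS).
rasa requires a fairly recent Python version and the target machine has no internet access, so I deploy with Docker.
I use buildx on my development machine to build for multiple platforms.
Docker >= 19.03 is said to ship with buildx, but my machine did not have it, so the buildx installation steps are recorded here as well.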
sudo apt install qemu qemu-kvm virt-manager bridge-utils
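Download the buildx binary for your development machine, save it as docker-buildx under $HOME/.docker/cli-plugins, and make it executable: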
chmod +x ~/.docker/cli-plugins/docker-buildx
Edit the ~/.docker/config.json file (create it if it does not exist) and add the following:
{
  "experimental": "enabled"
}
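If you prefer not to edit the file by hand, the step above could be scripted. This is only a minimal sketch: the function name enable_experimental is made up for illustration, and it assumes the file, when present, holds valid JSON. Existing keys are kept.

```python
import json
from pathlib import Path


def enable_experimental(config_path: Path) -> dict:
    """Set "experimental": "enabled" in a Docker CLI config file.

    Hypothetical helper: creates the file if it is missing and keeps
    every key that is already there.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8").strip()
        if text:
            config = json.loads(text)
    config["experimental"] = "enabled"
    # Create ~/.docker (or whatever parent directory is used) when needed.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Typical call, not executed here:
# enable_experimental(Path.home() / ".docker" / "config.json")
```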
# using ubuntu LTS version
FROM ubuntu:20.04 AS builder-image

# avoid stuck build due to user prompt
ARG DEBIAN_FRONTEND=noninteractive

RUN apt update && \
    apt install software-properties-common -y && \
    add-apt-repository ppa:deadsnakes/ppa && \
    apt install --no-install-recommends -y python3.10 python3.10-dev python3.10-venv python3-pip python3-wheel build-essential && \
    apt install rustc -y && apt install cargo -y && \
    apt clean && rm -rf /var/lib/apt/lists/*

# one ENV instruction can set several variables; "&&" is not valid between ENV instructions
ENV GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 \
    GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1
# $HOME is not defined when ENV is expanded; the build runs as root
ENV PATH="/root/.cargo/bin:$PATH"

# create and activate virtual environment
# using final folder name to avoid path issues with packages
RUN python3.10 -m venv /home/myuser/venv
ENV PATH="/home/myuser/venv/bin:$PATH"

# install requirements
COPY requirements.txt .
COPY tensorflow_text-2.13.0-cp310-cp310-linux_aarch64.whl .

# install with pip
RUN pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/ && \
    pip install --upgrade pip setuptools && \
    pip install --no-cache-dir ez_setup numpy && \
    python -m pip install --upgrade pip && \
    pip install --no-cache-dir wheel && \
    pip install --no-cache-dir tensorflow_text-2.13.0-cp310-cp310-linux_aarch64.whl && \
    pip install --no-cache-dir torch==1.12 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
RUN pip install --no-cache-dir -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/

# runner image
FROM ubuntu:20.04 AS runner-image

# ARG values do not carry over from the builder stage
ARG DEBIAN_FRONTEND=noninteractive

RUN apt update && \
    apt install software-properties-common -y && \
    add-apt-repository ppa:deadsnakes/ppa && \
    apt install --no-install-recommends -y python3.10 python3.10-dev python3.10-venv python3-pip python3-wheel build-essential && \
    apt clean && rm -rf /var/lib/apt/lists/*

RUN useradd --create-home myuser
COPY --from=builder-image /home/myuser/venv /home/myuser/venv

# preload the libgomp copies named in the "static TLS block" ImportErrors;
# the paths follow the venv layout shown in those error messages
ENV LD_PRELOAD=$LD_PRELOAD:/home/myuser/venv/lib/python3.10/site-packages/scikit_learn.libs/libgomp-d22c30c5.so.1.0.0
ENV LD_PRELOAD=$LD_PRELOAD:/home/myuser/venv/lib/python3.10/site-packages/faiss_cpu.libs/libgomp-d22c30c5.so.1.0.0

USER myuser
RUN mkdir /home/myuser/code
WORKDIR /home/myuser/code
COPY ./models/to_docker /home/myuser/

USER root
RUN sh -c "chmod a+x /home/myuser/start.sh"
USER myuser

# make sure all messages always reach console
ENV PYTHONUNBUFFERED=1

# activate virtual environment
ENV VIRTUAL_ENV=/home/myuser/venv
ENV PATH="/home/myuser/venv/bin:$PATH"

CMD ["/bin/sh","-c","/home/myuser/start.sh"]
docker buildx build --platform linux/arm64/v8 -t rasa-armv8:0.1.1 .
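Because the target machine is offline, the built image still has to be carried over as a file. The post does not say how this was done; the sketch below is one possible way, assuming the image is loaded into the local Docker daemon with --load, exported with docker save, and imported on the target with docker load. The helper name offline_image_commands and the archive file name are made up for illustration.

```python
import shlex


def offline_image_commands(tag: str,
                           platform: str = "linux/arm64/v8",
                           context: str = ".") -> list[list[str]]:
    """Return the commands for building an image and moving it offline.

    Hypothetical helper: it only assembles argument lists, it does not
    run Docker.
    """
    # Archive name derived from the tag; any file name would do.
    archive = tag.replace(":", "_").replace("/", "_") + ".tar"
    return [
        # Build for the target platform and load the result into the local daemon.
        ["docker", "buildx", "build", "--platform", platform, "-t", tag, "--load", context],
        # Export the image to a tar archive that can be copied to the target.
        ["docker", "save", "-o", archive, tag],
        # Run this one on the offline target machine.
        ["docker", "load", "-i", archive],
    ]


for command in offline_image_commands("rasa-armv8:0.1.1"):
    print(shlex.join(command))
```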
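Install the dependencies in this order: packages downloaded locally > packages that torch depends on but that the official index-url does not list > torch > everything else.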
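The ordering can be written down as a small helper that emits one pip command per stage. This is a sketch only: ordered_pip_steps and its parameter names are made up for illustration, and the example values are taken from the Dockerfile in this post.

```python
def ordered_pip_steps(local_wheels: list[str],
                      torch_prerequisites: list[str],
                      torch_specs: list[str],
                      requirements_file: str,
                      torch_index_url: str = "https://download.pytorch.org/whl/cpu") -> list[list[str]]:
    """Build pip commands in the order: local wheels, torch prerequisites, torch, the rest."""
    base = ["pip", "install", "--no-cache-dir"]
    steps = []
    if local_wheels:
        # 1. Wheels that were downloaded (or built) ahead of time.
        steps.append(base + local_wheels)
    if torch_prerequisites:
        # 2. Packages torch needs that the torch index does not provide.
        steps.append(base + torch_prerequisites)
    # 3. torch itself, from its own index.
    steps.append(base + torch_specs + ["--index-url", torch_index_url])
    # 4. Everything else.
    steps.append(base + ["-r", requirements_file])
    return steps


steps = ordered_pip_steps(
    local_wheels=["tensorflow_text-2.13.0-cp310-cp310-linux_aarch64.whl"],
    torch_prerequisites=["numpy"],
    torch_specs=["torch==1.12"],
    requirements_file="requirements.txt",
)
for step in steps:
    print(" ".join(step))
```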
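If you hit the error
'Command "python setup.py egg_info" failed with error code 1'
the fix is:
pip install --upgrade setuptools
python -m pip install -U pip
pip install ez_setup
A requirements list taken straight from pip freeze may make dependencies conflict with one another during installation; pip has to be left to resolve the conflicts automatically. Other errors you may run into while building and running the image: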
ImportError: /home/myuser/venv/lib/python3.10/site-packages/faiss/../faiss_cpu.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block
ImportError: /home/myuser/venv/lib/python3.10/site-packages/sklearn/__check_build/../../scikit_learn.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block
Failed to import transformers.data.data_collator because of the following error (look up to see its traceback):
/home/myuser/venv/lib/python3.10/site-packages/torch/lib/libgomp-d22c30c5.so.1: cannot allocate memory in static TLS block
For these errors, set the environment variable in the runner image with ENV LD_PRELOAD=$LD_PRELOAD:<path of the file named in the error>.
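Copying those paths by hand is easy to get wrong, so here is a minimal sketch that pulls them out of the error output and prints the matching ENV lines. The function name ld_preload_env_lines is made up for illustration. It assumes the error text has the same shape as the messages quoted above, and it normalizes ".." segments in the reported paths.

```python
import posixpath
import re

# Matches "<absolute path to a .so file>: cannot allocate memory in static TLS block".
_TLS_ERROR = re.compile(
    r"(/[^\s:]+\.so(?:\.\d+)*): cannot allocate memory in static TLS block"
)


def ld_preload_env_lines(log_text: str) -> list[str]:
    """Return one Dockerfile ENV line per distinct library named in the errors."""
    paths = []
    for match in _TLS_ERROR.finditer(log_text):
        # Collapse segments such as "faiss/../faiss_cpu.libs".
        path = posixpath.normpath(match.group(1))
        if path not in paths:
            paths.append(path)
    return [f"ENV LD_PRELOAD=$LD_PRELOAD:{path}" for path in paths]


log = (
    "ImportError: /home/myuser/venv/lib/python3.10/site-packages/faiss/../"
    "faiss_cpu.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block\n"
    "/home/myuser/venv/lib/python3.10/site-packages/torch/lib/libgomp-d22c30c5.so.1: "
    "cannot allocate memory in static TLS block\n"
)
for line in ld_preload_env_lines(log):
    print(line)
```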
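rasa reports: No agent loaded. To continue processing, a model of a trained agent needs to be loaded.
This usually happens because the rasa version in the runtime environment is newer than the rasa version the model was trained with. Retraining the model fixes it.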
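A quick check before shipping the image can catch this mismatch early. The sketch below is only an illustration: the helper names and the version numbers in the example are made up, both version strings have to be supplied by you (for instance from the output of rasa --version on each machine), and it simply flags any runtime version that is higher than the training version.

```python
import re


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "3.6.2" into (3, 6, 2), stopping at the first non-numeric part."""
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def needs_retraining(trained_with: str, runtime: str) -> bool:
    """Return True when the runtime rasa is newer than the one used for training."""
    return parse_version(runtime) > parse_version(trained_with)


# Hypothetical version numbers, for illustration only.
if needs_retraining(trained_with="3.5.1", runtime="3.6.2"):
    print("runtime rasa is newer than the training rasa: retrain the model")
```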
docker build qemu: qemu_thread_create: Operation not permitted
I do not know how others deal with this; I prefixed the build command with sudo and the error did not come back.
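pip install fails
This may be because the packages in pip's index have no aarch64 build. In that case, search for a build that someone else has already compiled, or install from source yourself.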
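When hunting for a prebuilt package, the wheel file name already tells you whether it can work on the target. The sketch below is a simplified check based on the standard wheel naming scheme (name-version-python tag-abi tag-platform tag). The function name wheel_fits is made up for illustration, and abi3 wheels and other compatibility rules are deliberately not handled.

```python
def wheel_fits(filename: str, python_tag: str = "cp310", arch: str = "aarch64") -> bool:
    """Rough check that a wheel file name targets the given Python tag and CPU architecture."""
    if not filename.endswith(".whl"):
        return False
    parts = filename[: -len(".whl")].split("-")
    # name-version-python-abi-platform, with an optional build tag after the version.
    if len(parts) not in (5, 6):
        return False
    python_tags = parts[-3].split(".")
    platform_tags = parts[-1].split(".")
    # "py3" marks a wheel that is not tied to one CPython version.
    python_ok = any(tag in (python_tag, "py3") for tag in python_tags)
    # "any" marks a pure-Python wheel; otherwise the tag must end in the architecture.
    platform_ok = any(tag == "any" or tag.endswith("_" + arch) for tag in platform_tags)
    return python_ok and platform_ok


print(wheel_fits("tensorflow_text-2.13.0-cp310-cp310-linux_aarch64.whl"))
```

A file name that fails this check for aarch64 is a hint that the package has to be found elsewhere or built from source.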