从前慢现在也慢

这个屌丝很懒，什么也没留下！

热门标签

Vicuna 小羊驼（部署 + 运行）_vicuna 本地部署

作者：从前慢现在也慢 | 2024-07-04 15:31:27

踩

vicuna 本地部署

6、合并 LLaMA 生成 Vicuna 模型

小羊驼需要运存或显存的支撑，如果你的配置很低，没有足够的运存或显存，建议租用服务器来操作。

运存和显存的要求，具体可以查看：README.md

1、解决 Git 下载大文件

Git LFS 是 Github 开发的一个 Git 的扩展，用于实现 Git 对大文件的支持

2、获取 LLaMA 权重文件

1）通过磁力链接手动下载并上传服务器（本文章所使用）

磁力链接：magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA

备注：仅下载 7B / 13B 即可！！！

2）通过指令安装，该指令在中断后支持续传（测验可下载，未测验下完整）


# pip install pyllama -U
# python -m llama.download --model_size 7B
# python -m llama.download --model_size 13B

3）通过 Git 直接安装（未测验）


# git clone https://huggingface.co/huggyllama/llama-7b
# git clone https://huggingface.co/huggyllama/llama-13b

3、获取 Delta 权重文件

1）通过访问地址，手动下载并上传服务器（本文章所使用）

https://huggingface.co/lmsys/vicuna-7b-delta-v1.1/tree/main
https://huggingface.co/lmsys/vicuna-13b-delta-v1.1/tree/main

2）通过 Git 直接安装（未测验）


# git clone https://huggingface.co/lmsys/vicuna-7b-delta-v1.1
# git clone https://huggingface.co/lmsys/vicuna-13b-delta-v1.1

4、安装所需环境

1）是否使用虚拟环境：


===>> 使用 [避免和本机原始环境冲突]
=> 查看虚拟环境列表及路径
# conda env list
=> 创建虚拟环境（vicuna）
# conda create -n vicuna python==3.9
=> 激活虚拟环境
# conda activate vicuna 
=> 退出虚拟环境
# conda deactivate
=> 删除虚拟环境
# conda remove -n vicuna --all
 
===>> 不使用
=> 查询 python 版本
# python --version
=> 需确保 python 版本 >= 3.9，本处不对此处做出详细说明，请自行百度操作

2）创建文件目录


# cd /vms/app
# mkdir vicuna
# cd vicuna

3）安装 FastChat


方式一：
# pip install fschat
 
方式二：[会自动安装对应版本的transformers]
# git clone https://github.com/lm-sys/FastChat
# cd FastChat
# git tag
# git checkout v0.2.3
# pip install e .
# cd ../

4）安装 protobuf

# pip install protobuf==3.20.0

5）安装 transformers [ 若 FastChat 使用方式二安装的，可忽略 ]


方式一：
# pip install transformers
 
方式二：
# git clone https://github.com/huggingface/transformers.git
# cd transformers
# python setup.py install
# cd ../

5、转换 LLaMA 模型

转换需要用到 convert_llama_weights_to_hf.py 脚本

如果你并未操作 git clone https://github.com/huggingface/transformers.git 来安装 transformers，则可以使用下列方式寻找到需要的脚本

# find / -name convert_llama_weights_to_hf.py

如果你使用了 conda 虚拟环境，则要注意路径的位置，可能会有多个 convert_llama_weights_to_hf.py 脚本出现，尽可能使用自己环境下的脚本!!!

1）7B（本文章所使用）

# python transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir LLaMA/ --model_size 7B --output_dir ./llama-7b

2）13B

# python transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir LLaMA/ --model_size 13B --output_dir ./llama-13b

--input_dir *	LLaMA权重文件的路径
--model_size *	7B 或 13B
--output_dir *	转换成功后的保存路径

6、合并 LLaMA 生成 Vicuna 模型

1）7B（本文章所使用）

# python -m fastchat.model.apply_delta --base ./llama-7b --target ./vicuna-7b --delta ./vicuna-7b-delta-v1.1

2）13B

# python -m fastchat.model.apply_delta --base ./llama-13b --target ./vicuna-13b --delta ./vicuna-13b-delta-v1.1

--base *	转换 LLaMA 模型后的路径
--target *	合并生成后的保存路径
--delta *	7B 使用 vicuna-7b-delta-v1.1 13B 使用 vicuna-13b-delta-v1.1

7、启动程序

1）命令行模式


===>> 默认 GPU 运行（本文章所使用）
# python -m fastchat.serve.cli --model-path ./vicuna-7b
 
===>> CPU 运行
# python -m fastchat.serve.cli --model-path ./vicuna-7b --device cpu

--model-path *	合并 LLaMA 生成 Vicuna 模型的路径
--device cpu	CPU 运行
--load-8bit	量化，把32位的浮点参数压缩成８位，速度会变快，运存或显存会变小，但智力下降如：python -m fastchat.serve.cli --model-path ./vicuna-7b --load-8bit

2）Api 模式


===>> 窗口运行（需三个窗口执行指令）
=> 窗口 1
# python -m fastchat.serve.controller
=> 窗口 2
# python -m fastchat.serve.model_worker --model-name 'vicuna-7b' --model-path ./vicuna-7b
=> 窗口 3
# python -m fastchat.serve.openai_api_server --host localhost --port 8000
 
===>> 后台运行
# nohup python -m fastchat.serve.controller > controller.log 2>&1 &
# nohup python -m fastchat.serve.model_worker --model-name 'vicuna-7b' --model-path ./vicuna-7b > model_worker.log 2>&1 &
# nohup python -m fastchat.serve.openai_api_server --host localhost --port 8000 > openai_api_server.log 2>&1 &

测试方法详见：openai_api.md

在线 Api 文档 >>> 可能需要翻墙

3）Web 模式


===>> 窗口运行（需三个窗口执行指令）
=> 窗口 1
# python -m fastchat.serve.controller
=> 窗口 2
# python -m fastchat.serve.model_worker --model-name 'vicuna-7b' --model-path ./vicuna-7b
=> 窗口 3
# python -m fastchat.serve.gradio_web_server --host localhost --port 8000
 
===>> 后台运行
# nohup python -m fastchat.serve.controller > controller.log 2>&1 &
# nohup python -m fastchat.serve.model_worker --model-name 'vicuna-7b' --model-path ./vicuna-7b > model_worker.log 2>&1 &
# nohup python -m fastchat.serve.gradio_web_server --host localhost --port 8000 > gradio_web_server.log 2>&1 &

声明：本文内容由网友自发贡献，不代表【wpsshop博客】立场，版权归原作者所有，本站不承担相应法律责任。如您发现有侵权的内容，请联系我们。转载请注明出处：https://www.wpsshop.cn/w/从前慢现在也慢/article/detail/787402

Vicuna 小羊驼（部署 + 运行）_vicuna 本地部署

文章转载自【最新请看】

1、解决 Git 下载大文件

2、获取 LLaMA 权重文件

3、获取 Delta 权重文件

4、安装所需环境

5、转换 LLaMA 模型

6、合并 LLaMA 生成 Vicuna 模型

7、启动程序

1）命令行模式

2）Api 模式

3）Web 模式