ChatGLM-6B-INT4 is the quantized weight release of ChatGLM-6B. Specifically, the 28 GLM Blocks in ChatGLM-6B are quantized to INT4, while the Embedding and LM Head are left unquantized. In theory the quantized model can run inference in about 6 GB of VRAM (or RAM when running on CPU), which makes deployment on embedded devices such as a Raspberry Pi feasible.
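To see where the ~6 GB figure comes from, here is a back-of-the-envelope estimate. The parameter split between the GLM Blocks and the Embedding/LM Head below is a rough assumption for illustration, not an exact count from the model:

```python
# Rough weight-memory estimate for ChatGLM-6B-INT4.
# Assumptions (illustrative, not exact): ~6.2e9 parameters in total,
# of which ~0.9e9 sit in the Embedding + LM Head (kept in FP16) and
# the rest in the 28 GLM Blocks (quantized to 4 bits per weight).
params_total = 6.2e9
params_embed = 0.9e9                      # Embedding + LM Head (rough guess)
params_blocks = params_total - params_embed

bytes_blocks = params_blocks * 4 / 8      # INT4: 4 bits = 0.5 bytes/weight
bytes_embed = params_embed * 2            # FP16: 2 bytes/weight

total_gb = (bytes_blocks + bytes_embed) / 1e9
print(f"approx. weight memory: {total_gb:.1f} GB")
```

The weights alone land in the 4–5 GB range; activations and the KV cache account for the rest of the ~6 GB working set.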
When running on CPU, a CPU kernel is compiled automatically for your hardware. Make sure GCC and OpenMP are installed (usually already present on Linux; on Windows they must be installed manually) to get the best parallel performance.
huggingface-cli.exe download \
--local-dir-use-symlinks False \
--resume-download THUDM/chatglm-6b-int4 \
--local-dir /root/jupyter/models/chatglm-6b-int4
# Install sentencepiece
pip download -d /root/jupyter/pip sentencepiece
pip install --no-index --find-links=/root/jupyter/pip sentencepiece

# Pin the transformers version. Newer versions raise:
#   AttributeError: 'ChatGLMTokenizer' object has no attribute 'sp_tokenizer'
pip download -d /root/jupyter/pip transformers==4.33.2
pip install --no-index --find-links=/root/jupyter/pip transformers==4.33.2

# Pin the torch version. Newer versions warn:
#   /opt/conda/lib/python3.9/site-packages/transformers/utils/generic.py:311:
#   UserWarning: torch.utils._pytree._register_pytree_node is deprecated.
#   Please use torch.utils._pytree.register_pytree_node instead.
pip download -d /root/jupyter/pip torch==1.13.1
pip install --no-index --find-links=/root/jupyter/pip torch==1.13.1

# Install cpm_kernels
pip download -d /root/jupyter/pip cpm_kernels
pip install --no-index --find-links=/root/jupyter/pip cpm_kernels
from transformers import AutoTokenizer, AutoModel

# If you downloaded the weights with huggingface-cli above, you can pass the
# local path (/root/jupyter/models/chatglm-6b-int4) instead of the hub id.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
# .cpu().float() runs inference on CPU in FP32
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).cpu().float()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
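Note that `model.chat` returns both the reply and an updated `history`; feeding that history back in is what makes the conversation multi-turn. The control flow can be sketched with a stub in place of the real model (the `stub_chat` function and its `echo:` replies are hypothetical stand-ins, not part of the ChatGLM API):

```python
# Multi-turn driver sketch: thread `history` through successive chat calls.
def chat_loop(chat_fn, queries):
    """Feed each query in turn, passing the accumulated history back in."""
    history = []
    for q in queries:
        response, history = chat_fn(q, history)
    return history

def stub_chat(query, history):
    # Stand-in for: model.chat(tokenizer, query, history=history)
    response = f"echo: {query}"
    return response, history + [(query, response)]

history = chat_loop(stub_chat, ["你好", "请介绍一下你自己"])
# history is a list of (query, response) pairs, oldest turn first
```

For real use, replace `stub_chat` with a closure over `model.chat(tokenizer, ...)`; the threading of `history` stays the same.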