On the Gemma model page on Hugging Face, click "term" to request access to Gemma:
https://huggingface.co/google/gemma-7b
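Then accept the terms.
Running the sample code from the Gemma page as-is fails, because the gated model cannot be downloaded without authentication: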
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")
- model = AutoModelForCausalLM.from_pretrained("google/gemma-7b")
-
- input_text = "Write me a poem about Machine Learning."
- input_ids = tokenizer(input_text, return_tensors="pt")
-
- outputs = model.generate(**input_ids)
- print(tokenizer.decode(outputs[0]))
At this point you need to supply your own Hugging Face access token:
- import os
- os.environ["HF_TOKEN"] = '....'
The token can be found in your Hugging Face account settings, under Access Tokens.
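Hard-coding the token in a script makes it easy to leak. A minimal sketch, assuming the token has already been exported as `HF_TOKEN` in the shell; `read_hf_token` is a hypothetical helper name, not part of `transformers`:

```python
import os


def read_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the access token stored in an environment variable.

    Hypothetical helper for illustration. It raises when the variable is
    unset or empty, so the failure shows up before any download starts.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set; export your Hugging Face token first")
    return token
```

The returned string could then be passed as `token=read_hf_token()` to `from_pretrained`, in place of a literal.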
- from transformers import AutoTokenizer, AutoModelForCausalLM
- '''
- AutoTokenizer loads a pretrained tokenizer.
- AutoModelForCausalLM loads a pretrained causal language model, the kind of model typically used for text generation.
- '''
-
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b",token='。。。')
- # load the pretrained tokenizer for gemma-2b
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b",token='。。。')
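- # load the pretrained gemma-2b language generation model
- '''
- To run text continuation with the other variants, everything else stays the same; only the pretrained model loaded here changes:
- "google/gemma-2b-it"
- "google/gemma-7b"
- "google/gemma-7b-it"
- '''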
-
-
-
- input_text = "Write me a poem about Machine Learning."
- # the initial input that the generated text will continue from
- input_ids = tokenizer(input_text, return_tensors="pt")
- # use the tokenizer loaded above to convert input_text into the numeric form the model understands (token ids)
- # return_tensors="pt" means the result is returned as PyTorch tensors
-
- outputs = model.generate(**input_ids)
- # generate text with the model from the encoded input_ids
-
- print(tokenizer.decode(outputs[0]))
- # decode the generated tokens back into human-readable text and print it

Multi-GPU
- '''
- Same as above
- '''
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto")
-
- input_text = "Write me a poem about Machine Learning."
- input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
-
- '''
- The rest is the same as above
- '''
Pinning to a single GPU
- '''
- Same as above
- '''
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="cuda:0")
-
- input_text = "Write me a poem about Machine Learning."
- input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
-
- '''
- The rest is the same as above
- '''
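The two variants above differ only in the `device_map` string. As a rough sketch, the choice could be made from the number of visible GPUs; `pick_device_map` is a hypothetical helper, and the `"cpu"` fallback is an addition that does not appear in the examples above:

```python
def pick_device_map(num_gpus: int) -> str:
    """Map a GPU count to a device_map string (hypothetical helper)."""
    if num_gpus <= 0:
        return "cpu"      # no GPU: keep the whole model on the CPU
    if num_gpus == 1:
        return "cuda:0"   # one GPU: pin the model to it
    return "auto"         # several GPUs: let accelerate spread the layers


print(pick_device_map(2))
```

In a real script the count would typically come from `torch.cuda.device_count()`, and the result would be passed as `device_map=pick_device_map(n)`.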
Everything else stays the same as in 2.1.1; only the outputs line changes:
- outputs = model.generate(**input_ids,max_length=100)
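Note that `max_length` bounds the whole sequence, prompt included, whereas `max_new_tokens` bounds only the newly generated part. A small sketch of the arithmetic; `new_token_budget` is a hypothetical helper and the prompt lengths are made-up numbers:

```python
def new_token_budget(prompt_len: int, max_length: int) -> int:
    """Tokens that generation may still add when capped by max_length.

    max_length covers prompt + generated tokens, so the room left for new
    tokens shrinks as the prompt grows. Illustrative helper only.
    """
    return max(max_length - prompt_len, 0)


print(new_token_budget(prompt_len=9, max_length=100))    # room for 91 new tokens
print(new_token_budget(prompt_len=120, max_length=100))  # prompt alone exceeds the cap
```

For long prompts, `max_new_tokens` is usually the easier knob, since its effect does not depend on the prompt length.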
So far I have not worked out how to feed Gemma n different chats at the same time, so for now I only pass a single chat.
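One common approach for running several prompts through a decoder-only model in a single call is to left-pad them to the same length and pass an attention mask. This has not been tested against Gemma here; the sketch below only illustrates the padding step on plain Python lists, and both `left_pad` and `pad_id=0` are assumptions:

```python
from typing import List, Tuple


def left_pad(batch: List[List[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token-id lists to equal length and build the attention mask."""
    width = max(len(ids) for ids in batch)
    padded, mask = [], []
    for ids in batch:
        gap = width - len(ids)
        padded.append([pad_id] * gap + ids)      # padding goes on the left
        mask.append([0] * gap + [1] * len(ids))  # 0 = padding, 1 = real token
    return padded, mask


ids, attention = left_pad([[2, 5, 6], [2, 7]])
print(ids)
print(attention)
```

With `transformers`, the tokenizer can usually do this itself when it is called on a list of prompts with `padding=True` after setting `tokenizer.padding_side = "left"`.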
As with text generation, load a tokenizer and a CausalLM from the pretrained model.
- # pip install accelerate
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="cuda:0")
- chat=[
- {"role": "user", "content": "I am going to Paris, what should I see?"},
- {
- "role": "assistant",
- "content": """\
- Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:
- 1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.
- 2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.
- 3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.
- These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.""",
- },
- {"role": "user", "content": "What is so great about #1?"},
- ]
-
- prompt = tokenizer.apply_chat_template(chat,
- tokenize=False,
- add_generation_prompt=True)
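- # tokenize=False: controls whether the text is tokenized after the template is applied. False returns the rendered string without tokenizing it
-
- # add_generation_prompt=True: controls whether a generation prompt is appended to the rendered text
- # True appends a marker that tells the model to produce the next turn
- # the marker appended is: <start_of_turn>model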
-
- print(prompt)
- '''
- <bos><start_of_turn>user
- I am going to Paris, what should I see?<end_of_turn>
- <start_of_turn>model
- Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:
- 1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.
- 2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.
- 3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.
- These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.<end_of_turn>
- <start_of_turn>user
- What is so great about #1?<end_of_turn>
- <start_of_turn>model
- '''
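To make the layout of the printed prompt easier to follow, here is a hand-written approximation of it in plain Python. It is not the tokenizer's actual template code; `render_gemma_chat` is a hypothetical name, and the sketch skips any validation the real template may perform:

```python
from typing import Dict, List


def render_gemma_chat(chat: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Approximate the prompt layout shown in the printed output above."""
    parts = ["<bos>"]
    for message in chat:
        # In the printed prompt, assistant turns are labelled "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a new model turn so that generation continues as the model.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


demo = [{"role": "user", "content": "What is so great about #1?"}]
print(render_gemma_chat(demo))
```

Each turn is wrapped in `<start_of_turn>` and `<end_of_turn>`, and the trailing `<start_of_turn>model` line is what `add_generation_prompt=True` contributes.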

- inputs = tokenizer.encode(prompt,
- add_special_tokens=False,
- return_tensors="pt")
- inputs
- '''
- tensor([[ 2, 106, 1645, 108, 235285, 1144, 2319, 577, 7127,
- 235269, 1212, 1412, 590, 1443, 235336, 107, 108, 106,
- 2516, 108, 29437, 235269, 573, 6037, 576, 6081, 235269,
- 603, 3836, 604, 1277, 24912, 16333, 235269, 3096, 52054,
- 235269, 13457, 82625, 235269, 578, 23939, 13795, 235265, 5698,
- 708, 1009, 576, 573, 2267, 39664, 577, 1443, 575,
- 7127, 235292, 108, 235274, 235265, 714, 125957, 22643, 235292,
- 714, 34829, 125957, 22643, 603, 974, 576, 573, 1546,
- 93720, 82625, 575, 573, 2134, 578, 6952, 79202, 7651,
- 576, 573, 3413, 235265, 108, 235284, 235265, 714, 91182,
- 9850, 235292, 714, 91182, 603, 974, 576, 573, 2134,
- 235303, 235256, 10155, 578, 1546, 10964, 52054, 235269, 12986,
- 671, 20110, 5488, 576, 3096, 578, 51728, 235269, 3359,
- 573, 37417, 25380, 235265, 108, 235304, 235265, 32370, 235290,
- 76463, 41998, 235292, 1417, 4964, 57046, 603, 974, 576,
- 573, 1546, 10964, 82625, 575, 7127, 578, 603, 3836,
- 604, 1277, 60151, 16333, 578, 24912, 44835, 5570, 11273,
- 235265, 108, 8652, 708, 1317, 476, 2619, 576, 573,
- 1767, 39664, 674, 7127, 919, 577, 3255, 235265, 3279,
- 712, 1683, 577, 1443, 578, 749, 235269, 665, 235303,
- 235256, 793, 5144, 674, 7127, 603, 974, 576, 573,
- 1546, 5876, 18408, 42333, 575, 573, 2134, 235265, 107,
- 108, 106, 1645, 108, 1841, 603, 712, 1775, 1105,
- 1700, 235274, 235336, 107, 108, 106, 2516, 108]])
- '''
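The prompt string already begins with `<bos>`, which is the id 2 at the start of the tensor above. That is presumably why `add_special_tokens=False` is passed: encoding with special tokens enabled would likely prepend a second `<bos>`. A small sanity check on plain lists; `has_double_bos` is a hypothetical helper, and `bos_id=2` is read off the output above:

```python
from typing import List


def has_double_bos(ids: List[int], bos_id: int = 2) -> bool:
    """Return True when a sequence starts with two consecutive BOS ids."""
    return len(ids) >= 2 and ids[0] == bos_id and ids[1] == bos_id


print(has_double_bos([2, 106, 1645, 108]))     # first ids of the tensor above
print(has_double_bos([2, 2, 106, 1645, 108]))  # what a duplicated <bos> would look like
```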

As with text generation, this step also uses model.generate:
- outputs = model.generate(input_ids=inputs.to(model.device),
- max_new_tokens=500)
- print(tokenizer.decode(outputs[0]))
- '''
- <bos><start_of_turn>user
- I am going to Paris, what should I see?<end_of_turn>
- <start_of_turn>model
- Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:
- 1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.
- 2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.
- 3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.
- These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.<end_of_turn>
- <start_of_turn>user
- What is so great about #1?<end_of_turn>
- <start_of_turn>model
- The Eiffel Tower is one of the most iconic landmarks in the world and offers breathtaking views of the city. It is a symbol of French engineering and architecture and is a must-see for any visitor to Paris.<eos>
- '''
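As the decoded output shows, the result contains the whole prompt followed by the new reply. To keep only the reply, the prompt ids can be sliced off before decoding. A minimal sketch on plain lists, with made-up ids; `strip_prompt` is a hypothetical helper:

```python
from typing import List


def strip_prompt(output_ids: List[int], prompt_len: int) -> List[int]:
    """Keep only the ids that were generated after the prompt."""
    return output_ids[prompt_len:]


prompt_ids = [2, 106, 1645, 108]      # stand-in for the encoded prompt
output_ids = prompt_ids + [651, 1]    # stand-in for what generate returns
print(strip_prompt(output_ids, len(prompt_ids)))
```

With the tensors from the code above, the equivalent slice would be `outputs[0][inputs.shape[-1]:]`, optionally decoded with `skip_special_tokens=True` to drop markers such as `<eos>`.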
