Download the sentence-transformers model from the Hugging Face website
1. Import the required libraries
- from transformers import AutoTokenizer, AutoModel
- import numpy as np
- import torch
- import torch.nn.functional as F
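These imports come from the transformers, torch, and numpy packages; if they are not installed yet, a plain pip install transformers torch numpy should be enough.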
2. Load the pretrained model
- path = 'D:/Model/sentence-transformers/all-MiniLM-L6-v2'
- tokenizer = AutoTokenizer.from_pretrained(path)
- model = AutoModel.from_pretrained(path)
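If the model has not been downloaded yet, transformers can also fetch it straight from the Hugging Face Hub by repository id instead of a local path; a minimal sketch (hub_id is just an illustrative variable name, and the repository id corresponds to the local folder used above):
- # Alternative: load directly from the Hugging Face Hub (downloads and caches on first use)
- hub_id = 'sentence-transformers/all-MiniLM-L6-v2'
- tokenizer = AutoTokenizer.from_pretrained(hub_id)
- model = AutoModel.from_pretrained(hub_id)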
3. Define mean pooling
- def mean_pooling(model_output, attention_mask):
-     # First element of model_output contains all token embeddings
-     token_embeddings = model_output[0]
-     input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
-     return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
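Mean pooling averages the token embeddings of each sentence, counting only the positions where the attention mask is 1, so padding tokens do not dilute the average; the clamp only guards against division by zero. A toy check of the function, with made-up tensors purely for illustration:
- # Toy check: 1 sentence, 3 token positions (the last one is padding), hidden size 2
- toy_tokens = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
- toy_mask = torch.tensor([[1, 1, 0]])
- mean_pooling((toy_tokens,), toy_mask)  # tensor([[2., 3.]]), the padded position is ignored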
4. Embed the sentences
- sentences = ['loved thisand know really bought wanted see pictures myselfIm lucky enough someone could justify buying present',
- 'issue pages stickers restuck really used configurations made regular pages rather taking pieces robot back',
- 'stickers dont stick well first time placing',
- 'Great fun grandson loves robots',
- 'would suggest younger kids son 3']
- encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
-
- with torch.no_grad():
-     model_output = model(**encoded_input)
-
- sentence_embeddings1 = mean_pooling(model_output, encoded_input['attention_mask'])
- print("Sentence embeddings:")
- print(sentence_embeddings1)
- # Normalize embeddings
- sentence_embeddings2 = F.normalize(sentence_embeddings1, p=2, dim=1)
- print("Sentence embeddings:")
- print(sentence_embeddings2)
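Because sentence_embeddings2 is L2-normalized, the cosine similarity between any two sentences is just a dot product, so a single matrix product gives the full pairwise similarity matrix; a small sketch (similarity_matrix is an illustrative name):
- # Pairwise cosine-similarity matrix over all 5 sentences (values in [-1, 1])
- similarity_matrix = sentence_embeddings2 @ sentence_embeddings2.T
- print(similarity_matrix.shape)  # torch.Size([5, 5])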

5. Running results
6. Define the similarity between sentences
- def compute_sim_score(v1, v2):
-     return v1.dot(v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
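compute_sim_score mixes numpy and torch, which works here because the embeddings were produced under torch.no_grad(); an equivalent pure-torch variant (an illustrative helper, not part of the original steps) can use torch.nn.functional.cosine_similarity:
- # Equivalent cosine similarity computed entirely in torch
- def compute_sim_score_torch(v1, v2):
-     return F.cosine_similarity(v1.unsqueeze(0), v2.unsqueeze(0)).item()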
7. Compute sentence similarity
- #'issue pages stickers restuck really used configurations made regular pages rather taking pieces robot back'
- #'stickers dont stick well first time placing'
- compute_sim_score(sentence_embeddings1[1], sentence_embeddings1[2])
- # result: tensor(0.5126)
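The same score can be cross-checked against the normalized embeddings from step 4, where cosine similarity reduces to a plain dot product and should reproduce the value above:
- # Same pair of sentences, using the L2-normalized embeddings
- torch.dot(sentence_embeddings2[1], sentence_embeddings2[2])  # should also give about 0.5126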
8. Check the shape of the embeddings
- sentence_embeddings1.shape
- #torch.Size([5, 384])
Outlook and summary:
Next, try embedding review sentences from real users about the items.