SentenceTransformers is a Python library for sentence, text, and image embeddings. It can compute text embeddings for more than 100 languages, and these embeddings are easy to use for common tasks such as semantic textual similarity, semantic search, and paraphrase mining. The framework is built on PyTorch and Transformers and provides a large collection of pre-trained models for a variety of tasks. It is also easy to fine-tune your own models.
### 1. Install

```shell
pip install -U sentence-transformers
```
### 2. Computing Sentence Embeddings

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Our sentences we like to encode
sentences = ['This framework generates embeddings for each input sentence',
             'Sentences are passed as a list of strings.',
             'The quick brown fox jumps over the lazy dog.']

# Sentences are encoded by calling model.encode()
embeddings = model.encode(sentences)

# Print the embeddings
for sentence, embedding in zip(sentences, embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding)
    print("")
```

Note: the first call downloads the model from the Hugging Face Hub, so it can fail when the network connection is unreliable.
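Under the hood, models like all-MiniLM-L6-v2 produce one vector per token and then pool them into a single sentence vector, typically by mean pooling over the non-padding tokens. The sketch below illustrates that pooling step in plain NumPy; the function name, shapes, and toy values are illustrative assumptions, not the library's internals.

```python
import numpy as np

def mean_pooling(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)      # sum of real-token vectors
    counts = np.clip(mask.sum(axis=1), 1e-9, None)      # number of real tokens
    return summed / counts

# Toy example: 1 sentence, 3 token positions (the last is padding), dim=2
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pooling(tokens, mask))  # [[2. 3.]] — mean of the two real tokens
```

The padding vector `[9.0, 9.0]` is excluded by the mask, so only the two real tokens contribute to the average.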
### 3. Semantic Textual Similarity

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Two lists of sentences
sentences1 = ['The cat sits outside',
              'A man is playing guitar',
              'The new movie is awesome']

sentences2 = ['The dog plays in the garden',
              'A woman watches TV',
              'The new movie is so great']

# Compute embeddings for both lists
embeddings1 = model.encode(sentences1, convert_to_tensor=True)
embeddings2 = model.encode(sentences2, convert_to_tensor=True)

# Compute cosine similarities
cosine_scores = util.cos_sim(embeddings1, embeddings2)

# Output the pairs with their score
for i in range(len(sentences1)):
    print("{} \t\t {} \t\t Score: {:.4f}".format(
        sentences1[i], sentences2[i], cosine_scores[i][i]))
```
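For reference, the pairwise cosine-similarity matrix that `util.cos_sim` returns can be sketched in plain NumPy. This is a minimal illustration of the math, not the library's implementation:

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the (len(a), len(b)) matrix of pairwise cosine similarities."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)  # unit-length rows
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T  # dot products of unit vectors = cosines

# Toy 2-D embeddings
a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[1.0, 0.0], [1.0, 1.0]])
print(cos_sim(a, b))
# row 0: [1.0, 0.7071...]   row 1: [0.0, 0.7071...]
```

Entry `[i][j]` is the cosine similarity between `a[i]` and `b[j]`, which is why the tutorial above reads the diagonal `cosine_scores[i][i]` to score each sentence pair.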

https://www.sbert.net/examples/applications/computing-embeddings/README.html