SentenceTransformers is a Python library for sentence, text, and image embeddings. It can compute text embeddings for more than 100 languages, and these embeddings are easy to use for common tasks such as semantic textual similarity, semantic search, and paraphrase mining. The framework is built on PyTorch and Transformers and provides a large collection of pre-trained models for a variety of tasks. It is also easy to fine-tune your own models.
### 1. Install

```shell
pip install -U sentence-transformers
```
### 2. Computing Sentence Embeddings

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Our sentences we like to encode
sentences = ['This framework generates embeddings for each input sentence',
             'Sentences are passed as a list of strings.',
             'The quick brown fox jumps over the lazy dog.']

# Sentences are encoded by calling model.encode()
embeddings = model.encode(sentences)

# Print the embeddings
for sentence, embedding in zip(sentences, embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding)
    print("")
```

Note: the first call downloads the model from the Hugging Face Hub, so it can fail when the network connection is unreliable.
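Under the hood, models like all-MiniLM-L6-v2 produce one vector per token and then pool them into a single sentence vector, typically by mean pooling over the non-padding tokens. The sketch below illustrates that pooling step in plain NumPy; the function name, shapes, and toy values are illustrative assumptions, not the library's internals.

```python
import numpy as np

def mean_pooling(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)      # sum of real-token vectors
    counts = np.clip(mask.sum(axis=1), 1e-9, None)      # number of real tokens
    return summed / counts

# Toy example: 1 sentence, 3 token positions (the last is padding), dim=2
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pooling(tokens, mask))  # [[2. 3.]] — mean of the two real tokens
```

The padding vector `[9.0, 9.0]` is excluded by the mask, so only the two real tokens contribute to the average.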
### 3. Semantic Textual Similarity

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Two lists of sentences
sentences1 = ['The cat sits outside',
              'A man is playing guitar',
              'The new movie is awesome']

sentences2 = ['The dog plays in the garden',
              'A woman watches TV',
              'The new movie is so great']

# Compute embeddings for both lists
embeddings1 = model.encode(sentences1, convert_to_tensor=True)
embeddings2 = model.encode(sentences2, convert_to_tensor=True)

# Compute cosine similarities
cosine_scores = util.cos_sim(embeddings1, embeddings2)

# Output the pairs with their score
for i in range(len(sentences1)):
    print("{} \t\t {} \t\t Score: {:.4f}".format(
        sentences1[i], sentences2[i], cosine_scores[i][i]))
```
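For reference, the pairwise cosine-similarity matrix that `util.cos_sim` returns can be sketched in plain NumPy. This is a minimal illustration of the math, not the library's implementation:

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the (len(a), len(b)) matrix of pairwise cosine similarities."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)  # unit-length rows
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T  # dot products of unit vectors = cosines

# Toy 2-D embeddings
a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[1.0, 0.0], [1.0, 1.0]])
print(cos_sim(a, b))
# row 0: [1.0, 0.7071...]   row 1: [0.0, 0.7071...]
```

Entry `[i][j]` is the cosine similarity between `a[i]` and `b[j]`, which is why the tutorial above reads the diagonal `cosine_scores[i][i]` to score each sentence pair.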

https://www.sbert.net/examples/applications/computing-embeddings/README.html