Natural Language Processing with Python (2)

Part-of-speech tagging

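import nltk
from nltk import word_tokenize

# Requires the NLTK tokenizer and POS tagger data (install once with nltk.download()).
s = 'I was watching TV'
# pos_tag returns a list of (word, tag) pairs using the Penn Treebank tagset.
print(nltk.pos_tag(word_tokenize(s)))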

The Stanford tagger

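from nltk.tag.stanford import StanfordPOSTagger
import nltk

s = 'I was watching TV'
# Paths to the Stanford model and jar; adjust them to your local installation (Java is required).
stan_tagger = StanfordPOSTagger('models/english-bidirectional-distsim.tagger', 'stanford-postagger.jar')
tokens = nltk.word_tokenize(s)
print(stan_tagger.tag(tokens))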

A closer look at taggers

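from nltk.corpus import brown
import nltk

# Collect the tags of the Brown corpus 'news' category and count how often each occurs.
tags = [tag for (word, tag) in brown.tagged_words(categories='news')]
print(nltk.FreqDist(tags))
brown_tagged_sents = brown.tagged_sents(categories='news')
# Baseline: tag every token as 'NN' and measure accuracy against the gold tags.
default_tagger = nltk.DefaultTagger('NN')
# In recent NLTK releases, evaluate() is deprecated in favor of accuracy().
print(default_tagger.evaluate(brown_tagged_sents))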
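
To make the baseline concrete, here is a minimal sketch, independent of NLTK, of what the accuracy of a default tagger amounts to: the share of tokens whose gold tag equals the one fixed tag. The toy sentences and the function name default_tag_accuracy are assumptions made for illustration; they are not part of NLTK or the Brown corpus.

```python
def default_tag_accuracy(tagged_sents, default_tag):
    """Return the fraction of tokens whose gold tag equals default_tag."""
    total = 0
    correct = 0
    for sent in tagged_sents:
        for _word, gold_tag in sent:
            total += 1
            if gold_tag == default_tag:
                correct += 1
    # Guard against an empty corpus to avoid dividing by zero.
    return correct / total if total else 0.0


# Hypothetical toy data in the same (word, tag) format as brown.tagged_sents().
toy_sents = [
    [('the', 'AT'), ('dog', 'NN'), ('barked', 'VBD')],
    [('a', 'AT'), ('cat', 'NN'), ('slept', 'VBD'), ('.', '.')],
]

print(default_tag_accuracy(toy_sents, 'NN'))
```

On real data the score depends entirely on how frequent the chosen tag is, which is why the frequency distribution of tags is worth inspecting before picking the default.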

N-gram taggers

from nltk.corpus import brown
from nltk.tag import UnigramTagger, DefaultTagger, BigramTagger, TrigramTagger
import nltk

tags = [tag for (word, tag) in brown.tagged_words(categories='news')]
print(nltk.FreqDist(tags))
brown_tagged_sents = brown.tagged_sents(categories='news')
default_tagger = DefaultTagger('NN')

# Train on the first 90% of the sentences and hold out the last 10% for testing.
split = int(len(brown_tagged_sents) * 0.9)
train_data = brown_tagged_sents[:split]
test_data = brown_tagged_sents[split:]

# Each tagger backs off to the less specific one when it has not seen the current context.
unigram_tagger = UnigramTagger(train_data, backoff=default_tagger)
print(unigram_tagger.evaluate(test_data))
bigram_tagger = BigramTagger(train_data, backoff=unigram_tagger)
print(bigram_tagger.evaluate(test_data))
trigram_tagger = TrigramTagger(train_data, backoff=bigram_tagger)
print(trigram_tagger.evaluate(test_data))
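
The backoff idea can be sketched without NLTK. The following is a simplified illustration, not NLTK's implementation: a unigram model remembers the most frequent tag of each training word and falls back to a default tag for unseen words. The names train_unigram and tag_with_backoff and the toy training data are hypothetical.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}


def tag_with_backoff(words, unigram_model, default_tag='NN'):
    """Tag known words from the model; back off to default_tag otherwise."""
    return [(w, unigram_model.get(w, default_tag)) for w in words]


# Hypothetical toy training data in (word, tag) format.
toy_train = [
    [('the', 'AT'), ('dog', 'NN'), ('runs', 'VBZ')],
    [('the', 'AT'), ('run', 'NN'), ('ends', 'VBZ')],
    [('dogs', 'NNS'), ('run', 'VB')],
    [('they', 'PPSS'), ('run', 'VB')],
]

model = train_unigram(toy_train)
print(tag_with_backoff(['the', 'dogs', 'run', 'home'], model))
```

Here 'run' receives its majority tag and the unseen word 'home' receives the default. Bigram and trigram taggers extend the same idea by also conditioning on the preceding tags, which makes unseen contexts more common and backoff more important.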