OpenNLP Command Line

1 Installation

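  • Create the environment variable OPENNLP_HOME, pointing to the OpenNLP installation directory:
OPENNLP_HOME=E:\Software\NLP\apache-opennlp1.9.1
  • Append to the CLASSPATH variable: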
%OPENNLP_HOME%\lib;
  • Append to the Path variable:
%OPENNLP_HOME%\bin;
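
On Linux or macOS, the same setup can be sketched in a shell session. The installation path below is a hypothetical placeholder (the steps above only give a Windows path), so adjust it to wherever the OpenNLP distribution was unpacked.

```shell
# Hypothetical install location; replace with the real directory.
export OPENNLP_HOME="$HOME/apache-opennlp-1.9.1"

# Put the launcher script (bin/opennlp) on the search path.
export PATH="$PATH:$OPENNLP_HOME/bin"

# Report whether the launcher can now be found.
if command -v opennlp >/dev/null 2>&1; then
  echo "opennlp found at: $(command -v opennlp)"
else
  echo "opennlp not found; check OPENNLP_HOME ($OPENNLP_HOME)"
fi
```

Adding the two export lines to a shell profile such as ~/.bashrc makes the setting persist across sessions.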
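  • Usage
    On Linux, use the opennlp script in the bin directory; on Windows, use opennlp.bat.
    Example: if the current directory contains a file named sentences.txt, tokenize the sentences in it as follows: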

  Linux

./opennlp SimpleTokenizer < sentences.txt

  Windows

opennlp.bat SimpleTokenizer <sentences.txt
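
To try the command end to end, a small input file is enough. The sketch below writes a two-line sentences.txt (the sample text is invented for illustration) and runs SimpleTokenizer only if the opennlp launcher is on the PATH. The output file name tokens.txt is likewise just an assumption.

```shell
# Create a tiny sample input; the text is illustrative only.
cat > sentences.txt <<'EOF'
Hello, world. This is a small test.
OpenNLP reads text from standard input.
EOF

if command -v opennlp >/dev/null 2>&1; then
  # Tokens are written to standard output; redirect them to a file.
  opennlp SimpleTokenizer < sentences.txt > tokens.txt
else
  echo "opennlp is not on PATH; skipping tokenization" >&2
fi
```

SimpleTokenizer needs no model file, which makes it a convenient first check that the installation works.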

1.2 Tool List

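LanguageDetector                    # language detection
LanguageDetectorTrainer             # trains a language detection model
LanguageDetectorConverter           # converts Leipzig data format to native OpenNLP format
LanguageDetectorCrossValidator      # K-fold cross validator
LanguageDetectorEvaluator           # evaluates a language detection model

DictionaryBuilder                   # builds a dictionary

SentenceDetector                    # sentence splitting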
SentenceDetectorTrainer
SentenceDetectorEvaluator
SentenceDetectorCrossValidator
SentenceDetectorConverter

SimpleTokenizer                     # character class tokenizer
TokenizerME                         # learnable tokenizer
TokenizerTrainer                    # trains a tokenizer model
TokenizerMEEvaluator
TokenizerCrossValidator
TokenizerConverter                  # converts foreign data formats to native OpenNLP format
DictionaryDetokenizer

TokenNameFinder                     # named entity recognition
TokenNameFinderTrainer
TokenNameFinderEvaluator
TokenNameFinderCrossValidator
TokenNameFinderConverter
CensusDictionaryCreator             # converts 1990 US Census names into a dictionary

Doccat                              # document categorization
DoccatTrainer
DoccatCrossValidator
DoccatConverter

POSTagger                           # part-of-speech tagging
POSTaggerTrainer
POSTaggerEvaluator
POSTaggerCrossValidator
POSTaggerConverter

LemmatizerME                        # lemmatization
LemmatizerTrainerME		
LemmatizerEvaluator

ChunkerME                           # chunking
ChunkerTrainerME
ChunkerEvaluator
ChunkerCrossValidator
ChunkerConverter

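Parser                              # syntactic parsing
ParserTrainer
ParserEvaluator
ParserConverter
BuildModelUpdater                   # trains and updates the build model in a parser model
CheckModelUpdater                   # trains and updates the check model in a parser model
TaggerModelReplacer                 # replaces the tagger model in a parser model

EntityLinker                        # links entities to an external data set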

NGramLanguageModel


1.3 Detailed Usage

1.3.1 Sentence Detector
  • SentenceDetector
Usage: opennlp SentenceDetector model < sentences

Arguments description:
	model
		the sentence detector model file.
	sentences
		the text to split into sentences, read from standard input.

 Example:

opennlp.bat SentenceDetector ch_sentence_detector.bin < sentences.txt > output.txt
  • SentenceDetectorTrainer
Usage: opennlp SentenceDetectorTrainer[.irishsentencebank|.ad|.pos|.conllx|.namefinder|.parse|.moses|.conllu|.letsmt]
		[-factory factoryName]
		[-eosChars string]
		[-abbDict path]
		[-params paramsFile]
		-lang language
		-model modelFile
		-data sampleData
		[-encoding charsetName]

Arguments description:
	-factory factoryName
		A sub-class of SentenceDetectorFactory where to get implementation and resources.
	-eosChars string
		EOS characters.
	-abbDict path
		abbreviation dictionary in XML format.
	-params paramsFile
		training parameters file.
	-lang language
		language which is being processed.
	-model modelFile
		output model file.
	-data sampleData
		data to be used, usually a file name.
	-encoding charsetName
		encoding for reading and writing text, if absent the system default is used.

 Example:

opennlp.bat SentenceDetectorTrainer -model ch_sentence_detector.bin -lang jpn -data ch_sentence_detector.train -encoding UTF-8

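Note: when training on Chinese text with the default end-of-sentence characters, -lang must be set to jpn. Alternatively, the end-of-sentence characters can be given explicitly with -eosChars.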
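
The file passed to -data is plain text. In OpenNLP's native sentence format, each line holds one sentence and an empty line marks a document boundary. The sketch below writes a tiny file in that layout as a starting point. The sample sentences are invented for illustration, and a usable model would need far more data than this.

```shell
# Write illustrative training data: one sentence per line, UTF-8 encoded.
# The empty line separates two documents.
cat > ch_sentence_detector.train <<'EOF'
今天天气很好。
我们去公园散步吧!
你吃过午饭了吗?

这是第二篇文档的第一句话。
EOF

# Count the sentences (non-empty lines) as a quick sanity check.
sentence_count=$(grep -cv '^$' ch_sentence_detector.train)
echo "training sentences: $sentence_count"
```

The resulting file is then passed to the trainer with -data ch_sentence_detector.train -encoding UTF-8, as in the example command.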

  • SentenceDetectorEvaluator
Usage: opennlp SentenceDetectorEvaluator[.nkjp|.irishsentencebank|.ad|.pos|.conllx|.namefinder|.parse|.moses|.conllu|.letsmt] 
		-model model 
		[-misclassified true|false]
		-data sampleData 
		[-encoding charsetName]

Arguments description:
        -model model
                the model file to be evaluated.
        -misclassified true|false
                if true will print false negatives and false positives.
        -data sampleData
                data to be used, usually a file name.
        -encoding charsetName
                encoding for reading and writing text, if absent the system default is used.

 Example:

opennlp.bat SentenceDetectorEvaluator -model ch_sentence_detector.bin -misclassified true -data sentences.txt -encoding UTF-8