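Create the environment variable OPENNLP_HOME, pointing to the OpenNLP installation directory:
Variable name: OPENNLP_HOME
Variable value: E:\Software\NLP\apache-opennlp1.9.1
Then append these entries to PATH:
%OPENNLP_HOME%\lib;
%OPENNLP_HOME%\bin;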
Running a tool, for example SimpleTokenizer on sentences.txt:
Linux (from the bin directory):
./opennlp SimpleTokenizer < sentences.txt
Windows:
opennlp.bat SimpleTokenizer < sentences.txt
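When the tools are driven from a script, the only platform difference between the two commands is the launcher name. A minimal Python sketch, assuming the launchers sit in the bin directory under OPENNLP_HOME (the directory added to PATH); the helper names `opennlp_command` and `run_tool` are hypothetical, not part of OpenNLP.

```python
import os
import subprocess
import sys


def opennlp_command(tool, *args, home=None, platform=None):
    """Build the argument list for one OpenNLP CLI tool.

    Assumption: the launchers live in <OPENNLP_HOME>/bin, with opennlp.bat
    on Windows and the opennlp shell script everywhere else.
    """
    home = home if home is not None else os.environ["OPENNLP_HOME"]
    platform = platform if platform is not None else sys.platform
    launcher = "opennlp.bat" if platform.startswith("win") else "opennlp"
    return [os.path.join(home, "bin", launcher), tool, *args]


def run_tool(tool, *args, input_path, output_path, home=None):
    """Rough equivalent of: opennlp <tool> <args...> < input_path > output_path"""
    cmd = opennlp_command(tool, *args, home=home)
    # Raw bytes are passed through unchanged, so the tool's own
    # encoding handling applies exactly as with shell redirection.
    with open(input_path, "rb") as fin, open(output_path, "wb") as fout:
        subprocess.run(cmd, stdin=fin, stdout=fout, check=True)
```

For example, `run_tool("SimpleTokenizer", input_path="sentences.txt", output_path="tokens.txt")` mirrors the commands above, with tokens.txt as a hypothetical output file. Python can start a .bat file directly on Windows, so `shell=True` is not needed.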
Tools provided by the opennlp command:
LanguageDetector #language detection
LanguageDetectorTrainer #train a language detection model
LanguageDetectorConverter #convert Leipzig corpus data to the native OpenNLP format
LanguageDetectorCrossValidator #K-fold cross-validator
LanguageDetectorEvaluator #evaluate a language detection model
DictionaryBuilder #create a dictionary
SentenceDetector #sentence detection (splitting text into sentences)
SentenceDetectorTrainer
SentenceDetectorEvaluator
SentenceDetectorCrossValidator
SentenceDetectorConverter
SimpleTokenizer #character-class tokenizer
TokenizerME #tokenization with a trained (maximum entropy) model
TokenizerTrainer #train a tokenizer model
TokenizerMEEvaluator
TokenizerCrossValidator
TokenizerConverter #convert foreign data formats to the native OpenNLP format
DictionaryDetokenizer
TokenNameFinder #named entity recognition
TokenNameFinderTrainer
TokenNameFinderEvaluator
TokenNameFinderCrossValidator
TokenNameFinderConverter
CensusDictionaryCreator #convert 1990 US Census names into a dictionary
Doccat #document categorization
DoccatTrainer
DoccatCrossValidator
DoccatConverter
POSTagger #part-of-speech tagging
POSTaggerTrainer
POSTaggerEvaluator
POSTaggerCrossValidator
POSTaggerConverter
LemmatizerME #lemmatization
LemmatizerTrainerME
LemmatizerEvaluator
ChunkerME #chunking
ChunkerTrainerME
ChunkerEvaluator
ChunkerCrossValidator
ChunkerConverter
Parser #syntactic parsing
ParserTrainer
ParserEvaluator
ParserConverter
BuildModelUpdater #train and update the build model of a parser model
CheckModelUpdater #train and update the check model of a parser model
TaggerModelReplacer #replace the tagger model of a parser model
EntityLinker #link entities to an external data set
NGramLanguageModel
Usage: opennlp SentenceDetector model < sentences
Arguments description:
model
the sentence detector model file.
sentences
the text to split into sentences, read from standard input.
Example:
opennlp.bat SentenceDetector ch_sentence_detector.bin < sentences.txt > output.txt
Usage: opennlp SentenceDetectorTrainer[.irishsentencebank|.ad|.pos|.conllx|.namefinder|.parse|.moses|.conllu|.letsmt]
[-factory factoryName]
[-eosChars string]
[-abbDict path]
[-params paramsFile]
-lang language
-model modelFile
-data sampleData
[-encoding charsetName]
Arguments description:
-factory factoryName
A sub-class of SentenceDetectorFactory where to get implementation and resources.
-eosChars string
EOS characters.
-abbDict path
abbreviation dictionary in XML format.
-params paramsFile
training parameters file.
-lang language
language which is being processed.
-model modelFile
output model file.
-data sampleData
data to be used, usually a file name.
-encoding charsetName
encoding for reading and writing text, if absent the system default is used.
Example:
opennlp.bat SentenceDetectorTrainer -model ch_sentence_detector.bin -lang jpn -data ch_sentence_detector.train -encoding UTF-8
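Note: when training a Chinese model with the default sentence-ending characters, -lang must be set to jpn, because the Japanese defaults are the ones that cover full-width punctuation such as 。. Alternatively, specify the characters explicitly with -eosChars.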
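SentenceDetectorTrainer's default data format (no format suffix) is, as we understand it, plain text with one sentence per line, where an empty line marks a document boundary. A minimal Python sketch for producing a file like ch_sentence_detector.train from raw Chinese text, assuming 。！？ are the only sentence terminators; the function names are hypothetical. It mis-splits cases such as a closing quote after a full stop (。”), so inspect the output before training.

```python
# Assumed EOS set: full-width Chinese sentence-ending punctuation.
EOS_CHARS = "。！？"


def split_sentences(text, eos_chars=EOS_CHARS):
    """Cut text after every EOS character, keeping the character itself."""
    sentences, current = [], []
    for ch in text:
        current.append(ch)
        if ch in eos_chars:
            sentence = "".join(current).strip()
            if sentence:
                sentences.append(sentence)
            current = []
    tail = "".join(current).strip()
    if tail:  # trailing text without an EOS character
        sentences.append(tail)
    return sentences


def to_training_text(raw_text, eos_chars=EOS_CHARS):
    """One sentence per line; paragraphs separated by a blank line in the
    input stay separated by an empty line (a document boundary)."""
    blocks = []
    for paragraph in raw_text.split("\n\n"):
        # Chinese is written without spaces, so wrapped lines are joined directly.
        joined = "".join(line.strip() for line in paragraph.splitlines())
        sentences = split_sentences(joined, eos_chars)
        if sentences:
            blocks.append("\n".join(sentences))
    return "\n\n".join(blocks) + "\n"


def write_training_file(raw_text, path):
    # UTF-8 to match "-encoding UTF-8" in the training command.
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_training_text(raw_text))
```

For instance, `to_training_text("今天天气很好。我们去公园吧！你来吗？\n\n好的。")` yields three lines, an empty line, and then 好的。 on its own line.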
Usage: opennlp SentenceDetectorEvaluator[.nkjp|.irishsentencebank|.ad|.pos|.conllx|.namefinder|.parse|.moses|.conllu|.letsmt]
-model model
[-misclassified true|false]
-data sampleData
[-encoding charsetName]
Arguments description:
-model model
the model file to be evaluated.
-misclassified true|false
if true will print false negatives and false positives.
-data sampleData
data to be used, usually a file name.
-encoding charsetName
encoding for reading and writing text, if absent the system default is used.
Example:
opennlp.bat SentenceDetectorEvaluator -model ch_sentence_detector.bin -misclassified true -data sentences.txt -encoding UTF-8
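The evaluator reports precision, recall and F-measure, and -misclassified true prints the false negatives and false positives behind those numbers. A minimal Python sketch of the standard span-based computation; it illustrates the metrics and is not OpenNLP's code. It assumes the reference and predicted sentences concatenate to the same text with no separators between them (as with unspaced Chinese), and the function names are hypothetical.

```python
def sentence_spans(sentences):
    """Character offsets (start, end) of each sentence in the concatenated text."""
    spans, start = [], 0
    for s in sentences:
        spans.append((start, start + len(s)))
        start += len(s)
    return spans


def evaluate(reference, predicted):
    """Span-level precision, recall and F1 for sentence detection."""
    if "".join(reference) != "".join(predicted):
        raise ValueError("reference and predicted sentences cover different text")
    gold = set(sentence_spans(reference))
    pred = set(sentence_spans(predicted))
    tp = len(gold & pred)
    fp = len(pred - gold)  # predicted sentences absent from the reference
    fn = len(gold - pred)  # reference sentences the detector missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Merging 今天天气很好。 and 我们去公园吧！ into one predicted sentence costs one false positive and two false negatives: with 你来吗？ correct, precision is 0.5, recall 1/3 and F1 0.4.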
Copyright © 2003-2013 www.wpsshop.cn. All rights reserved.