
Multi-label Classification with BERT; Fine-Grained Sentiment Analysis from AI Challenger

Introduction

With this repository, you will be able to train a multi-label classifier with BERT and deploy BERT for online prediction.

You can also find a short tutorial on how to use BERT with Chinese: BERT short chinese tutorial

Basic Ideas

Add something here.

Experiment on New Models

For more, check model/bert_cnn_fine_grain_model.py

Performance

| Model | F1 Score |
| --- | --- |
| TextCNN (No-pretrain) | 0.678 |
| TextCNN (Pretrain-Finetuning) | 0.685 |
| Bert (base_model_zh) | ADD A NUMBER HERE |
| Bert (base_model_zh, pre-train on corpus) | ADD A NUMBER HERE |

Notice: F1 Score is reported on the validation set.

Usage

Bert for Multi-label Classification [data for fine-tuning and pre-train]

export BERT_BASE_DIR=BERT_BASE_DIR/chinese_L-12_H-768_A-12
export TEXT_DIR=TEXT_DIR

nohup python run_classifier_multi_labels_bert.py \
  --task_name=sentiment_analysis \
  --do_train=true \
  --do_eval=true \
  --data_dir=$TEXT_DIR \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=512 \
  --train_batch_size=4 \
  --learning_rate=2e-5 \
  --num_train_epochs=3 \
  --output_dir=./checkpoint_bert &

1. First, download the pre-trained model from Google and put it in a folder (e.g. BERT_BASE_DIR): chinese_L-12_H-768_A-12 from bert.

2. Second, you need training data (e.g. train.tsv) and validation data (e.g. dev.tsv) placed under a folder (e.g. TEXT_DIR). You can also download data from here (data to train BERT for AI Challenger Sentiment Analysis); it contains processed data you can run for both fine-tuning on sentiment analysis and pre-training with BERT.

It was generated by following this notebook step by step: preprocess_char.ipynb

You can also generate the data yourself, as long as the format is compatible with the processor SentimentAnalysisFineGrainProcessor (alias: sentiment_analysis).

Data format: label1,label2,label3\t here is sentence or sentences\t

It contains only two columns: the first is the target (one or multiple labels), the second is the input string. There is no need to tokenize the text.

sample:"0_1,1_-2,2_-2,3_-2,4_1,5_-2,6_-2,7_-2,8_1,9_1,10_-2,11_-2,12_-2,13_-2,14_-2,15_1,16_-2,17_-2,18_0,19_-2 浦东五莲路站,老饭店福瑞轩属于上海的本帮菜,交通方便,最近又重新装修,来拨草了,饭店活动满188元送50元钱,环境干净,简单。朋友提前一天来预订包房也没有订到,只有大堂,五点半到店基本上每个台子都客满了,都是附近居民,每道冷菜量都比以前小,味道还可以,热菜烤茄子,炒河虾仁,脆皮鸭,照牌鸡,小牛排,手撕腊味花菜等每道菜都很入味好吃,会员价划算,服务员人手太少,服务态度好,要能团购更好。可以用支付宝方便"

Check the sample data in the ./BERT_BASE_DIR folder.
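For illustration, here is a minimal sketch (not the repo's processor code) of parsing one such line into a multi-hot label vector; the 20-aspect / 4-sentiment layout and the parse_line helper are assumptions based on the AI Challenger fine-grained task:

import numpy as np

NUM_ASPECTS = 20
SENTIMENTS = ["1", "0", "-1", "-2"]  # positive, neutral, negative, not mentioned

def parse_line(line):
    """Split one train.tsv line into (multi-hot label vector, raw text)."""
    labels, text = line.rstrip("\n").split("\t")[:2]
    vector = np.zeros(NUM_ASPECTS * len(SENTIMENTS), dtype=np.float32)
    for token in labels.split(","):                # e.g. "0_1", "1_-2", ...
        aspect, sentiment = token.split("_", 1)
        vector[int(aspect) * len(SENTIMENTS) + SENTIMENTS.index(sentiment)] = 1.0
    return vector, text

# a full line carries one label per aspect (20 in total); shortened here
vec, text = parse_line("0_1,1_-2,18_0\t交通方便,环境干净。\t")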

For more detail, check create_model and SentimentAnalysisFineGrainProcessor in run_classifier.py.
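As a rough illustration of what makes this multi-label (not the repo's exact create_model), the classification head applies a sigmoid cross-entropy loss to the pooled [CLS] output instead of a softmax, so several labels can be active at once:

import tensorflow as tf  # TF 1.x, as used by the original BERT code

def multi_label_head(pooled_output, labels, num_labels, is_training):
    """pooled_output: [batch, hidden] from BertModel.get_pooled_output();
    labels: [batch, num_labels] multi-hot targets."""
    if is_training:
        pooled_output = tf.nn.dropout(pooled_output, keep_prob=0.9)
    logits = tf.layers.dense(pooled_output, num_labels)        # [batch, num_labels]
    probabilities = tf.nn.sigmoid(logits)                      # independent per label
    per_example_loss = tf.reduce_sum(
        tf.nn.sigmoid_cross_entropy_with_logits(
            labels=tf.cast(labels, tf.float32), logits=logits), axis=-1)
    loss = tf.reduce_mean(per_example_loss)
    return loss, per_example_loss, logits, probabilities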

Pre-train the BERT model based on the open-sourced model, then do the classification task

Generate raw data: [ADD SOMETHING HERE]

Make sure each line is a single sentence, with a blank line between documents.

You can find the generated data in the zip file; use write_pre_train_doc() from preprocess_char.ipynb.
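If you build the file yourself, here is a minimal sketch of the expected layout (write_pretrain_file and the naive sentence split are illustrative, not the notebook's actual write_pre_train_doc):

import re

def write_pretrain_file(documents, path):
    """documents: list of raw review strings; writes one sentence per line,
    with a blank line separating documents."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in documents:
            for sentence in re.split(r"[。!?!?]", doc):
                if sentence.strip():
                    f.write(sentence.strip() + "\n")
            f.write("\n")   # blank line marks the document boundary

write_pretrain_file(["交通方便,最近又重新装修。环境干净,简单。", "服务态度好!要能团购更好。"],
                    "bert_demo_pretrain.txt")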

Generate data for the pre-train stage:

export BERT_BASE_DIR=./BERT_BASE_DIR/chinese_L-12_H-768_A-12

nohup python create_pretraining_data.py \
  --input_file=./PRE_TRAIN_DIR/bert_*_pretrain.txt \
  --output_file=./PRE_TRAIN_DIR/tf_examples.tfrecord \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=512 \
  --max_predictions_per_seq=60 \
  --masked_lm_prob=0.15 \
  --random_seed=12345 \
  --dupe_factor=5 > nohup_pre.out &

Pre-train the model with the generated data:

python run_pretraining.py
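For example (the flags below follow the upstream run_pretraining.py; the output directory and step counts are placeholder values rather than settings from this repo, and max_seq_length / max_predictions_per_seq must match the create_pretraining_data.py run above):

python run_pretraining.py \
  --input_file=./PRE_TRAIN_DIR/tf_examples.tfrecord \
  --output_dir=./checkpoint_pretrain \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --train_batch_size=4 \
  --max_seq_length=512 \
  --max_predictions_per_seq=60 \
  --num_train_steps=100000 \
  --num_warmup_steps=10000 \
  --learning_rate=2e-5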

Fine-tuning:

python run_classifier.py
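When fine-tuning after this domain pre-training step, the command mirrors the fine-tuning command above; the main change is pointing --init_checkpoint at the newly pre-trained checkpoint instead of the Google release (the checkpoint path and output directory below are placeholders tied to the example above):

python run_classifier.py \
  --task_name=sentiment_analysis \
  --do_train=true \
  --do_eval=true \
  --data_dir=$TEXT_DIR \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=./checkpoint_pretrain/model.ckpt-100000 \
  --max_seq_length=512 \
  --train_batch_size=4 \
  --learning_rate=2e-5 \
  --num_train_epochs=3 \
  --output_dir=./checkpoint_bert_pretrained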

TextCNN

The cache file for the TextCNN model was generated by following the steps in preprocess_word.ipynb.

It contains everything you need to run TextCNN: the processed train/validation/test sets, the word vocabulary, and a dict mapping labels to indices.

Take train_valid_test_vocab_cache.pik and put it under the preprocess_word/ folder.

The raw data is also included in this zip file.
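A quick way to check the cache file is in place (the exact contents and their order are defined by preprocess_word.ipynb; this sketch only loads the pickle and reports what it holds):

import pickle

# load the cache produced by preprocess_word.ipynb
with open("preprocess_word/train_valid_test_vocab_cache.pik", "rb") as f:
    cache = pickle.load(f)

# expect processed train/valid/test sets, a word vocabulary and a label-to-index dict
print(type(cache))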

Pre-train TextCNN

Pre-train TextCNN with a masked language model:

python train_cnn_lm.py
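train_cnn_lm.py implements the objective; below is a rough sketch of how masked-language-model inputs and targets can be built (the mask id, masking ratio and helper name are illustrative, not the script's actual code):

import numpy as np

MASK_ID = 1        # hypothetical id of the [MASK] token in the vocabulary
MASK_PROB = 0.15   # fraction of positions to mask, as in BERT

def mask_tokens(token_ids, rng=np.random):
    """Return (masked inputs, targets); targets are -1 where no prediction is needed."""
    token_ids = np.asarray(token_ids)
    inputs = token_ids.copy()
    positions = rng.rand(len(token_ids)) < MASK_PROB     # choose positions to mask
    inputs[positions] = MASK_ID                          # replace them with [MASK]
    targets = np.where(positions, token_ids, -1)         # model must recover these
    return inputs, targets

x, y = mask_tokens([523, 87, 3301, 15, 998, 42])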

Fine-tuning for TextCNN:

python train_cnn_fine_grain.py

Deploy BERT for online prediction

With the session-and-feed style, you can easily deploy BERT.
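Below is a minimal sketch, assuming the fine-tuned graph can be restored from ./checkpoint_bert and exposes input_ids / input_mask / segment_ids placeholders plus a probabilities tensor; the tensor names are assumptions, so inspect the saved graph for the real ones:

import tensorflow as tf  # TF 1.x, as used by the original BERT code

CHECKPOINT_DIR = "./checkpoint_bert"
ckpt = tf.train.latest_checkpoint(CHECKPOINT_DIR)

# rebuild the graph from the meta file and restore the fine-tuned weights
saver = tf.train.import_meta_graph(ckpt + ".meta")
sess = tf.Session()
saver.restore(sess, ckpt)

graph = tf.get_default_graph()
# tensor names below are assumptions; list graph operations to find the real ones
input_ids = graph.get_tensor_by_name("input_ids:0")
input_mask = graph.get_tensor_by_name("input_mask:0")
segment_ids = graph.get_tensor_by_name("segment_ids:0")
probabilities = graph.get_tensor_by_name("probabilities:0")

def predict(ids, mask, seg):
    # ids / mask / seg: int arrays of shape [batch_size, max_seq_length]
    return sess.run(probabilities, feed_dict={input_ids: ids,
                                              input_mask: mask,
                                              segment_ids: seg})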

Reference
