Source | CSDN 原力计划 (Yuanli Plan) selected work
[1] Peters, Matthew E., et al. "Deep contextualized word representations." arXiv preprint arXiv:1802.05365 (2018).
[2] Howard, Jeremy, and Sebastian Ruder. "Universal language model fine-tuning for text classification." arXiv preprint arXiv:1801.06146 (2018).
[3] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems. 2017.
[4] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-Training. Technical report, OpenAI.
[5] Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
[6] Ming Zhou. "The Bright Future of ACL/NLP." Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. (2019).
[7] Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." 2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009.
[8] Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.
[9] Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.
[10] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In EMNLP.
[11] Oren Melamud, Jacob Goldberger, and Ido Dagan. 2016. context2vec: Learning generic context embedding with bidirectional LSTM. In CoNLL.
[12] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.
[13] 张俊林. "From Word Embedding to the BERT Model: A History of Pre-training Techniques in Natural Language Processing" (in Chinese). https://zhuanlan.zhihu.com/p/49271699
[14] Sebastian Ruder. "NLP's ImageNet moment has arrived." http://ruder.io/nlp-imagenet/. (2019)
[15] Liu, Yinhan, et al. "RoBERTa: A robustly optimized BERT pretraining approach." arXiv preprint arXiv:1907.11692 (2019).
[16] 郑坤. "Locating and Tuning Performance Problems When Training WDL Models with TensorFlow" (in Chinese). https://tech.meituan.com/2018/04/08/tensorflow-performance-bottleneck-analysis-on-hadoop.html
[17] Uber. "Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow". https://eng.uber.com/horovod/
[18] Goyal, Priya, et al. "Accurate, large minibatch SGD: Training ImageNet in 1 hour." arXiv preprint arXiv:1706.02677 (2017).
[19] Baidu. https://github.com/baidu-research/baidu-allreduce
[20] Micikevicius, Paulius, et al. "Mixed precision training." arXiv preprint arXiv:1710.03740 (2017).
[21] 仲远, 富峥, et al. "Meituan Food and Entertainment Knowledge Graph: Unveiling the Meituan Brain" (in Chinese). https://tech.meituan.com/2018/11/22/meituan-brain-nlp-01.html
[22] Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint arXiv:1503.02531 (2015).
[23] Google. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. https://openreview.net/pdf?id=H1eA7AEtvS. (2019)