Research on Hindi Part-of-Speech Tagging Based on Deep Neural Network and Statistical Learning

Fund Project:

National Social Science Fund of China (17CTQ045); Guangzhou Science and Technology Program (202006020302); Guangdong Province Soft Science Fund (2019A101002108); Special Fund for the Key Field of "Artificial Intelligence" in Guangdong Provincial Universities (2019KZDZX1016); Guangdong University of Foreign Studies Characteristic Innovation Fund, faculty-student joint research category (19SS01)

    Abstract:

    Statistical models are limited by the size of the annotated corpus and cannot capture the contextual information of the tag sequence. To address this, a Hindi part-of-speech tagging model that combines deep learning and statistical learning is proposed. The model has a three-layer logical structure. First, in the word representation layer, a deep neural network framework is trained to produce the morphological features of Hindi words, and the word2vec method is trained on the corpus to generate low-dimensional, dense, real-valued word vectors that carry semantic information. Next, in the sequence representation layer, the morphological features and word vectors are fed as input to a deep neural network model, which is trained to obtain the information features of the input sequence. Finally, in the CRF inference layer, the output states of the deep neural network model and the current transition probability matrix serve as the parameters of the CRF model, which yields the optimal label sequence. In comparative experiments against other methods, the approach combining deep learning and statistical models significantly outperformed several other statistical models.
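The CRF inference step can be illustrated with a small sketch. The abstract does not say which decoding algorithm is used, so the code below assumes standard Viterbi decoding for a linear-chain CRF: per-position tag scores (standing in for the network's output states) are combined with a tag-to-tag transition matrix to find the highest-scoring label sequence. The function name `viterbi_decode`, the two-tag set, and all numeric scores are hypothetical values chosen for illustration, not taken from the paper.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at position t (assumed network output)
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    num_tags = len(transitions)
    # best[j] holds the best score of any path that ends in tag j so far.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(num_tags):
            # Choose the previous tag that maximizes the score of reaching tag j.
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the backpointers to the start.
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical two-tag example for a three-word sentence.
tags = ["NOUN", "VERB"]
emissions = [[2.0, 0.5], [1.0, 1.2], [0.3, 2.0]]
transitions = [[0.1, 0.8],   # NOUN -> NOUN, NOUN -> VERB
               [0.6, -0.5]]  # VERB -> NOUN, VERB -> VERB
print([tags[i] for i in viterbi_decode(emissions, transitions)])
```

In this toy input the second word scores slightly higher as VERB in isolation, but the transition scores make NOUN, NOUN, VERB the best overall sequence, which is the kind of sequence-level correction a CRF layer is meant to provide over per-word classification.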

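Cite this article

Wang Lianxi, Zhong Zhun, Ding Zengqiang, Deng Zhiyan, Li Xia. Research on Hindi Part-of-Speech Tagging Based on Deep Neural Network and Statistical Learning [J]. Journal of Hunan University of Technology, 2020, 34(3): 17-22.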

History
  • Received: 2020-04-17
  • Published online: 2020-05-26