A Medical Entity Recognition Model Based on SoftLexicon
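Fund projects: National Natural Science Foundation of China (61871432); Natural Science Foundation of Hunan Province (2020JJ6089); Key Project of the Scientific Research Fund of the Hunan Provincial Department of Education (19A133)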


    Abstract:

    In Chinese electronic medical record (EMR) named entity recognition, character-based NER methods lose word-level sequence information, while methods that recover it by introducing external lexicon resources are computationally inefficient. To address both problems, a medical entity recognition model based on SoftLexicon is proposed. First, each character in the input sequence is mapped to a dense vector. Next, an external lexicon is introduced to construct SoftLexicon features for each character, and these features are added to the corresponding character vector representation. Then, the enhanced character representations are fed into Bi-LSTM and CRF layers to obtain the final recognition result. The model effectively captures character features in the sentence sequence and extracts contextual dependencies, while also modelling the sequential order of label predictions. Experiments on the electronic medical record data provided by the CCKS-2020 medical named entity recognition evaluation task show that, compared with traditional character-based NER methods, the proposed method significantly improves both entity recognition performance and efficiency.
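The SoftLexicon feature construction summarised above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the original SoftLexicon formulation, in which each character collects the lexicon words matched in the sentence where that character is at the Begin, Middle, End or Single position, and each of the four sets is pooled with frequency weights 4·z(w)/Z (Z being the total frequency of all distinct words matched at that character). The helper names (`match_words`, `bmes_sets`, `softlexicon_feature`, `augment`), the brute-force matching with `max_len = 4`, the zero vector for empty sets, and the toy lexicon, frequencies and embeddings are all hypothetical; the paper's exact weighting, lexicon and dimensions may differ.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical helpers illustrating SoftLexicon; not the paper's published code.
WordSets = Dict[str, Set[str]]  # keys "B", "M", "E", "S"


def match_words(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Tuple[int, int]]:
    """Brute-force scan for every lexicon word in the sentence; spans are [start, end)."""
    spans = []
    n = len(sentence)
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            if sentence[i:j] in lexicon:
                spans.append((i, j))
    return spans


def bmes_sets(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[WordSets]:
    """For each character, collect matched words in which it is Begin/Middle/End/Single."""
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    for i, j in match_words(sentence, lexicon, max_len):
        word = sentence[i:j]
        if j - i == 1:
            sets[i]["S"].add(word)
        else:
            sets[i]["B"].add(word)
            sets[j - 1]["E"].add(word)
            for k in range(i + 1, j - 1):
                sets[k]["M"].add(word)
    return sets


def softlexicon_feature(char_sets: WordSets, freq: Dict[str, int],
                        word_emb: Dict[str, List[float]], dim: int) -> List[float]:
    """Frequency-weighted pooling of each B/M/E/S set, concatenated into 4 * dim values.

    Weight of word w: 4 * z(w) / Z, with Z the total frequency of all distinct words
    matched at this character (assumed, as in the original SoftLexicon proposal).
    """
    union = set().union(*char_sets.values())
    z_total = sum(freq.get(w, 0) for w in union)
    feature: List[float] = []
    for tag in "BMES":
        vec = [0.0] * dim  # empty sets stay as zero vectors in this sketch
        for w in char_sets[tag]:
            z = freq.get(w, 0)
            if z == 0:
                continue  # words with zero frequency contribute nothing
            weight = 4.0 * z / z_total
            for d in range(dim):
                vec[d] += weight * word_emb[w][d]
        feature.extend(vec)
    return feature


def augment(sentence: str, char_emb: Dict[str, List[float]], lexicon: Set[str],
            freq: Dict[str, int], word_emb: Dict[str, List[float]], dim: int,
            max_len: int = 4) -> List[List[float]]:
    """Concatenate each character embedding with its SoftLexicon feature."""
    sets = bmes_sets(sentence, lexicon, max_len)
    return [char_emb[c] + softlexicon_feature(s, freq, word_emb, dim)
            for c, s in zip(sentence, sets)]


if __name__ == "__main__":
    sentence = "患者有高血压病史"  # "the patient has a history of hypertension"
    lexicon = {"患者", "高血压", "血压", "病史"}  # toy lexicon
    freq = {"患者": 5, "高血压": 3, "血压": 1, "病史": 2}  # toy frequencies
    word_emb = {"患者": [0.5, 0.5], "高血压": [1.0, 0.0],
                "血压": [0.0, 1.0], "病史": [0.2, 0.8]}  # toy 2-d embeddings
    sets = bmes_sets(sentence, lexicon)
    # Character 压 ends both 高血压 and 血压, so only its E slot is non-zero.
    print(softlexicon_feature(sets[5], freq, word_emb, dim=2))
    # → [0.0, 0.0, 0.0, 0.0, 3.0, 1.0, 0.0, 0.0]
    char_emb = {c: [1.0] for c in sentence}  # toy 1-d character embeddings
    print(len(augment(sentence, char_emb, lexicon, freq, word_emb, dim=2)[0]))  # → 9
```

The augmented vectors would then be passed to the Bi-LSTM and CRF layers, which this sketch omits. Because the lexicon features are computed once per sentence and simply concatenated to the character embeddings, the sequence model needs no lattice-style structural change, which is the usual argument for SoftLexicon's efficiency.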

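Cite this article:

ZHANG Xu, ZHU Yanhui, LIANG Wentong, ZHAN Fei. A Medical Entity Recognition Model Based on SoftLexicon[J]. Journal of Hunan University of Technology, 2021, 35(5): 77-84.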

History
  • Received: 2020-12-25
  • Published online: 2021-07-21