A Multi-Modal Named Entity Recognition Method Based on Visual and Textual Semantic Enhancement

Fund Project: National Natural Science Foundation of China (52272347); Key Project of the Scientific Research Fund of the Hunan Provincial Department of Education (22A0408)

    Abstract:

    To address the partial semantic loss that occurs when visual and textual features are fused, which causes visual information to supplement textual information with considerable deviation, a multimodal named entity recognition method based on visual and textual semantic enhancement is proposed. The method combines BERT text feature extraction with CLIP (contrastive language-image pre-training) visual feature extraction, and a feature interaction unit based on a collaborative cross-attention mechanism is designed to strengthen the semantic relationship between visual and textual information. CLIP is pre-trained in a contrastive learning framework that optimizes the model to match visual inputs with their corresponding text descriptions, maximizing the similarity of positive samples (matched visual-text pairs) while minimizing the similarity of negative samples (mismatched visual-text pairs). The general-domain datasets TWITTER-2015 and TWITTER-2017 are used as the experimental datasets. The experimental results show that, compared with traditional methods, the model significantly improves accuracy, recall, and F1 score on the multimodal named entity recognition task.
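
The contrastive objective mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a generic symmetric contrastive loss in the style of CLIP, written in plain Python. The function names (`l2_normalize`, `cross_entropy_diag`, `clip_style_loss`), the toy 2-dimensional embeddings, and the fixed temperature of 0.07 are all assumptions made for illustration.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    # Mean cross-entropy where the correct column for row i is column i,
    # i.e. the i-th visual input is paired with the i-th text.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def clip_style_loss(image_embs, text_embs, temperature=0.07):
    # temperature is a hypothetical fixed value chosen for this sketch.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the visual-to-text and text-to-visual directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

# Toy batch: two visual embeddings and two text embeddings.
visual = [[1.0, 0.0], [0.0, 1.0]]
matched = clip_style_loss(visual, [[0.9, 0.1], [0.1, 0.9]])
mismatched = clip_style_loss(visual, [[0.1, 0.9], [0.9, 0.1]])
```

In this toy batch the loss is lower when each visual embedding is paired with its most similar text embedding (`matched`) than when the pairs are swapped (`mismatched`), which is the behaviour the abstract describes as maximizing positive-pair similarity while minimizing negative-pair similarity.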

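Cite this article

Man Fangteng, Zhu Yanhui, Zhang Zhixuan, Ying Xujian, Chen Hao. A Multi-Modal Named Entity Recognition Method Based on Visual and Textual Semantic Enhancement [J]. Journal of Hunan University of Technology, 2025, 39(1): 64-71.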

History
  • Received: 2024-03-30
  • Published online: 2024-11-04