An Emotion Recognition Method in Conversation Integrating MFCC and Wav2vec Features
CLC Number: TP391.4

Fund Projects: Science and Technology Plan of Housing and Urban-Rural Development of Anhui Province (2023-YF004, 2023-YF113); Open Fund of the Anhui Provincial Key Laboratory of Intelligent Building and Building Energy Saving, Anhui Jianzhu University (IBES2022ZR02)



Abstract:

To address the limited ability of traditional hand-crafted speech features to capture dynamic information, the Wav2vec 2.0 model is introduced to extract long-range dependencies from speech signals, and feature fusion is used to obtain a sufficient emotional feature representation. The most representative MFCC features are extracted from the speech signal, and Wav2vec features compensate for the deficiency of MFCC in capturing dynamic information, yielding richer and more representative speech emotion features. A cross-attention mechanism then fuses the acoustic features with contextual information to obtain a more comprehensive and accurate feature representation. Finally, a Transformer network predicts the emotional state. Experiments on the MELD and EEIDB datasets show that the proposed method achieves weighted F1-scores of 44.32% and 65.50%, respectively, verifying its effectiveness and superiority.
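The abstract gives no implementation details for its first stage, the two complementary feature streams. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it assumes librosa for MFCC extraction and the HuggingFace facebook/wav2vec2-base checkpoint for the self-supervised stream; the function names, the 40-coefficient setting, and the checkpoint choice are all illustrative.

# Minimal sketch of the two feature streams (assumed tooling: librosa + transformers).
import librosa
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

SAMPLE_RATE = 16_000  # wav2vec 2.0 checkpoints expect 16 kHz mono audio

def extract_mfcc(wav_path: str, n_mfcc: int = 40) -> torch.Tensor:
    """Hand-crafted stream: MFCC frames, returned as (time, n_mfcc)."""
    audio, _ = librosa.load(wav_path, sr=SAMPLE_RATE)
    mfcc = librosa.feature.mfcc(y=audio, sr=SAMPLE_RATE, n_mfcc=n_mfcc)
    return torch.from_numpy(mfcc).T

def extract_wav2vec(wav_path: str) -> torch.Tensor:
    """Self-supervised stream: wav2vec 2.0 last hidden states, (time, 768)."""
    audio, _ = librosa.load(wav_path, sr=SAMPLE_RATE)
    extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
    model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()
    inputs = extractor(audio, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, time, 768)
    return hidden.squeeze(0)

The two streams have different frame rates and dimensions, so before fusion they would in practice be projected to a common dimension and temporally aligned, for example by interpolation or pooling.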

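The fusion and prediction stages the abstract names can likewise be sketched. The following PyTorch sketch is an assumption, not the authors' architecture: acoustic features act as the query over contextual key/value features in nn.MultiheadAttention, and a two-layer Transformer encoder with mean pooling feeds a linear classifier; all dimensions, layer counts, and the 7-class output (matching MELD's seven emotion labels) are illustrative choices.

# Hedged sketch of cross-attention fusion and Transformer-based prediction (PyTorch).
import torch
import torch.nn as nn

class CrossAttentionEmotionClassifier(nn.Module):
    """Acoustic features attend over dialogue-context features; a small
    Transformer encoder and a linear head then produce emotion logits."""

    def __init__(self, dim: int = 256, heads: int = 4, num_classes: int = 7):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, acoustic: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # acoustic: (B, T_a, dim) utterance features; context: (B, T_c, dim) context features
        fused, _ = self.cross_attn(query=acoustic, key=context, value=context)
        encoded = self.encoder(fused)   # (B, T_a, dim)
        pooled = encoded.mean(dim=1)    # temporal mean pooling
        return self.head(pooled)        # (B, num_classes) emotion logits

# Example with random stand-in features (batch of 2, assumed 7 emotion classes):
model = CrossAttentionEmotionClassifier()
logits = model(torch.randn(2, 50, 256), torch.randn(2, 120, 256))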
Cite this article:

刘旭东, 王坤侠. An Emotion Recognition Method in Conversation Integrating MFCC and Wav2vec Features[J]. Journal of Hunan University of Technology, 2025, 39(6): 29-36.

History
  • Online publication date: 2025-06-17