Optimization and Implementation of FP-Growth Algorithm Based on Spark Platform

Fund Project:

Scientific Research Fund of Hunan Provincial Education Department (17C0009)

    Abstract:

    When mining massive data sets, the serial FP-Growth algorithm runs into memory bottlenecks or fails to complete the mining task. To address these problems, the Spark-based FP-Growth algorithm is optimized in two respects: the data grouping strategy and the item header table structure. First, an S-shaped grouping method is proposed that balances load weights across groups. Second, a new item header table structure is designed that includes a hash lookup table, which effectively reduces the time complexity of lookups. Experiments show that the optimized Spark-based FP-Growth algorithm (the OptFP-Spark algorithm) achieves a higher parallel speedup, better parallel mining results, and higher computational efficiency.
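The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the load-weight formula or the header table layout, so the weights here are simply passed in, and the names `s_shaped_groups`, `HeaderTable`, `lookup`, and the row layout are hypothetical. The grouping walks the items in descending weight order and reverses direction on every pass over the groups (a serpentine, or "S", order); the header table keeps its rows in frequency order and adds a hash index so that an item's row is found in constant time on average rather than by scanning the table.

```python
from collections import Counter


def s_shaped_groups(weights, num_groups):
    """Assign items to groups in serpentine order over descending weights.

    `weights` maps item -> estimated load weight (how the weight is
    estimated is an assumption left to the caller).
    """
    ordered = sorted(weights, key=lambda item: (-weights[item], item))
    groups = [[] for _ in range(num_groups)]
    for pos, item in enumerate(ordered):
        lap, offset = divmod(pos, num_groups)
        # Even passes go left to right, odd passes go right to left.
        gid = offset if lap % 2 == 0 else num_groups - 1 - offset
        groups[gid].append(item)
    return groups


class HeaderTable:
    """Frequency-ordered header table with a hash index (hypothetical layout)."""

    def __init__(self, transactions, min_support):
        counts = Counter(item for t in transactions for item in set(t))
        frequent = [(i, c) for i, c in counts.items() if c >= min_support]
        frequent.sort(key=lambda ic: (-ic[1], ic[0]))
        # Each row is [item, support, head of the item's node-link chain].
        self.rows = [[item, count, None] for item, count in frequent]
        # Hash index: item -> row position, avoiding a linear scan.
        self.index = {row[0]: k for k, row in enumerate(self.rows)}

    def lookup(self, item):
        """Return the row for `item`, or None if the item is not frequent."""
        k = self.index.get(item)
        return None if k is None else self.rows[k]


weights = {"a": 9, "b": 7, "c": 6, "d": 4, "e": 3, "f": 1}
groups = s_shaped_groups(weights, 3)
print(groups)  # [['a', 'f'], ['b', 'e'], ['c', 'd']]
print([sum(weights[i] for i in g) for g in groups])  # [10, 10, 10]

table = HeaderTable([["a", "b"], ["b", "c"], ["a", "b", "c"], ["d"]], 2)
print(table.lookup("c"))  # ['c', 2, None]
```

With a plain round-robin assignment the same weights would give group totals of 13, 10, and 7, which is the imbalance the serpentine order is meant to reduce.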

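Cite this article

Huang Jie (黄婕). Optimization and Implementation of FP-Growth Algorithm Based on Spark Platform [J]. Journal of Hunan University of Technology (湖南工业大学学报), 2020, 34(1): 77-84.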

History
  • Received: 2019-05-22
  • Published online: 2020-01-10