Abstract: The FP-Growth algorithm can run into memory bottlenecks, or fail to complete, when mining massive data sets. To address this defect, this paper proposes a method that optimizes FP-Growth on the Spark platform in two respects: the data grouping strategy and the item header table structure. First, an S-type grouping method is proposed that balances load weights across groups. Second, a new item header table structure with a hash look-up table is proposed, which reduces the time complexity of look-ups. Experimental results show that the optimized Spark-based FP-Growth algorithm is computationally efficient and achieves a higher parallel speedup and better parallel mining efficiency.
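
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that the S-type grouping assigns items, sorted by descending load weight, to groups in a serpentine order (left to right, then right to left), and that each item's support count serves as its load weight. The names `s_shaped_groups`, `group_loads`, and `HashHeaderTable` are hypothetical, and the code is plain Python rather than Spark code.

```python
def s_shaped_groups(item_weights, num_groups):
    """Assign items to groups in a serpentine (S-shaped) order.

    Assumption: item_weights maps each item to its load weight
    (here, its support count). This is an illustrative guess at
    the grouping rule, not the paper's exact definition.
    """
    # Sort by descending weight; break ties by item name for determinism.
    ranked = sorted(item_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    groups = [[] for _ in range(num_groups)]
    for rank, (item, _weight) in enumerate(ranked):
        sweep, pos = divmod(rank, num_groups)
        # Even sweeps run left to right, odd sweeps run right to left.
        gid = pos if sweep % 2 == 0 else num_groups - 1 - pos
        groups[gid].append(item)
    return groups


def group_loads(groups, item_weights):
    """Return the total load weight of each group."""
    return [sum(item_weights[item] for item in group) for group in groups]


class HashHeaderTable:
    """Item header table backed by a dict (hash table).

    Finding an item's entry takes O(1) time on average, instead of
    a linear scan over a list-based header table.
    """

    def __init__(self):
        # item -> [support count, list of tree-node references]
        self._entries = {}

    def add(self, item, node, count=1):
        entry = self._entries.setdefault(item, [0, []])
        entry[0] += count
        entry[1].append(node)

    def support(self, item):
        return self._entries.get(item, (0, []))[0]

    def nodes(self, item):
        return list(self._entries.get(item, (0, []))[1])


if __name__ == "__main__":
    weights = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4, "f": 3}
    groups = s_shaped_groups(weights, 3)
    print(groups, group_loads(groups, weights))
```

With the six example weights and three groups, the serpentine order pairs the heaviest item with the lightest one, so every group ends up with the same total load. A plain round-robin assignment would instead always give the first group the heaviest item of each sweep.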