Abstract: Although Transformer-based methods extract global context well and perform strongly in single image super-resolution (SISR), they tend to capture low-frequency information and neglect high-frequency features, because the Transformer is primarily designed to capture global features. To address this issue, a multi-frequency feature aggregation network (MFAN) is proposed that integrates the advantages of convolution and Transformer structures. The network consists of three key modules: the coupled self-attention transformer (CSAT) for extracting global context, the high-frequency enhancement module (HFEM) for extracting and enhancing high-frequency information, and the refinement fusion module (RFM) for refining global features. Experimental results show that, compared with other SR methods, MFAN significantly improves visual resolution and image quality.