Abstract: The main factor limiting the efficiency of organic solar cells, a key component of distributed renewable energy, is the energy level difference between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO) of the molecules. To reduce the manufacturing cost of organic solar cells and improve their energy conversion efficiency, machine learning is used to analyze their energy levels and guide molecular design. First, taking advantage of the high efficiency and cost-effectiveness of machine learning, 20 key features are selected for a deeper analysis of how they affect the performance of photovoltaic devices. Subsequently, six prediction models are constructed and compared. The gradient-boosting-based XGBT model performs best in predicting the properties of organic solar cells, with a coefficient of determination of 0.8 and a root mean square error of 0.2. Finally, this model is shown to predict the performance of organic solar cells effectively, and an in-depth analysis of HOMO and LUMO identifies two key molecular structures that affect the energy levels of the cells.
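For reference, the two evaluation metrics quoted in the abstract can be computed as in the minimal sketch below. It assumes the standard textbook definitions of the coefficient of determination and the root mean square error. The function names and the toy values are illustrative assumptions and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    squared_errors = ((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only (not data from the study).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 3.0]
print("R^2 :", r2(measured, predicted))
print("RMSE:", rmse(measured, predicted))
```

A coefficient of determination closer to 1 and a root mean square error closer to 0 both indicate a better fit, which is the sense in which the six models are compared.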