Development of a Regional Photovoltaic Baseline Dataset for Qinghai Province
DOI: https://doi.org/10.6911/WSRJ.202604_12(4).0008
Keywords: Photovoltaic data, Data processing, Quality control, DBSCAN, Isolation Forest.
Abstract
To support detailed solar resource assessment and PV power forecasting in the high-altitude regions of Qinghai Province, this study built a standardized PV dataset with end-to-end quality control from measured data sampled at 15-minute intervals between June 2021 and June 2022 at four representative PV power plants in Qinghai Province. Missing values were imputed by linear interpolation or co-periodic mean interpolation, constrained by physical thresholds. A combined outlier detection model based on DBSCAN density clustering and random forest regression was constructed to automatically identify and remove invalid values, abnormal spikes, and physically implausible data. After quality control, the valid data rate for total radiation rose from below 30% to 87.5%, and the quality-controlled data showed a correlation coefficient of 0.92 with meteorological parameters and a root mean square error of 48.7 W·m⁻². The resulting data quality meets the requirements of solar resource assessment and power forecasting.
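The abstract does not state the threshold values or the exact interpolation rules. The following is a minimal Python sketch of one possible reading, assuming total (global horizontal) irradiance on a 15-minute grid (96 slots per day), hypothetical fixed physical limits of 0 to 1500 W·m⁻², linear interpolation for gaps of up to four samples (one hour), and "co-periodic mean" taken to mean the average of the same 15-minute slot on the other days. The constants and function names are illustrative assumptions, not the authors' settings.

```python
import numpy as np

SLOTS_PER_DAY = 96  # 15-minute sampling -> 96 slots per day

# Hypothetical physical limits for total irradiance, in W/m^2 (assumed, not from the paper).
GHI_MIN, GHI_MAX = 0.0, 1500.0
# Hypothetical rule: gaps of at most 4 samples (1 hour) are filled linearly.
MAX_LINEAR_GAP = 4


def apply_physical_limits(ghi):
    """Return a float copy with values outside the physical limits set to NaN."""
    out = np.asarray(ghi, dtype=float).copy()
    out[(out < GHI_MIN) | (out > GHI_MAX)] = np.nan
    return out


def _gap_runs(mask):
    """Yield (start, stop) index pairs for each run of consecutive True values."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return
    breaks = np.flatnonzero(np.diff(idx) > 1)
    starts = np.concatenate(([idx[0]], idx[breaks + 1]))
    stops = np.concatenate((idx[breaks], [idx[-1]])) + 1
    yield from zip(starts, stops)


def fill_gaps(series):
    """Fill NaN gaps: linear interpolation for short interior gaps,
    same-slot (co-periodic) mean across days for long or edge gaps."""
    series = np.asarray(series, dtype=float)
    x = series.copy()
    n = x.size
    missing = np.isnan(series)

    # Mean of each 15-minute slot over all days, from observed values only.
    slots = np.arange(n) % SLOTS_PER_DAY
    slot_mean = np.full(SLOTS_PER_DAY, np.nan)
    for s in range(SLOTS_PER_DAY):
        vals = series[(slots == s) & ~missing]
        if vals.size:
            slot_mean[s] = vals.mean()

    valid_idx = np.flatnonzero(~missing)
    for start, stop in _gap_runs(missing):
        interior = start > 0 and stop < n  # valid neighbours on both sides
        if interior and stop - start <= MAX_LINEAR_GAP:
            seg = np.arange(start, stop)
            x[seg] = np.interp(seg, valid_idx, series[valid_idx])
        else:
            x[start:stop] = slot_mean[slots[start:stop]]
    return x
```

Screening runs first so that implausible readings become gaps and are then filled by the same two rules as genuinely missing samples.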
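The abstract names DBSCAN density clustering and random forest regression but does not say how the two are combined, and the keywords list Isolation Forest instead. The sketch below assumes a two-stage arrangement built with scikit-learn: DBSCAN on standardized (irradiance, power) pairs marks low-density points as outliers, then a random forest fitted on the remaining points predicts power from irradiance, and samples whose residual exceeds k robust standard deviations are flagged as well. The function name flag_outliers and the values of eps, min_samples and k are hypothetical; if the second stage is actually an Isolation Forest, sklearn.ensemble.IsolationForest would take the random forest's place.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler


def flag_outliers(irradiance, power, eps=0.3, min_samples=10, k=4.0, seed=0):
    """Return a boolean mask, True where a sample looks anomalous.

    Hypothetical two-stage scheme (the source does not specify the combination):
    1. DBSCAN on standardized (irradiance, power) pairs; label -1 = outlier.
    2. Random forest power ~ irradiance fitted on DBSCAN inliers; points with
       residual beyond k robust standard deviations are also flagged.
    """
    X = np.column_stack([irradiance, power]).astype(float)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(
        StandardScaler().fit_transform(X))
    dbscan_outlier = labels == -1

    inlier = ~dbscan_outlier
    rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=seed)
    rf.fit(X[inlier, :1], X[inlier, 1])

    # Out-of-bag predictions for training points avoid in-sample overfitting;
    # ordinary predictions are used for points DBSCAN already rejected.
    pred = rf.predict(X[:, :1])
    pred[inlier] = rf.oob_prediction_
    resid = X[:, 1] - pred

    # Robust residual spread of the inliers via the median absolute deviation.
    med = np.median(resid[inlier])
    sigma = 1.4826 * np.median(np.abs(resid[inlier] - med))
    rf_outlier = np.abs(resid - med) > k * sigma
    return dbscan_outlier | rf_outlier
```

Out-of-bag predictions are used for the training points because a random forest reproduces its own training samples very closely, which would otherwise shrink the residual spread and flag too many normal points.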
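The evaluation figures quoted in the abstract (valid data rate, correlation coefficient, RMSE) can be computed along the following lines. This is a sketch that assumes the cleaned series and a reference series (for example co-located meteorological irradiance) are aligned arrays in which NaN marks missing or rejected samples; the helper names are hypothetical.

```python
import numpy as np


def valid_data_rate(values, valid_mask=None):
    """Fraction of samples that are finite and, optionally, pass a QC mask."""
    values = np.asarray(values, dtype=float)
    ok = np.isfinite(values)
    if valid_mask is not None:
        ok &= np.asarray(valid_mask, dtype=bool)
    return ok.mean()


def agreement(test, reference):
    """Pearson correlation and RMSE over pairs where both series are finite."""
    test = np.asarray(test, dtype=float)
    reference = np.asarray(reference, dtype=float)
    both = np.isfinite(test) & np.isfinite(reference)
    t, r = test[both], reference[both]
    pearson_r = np.corrcoef(t, r)[0, 1]
    rmse = np.sqrt(np.mean((t - r) ** 2))
    return pearson_r, rmse
```

Calling valid_data_rate on the raw and on the quality-controlled series gives the before/after comparison; agreement returns the correlation coefficient and the RMSE in the units of the input (W·m⁻² for irradiance).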
License
Copyright (c) 2026 World Scientific Research Journal

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.




