OUTCOMES
Our results show that support vector regression can be a viable means to solve the forecasting problems presented. We have found several important factors for successful power load and solar resource forecasts. The first is to include a large amount of data for training the model. By including five to eight years of hourly data, our models were able to capture the periodic behavior based on the hour of the day and the day of the year. We performed tests with smaller data sets of several months to several years, but the largest data sets gave the best results. With large data sets, it is also important to have a procedure for missing values. The algorithm we used interprets missing values as zero, which would falsely represent data from most of our attributes. Since fewer than about 0.1% of our data points had missing values, we deleted all such points from the training sets.
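The deletion step described above can be sketched in a few lines. This is a minimal illustration, not the code we ran: the function name, the record layout (hour, temperature, load), and the sample values are all hypothetical, and a value is assumed to count as missing if it is None, an empty string, or NaN.

```python
import math


def drop_incomplete_rows(rows):
    """Return only the rows in which every attribute is present.

    Assumption: a value is "missing" if it is None, a blank string, or NaN.
    """
    def is_missing(value):
        if value is None:
            return True
        if isinstance(value, str):
            return value.strip() == ""
        return isinstance(value, float) and math.isnan(value)

    return [row for row in rows if not any(is_missing(v) for v in row)]


# Hypothetical hourly records: (hour of day, temperature, load).
records = [
    (0, 41.2, 5210.0),
    (1, None, 5105.0),          # missing temperature
    (2, 39.8, float("nan")),    # missing load
    (3, 39.1, 4990.0),
]
clean = drop_incomplete_rows(records)

# Fraction of rows removed; worth checking before deleting, since deletion
# is only reasonable when this fraction is very small.
dropped_fraction = 1 - len(clean) / len(records)
```

Dropping the whole row, rather than substituting zero, keeps a missing temperature or radiation reading from being read as a real measurement of zero.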
Another important factor is the selection of parameters. Certain choices of C and γ gave us predictions that were rather flat, staying near the average value and not reaching the minima or maxima, while others varied widely, at times predicting a dramatic change where the actual value remained nearly constant. These are examples of under- and over-fitting. We used the parameters that we found to give the best balance between these.
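To make the role of γ concrete, the sketch below evaluates a Gaussian (RBF) kernel for a few values of γ. It assumes an RBF kernel, which is a common setting in which γ appears but is an assumption here; the feature vectors and the γ values are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Two hypothetical, already-scaled feature vectors one unit apart.
x, z = [0.0, 0.0], [1.0, 0.0]

# Small gamma: the two points look almost identical to the model.
# Large gamma: the two points look almost unrelated.
for gamma in (0.01, 1.0, 100.0):
    print(f"gamma={gamma:g}: k(x, z) = {rbf_kernel(x, z, gamma):.3g}")
```

When γ is very small, every training point resembles every other, so the fitted function tends to be smooth and close to the mean (under-fitting). When γ is very large, each training point influences only its immediate neighborhood, which allows sharp swings (over-fitting). C acts alongside γ as the penalty on training errors: larger values push the model to follow the training data more closely.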
POSSIBLE IMPROVEMENTS
Our results could be improved considerably with better methods of parameter selection for C and γ. We used a grid search, which is time consuming and can be inaccurate. Parameter selection is an area of ongoing research in machine learning, and a variety of methods, such as particle swarm optimization, have been shown to lead to improved selection of parameters. [5] With more research in these methods and additional computing power, our models could be made more robust and give more accurate predictions.
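A grid search of the kind mentioned above can be sketched as follows. This is a generic illustration under stated assumptions, not our actual procedure: `toy_validation_error` is a hypothetical stand-in for "train a model with the given C and γ and return its error on held-out data", and the grid ranges are made up.

```python
import itertools
import math


def grid_search(validation_error, c_values, gamma_values):
    """Evaluate every (C, gamma) pair and return the best pair and its error."""
    best_params, best_error = None, math.inf
    for c, gamma in itertools.product(c_values, gamma_values):
        error = validation_error(c, gamma)
        if error < best_error:
            best_params, best_error = (c, gamma), error
    return best_params, best_error


def toy_validation_error(c, gamma):
    """Hypothetical stand-in for training a model and scoring it on held-out data.

    A smooth bowl whose minimum sits at C = 10, gamma = 0.1.
    """
    return (math.log10(c) - 1.0) ** 2 + (math.log10(gamma) + 1.0) ** 2


# Logarithmically spaced candidates (assumed ranges, for illustration only).
c_grid = [10.0 ** k for k in range(-1, 4)]       # 0.1 ... 1000
gamma_grid = [10.0 ** k for k in range(-4, 1)]   # 0.0001 ... 1

best, err = grid_search(toy_validation_error, c_grid, gamma_grid)
n_models = len(c_grid) * len(gamma_grid)         # one training run per pair
print(best, err, n_models)
```

The sketch shows both drawbacks noted above. The number of training runs is the product of the grid sizes, so refining the grid quickly becomes expensive on multi-year hourly data, and the search can only return the best of the listed candidates, so an optimum lying between grid points is missed.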
In addition, with more time and computing power, we could incorporate more years of data into our model building. We found that adding several years of data improved the accuracy of our results, but the time required for computing with any additional data quickly became prohibitive.
Another improvement could come in the data that we use as our attributes. More sophisticated weather measurements such as percent sky cover or even sky images could be used as attributes. In-depth study of the dynamics of cloud behavior could lead to a better understanding of the changes in solar radiation and an improved model for prediction. Weather stations could be built to take the measurements that are found to be most important. It would also be beneficial to obtain data for the actual power output of a photovoltaic installation. This data would be more useful than the incident radiation as the target value of the prediction.
Finally, in addition to improving the performance of support vector regression models, it would be worthwhile to investigate other methods of machine learning for this problem. One option, if several working models were built, would be to combine them into a hybrid model, as Wu et al. showed very successfully for photovoltaic output. [7]
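One simple way to combine several working models is a weighted average of their forecasts, with each weight inversely proportional to that model's validation error. The sketch below illustrates only this basic idea; it is not the hybrid method of Wu et al. [7], and the function names, forecasts, and error values are hypothetical.

```python
def inverse_error_weights(validation_errors):
    """Weight each model by 1 / validation error, normalized to sum to 1."""
    inverse = [1.0 / e for e in validation_errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine_forecasts(forecasts, weights):
    """Weighted average of several models' forecasts for the same hours."""
    return [
        sum(w * model[t] for w, model in zip(weights, forecasts))
        for t in range(len(forecasts[0]))
    ]


# Hypothetical forecasts from two models for the same three hours.
svr_forecast = [100.0, 200.0, 300.0]
other_forecast = [120.0, 180.0, 330.0]

# Hypothetical validation errors: the first model was three times more accurate.
weights = inverse_error_weights([10.0, 30.0])
hybrid = combine_forecasts([svr_forecast, other_forecast], weights)
```

With these numbers the more accurate model receives a weight of 0.75 and the other 0.25, so the combined forecast stays closer to the better model while still drawing on the second.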
A combination of these improvements applied at a specific location could result in accurate prediction of the power that a photovoltaic installation at that specific location will generate. This information would be valuable for the sale of the power or the planning of grid operations, allowing better utilization of power plants, from base- to peak-load, along with storage and demand response, ultimately reducing the cost of harnessing energy from the sun.
ACKNOWLEDGMENT
We would like to thank Alex Cassidy and Dr. Arye Nehorai for their guidance and support, and Professor Ed Richter for his coordination of the undergraduate research.
REFERENCES
[1] "The Duck Curve: Managing a Green Grid." Flexible Resources Help Renewables. California ISO, 2013. Web. 20 Feb. 2015.
[2] Smola, Alex J., and Bernhard Schölkopf. "A Tutorial on Support Vector Regression." Statistics and Computing 14.3 (2004): 199-222. Web. 22 Jan. 2015.
[3] K.-R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik, “Predicting time series with support vector machines,” in Artificial Neural Networks — ICANN’97, vol. 1327, W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud, Eds. Springer Berlin Heidelberg, 1997, pp. 999–1004.
[4] Bo-Juen Chen, Ming-Wei Chang, and Chih-Jen Lin, “Load forecasting using support vector machines: a study on EUNITE competition 2001,” Power Systems, IEEE Transactions on, vol. 19, no. 4, pp. 1821–1830, Nov. 2004.
[5] Hong, Wei-Chiang. "Chaotic Particle Swarm Optimization Algorithm in a Support Vector Regression Electric Load Forecasting Model." Energy Conversion and Management 50.1 (2009): 105-17.
[6] Rojas, I., O. Valenzuela, F. Rojas, A. Guillen, L.J. Herrera, H. Pomares, L. Marquez, and M. Pasadas. "Soft-computing Techniques and ARMA Model for Time Series Prediction." Neurocomputing 71.4-6 (2008): 519-37.
[7] Yuan-Kang Wu, Chao-Rong Chen, and Hasimah Abdul Rahman, “A Novel Hybrid Model for Short-Term Forecasting in PV Power Generation,” International Journal of Photoenergy, vol. 2014, Article ID 569249, 9 pages, 2014.
[8] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten (2009); The WEKA Data Mining Software: An Update; SIGKDD Explorations, Volume 11, Issue 1.
[9] Chih-Chung Chang and Chih-Jen Lin, LIBSVM : a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1--27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
DATA
"AgriMet Historical Hourly (Dayfile) Data Access -- Bureau of Reclamation." AgriMet Historical Hourly (Dayfile) Data Access -- Bureau of Reclamation. Web. 14 Apr. 2015.
"WIND GENERATION & Total Load in The BPA Balancing Authority." BPA: Balancing Authority Load & Total Wind Generation. Web. 14 Apr. 2015.
"Weather Forecasts." National Forecast Maps. National Weather Service, 12 Apr. 2015. Web. 12 Apr. 2015. <http://www.weather.gov/forecastmaps>.