Open access article; peer-reviewed

Combined forecasting system for short‐term bus load forecasting based on clustering and neural networks

2020; Institution of Engineering and Technology; Volume: 14; Issue: 18; Language: English

10.1049/iet-gtd.2019.1057

ISSN

1751-8695

Authors

Ioannis P. Panapakidis, Nikolaos Skiadopoulos, Georgios C. Christoforidis

Topic(s)

Evaluation Methods in Various Fields

Abstract

IET Generation, Transmission & Distribution, Volume 14, Issue 18, pp. 3652-3664. Research Article. Free Access.

Combined forecasting system for short-term bus load forecasting based on clustering and neural networks

Ioannis P. Panapakidis (corresponding author, panap@teilar.gr), School of Technology, University of Thessaly, Larisa, Greece; Nikolaos Skiadopoulos, School of Technology, University of Thessaly, Larisa, Greece; Georgios C. Christoforidis (orcid.org/0000-0001-5595-8409), Department of Electrical and Computer Engineering, University of Western Macedonia, Kozani, Greece

First published: 08 April 2020. https://doi.org/10.1049/iet-gtd.2019.1057

Abstract

Micro-grids, as small-scale counterparts of larger power systems, involve the management of small loads, either isolated or connected to the main grid. Load forecasting is a tool of fundamental importance in power system design and operation. In recent years, much research has focused on aggregated system loads. However, few studies deal with small loads and especially with bus loads of the transmission system. While the smart grid and micro-grid literature is gathering research momentum, there is an emerging need for further investigation of forecasting models for buses. The aim of this work is to propose a novel, robust forecasting system for bus load predictions on a short-term horizon. The model refers to the hybridisation of clustering and a feed-forward neural network (FFNN). Experimental results and analysis indicate the robustness of the model; the combination of clustering and FFNN provides better forecasts compared with the single application of the FFNN.

1 Introduction

Power system operation and planning are based on predictions of quantities such as electricity and natural gas demand, generation capacity, market prices etc. [1-5]. Based on the time horizon, forecasting is categorised into very short-term, short-term, medium-term and long-term [6]. Short-term load forecasting (STLF) plays a major role in intra-day and day-ahead decisions in power grids and markets [7]. A large number of studies presenting various forecasting tools have appeared in the literature. The models differ in terms of input parameter requirements, structure, complexity, execution speed and efficiency. They can generally be classified into time-series models (e.g. ARMA, ARIMA etc.) and computational intelligence models (e.g. FFNNs, radial basis networks etc.)
[8]. Despite the numerous models presented in the literature, there is a lack of a universal model that outperforms the rest. Thus, most models are application-dependent; their design and formulation highly depend on the dataset, the user's needs and the special characteristics of the forecasting problem. Nevertheless, there are some common features that the model designer should pursue: increased flexibility, low requirements for computational resources, comprehensible operation and interpretability of the forecasting results [9].

The majority of previous works focus on aggregated loads at the national system level. However, due to the rapid development of smart grid and micro-grid technologies, there is a need to target small-size loads by testing the models proposed in the literature or developing new ones. Apart from the techno-economic assessment of distributed energy resources and storage units, an accurate prediction at a low-scale level, such as at a distribution bus or for a group of medium- or low-voltage consumers, can contribute to designing and assessing demand-side management measures. In the majority of cases, bus load patterns differ substantially from aggregated loads [10]. Bus loads are characterised by high complexity, volatility and stochasticity [11-13]. In addition, due to possible modifications in power grid configurations, current load patterns may not be highly correlated with specific historical load records. Note that bus load forecasting may be site-specific, i.e. a dedicated forecasting model should be designed and implemented for each bus. This approach can result in increased complexity, due to the large number of buses within the transmission and distribution systems. Therefore, for geographically neighbouring buses, a general model should be built that takes into account the basic characteristics of the respective bus loads [14].

A variety of works have proposed models for bus load forecasting. In [15], an ensemble forecasting model built on multi-layer perceptrons (MLPs) is part of a distribution management system at a 154 kV bus in Sanremo, Italy; the authors test various combinations of MLPs without any comparison with other models. In [16], the test case involves nine buses connected to a 500 kV transmission system; the paper proposes an STLF approach built on fuzzy logic, with temperature and humidity used as inputs. In [17], an adaptive neuro-fuzzy inference system (ANFIS) was used to forecast a South African distribution network's load; an 88/11 kV, 80 MVA distribution substation was used as a case study, and the experiments were conducted for the winter period with a two-weeks-ahead forecast. ANFIS is also utilised in [18], aiming at the STLF of a university campus in Turkey, with only historical load values as inputs. A fuzzy-ART neural network is applied to a set of nine substations in New Zealand [19]; the test case involves one day of January 2008. In [20], the problem of very short-term forecasting of reactive loads is investigated; the test case refers to a set of buses in Brazil and uses an MLP for hour-ahead forecasts. The authors of [21] propose a method based on the autoregressive integrated moving average (ARIMA) model and a phase space reconstruction back-propagation neural network (PSR-BPNN); the model is applied to the STLF of a city using seven days of training data and three days of test data.
Zhang [22] proposes a hybrid model comprising fuzzy clustering, a least-squares support vector machine and the wolf pack algorithm, in order to perform short-term load forecasting for buses containing electric bus charging stations. The authors in [23] compared the performance of various neural networks on a real load dataset in China and proposed a model based on a recurrent neural network for short-term load forecasting; the main conclusion is that the performance of different neural network models depends on the data time scale.

Based on the previous literature survey, the following conclusions are drawn: (i) Most papers develop and apply a single model; no comparisons with other models are provided to give strong evidence of the developed model's robustness. (ii) None of the aforementioned studies utilises a full year of hourly load test data. The following periods are used as test sets: [15]: 1 month; [16]: total monthly loads for 12 months; [17]: 2 weeks; [18]: 9 months; [19]: 1 day; [20]: 5 weeks; and [21]: 3 days. (iii) The clustering tool is not implemented to support the main forecasting system. Through clustering, datasets with high homogeneity are constructed and, for each cluster, a separate forecaster is trained.

This paper presents a model that combines clustering and an ensemble forecaster. Simulation results indicate a positive influence on the forecaster's accuracy. In order to examine trends and seasonalities during the year, a full-year test set is used in this paper. This allows the analyst to examine the predictions of working days, weekends, holidays, working days close to holidays and other daily loads with special characteristics. Also, the proposed model is compared with other commonly used computational-intelligence-based models found in the STLF literature. Finally, the model is flexible, i.e. it can utilise different clustering algorithms and forecasters; this provides a tailored model for different bus forecasting problems.

The present study proposes a hybrid forecasting model based on combined forecasts [24]. The hybridisation refers to the synergy of a clustering algorithm and the FFNN. While combinations of clustering and neural networks have been proposed in the literature for various test cases, displaying promising results [25], the proposed model provides an automatic selection among a plethora of FFNNs operating concurrently. The term 'combined' refers to the parallel operation of FFNNs that differ in terms of training algorithm and activation function. When all predictions are accomplished by the parallel FFNNs, the minimum error is searched for and the final forecast is provided by the FFNN configuration that resulted in the minimum error. The buses under study correspond to four high-to-medium voltage transformers located on a Greek island that is connected to the main electricity grid of Greece. Various cases are examined that differ in terms of the types of input patterns.

2 Model description

2.1 Input selection

Fig. 1: Correlation coefficient curve

The STLF procedure can be formulated as a time-series-based non-linear discrete-time dynamical model

$$L(t) = f\big(L(t-1), L(t-2), \ldots, L(t-m), \mathbf{v}(t)\big) \quad (1)$$

where $L(t)$ is the load at a time $t$, $m$ is the order of the system and $\mathbf{v}(t)$ is a vector that contains external variables such as temperature etc. Thus, the forecasting task is to propagate historical load values to future time intervals. The value of $m$ is either defined through mathematical analysis or determined by expert knowledge. The elements of $\mathbf{v}(t)$ are problem-specific.
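To make the formulation in (1) concrete, the following minimal Python sketch assembles lagged input/output patterns from an hourly load series; the synthetic data, the helper name build_patterns and the chosen lags are illustrative assumptions, not values prescribed by the paper.

```python
import numpy as np

def build_patterns(load, lags):
    """Assemble (X, y) pairs where row i of X holds load[t - lag] for each lag
    and y[i] is load[t]; the non-linear mapping f of (1) is then learnt on X -> y."""
    max_lag = max(lags)
    X = np.column_stack([load[max_lag - lag: len(load) - lag] for lag in lags])
    y = load[max_lag:]
    return X, y

# Illustrative use: predict L(t) from the loads 48 h and 168 h earlier
# (days d-2 and d-7), on a synthetic one-year hourly series.
rng = np.random.default_rng(0)
hourly_load = 30 + 10 * np.sin(np.arange(24 * 365) * 2 * np.pi / 24) + rng.normal(0, 1, 24 * 365)
X, y = build_patterns(hourly_load, lags=[48, 168])
```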
According to [26], there are several methods for variable selection. Hence, we conduct a correlation analysis to select the appropriate historical values [27]. By examining the correlation coefficient between the current load and the historical ones, a correlation curve is constructed. Let $L(t)$ be the load at hour $t$. According to the correlation analysis between the current load and the previous 216 values, the loads at lags of one day and one week, i.e. $L(t-24)$ and $L(t-168)$, display the highest correlation with $L(t)$. The correlation coefficient curve is shown in Fig. 1. However, in order to provide a real-world case example, the hourly loads of the day immediately before each test day are not selected as inputs. This is in agreement with the operation of the Greek wholesale energy market. The Independent Power Transmission Operator (IPTO) S.A. coordinates energy transactions on a day-ahead basis and provides the official forecasts of the interconnected system's demand [28]. Due to the Greek pool market structure and operation, the forecasting of the next day's hourly loads is performed with the historical loads of two days before and of the previous week.

Let $d$ denote the current day of the set; for the training set, $d$ ranges over the days of the period 01/01/2013-31/12/2015, while for the test set it ranges over the days of 01/01/2016-31/12/2016. We consider three cases of inputs to the forecasting model, to fully examine the influence of exogenous variables on the bus STLF problem under study:

- Case 1: Only historical load values are considered, namely the hourly loads of days $d-2$ and $d-7$. The number of inputs is 48.
- Case 2: Loads and average daily temperatures are considered, namely the hourly loads of days $d-2$ and $d-7$, the mean daily temperatures of $d-2$ and $d-7$ and the mean forecasted daily temperature of day $d$. The number of inputs is 51.
- Case 3: Loads, average daily temperatures and day-type identification codes are considered. More specifically, the following inputs are regarded: the hourly loads of days $d-2$ and $d-7$, the mean daily temperatures of $d-2$ and $d-7$, the mean forecasted daily temperature of day $d$, the day distinction code (Sunday = '1', ..., Saturday = '7') of day $d$ and the month distinction code (January = '1', ..., December = '12'). The number of inputs is 54.

The outputs are the 24 hourly loads of day $d$.

2.2 Clustering phase

Fig. 2: Schematic representation of the proposed model

Fig. 2 shows the topology of the proposed hybrid model. Based on the case considered, the training matrix is generated. For example, the dimension of the training matrix for Case 1 is 1461 × 48. The training matrix is clustered into a specific number of clusters. We consider three different clustering algorithms to fully assess the functionality of the proposed model, namely fuzzy C-means (FCM), K-means and Ward's algorithm [27]. Clustering provides a gating of the input data. Each subset/cluster is represented by its centroid, whose general form (written here for Case 3; in Cases 1 and 2 the temperature and code entries are reduced accordingly) is

$$\mathbf{c}^{k} = \big[\mathbf{L}^{k}_{d-2},\ \mathbf{L}^{k}_{d-7},\ T^{k}_{d-2},\ T^{k}_{d-7},\ T^{k}_{d},\ \mathrm{DDC}^{k}_{d},\ \mathrm{MDC}^{k}_{d},\ \mathbf{L}^{k}_{d}\big], \quad k = 1, \ldots, K \quad (2)$$

where $k$ is the indicator denoting the cluster among the $K$ clusters, $\mathbf{L}^{k}_{d-2}$, $\mathbf{L}^{k}_{d-7}$ and $T^{k}_{d-2}$, $T^{k}_{d-7}$ are the loads and average daily temperatures of days $d-2$ and $d-7$ of the $k$th cluster, respectively, and $T^{k}_{d}$ is the average daily temperature of the targeted day $d$. Also, $\mathrm{DDC}^{k}_{d}$ and $\mathrm{MDC}^{k}_{d}$ are the day distinction code (DDC) and month distinction code (MDC) of the day $d$ that belongs to the $k$th cluster. It should be noted that the load $\mathbf{L}^{k}_{d}$ of the day $d$ is known in the training phase.
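As an illustration of the clustering phase, the sketch below partitions a training matrix with K-means and extracts the per-cluster training sets and centroids of (2); scikit-learn and the placeholder matrix are assumptions made for the example, since the paper does not prescribe an implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

k = 3  # number of clusters finally selected in the paper's experiments
# Placeholder training matrix with the Case 1 shape (1461 days x 48 load inputs).
rng = np.random.default_rng(1)
training_matrix = rng.uniform(0.0, 1.0, size=(1461, 48))

kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(training_matrix)
centroids = kmeans.cluster_centers_  # one representative vector per cluster, cf. (2)
# One homogeneous training set per cluster; a separate forecaster is trained on each.
cluster_sets = [training_matrix[kmeans.labels_ == j] for j in range(k)]
```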
2.3 FFNN configuration

For each cluster, we train separate parallel FFNNs that provide forecasts. This concept offers the benefit of training each FFNN with patterns of high similarity (i.e. patterns that belong to the same cluster). All FFNNs have one hidden layer but differ in terms of the training algorithm and activation function. The number of parallel FFNNs is six. The considered training algorithms are Levenberg-Marquardt (LM), resilient backpropagation (RB) and scaled conjugate gradient backpropagation (SCGB) [29-31]. The activation functions are linear (Lin) and hyperbolic tangent sigmoid (Tan). Table 1 presents the different topologies. The 'min' operator is applied and the lowest training error is extracted; the test phase, i.e. the final prediction, is held with the FFNN that resulted in the lowest training error. Firstly, the test set matrix is formatted; here, the load $\mathbf{L}_{d}$ of the targeted day is the unknown variable. Each row of the matrix is compared with the $k$ centroids using the Euclidean distance. The selected FFNN refers to the cluster of the most similar centroid, i.e. the one that corresponds to the smallest distance. The comparison is held for each pattern of the test set and, in every case, a specific FFNN is selected.

Table 1. FFNN topologies

| No. | Training algorithm | Hidden-layer activation | Output-layer activation |
|---|---|---|---|
| 1 | LM | Lin | Tan |
| 2 | LM | Tan | Tan |
| 3 | RB | Lin | Tan |
| 4 | RB | Tan | Tan |
| 5 | SCGB | Lin | Tan |
| 6 | SCGB | Tan | Tan |

2.4 Training phase

Following the application of clustering to the training set data matrix, various clusters are extracted. Next, for every cluster, each one of the six structures of Table 1 is trained in parallel with the same number of epochs, e.g. 1000. When training has ended for all structures, the one with the minimum training error is selected and applied to the test case. There are various parameters of each algorithm that need to be properly decided. Initial experimentation with the parameters indicates that the following values lead to better performance:

For the FFNN:
- Number of hidden-layer neurons: 2, 4, ..., 50 (variable; the optimal number was selected based on the minimum MAPE).
- Maximum number of training epochs: 1000.
- Initial momentum term of the LM algorithm: 0.50.
- Minimum performance gradient of the LM algorithm: 10⁻⁶.
- Learning rate of the RB algorithm: 0.025.
- Minimum performance gradient of the RB algorithm: 10⁻⁶.
- Minimum performance gradient of the SCGB algorithm: 10⁻⁶.

For the RBFNN:
- Spread of the radial basis functions: 1.25.

For the GRNN:
- Spread of the radial basis functions: 1.50.

For the ENN:
- Number of hidden-layer neurons: 8.

For the SVR:
- Type of kernel function: Gaussian.
- Epsilon parameter: 0.40.

2.5 Test phase

For the selection of the closest (i.e. most similar) centroid to each input pattern, the term $\mathbf{L}^{k}_{d}$ is removed from the centroids, since it is unknown during the test case. For each pattern, the Euclidean distances between the pattern and the centroids are calculated. The closest centroid is drawn and the corresponding optimal FFNN structure is selected to provide the forecast.
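A minimal sketch of Sections 2.3-2.5 follows. scikit-learn's MLPRegressor does not offer the LM, RB and SCGB algorithms or a tanh output layer, so its solvers ('lbfgs', 'adam', 'sgd') and a linear output stand in for them here; only the six-configuration/minimum-training-error selection and the nearest-centroid gating follow the text. The input_centroids argument is assumed to be the centroid vectors with the target-day load removed, as in Section 2.5.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_percentage_error

# Six parallel configurations (cf. Table 1): three training algorithms x two
# hidden-layer activations. The solvers are stand-ins for LM/RB/SCGB.
CONFIGS = [(s, a) for s in ("lbfgs", "adam", "sgd") for a in ("identity", "tanh")]

def train_cluster_forecaster(X, y, hidden=20):
    """Train all six configurations on one cluster; keep the min-training-error one."""
    best_net, best_err = None, np.inf
    for solver, activation in CONFIGS:
        net = MLPRegressor(hidden_layer_sizes=(hidden,), activation=activation,
                           solver=solver, max_iter=1000, random_state=0).fit(X, y)
        err = mean_absolute_percentage_error(y, net.predict(X))  # training MAPE
        if err < best_err:
            best_net, best_err = net, err
    return best_net

def forecast(pattern, input_centroids, nets):
    """Test phase: route the pattern to the nearest centroid's winning FFNN."""
    j = int(np.argmin(np.linalg.norm(input_centroids - pattern, axis=1)))
    return nets[j].predict(pattern.reshape(1, -1))[0]  # 24 hourly loads of day d
```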
3 Results

The case study refers to four buses that cover the island of Corfu in Greece, namely 'Kerkira 1', 'Kerkira 2', 'Agios Vasilios' and 'Mesogi'. The available data, provided by the Independent Power Transmission Operator (IPTO) S.A. of Greece, cover the period 01/01/2013-31/12/2016 and refer to hourly loads [28]. Figs. 3-6 present the probability density functions (pdfs) of the loads of the buses 'Kerkira 1', 'Kerkira 2', 'Agios Vasilios' and 'Mesogi', respectively; through the pdfs, information about the frequencies of the various load values within the year can be drawn.

Fig. 3: Pdf of the load of the bus 'Kerkira 1'
Fig. 4: Pdf of the load of the bus 'Kerkira 2'
Fig. 5: Pdf of the load of the bus 'Agios Vasilios'
Fig. 6: Pdf of the load of the bus 'Mesogi'

An FFNN operates via two distinct phases, training and test, and hence the data are split into training and test sets. The training set corresponds to the period 01/01/2013-31/12/2015 and is used to determine the optimal FFNN parameters, such as the number of neurons in the hidden layer and the type and parameters of the training algorithm and activation function; it allows experimenting with different network topologies via trial and error. The test set covers a full year, i.e. 01/01/2016-31/12/2016, and refers to the actual application of the optimal configuration drawn from the preceding training phase. The benefits of involving a complete year in the dataset are the examination of the seasonality of the demand, the tracking of special patterns in the various sub-periods, the investigation of load variations on bank holidays etc.

The assessment framework is based on a set of evaluation metrics that measure the forecasting errors. Let $L_{d}(t)$ and $\hat{L}_{d}(t)$ be the actual and the forecasted load values of day $d$ at hour $t$, respectively. The mean absolute error (MAE) is expressed as [25]

$$\mathrm{MAE} = \frac{1}{24}\sum_{t=1}^{24}\big|L_{d}(t)-\hat{L}_{d}(t)\big| \quad (3)$$

The mean absolute percentage error (MAPE) expresses the percentage variation of the forecasted value from the actual one [25]:

$$\mathrm{MAPE} = \frac{100}{24}\sum_{t=1}^{24}\frac{\big|L_{d}(t)-\hat{L}_{d}(t)\big|}{L_{d}(t)} \quad (4)$$

The mean absolute range normalised error (MARNE) is the absolute difference between the actual and forecasted values, normalised to the maximum power $P_{\max}$ of the bus [32]:

$$\mathrm{MARNE} = \frac{100}{24}\sum_{t=1}^{24}\frac{\big|L_{d}(t)-\hat{L}_{d}(t)\big|}{P_{\max}} \quad (5)$$
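The three metrics of (3)-(5) translate directly into code; this sketch evaluates one day's 24 hourly values, with p_max standing for the bus's maximum power in (5).

```python
import numpy as np

def mae(actual, forecast):
    """Mean absolute error over the 24 hourly loads of a day, eq. (3) (MW)."""
    return float(np.mean(np.abs(actual - forecast)))

def mape(actual, forecast):
    """Mean absolute percentage error, eq. (4) (%)."""
    return float(100.0 * np.mean(np.abs(actual - forecast) / actual))

def marne(actual, forecast, p_max):
    """Mean absolute range normalised error, eq. (5) (%), normalised to p_max."""
    return float(100.0 * np.mean(np.abs(actual - forecast)) / p_max)
```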
It should be noted that a large number of clusters can lead to training sets with a low number of members and, thus, possibly poor training efficiency for the FFNNs. On the contrary, a low number of clusters leads to clusters with many dissimilar patterns. Therefore, a trade-off between clustering efficiency and neural network training robustness should be taken into account. After a series of experiments, $k = 3$ is selected, which satisfies the aforementioned contradicting requirements. Table 2 shows the clustering distribution of the test set using the K-means algorithm; the entries are the numbers of days of each month assigned to each cluster.

Table 2. Day type distribution per cluster

| Month | Cluster 1 | Cluster 2 | Cluster 3 |
|---|---|---|---|
| Jan | 12 | 15 | 4 |
| Feb | 22 | 7 | 0 |
| Mar | 30 | 1 | 0 |
| Apr | 30 | 0 | 0 |
| May | 31 | 0 | 0 |
| Jun | 14 | 2 | 14 |
| Jul | 4 | 0 | 27 |
| Aug | 1 | 1 | 29 |
| Sep | 28 | 0 | 2 |
| Oct | 31 | 0 | 0 |
| Nov | 30 | 0 | 0 |
| Dec | 15 | 16 | 0 |

Tables 3-5 present the comparison of the different configurations for Case 1, per clustering algorithm, for the Agios Vasilios dataset. Tables 6-11 present the corresponding comparisons for Case 2 and Case 3.

Table 3. FFNN topologies evaluation for Case 1 and FCM

| Algorithm | Hidden-layer activation | Output-layer activation | Cluster | MAPE, % | MAE, MW | MARNE, % |
|---|---|---|---|---|---|---|
| LM | Lin. | Tan. | 1 | 4.86 | 1.31 | 3.40 |
| LM | Tan. | Tan. | 1 | 5.87 | 1.57 | 4.06 |
| LM | Lin. | Tan. | 2 | 4.32 | 1.34 | 2.79 |
| LM | Tan. | Tan. | 2 | 4.38 | 1.35 | 2.81 |
| LM | Lin. | Tan. | 3 | 8.43 | 1.26 | 3.65 |
| LM | Tan. | Tan. | 3 | 9.45 | 1.50 | 4.33 |
| RB | Lin. | Tan. | 1 | 5.90 | 1.55 | 4.02 |
| RB | Tan. | Tan. | 1 | 5.25 | 1.37 | 3.55 |
| RB | Lin. | Tan. | 2 | 5.11 | 1.55 | 3.23 |
| RB | Tan. | Tan. | 2 | 4.17 | 1.27 | 2.65 |
| RB | Lin. | Tan. | 3 | 8.55 | 1.29 | 3.73 |
| RB | Tan. | Tan. | 3 | 9.08 | 1.43 | 4.13 |
| SCGB | Lin. | Tan. | 1 | 5.36 | 1.42 | 3.67 |
| SCGB | Tan. | Tan. | 1 | 5.44 | 1.47 | 3.80 |
| SCGB | Lin. | Tan. | 2 | 5.46 | 1.69 | 3.52 |
| SCGB | Tan. | Tan. | 2 | 4.21 | 1.30 | 2.71 |
| SCGB | Lin. | Tan. | 3 | 8.47 | 1.28 | 3.70 |
| SCGB | Tan. | Tan. | 3 | 8.96 | 1.39 | 4.01 |

Table 4. FFNN topologies evaluation for Case 1 and K-means

| Algorithm | Hidden-layer activation | Output-layer activation | Cluster | MAPE, % | MAE, MW | MARNE, % |
|---|---|---|---|---|---|---|
| LM | Lin. | Tan. | 1 | 4.60 | 1.42 | 2.96 |
| LM | Tan. | Tan. | 1 | 4.18 | 1.27 | 2.65 |
| LM | Lin. | Tan. | 2 | 4.76 | 1.27 | 3.24 |
| LM | Tan. | Tan. | 2 | 5.24 | 1.41 | 3.61 |
| LM | Lin. | Tan. | 3 | 8.30 | 1.25 | 3.53 |
| LM | Tan. | Tan. | 3 | 9.03 | 1.40 | 3.95 |
| RB | Lin. | Tan. | 1 | 5.59 | 1.70 | 3.54 |
| RB | Tan. | Tan. | 1 | 4.22 | 1.29 | 2.70 |
| RB | Lin. | Tan. | 2 | 5.22 | 1.40 | 3.58 |
| RB | Tan. | Tan. | 2 | 4.93 | 1.33 | 3.41 |
| RB | Lin. | Tan. | 3 | 8.52 | 1.29 | 3.65 |
| RB | Tan. | Tan. | 3 | 8.91 | 1.39 | 3.93 |
| SCGB | Lin. | Tan. | 1 | 4.88 | 1.51 | 3.14 |
| SCGB | Tan. | Tan. | 1 | 3.99 | 1.24 | 2.59 |
| SCGB | Lin. | Tan. | 2 | 4.92 | 1.31 | 3.35 |
| SCGB | Tan. | Tan. | 2 | 5.00 | 1.34 | 3.43 |
| SCGB | Lin. | Tan. | 3 | 8.50 | 1.29 | 3.64 |
| SCGB | Tan. | Tan. | 3 | 8.72 | 1.35 | 3.82 |

Table 5. FFNN topologies evaluation for Case 1 and Ward's algorithm

| Algorithm | Hidden-layer activation | Output-layer activation | Cluster | MAPE, % | MAE, MW | MARNE, % |
|---|---|---|---|---|---|---|
| LM | Lin. | Tan. | 1 | 5.55 | 1.53 | 3.80 |
| LM | Tan. | Tan. | 1 | 6.52 | 1.80 | 4.47 |
| LM | Lin. | Tan. | 2 | 4.72 | 1.49 | 3.11 |
| LM | Tan. | Tan. | 2 | 4.35 | 1.35 | 2.82 |
| LM | Lin. | Tan. | 3 | 7.65 | 1.10 | 3.13 |
| LM | Tan. | Tan. | 3 | 7.60 | 1.10 | 3.10 |
| RB | Lin. | Tan. | 1 | 5.79 | 1.57 | 3.91 |
| RB | Tan. | Tan. | 1 | 5.22 | 1.42 | 3.54 |
| RB | Lin. | Tan. | 2 | 4.07 | 1.27 | 2.64 |
| RB | Tan. | Tan. | 2 | 4.08 | 1.27 | 2.66 |
| RB | Lin. | Tan. | 3 | 8.24 | 1.20 | 3.40 |
| RB | Tan. | Tan. | 3 | 7.77 | 1.12 | 3.18 |
| SCGB | Lin. | Tan. | 1 | 5.83 | 1.58 | 3.94 |
| SCGB | Tan. | Tan. | 1 | 5.52 | 1.51 | 3.76 |
| SCGB | Lin. | Tan. | 2 | 5.40 | 1.68 | 3.51 |
| SCGB | Tan. | Tan. | 2 | 4.18 | 1.30 | 2.70 |
| SCGB | Lin. | Tan. | 3 | 7.83 | 1.15 | 3.25 |
| SCGB | Tan. | Tan. | 3 | 7.75 | 1.13 | 3.20 |

Table 6. FFNN topologies evaluation for Case 2 and FCM

| Algorithm | Hidden-layer activation | Output-layer activation | Cluster | MAPE, % | MAE, MW | MARNE, % |
|---|---|---|---|---|---|---|
| LM | Lin. | Tan. | 1 | 5.19 | 1.37 | 3.55 |
| LM | Tan. | Tan. | 1 | 6.14 | 1.65 | 4.26 |
| LM | Lin. | Tan. | 2 | 4.50 | 1.38 | 2.89 |
| LM | Tan. | Tan. | 2 | 4.85 | 1.50 | 3.13 |
| LM | Lin. | Tan. | 3 | 8.58 | 1.29 | 3.73 |
| LM | Tan. | Tan. | 3 | 9.23 | 1.43 | 4.12 |
| RB | Lin. | Tan. | 1 | 5.97 | 1.56 | 4.04 |
| RB | Tan. | Tan. | 1 | 4.87 | 1.31 | 3.39 |
| RB | Lin. | Tan. | 2 | 5.60 | 1.72 | 3.58 |
| RB | Tan. | Tan. | 2 | 4.32 | 1.34 | 2.79 |
| RB | Lin. | Tan. | 3 | 8.80 | 1.33 | 3.84 |
| RB | Tan. | Tan. | 3 | 9.18 | 1.43 | 4.11 |
| SCGB | Lin. | Tan. | 1 | 5.34 | 1.40 | 3.62 |
| SCGB | Tan. | Tan. | 1 | 5.43 | 1.45 | 3.76 |
| SCGB | Lin. | Tan. | 2 | 5.36 | 1.64 | 3.43 |
| SCGB | Tan. | Tan. | 2 | 4.32 | 1.33 | 2.78 |
| SCGB | Lin. | Tan. | 3 | 8.68 | 1.32 | 3.80 |
| SCGB | Tan. | Tan. | 3 | 8.82 | 1.34 | 3.88 |

Table 7. FFNN topologies evaluation for Case 2 and K-means

| Algorithm | Hidden-layer activation | Output-layer activation | Cluster | MAPE, % | MAE, MW | MARNE, % |
|---|---|---|---|---|---|---|
| LM | Lin. | Tan. | 1 | 4.73 | 1.47 | 3.06 |
| LM | Tan. | Tan. | 1 | 5.03 | 1.55 | 3.24 |
| LM | Lin. | Tan. | 2 | 4.69 | 1.24 | 3.18 |
| LM | Tan. | Tan. | 2 | 5.23 | 1.39 | 3.56 |
| LM | Lin. | Tan. | 3 | 8.44 | 1.27 | 3.60 |
| LM | Tan. | Tan. | 3 | 8.69 | 1.34 | 3.80 |
| RB | Lin. | Tan. | 1 | 5.33 | 1.64 | 3.41 |
| RB | Tan. | Tan. | 1 | 4.16 | 1.29 | 2.69 |
| RB | Lin. | Tan. | 2 | 4.94 | 1.31 | 3.37 |
| RB | Tan. | Tan. | 2 | 4.94 | 1.33 | 3.41 |
| RB | Lin. | Tan. | 3 | 8.65 | 1.30 | 3.68 |
| RB | Tan. | Tan. | 3 | 8.92 | 1.38 | 3.92 |
| SCGB | Lin. | Tan. | 1 | 5.37 | 1.66 | 3.45 |
| SCGB | Tan. | Tan. | 1 | 4.42 | 1.35 | 2.82 |
| SCGB | Lin. | Tan. | 2 | 5.23 | 1.40 | 3.57 |
| SCGB | Tan. | Tan. | 2 | 4.62 | 1.24 | 3.18 |
| SCGB | Lin. | Tan. | 3 | 8.46 | 1.28 | 3.63 |
| SCGB | Tan. | Tan. | 3 | 8.80 | 1.36 | 3.84 |

Table 8. FFNN topologies evaluation for Case 2 and Ward's algorithm

| Algorithm | Hidden-layer activation | Output-layer activation | Cluster | MAPE, % | MAE, MW | MARNE, % |
|---|---|---|---|---|---|---|
| LM | Lin. | Tan. | 1 | 5.07 | 1.39 | 3.46 |
| LM | Tan. | Tan. | 1 | 5.71 | 1.52 | 3.79 |
| LM | Lin. | Tan. | 2 | 4.58 | 1.45 | 3.02 |
| LM | Tan. | Tan. | 2 | 5.03 | 1.59 | 3.32 |
| LM | Lin. | Tan. | 3 | 7.73 | 1.11 | 3.16 |
| LM | Tan. | Tan. | 3 | 7.83 | 1.13 | 3.20 |
| RB | Lin. | Tan. | 1 | 6.05 | 1.63 | 4.06 |
| RB | Tan. | Tan. | 1 | 5.69 | 1.53 | 3.82 |
| RB | Lin. | Tan. | 2 | 5.25 | 1.61 | 3.35 |
| RB | Tan. | Tan. | 2 | 4.32 | 1.34 | 2.80 |
| RB | Lin. | Tan. | 3 | 8.32 | 1.21 | 3.42 |
| RB | Tan. | Tan. | 3 | 7.75 | 1.13 | 3.21 |
| SCGB | Lin. | Tan. | 1 | 6.27 | 1.67 | 4.17 |
| SCGB | Tan. | Tan. | 1 | 5.24 | 1.44 | 3.58 |
| SCGB | Lin. | Tan. | 2 | 5.80 | 1.81 | 3.77 |
| SCGB | Tan. | Tan. | 2 | 4.23 | 1.32 | 2.75 |
| SCGB | Lin. | Tan. | 3 | 7.98 | 1.17 | 3.31 |
| SCGB | Tan. | Tan. | 3 | 7.73 | 1.12 | 3.18 |
Table 9. FFNN topologies evaluation for Case 3 and FCM

| Algorithm | Hidden-layer activation | Output-layer activation | Cluster | MAPE, % | MAE, MW | MARNE, % |
|---|---|---|---|---|---|---|
| LM | Lin. | Tan. | 1 | 8.48 | 1.28 | 3.69 |
| LM | Tan. | Tan. | 1 | 9.78 | 1.51 | 4.37 |
| LM | Lin. | Tan. | 2 | 4.59 | 1.42 | 2.95 |
| LM | Tan. | Tan. | 2 | 5.07 | 1.56 | 3.26 |
| LM | Lin. | Tan. | 3 | 4.97 | 1.34 | 3.46 |
| LM | Tan. | Tan. | 3 | 5.16 | 1.37 | 3.54 |
| RB | Lin. | Tan. | 1 | 8.56 | 1.29 | 3.73 |
| RB | Tan. | Tan. | 1 | 8.56 | 1.27 | 3.66 |
| RB | Lin. | Tan. | 2 | 5.89 | 1.79 | 3.72 |
| RB | Tan. | Tan. | 2 | 4.53 | 1.38 | 2.88 |
| RB | Lin. | Tan. | 3 | 5.78 | 1.53 | 3.96 |
| RB | Tan. | Tan. | 3 | 5.75 | 1.55 | 4.00 |
| SCGB | Lin. | Tan. | 1 | 8.51 | 1.29 | 3.73 |
| SCGB | Tan. | Tan. | 1 | 8.80 | 1.36 | 3.92 |
| SCGB | Lin. | Tan. | 2 | 5.15 | 1.59 | 3.31 |
| SCGB | Tan. | Tan. | 2 | 4.24 | 1.30 | 2.71 |
| SCGB | Lin. | Tan. | 3 | 5.85 | 1.53 | 3.96 |
| SCGB | Tan. | Tan. | 3 | 5.34 | 1.43 | 3.70 |

Table 10. FFNN topologies evaluation for Case 3 and K-means

| Algorithm | Hidden-layer activation | Output-layer activation | Cluster | MAPE, % | MAE, MW | MARNE, % |
|---|---|---|---|---|---|---|
| LM | Lin. | Tan. | 1 | 4.60 | 1.41 | 2.94 |
| LM | Tan. | Tan. | 1 | 4.83 | 1.47 | 3.06 |
| LM | Lin. | Tan. | 2 | 4.97 | 1.34 | 3.43 |
| LM | Tan. | Tan. | 2 | 4.99 | 1.34 | 3.43 |
| LM | Lin. | Tan. | 3 | 8.35 | 1.25 | 3.55 |
| LM | Tan. | Tan. | 3 | 9.07 | 1.39 | 3.92 |
| RB | Lin. | Tan. | 1 | 5.44 | 1.68 | 3.51 |
| RB | Tan. | Tan. | 1 | 4.60 | 1.41 | 2.95 |
| RB | Lin. | Tan. | 2 | 5.70 | 1.52 | 3.88 |
| RB | Tan. | Tan. | 2 | 5.27 | 1.42 | 3.62 |
| RB | Lin. | Tan. | 3 | 8.53 | 1.28 | 3.63 |
| RB | Tan. | Tan. | 3 | 8.89 | 1.37 | 3.88 |
| SCGB | Lin. | Tan. | 1 | 5.46 | 1.71 | 3.55 |
| SCGB | Tan. | Tan. | 1 | 4.48 | 1.39 | 2.90 |
| SCGB | Lin. | Tan. | 2 | 5.52 | 1.50 | 3.84 |
| SCGB | Tan. | Tan. | 2 | 5.21 | 1.39 | 3.57 |
| SCGB | Lin. | Tan. | 3 | 8.35 | 1.26 | 3.57 |
| SCGB | Tan. | Tan. | 3 | 8.60 | 1.32 | 3.73 |

Table 11. FFNN topologies evaluation for Case 3 and Ward's algorithm

| Algorithm | Hidden-layer activation | Output-layer activation | Cluster | MAPE, % | MAE, MW | MARNE, % |
|---|---|---|---|---|---|---|
| LM | Lin. | Tan. | 1 | 5.27 | 1.40 | 3.49 |
| LM | Tan. | Tan. | 1 | 6.85 | 1.85 | 4.61 |
| LM | Lin. | Tan. | 2 | 4.97 | 1.56 | 3.25 |
| LM | Tan. | Tan. | 2 | 6.34 | 2.02 | 4.22 |
| LM | Lin. | Tan. | 3 | 7.67 | 1.11 | 3.13 |
| LM | Tan. | Tan. | 3 | 7.77 | 1.12 | 3.17 |
| RB | Lin. | Tan. | 1 | 6.38 | 1.73 | 4.31 |
| RB | Tan. | Tan. | 1 | 5.80 | 1.59 | 3.95 |
| RB | Lin. | Tan. | 2 | 5.25 | 1.63 | 3.39 |
| RB | Tan. | Tan. | 2 | 4.57 | 1.43 | 2.98 |
| RB | Lin. | Tan. | 3 | 8.28 | 1.21 | 3.43 |
| RB | Tan. | Tan. | 3 | 7.75 | 1.14 | 3.22 |
| SCGB | Lin. | Tan. | 1 | 6.20 | 1.65 | 4.11 |
| SCGB | Tan. | Tan. | 1 | 5.15 | 1.41 | 3.51 |
| SCGB | Lin. | Tan. | 2 | 6.43 | 2.00 | 4.17 |
| SCGB | Tan. | Tan. | 2 | 4.48 | 1.40 | 2.93 |
| SCGB | Lin. | Tan. | 3 | 7.96 | 1.15 | 3.25 |
| SCGB | Tan. | Tan. | 3 | 7.79 | 1.11 | 3.15 |

During the training phase, a series of simulations took place that differed in the number of neurons in the hidden layer; the number of neurons varies between 2 and 50, with an increasing step equal to 2. The tables present the lowest values of the indicators recorded after examining all the different numbers of neurons in the hidden layer. The optimal number of neurons differs per configuration. For defining the optimal number of neurons during the training phase, we utilise the MAPE indicator; a minimal sketch of this search is given at the end of this discussion.

According to Table 3, LM results in the lowest errors for the first cluster. This configuration uses a linear activation function in the hidden layer and a tangent sigmoid in the output layer. Regarding the second cluster, the optimal combination refers to the tangent sigmoid activation function for both layers; in this case, the SCGB algorithm appears as the optimal selection. Finally, for the third cluster, the optimal configuration corresponds to the tangent sigmoid function for both layers and the LM training algorithm. Table 6 presents the results for Case 2; the LM algorithm results in the lowest errors for all clusters. For Case 3, the SCGB algorithm leads to lower errors in the first cluster, and LM in both the second and third clusters. The comparison between the cases shows that Case 1 leads to better forecasts; however, there is no relatively strong evidence about which type of inputs is optimal. In addition, K-means generally leads to lower errors.
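The MAPE-driven neuron search described above can be sketched as a simple grid over the even sizes 2-50, keeping the network with the lowest training MAPE; the 'tanh'/'lbfgs' configuration used here is one illustrative choice, since in the paper the search is repeated for each configuration of Table 1.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_percentage_error

def select_hidden_size(X, y, sizes=range(2, 51, 2)):
    """Grid over even hidden-layer sizes 2-50; keep the lowest training-MAPE net."""
    best_net, best_size, best_err = None, None, np.inf
    for h in sizes:
        net = MLPRegressor(hidden_layer_sizes=(h,), activation="tanh",
                           solver="lbfgs", max_iter=1000, random_state=0).fit(X, y)
        err = mean_absolute_percentage_error(y, net.predict(X))
        if err < best_err:
            best_net, best_size, best_err = net, h, err
    return best_net, best_size
```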
To fully evaluate the proposed model, a comparison is held with the following models: radial basis function neural network (RBFNN) [33], general regression neural network (GRNN) [34], Elman neural network (ENN) [35] and support vector regression (SVR) [36]. Table 12 presents the scores of these models for the Agios Vasilios data and K-means for Case 1, and Tables 13 and 14 for Case 2 and Case 3, respectively. It is shown that ENN results in lower errors, followed by RBFNN. The conclusions for the cases are similar to those drawn with the hybrid model; the latter outperforms all the other models.

Table 12. Other models' evaluation for Case 1 and K-means

| Model | Cluster | MAPE, % | MAE, MW | MARNE, % |
|---|---|---|---|---|
| RBFNN | 1 | 5.01 | 1.37 | 3.33 |
| RBFNN | 2 | 5.46 | 1.41 | 3.61 |
| RBFNN | 3 | 5.89 | 1.53 | 3.76 |
| GRNN | 1 | 5.70 | 1.81 | 3.76 |
| GRNN | 2 | 6.39 | 1.69 | 4.32 |
| GRNN | 3 | 12.51 | 1.96 | 5.53 |

ENN 1 4.

Reference(s)