Adaptive composite frequency control of power systems using reinforcement learning
2022; Institution of Engineering and Technology; Volume: 7; Issue: 4; Language: English
10.1049/cit2.12103
ISSN 2468-6557
Authors: Chaoxu Mu, Ke Wang, Shiqian Ma, Zhiqiang Chong, Zhen Ni
Topic(s): Adaptive Dynamic Programming Control
CAAI Transactions on Intelligence Technology, Early View, Original Research, Open Access

Chaoxu Mu (corresponding author; cxmu@tju.edu.cn; ORCID 0000-0003-1055-9513), School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Ke Wang (ORCID 0000-0002-8306-1663), School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Shiqian Ma, State Grid Tianjin Electric Power Company Electric Power Research Institute, Tianjin, China
Zhiqiang Chong, State Grid Tianjin Electric Power Company Electric Power Research Institute, Tianjin, China
Zhen Ni (ORCID 0000-0003-3166-4726), Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA
First published: 17 May 2022. https://doi.org/10.1049/cit2.12103

Abstract
With the incorporation of renewable energy, load frequency control (LFC) becomes more challenging because of uncertain power generation and changeable load demands. Electric vehicles (EVs) have become a popular means of transportation, and they can also provide flexible resources for frequency regulation. In this paper, a novel adaptive composite controller is designed to solve the LFC problem for an interconnected power system with electric vehicles and a wind turbine. EVs are used as regulation resources to effectively compensate for the power mismatch. First, a sliding mode controller is developed to reduce the random influences caused by the wind turbine generation system.
Second, an auxiliary controller based on reinforcement learning is proposed to produce adaptive control signals, which are added in real time to the primary proportional-integral-derivative (PID) control signal. Finally, the proposed scheme is verified on a two-area power system under four different cases that consider random wind power, load disturbances and output constraints. Simulation results demonstrate that the proposed adaptive composite frequency control scheme achieves competitive dynamic performance.

1 INTRODUCTION
A power system usually contains different power generation units and electrical loads. Frequency deviation is an immediate consequence of an imbalance between the electrical load and the mechanical power supplied to the connected generators. The frequency deviation is therefore a useful index for evaluating the stability of a power system and the quality of electric energy [1]. The control of active power and frequency is called load frequency control (LFC). When generation and demand are mismatched, LFC plays a fundamental role in restoring the system frequency. This is especially true for an interconnected multi-area power system, where a power mismatch in one control area affects the frequency of other areas via the tie-lines [2-4]. Several LFC designs have been proposed to address this issue. For example, Yang et al. [5] optimally designed the state feedback matrix and successfully damped both the frequency deviation and the tie-line power fluctuation to zero. Ikram et al. [6] replaced the centralised mechanism with a distributed one and proposed a consensus-based algorithm to estimate the power mismatch. In ref. [7], a novel continuous under-frequency load shedding scheme was presented to eliminate frequency deviation. This scheme adapts to the power mismatch using local frequency measurements.
At the same time, renewable energy sources (RESs) have gradually been incorporated into power systems to increase generation reserves. Various RESs have greatly promoted the development of microgrids, and they have also facilitated the deep integration of power systems with microgrids. Meanwhile, the increasing penetration of RESs and microgrids brings more challenges to frequency stability. The wind turbine generation (WTG) system is a common means of exploiting inexhaustible wind energy. However, wind energy is stochastic and can fluctuate considerably over time [8]. Power consumption is also becoming increasingly diverse. Consequently, when renewable energy such as wind energy is integrated into a microgrid, power mismatch and frequency fluctuation inevitably arise. The micro-turbine is a typical controllable device for compensating the power mismatch [9]. In a microgrid with renewable energy, however, it is unrealistic for the micro-turbine to provide all of the active power, so it must be supported by energy storage devices. In particular, EVs can contribute positively to frequency regulation by acting as energy storage devices when they are idle [10]. This paper aims to develop learning-based LFC controllers that stabilise the frequency of interconnected power systems when wind turbines and electric vehicles are connected to the power network.

Next, some typical LFC methods are reviewed. The proportional-integral-derivative (PID) controller is a universal regulation strategy for LFC design. Many other control strategies have also been applied to the frequency regulation problem, such as robust control, optimal control and sliding mode control (SMC) [11-15]. Among them, Sathya et al. [13] proposed an improved PID frequency control method using the bat-inspired algorithm. Based on the SMC structure, Mi et al.
[14] designed an LFC controller for a multi-area power system with matched and mismatched uncertainties. In ref. [15], a distributed optimal control method was proposed to restore the nominal frequency and the tie-line power flows between control areas after a power mismatch. In addition, several adaptive and intelligent control algorithms have been reported for power systems [16-19]. Among them, Dash et al. [17] proposed an adaptive neural network (NN) control method to regulate frequency fluctuations. In ref. [18], an adaptive fuzzy logic controller was designed to stabilise the frequency. Similarly, Yousef et al. [19] proposed an adaptive fuzzy frequency controller. These studies provide preliminary evidence of the potential of learning-based methodologies in LFC design. Unlike these studies, this paper designs a learning-based controller in the optimal sense, and therefore focuses on a learning technique called adaptive dynamic programming (ADP). ADP combines reinforcement learning and dynamic programming to achieve an optimal control design by minimising a designed cost function. This learning technique usually uses neural networks to approximate non-linear functions, and it has been applied in various industrial fields [20-23]. In recent years, several ADP-based control methods have been developed to regulate the frequency of power systems. For example, in ref. [24], the frequency stability of an isolated smart grid was studied using a goal-representation ADP approach. Lu et al. [25] obtained improved control performance in stabilising a large power system. Sui et al. [26] effectively suppressed frequency oscillations in a power system with energy storage devices. In ref. [27], a novel fuzzy ADP-based controller that accounts for transmission delay was developed to improve transient stability.
Several studies on the LFC problem of microgrids have also inspired our work. For a microgrid containing photovoltaic generation, Sekhar et al. [28] proposed a tracking mechanism based on adaptive predictive correction to adjust the frequency. In ref. [29], the primary frequency regulation of a microgrid that includes both wind turbines and EVs was addressed with centralised and distributed coordinated control methods. The articles mentioned above have addressed the frequency stability problem of the power system to a certain extent. However, some challenges remain:
(1) the dynamic response of the system frequency needs to be enhanced, and the controller needs to be further optimised;
(2) more generic and effective LFC designs are required for multi-area power systems that integrate wind energy and electric vehicles;
(3) most intelligent controllers are designed purely on learning techniques without considering the stability margin, and the frequency regulation process lacks optimality considerations.
The main motivation of this work is to use ADP to develop an efficient LFC method that addresses these problems. This paper considers an operating scenario in which wind energy and EVs are incorporated into the multi-area power system. It proposes a composite frequency controller that consists of a PID primary signal and an ADP-based auxiliary signal. The main contributions are as follows:
(1) A novel hybrid LFC model is constructed for the multi-area interconnected power system, in which the wind turbine and electric vehicles are integrated on the basis of traditional governor-turbine units.
(2) A sliding mode pitch angle controller is designed for the WTG system, which is integrated into the inner loop of the power system to provide positive active power. Moreover, electric vehicles are introduced in the form of EV aggregators to effectively support frequency regulation.
(3) Based on PID and ADP, an adaptive composite frequency control scheme is proposed for the wind-integrated multi-area power system, in which the auxiliary adaptive control is implemented by the action-critic NN structure and a reinforcement learning mechanism.
The rest of the paper is organised as follows. Section 2 describes the multi-area power system, formulates the LFC problem and designs the pitch angle controller for the WTG system. Section 3 presents the detailed design of the PID primary controller and the learning-based auxiliary controller. Section 4 presents simulation cases on a two-area power system and analyses the results comparatively. Finally, Section 5 concludes this work and offers some insights for future research.

2 SYSTEM DESCRIPTION AND PROBLEM FORMULATION
2.1 The load frequency control problem of hybrid multi-area power system
The LFC model of a classic multi-area power system is mainly composed of governors, turbines, electric loads and tie-lines. Power systems are inherently non-linear and time-varying. Nevertheless, a linearised model is usually adopted for frequency analysis in the presence of load disturbances [9]. This is justified because the system behaves approximately linearly around a rated operating condition [1]. With the wind turbine and EVs included, the block diagram of the LFC model for a hybrid multi-area power system is shown in Figure 1, in which each component is described by a transfer function.
Each control area contains a governor-turbine unit, EV aggregators, a wind turbine, a power system (namely, the electricity-consuming system), a frequency controller, a pitch angle controller, electric loads and tie-lines. Areas exchange power with one another through the tie-lines. Moreover, in each area, the turbine unit and the WTG system provide active power, and the EVs also contribute active power. In this sense, the power system can also be regarded as a microgrid. The main signals and variables in each area are introduced next.

FIGURE 1 Structure sketch of the i-th area in a multi-area power system

For the i-th area, Δfi is the frequency deviation, and Tit, Tig and Tip are the time constants of the turbine, the governor and the power system, respectively. Tij represents the interconnection gain, or synchronising power coefficient, between areas i and j. ΔPit and ΔXig denote the changes in turbine power and in governor valve position, respectively. ΔPie1, …, ΔPien are the changes in EV power, and Tie1, …, Tien are the time constants of the EV aggregators, where each EV aggregator supervises one EV group. Ri, Kif and Kip are the speed regulation coefficient, the frequency-deviation gain and the power system gain, respectively. ΔPid, ΔPiwg and ΔPtie,i are the disturbances from load change, the WTG system and the tie-line power deviation, respectively. The power system achieves electric energy balance through automatic generation control (AGC), which maintains frequency stability under load fluctuations. Because each area involves only one generation unit, the AGC does not need to consider the distribution coefficient. The balance between interconnected control areas is achieved by measuring the frequency and tie-line power deviations to generate the area control error (ACE) signal, given by

$$ACE_i = \Delta P_{tie,i} + K_{if}\,\Delta f_i. \tag{1}$$

The ACE signal is thus a linear combination of Δfi and ΔPtie,i, and it is in turn used in the controller design. The purpose of the controller is therefore to drive both the frequency deviation and the ACE to zero.

Remark 1. In a microgrid, electric loads include not only commercial and industrial loads, such as supermarkets and factories, but also new types of loads, such as smart homes. These loads can be regarded as load disturbances. For this reason, different step disturbances are usually added in simulation analysis to observe the frequency deviation [30].

Remark 2. Electric vehicles are integrated into the power system as energy storage devices regulated by EV aggregators [31]. An aggregator represents a cluster of EVs. The behaviour of a single EV is uncertain, but an EV cluster composed of many vehicles exhibits statistical regularities, so its uncertainty is greatly reduced. It is also worth emphasising that both the electric vehicles and the wind turbine provide active power [32]. Hence, the influences of wind energy and electric vehicles are treated as positive power disturbances.

2.2 Effect of electric vehicles on the frequency regulation
In this section, the impact of EVs on the frequency response is discussed. EVs act as distributed storage devices and respond to frequency changes by exchanging energy with the power system. To illustrate the effect of EVs, Figure 2 shows the relationship between the frequency deviation Δf and the total power mismatch ΔP, where [ΔfL, ΔfU] and [ΔPL, ΔPU] are the acceptable ranges of frequency deviation and total power mismatch, respectively.
EVs are controlled to increase their charging power when Δf exceeds the upper bound ΔfU, as in Case A in Figure 2. Conversely, EVs are controlled to increase their discharging power when Δf falls below the lower bound ΔfL, as in Case B. In this way, EVs can effectively reduce the total power mismatch of the power system [33], and hence help the generation unit compensate for the power imbalance. In addition, compared with conventional energy storage systems, EV stations are easy to manage and have relatively low operating costs. It is therefore worthwhile to investigate the co-design of electric vehicles and the WTG system with respect to the frequency response.

FIGURE 2 Frequency deviation and power mismatch

2.3 Sliding mode pitch angle controller
In this section, we analyse the power output of the WTG system and design the corresponding controller. Owing to the randomness of wind energy, the output power of the WTG system usually fluctuates and can even be unstable [34]. It is therefore necessary to stabilise the output power of the WTG system before it is connected to the grid. Pitch angle control is the most popular approach for counteracting wind power fluctuation [35]. In this work, a sliding mode pitch angle controller is proposed to reduce power fluctuations, and it corresponds to the block 'pitch angle controller' in Figure 1. The whole WTG system mainly consists of three parts: the wind turbine, the hydraulic servo system and the pitch angle control system. First, the output power of the wind turbine can be calculated by

$$P_w = \frac{1}{2} C_p(\lambda,\beta)\, v^3 \rho \pi \hslash^2, \tag{2}$$

where Pw is the output power and Cp(λ, β) is the power coefficient, in which λ and β are the tip speed ratio and the pitch angle, respectively. In addition, v, ρ and ℏ are the wind speed, the air density and the blade radius, respectively.
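Equation (2) can be evaluated directly. The Python sketch below is illustrative only. The operating point (Cp = 0.4, v = 10 m/s, ρ = 1.225 kg/m³, blade radius 40 m) is hypothetical and not taken from this paper, and the function name `wind_turbine_power` is our own.

```python
import math

def wind_turbine_power(cp: float, v: float, rho: float, radius: float) -> float:
    """Aerodynamic power of Equation (2): P_w = 0.5 * C_p * v^3 * rho * pi * R^2.

    cp     -- power coefficient C_p(lambda, beta), dimensionless
    v      -- wind speed in m/s
    rho    -- air density in kg/m^3
    radius -- blade radius (the symbol ℏ in the text) in m
    Returns the power in W.
    """
    swept_area = math.pi * radius ** 2
    return 0.5 * cp * rho * swept_area * v ** 3

# Hypothetical operating point (not from the paper)
p = wind_turbine_power(0.4, 10.0, 1.225, 40.0)
print(f"P_w = {p / 1e6:.2f} MW")  # prints P_w = 1.23 MW
```

Because Pw scales with v³, doubling the wind speed multiplies the available power by eight. This is why modest wind fluctuations produce large swings in WTG output.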
The power coefficient Cp(λ, β) is expressed by

$$C_p(\lambda,\beta) = c_1(\beta)\lambda^2 + c_2(\beta)\lambda^3 + c_3(\beta)\lambda^4 \tag{3}$$

with $c_i(\beta) = c_{i0} + c_{i1}\beta + c_{i2}\beta^2 + c_{i3}\beta^3 + c_{i4}\beta^4$, i = 1, 2, 3, where ci0, …, ci4 are parameters determined by the characteristics of the wind turbine. The tip speed ratio is

$$\lambda = \frac{\omega \hslash}{v}, \tag{4}$$

where ω is the angular velocity, which can be calculated by

$$\omega^2 = \int \frac{2}{M}\left(P_w - P_{wg}\right) dt. \tag{5}$$

In Equation (5), M is the moment of inertia of the wind turbine. In this design, the WTG uses a squirrel-cage induction generator [36], and its output power Pwg can be calculated by

$$P_{wg} = \frac{-3 V_p^2 \varrho (1-\varrho) D_2}{\left(D_2 + \varrho D_1\right)^2 + \varrho^2 \left(G_1 + G_2\right)^2}, \tag{6}$$

where Vp is the phase voltage; G1 and G2 are the reactances of the stator and rotor, respectively; and D1 and D2 are the resistances of the stator and rotor, respectively. Note also that ϱ = (ω0 − ω)/ω0 is the slip of the induction generator, where ω0 is the synchronous angular velocity. In this paper, the pitch angle of the wind turbine ranges from 10° to 90° and is regulated by the sliding mode controller. Next, consider the relation τw = Pw/ω, where τw is the output torque of the wind turbine, given by

$$\tau_w = \frac{1}{2\omega} C_p(\lambda,\beta)\, v^3 \rho \pi \hslash^2 \triangleq f(\omega,\beta,v), \tag{7}$$

where Equation (2) has been used. To design a controller, τw must be linearised around the rated operating point τwop = f(ωop, βop, vop), where ωop, βop and vop are the steady-state (rated) values of ω, β and v, respectively.
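The static relations (3), (4), (6) and (7) can be collected into a few helper functions. This is a minimal sketch that rests on several assumptions. The coefficient table `COEFFS` is hypothetical (the paper states only that the c_ij depend on the turbine), β is taken in degrees, and all function names are illustrative rather than the authors' code.

```python
import math

# HYPOTHETICAL coefficients for Equation (3): COEFFS[i-1][k] = c_{ik}, so that
# c_i(beta) = sum_k c_{ik} * beta**k. Real values depend on the turbine.
COEFFS = [
    [2.76e-2, -5.0e-4, 0.0, 0.0, 0.0],   # c_1(beta)
    [-2.62e-3, 1.0e-5, 0.0, 0.0, 0.0],   # c_2(beta)
    [0.0, 0.0, 0.0, 0.0, 0.0],           # c_3(beta)
]

def c_poly(i: int, beta: float) -> float:
    """c_i(beta) = c_i0 + c_i1*beta + ... + c_i4*beta**4 for i = 1, 2, 3."""
    return sum(c * beta ** k for k, c in enumerate(COEFFS[i - 1]))

def power_coefficient(lam: float, beta: float) -> float:
    """Equation (3): C_p = c_1(beta) lam^2 + c_2(beta) lam^3 + c_3(beta) lam^4."""
    return sum(c_poly(i, beta) * lam ** (i + 1) for i in (1, 2, 3))

def tip_speed_ratio(omega: float, radius: float, v: float) -> float:
    """Equation (4): lambda = omega * R / v."""
    return omega * radius / v

def induction_generator_power(vp, slip, d1, d2, g1, g2):
    """Equation (6): output power P_wg of the squirrel-cage induction generator.

    d1, d2 -- stator/rotor resistances; g1, g2 -- stator/rotor reactances."""
    num = -3.0 * vp ** 2 * slip * (1.0 - slip) * d2
    den = (d2 + slip * d1) ** 2 + slip ** 2 * (g1 + g2) ** 2
    return num / den

def turbine_torque(omega, beta, v, rho, radius):
    """Equation (7): tau_w = C_p(lambda, beta) v^3 rho pi R^2 / (2 omega)."""
    lam = tip_speed_ratio(omega, radius, v)
    return power_coefficient(lam, beta) * v ** 3 * rho * math.pi * radius ** 2 / (2.0 * omega)

# Rotor slightly faster than synchronous speed -> negative slip -> P_wg > 0 (generating).
omega_sync, omega = 1.50, 1.52          # hypothetical speeds in rad/s
slip = (omega_sync - omega) / omega_sync
print(induction_generator_power(vp=230.0, slip=slip, d1=0.01, d2=0.01, g1=0.1, g2=0.1))
```

With the slip convention ϱ = (ω0 − ω)/ω0 used above, Equation (6) returns positive power only when the rotor runs above synchronous speed. This is a quick way to confirm that the sign convention has been transcribed correctly.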
Using a Taylor expansion, Equation (7) can be linearised as

$$\tau_w - \tau_{wop} = A_1 \Delta\omega + A_2 \Delta\beta + A_3 \Delta v \tag{8}$$

with

$$\begin{aligned}
A_1 &= \frac{\partial f}{\partial \omega} = -\frac{C_p \rho \pi \hslash^2 v^3}{2\omega^2} + \frac{\rho \pi \hslash^3 v^2}{2\omega}\frac{\partial C_p}{\partial \lambda},\\
A_2 &= \frac{\partial f}{\partial \beta} = \frac{\rho \pi \hslash^2 v^3}{2\omega}\frac{\partial C_p}{\partial \beta},\\
A_3 &= \frac{\partial f}{\partial v} = \frac{3 C_p \rho \pi \hslash^2 v^2}{2\omega} - \frac{\rho \pi \hslash^3 v}{2}\frac{\partial C_p}{\partial \lambda},
\end{aligned}$$

all evaluated at the operating point, where Δω = ω − ωop, Δβ = β − βop and Δv = v − vop. The higher-order terms of the expansion are neglected. The dynamic model of the WTG system can be expressed as

$$\tau_w - \tau_{wg} = M\dot{\omega}, \tag{9}$$

where τwg is the generator torque. At the operating point, the turbine and generator torques are assumed to be equal. Combining Equations (8) and (9), the model of the WTG system can therefore be formulated as

$$M\Delta\dot{\omega} = A_1 \Delta\omega + A_2 \Delta\beta + A_3 \Delta v. \tag{10}$$

This gives the linear model of the WTG system, on which the sliding mode pitch angle controller is designed. First, the sliding mode variable ξ is defined as

$$\xi = \varsigma \Delta\omega \tag{11}$$

with a constant gain ς. Because the pitch angle controller acts only when the output power Pwg exceeds the rated power Prg, Δω > 0 holds.
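A1, A2 and A3 in Equation (8) are the partial derivatives of f in Equation (7) with respect to ω, β and v, so their closed-form expressions can be cross-checked against central finite differences. In this sketch the turbine data and the operating point are hypothetical placeholders. Only c_i0 and c_i1 are non-zero, and c_3(β) = 0, so ∂Cp/∂λ and ∂Cp/∂β can be written out by hand.

```python
import math

# HYPOTHETICAL turbine data (placeholders, not values from the paper).
RHO, R = 1.225, 40.0                # air density (kg/m^3), blade radius ℏ (m)
C1 = (2.76e-2, -5.0e-4)             # c_1(beta) = c_10 + c_11 * beta, other c_1k = 0
C2 = (-2.62e-3, 1.0e-5)             # c_2(beta) = c_20 + c_21 * beta, other c_2k = 0
                                    # c_3(beta) = 0 in this sketch

def cp(lam, beta):
    return (C1[0] + C1[1] * beta) * lam ** 2 + (C2[0] + C2[1] * beta) * lam ** 3

def dcp_dlam(lam, beta):
    return 2.0 * (C1[0] + C1[1] * beta) * lam + 3.0 * (C2[0] + C2[1] * beta) * lam ** 2

def dcp_dbeta(lam, beta):
    return C1[1] * lam ** 2 + C2[1] * lam ** 3

def f(omega, beta, v):
    """Turbine torque, Equation (7)."""
    return cp(omega * R / v, beta) * v ** 3 * RHO * math.pi * R ** 2 / (2.0 * omega)

def linear_coefficients(omega, beta, v):
    """A_1, A_2, A_3 of Equation (8) at the operating point (omega, beta, v)."""
    lam, k = omega * R / v, RHO * math.pi
    a1 = (-cp(lam, beta) * k * R ** 2 * v ** 3 / (2.0 * omega ** 2)
          + k * R ** 3 * v ** 2 / (2.0 * omega) * dcp_dlam(lam, beta))
    a2 = k * R ** 2 * v ** 3 / (2.0 * omega) * dcp_dbeta(lam, beta)
    a3 = (3.0 * cp(lam, beta) * k * R ** 2 * v ** 2 / (2.0 * omega)
          - k * R ** 3 * v / 2.0 * dcp_dlam(lam, beta))
    return a1, a2, a3

def central_difference(fun, x, h=1e-5):
    return (fun(x + h) - fun(x - h)) / (2.0 * h)

# HYPOTHETICAL rated point: omega in rad/s, beta in degrees, v in m/s
w_op, b_op, v_op = 1.75, 10.0, 10.0
a1, a2, a3 = linear_coefficients(w_op, b_op, v_op)
n1 = central_difference(lambda w: f(w, b_op, v_op), w_op)
n2 = central_difference(lambda b: f(w_op, b, v_op), b_op)
n3 = central_difference(lambda vv: f(w_op, b_op, vv), v_op)
for name, exact, approx in (("A1", a1, n1), ("A2", a2, n2), ("A3", a3, n3)):
    print(f"{name}: analytic {exact:.6g}, finite difference {approx:.6g}")
```

With c_3 = 0, f is quadratic in ω and affine in β and v. The central differences therefore agree with the analytic coefficients up to round-off. With a full quartic Cp, the agreement would only be O(h²).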
By adopting the reaching law $\dot{\xi} = -\eta\xi - \varepsilon\,\mathrm{sat}(\xi)$ [37], the sliding mode pitch angle controller is designed as

$$u_w = \Delta\beta = -\frac{1}{\varsigma A_2}\Big(M\big(\eta\xi + \varepsilon\,\mathrm{sat}(\xi)\big) + \varsigma A_1 \Delta\omega + \varsigma A_3 \Delta v\Big), \tag{12}$$

where η and ε are positive constants, and sat(ξ) is the saturation function, which is used to reduce chattering. Define the Lyapunov function $L = \frac{1}{2}\xi^2$. Along the closed-loop trajectories,

$$\dot{L} = \xi\dot{\xi} = \varsigma\Delta\omega\big(-\eta\xi - \varepsilon\,\mathrm{sat}(\xi)\big) = -\eta\xi^2 - \varepsilon\,\xi\,\mathrm{sat}(\xi).$$

Because η, ε > 0 and ξ sat(ξ) ≥ 0 for every ξ, it follows that $\dot{L} \le 0$, with $\dot{L} < 0$ whenever ξ ≠ 0. By Lyapunov stability theory, the designed sliding mode pitch angle controller makes the WTG system asymptotically stable. This completes the integration of wind energy and the design of the corresponding controller. In the next section, we design a learning-based composite LFC controller.

3 ADAPTIVE-CRITIC-BASED COMPOSITE DESIGN FOR FREQUENCY REGULATION
The proposed control strategy uses the PID signal as the primary control signal and introduces an adaptive critic signal to adjust the dynamic response. This adaptive critic control scheme is implemented by the heuristic action-critic NN structure. Its mechanism is as follows. The critic NN estimates the cost function, which is composed of the control cost and the environment reward [38]. The action NN then updates the control signal based on the estimated cost function, so that the control signal adapts to the system.
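Returning briefly to Section 2.3, the pitch law (12) can be checked numerically on the linearised model (10). The following Python sketch rests on several assumptions. The values of A1, A2, A3, M, ς, η and ε are hypothetical placeholders. sat(·) is assumed to be the boundary-layer saturation clip(ξ/φ, −1, 1) with a hypothetical width φ. The wind deviation Δv is assumed constant and measured, and the 10° to 90° pitch limits are not enforced.

```python
def sat(x: float, phi: float) -> float:
    """Boundary-layer saturation (ASSUMED form): x/phi clipped to [-1, 1]."""
    return max(-1.0, min(1.0, x / phi))

def pitch_command(d_omega, d_v, a1, a2, a3, m, varsigma, eta, eps, phi):
    """Sliding mode pitch law of Equation (12); returns delta beta."""
    xi = varsigma * d_omega                                   # Equation (11)
    return -(m * (eta * xi + eps * sat(xi, phi))
             + varsigma * a1 * d_omega + varsigma * a3 * d_v) / (varsigma * a2)

def simulate(d_omega0=0.2, d_v=0.5, steps=2000, dt=1e-3):
    """Forward-Euler simulation of the linearised model (10) under law (12).

    All numbers are HYPOTHETICAL placeholders, not values from the paper,
    and the pitch angle limits are not enforced."""
    a1, a2, a3, m = -2.0, -1.5, 0.8, 10.0
    varsigma, eta, eps, phi = 1.0, 5.0, 0.05, 0.01
    d_omega, history = d_omega0, []
    for _ in range(steps):
        d_beta = pitch_command(d_omega, d_v, a1, a2, a3, m, varsigma, eta, eps, phi)
        # Equation (10): M * d(delta omega)/dt = A1*d_omega + A2*d_beta + A3*d_v
        d_omega += dt * (a1 * d_omega + a2 * d_beta + a3 * d_v) / m
        history.append(d_omega)
    return history

trajectory = simulate()
print(trajectory[0], trajectory[-1])
```

Because (12) cancels the modelled dynamics exactly, the speed deviation obeys the reaching law. It decreases steadily outside the boundary layer and then decays geometrically inside it. With model mismatch, the cancellation would be only approximate.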
First, the cost function is defined by

$$J(t) = \sum_{k=t}^{\infty} \gamma^{k-t}\, U\big(x(k), u(k), k\big), \tag{13}$$

where U(x(t), u(t), t) is the utility function, and x(t), u(t) and γ are the state vector, the control signal and the discount factor, respectively. Second, applying the Bellman optimality principle yields the optimal cost function J*(t):

$$J^*(t) = \min_{u(t)}\left\{U\big(x(t), u(t), t\big) + \gamma J^*(t+1)\right\}, \tag{14}$$

where J*(t) and J*(t + 1) are the minimum costs at times t and t + 1. Equation (14) shows that the optimal control signal u*(t) is not easy to obtain, because the future cost J*(t + 1) is not known in advance. In our design, two NNs are used to solve Equation (14) approximately, so that the optimal control signal u*(t) can be obtained forward in time.

3.1 Critic neural network
The critic network is implemented as a single-hidden-layer NN, shown in Figure 3, where kc and mc are the numbers of neurons in the input and hidden layers, respectively. The sigmoid function is used as the activation function. For an independent variable z, it is

$$\psi(z) = \frac{1 - e^{-z}}{1 + e^{-z}}, \tag{15}$$

which is the bipolar sigmoid, equal to tanh(z/2), with outputs in (−1, 1).

FIGURE 3 Critic neural network (NN) structure

For the multi-area power system, the cost function Ji(t) is defined for the i-th area as

$$J_i(t) = U_i\big(x_{ia}(t), u_{ia}(t), t\big) + \gamma J_i(t+1), \tag{16}$$

where uia(t) is the adaptive auxiliary control, which can be approximately estimated by the action NN, and xia(t) is the state vector generated by the frequency deviations.
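For a fixed control sequence, the infinite sum (13) and the recursion J(t) = U(t) + γJ(t + 1) give the same cost. The recursion has the form of Equation (16), without the minimisation in (14). The sketch below checks this on a short horizon. It truncates the sum by assuming zero cost beyond the last step, which is an assumption of the sketch rather than of the paper. It also confirms that the activation ψ in Equation (15) equals tanh(z/2). The utility values are arbitrary illustrative numbers.

```python
import math

def discounted_cost(utilities, gamma):
    """Costs J(0), ..., J(T-1) for a fixed utility sequence U(0), ..., U(T-1).

    Uses the backward recursion J(t) = U(t) + gamma * J(t+1), i.e. Equation (16)
    without the minimisation of (14), and ASSUMES J = 0 after the last step,
    which truncates the infinite sum in Equation (13)."""
    costs = [0.0] * len(utilities)
    tail = 0.0
    for t in reversed(range(len(utilities))):
        tail = utilities[t] + gamma * tail
        costs[t] = tail
    return costs

def psi(z):
    """Activation of Equation (15). For very negative z, math.exp(-z) overflows;
    math.tanh(z / 2) is an equivalent, overflow-free form."""
    return (1.0 - math.exp(-z)) / (1.0 + math.exp(-z))

# Hypothetical utility sequence and discount factor
U = [1.0, 0.5, 0.25, 0.0]
J = discounted_cost(U, 0.9)
direct = sum(0.9 ** k * u for k, u in enumerate(U))   # truncated Equation (13) at t = 0
print(J[0], direct)
```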
Therefore, the input vector $x_{ic}(t) \in \mathbb{R}^{k_c}$ of the critic NN is

$$x_{ic}(t) = \left[u_{ia}(t), \Delta f_i(t), \dots, \Delta f_i(t - k_c + 2)\right]^T.$$

Using the critic NN, Ji(t) can be estimated by

$$p_{\iota}^{c}(t) = \sum_{\kappa=1}^{k_c} w_{\kappa\iota}^{c1}(t)\, x_{ic,\kappa}(t), \quad \iota = 1, \dots, m_c, \tag{17a}$$

$$q_{\iota}^{c}(t) = \frac{1 - e^{-p_{\iota}^{c}(t)}}{1 + e^{-p_{\iota}^{c}(t)}}, \quad \iota = 1, \dots, m_c, \tag{17b}$$

$$\widehat{J}_i(t) = \sum_{\iota=1}^{m_c} w_{\iota}^{c2}(t)\, q_{\iota}^{c}(t), \tag{17c}$$

where $x_{ic,\kappa}(t)$ is the κ-th element of $x_{ic}(t)$; $w_{\kappa\iota}^{c1}(t)$ is the input-hidden weight; $w_{\iota}^{c2}(t)$ is the hidden-output weight; $p_{\iota}^{c}(t)$ and $q_{\iota}^{c}(t)$ are intermediate variables; and $\widehat{J}_i(t)$ is the estimate of the cost function. The approximation (learning) error is defined as

$$e_{ic}(t) = \gamma \widehat{J}_i(t) - \left(\widehat{J}_i(t-1) - r_i(t)\right), \tag{18}$$

which is driven towards zero by minimising the squared error $E_{ic}(t) = \frac{1}{2} e_{ic}^2(t)$. Based on this squared error, it only remains to design appropriate weight-updating rules that make the estimated cost $\widehat{J}_i(t)$ approximate the optimal cost function.
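A forward pass of the critic NN, Equations (17a) to (17c), together with the error (18), can be written in a few lines of NumPy. This is a sketch, not the authors' implementation. The weight layout w1[κ, ι] ↔ w_κι^c1, the function names, the network sizes and the random initialisation are all assumptions made for illustration. The signal r is simply passed in, because it is not defined in this section.

```python
import numpy as np

def critic_input(u_a, df_history, k_c):
    """Build x_ic(t) = [u_ia(t), Δf_i(t), ..., Δf_i(t - k_c + 2)]^T.

    df_history[0] is Δf_i(t), df_history[1] is Δf_i(t-1), and so on."""
    if len(df_history) < k_c - 1:
        raise ValueError("need at least k_c - 1 frequency deviations")
    return np.concatenate(([u_a], np.asarray(df_history[: k_c - 1], dtype=float)))

def critic_forward(x, w1, w2):
    """Equations (17a)-(17c); w1 has shape (k_c, m_c), w2 has shape (m_c,)."""
    p = x @ w1                     # (17a): p[iota] = sum_kappa w1[kappa, iota] * x[kappa]
    q = np.tanh(p / 2.0)           # (17b): equals (1 - e^-p) / (1 + e^-p), without overflow
    j_hat = float(q @ w2)          # (17c)
    return j_hat, p, q

def critic_error(j_now, j_prev, r, gamma):
    """Equation (18): e_ic(t) = gamma * J_hat(t) - (J_hat(t-1) - r(t))."""
    return gamma * j_now - (j_prev - r)

# Usage with HYPOTHETICAL sizes and random weights (k_c = 5 inputs, m_c = 6 hidden neurons).
rng = np.random.default_rng(0)
k_c, m_c = 5, 6
w1 = rng.uniform(-0.5, 0.5, size=(k_c, m_c))
w2 = rng.uniform(-0.5, 0.5, size=m_c)
x = critic_input(0.1, [0.02, 0.015, 0.01, 0.004], k_c)
j_hat, p, q = critic_forward(x, w1, w2)
print(j_hat, critic_error(j_hat, j_prev=0.0, r=0.01, gamma=0.95))
```

Storing the input-hidden weights as a (k_c, m_c) matrix turns the sum in (17a) into a single vector-matrix product. Using tanh(p/2) in place of the ratio in (17b) avoids overflow for large negative p.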
For this purpose, the back-propagation-based gradient-descent method is adopted, and the updating rule of the hidden-output weight vector is

$$\Delta w_{\iota}^{c2}(t) = -\alpha_c \frac{\partial E_{ic}(t)}{\partial w_{\iota}^{c2}(t)} = -\alpha_c \frac{\partial E_{ic}(t)}{\partial \widehat{J}_i(t)} \frac{\partial \widehat{J}_i(t)}{\partial w_{\iota}^{c2}(t)}. \tag{19}$$

Similarly, using the chain rule, the input-hidden weights are updated by

$$\Delta w_{\kappa\iota}^{c1}(t) = -\alpha_c \frac{\partial E_{ic}(t)}{\partial w_{\kappa\iota}^{c1}(t)} = -\alpha_c \frac{\partial E_{ic}(t)}{\partial \widehat{J}_i(t)} \frac{\partial \widehat{J}_i(t)}{\partial q_{\iota}^{c}(t)} \frac{\partial q_{\iota}^{c}(t)}{\partial p_{\iota}^{c}(t)} \frac{\partial p_{\iota}^{c}(t)}{\partial w_{\kappa\iota}^{c1}(t)}.$$
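Working out the factors in the two update rules for the network (17a) to (17c) gives ∂E_ic/∂Ĵ_i(t) = γe_ic(t), ∂Ĵ_i(t)/∂w_ι^c2 = q_ι^c(t), ∂Ĵ_i(t)/∂q_ι^c = w_ι^c2, ∂q_ι^c/∂p_ι^c = ½(1 − (q_ι^c)²) and ∂p_ι^c/∂w_κι^c1 = x_ic,κ(t). Here Ĵ_i(t − 1) and r_i(t) are assumed to be held constant during the step. The NumPy sketch below applies one such update. The names, learning rate and data are hypothetical, and the finite-difference lines at the end serve only as a sanity check of the gradient.

```python
import numpy as np

def squared_error(x, w1, w2, j_prev, r, gamma):
    """E_ic = 0.5 * e_ic^2 with e_ic from Equation (18)."""
    q = np.tanh((x @ w1) / 2.0)          # (17a)-(17b); tanh(p/2) == (1 - e^-p)/(1 + e^-p)
    e = gamma * (q @ w2) - (j_prev - r)
    return 0.5 * e * e

def critic_update(x, w1, w2, j_prev, r, gamma, alpha):
    """One gradient-descent step on the critic weights (hidden-output and input-hidden).

    J_hat(t-1) and r(t) are treated as constants. Returns (new_w1, new_w2, e_ic)."""
    q = np.tanh((x @ w1) / 2.0)
    e = gamma * (q @ w2) - (j_prev - r)
    dE_dJ = gamma * e                          # dE_ic / dJ_hat(t)
    grad_w2 = dE_dJ * q                        # dJ_hat / dw2[iota] = q[iota]
    dq_dp = 0.5 * (1.0 - q ** 2)               # derivative of the bipolar sigmoid
    grad_w1 = dE_dJ * np.outer(x, w2 * dq_dp)  # dp[iota] / dw1[kappa, iota] = x[kappa]
    return w1 - alpha * grad_w1, w2 - alpha * grad_w2, e

# Usage with HYPOTHETICAL data
rng = np.random.default_rng(1)
x = rng.normal(size=4)
w1 = rng.normal(scale=0.3, size=(4, 3))
w2 = rng.normal(scale=0.3, size=3)
args = dict(j_prev=0.2, r=0.05, gamma=0.95)
new_w1, new_w2, e = critic_update(x, w1, w2, alpha=0.05, **args)

# Finite-difference sanity check of one entry of each gradient
h = 1e-6
bump2 = np.zeros_like(w2); bump2[0] = h
num_w2 = (squared_error(x, w1, w2 + bump2, **args) - squared_error(x, w1, w2 - bump2, **args)) / (2 * h)
bump1 = np.zeros_like(w1); bump1[1, 2] = h
num_w1 = (squared_error(x, w1 + bump1, w2, **args) - squared_error(x, w1 - bump1, w2, **args)) / (2 * h)
print(num_w2, (w2[0] - new_w2[0]) / 0.05)
print(num_w1, (w1[1, 2] - new_w1[1, 2]) / 0.05)
```

Both weight gradients are computed from the same forward pass before either weight is changed. This matches the simultaneous form of the two update rules.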
References