Title | Recurrent neural networks as forecasting models |
Author | Suwarin Pattamavorakun |
Call Number | AIT Diss. no.IM-05-01 |
Subject(s) | Neural networks (Computer science)--Forecasting |
Note | A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Technical Science |
Publisher | Asian Institute of Technology |
Abstract | Applications of neural networks in forecasting require treatment of the dynamics associated with the input signal. Feedforward neural networks used to process dynamical systems tend to capture the dynamics by including past inputs in the input vector. For dynamical modelling of complex systems, however, feedback must be involved, and this leads to the use of Recurrent Neural Networks (RNNs). In addition to the various features of feedforward networks, RNNs allow output from self-loops and backward connections between nodes. One of the main reasons for the interest in RNNs is that newly proposed algorithms allow them to learn how to interact with an environment in an appropriate way. However, most RNN training algorithms suffer from slow convergence, high complexity in determining the gradient of the (sum of squared) errors, and strong sensitivity to the choice of learning rate and initial weights. Reportedly, the structure of the network seriously affects the performance of the network model, so the problem of determining the "optimum structure" of the RNN for a given data set arises. Since the numbers of nodes in the input and output layers are application-dependent, the remaining problem is how to optimally choose the number of hidden nodes in the hidden layers.

The objectives of this study are twofold. The first objective is to come up with a good algorithm for training RNNs. To this end, a number of important algorithms are analyzed and evaluated empirically, leading to the development of a new algorithm for speeding up the convergence of RNNs. The second objective is the determination of the optimal structure of the RNN for a given data set. This is achieved by using the Bayesian Information Criterion (BIC) via the number of hidden nodes of the RNN. It is also proved that there is a close link between the BIC and the Efficiency Index, which indicates how good the RNN model is when it is used for that data set.

This study involves eight architectures. The first architecture is a fully recurrent network, where the outputs of all existing neurons are used for feedback and the neurons are fully connected. The other seven topologies are partially recurrent networks, namely:
- the PRNN-I, where the feedback links come only from the hidden nodes;
- the Elman network, where feedback from the hidden nodes links to a set of additional inputs called context units;
- the Jordan network, where the outputs are fully fed back into a state layer at the same level as the input layer;
- the Narendra-Parthasarathy (N&P) network, a partial connection with feedback links from each node of the output layer to all hidden nodes;
- the Frasconi-Gori-Soda (FGS) network, where a multilayer perceptron is augmented with local feedback around each hidden node;
- the PRNN-II, where the data are processed from the input layer through a recurrent layer and then fed back into the input layer; and
- the PRNN-III, where the Elman network is modified by adding feedback links from the context units to themselves.

The study considers several important training algorithms. While the real-time recurrent learning (RTRL) algorithm makes use of the exact computation of the gradient of the sum of squared errors, the other algorithms involve approximations of it, designed to simplify its computation, namely:
- the Yamamoto-Nikiforuk (Y-N) algorithm, which is derived using an algebraic method instead of a gradient approach; it incorporates the Error Back Propagation (EBP) method to obtain a fictitious target signal for the output of the hidden nodes, and the weight parameters are obtained by the Exponentially Weighted Least Squares (EWLS) method;
- the Atiya-Parlos algorithm, which finds the direction of weight change by approximation and, when minimizing the sum of squared errors, sets the error of each hidden node to zero;
- the YNC algorithm, a modification of the Y-N algorithm that combines three techniques: EBP, Recursive Least Squares (RLS) and Error Self-Recurrent (ESR); and
- the YNS algorithm, obtained by combining the weight-updating methods of the Atiya-Parlos algorithm and the Y-N algorithm to enhance network performance.

Moreover, based on this analysis, a new algorithm was devised to speed up the convergence of RNNs. It is obtained by combining the weight-update method of the Atiya-Parlos algorithm (which finds the direction of weight change by approximation) with the Y-N technique (which estimates fictitious target signals for the hidden nodes so that the hidden weights are updated separately from the output weights), and then adding the Error Self-Recurrent (ESR) network to improve the error functions: the errors from the output units are fed back to determine the weight updates of the output-unit nodes, which speeds up convergence and reduces sensitivity to the initial weights.

To compare the performance of the different architectures and algorithms in terms of prediction accuracy and computational time, an empirical study was carried out on two kinds of data sets, namely daily stream flow data and daily stock prices in the Thai market. The results showed that both the fully recurrent network and the partially recurrent networks, under some selected algorithms as well as the proposed algorithm, could forecast the daily flows and stock prices quite satisfactorily. To find the optimal structure of the RNN for a given data set, a scheme combining the Baum-Haussler rule and the modified BIC was proposed. Simulation results using hydrological data at two important stations and financial data of five important companies showed that the new training algorithm, equipped with the newly proposed combined rule, performed very satisfactorily in the considered cases. |
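As an illustration of the partially recurrent architectures listed in the abstract, below is a minimal NumPy sketch of an Elman-style forward pass, in which the hidden state is copied into context units that are fed back as extra inputs at the next time step. The layer sizes, tanh activation, and variable names are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

# Minimal Elman-network forward pass: the hidden state is copied into
# "context units" that are fed back alongside the inputs at the next step.
# Sizes and tanh activation are illustrative assumptions.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 5, 1

W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input  -> hidden
W_ch = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # context -> hidden
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden -> output

def elman_forward(inputs):
    """Run a sequence through the network; inputs has shape (T, n_in)."""
    context = np.zeros(n_hidden)          # context units start at zero
    outputs = []
    for x in inputs:
        hidden = np.tanh(W_xh @ x + W_ch @ context)
        outputs.append(W_hy @ hidden)
        context = hidden                  # Elman feedback: copy hidden state
    return np.array(outputs)

# One-step-ahead forecasts on a toy sequence of 10 time steps.
y_hat = elman_forward(rng.normal(size=(10, n_in)))
print(y_hat.shape)  # (10, 1)
```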
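The structure-selection step described in the abstract can likewise be sketched. The code below assumes the standard Gaussian-error form of the BIC, BIC = n ln(SSE/n) + k ln n, the Nash-Sutcliffe-style Efficiency Index EI = 1 - SSE/SST that is common in streamflow forecasting, and the usual textbook form of the Baum-Haussler bound on hidden nodes; the "modified BIC" and the exact combined rule used in the dissertation may differ, and `train_and_eval` is a hypothetical user-supplied function.

```python
import numpy as np

def bic(sse, n_obs, n_params):
    # Gaussian-error BIC; assumed standard form, not necessarily the
    # dissertation's "modified BIC".
    return n_obs * np.log(sse / n_obs) + n_params * np.log(n_obs)

def efficiency_index(y_true, y_pred):
    # Nash-Sutcliffe-style Efficiency Index, common in streamflow studies.
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - sse / sst

def baum_haussler_max_hidden(n_train, n_in, n_out, eps=0.1):
    # Baum-Haussler heuristic upper bound on hidden nodes for a tolerated
    # error fraction eps (illustrative form of the rule).
    return max(1, int(n_train * eps / (n_in + n_out)))

def select_hidden_nodes(train_and_eval, n_train, n_in, n_out):
    """Try each hidden-layer size up to the Baum-Haussler bound and keep
    the one with the smallest BIC. `train_and_eval(h)` is a hypothetical
    caller-supplied function returning (sse, n_params, y_true, y_pred)
    for a network trained with h hidden nodes."""
    best = None
    for h in range(1, baum_haussler_max_hidden(n_train, n_in, n_out) + 1):
        sse, k, y_true, y_pred = train_and_eval(h)
        score = bic(sse, n_train, k)
        if best is None or score < best[0]:
            best = (score, h, efficiency_index(y_true, y_pred))
    return best  # (BIC, number of hidden nodes, Efficiency Index)
```

Because a lower BIC corresponds (at fixed n and k) to a lower SSE, and the Efficiency Index rises as SSE falls, this loop also surfaces the monotone link between the two criteria that the abstract mentions.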
Year | 2005 |
Type | Dissertation |
School | School of Advanced Technologies (SAT) |
Department | Department of Information and Communications Technologies (DICT) |
Academic Program/FoS | Information Management (IM) |
Chairperson(s) | Manukid Parnichkun; Huynh Ngoc Phien |
Examination Committee(s) | Batanov, Dencho N.; Lewis III, Harold W. |
Scholarship Donor(s) | Rajamangala Institute of Technology, Thailand |
Degree | Thesis (Ph.D.) - Asian Institute of Technology, 2005 |