Nonlinear Regression – updated 2024-05-19
Regression is a procedure for adjusting coefficient values in a mathematical model so that the model best fits the data. In nonlinear regression the model is not linear in its coefficients.
I have written a book on the topic: Nonlinear Regression Modeling. However, this section of the website provides some basic techniques and offers software.
The model and data can represent either a steady-state (alternatively called static or equilibrium) response or a transient response. Further, the model can describe either a single-input single-output relation, such as y = f(x), or a multiple-input multiple-output process.
The criterion for regression quality could be the classical vertical sum of squared deviations (SSD) between model and data responses, or it could be any of a number of other criteria such as maximum likelihood, normal deviation (Akaho’s approximation, or total squared deviation), etc.
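For concreteness, here is a minimal Python sketch of the vertical SSD criterion. The example_model function and its two coefficients are hypothetical placeholders for illustration, not part of the website software.

```python
import numpy as np

def ssd(coeffs, model, x_data, y_data):
    """Vertical sum of squared deviations between model predictions and data."""
    return float(np.sum((y_data - model(x_data, coeffs)) ** 2))

# Hypothetical single-input, two-coefficient model: y = a*(1 - exp(-b*x))
def example_model(x, c):
    return c[0] * (1.0 - np.exp(-c[1] * x))
```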
The objective in regression is to minimize SSD (or an alternate measure of goodness of fit) by adjusting model coefficient values. This is an optimization statement. In nonlinear regression the optimization is an iterative numerical search procedure. Many techniques are commonly used (Levenberg-Marquardt, back propagation, particle swarm, GRG, Hooke-Jeeves, genetic algorithms, etc.). But, in my experience, leapfrogging (LF) is the best choice when considering generality, robustness, probability of finding the global optimum, code simplicity, and computational burden. So, the demonstration here uses LF, first published in Rhinehart, R. R., M. Su, and U. Manimegalai-Sridhar, “Leapfrogging and Synoptic Leapfrogging: A New Optimization Approach,” Computers & Chemical Engineering, Vol. 40, 11 May 2012, pp. 67-81.
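For orientation, below is a minimal sketch of the basic leap-over move as I read the cited 2012 paper: the worst player leaps over the best into a random spot in the window mirrored to the far side of the best. The player count, the simple bound clipping, and the fixed iteration budget are illustrative simplifications (the demonstration software instead uses the steady-state convergence criterion described next); this is not the published code.

```python
import numpy as np

def leapfrog(objective, lo, hi, n_players=20, n_iters=2000, seed=None):
    """Sketch of the basic LF move on a bounded decision-variable space."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    # Scatter players randomly over the feasible DV hyper-volume.
    players = lo + rng.random((n_players, lo.size)) * (hi - lo)
    fvals = np.array([objective(p) for p in players])
    for _ in range(n_iters):
        b, w = np.argmin(fvals), np.argmax(fvals)
        # Leap-over: random spot in [x_best, x_best + (x_best - x_worst)].
        leap = players[b] + rng.random(lo.size) * (players[b] - players[w])
        players[w] = np.clip(leap, lo, hi)   # crude bound handling
        fvals[w] = objective(players[w])
    b = np.argmin(fvals)
    return players[b], fvals[b]
```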
Iterative techniques require a criterion for defining convergence – to identify when the optimizer is close enough to a solution and can stop. Often this criterion is based on the incremental changes of the coefficient values (the decision variables, DVs), requiring a user to specify a tolerance (or precision) threshold on the DV-value increments. However, this requires the user to forecast what a meaningful increment is, which often cannot truly be done until after the model is identified, permitting analysis of model sensitivity to coefficient values and of model uncertainty relative to data variation. Too small a threshold, and the optimizer takes excessive iterations. Too large, and the model is not good enough. Demonstrated here, the convergence criterion is based on the relative improvement of the model with respect to data variation. In this procedure: after each iteration, a random subset of about 30% of the data is selected, the sum of squared deviations between model and data is calculated on that subset, and the value is plotted w.r.t. iteration. This is a signal that relaxes from an initial high value to a noisy steady state when the model is best fitting the data. When the signal is perceived to be at steady state, the optimizer is not making any detectable model improvement, and convergence should be claimed. This demonstration uses the steady-state identification method of Cao and Rhinehart on the random subset as the convergence criterion. (See Cao, S., and R. R. Rhinehart, “An Efficient Method for On-Line Identification of Steady-State,” Journal of Process Control, Vol. 5, No. 6, 1995, pp. 363-374.)
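A minimal sketch of the Cao-Rhinehart R-statistic, as I read the cited paper, follows: it is the ratio of two exponentially filtered variance estimates, which settles near 1 at steady state and is inflated during a transient. The filter factors, critical R value, and warm-up count here are illustrative, tunable choices, not prescriptions from the paper.

```python
class SteadyStateDetector:
    """Sketch of the Cao-Rhinehart steady-state identifier (R-statistic)."""

    def __init__(self, lam1=0.1, lam2=0.1, lam3=0.1, r_crit=1.0, warmup=10):
        # Filter factors and the critical R threshold are assumed values.
        self.l1, self.l2, self.l3 = lam1, lam2, lam3
        self.r_crit, self.warmup = r_crit, warmup
        self.xf = None       # filtered signal value, X_f
        self.nu2 = 0.0       # filtered sq. deviation from X_f (variance est. 1)
        self.d2 = 0.0        # filtered sq. successive difference (variance est. 2)
        self.x_prev, self.n = None, 0

    def update(self, x):
        """Feed one sample; return True once steady state is identified."""
        self.n += 1
        if self.xf is None:
            self.xf = self.x_prev = x
            return False
        # Deviation from the *previous* filtered value, then update the filter.
        self.nu2 = self.l2 * (x - self.xf) ** 2 + (1 - self.l2) * self.nu2
        self.xf = self.l1 * x + (1 - self.l1) * self.xf
        self.d2 = self.l3 * (x - self.x_prev) ** 2 + (1 - self.l3) * self.d2
        self.x_prev = x
        if self.n <= self.warmup or self.d2 == 0.0:
            return False
        r = (2 - self.l1) * self.nu2 / self.d2
        return r <= self.r_crit
```

In the regression context, the sample fed to update() at each optimizer iteration would be the SSD on the random ~30% data subset; convergence is claimed when the detector first reports steady state.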
Finally, nonlinear regression is not guaranteed to have a single global optimum. Any optimizer might get stuck in a local minimum. Accordingly, one should start the optimizer N times from randomized initializations, and take the best of the N solutions as the reportable solution. N can be calculated from the user’s desired confidence, c, that the best of N trials will have found one of the solutions within the best fraction, f, of all possible solutions: N = ln(1-c)/ln(1-f). For example, c = 0.99 and f = 0.10 give N = ln(0.01)/ln(0.90) ≈ 44 trials. (See Iyer, M. S., and R. R. Rhinehart, “A Method to Determine the Required Number of Neural Network Training Repetitions,” IEEE Transactions on Neural Networks, Vol. 10, No. 2, March 1999, pp. 427-432. Or Padmanabhan, V., and R. R. Rhinehart, “A Novel Termination Criterion for Optimization,” Proceedings of the 2005 American Control Conference, paper ThA18.3, Portland, OR, June 8-10, 2005, pp. 2281-2286.)
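As a sketch, the N calculation and a best-of-N wrapper (reusing the hypothetical leapfrog function above) might look like this; the c and f defaults simply repeat the example values.

```python
import math

def n_starts(c=0.99, f=0.10):
    """Number of randomized starts so that, with confidence c, the best
    of N trials lands in the best fraction f of all possible solutions."""
    return math.ceil(math.log(1 - c) / math.log(1 - f))

def multistart(objective, lo, hi, c=0.99, f=0.10, seed=0):
    """Run the leapfrog sketch from N randomized starts; keep the best."""
    best_x, best_f = None, math.inf
    for trial in range(n_starts(c, f)):
        x, fx = leapfrog(objective, lo, hi, seed=seed + trial)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```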
The demonstration program is r3eda Generic LF Static Model Regression 2017-04-23, and the user guide is r3eda site Regression User Guide 2016-06-15. The demonstration here uses the simpler, conventional vertical SSD, but any goodness-of-fit metric could be used. Further, the demonstration is for the simpler situation of a steady-state model with a single output variable (but with up to 20 model coefficients).