==============================
Nonlinear Regression
==============================

From Linear Regression to Nonlinear Regression
==================================================

Given an arbitrary set of points :math:`\{(X_{i1},\dots, X_{in}, Y_i): i=1,\dots,m\}` and an arbitrary norm
:math:`\|\cdot\|`, a *linear regression*, :math:`\ell_{\|\cdot\|}`, with respect to the norm corresponds to a tuple of
numbers :math:`(a_1, \dots, a_n, b)` such that

.. math::
    \sum_{i=1}^{m}\Big\|\sum_{j=1}^n a_j X_{ij} + b - Y_i\Big\|

is minimized. Generally the norm is taken to be a weighted Euclidean norm. The association
:math:`\ell_{\|\cdot\|}(x_1,\dots,x_n)=\sum_{j=1}^n a_j x_j + b` then provides an approximation of the :math:`Y_i`'s
with respect to :math:`\|\cdot\|`.

Suppose that a set of basis functions, :math:`B=\{f_0,\dots, f_p\}`, a set of data points
:math:`X = \{(X_i, Y_i) : i=1,\dots, m\}`, and a norm :math:`\|\cdot\|` are given. A nonlinear regression,
:math:`\mathcal{R}_B`, based on :math:`B` and :math:`\|\cdot\|` corresponds to a tuple :math:`(a_0,\dots, a_p)` such
that

.. math::
    \sum_{i=1}^m \Big\|\sum_{j=0}^p a_j f_j(X_i)-Y_i\Big\|

is minimized. Define a transformer

.. math::
    \begin{array}{rcl}
    T_B:X & \longrightarrow & \mathbb{R}^{p+1}\\
    x & \mapsto & (f_0(x),\dots, f_p(x)).
    \end{array}

Then it is clear that :math:`\mathcal{R}_B(x) = \ell_{\|\cdot\|}\circ T_{B}(x)`.

Orthonormal system of functions
===============================

Let `X` be a topological space and :math:`\mu` be a finite Borel measure on `X`. The bilinear form
:math:`\langle\cdot,\cdot\rangle` defined on :math:`L_2(X, \mu)` by :math:`\langle f, g\rangle = \int_X fg\, d\mu` is an
inner product which turns :math:`L_2(X, \mu)` into a Hilbert space.

Let us denote the family of all continuous real valued functions on a non-empty compact space `X` by
:math:`\textrm{C}(X)`. Suppose that among the elements of :math:`\textrm{C}(X)`, a subfamily `A` of functions is of
particular interest, and that `A` is a subalgebra of :math:`\textrm{C}(X)` containing the constants. We say that an
element :math:`f\in\textrm{C}(X)` can be approximated by elements of `A` if for every :math:`\epsilon>0` there exists
:math:`p\in A` such that :math:`|f(x)-p(x)|<\epsilon` for every :math:`x\in X`. The classical Stone--Weierstrass
theorem guarantees that whenever `A` also separates the points of `X`, every :math:`f\in\textrm{C}(X)` can be
approximated by elements of `A`.

Let :math:`(V, \langle\cdot,\cdot\rangle)` be an inner product space with
:math:`\|v\|_2=\langle v,v\rangle^{\frac{1}{2}}`. A basis :math:`\{v_{\alpha}\}_{\alpha\in I}` is called an
*orthonormal basis* for `V` if :math:`\langle v_{\alpha},v_{\beta}\rangle=\delta_{\alpha\beta}`, where
:math:`\delta_{\alpha\beta}=1` if :math:`\alpha=\beta` and :math:`\delta_{\alpha\beta}=0` otherwise. Every set of
linearly independent vectors can be turned into a set of orthonormal vectors that spans the same subspace as the
original set. The following well-known result gives an algorithm for producing such an orthonormal set from a set of
linearly independent vectors:

.. note::
    **Gram--Schmidt**

    Let :math:`(V,\langle\cdot,\cdot\rangle)` be an inner product space. Suppose :math:`\{v_{i}\}^{n}_{i=1}` is a set
    of linearly independent vectors in `V`. Let

    .. math::
        u_{1}:=\frac{v_{1}}{\|v_{1}\|_2}

    and (inductively) let

    .. math::
        w_{k}:=v_{k}-\sum_{i=1}^{k-1}\langle v_{k},u_{i}\rangle u_{i}\textrm{ and }u_{k}:=\frac{w_{k}}{\|w_{k}\|_2}.

    Then :math:`\{u_{i}\}_{i=1}^{n}` is an orthonormal collection, and for each `k`,

    .. math::
        \textrm{span}\{u_{1},u_{2},\cdots,u_{k}\}=\textrm{span}\{v_{1},v_{2},\cdots,v_{k}\}.
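For concreteness, here is a minimal NumPy sketch of the Gram--Schmidt procedure for the standard Euclidean inner
product. The helper `gram_schmidt` below is purely illustrative and is not part of the library::

    import numpy as np

    def gram_schmidt(vectors):
        """Return an orthonormal list spanning the same space as `vectors`.

        `vectors` is an iterable of 1-d arrays assumed to be linearly independent;
        the standard Euclidean inner product is used.
        """
        ortho = []
        for v in vectors:
            w = v.astype(float)
            # subtract the projections onto the directions constructed so far
            for u in ortho:
                w = w - np.dot(v, u) * u
            ortho.append(w / np.linalg.norm(w))
        return ortho

    # quick check: the pairwise inner products form the identity matrix
    us = gram_schmidt([np.array([1., 1., 0.]), np.array([1., 0., 1.])])
    print(np.round([[np.dot(a, b) for b in us] for a in us], 6))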
Note that, in the Gram--Schmidt note above, we may even allow :math:`n=\infty`.

Let :math:`B=\{v_1, v_2, \dots\}` be an ordered basis for :math:`(V,\langle\cdot,\cdot\rangle)`. For any given vector
:math:`w\in V` and any initial segment of `B`, say :math:`B_n=\{v_1,\dots,v_n\}`, there exists a unique
:math:`v\in\textrm{span}(B_n)` such that :math:`\|w-v\|_2` is minimal:

.. note::
    Let :math:`w\in V` and let `B` be a finite orthonormal set of vectors (not necessarily a basis). Then for
    :math:`v=\sum_{u\in B}\langle u,w\rangle u`,

    .. math::
        \|w-v\|_2 = \min_{z\in\textrm{span}(B)}\|w-z\|_2.

Now, let :math:`\mu` be a finite measure on `X` and for :math:`f,g\in\textrm{C}(X)` define
:math:`\langle f,g\rangle=\int_X f g\, d\mu`. This defines an inner product on the space of continuous functions, and
the norm it induces is denoted by :math:`\|\cdot\|_{2}`. It is evident that

.. math::
    \|f\|_{2}\leq\|f\|_{\infty}\,\mu(X)^{1/2},\quad\forall f\in\textrm{C}(X),

which implies that any good approximation in :math:`\|\cdot\|_{\infty}` gives a good
:math:`\|\cdot\|_{2}`-approximation. Generally, however, our interest is in the other direction: employing the
Gram--Schmidt procedure we can find a :math:`\|\cdot\|_{2}`-approximation of any desired accuracy, but this does not by
itself guarantee a good :math:`\|\cdot\|_{\infty}`-approximation. The situation is favorable in the finite dimensional
case. Take :math:`B=\{p_1,\dots,p_n\}\subset\textrm{C}(X)` and :math:`f\in\textrm{C}(X)`; then there exists
:math:`K_f>0` such that for every :math:`g\in\textrm{span}(B\cup\{f\})`,

.. math::
    K_f\|g\|_{\infty}\leq\|g\|_{2}\leq\|g\|_{\infty}\,\mu(X)^{1/2}.

Since `X` is assumed to be compact, :math:`\textrm{C}(X)` is separable, i.e., :math:`\textrm{C}(X)` admits a dense
subspace of countable dimension (e.g. the polynomials when `X` is a closed, bounded interval). Thus for every
:math:`f\in\textrm{C}(X)` and every :math:`\epsilon>0` one can find a big enough finite `B` such that the above
inequality holds. In other words, good enough :math:`\|\cdot\|_{2}`-approximations of `f` give good
:math:`\|\cdot\|_{\infty}`-approximations, as desired.
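The best-approximation formula above can be tried out directly when the measure is supported on finitely many points,
in which case the integrals become weighted sums. The following plain-NumPy illustration (the monomial basis and the
helper `inner` are chosen just for this sketch, independently of the library) orthonormalizes :math:`1, x, x^2` with
respect to such a discrete measure and projects a target function onto their span::

    import numpy as np

    # a finite measure supported on the sample points x_i with weights mu_i
    x = np.linspace(0., 1., 30)
    mu = np.full_like(x, 1. / x.size)

    def inner(f_vals, g_vals):
        # <f, g> = sum_i f(x_i) g(x_i) mu_i, the discrete analogue of the integral
        return np.sum(f_vals * g_vals * mu)

    # monomials 1, x, x^2 evaluated on the support, then orthonormalized (Gram--Schmidt)
    raw = [np.ones_like(x), x, x ** 2]
    ortho = []
    for v in raw:
        w = v.copy()
        for u in ortho:
            w = w - inner(v, u) * u
        ortho.append(w / np.sqrt(inner(w, w)))

    # best || . ||_2 approximation of f over the span: v = sum <u, f> u
    f_vals = np.exp(x)
    v = sum(inner(u, f_vals) * u for u in ortho)
    print(np.sqrt(inner(f_vals - v, f_vals - v)))  # the minimal L2 error over the span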
**Example.** Polynomial regression on 2-dimensional random data::

    from mpl_toolkits.mplot3d import Axes3D
    from matplotlib import cm
    import matplotlib.pyplot as plt
    import numpy as np

    from GeneralRegression.NpyProximation import HilbertRegressor, Measure
    from GeneralRegression.extras import FunctionBasis


    def randrange(n, vmin, vmax):
        '''
        Helper function to make an array of random numbers having shape (n, )
        with each number distributed Uniform(vmin, vmax).
        '''
        return (vmax - vmin) * np.random.rand(n) + vmin

    # degree of polynomials
    deg = 2
    FB = FunctionBasis()
    B = FB.poly(2, deg)
    # initiate the regressor
    regressor = HilbertRegressor(base=B)
    # number of random points
    n = 20

    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')

    for c, m, zlow, zhigh in [('k', 'o', -5, -2.5)]:
        xs = randrange(n, 2.3, 3.2)
        ys = randrange(n, 0, 1.0)
        zs = randrange(n, zlow, zhigh)
        ax.scatter(xs, ys, zs, c=c, s=10, marker=m)

    ax.set_xlabel('$X$')
    ax.set_ylabel('$Y$')
    ax.set_zlabel('$f(X,Y)$')

    X = np.array([np.array((xs[_], ys[_])) for _ in range(n)])
    y = np.array([np.array((zs[_],)) for _ in range(n)])

    X_ = np.arange(2.3, 3.2, 0.02)
    Y_ = np.arange(0, 1.0, 0.02)
    _X, _Y = np.meshgrid(X_, Y_)

    # fit the regressor
    regressor.fit(X, y)

    # evaluate the fitted surface on the grid for plotting
    Z = []
    for idx in range(_X.shape[0]):
        _X_ = _X[idx]
        _Y_ = _Y[idx]
        _Z_ = []
        for jdx in range(_X.shape[1]):
            t = np.array([np.array([_X_[jdx], _Y_[jdx]])])
            _Z_.append(regressor.predict(t)[0])
        Z.append(np.array(_Z_))
    Z = np.array(Z)

    surf = ax.plot_surface(_X, _Y, Z, cmap=cm.coolwarm,
                           linewidth=0, antialiased=False, alpha=.3)

.. image:: ./images/polyreg.png
    :height: 400px
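For comparison, the same kind of degree-2 fit can be reproduced by composing the transformer :math:`T_B` from the first
section with an ordinary least-squares solve. This is a minimal plain-NumPy sketch on freshly generated random data,
independent of the library's internals::

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.column_stack([rng.uniform(2.3, 3.2, 20), rng.uniform(0., 1., 20)])
    y = rng.uniform(-5., -2.5, 20)

    def T_B(pts):
        # degree-2 polynomial feature map in two variables: (1, x, y, x^2, x*y, y^2)
        x1, x2 = pts[:, 0], pts[:, 1]
        return np.column_stack([np.ones_like(x1), x1, x2, x1 ** 2, x1 * x2, x2 ** 2])

    # linear regression on the transformed points, i.e. the composition of T_B with a linear fit
    coeffs, *_ = np.linalg.lstsq(T_B(X), y, rcond=None)
    y_hat = T_B(X) @ coeffs
    print(np.sqrt(np.mean((y_hat - y) ** 2)))  # in-sample root mean squared error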
Weighted & Unweighted Regression
==================================

The following example demonstrates how to use `GenericRegressor` to perform a nonlinear regression based on a
customized function basis. In the example we use a mixture of polynomials, trigonometric functions and exponential
functions of the form :math:`x^k e^{\pm x/\ell}` to estimate the function
:math:`x\times e^{\sin(x^2)} + x^2 \times\cos(x)`. The confidence interval is the default 95% band for the points::

    from random import randint

    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.linear_model import BayesianRidge

    from GeneralRegression.GeneralRegression import GenericRegressor

    plt.figure(randint(1, 1000), figsize=(16, 12))

    # Make up a 1-dim regression data set
    n_samples = 100
    f = lambda x: x * np.exp(np.sin(x ** 2)) + np.cos(x) * x ** 2
    X = np.linspace(0., 10, n_samples).reshape((-1, 1))
    y = f(X).reshape((1, -1))[0]

    # Function basis generator
    def mixed(X, p_d=3, f_d=1, l=1., e_d=2):
        r"""
        A mixture of polynomial, Fourier and exponential functions

        :param X: the domain to be transformed
        :param p_d: the maximum degree of the polynomials to be included
        :param f_d: the maximum frequency of the Fourier terms to be included
        :param l: the length scale used by the Fourier and exponential terms
        :param e_d: the maximum degree `d` of the `x` factor included as :math:`x^d\times e^{\pm x}`
        :return: the transformed data points
        """
        points = []
        for x in X:
            point = [1.]
            for deg in range(1, f_d + 1):
                point.append(np.sin(deg * x[0] / l))
                point.append(np.cos(deg * x[0] / l))
            for deg in range(1, p_d + 1):
                point.append(x[0] ** deg)
            for deg in range(e_d + 1):
                point.append((x[0] ** deg) * np.exp(-x[0] / l))
                point.append((x[0] ** deg) * np.exp(x[0] / (2.5 * l)))
            points.append(np.array(point))
        return np.array(points)

    domain = np.linspace(min(X), max(X), 150)

    regressor = GenericRegressor(mixed, regressor=BayesianRidge, **dict(p_d=3, f_d=50, l=1., e_d=0))
    regressor.fit(X, y)
    y_pred = regressor.predict(domain)

    plt.scatter(X, y, color='red', s=10, marker='o', alpha=0.5, label="Data points")
    plt.plot(domain, y_pred, color='blue', label='Fit')
    plt.fill_between(domain.reshape((1, -1))[0],
                     y_pred - regressor.ci_band,
                     y_pred + regressor.ci_band,
                     color='purple', alpha=0.1, label='CI: 95%')
    plt.legend(loc=2)
    plt.grid(True, alpha=.4)
    plt.show()

The output looks like the following image:

.. image:: ./images/unweighted.png
    :height: 400px

Now, we use two different weights to approximate the same function. The first weight puts more emphasis on the middle
points and less on the extreme points, while the second weight puts less emphasis on the lower values and more on the
higher ones. This example shows how to use `HilbertRegressor` and `GeneralRegression.extras.FunctionBasis`. In
contrast to the previous example, the confidence intervals are the default 95% bands for the curves, not the points::

    from random import randint

    import matplotlib.pyplot as plt
    import numpy as np

    from GeneralRegression.NpyProximation import HilbertRegressor, Measure
    from GeneralRegression.extras import FunctionBasis

    plt.figure(randint(1, 1000), figsize=(16, 12))

    # Make up a 1-dim regression data set
    n_samples = 100
    f = lambda x: x * np.exp(np.sin(x ** 2)) + np.cos(x) * x ** 2
    X = np.linspace(0., 10, n_samples).reshape((-1, 1))
    y = f(X)

    def pfe_1d(p_d=3, f_d=1, l=1.):
        # polynomial + Fourier function basis in one variable
        basis = FunctionBasis()
        p_basis = basis.poly(1, p_d)
        f_basis = basis.fourier(1, f_d, l)[1:]
        e_basis = []
        return p_basis + f_basis + e_basis

    domain = np.linspace(min(X), max(X), 150)

    x_min, x_max = X.min(), X.max()
    x_mid = (x_min + x_max) / 2.
    w_min = .1
    w_max = 5.

    # weight 1: emphasize the middle of the interval
    ws1 = {_[0]: np.exp(-1. / max(abs((_[0] - x_min) * (_[0] - x_max)) / 10, 1.e-5)) for _ in X}
    Xs1 = [_[0] for _ in X]
    Ws1 = [ws1[_] for _ in Xs1]

    # weight 2: down-weight the lower half of the interval
    ws2 = {_[0]: .1 if _[0] < x_mid else 1. for _ in X}
    Xs2 = [_[0] for _ in X]
    Ws2 = [ws2[_] for _ in Xs2]

    meas1 = Measure(ws1)
    ell = .7
    B1 = pfe_1d(p_d=3, f_d=20, l=ell)
    regressor1 = HilbertRegressor(base=B1, meas=meas1)
    regressor1.fit(X, y)
    y_pred1 = regressor1.predict(domain)

    meas2 = Measure(ws2)
    regressor2 = HilbertRegressor(base=B1, meas=meas2)
    regressor2.fit(X, y)
    y_pred2 = regressor2.predict(domain)

    fig = plt.figure(randint(1, 10000), constrained_layout=True, figsize=(16, 10))
    gs = fig.add_gridspec(6, 1)

    f_ax1 = fig.add_subplot(gs[:4, :])
    f_ax1.scatter(X, y, color='red', s=10, marker='o', alpha=0.5, label="Data points")
    f_ax1.plot(domain, y_pred1, color='blue', label='Fit 1')
    f_ax1.plot(domain, y_pred2, color='teal', label='Fit 2')
    f_ax1.fill_between(domain.reshape((1, -1))[0],
                       y_pred1 - regressor1.ci_band,
                       y_pred1 + regressor1.ci_band,
                       color='purple', alpha=0.1, label='CI: 95%')
    f_ax1.fill_between(domain.reshape((1, -1))[0],
                       y_pred2 - regressor2.ci_band,
                       y_pred2 + regressor2.ci_band,
                       color='orange', alpha=0.1, label='CI: 95%')
    f_ax1.legend(loc=1)
    f_ax1.grid(True, linestyle='-.', alpha=.4)

    f_ax2 = fig.add_subplot(gs[4, :])
    f_ax2.set_title('Weight 1')
    f_ax2.fill_between(Xs1, [0. for _ in Ws1], Ws1, label='Distribution', color='purple', alpha=.3)
    f_ax2.set_ylabel('Weight')

    f_ax3 = fig.add_subplot(gs[5:, :])
    f_ax3.set_title('Weight 2')
    f_ax3.fill_between(Xs2, [0. for _ in Ws2], Ws2, label='Distribution', color='orange', alpha=.3)
    f_ax3.set_ylabel('Weight')

The output looks like the following image:

.. image:: ./images/weighted.png
    :height: 400px

.. note::
    The major difference in code between `GenericRegressor` and `HilbertRegressor` lies in the way they accept the
    basis functions. The `funcs` parameter of `GenericRegressor` applies the set of basis functions to the input
    points and returns a new set of data points, while `HilbertRegressor` takes a set of functions as `base` and
    performs the required calculations with them directly.

Time series as regression with missing steps
================================================

The following example illustrates how a typical time series problem with missing data can be treated as a regression
problem that makes use of all existing data points.
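The examples below only rely on the `date2num` method of `GeneralRegression.extras.Time2Interval` to place the
observation dates on a numeric axis. As a rough idea of what such a mapping does, here is a hypothetical stand-in that
rescales timestamps linearly onto :math:`[0, 1]`; this `DateScaler` is an assumption for illustration only, not the
library's implementation::

    import pandas as pd

    class DateScaler:
        """Hypothetical helper: rescale timestamps linearly onto [0, 1]."""

        def __init__(self, d_min, d_max):
            self.d_min = pd.Timestamp(d_min)
            self.span = (pd.Timestamp(d_max) - self.d_min).total_seconds()

        def date2num(self, d):
            return (pd.Timestamp(d) - self.d_min).total_seconds() / self.span

    scaler = DateScaler("2015-01-01", "2019-12-31")
    print(scaler.date2num("2017-07-01"))  # roughly 0.5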
**Using unweighted nonlinear regression**::

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    from random import randint
    from sklearn.linear_model import BayesianRidge

    from GeneralRegression.GeneralRegression import GenericRegressor
    from GeneralRegression.extras import Time2Interval

    df = pd.read_csv("./data/elec_cost.csv",
                     parse_dates=['Effective Start Date (mm/dd/yyyy)'],
                     infer_datetime_format=True)
    dm, dx = df['Effective Start Date (mm/dd/yyyy)'].min(), df['Effective Start Date (mm/dd/yyyy)'].max()
    time_trans = Time2Interval(dm, dx)
    df['T'] = df.apply(lambda x: time_trans.date2num(x['Effective Start Date (mm/dd/yyyy)']), axis=1)

    plt.figure(randint(1, 1000), figsize=(16, 12))

    X = df[['T']].values
    y = df['Cost'].values

    # Function basis generator
    def mixed(X, p_d=3, f_d=1, l=1., e_d=2):
        r"""
        A mixture of polynomial, Fourier and exponential functions

        :param X: the domain to be transformed
        :param p_d: the maximum degree of the polynomials to be included
        :param f_d: the maximum frequency of the Fourier terms to be included
        :param l: the length scale used by the Fourier and exponential terms
        :param e_d: the maximum degree `d` of the `x` factor included as :math:`x^d\times e^{\pm x}`
        :return: the transformed data points
        """
        points = []
        for x in X:
            point = [1.]
            for deg in range(1, f_d + 1):
                point.append(np.sin(deg * x[0] / l))
                point.append(np.cos(deg * x[0] / l))
            for deg in range(1, p_d + 1):
                point.append(x[0] ** deg)
            for deg in range(e_d + 1):
                point.append((x[0] ** deg) * np.exp(-x[0] / l))
                point.append((x[0] ** deg) * np.exp(x[0] / (2.5 * l)))
            points.append(np.array(point))
        return np.array(points)

    domain = np.linspace(min(X), max(X) + .5, 300)

    regressor = GenericRegressor(mixed, regressor=BayesianRidge, **dict(p_d=5, f_d=50, l=1. / 12., e_d=-1))
    regressor.fit(X, y)
    y_pred = regressor.predict(domain)

    plt.scatter(X, y, color='red', s=10, marker='o', alpha=0.5, label="Data points")
    plt.plot(domain, y_pred, color='blue', label='Fit')
    plt.fill_between(domain.reshape((1, -1))[0],
                     y_pred - regressor.ci_band,
                     y_pred + regressor.ci_band,
                     color='purple', alpha=0.1, label='CI: 95%')
    plt.legend(loc=2)
    plt.grid(True, alpha=.4)
    plt.title('Unweighted Nonlinear Regression')

The output would be the following:

.. image:: ./images/ec_unw.png
    :height: 400px
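The weighted fits below pass a dictionary that maps each observed point to a weight into `Measure`. Such a weight
profile can be prepared with a few lines of NumPy; the logistic ramp sketched here, which emphasizes later
observations, is the same profile used for the first weight in the example that follows::

    import numpy as np

    t = np.linspace(0., 1., 100)  # the numeric time axis produced by the date mapping
    # logistic ramp: observations near the end of the interval carry more weight
    weights = 1. / (1. + np.exp(-6. * (t - .35)))
    ws = {float(ti): float(wi) for ti, wi in zip(t, weights)}
    # ws can then be passed to Measure(ws), as in the example below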
**Using weighted nonlinear regression**::

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    from random import randint

    from GeneralRegression.NpyProximation import HilbertRegressor, Measure
    from GeneralRegression.extras import FunctionBasis, Time2Interval

    df = pd.read_csv("./data/elec_cost.csv",
                     parse_dates=['Effective Start Date (mm/dd/yyyy)'],
                     infer_datetime_format=True)
    dm, dx = df['Effective Start Date (mm/dd/yyyy)'].min(), df['Effective Start Date (mm/dd/yyyy)'].max()
    time_trans = Time2Interval(dm, dx)
    df['T'] = df.apply(lambda x: time_trans.date2num(x['Effective Start Date (mm/dd/yyyy)']), axis=1)

    plt.figure(randint(1, 1000), figsize=(16, 12))

    X = df[['T']].values
    y = df['Cost'].values

    def pfe_1d(p_d=3, f_d=3, l=1.):
        # polynomial + Fourier function basis in one variable
        basis = FunctionBasis()
        p_basis = basis.poly(1, p_d)
        f_basis = basis.fourier(1, f_d, l)[1:]
        e_basis = []
        return p_basis + f_basis + e_basis

    domain = np.linspace(min(X), max(X) + .5, 300)

    x_min, x_max = X.min(), X.max()
    x_mid = (x_min + x_max) / 2.
    w_min = .1
    w_max = 5.

    # weight 1: a logistic ramp that emphasizes later observations
    ws1 = {_[0]: 1. / (1. + np.exp(-6 * (_[0] - .35))) for _ in X}
    Xs1 = [_[0] for _ in X]
    Ws1 = [ws1[_] for _ in Xs1]

    # weight 2: a step that down-weights the first half of the time axis
    ws2 = {_[0]: .1 if _[0] < x_mid else 1. for _ in X}
    Xs2 = [_[0] for _ in X]
    Ws2 = [ws2[_] for _ in Xs2]

    meas1 = Measure(ws1)
    ell = 1. / 12
    B1 = pfe_1d(p_d=4, f_d=20, l=ell)
    regressor1 = HilbertRegressor(base=B1, meas=meas1)
    regressor1.fit(X, y)
    y_pred1 = regressor1.predict(domain)

    meas2 = Measure(ws2)
    regressor2 = HilbertRegressor(base=B1, meas=meas2)
    regressor2.fit(X, y)
    y_pred2 = regressor2.predict(domain)

    fig = plt.figure(randint(1, 10000), constrained_layout=True, figsize=(16, 10))
    gs = fig.add_gridspec(6, 1)

    f_ax1 = fig.add_subplot(gs[:4, :])
    f_ax1.scatter(X, y, color='red', s=10, marker='o', alpha=0.5, label="Data points")
    f_ax1.plot(domain, y_pred1, color='blue', label='Fit 1')
    f_ax1.plot(domain, y_pred2, color='teal', label='Fit 2')
    f_ax1.fill_between(domain.reshape((1, -1))[0],
                       y_pred1 - regressor1.ci_band,
                       y_pred1 + regressor1.ci_band,
                       color='purple', alpha=0.1, label='CI: 95%')
    f_ax1.fill_between(domain.reshape((1, -1))[0],
                       y_pred2 - regressor2.ci_band,
                       y_pred2 + regressor2.ci_band,
                       color='orange', alpha=0.1, label='CI: 95%')
    f_ax1.legend(loc=1)
    f_ax1.grid(True, linestyle='-.', alpha=.4)

    f_ax2 = fig.add_subplot(gs[4, :])
    f_ax2.set_title('Weight 1')
    f_ax2.fill_between(Xs1, [0. for _ in Ws1], Ws1, label='Distribution', color='purple', alpha=.3)
    f_ax2.set_ylabel('Weight')

    f_ax3 = fig.add_subplot(gs[5:, :])
    f_ax3.set_title('Weight 2')
    f_ax3.fill_between(Xs2, [0. for _ in Ws2], Ws2, label='Distribution', color='orange', alpha=.3)
    f_ax3.set_ylabel('Weight')

The output would be the following:

.. image:: ./images/ec_w.png
    :height: 400px

And, for what it is worth, **using support vector regression with an RBF kernel**::

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    from random import randint
    from sklearn.svm import SVR

    from GeneralRegression.extras import Time2Interval

    df = pd.read_csv("./data/elec_cost.csv",
                     parse_dates=['Effective Start Date (mm/dd/yyyy)'],
                     infer_datetime_format=True)
    dm, dx = df['Effective Start Date (mm/dd/yyyy)'].min(), df['Effective Start Date (mm/dd/yyyy)'].max()
    time_trans = Time2Interval(dm, dx)
    df['T'] = df.apply(lambda x: time_trans.date2num(x['Effective Start Date (mm/dd/yyyy)']), axis=1)

    plt.figure(randint(1, 1000), figsize=(16, 12))

    X = df[['T']].values
    y = df['Cost'].values

    domain = np.linspace(min(X), max(X) + .5, 300)

    svr = SVR(kernel='rbf', tol=.0000001, C=10., epsilon=.0001)
    svr.fit(X, y)
    y_pred = svr.predict(domain)

    plt.scatter(X, y, color='red', s=10, marker='o', alpha=0.5, label="Data points")
    plt.plot(domain, y_pred, color='blue', label='Fit')
    plt.legend(loc=2)
    plt.grid(True, alpha=.4)
    plt.title('Support Vector Regression')

The output would be the following:

.. image:: ./images/ec_svr.png
    :height: 400px