Code Documentation

class GeneralRegression.GenericRegressor(funcs, regressor=None, ci=0.95, **kwargs)[source]

Uses a linear regression algorithm and a transformer to perform nonlinear regression. Using a set of functions \((f_0,\dots,f_n)\), and a point \(x\), lifts the point \(x\) to \((x, f_0(x),\dots,f_n(x))\) and applies a linear regression. The result will be a nonlinear regression based on \(f_0,\dots,f_n.\)

  • funcs – a function that transforms the data points
  • regressor – the linear regression method which should be scikit-learn compatible; default: BayesianRidge
  • ci – confidence interval; float between 0 and 1.
  • kwargs – arguments to be passed to funcs
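
The lifting step described above can be sketched in a few lines of plain Python. This is only an illustration, not the class's implementation: it assumes funcs is a list of scalar callables, and the names lift, weights and linear_model are hypothetical. The weights are fixed by hand here, whereas in the class they would be estimated by the linear regressor (BayesianRidge by default).

```python
import math

def lift(x, funcs):
    """Map a scalar x to the tuple (x, f_0(x), ..., f_n(x))."""
    return (x,) + tuple(f(x) for f in funcs)

# Hypothetical function set, chosen only for illustration.
funcs = [math.sin, lambda t: t ** 2]

# The target y = 0.5*x + 2*sin(x) - x**2 is nonlinear in x,
# but linear in the lifted coordinates (x, sin(x), x**2).
weights = (0.5, 2.0, -1.0)

def linear_model(row):
    """A plain linear combination of the lifted coordinates."""
    return sum(w * v for w, v in zip(weights, row))

lifted_rows = [lift(x, funcs) for x in (0.0, 1.0, 2.0)]
predictions = [linear_model(row) for row in lifted_rows]
```

Any linear regressor fitted on lifted_rows would recover weights of this form, which is what turns the linear method into a nonlinear regression in x.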
fit(X, y)[source]

Calculates an orthonormal basis according to the given function basis and the linear regressor.

  • X – Training data
  • y – Target values



Predict using the Hilbert regression method

Parameters:X – data points for prediction
Returns:returns predicted values

Hilbert Space based regression

exception NpyProximation.Error(*args)[source]

Generic errors that may occur in the course of a run.

class NpyProximation.FunctionSpace(dim=1, measure=None, basis=None)[source]

A class that facilitates a few types of computations over function spaces of type \(L_2(X, \mu)\)

  • dim – the dimension of ‘X’ (default: 1)
  • measure – an object of type Measure representing \(\mu\)
  • basis – a finite basis of functions to construct a subset of \(L_2(X, \mu)\)

Call this method to generate the orthogonal basis corresponding to the given basis. The result will be stored in a property called orth_base, which is a list of functions that are orthogonal to each other with respect to the measure measure over the given range domain.
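
As a rough illustration of the orthogonalization described above, the following sketch applies the Gram-Schmidt process to a list of callables under a discrete measure. It is not the library's code: the helper names inner and orthonormalize are hypothetical, the measure is assumed to be a dictionary mapping points (tuples) to weights, and the resulting functions are also normalized, which is an extra step beyond plain orthogonality.

```python
import math

def inner(f, g, measure):
    """Inner product of f and g under a discrete measure {point: weight}."""
    return sum(w * f(x) * g(x) for x, w in measure.items())

def orthonormalize(basis, measure):
    """Gram-Schmidt: turn `basis` into functions orthonormal w.r.t. `measure`."""
    orth = []
    for f in basis:
        # Coefficients of f along the functions already accepted.
        coeffs = [inner(f, b, measure) for b in orth]
        prev = list(orth)

        def residual(x, f=f, coeffs=coeffs, prev=prev):
            # Subtract the projections of f on the previous functions.
            return f(x) - sum(c * b(x) for c, b in zip(coeffs, prev))

        norm = math.sqrt(inner(residual, residual, measure))
        if norm < 1e-12:
            continue  # f depends linearly on the earlier functions; skip it
        orth.append(lambda x, r=residual, n=norm: r(x) / n)
    return orth

# Three equally weighted points on the real line, stored as 1-tuples.
measure = {(-1.0,): 1 / 3, (0.0,): 1 / 3, (1.0,): 1 / 3}
basis = [lambda x: 1.0, lambda x: x[0], lambda x: x[0] ** 2]
orth = orthonormalize(basis, measure)
```

Each entry of orth is a callable on points of the domain, and inner(orth[i], orth[j], measure) is 1 when i equals j and 0 otherwise, up to rounding.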

inner(f, g)[source]

Computes the inner product of the two parameters with respect to the measure measure, i.e., \(\int_Xf\cdot g d\mu\).

  • f – callable
  • g – callable

Returns:the value of \(\int_Xf\cdot g d\mu\)
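
For a measure given by a density, the inner product above can be approximated numerically. The sketch below uses a midpoint rule in one dimension; it is an assumption-laden illustration rather than the library's integration scheme. The function signature, the default density of 1 (the Lebesgue measure on \([-1, 1]\)) and the number of subintervals n are all choices made for this example.

```python
def inner(f, g, density=lambda x: 1.0, domain=(-1.0, 1.0), n=1000):
    """Approximate the integral of f*g*density over `domain` (midpoint rule)."""
    a, b = domain
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += f(x) * g(x) * density(x)
    return total * h

# <x, x> over [-1, 1] with density 1 is 2/3; <1, x> vanishes by symmetry.
xx = inner(lambda x: x, lambda x: x)
one_x = inner(lambda x: 1.0, lambda x: x)
```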

project(f, g)[source]

Finds the projection of f on g with respect to the inner product induced by the measure measure.

  • f – callable
  • g – callable

Returns:the projection \(\frac{\langle f, g\rangle}{\|g\|_2^2}g\)


Given a function f, this method finds and returns the coefficients of the series that approximates f as a linear combination of the elements of the orthogonal basis \(B\). In symbols \(\sum_{b\in B}\langle f, b\rangle b\).

Returns:the list of coefficients \(\langle f, b\rangle\) for \(b\in B\)
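
A minimal sketch of the coefficient computation, assuming a discrete measure and a basis that is already orthonormal with respect to it. The names measure, inner, series and approx are hypothetical and serve only this example.

```python
import math

# Three equally weighted points, stored as 1-tuples.
measure = {(-1.0,): 1 / 3, (0.0,): 1 / 3, (1.0,): 1 / 3}

def inner(f, g):
    """Inner product under the discrete measure above."""
    return sum(w * f(x) * g(x) for x, w in measure.items())

# An orthonormal basis for this measure: the constant 1 and a rescaled x.
basis = [lambda x: 1.0, lambda x: math.sqrt(1.5) * x[0]]

def series(f):
    """Coefficients <f, b> for each b in the orthonormal basis."""
    return [inner(f, b) for b in basis]

f = lambda x: 2.0 + 3.0 * x[0]
coeffs = series(f)

def approx(x):
    """The linear combination sum_b <f, b> b evaluated at x."""
    return sum(c * b(x) for c, b in zip(coeffs, basis))
```

Because f lies in the span of the basis, approx reproduces f exactly on the support of the measure; for a general f it gives the best approximation within that span.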
class NpyProximation.HilbertRegressor(deg=3, base=None, meas=None, fspace=None, c_limit=0.95)[source]

Regression using Hilbert space techniques, with a scikit-learn style interface.

  • deg – int, default=3. The degree of polynomial regression. Only used if base is None.
  • base – list, default=None. A list of functions to form an orthogonal function basis.
  • meas – NpyProximation.Measure, default=None. The measure to form the \(L_2(\mu)\) space. If None, a discrete measure will be constructed based on fit inputs.
  • fspace – NpyProximation.FunctionBasis, default=None. The function subspace of \(L_2(\mu)\). If None, it will be initiated according to self.meas.
fit(X, y)[source]

Calculates an orthonormal basis according to the given function space basis and the discrete measure from the training points.

  • X – Training data
  • y – Target values



Predict using the Hilbert regression method

Parameters:X – data points for prediction
Returns:returns predicted values
score(X, y, sample_weight=None)[source]

The default scoring method is the weighted mean square error

  • X
  • y
  • sample_weight
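
One common reading of a weighted mean square error is sketched below. The normalization by the sum of the weights and the function name weighted_mse are assumptions made for illustration; the source does not state how the method normalizes or what it does when sample_weight is None.

```python
def weighted_mse(y_true, y_pred, sample_weight=None):
    """Weighted mean of squared errors; equal weights when none are given."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total = sum(sample_weight)
    squared = sum(
        w * (t - p) ** 2 for w, t, p in zip(sample_weight, y_true, y_pred)
    )
    return squared / total

plain = weighted_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
masked = weighted_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [1.0, 1.0, 0.0])
```

A zero weight removes a sample from the score, which is why masked ignores the only mispredicted point.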

class NpyProximation.Measure(density=None, domain=None)[source]

Constructs a measure \(\mu\) based on density and domain.

  • density – the density over the domain:

    • if none is given, it assumes uniform distribution
    • if a callable h is given, then \(d\mu=h(x)dx\)
    • if a dictionary is given, then \(\mu=\sum w_x\delta_x\) is a discrete measure. The points \(x\) are the keys of the dictionary (tuples) and the weights \(w_x\) are the values.
  • domain – if density is a dictionary, it will be set by its keys. If callable, then domain must be a list of tuples defining the domain’s box. If None is given, it will be set to \([-1, 1]^n\)

Calculates \(\int_{domain} fd\mu\).

Parameters:f – the integrand
Returns:the value of the integral
norm(p, f)[source]

Computes the p-norm of f with respect to the current measure, i.e., \((\int_{domain}|f|^p d\mu)^{1/p}\).

  • p – a positive real number
  • f – the function whose norm is desired.

Returns:\(\|f\|_{p, \mu}\)
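
For the discrete case, where the measure is a dictionary of points and weights, the integral and the p-norm reduce to finite sums. The following is a small sketch under that assumption; the free functions integral and norm are hypothetical stand-ins and take the measure as an explicit argument, unlike the methods of the class.

```python
def integral(f, measure):
    """Integral of f against a discrete measure {point: weight}."""
    return sum(w * f(x) for x, w in measure.items())

def norm(p, f, measure):
    """p-norm of f: (integral of |f|^p) ** (1/p)."""
    return integral(lambda x: abs(f(x)) ** p, measure) ** (1.0 / p)

# Points are 1-tuples, weights are the dictionary values.
measure = {(0.0,): 0.5, (3.0,): 0.25, (-4.0,): 0.25}
f = lambda x: x[0]

mean_value = integral(f, measure)   # 0.5*0 + 0.25*3 + 0.25*(-4)
one_norm = norm(1, f, measure)      # 0.25*3 + 0.25*4
two_norm = norm(2, f, measure)      # sqrt(0.25*9 + 0.25*16)
```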

class NpyProximation.Regression(points, dim=None)[source]

Given a set of points, i.e., a list P of tuples of equal length, this class computes the best approximation of a function that fits the data, in the following sense:

  • if no extra parameters are provided, meaning that an object is initiated like R = Regression(P), then calling returns the linear regression that fits the data.
  • if at initiation the parameter deg=n is set, then returns the polynomial regression of degree n.
  • if a basis of functions is provided by means of an OrthSystem object (R.SetOrthSys(orth)), then calling returns the best approximation that can be found using the basis functions of the orth object.
  • points – a list of points to be fitted or a callable to be approximated
  • dim – dimension of the domain

Fits the best curve based on the optional provided orthogonal basis. If no basis is provided, it fits a polynomial of a given degree (at initiation).

Returns:the fit


Sets the orthogonal system of basis functions.

Parameters:sys – orthsys.OrthSystem object.


For technical reasons, the measure needs to be given via the set_measure method. Otherwise, the Lebesgue measure on \([-1, 1]^n\) is assumed.


Sets the default measure for approximation.

Parameters:meas – a measure.Measure object

Time Series Tools

class ModelSelection.TimeSeriesCV(test_ratio=0.2, train_ratio=None, index=0)[source]

This is a very naive cross validator for time series. It simply sorts the given index (default 0) and splits the sorted index into a train and a test index set according to the given ratios.

  • test_ratio – (default .2) float between 0. and 1., the portion of test data
  • train_ratio – (default None-> .8) float between 0. and 1., the portion of train data
  • index – (default 0) the index of the column that corresponds to a time parameter in the data
get_n_splits(X=None, y=None, groups=None)[source]

Returns the number of splitting iterations in the cross-validator

  • X – Always ignored, exists for compatibility.
  • y – Always ignored, exists for compatibility.
  • groups – Always ignored, exists for compatibility.

Returns the number of splitting iterations in the cross-validator which is 1 for time series.

split(X, y=None, groups=None)[source]

Generate indices to split data into training and test set.

  • X – array-like of shape (n_samples, n_features) Training data, where n_samples is the number of samples and n_features is the number of features.
  • y – array-like of shape (n_samples,), default=None The target variable for supervised learning problems.
  • groups – array-like of shape (n_samples,), default=None Group labels for the samples used while splitting the dataset into train/test set.

  • train – The training set indices for that split.
  • test – The testing set indices for that split.
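
The sort-then-split behaviour described for this cross-validator can be sketched as follows. This is an illustration, not the class's code: the function name time_series_split is hypothetical, and it assumes that the training indices come from the earliest rows and the test indices from the latest rows, which the text above does not state explicitly.

```python
def time_series_split(X, test_ratio=0.2, train_ratio=None, index=0):
    """Return (train, test) row indices after sorting rows by the time column."""
    if train_ratio is None:
        train_ratio = 1.0 - test_ratio
    n = len(X)
    # Row indices ordered by the value in the time column.
    order = sorted(range(n), key=lambda i: X[i][index])
    n_test = round(n * test_ratio)
    n_train = min(round(n * train_ratio), n - n_test)
    train = order[:n_train]        # assumed: earliest rows train the model
    test = order[n - n_test:]      # assumed: latest rows are held out
    return train, test

# Ten rows whose first column is a shuffled time stamp.
times = [5, 3, 9, 1, 7, 0, 8, 2, 6, 4]
X = [(t, 10.0 * t) for t in times]
train, test = time_series_split(X)
```

Only one split is produced, which matches get_n_splits returning 1.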