Title: | Geometrically Designed Spline Regression |
---|---|
Description: | Spline Regression, Generalized Additive Models, and Component-wise Gradient Boosting, utilizing Geometrically Designed (GeD) Splines. GeDS regression is a non-parametric method inspired by geometric principles, for fitting spline regression models with variable knots in one or two independent variables. It efficiently estimates the number of knots and their positions, as well as the spline order, assuming the response variable follows a distribution from the exponential family. GeDS models integrate the broader category of Generalized (Non-)Linear Models, offering a flexible approach to modeling complex relationships. A description of the method can be found in Kaishev et al. (2016) <doi:10.1007/s00180-015-0621-7> and Dimitrova et al. (2023) <doi:10.1016/j.amc.2022.127493>. Further extending its capabilities, GeDS's implementation includes Generalized Additive Models (GAM) and Functional Gradient Boosting (FGB), enabling versatile multivariate predictor modeling, as discussed in the forthcoming work of Dimitrova et al. (2025). |
Authors: | Dimitrina S. Dimitrova [aut], Vladimir K. Kaishev [aut], Andrea Lattuada [aut], Emilio L. Sáenz Guillén [aut, cre], Richard J. Verrall [aut] |
Maintainer: | Emilio L. Sáenz Guillén <[email protected]> |
License: | GPL-3 |
Version: | 0.2.7 |
Built: | 2025-03-11 08:45:28 UTC |
Source: | https://github.com/emilioluissaenzguillen/geds |
Geometrically Designed Splines (GeDS) regression is a non-parametric method
for fitting spline regression models with variable knots. The GeDS technique
is inspired by geometric principles and falls within the domain of
generalized non-linear models (GNM), which include generalized linear models
(GLM) as a special case. GeDS regression is fitted based on a sample of
observations of a response variable
, dependent on a set of
(currently up to two) covariates, assuming
has a distribution from
the exponential family. In addition, GeDS methodology is implemented both in
the context of Generalized Additive Models (GAM) and Functional Gradient
Boosting (FGB). On the one hand, GAM consist of an additive modeling
technique where the impact of the predictor variables is captured through
smooth (GeDS, in this case) functions. On the other hand, GeDS incorporates
gradient boosting machine learning technique by implementing functional
gradient descent algorithm to optimize general risk functions utilizing
component-wise GeDS estimates.
GeDS provides a novel solution to the spline regression problem and in
particular, to the problem of estimating the number and position of the knots.
The GeDS estimation method is based on: first, constructing a piecewise linear
fit (spline fit of order 2) which captures the underlying functional shape of
determined by the data (Stage A); second, approximating the latter fit through
shape preserving (variation diminishing) spline fits of higher orders
,
,
(i.e., degrees 2, 3,
) at
stage B. As a result, GeDS simultaneously produces a linear, a quadratic and
a cubic spline fit.
The GeDS method was originally developed by Kaishev et al. (2016) assuming
the response variable to be normally distributed and a corresponding
Mathematica code was provided.
The GeDS method was extended by Dimitrova et al. (2023) to cover any
distribution from the exponential family. The GeDS R package presented
here includes an enhanced R implementation of the original Normal GeDS
Mathematica code due to Kaishev et al. (2016), implemented as the
NGeDS
function and a generalization of it in the function
GGeDS
which covers the case of any distribution from the
exponential family.
The GeDS package allows also to fit two dimensional response surfaces
and to construct multivariate (predictor) models with a GeD spline component
and a parametric component (see the functions f
,
formula
, NGeDS
and
GGeDS
for details).
Dimitrova et al. (2025) have recently made significant enhancements to the
GeDS methodology, by incorporating generalized additive models (GAM) and
functional gradient boosting (FGB). On the one hand, generalized additive
models are encompassed by implementing the local-scoring algorithm
using normal GeD splines (i.e., NGeDS
) as function smoothers
within the backfitting iterations. This is implemented via the function
NGeDSgam
. On the other hand, the GeDS package incorporates
the functional gradient descent algorithm by utilizing normal GeD splines (i.e.,
NGeDS
) as base learners. This is implemented via the function
NGeDSboost
.
The outputs of both NGeDS
and GGeDS
functions are
GeDS-class
objects, while the outputs of NGeDSgam
and NGeDSboost
functions are GeDSgam-class
and
GeDSboost-class
objects, respectively. GeDS-class
,
GeDSgam-class
and GeDSboost-class
objects contain
second, third and fourth order spline fits. As described in
Kaishev et al. (2016), Dimitrova et al. (2023) and Dimitrova et al. (2025),
the "final" GeDS fit is the one minimizing the empirical deviance. Nevertheless,
the user can choose to use any of the available fits.
The GeDS package also includes some datasets where GeDS regression
proves to be very efficient and some user friendly functions that are designed
to easily extract required information. Several methods are also provided to
handle GeDS, GAM-GeDS and FGB-GeDS output results (see GeDS-class
,
GeDSgam-class
and GeDSboost-class
, respectively).
Throughout this document, we use the terms GeDS predictor model, GeDS regression and GeDS fit interchangeably.
Please report any issue arising or bug in the code to [email protected].
Package: | GeDS |
Version: | 0.2.6 |
Date: | 2025-02-10 |
License: | GPL-3 |
Dimitrina S. Dimitrova <[email protected]>,#' Vladimir K. Kaishev <[email protected]>, Andrea Lattuada <[email protected]>, Emilio L. Sáenz Guillén <[email protected]> and Richard J. Verrall <[email protected]>
Kaishev, V.K., Dimitrova, D.S., Haberman, S., & Verrall, R.J. (2016).
Geometrically designed, variable knot regression splines.
Computational Statistics, 31, 1079–1105.
DOI: doi:10.1007/s00180-015-0621-7
Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
Geometrically designed variable knot splines in generalized (non-)linear
models.
Applied Mathematics and Computation, 436.
DOI: doi:10.1016/j.amc.2022.127493
Dimitrova, D. S., Kaishev, V. K. and Saenz Guillen, E. L. (2025). GeDS: An R Package for Regression, Generalized Additive Models and Functional Gradient Boosting, based on Geometrically Designed (GeD) Splines. Manuscript submitted for publication.
Useful links:
Report bugs at https://github.com/emilioluissaenzguillen/GeDS/issues
This dataset contains the results of a neutron diffraction experiment on
Barium-Ferrum-Arsenide () powder carried out by
Kimber et al. (2009) and used in Kaishev et al. (2016). The neutron
diffraction intensity was measured at 1,151 different dispersion angles in
order to model the diffraction profile.
data(BaFe2As2)
data(BaFe2As2)
A data.frame
with 1151 cases and 2 variables:
angle: the dispersion angle, viewed as the independent variable.
intensity: the neutron diffraction intensity, viewed as the response variable.
Kimber, S.A.J., Kreyssig, A., Zhang, Y.Z., Jeschke, H.O., Valenti, R.,
Yokaichiya, F., Colombier, E., Yan, J., Hansen, T.C., Chatterji, T.,
McQueeney, R.J., Canfield, P.C., Goldman, A.I. and Argyriou, D.N. (2009).
Similarities between structural distortions under pressure and chemical
doping in superconducting . Nat Mater,
8, 471–475.
Kaishev, V.K., Dimitrova, D.S., Haberman, S. and Verrall, R.J. (2016).
Geometrically designed, variable knot regression splines.
Computational Statistics, 31, 1079–1105.
DOI: doi:10.1007/s00180-015-0621-7
## Not run: # to load the data data('BaFe2As2') # fit a GeDS regression and produce a simple plot of the result. See ?NGeDS # c.f. Kaishev et al. (2016), section 4.2 (Gmod <- NGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.6, phi = 0.99, q = 3, show.iters = T)) plot(Gmod) ## End(Not run)
## Not run: # to load the data data('BaFe2As2') # fit a GeDS regression and produce a simple plot of the result. See ?NGeDS # c.f. Kaishev et al. (2016), section 4.2 (Gmod <- NGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.6, phi = 0.99, q = 3, show.iters = T)) plot(Gmod) ## End(Not run)
These are computing engines called by NGeDS
and
GGeDS
, needed for the underlying fitting procedures.
BivariateFitter( X, Y, Z, W, weights = rep(1, length(X)), Indicator, beta = 0.5, phi = 0.99, min.intknots = 0, max.intknots = 300, q = 2, Xextr = range(X), Yextr = range(Y), show.iters = TRUE, tol = as.double(1e-12), stoptype = c("SR", "RD", "LR"), higher_order = TRUE, Xintknots = NULL, Yintknots = NULL ) GenBivariateFitter( X, Y, Z, W, family = family, weights = rep(1, length(X)), Indicator, beta = 0.5, phi = 0.5, min.intknots = 0, max.intknots = 300, q = 2, Xextr = range(X), Yextr = range(Y), show.iters = TRUE, tol = as.double(1e-12), stoptype = c("SR", "RD", "LR"), higher_order = TRUE )
BivariateFitter( X, Y, Z, W, weights = rep(1, length(X)), Indicator, beta = 0.5, phi = 0.99, min.intknots = 0, max.intknots = 300, q = 2, Xextr = range(X), Yextr = range(Y), show.iters = TRUE, tol = as.double(1e-12), stoptype = c("SR", "RD", "LR"), higher_order = TRUE, Xintknots = NULL, Yintknots = NULL ) GenBivariateFitter( X, Y, Z, W, family = family, weights = rep(1, length(X)), Indicator, beta = 0.5, phi = 0.5, min.intknots = 0, max.intknots = 300, q = 2, Xextr = range(X), Yextr = range(Y), show.iters = TRUE, tol = as.double(1e-12), stoptype = c("SR", "RD", "LR"), higher_order = TRUE )
X |
a numeric vector containing |
Y |
a numeric vector containing |
Z |
a vector of size |
W |
a design matrix with |
weights |
an optional vector of size |
Indicator |
contingency table (i.e., frequency of observations) for the
independent variables |
beta |
numeric parameter in the interval |
phi |
numeric parameter in the interval |
min.intknots |
optional parameter specifying the minimum number of internal knots required in Stage A's fit. Default is zero. |
max.intknots |
optional parameter allowing the user to set a maximum
number of internal knots to be added in Stage A by the GeDS estimation
algorithm. Default equals the number of internal knots |
q |
numeric parameter which allows to fine-tune the stopping rule of
stage A of GeDS, by default equal to 2. See details in the description of
|
Xextr |
boundary knots in the |
Yextr |
boundary knots in the |
show.iters |
logical variable indicating whether or not to print fitting
information at each step. Default is |
tol |
numeric value indicating the tolerance to be used in checking whether two knots should be considered different during the knot placement steps in stage A. |
stoptype |
a character string indicating the type of GeDS stopping rule
to be used. It should be either |
higher_order |
a logical defining whether to compute the higher
order fits (quadratic and cubic) after stage A is run. Default is
|
Xintknots |
a vector of starting internal knots in the |
Yintknots |
a vector of starting internal knots in the |
family |
a description of the error distribution and link function to be
used in the model. This can be a character string naming a family function
(e.g. |
A GeDS-Class
object, but without the Formula
,
extcall
, terms
and znames
slots.
Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
Geometrically designed variable knot splines in generalized (non-)linear
models.
Applied Mathematics and Computation, 436.
DOI: doi:10.1016/j.amc.2022.127493
NGeDS
, GGeDS
and UnivariateFitters
.
S3 method for GeDSboost-class
objects that calculates the
in-bag risk reduction ascribable to each base-learner of an FGB-GeDS model.
Essentially, it measures and aggregates the decrease in the empirical risk
attributable to each base-learner for every time it is selected across the
boosting iterations. This provides a measure on how much each base-learner
contributes to the overall improvement in the model's accuracy, as reflectedp
by the decrease in the empiral risk (loss function). This function is adapted
from varimp
and is compatible with the available
mboost-package
methods for varimp
,
including plot
, print
and as.data.frame
.
## S3 method for class 'GeDSboost' bl_imp(object, boosting_iter_only = FALSE, ...)
## S3 method for class 'GeDSboost' bl_imp(object, boosting_iter_only = FALSE, ...)
object |
an object of class |
boosting_iter_only |
logical value, if |
... |
potentially further arguments. |
See varimp
for details.
An object of class varimp
with available plot
,
print
and as.data.frame
methods.
Hothorn T., Buehlmann P., Kneib T., Schmid M. and Hofner B. (2022). mboost: Model-Based Boosting. R package version 2.9-7, https://CRAN.R-project.org/package=mboost.
library(GeDS) library(TH.data) set.seed(290875) data("bodyfat", package = "TH.data") data = bodyfat Gmodboost <- NGeDSboost(formula = DEXfat ~ f(hipcirc) + f(kneebreadth) + f(anthro3a), data = data, initial_learner = FALSE) MSE_Gmodboost_linear <- mean((data$DEXfat - Gmodboost$predictions$pred_linear)^2) MSE_Gmodboost_quadratic <- mean((data$DEXfat - Gmodboost$predictions$pred_quadratic)^2) MSE_Gmodboost_cubic <- mean((data$DEXfat - Gmodboost$predictions$pred_cubic)^2) # Print MSE cat("\n", "MEAN SQUARED ERROR", "\n", "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n", "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n", "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n") # Base Learner Importance bl_imp <- bl_imp(Gmodboost) print(bl_imp) plot(bl_imp)
library(GeDS) library(TH.data) set.seed(290875) data("bodyfat", package = "TH.data") data = bodyfat Gmodboost <- NGeDSboost(formula = DEXfat ~ f(hipcirc) + f(kneebreadth) + f(anthro3a), data = data, initial_learner = FALSE) MSE_Gmodboost_linear <- mean((data$DEXfat - Gmodboost$predictions$pred_linear)^2) MSE_Gmodboost_quadratic <- mean((data$DEXfat - Gmodboost$predictions$pred_quadratic)^2) MSE_Gmodboost_cubic <- mean((data$DEXfat - Gmodboost$predictions$pred_cubic)^2) # Print MSE cat("\n", "MEAN SQUARED ERROR", "\n", "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n", "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n", "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n") # Base Learner Importance bl_imp <- bl_imp(Gmodboost) print(bl_imp) plot(bl_imp)
A dataset with 112 entries containing annual numbers of accidents due to disasters in British coal mines for years from 1850 to 1962, considered in Carlin et al. (1992) and also in Eilers and Marx (1996).
data(coalMining)
data(coalMining)
A data.frame
with 112 entries, corresponding to the
years from 1850 to 1962. Each entry has:
accidents: number of severe accidents that have occurred each year.
years: year during which the accidents occurred.
https://people.reed.edu/~jones/141/Coal.html
Carlin, B.P., Gelfand, A.E. and Smith, A.F.M. (1992). Hierarchical Bayesian analysis of changepoint problems. Applied Statistics, 41(2), 389–405.
Eilers, P.H.C. and Marx, B.D. (1996). Flexible Smoothing with B-splines and Penalties. Statistical Science, 11(2), 89–121.
Methods for the functions coef
and
coefficients
that allow to extract the estimated
coefficients of a fitted GeDS regression from a GeDS-Class
object.
## S3 method for class 'GeDS' coef(object, n = 3L, onlySpline = TRUE, ...) ## S3 method for class 'GeDS' coefficients(object, n = 3L, onlySpline = TRUE, ...)
## S3 method for class 'GeDS' coef(object, n = 3L, onlySpline = TRUE, ...) ## S3 method for class 'GeDS' coefficients(object, n = 3L, onlySpline = TRUE, ...)
object |
the |
n |
integer value (2, 3 or 4) specifying the order ( |
onlySpline |
logical variable specifying whether only the coefficients for the GeDS component of a fitted multivariate regression model should be extracted or if, alternatively, also the coefficients of the parametric component should also be extracted. |
... |
potentially further arguments (required by the definition of the generic function). These will be ignored, but with a warning. |
These are simple methods for the functions coef
and
coefficients
.
As GeDS-class
objects contain three different fits (linear,
quadratic and cubic), it is possible to specify the order of the fit for
which GeDS regression coefficients are required via the input argument
n
.
As mentioned in the details of formula
, the
predictor model may be multivariate and it may include a (univariate or
bivariate) GeD spline component whereas the remaining variables may be part
of a parametric component. If the onlySpline
argument is set to
TRUE
(the default value), only the coefficients corresponding to the
GeD spline component of order n
of the multivariate predictor model
are extracted.
A named vector containing the required coefficients of the fitted
univariate or multivariate predictor model. The coefficients corresponding to
the variables that enter the parametric component of the fitted multivariate
predictor model are named as the variables themselves. The coefficients of
the GeDS component are coded as "N
" followed by the index of the
corresponding B-spline.
coef
for the standard definition;
NGeDS
for examples.
# Generate a data sample for the response variable # and the covariates set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N ,min = -2, max = 2)) Z <- runif(N) # Specify a model for the mean of the response Y to be a superposition of # a non-linear component f_1(X), a linear component 2*Z and a # free term 1, i.e. means <- f_1(X) + 2*Z + 1 # Add normal noise to the mean of y Y <- rnorm(N, means, sd = 0.1) # Fit to this sample a predictor model of the form f(X) + Z, where # f(X) is the GeDS component and Z is the linear (additive) component # see ?formula.GeDS for details (Gmod <- NGeDS(Y ~ f(X) + Z, beta = 0.6, phi = 0.995, Xextr = c(-2,2))) # Extract the GeD spline regression coefficients coef(Gmod, n = 3) # Extract all the coefficients, including the one for the linear component coef(Gmod, onlySpline = FALSE, n = 3)
# Generate a data sample for the response variable # and the covariates set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N ,min = -2, max = 2)) Z <- runif(N) # Specify a model for the mean of the response Y to be a superposition of # a non-linear component f_1(X), a linear component 2*Z and a # free term 1, i.e. means <- f_1(X) + 2*Z + 1 # Add normal noise to the mean of y Y <- rnorm(N, means, sd = 0.1) # Fit to this sample a predictor model of the form f(X) + Z, where # f(X) is the GeDS component and Z is the linear (additive) component # see ?formula.GeDS for details (Gmod <- NGeDS(Y ~ f(X) + Z, beta = 0.6, phi = 0.995, Xextr = c(-2,2))) # Extract the GeD spline regression coefficients coef(Gmod, n = 3) # Extract all the coefficients, including the one for the linear component coef(Gmod, onlySpline = FALSE, n = 3)
Methods for the functions coef
and
coefficients
that allow to extract the estimated
coefficients of a GeDSboost-class
or GeDSgam-class
object.
## S3 method for class 'GeDSboost' coef(object, n = 3L, ...) ## S3 method for class 'GeDSboost' coefficients(object, n = 3L, ...) ## S3 method for class 'GeDSgam' coef(object, n = 3L, ...) ## S3 method for class 'GeDSgam' coefficients(object, n = 3L, ...)
## S3 method for class 'GeDSboost' coef(object, n = 3L, ...) ## S3 method for class 'GeDSboost' coefficients(object, n = 3L, ...) ## S3 method for class 'GeDSgam' coef(object, n = 3L, ...) ## S3 method for class 'GeDSgam' coefficients(object, n = 3L, ...)
object |
the |
n |
integer value (2, 3 or 4) specifying the order (
By default |
... |
potentially further arguments (required by the definition of the generic function). These will be ignored, but with a warning. |
A named vector containing the required coefficients of the fitted multivariate predictor model.
coef
for the standard definition;
NGeDSboost
and NGeDSgam
for examples.
crossv_GeDS
performs k-fold cross-validation for tuning the relevant
parameters of the NGeDS
, GGeDS
, NGeDSgam
, and
NGeDSboost
models.
formula |
a description of the structure of the model structure, including the dependent and independent variables. |
data |
a data frame containing the variables referenced in the formula. |
model_fun |
the GeDS model to be fitted, that is, |
parameters |
to tune via cross-validation. These are: |
Two data frames, best_params
and results
.
best_params
contains the best combination of parameters according to
the cross-validated MSE. results
presents the cross-validated MSE and
the average number of internal knots across the folds for each possible
combination of parameters, given the input parameters
. In the case of
model_fun = NGeDSboost
, it also provides the cross-validated number of
boosting iterations.
################################################### # Generate a data sample for the response variable # Y and the single covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) data = data.frame(X = X, Y = Y) ## Not run: ## NGeDS # Define different combinations of parameters to cross-validate param = list(beta_grid = c(0.5), phi_grid = c(0.9, 0.95), q_grid = c(2)) cv_NGeDS <- crossv_GeDS(Y ~ f(X), data = data, NGeDS, n = 3L, parameters = param) print(cv_NGeDS$best_params) View(cv_NGeDS$results) ## NGeDSboost param = list(int.knots_init_grid = c(1, 2), shrinkage_grid = 1, beta_grid = c(0.3, 0.5), phi_grid = c(0.95, 0.99), q_grid = 2) cv_NGeDSboost <- crossv_GeDS(Y ~ f(X), data = data, NGeDSboost, n = 2L, n_folds = 2L, parameters = param) print(cv_NGeDSboost$best_params) View(cv_NGeDSboost$results) ## End(Not run)
################################################### # Generate a data sample for the response variable # Y and the single covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) data = data.frame(X = X, Y = Y) ## Not run: ## NGeDS # Define different combinations of parameters to cross-validate param = list(beta_grid = c(0.5), phi_grid = c(0.9, 0.95), q_grid = c(2)) cv_NGeDS <- crossv_GeDS(Y ~ f(X), data = data, NGeDS, n = 3L, parameters = param) print(cv_NGeDS$best_params) View(cv_NGeDS$results) ## NGeDSboost param = list(int.knots_init_grid = c(1, 2), shrinkage_grid = 1, beta_grid = c(0.3, 0.5), phi_grid = c(0.95, 0.99), q_grid = 2) cv_NGeDSboost <- crossv_GeDS(Y ~ f(X), data = data, NGeDSboost, n = 2L, n_folds = 2L, parameters = param) print(cv_NGeDSboost$best_params) View(cv_NGeDSboost$results) ## End(Not run)
This dataset contains crystallographic measurements obtained from a particle
accelerator experiment. The measurements correspond to the function
at various
values, which are used to analyze the scattering properties
of an unknown crystalline material. The dataset is available in two versions
based on the precision of the measurements:
- **'CrystalData10k'** (lower precision); - **'CrystalData300k'** (higher precision).
The goal of the experiment is to estimate from noisy data using
a GeDS model and compute its Fourier transform, which provides valuable insights
into the structure of the material.
data(CrystalData10k) data(CrystalData300k)
data(CrystalData10k) data(CrystalData300k)
A data.frame
with 1721 observations and 2 variables:
Q
(): The scattering vector, measured in inverse angstroms (
).
FQ
(a.u.): The measured function , given in arbitrary units (a.u.).
Data collected from a particle accelerator experiment.
## Not run: # Load the dataset (choose 10k or 300k version) data('CrystalData10k') # Fit a GeDS/GeDSboost model and compare how well the intensity peaks are captured Gmod <- NGeDS(F_Q ~ f(Q), data = CrystalData10k, phi = 0.999, q = 3) # for CrystalData300k set int.knots_init = 1, phi = 0.999, q = 4, instead Gmodboost <- NGeDSboost(F_Q ~ f(Q), data = CrystalData10k, phi = 0.9975, q = 4) par(mfrow = c(1,2)) plot(Gmod, n = 2) plot(Gmodboost, n = 2) ## End(Not run)
## Not run: # Load the dataset (choose 10k or 300k version) data('CrystalData10k') # Fit a GeDS/GeDSboost model and compare how well the intensity peaks are captured Gmod <- NGeDS(F_Q ~ f(Q), data = CrystalData10k, phi = 0.999, q = 3) # for CrystalData300k set int.knots_init = 1, phi = 0.999, q = 4, instead Gmodboost <- NGeDSboost(F_Q ~ f(Q), data = CrystalData10k, phi = 0.9975, q = 4) par(mfrow = c(1,2)) plot(Gmod, n = 2) plot(Gmodboost, n = 2) ## End(Not run)
This function computes derivatives of a fitted GeDS regression model.
Derive(object, order = 1L, x, n = 3L)
Derive(object, order = 1L, x, n = 3L)
object |
the |
order |
integer value indicating the order of differentiation required
(e.g. first, second or higher derivatives). Note that |
x |
numeric vector containing values of the independent variable at
which the derivatives of order |
n |
integer value (2, 3 or 4) specifying the order ( |
The function is based on splineDesign
and it
computes the exact derivative of the fitted GeDS regression.
The function uses the property that the -th derivative of a spline,
, expressed in terms of B-splines can be computed by
differentiating the corresponding B-spline coefficients (see e.g.
De Boor, 2001, Chapter X, formula (15)). Since the GeDS fit is a B-spline
representation of the predictor, it cannot work on the response scale in the
GNM (GLM) framework.
De Boor, C. (2001). A Practical Guide to Splines (Revised Edition). Springer, New York.
# Generate a data sample for the response variable # Y and the covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only # a component non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Fit GeDS regression with only a spline component in the predictor model Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2)) # Compute the second derivative of the cubic GeDS fit # at the points 0, -1 and 1 Derive(Gmod, x = c(0, -1, 1), order = 2, n = 4)
# Generate a data sample for the response variable # Y and the covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only # a component non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Fit GeDS regression with only a spline component in the predictor model Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2)) # Compute the second derivative of the cubic GeDS fit # at the points 0, -1 and 1 Derive(Gmod, x = c(0, -1, 1), order = 2, n = 4)
Method for the function deviance
that allows the user to
extract the value of the deviance corresponding to a selected GeDS, GeDSboost
or GeDSgam fit from a GeDS-Class
,
GeDSboost-Class
or GeDSgam-Class
object.
## S3 method for class 'GeDS' deviance(object, n = 3L, ...) ## S3 method for class 'GeDSboost' deviance(object, n = 3L, ...) ## S3 method for class 'GeDSgam' deviance(object, n = 3L, ...)
## S3 method for class 'GeDS' deviance(object, n = 3L, ...) ## S3 method for class 'GeDSboost' deviance(object, n = 3L, ...) ## S3 method for class 'GeDSgam' deviance(object, n = 3L, ...)
object |
the |
n |
integer value (2, 3 or 4) specifying the order ( |
... |
potentially further arguments (required by the definition of the generic function). These will be ignored, but with a warning. |
This is a method for the function deviance
. As
GeDS-class
, GeDSboost-class
and
GeDSgam-class
objects contain three different fits (linear,
quadratic and cubic), it is possible to specify the order of the GeDS fit
for which the deviance is required via the input argument n
.
A numeric value corresponding to the deviance of the selected GeDS/GeDSboost/GeDSgam fit.
deviance
for the standard definition;
GGeDS
for examples.
The dataset consists of information about the mortality of the English and Welsh male population aggregated over the years 2000, 2001 and 2002.
data(EWmortality)
data(EWmortality)
A data.frame
with 109 entries and 3 variables: Age
,
Deaths
and Exposure
. Exposure
is a mid-year estimate of
the population exposed to risk.
In general the GeDS predictor model may include a GeD spline regression component with respect to one or two independent variables and a parametric component in which the remaining covariates may enter as additive terms.
The function f
is to be used in the
formula
argument of NGeDS
or
GGeDS
in order to specify which independent variables
(covariates) should be included in the GeD spline regression component of the
predictor model.
f(x, xx = NULL, ...)
f(x, xx = NULL, ...)
x |
numeric vector containing |
xx |
numeric vector containing |
... |
further arguments. As GeDS currently allows for up to two covariates, specification of further arguments will return an error. |
This function is intended to be used only as part of the
formula
in a GeDS regression via
NGeDS
or GGeDS
and not to be called in other
cases by the user.
# Generate a data sample for the response variable Y and # the covariates X, reg1, reg2 and off set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N ,min = -2, max = 2)) reg1 <- runif(500, min = -0.1, max = 0.1) reg2 <- runif(500, min = -0.2, max = 0.2) off <- runif(500, min = -1, max = 1) # Specify a model for the mean of Y to include a component non linear # in X defined by the function f_1 and a linear one in the other covariates means <- f_1(X) + 2*reg1 + 0.5*reg2 + off # Add Normal noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Specify a formula that will be used to model Y as a # function of X, reg1, reg2 and off. # The covariate X is for the spline component modeled as GeDS, # reg1 and reg2 enter linearly, off is an offset, i.e. no coefficient # will be estimated for it formula <- Y ~ f(X) + reg1 + reg2 + offset(off) # Fit a GeDS model specified in formula using NGeDS (Gmod <- NGeDS(formula, beta = 0.6, phi = 0.995, Xextr = c(-2,2)))
# Generate a data sample for the response variable Y and # the covariates X, reg1, reg2 and off set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N ,min = -2, max = 2)) reg1 <- runif(500, min = -0.1, max = 0.1) reg2 <- runif(500, min = -0.2, max = 0.2) off <- runif(500, min = -1, max = 1) # Specify a model for the mean of Y to include a component non linear # in X defined by the function f_1 and a linear one in the other covariates means <- f_1(X) + 2*reg1 + 0.5*reg2 + off # Add Normal noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Specify a formula that will be used to model Y as a # function of X, reg1, reg2 and off. # The covariate X is for the spline component modeled as GeDS, # reg1 and reg2 enter linearly, off is an offset, i.e. no coefficient # will be estimated for it formula <- Y ~ f(X) + reg1 + reg2 + offset(off) # Fit a GeDS model specified in formula using NGeDS (Gmod <- NGeDS(formula, beta = 0.6, phi = 0.995, Xextr = c(-2,2)))
A description of the structure of the predictor model fitted using
NGeDS
or GGeDS
.
## S3 method for class 'GeDS' formula(x, ...)
## S3 method for class 'GeDS' formula(x, ...)
x |
fitted |
... |
unused in this case. |
In GeDS GNM (GLM) regression, implemented with NGeDS
and
GGeDS
, the mean of the response variable, correspondingly
transformed through an appropriate link function, is modeled using a
potentially multivariate predictor model. The latter comprises two components:
a GeD variable-knot spline regression involving up to two of the independent
variables, and a parametric component for the remaining independent variables.
The formula defines the structure of this potentially multivariate predictor.
The formulae that are input in NGeDS
and GGeDS
are similar to those input in lm
or
glm
except that the function f
should be
specified in order to identify which of the covariates enter the GeD spline
regression part of the predictor model. For example, if the predictor model
is univariate and it links the transformed mean of y
to x1
,
the predictor has only a GeD spline component and the
formula
should be in the form y ~ f(x1)
.
As noted, there may be additional independent variables, x2
,
x3
, ... which may enter linearly into the parametric component of the
predictor model and not be part of the GeD spline regression component. For
example one may use the formula y ~ f(x1) + x2 + x3
which assumes a
spline regression only between the transformed mean of y
and x1
,
while x2
and x3
enter the predictor model linearly.
Both function NGeDS
and function GGeDS
, generate
bivariate GeDS regression models. Therefore, if the functional dependence of
the mean of the response variable y
on x1
and x2
needs
to be jointly modeled and there are no other covariates, the formula for the
corresponding two dimensional predictor model should be specified as
y ~ f(x1,x2)
.
Within the argument formula
, similarly as in other R functions, it is
possible to specify one or more offset variables, i.e. known terms with fixed
regression coefficients equal to 1. These terms should be identified via the
function offset
.
A fitted GeDS object returned by the function NGeDS
or
GGeDS
, inheriting the methods for class "GeDS"
.
Methods for functions coef
, knots
, print
, predict
,
plot
, and lines
are available.
Type
Character string indicating the type of regression performed.
This can be "LM - Univ"
/"LM - Biv"
or
"GLM - Univ"
/"GLM - Biv"
, respectively corresponding to Normal
univariate/bivariate GeDS (implemented by NGeDS
), and to
generalized (GNM-GLM) univariate/bivariate GeDS (implemented by
GGeDS
).
Linear.Knots
vector containing the locations of the knots of the second order GeD spline fit produced at stage A.
Quadratic.Knots
vector containing the locations of the knots of the third order GeD spline fit produced in stage B.
Cubic.knots
Vector containing the locations of the knots of the fourth order GeD spline fit produced in stage B.
Dev.Linear
deviance of the second order GeD spline fit, produced in stage A.
Dev.Quadratic
deviance of the third order GeD spline fit, produced in stage B.
Dev.Cubic
deviance of the fourth order GeD spline fit, produced in stage B.
RSS
vector containing the deviances of the second order spline fits computed at each stage A's GeDS iteration.
Linear
list containing the results from running SplineReg
function to fit the second order spline fit of stage A.
Quadratic
list containing the results from running SplineReg
function used to fit the third order spline fit in stage B.
Cubic
list containing the results from a SplineReg
function used to fit the fourth order spline fit in stage B.
Stored
Matrix containing the knot locations estimated at each iteration of stage A.
Args
list containing the input arguments passed on the
Fitters
functions.
Call
call
to the Fitters
functions.
Nintknots
the final number of internal knots of the second order GeD spline fit produced in stage A.
iters
number of iterations performed during stage A of the GeDS fitting procedure.
Guesses
initial values for the coefficients used at each iteration of
stage A in order to estimate the spline coefficients. Since the initial
values are used only in the IRLS procedure, this slot is empty if the object
is not created by GGeDS
or GenUnivariateFitter
functions.
Coefficients
matrix containing the fitted coefficients of the GeD spline regression component and the parametric component at each iteration of stage A.
deviance
vector containing the deviances of the second order spline
fits computed at each IRLS iteration in stage A. Since the IRLS procedure is
used only in GGeDS
or GenUnivariateFitter
, this
slot is empty if the object is not created by one of these functions.
iterIrls
vector containing the numbers of IRLS iterations for all
iterations of stage A cumulatively. Since the IRLS procedure is used only in
GGeDS
or GenUnivariateFitter
, this slot is empty
if the object is not created by one of these functions.
stopinfo
list of values providing information related to the stopping
rule of stage A of GeDS. The sub-slots of stopinfo
are phis
,
phis_star
, oldintc
and oldslp
. The sub-slot phis
is a vector containing the values of the ratios of deviances (or the
difference of deviances if the LR
stopping rule was chosen). The
sub-slots phis_star
, oldintc
and oldslp
are non-empty
slots if the SR
stopping rule was chosen. These respectively contain
the values at each iteration of stage A of ,
and
. See Dimitrova et al. (2023)
for further details on these parameters.
Formula
the model formula
.
extcall
terms
terms
object containing information on the model frame.
Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
Geometrically designed variable knot splines in generalized (non-)linear
models.
Applied Mathematics and Computation, 436.
DOI: doi:10.1016/j.amc.2022.127493
A fitted GeDSboost object returned by the function NGeDSboost
,
inheriting the methods for class "GeDSboost"
. Methods for functions
coef
, knots
, plot
, print
, predict
,
visualize_boosting
, and bl_imp
are available.
extcall
call to the NGeDSboost
function.
formula
a formula object representing the model to be fitted.
args
a list containing the arguments passed to the NGeDSboost
function. This list includes:
response
: data.frame
containing the response variable
observations.
predictors
: data.frame
containing the observations
corresponding to the predictor variables included in the model.
base_learners
: description of model's base learners.
family
: the statistical family; the possible options are
mboost::Binomial(type = c("adaboost", "glm"),
link = c("logit", "probit", "cloglog", "cauchit", "log"), ...)
mboost::Gaussian()
mboost::Poisson()
mboost::GammaReg(nuirange = c(0, 100))
Other mboost
families may be suitable; however, these have not yet
been thoroughly tested and are therefore not recommended for use.
initial_learner
: if TRUE
a NGeDS
or
GGeDS
fit was used as the initial learner; otherwise, the
empirical risk minimizer corresponding to the selected family
was employed.
int.knots_init
: if initial_learner = TRUE
, this
corresponds to the maximum number of internal knots set in the
NGeDS
/GGeDS
function before the initial learner
fit.
shrinkage
: shrinkage/step-length/learning rate utilized
throughout the boosting iterations.
normalize_data
: if TRUE
, then response and predictors
were standardized before running the FGB algorithm.
X_mean
: mean of the predictor variables (only if
normalize_data = TRUE
, otherwise this is NULL
).
X_sd
: standard deviation of the predictors (only if
normalize_data = TRUE
, otherwise this is NULL
).
Y_mean
: mean of the response variable (only if
normalize_data = TRUE
, otherwise this is NULL
).
Y_sd
: standard deviation of the response variable (only if
normalize_data = TRUE
, otherwise this is NULL
).
models
A list containing the 'model' generated at each boosting
iteration. Each of these models
includes:
best_bl
: fit of the base learner that minimized the residual
sum of squares (RSS) in fitting the gradient at the i-th boosting
iteration.
Y_hat
: model fitted values at the i-th boosting
iteration.
base_learners
: knots and polynomial coefficients for each of the
base-learners at the i-th boosting iteration.
final_model
A list detailing the final GeDSboost model after the gradient descent algorithm is run:
model_name
: the boosting iteration corresponding to the final
model.
DEV
: deviance of the final model.
Y_hat
: fitted values.
base_learners
: a list containing, for each base-learner, the
intervals defined by the piecewise linear fit and its corresponding
polynomial coefficients. It also includes the knots corresponding to each
order fit, which result from computing the corresponding averaging knot
location. See Kaishev et al. (2016) for details. If the number of internal
knots of the final linear fit is less than $n-1$, the averaging knot location
is not computed.
Linear.Fit
/Quadratic.Fit
/Cubic.Fit
: final linear,
quadratic and cubic fits in B-spline form. These include the same elements
as Linear
, Quadratic
and Cubic
in a GeDS-class
object (see SplineReg
for details).
predictions
a list containing the predicted values obtained for each of the fits (linear, quadratic and cubic).
internal_knots
a list detailing the internal knots obtained for each of the different order fits (linear, quadratic, and cubic).
Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
Geometrically designed variable knot splines in generalized (non-)linear
models.
Applied Mathematics and Computation, 436.
DOI: doi:10.1016/j.amc.2022.127493
Dimitrova, D. S., Kaishev, V. K. and Saenz Guillen, E. L. (2025). GeDS: An R Package for Regression, Generalized Additive Models and Functional Gradient Boosting, based on Geometrically Designed (GeD) Splines. Manuscript submitted for publication.
A fitted GeDSgam object returned by the function NGeDSgam
,
inheriting the methods for class "GeDSgam"
. Methods for functions
coef
, knots
, plot
, print
and predict
are
available.
extcall
call to the NGeDSgam
function.
formula
a formula object representing the model to be fitted.
args
a list containing the arguments passed to the NGeDSgam
function. This list includes:
response
: data.frame
containing the response variable
observations.
predictors
: data.frame
containing the corresponding
observations of the predictor variables included in the model.
base_learners
: description of the model's base learners
('smooth functions').
family
: the statistical family. The possible options are
binomial(link = "logit", "probit", "cauchit", "log", "cloglog")
gaussian(link = "identity", "log", "inverse")
Gamma(link = "inverse", "identity", "log")
inverse.gaussian(link = "1/mu^2", "inverse", "identity", "log")
poisson(link = "log", "identity", "sqrt")
quasi(link = "identity", variance = "constant")
quasibinomial(link = "logit", "probit", "cloglog", "identity", "inverse", "log", "1/mu^2", "sqrt")
quasipoisson(llink = "logit", "probit", "cloglog", "identity", "inverse", "log", "1/mu^2", "sqrt")
normalize_data
: if TRUE
, then response and predictors
were standardized before running the local-scoring algorithm.
X_mean
: mean of the predictor variables (only if
normalize_data = TRUE
).
X_sd
: standard deviation of the predictors (only if
normalize_data = TRUE
, else is NULL
).
Y_mean
: mean of the response variable (only if
normalize_data = TRUE
, else is NULL
).
Y_sd
: standard deviation of the response variable (only if
normalize_data = TRUE
, else is NULL
).
final_model
A list detailing the final GeDSgam model selected after running the local scoring algorithm. The chosen model minimizes deviance across all models generated by each local-scoring iteration. This list includes:
model_name
: local-scoring iteration that yielded the "best"
model. Note that when family = "gaussian"
, it will always correspond
to iter1
, as only one local-scoring iteration is conducted in this
scenario. This occurs because, with family = "gaussian"
, the
algorithm is tantamount to directly implementing backfitting.
DEV
: the deviance for the fitted predictor model, defined as
in Dimitrova et al. (2023), which for family = "gaussian"
coincides
with the Residual Sum of Squares.
Y_hat
: fitted values.
eta
: additive predictor.
mu
: vector of means.
z
: adjusted dependent variable.
base_learners
: a list containing, for each base-learner, the
corresponding linear fit piecewise polynomial coefficients. It includes the
knots for each order fit, resulting from computing the averaging knot
location. Although if the number of internal knots of the final linear fit
is less than $n-1$, the averaging knot location is not computed.
Linear.Fit
: final model linear fit in B-spline form.
See SplineReg
for details.
Quadratic.Fit
: quadratic fit obtained via Schoenberg variation
diminishing spline approximation. See SplineReg
for details.
Cubic.Fit
: cubic fit obtained via Schoenberg variation
diminishing spline approximation. See SplineReg
for details.
predictions
A list containing the predicted values obtained for each of
the fits (linear, quadratic, and cubic). Each of the predictions contains
both the additive predictor eta
and the vector of means mu
.
internal_knots
A list detailing the internal knots obtained for the fits of different order (linear, quadratic, and cubic).
Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
Geometrically designed variable knot splines in generalized (non-)linear
models.
Applied Mathematics and Computation, 436.
DOI: doi:10.1016/j.amc.2022.127493
Dimitrova, D. S., Kaishev, V. K. and Saenz Guillen, E. L. (2025). GeDS: An R Package for Regression, Generalized Additive Models and Functional Gradient Boosting, based on Geometrically Designed (GeD) Splines. Manuscript submitted for publication.
GGeDS
constructs a Geometrically Designed (univariate or bivariate)
variable knots spline regression model for the predictor in the context of
Generalized (Non-)Linear Models. This is referred to as a GeDS model for a
response with a distribution from the Exponential Family.
GGeDS( formula, family = gaussian(), data, weights, beta, phi = 0.99, min.intknots, max.intknots, q = 2L, Xextr = NULL, Yextr = NULL, show.iters = FALSE, stoptype = "SR", higher_order = TRUE )
GGeDS( formula, family = gaussian(), data, weights, beta, phi = 0.99, min.intknots, max.intknots, q = 2L, Xextr = NULL, Yextr = NULL, show.iters = FALSE, stoptype = "SR", higher_order = TRUE )
formula |
a description of the structure of the predictor model to be
fitted, including the dependent and independent variables. See
|
family |
a description of the error distribution and link function to be
used in the model. This can be a character string naming a family function
(e.g. |
data |
an optional data frame, list or environment containing the
variables of the predictor model. If the formula variables are not found in
|
weights |
an optional vector of ‘prior weights’ to be put on the
observations during the fitting process in case the user requires weighted GeDS
fitting. It is |
beta |
numeric parameter in the interval |
phi |
numeric parameter in the interval |
min.intknots |
optional parameter allowing the user to set a minimum number of internal knots to be fit in stage A. By default equal to zero. |
max.intknots |
optional parameter allowing the user to set a maximum
number of internal knots to be added by the stage A's GeDS estimation
algorithm. By default equal to the number of knots for the saturated GeDS
model (i.e. |
q |
numeric parameter which allows to fine-tune the stopping rule of stage A of GeDS, by default equal to 2. See details below. |
Xextr |
numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the observations of the independent variable. See details. |
Yextr |
numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the observations of the second independent variable (if bivariate GeDS is run). See details. |
show.iters |
logical variable indicating whether or not to print
information of the fit at each GeDS iteration. By default equal to |
stoptype |
a character string indicating the type of GeDS stopping rule
to be used. It should be either one of |
higher_order |
a logical that defines whether to compute the higher
order fits (quadratic and cubic) after stage A is run. Default is
|
The GGeDS
function extends the GeDS methodology, developed by
Kaishev et al. (2016) and implemented in the NGeDS
function
for the Normal case, to the more general GNM (GLM) context, allowing for the
response to have any distribution from the Exponential Family. Under the
GeDS-GNM approach the (non-)linear predictor is viewed as a spline with
variable knots that are estimated along with the regression coefficients and
the order of the spline, using a two stage procedure. In stage A, a linear
variable-knot spline is fitted to the data applying iteratively re-weighted
least squares (see IRLSfit
function). In stage B, a Schoenberg
variation diminishing spline approximation to the fit from stage A is
constructed, thus simultaneously producing spline fits of order 2, 3 and 4,
all of which are included in the output, a GeDS-Class
object.
A detailed description of the underlying algorithm can be found in
Dimitrova et al. (2023).
As noted in formula
, the argument formula
allows the user to specify predictor models with two components: a spline
regression (non-parametric) component involving part of the independent
variables identified through the function f
, and an optional parametric
component involving the remaining independent variables. For GGeDS
only one or two independent variables are allowed for the spline component and
arbitrary many independent variables for the parametric component of the
predictor. Failure to specify the independent variable for the spline
regression component through the function f
will return an error.
See formula
.
Within the argument formula
, similarly as in other R functions, it is
possible to specify one or more offset variables, i.e. known terms with fixed
regression coefficients equal to 1. These terms should be identified via the
function offset
.
The parameter beta
tunes the placement of a new knot in stage A of the
algorithm. At the beginning of each GeDS iteration, a second-order spline is
fitted to the data. As follows, the 'working' residuals
(see IRLSfit
) are computed and grouped by their sign. A new knot
is then placed at a location defined by the cluster that maximizes a certain
measure. This measure is defined as a weighted linear combination of the range
of the independent variable at each cluster and the mean of the
absolute residuals within it. The parameter beta
determines the
weights in this measure correspondingly: beta
and 1 - beta
.
The higher beta
is, the more weight is put to the mean of the
residuals and the less to the range of their corresponding x-values (see
Kaishev et al., 2016, for further details).
The default values of beta
are beta = 0.5
if the response is
assumed to be Gaussian, beta = 0.2
if it is Poisson (or Quasipoisson),
while if it is Binomial, Quasibinomial or Gamma beta = 0.1
, which
reflect our experience of running GeDS for different underlying functional
dependencies.
The argument stoptype
allows to choose between three alternative
stopping rules for the knot selection in stage A of GeDS: "RD"
,
that stands for Ratio of Deviances; "SR"
, that stands for
Smoothed Ratio of deviances; and "LR"
, that stands for
Likelihood Ratio. The latter is based on the difference of deviances
rather than on their ratio as in the case of "RD"
and "SR"
.
Therefore "LR"
can be viewed as a log likelihood ratio test performed
at each iteration of the knot placement. In each of these cases the
corresponding stopping criterion is compared with a threshold value
phi
(see below).
The argument phi
provides a threshold value required for the stopping
rule to exit the knot placement in stage A of GeDS. The higher the value of
phi
, the more knots are added under the "RD"
and "SR"
stopping rules contrary to the case of the stopping rule "LR"
where
the lower phi
is, more knots are included in the spline regression.
Further details for each of the three alternative stopping rules can be found
in Dimitrova et al. (2023).
The argument q
is an input parameter that fine-tunes the stopping rule
in stage A. It specifies the number of consecutive iterations over which the
deviance must exhibit stable convergence to terminate knot placement in stage
A. Specifically, under any of the rules "RD"
, "SR"
or
"LR"
the deviance at the current iteration is compared to the deviance
computed q
iterations before, i.e. before
introducing the last q
knots.
A GeDS-Class
object, i.e. a list of items that
summarizes the main details of the fitted GeDS regression. See
GeDS-Class
for details. Some S3 methods are available in order
to make these objects tractable, such as coef
,
deviance
, knots
,
predict
and print
as well as S4 methods for lines
and
plot
.
Kaishev, V.K., Dimitrova, D.S., Haberman, S. and Verrall, R.J. (2016).
Geometrically designed, variable knot regression splines.
Computational Statistics, 31, 1079–1105.
DOI: doi:10.1007/s00180-015-0621-7
Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
Geometrically designed variable knot splines in generalized (non-)linear
models.
Applied Mathematics and Computation, 436.
DOI: doi:10.1016/j.amc.2022.127493
NGeDS
; GeDS-Class
; S3 methods such as
coef.GeDS
, deviance.GeDS
,
knots.GeDS
, print.GeDS
and
predict.GeDS
; Integrate
and Derive
;
PPolyRep
.
###################################################################### # Generate a data sample for the response variable Y and the covariate X # assuming Poisson distributed error and log link function # See section 4.1 in Dimitrova et al. (2023) set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- exp(f_1(X)) ############# ## POISSON ## ############# # Generate Poisson distributed Y according to the mean model Y <- rpois(N, means) # Fit a Poisson GeDS regression using GGeDS (Gmod <- GGeDS(Y ~ f(X), beta = 0.2, phi = 0.99, q = 2, family = poisson(), Xextr = c(-2,2))) # Plot the quadratic and cubic GeDS fits plot(X, log(Y), xlab = "x", ylab = expression(f[1](x))) lines(X, sapply(X, f_1), lwd = 2) lines(Gmod, n = 3, col = "red") lines(Gmod, n = 4, col = "blue", lty = 2) legend("topleft", legend = expression(f[1](x), "Quadratic", "Cubic"), col = c("black", "red", "blue"), lty = c(1, 1, 2), lwd = c(2, 1, 1), bty = "n") # Generate GeDS prediction at X=0, first on the response scale and then on # the predictor scale predict(Gmod, n = 3, newdata = data.frame(X = 0)) predict(Gmod, n = 3, newdata = data.frame(X = 0), type = "link") # Apply some of the other available methods, e.g. # knots, coefficients and deviance extractions for the # quadratic GeDS fit knots(Gmod) coef(Gmod) deviance(Gmod) # the same but for the cubic GeDS fit knots(Gmod, n = 4) coef(Gmod, n = 4) deviance(Gmod, n = 4) ########### ## GAMMA ## ########### # Generate Gamma distributed Y according to the mean model Y <- rgamma(N, shape = means, rate = 0.1) # Fit a Gamma GeDS regression using GGeDS Gmod <- GGeDS(Y ~ f(X), beta = 0.1, phi = 0.99, q = 2, family = Gamma(log), Xextr = c(-2,2)) plot(Gmod, f = function(x) exp(f_1(x))/0.1) ############## ## BINOMIAL ## ############## # Generate Binomial distributed Y according to the mean model eta <- f_1(X) - 4 means <- exp(eta)/(1+exp(eta)) Y <- rbinom(N, size = 50, prob = means) / 50 # Fit a Binomial GeDS regression using GGeDS Gmod <- GGeDS(Y ~ f(X), beta = 0.1, phi = 0.99, family = "quasibinomial", Xextr = c(-2,2)) plot(Gmod, f = function(x) exp(f_1(x) - 4)/(1 + exp(f_1(x) - 4))) ########################################## # A real data example # See Dimitrova et al. (2023), Section 4.2 data("coalMining") (Gmod2 <- GGeDS(formula = accidents ~ f(years), beta = 0.1, phi = 0.98, family = poisson(), data = coalMining)) (Gmod3 <- GGeDS(formula = accidents ~ f(years), beta = 0.1, phi = 0.985, family = poisson(), data = coalMining)) plot(coalMining$years, coalMining$accidents, type = "h", xlab = "Years", ylab = "Accidents") lines(Gmod2, tr = exp, n = 4, col = "red") lines(Gmod3, tr = exp, n = 4, col = "blue", lty = 2) legend("topright", c("phi = 0.98","phi = 0.985"), col = c("red", "blue"), lty=c(1, 2)) ## Not run: ########################################## # The same regression in the example of GeDS # but assuming Gamma and Poisson responses # See Dimitrova et al. (2023), Section 4.2 data('BaFe2As2') (Gmod4 <- GGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.6, phi = 0.995, q = 3, family = Gamma(log), stoptype = "RD")) plot(Gmod4) (Gmod5 <- GGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.1, phi = 0.995, q = 3, family = poisson(), stoptype = "SR")) plot(Gmod5) ## End(Not run) ########################################## # Life tables # See Dimitrova et al. (2023), Section 4.2 data(EWmortality) attach(EWmortality) (M1 <- GGeDS(formula = Deaths ~ f(Age) + offset(log(Exposure)), family = quasipoisson(), phi = 0.99, beta = 0.1, q = 3, stoptype = "LR")) Exposure_init <- Exposure + 0.5 * Deaths Rate <- Deaths / Exposure_init (M2 <- GGeDS(formula = Rate ~ f(Age), weights = Exposure_init, family = quasibinomial(), phi = 0.99, beta = 0.1, q = 3, stoptype = "LR")) op <- par(mfrow=c(2,2)) plot(Age, Deaths/Exposure, ylab = expression(mu[x]), xlab = "Age") lines(M1, n = 3, tr = exp, lwd = 1, col = "red") plot(Age, Rate, ylab = expression(q[x]), xlab = "Age") lines(M2, n = 3, tr = quasibinomial()$linkinv, lwd = 1, col = "red") plot(Age, log(Deaths/Exposure), ylab = expression(log(mu[x])), xlab = "Age") lines(M1, n = 3, lwd = 1, col = "red") plot(Age, quasibinomial()$linkfun(Rate), ylab = expression(logit(q[x])), xlab = "Age") lines(M2, n = 3, lwd = 1, col = "red") par(op) ######################################### # bivariate example set.seed(123) doublesin <- function(x) { # Adjusting the output to ensure it's positive exp(sin(2*x[,1]) + sin(2*x[,2])) } X <- round(runif(400, min = 0, max = 3), 2) Y <- round(runif(400, min = 0, max = 3), 2) # Calculate lambda for Poisson distribution lambda <- doublesin(cbind(X,Y)) # Generate Z from Poisson distribution Z <- rpois(400, lambda) data <- data.frame(X, Y, Z) # Fit a Poisson GeDS regression using GGeDS BivGeDS <- GGeDS(Z ~ f(X,Y), beta = 0.2, phi = 0.99, family = "poisson") # Poisson mean deviance w.r.t data deviance(BivGeDS, n = 2) # or sum(poisson()$dev.resids(Z, BivGeDS$Linear.Fit$Predicted, wt = 1)) deviance(BivGeDS, n = 3) deviance(BivGeDS, n = 4) # Poisson mean deviance w.r.t true function#' f_XY <- apply(cbind(X, Y), 1, function(row) doublesin(matrix(row, ncol = 2))) mean(poisson()$dev.resids(f_XY, BivGeDS$Linear.Fit$Predicted, wt = 1)) mean(poisson()$dev.resids(f_XY, BivGeDS$Quadratic.Fit$Predicted, wt = 1)) mean(poisson()$dev.resids(f_XY, BivGeDS$Cubic.Fit$Predicted, wt = 1)) # Surface plot of the generating function (doublesin) plot(BivGeDS, f = doublesin) # Surface plot of the fitted model plot(BivGeDS)
###################################################################### # Generate a data sample for the response variable Y and the covariate X # assuming Poisson distributed error and log link function # See section 4.1 in Dimitrova et al. (2023) set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- exp(f_1(X)) ############# ## POISSON ## ############# # Generate Poisson distributed Y according to the mean model Y <- rpois(N, means) # Fit a Poisson GeDS regression using GGeDS (Gmod <- GGeDS(Y ~ f(X), beta = 0.2, phi = 0.99, q = 2, family = poisson(), Xextr = c(-2,2))) # Plot the quadratic and cubic GeDS fits plot(X, log(Y), xlab = "x", ylab = expression(f[1](x))) lines(X, sapply(X, f_1), lwd = 2) lines(Gmod, n = 3, col = "red") lines(Gmod, n = 4, col = "blue", lty = 2) legend("topleft", legend = expression(f[1](x), "Quadratic", "Cubic"), col = c("black", "red", "blue"), lty = c(1, 1, 2), lwd = c(2, 1, 1), bty = "n") # Generate GeDS prediction at X=0, first on the response scale and then on # the predictor scale predict(Gmod, n = 3, newdata = data.frame(X = 0)) predict(Gmod, n = 3, newdata = data.frame(X = 0), type = "link") # Apply some of the other available methods, e.g. # knots, coefficients and deviance extractions for the # quadratic GeDS fit knots(Gmod) coef(Gmod) deviance(Gmod) # the same but for the cubic GeDS fit knots(Gmod, n = 4) coef(Gmod, n = 4) deviance(Gmod, n = 4) ########### ## GAMMA ## ########### # Generate Gamma distributed Y according to the mean model Y <- rgamma(N, shape = means, rate = 0.1) # Fit a Gamma GeDS regression using GGeDS Gmod <- GGeDS(Y ~ f(X), beta = 0.1, phi = 0.99, q = 2, family = Gamma(log), Xextr = c(-2,2)) plot(Gmod, f = function(x) exp(f_1(x))/0.1) ############## ## BINOMIAL ## ############## # Generate Binomial distributed Y according to the mean model eta <- f_1(X) - 4 means <- exp(eta)/(1+exp(eta)) Y <- rbinom(N, size = 50, prob = means) / 50 # Fit a Binomial GeDS regression using GGeDS Gmod <- GGeDS(Y ~ f(X), beta = 0.1, phi = 0.99, family = "quasibinomial", Xextr = c(-2,2)) plot(Gmod, f = function(x) exp(f_1(x) - 4)/(1 + exp(f_1(x) - 4))) ########################################## # A real data example # See Dimitrova et al. (2023), Section 4.2 data("coalMining") (Gmod2 <- GGeDS(formula = accidents ~ f(years), beta = 0.1, phi = 0.98, family = poisson(), data = coalMining)) (Gmod3 <- GGeDS(formula = accidents ~ f(years), beta = 0.1, phi = 0.985, family = poisson(), data = coalMining)) plot(coalMining$years, coalMining$accidents, type = "h", xlab = "Years", ylab = "Accidents") lines(Gmod2, tr = exp, n = 4, col = "red") lines(Gmod3, tr = exp, n = 4, col = "blue", lty = 2) legend("topright", c("phi = 0.98","phi = 0.985"), col = c("red", "blue"), lty=c(1, 2)) ## Not run: ########################################## # The same regression in the example of GeDS # but assuming Gamma and Poisson responses # See Dimitrova et al. (2023), Section 4.2 data('BaFe2As2') (Gmod4 <- GGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.6, phi = 0.995, q = 3, family = Gamma(log), stoptype = "RD")) plot(Gmod4) (Gmod5 <- GGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.1, phi = 0.995, q = 3, family = poisson(), stoptype = "SR")) plot(Gmod5) ## End(Not run) ########################################## # Life tables # See Dimitrova et al. (2023), Section 4.2 data(EWmortality) attach(EWmortality) (M1 <- GGeDS(formula = Deaths ~ f(Age) + offset(log(Exposure)), family = quasipoisson(), phi = 0.99, beta = 0.1, q = 3, stoptype = "LR")) Exposure_init <- Exposure + 0.5 * Deaths Rate <- Deaths / Exposure_init (M2 <- GGeDS(formula = Rate ~ f(Age), weights = Exposure_init, family = quasibinomial(), phi = 0.99, beta = 0.1, q = 3, stoptype = "LR")) op <- par(mfrow=c(2,2)) plot(Age, Deaths/Exposure, ylab = expression(mu[x]), xlab = "Age") lines(M1, n = 3, tr = exp, lwd = 1, col = "red") plot(Age, Rate, ylab = expression(q[x]), xlab = "Age") lines(M2, n = 3, tr = quasibinomial()$linkinv, lwd = 1, col = "red") plot(Age, log(Deaths/Exposure), ylab = expression(log(mu[x])), xlab = "Age") lines(M1, n = 3, lwd = 1, col = "red") plot(Age, quasibinomial()$linkfun(Rate), ylab = expression(logit(q[x])), xlab = "Age") lines(M2, n = 3, lwd = 1, col = "red") par(op) ######################################### # bivariate example set.seed(123) doublesin <- function(x) { # Adjusting the output to ensure it's positive exp(sin(2*x[,1]) + sin(2*x[,2])) } X <- round(runif(400, min = 0, max = 3), 2) Y <- round(runif(400, min = 0, max = 3), 2) # Calculate lambda for Poisson distribution lambda <- doublesin(cbind(X,Y)) # Generate Z from Poisson distribution Z <- rpois(400, lambda) data <- data.frame(X, Y, Z) # Fit a Poisson GeDS regression using GGeDS BivGeDS <- GGeDS(Z ~ f(X,Y), beta = 0.2, phi = 0.99, family = "poisson") # Poisson mean deviance w.r.t data deviance(BivGeDS, n = 2) # or sum(poisson()$dev.resids(Z, BivGeDS$Linear.Fit$Predicted, wt = 1)) deviance(BivGeDS, n = 3) deviance(BivGeDS, n = 4) # Poisson mean deviance w.r.t true function#' f_XY <- apply(cbind(X, Y), 1, function(row) doublesin(matrix(row, ncol = 2))) mean(poisson()$dev.resids(f_XY, BivGeDS$Linear.Fit$Predicted, wt = 1)) mean(poisson()$dev.resids(f_XY, BivGeDS$Quadratic.Fit$Predicted, wt = 1)) mean(poisson()$dev.resids(f_XY, BivGeDS$Cubic.Fit$Predicted, wt = 1)) # Surface plot of the generating function (doublesin) plot(BivGeDS, f = doublesin) # Surface plot of the fitted model plot(BivGeDS)
This function computes defined integrals of a fitted GeDS regression model.
Integrate(object = NULL, knots = NULL, coef = NULL, from, to, n = 3L)
Integrate(object = NULL, knots = NULL, coef = NULL, from, to, n = 3L)
object |
the |
knots |
a numeric vector of knots. This is required if |
coef |
a numeric vector of coefficients. This is required if |
from |
optional numeric vector containing the lower limit(s) of
integration. It should be either of size one or of the same size as the
argument |
to |
numeric vector containing the upper limit(s) of integration. |
n |
integer value (2, 3 or 4) specifying the order ( |
The function is based on the well known property (c.f. De Boor, 2001, Chapter X, formula (33)) that the integral of a linear combination of appropriately normalized B-splines is equal to the sum of its corresponding coefficients, noting that the GeDS regression is in fact such a linear combination.
Since the function is based on this property, it is designed to work only on the predictor scale in the GNM (GLM) framework.
If the argument from
is a single value, then it is taken as the lower
limit of integration for all the defined integrals required, whereas the upper
limits of integration are the values contained in the argument to
. If
the arguments from
and to
are of similar size, the integrals
(as many as the size) are computed by sequentially taking the pairs of values
in the from
and to
vectors as limits of integration.
De Boor, C. (2001). A Practical Guide to Splines (Revised Edition). Springer, New York.
# Generate a data sample for the response variable # Y and the single covariate X # see Dimitrova et al. (2023), section 4.1 set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only # a component non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Fit GeDS regression using NGeDS Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = .995, Xextr = c(-2,2)) # Compute defined integrals (in TeX style) $\int_{1}^{-1} f(x)dx$ # and $\int_{1}^{1} f(x)dx$ # $f$ being the quadratic fit Integrate(Gmod, from = 1, to = c(-1,1), n = 3) # Compute defined integrals (in TeX style) $\int_{1}^{-1} f(x)dx$ # and $\int_{-1}^{1} f(x)dx$ # $f$ being the quadratic fit Integrate(Gmod, from = c(1,-1), to = c(-1,1), n = 3) ## Not run: ## This gives an error Integrate(Gmod, from = c(1,-1), to = c(1,1), n = 3)
# Generate a data sample for the response variable # Y and the single covariate X # see Dimitrova et al. (2023), section 4.1 set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only # a component non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Fit GeDS regression using NGeDS Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = .995, Xextr = c(-2,2)) # Compute defined integrals (in TeX style) $\int_{1}^{-1} f(x)dx$ # and $\int_{1}^{1} f(x)dx$ # $f$ being the quadratic fit Integrate(Gmod, from = 1, to = c(-1,1), n = 3) # Compute defined integrals (in TeX style) $\int_{1}^{-1} f(x)dx$ # and $\int_{-1}^{1} f(x)dx$ # $f$ being the quadratic fit Integrate(Gmod, from = c(1,-1), to = c(-1,1), n = 3) ## Not run: ## This gives an error Integrate(Gmod, from = c(1,-1), to = c(1,1), n = 3)
This function is an implementation of the IRLS estimation algorithm adjusted
to the specific usage within the function SplineReg_GLM
.
IRLSfit( x, y, weights = rep(1, nobs), mustart = NULL, offset = rep(0, nobs), family = gaussian(), control = list() )
IRLSfit( x, y, weights = rep(1, nobs), mustart = NULL, offset = rep(0, nobs), family = gaussian(), control = list() )
x |
a matrix of regression functions (e.g. B-splines and/or terms of the parametric part) evaluated at the sample values of the covariate(s). |
y |
a vector of size |
weights |
an optional vector of prior weights for the observations, used when weighted IRLS fitting is required. By default, this is a vector of 1s. |
mustart |
initial values for the vector of means of the response
variable in the IRLS regression estimation. Must be a vector of length |
offset |
a vector of size |
family |
a description of the error distribution and link function to be
used in the model. This can be a character string naming a family function
(e.g. |
control |
a list of parameters for controlling the IRLS fitting process
to be passed on to |
This function is a slightly modified version of the
glm.fit
from the package stats to which we refer
for further details. The difference in the inputs of IRLSfit
and
glm.fit
is that the former admits initial values only
for the vector of means.
The output from IRLSfit
has some additional slots compared to
glm.fit
. We note that the slots weights
,
res2
and z
contain values of the IRLS weights, “working
residuals" and transformed responses computed after the last IRLS
iteration, i.e. they are based on the estimated coefficients that are
returned by IRLSfit
.
The source code of IRLSfit
contains also some commented lines that
produce useful plots at each IRLS iteration. Normally, printing these plots
is time consuming, but they could be run for inspection purposes.
A list containing:
coefficients |
a named vector containing the estimated regression coefficients; |
residuals |
the working residuals, which are the residuals from the
final iteration of the IRLS fit. Cases with zero weights are omitted, and
their working residuals are |
res2 |
the working residuals after the final IRLS iteration. They are used within the knot placement steps of stage A of GeDS; |
fitted.values |
the fitted mean values, obtained by transforming the predictor by the inverse of the link function; |
rank |
the numeric rank of the fitted linear model; |
family |
the |
linear.predictors |
the fitted predictor; |
deviance |
a vector containing the deviances obtained at each IRLS iteration; |
lastdeviance |
the deviance at the last IRLS iteration; |
null.deviance |
The deviance for the null model (see
|
iter |
the number of IRLS iterations performed; |
weights |
the working weights after the last IRLS iteration; |
prior.weights |
the “prior weights" (see the |
df.residual |
the residual degrees of freedom; |
df.null |
the residual degrees of freedom for the null model; |
y |
the vector of values of the response variable used in the fitting; |
z |
the transformed responses computed after the last IRLS iteration; |
converged |
logical. Was the IRLS algorithm judged to have converged? |
boundary |
logical. Is the fitted value on the boundary of the attainable values? |
In addition, non-empty fits will have components qr
, R
and
effects
relating to the final weighted linear fit, see
lm.fit
documentation.
Method for the generic function knots
that allows the
user to extract the vector of knots of a GeDS, GeDSboost or GeDSgam fit of a
specified order contained in a GeDS-class
,
GeDSboost-class
or GeDSgam-class
object,
respectively.
## S3 method for class 'GeDS' knots(Fn, n = 3L, options = c("all", "internal"), ...) ## S3 method for class 'GeDSboost' knots(Fn, n = 3L, options = c("all", "internal"), ...) ## S3 method for class 'GeDSgam' knots(Fn, n = 3L, options = c("all", "internal"), ...)
## S3 method for class 'GeDS' knots(Fn, n = 3L, options = c("all", "internal"), ...) ## S3 method for class 'GeDSboost' knots(Fn, n = 3L, options = c("all", "internal"), ...) ## S3 method for class 'GeDSgam' knots(Fn, n = 3L, options = c("all", "internal"), ...)
Fn |
the |
n |
integer value (2, 3 or 4) specifying the order ( |
options |
a character string specifying whether " |
... |
potentially further arguments (required for compatibility with the definition of the generic function). Currently ignored, but with a warning. |
This is a method for the function knots
in the
stats package.
As GeDS-class
, GeDSboost-class
and
GeDSgam-class
objects contain three different fits (linear,
quadratic and cubic), it is possible to specify the order of the GeDS fit
whose knots are required via the input argument n
.
A vector in which each element represents a knot of the GeDS/FGB-GeDS/GAM-GeDS fit of the required order.
knots
for the definition of the generic function; NGeDS
, GGeDS
,
NGeDSboost
and NGeDSgam
for examples.
Lines method for GeDS objects. Adds a GeDS curve to an existing plot.
## S4 method for signature 'GeDS' lines( x, n = 3L, transform = function(x) x, onlySpline = TRUE, data = data.frame(), ... )
## S4 method for signature 'GeDS' lines( x, n = 3L, transform = function(x) x, onlySpline = TRUE, data = data.frame(), ... )
x |
a |
n |
integer value (2, 3 or 4) specifying the order ( |
transform |
a function that can be used to transform the scale of the Y axis. Typically it can be the inverse of the link function if the plot is on the scale of the response variable. |
onlySpline |
logical variable specifying whether only the spline
component of the fitted GeDS predictor model should be plotted or
alternatively also the parametric component (see
|
data |
an optional |
... |
further arguments to be passed to the default
|
This method can be used to add a curve corresponding to a particular GeDS fit to an active plot.
As GeDS objects contain three different fits (linear, quadratic and cubic),
it is possible to specify the order of the GeDS regression to be plotted via
the input argument n
.
lines
for the definition of the generic
function; NGeDS
and GGeDS
for examples.
# Generate a data sample for the response variable # Y and the single covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Fit a GeDS regression model using NGeDS (Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2))) # Plot the GeDS third order fit (the quadratic one) # without its corresponding Polygon plot(Gmod, type = "none") # Add a curve corresponding to the second order fit (the linear one) lines(Gmod, n = 2, col = "green", lwd = 2, lty = 3)
# Generate a data sample for the response variable # Y and the single covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Fit a GeDS regression model using NGeDS (Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2))) # Plot the GeDS third order fit (the quadratic one) # without its corresponding Polygon plot(Gmod, type = "none") # Add a curve corresponding to the second order fit (the linear one) lines(Gmod, n = 2, col = "green", lwd = 2, lty = 3)
NGeDS
constructs a Geometrically Designed variable knots spline
regression model referred to as a GeDS model, for a response having a Normal
distribution.
NGeDS( formula, data, weights, beta = 0.5, phi = 0.99, min.intknots = 0, max.intknots = 500, q = 2, Xextr = NULL, Yextr = NULL, show.iters = FALSE, stoptype = "RD", higher_order = TRUE, intknots_init = NULL, fit_init = NULL, only_pred = FALSE )
NGeDS( formula, data, weights, beta = 0.5, phi = 0.99, min.intknots = 0, max.intknots = 500, q = 2, Xextr = NULL, Yextr = NULL, show.iters = FALSE, stoptype = "RD", higher_order = TRUE, intknots_init = NULL, fit_init = NULL, only_pred = FALSE )
formula |
a description of the structure of the model to be fitted,
including the dependent and independent variables. See
|
data |
an optional data frame, list or environment containing the
variables of the model. If not found in |
weights |
an optional vector of ‘prior weights’ to be put on the
observations in the fitting process in case the user requires weighted GeDS
fitting. It should be |
beta |
numeric parameter in the interval |
phi |
numeric parameter in the interval |
min.intknots |
optional parameter allowing the user to set a minimum number of internal knots required. By default equal to zero. |
max.intknots |
optional parameter allowing the user to set a maximum number of internal knots to be added by the GeDS estimation algorithm. By default equal to the number of knots for the saturated GeDS model. |
q |
numeric parameter which allows to fine-tune the stopping rule of stage A of GeDS, by default equal to 2. See details. |
Xextr |
numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the observations of the first independent variable. See details. |
Yextr |
numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the observations of the second independent variable (if the bivariate GeDS is run). See details. |
show.iters |
logical variable indicating whether or not to print information at each step. |
stoptype |
a character string indicating the type of GeDS stopping rule
to be used. It should be either one of |
higher_order |
a logical that defines whether to compute the higher
order fits (quadratic and cubic) after stage A is run. Default is
|
intknots_init |
vector of starting internal knots. Default is |
fit_init |
A list containing fitted values |
only_pred |
logical, if |
The NGeDS
function implements the GeDS methodology, recently
developed by Kaishev et al. (2016) and extended in the GGeDS
function for the more general GNM, (GLM) context, allowing for the response
to have any distribution from the Exponential Family. Under the GeDS approach
the (non-)linear predictor is viewed as a spline with variable knots which
are estimated along with the regression coefficients and the order of the
spline, using a two stage algorithm. In stage A, a linear variable-knot
spline is fitted to the data applying iteratively least squares regression
(see lm
function). In stage B, a Schoenberg variation
diminishing spline approximation to the fit from stage A is constructed, thus
simultaneously producing spline fits of order 2, 3 and 4, all of which are
included in the output, a GeDS-Class
object.
As noted in formula
, the argument formula
allows the user to specify models with two components, a spline regression
(non-parametric) component involving part of the independent variables
identified through the function f
and an optional parametric
component involving the remaining independent variables. For NGeDS
one
or two independent variables are allowed for the spline component and
arbitrary many independent variables for the parametric component. Failure to
specify the independent variable for the spline regression component through
the function f
will return an error. See
formula
.
Within the argument formula
, similarly as in other R functions, it is
possible to specify one or more offset variables, i.e. known terms with fixed
regression coefficients equal to 1. These terms should be identified via the
function offset
.
The parameter beta
tunes the placement of a new knot in stage A of the
algorithm. Once a current second-order spline is fitted to the data the
regression residuals are computed and grouped by their sign. A new knot is
placed at a location defined by the group for which a certain measure
attains its maximum. The latter measure is defined as a weighted linear
combination of the range of each group and the mean of the absolute
residuals within it. The parameter beta
determines the weights in this
measure correspondingly as beta
and 1 - beta
. The higher it
is, the more weight is put to the mean of the residuals and the less to the
range of their corresponding x-values. The default value of beta
is
0.5
.
The argument stoptype
allows to choose between three alternative
stopping rules for the knot selection in stage A of GeDS, the "RD"
,
that stands for Ratio of Deviances, the "SR"
, that stands for
Smoothed Ratio of deviances and the "LR"
, that stands for
Likelihood Ratio. The latter is based on the difference of deviances
rather than on their ratio as in the case of "RD"
and "SR"
.
Therefore "LR"
can be viewed as a log likelihood ratio test performed
at each iteration of the knot placement. In each of these cases the
corresponding stopping criterion is compared with a threshold value
phi
(see below).
The argument phi
provides a threshold value required for the stopping
rule to exit the knot placement in stage A of GeDS. The higher the value of
phi
, the more knots are added under the "RD"
and "SR"
stopping rules contrary to the case of the stopping rule "LR"
where
the lower phi
is, more knots are included in the spline regression.
Further details for each of the three alternative stopping rules can be found
in Dimitrova et al. (2023).
The argument q
is an input parameter that allows to fine-tune the
stopping rule in stage A. It identifies the number of consecutive iterations
over which the deviance should exhibit stable convergence so as the knot
placement in stage A is terminated. More precisely, under any of the rules
"RD"
, "SR"
, or "LR"
, the deviance at the current
iteration is compared to the deviance computed q
iterations before,
i.e., before selecting the last q
knots. Setting a higher q
will lead to more knots being added before exiting stage A of GeDS.
GeDS-Class
object, i.e. a list of items that summarizes
the main details of the fitted GeDS regression. See GeDS-Class
for details. Some S3 methods are available in order to make these objects
tractable, such as coef
,
deviance
, knots
,
predict
and print
as
well as S4 methods for lines
and
plot
.
Kaishev, V.K., Dimitrova, D.S., Haberman, S. and Verrall, R.J. (2016).
Geometrically designed, variable knot regression splines.
Computational Statistics, 31, 1079–1105.
DOI: doi:10.1007/s00180-015-0621-7
Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
Geometrically designed variable knot splines in generalized (non-)linear
models.
Applied Mathematics and Computation, 436.
DOI: doi:10.1016/j.amc.2022.127493
GGeDS; GeDS-Class; S3 methods such as coef.GeDS, deviance.GeDS, knots.GeDS, print.GeDS and predict.GeDS; Integrate and Derive; PPolyRep.
################################################### # Generate a data sample for the response variable # Y and the single covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Fit a Normal GeDS regression using NGeDS (Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2))) # Apply some of the available methods, e.g. # coefficients, knots and deviance extractions for the # quadratic GeDS fit # Note that the first call to the function knots returns # also the left and right limits of the interval containing # the data coef(Gmod, n = 3) knots(Gmod, n = 3) knots(Gmod, n = 3, options = "internal") deviance(Gmod, n = 3) # Add a covariate, Z, that enters linearly Z <- runif(N) Y2 <- Y + 2*Z + 1 # Re-fit the data using NGeDS (Gmod2 <- NGeDS(Y2 ~ f(X) + Z, beta = 0.6, phi = 0.995, Xextr = c(-2,2))) coef(Gmod2, n = 3) coef(Gmod2, onlySpline = FALSE, n = 3) ## Not run: ########################################## # Real data example # See Kaishev et al. (2016), section 4.2 data('BaFe2As2') (Gmod2 <- NGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.6, phi = 0.99, q = 3)) plot(Gmod2) ## End(Not run) ######################################### # bivariate example # See Dimitrova et al. (2023), section 5 # Generate a data sample for the response variable # Z and the covariates X and Y assuming Normal noise set.seed(123) doublesin <- function(x){ sin(2*x[,1])*sin(2*x[,2]) } X <- (round(runif(400, min = 0, max = 3),2)) Y <- (round(runif(400, min = 0, max = 3),2)) Z <- doublesin(cbind(X,Y)) Z <- Z+rnorm(400, 0, sd = 0.1) # Fit a two dimensional GeDS model using NGeDS (BivGeDS <- NGeDS(Z ~ f(X, Y), phi = 0.9)) # Extract quadratic coefficients/knots/deviance coef(BivGeDS, n = 3) knots(BivGeDS, n = 3) deviance(BivGeDS, n = 3) # Surface plot of the generating function (doublesin) plot(BivGeDS, f = doublesin) # Surface plot of the fitted model plot(BivGeDS)
################################################### # Generate a data sample for the response variable # Y and the single covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Fit a Normal GeDS regression using NGeDS (Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2))) # Apply some of the available methods, e.g. # coefficients, knots and deviance extractions for the # quadratic GeDS fit # Note that the first call to the function knots returns # also the left and right limits of the interval containing # the data coef(Gmod, n = 3) knots(Gmod, n = 3) knots(Gmod, n = 3, options = "internal") deviance(Gmod, n = 3) # Add a covariate, Z, that enters linearly Z <- runif(N) Y2 <- Y + 2*Z + 1 # Re-fit the data using NGeDS (Gmod2 <- NGeDS(Y2 ~ f(X) + Z, beta = 0.6, phi = 0.995, Xextr = c(-2,2))) coef(Gmod2, n = 3) coef(Gmod2, onlySpline = FALSE, n = 3) ## Not run: ########################################## # Real data example # See Kaishev et al. (2016), section 4.2 data('BaFe2As2') (Gmod2 <- NGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.6, phi = 0.99, q = 3)) plot(Gmod2) ## End(Not run) ######################################### # bivariate example # See Dimitrova et al. (2023), section 5 # Generate a data sample for the response variable # Z and the covariates X and Y assuming Normal noise set.seed(123) doublesin <- function(x){ sin(2*x[,1])*sin(2*x[,2]) } X <- (round(runif(400, min = 0, max = 3),2)) Y <- (round(runif(400, min = 0, max = 3),2)) Z <- doublesin(cbind(X,Y)) Z <- Z+rnorm(400, 0, sd = 0.1) # Fit a two dimensional GeDS model using NGeDS (BivGeDS <- NGeDS(Z ~ f(X, Y), phi = 0.9)) # Extract quadratic coefficients/knots/deviance coef(BivGeDS, n = 3) knots(BivGeDS, n = 3) deviance(BivGeDS, n = 3) # Surface plot of the generating function (doublesin) plot(BivGeDS, f = doublesin) # Surface plot of the fitted model plot(BivGeDS)
NGeDSboost
implements component-wise gradient boosting (Bühlmann and Yu
(2003), Bühlmann and Hothorn (2007)) using normal GeD splines (i.e., fitted
with NGeDS
function) as base-learners (see Dimitrova et al. (2025)).
NGeDSboost( formula, data, weights = NULL, normalize_data = FALSE, family = mboost::Gaussian(), link = NULL, initial_learner = TRUE, int.knots_init = 2L, min_iterations, max_iterations, shrinkage = 1, phi_boost_exit = 0.99, q_boost = 2L, beta = 0.5, phi = 0.99, int.knots_boost = 500L, q = 2L, higher_order = TRUE, boosting_with_memory = FALSE )
NGeDSboost( formula, data, weights = NULL, normalize_data = FALSE, family = mboost::Gaussian(), link = NULL, initial_learner = TRUE, int.knots_init = 2L, min_iterations, max_iterations, shrinkage = 1, phi_boost_exit = 0.99, q_boost = 2L, beta = 0.5, phi = 0.99, int.knots_boost = 500L, q = 2L, higher_order = TRUE, boosting_with_memory = FALSE )
formula |
a description of the structure of the model to be fitted,
including the dependent and independent variables. Unlike |
data |
a data frame containing the variables referenced in the formula. |
weights |
an optional vector of ‘prior weights’ to be put on the
observations during the fitting process. It should be |
normalize_data |
a logical that defines whether the data should be
normalized (standardized) before fitting the baseline linear model, i.e.,
before running the FGB algorithm. Normalizing the data involves scaling the
predictor variables to have a mean of 0 and a standard deviation of 1. Note
that this process alters the scale and interpretation of the knots and
coefficients estimated. Default is equal to |
family |
determines the loss function to be optimized by the boosting
algorithm. In case |
link |
in case the |
initial_learner |
a logical value. If set to |
int.knots_init |
optional parameter allowing the user to set a
maximum number of internal knots to be added by the initial GeDS learner in
case |
min_iterations |
optional parameter to manually set a minimum number of boosting iterations to be run. If not specified, it defaults to 0L. |
max_iterations |
optional parameter to manually set the maximum number
of boosting iterations to be run. If not specified, it defaults to 100L.
This setting serves as a fallback when the stopping rule, based on
consecutive deviances and tuned by |
shrinkage |
numeric parameter in the interval |
phi_boost_exit |
numeric parameter in the interval |
q_boost |
numeric parameter which allows to fine-tune the boosting
iterations stopping rule, by default equal to |
beta |
numeric parameter in the interval |
phi |
numeric parameter in the interval |
int.knots_boost |
The maximum number of internal knots that can be added
by the GeDS base-learners in each boosting iteration, effectively setting the
value of |
q |
numeric parameter which allows to fine-tune the stopping rule of
stage A of GeDS, by default equal to |
higher_order |
a logical that defines whether to compute the higher
order fits (quadratic and cubic) after the FGB algorithm is run. Default is
|
boosting_with_memory |
logical value. If |
The NGeDSboost
function implements functional gradient boosting
algorithm for some pre-defined loss function, using linear GeD splines as
base learners. At each boosting iteration, the negative gradient vector is
fitted through the base procedure encapsulated within the NGeDS
function. The latter constructs a Geometrically Designed variable knots
spline regression model for a response having a Normal distribution. The FGB
algorithm yields a final linear fit. Higher order fits (quadratic and cubic)
are then computed by calculating the Schoenberg’s variation diminishing
spline (VDS) approximation of the linear fit.
On the one hand, NGeDSboost
includes all the parameters of
NGeDS
, which in this case tune the base-learner fit at each
boosting iteration. On the other hand, NGeDSboost
includes some
additional parameters proper to the FGB procedure. We describe the main ones
as follows.
First, family
allows to specify the loss function and corresponding
risk function to be optimized by the boosting algorithm. If
initial_learner = FALSE
, the initial learner employed will be the
empirical risk minimizer corresponding to the family chosen. If
initial_learner = TRUE
then the initial learner will be an
NGeDS
fit with maximum number of internal knots equal to
int.knots_init
.
shrinkage
tunes the step length/shrinkage parameter which helps to
control the learning rate of the model. In other words, when a new base
learner is added to the ensemble, its contribution to the final prediction is
multiplied by the shrinkage parameter. The smaller shrinkage
is, the
slower/more gradual the learning process will be, and viceversa.
The number of boosting iterations is controlled by a
Ratio of Deviances stopping rule similar to the one presented for
GGeDS
. In the same way phi
and q
tune the
stopping rule of GGeDS
, phi_boost_exit
and
q_boost
tune the stopping rule of NGeDSboost
. The user can also
manually control the number of boosting iterations through
min_iterations
and max_iterations
.
GeDSboost-Class
object, i.e. a list of items that
summarizes the main details of the fitted FGB-GeDS model. See
GeDSboost-Class
for details. Some S3 methods are available in
order to make these objects tractable, such as
coef
, knots
,
print
and
predict
. Also variable importance measures
(bl_imp
) and improved plotting facilities
(visualize_boosting
).
Friedman, J.H. (2001).
Greedy function approximation: A gradient boosting machine.
The Annals of Statistics, 29 (5), 1189–1232.
DOI: doi:10.1214/aos/1013203451
Bühlmann P., Yu B. (2003). Boosting With the L2 Loss. Journal of the American Statistical Association, 98(462), 324–339. doi:10.1198/016214503000125
Bühlmann P., Hothorn T. (2007).
Boosting Algorithms: Regularization, Prediction and Model Fitting.
Statistical Science, 22(4), 477 – 505.
DOI: doi:10.1214/07-STS242
Kaishev, V.K., Dimitrova, D.S., Haberman, S. and Verrall, R.J. (2016).
Geometrically designed, variable knot regression splines.
Computational Statistics, 31, 1079–1105.
DOI: doi:10.1007/s00180-015-0621-7
Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
Geometrically designed variable knot splines in generalized (non-)linear
models.
Applied Mathematics and Computation, 436.
DOI: doi:10.1016/j.amc.2022.127493
Dimitrova, D. S., Kaishev, V. K. and Saenz Guillen, E. L. (2025). GeDS: An R Package for Regression, Generalized Additive Models and Functional Gradient Boosting, based on Geometrically Designed (GeD) Splines. Manuscript submitted for publication.
NGeDS
; GGeDS
; GeDSboost-Class
;
S3 methods such as knots.GeDSboost
; coef.GeDSboost
;
deviance.GeDSboost
; predict.GeDSboost
################################# Example 1 ################################# # Generate a data sample for the response variable # Y and the single covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.2) data = data.frame(X, Y) # Fit a Normal FGB-GeDS regression using NGeDSboost Gmodboost <- NGeDSboost(Y ~ f(X), data = data) MSE_Gmodboost_linear <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_linear)^2) MSE_Gmodboost_quadratic <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_quadratic)^2) MSE_Gmodboost_cubic <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_cubic)^2) cat("\n", "MEAN SQUARED ERROR", "\n", "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n", "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n", "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n") # Compute predictions on new randomly generated data X <- sort(runif(100, min = -2, max = 2)) pred_linear <- predict(Gmodboost, newdata = data.frame(X), n = 2L) pred_quadratic <- predict(Gmodboost, newdata = data.frame(X), n = 3L) pred_cubic <- predict(Gmodboost, newdata = data.frame(X), n = 4L) MSE_Gmodboost_linear <- mean((sapply(X, f_1) - pred_linear)^2) MSE_Gmodboost_quadratic <- mean((sapply(X, f_1) - pred_quadratic)^2) MSE_Gmodboost_cubic <- mean((sapply(X, f_1) - pred_cubic)^2) cat("\n", "MEAN SQUARED ERROR", "\n", "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n", "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n", "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n") ## S3 methods for class 'GeDSboost' # Print print(Gmodboost) # Knots knots(Gmodboost, n = 2L) knots(Gmodboost, n = 3L) knots(Gmodboost, n = 4L) # Coefficients coef(Gmodboost, n = 2L) coef(Gmodboost, n = 3L) coef(Gmodboost, n = 4L) # Deviances deviance(Gmodboost, n = 2L) deviance(Gmodboost, n = 3L) deviance(Gmodboost, n = 4L) ############################ Example 2 - Bodyfat ############################ library(TH.data) data("bodyfat", package = "TH.data") Gmodboost <- NGeDSboost(formula = DEXfat ~ age + f(hipcirc, waistcirc) + f(kneebreadth), data = bodyfat, phi_boost_exit = 0.9, q_boost = 1, phi = 0.9, q = 1) MSE_Gmodboost_linear <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_linear)^2) MSE_Gmodboost_quadratic <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_quadratic)^2) MSE_Gmodboost_cubic <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_cubic)^2) # Comparison cat("\n", "MSE", "\n", "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n", "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n", "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")
################################# Example 1 ################################# # Generate a data sample for the response variable # Y and the single covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.2) data = data.frame(X, Y) # Fit a Normal FGB-GeDS regression using NGeDSboost Gmodboost <- NGeDSboost(Y ~ f(X), data = data) MSE_Gmodboost_linear <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_linear)^2) MSE_Gmodboost_quadratic <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_quadratic)^2) MSE_Gmodboost_cubic <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_cubic)^2) cat("\n", "MEAN SQUARED ERROR", "\n", "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n", "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n", "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n") # Compute predictions on new randomly generated data X <- sort(runif(100, min = -2, max = 2)) pred_linear <- predict(Gmodboost, newdata = data.frame(X), n = 2L) pred_quadratic <- predict(Gmodboost, newdata = data.frame(X), n = 3L) pred_cubic <- predict(Gmodboost, newdata = data.frame(X), n = 4L) MSE_Gmodboost_linear <- mean((sapply(X, f_1) - pred_linear)^2) MSE_Gmodboost_quadratic <- mean((sapply(X, f_1) - pred_quadratic)^2) MSE_Gmodboost_cubic <- mean((sapply(X, f_1) - pred_cubic)^2) cat("\n", "MEAN SQUARED ERROR", "\n", "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n", "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n", "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n") ## S3 methods for class 'GeDSboost' # Print print(Gmodboost) # Knots knots(Gmodboost, n = 2L) knots(Gmodboost, n = 3L) knots(Gmodboost, n = 4L) # Coefficients coef(Gmodboost, n = 2L) coef(Gmodboost, n = 3L) coef(Gmodboost, n = 4L) # Deviances deviance(Gmodboost, n = 2L) deviance(Gmodboost, n = 3L) deviance(Gmodboost, n = 4L) ############################ Example 2 - Bodyfat ############################ library(TH.data) data("bodyfat", package = "TH.data") Gmodboost <- NGeDSboost(formula = DEXfat ~ age + f(hipcirc, waistcirc) + f(kneebreadth), data = bodyfat, phi_boost_exit = 0.9, q_boost = 1, phi = 0.9, q = 1) MSE_Gmodboost_linear <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_linear)^2) MSE_Gmodboost_quadratic <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_quadratic)^2) MSE_Gmodboost_cubic <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_cubic)^2) # Comparison cat("\n", "MSE", "\n", "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n", "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n", "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")
Implements the Local Scoring Algorithm (Hastie and Tibshirani
(1986)), applying normal linear GeD splines (i.e., NGeDS
function) to fit the targets within each backfitting iteration. Higher order
fits are computed by pursuing stage B of GeDS after the local-scoring algorithm
is run.
NGeDSgam( formula, family = "gaussian", data, weights = NULL, offset = NULL, normalize_data = FALSE, min_iterations, max_iterations, phi_gam_exit = 0.99, q_gam = 2, beta = 0.5, phi = 0.99, internal_knots = 500, q = 2, higher_order = TRUE )
NGeDSgam( formula, family = "gaussian", data, weights = NULL, offset = NULL, normalize_data = FALSE, min_iterations, max_iterations, phi_gam_exit = 0.99, q_gam = 2, beta = 0.5, phi = 0.99, internal_knots = 500, q = 2, higher_order = TRUE )
formula |
a description of the model structure to be fitted,
specifying both the dependent and independent variables. Unlike |
family |
a character string indicating the response variable distribution
and link function to be used. Default is |
data |
a data frame containing the variables referenced in the formula. |
weights |
an optional vector of ‘prior weights’ to be put on the
observations during the fitting process. It should be |
offset |
a vector of size |
normalize_data |
a logical that defines whether the data should be
normalized (standardized) before fitting the baseline linear model, i.e.,
before running the local-scoring algorithm. Normalizing the data involves
scaling the predictor variables to have a mean of 0 and a standard deviation
of 1. This process alters the scale and interpretation of the knots and
coefficients estimated. Default is equal to |
min_iterations |
optional parameter to manually set a minimum number of boosting iterations to be run. If not specified, it defaults to 0L. |
max_iterations |
optional parameter to manually set the maximum number
of boosting iterations to be run. If not specified, it defaults to 100L.
This setting serves as a fallback when the stopping rule, based on
consecutive deviances and tuned by |
phi_gam_exit |
Convergence threshold for local-scoring and backfitting.
Both algorithms stop when the relative change in the deviance is below this
threshold. Default is |
q_gam |
numeric parameter which allows to fine-tune the stopping rule of
the local-scoring and backfitting iterations. By default equal to |
beta |
numeric parameter in the interval |
phi |
numeric parameter in the interval |
internal_knots |
The maximum number of internal knots that can be added
by the GeDS base-learners in each boosting iteration, effectively setting the
value of |
q |
numeric parameter which allows to fine-tune the stopping rule of
stage A of GeDS, for each of the GeD spline components of the model. By
default equal to |
higher_order |
a logical that defines whether to compute the higher order
fits (quadratic and cubic) after the local-scoring algorithm is run. Default
is |
The NGeDSgam
function employs the local scoring algorithm to fit a
Generalized Additive Model (GAM). This algorithm iteratively fits weighted
additive models by backfitting. Normal linear GeD splines, as well as linear
learners, are supported as function smoothers within the backfitting
algorithm. The local-scoring algorithm ultimately produces a linear fit.
Higher order fits (quadratic and cubic) are then computed by calculating the
Schoenberg’s variation diminishing spline (VDS) approximation of the linear
fit.
On the one hand, NGeDSgam
includes all the parameters of
NGeDS
, which in this case tune the function smoother fit at each
backfitting iteration. On the other hand, NGeDSgam
includes some
additional parameters proper to the local-scoring procedure. We describe
the main ones as follows.
The family
chosen determines the link function, adjusted dependent
variable and weights to be used in the local-scoring algorithm. The number of
local-scoring and backfitting iterations is controlled by a
Ratio of Deviances stopping rule similar to the one presented for
GGeDS
. In the same way phi
and q
tune the stopping
rule of GGeDS
, phi_boost_exit
and q_boost
tune the
stopping rule of NGeDSgam
. The user can also manually control the number
of local-scoring iterations through min_iterations
and
max_iterations
.
GeDSgam-Class
object, i.e. a list of items that
summarizes the main details of the fitted GAM-GeDS model. See
GeDSgam-Class
for details. Some S3 methods are available in
order to make these objects tractable, such as
coef
, knots
,
print
and predict
.
Hastie, T. and Tibshirani, R. (1986). Generalized Additive Models.
Statistical Science 1 (3) 297 - 310.
DOI: doi:10.1214/ss/1177013604
Kaishev, V.K., Dimitrova, D.S., Haberman, S. and Verrall, R.J. (2016).
Geometrically designed, variable knot regression splines.
Computational Statistics, 31, 1079–1105.
DOI: doi:10.1007/s00180-015-0621-7
Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
Geometrically designed variable knot splines in generalized (non-)linear
models.
Applied Mathematics and Computation, 436.
DOI: doi:10.1016/j.amc.2022.127493
Dimitrova, D. S., Kaishev, V. K. and Saenz Guillen, E. L. (2025). GeDS: An R Package for Regression, Generalized Additive Models and Functional Gradient Boosting, based on Geometrically Designed (GeD) Splines. Manuscript submitted for publication.
NGeDS
; GGeDS
; GeDSgam-Class
;
S3 methods such as knots.GeDSgam
; coef.GeDSgam
;
deviance.GeDSgam
; predict.GeDSgam
# Load package library(GeDS) data(airquality) data = na.omit(airquality) data$Ozone <- data$Ozone^(1/3) formula = Ozone ~ f(Solar.R) + f(Wind, Temp) Gmodgam <- NGeDSgam(formula = formula, data = data, phi = 0.8) MSE_Gmodgam_linear <- mean((data$Ozone - Gmodgam$predictions$pred_linear)^2) MSE_Gmodgam_quadratic <- mean((data$Ozone - Gmodgam$predictions$pred_quadratic)^2) MSE_Gmodgam_cubic <- mean((data$Ozone - Gmodgam$predictions$pred_cubic)^2) cat("\n", "MEAN SQUARED ERROR", "\n", "Linear NGeDSgam:", MSE_Gmodgam_linear, "\n", "Quadratic NGeDSgam:", MSE_Gmodgam_quadratic, "\n", "Cubic NGeDSgam:", MSE_Gmodgam_cubic, "\n") ## S3 methods for class 'GeDSboost' # Print print(Gmodgam) # Knots knots(Gmodgam, n = 2L) knots(Gmodgam, n = 3L) knots(Gmodgam, n = 4L) # Coefficients coef(Gmodgam, n = 2L) coef(Gmodgam, n = 3L) coef(Gmodgam, n = 4L) # Deviances deviance(Gmodgam, n = 2L) deviance(Gmodgam, n = 3L) deviance(Gmodgam, n = 4L)
# Load package library(GeDS) data(airquality) data = na.omit(airquality) data$Ozone <- data$Ozone^(1/3) formula = Ozone ~ f(Solar.R) + f(Wind, Temp) Gmodgam <- NGeDSgam(formula = formula, data = data, phi = 0.8) MSE_Gmodgam_linear <- mean((data$Ozone - Gmodgam$predictions$pred_linear)^2) MSE_Gmodgam_quadratic <- mean((data$Ozone - Gmodgam$predictions$pred_quadratic)^2) MSE_Gmodgam_cubic <- mean((data$Ozone - Gmodgam$predictions$pred_cubic)^2) cat("\n", "MEAN SQUARED ERROR", "\n", "Linear NGeDSgam:", MSE_Gmodgam_linear, "\n", "Quadratic NGeDSgam:", MSE_Gmodgam_quadratic, "\n", "Cubic NGeDSgam:", MSE_Gmodgam_cubic, "\n") ## S3 methods for class 'GeDSboost' # Print print(Gmodgam) # Knots knots(Gmodgam, n = 2L) knots(Gmodgam, n = 3L) knots(Gmodgam, n = 4L) # Coefficients coef(Gmodgam, n = 2L) coef(Gmodgam, n = 3L) coef(Gmodgam, n = 4L) # Deviances deviance(Gmodgam, n = 2L) deviance(Gmodgam, n = 3L) deviance(Gmodgam, n = 4L)
Plot method for GeDS objects. Plots GeDS fits.
## S4 method for signature 'GeDS,ANY' plot( x, f = NULL, which, DEV = FALSE, ask = FALSE, main, legend.pos = "topright", legend.text = NULL, new.window = FALSE, wait = 0.5, n = 3L, type = c("none", "Polygon", "NCI", "ACI"), ... )
## S4 method for signature 'GeDS,ANY' plot( x, f = NULL, which, DEV = FALSE, ask = FALSE, main, legend.pos = "topright", legend.text = NULL, new.window = FALSE, wait = 0.5, n = 3L, type = c("none", "Polygon", "NCI", "ACI"), ... )
x |
a |
f |
(optional) specifies the underlying function or generating process to which the model was fit. This parameter is useful if the user wishes to plot the specified function/process alongside the model fit and the data |
which |
a numeric vector specifying the iterations of stage A for which
the corresponding GeDS fits should be plotted.
It has to be a subset of |
DEV |
logical variable specifying whether a plot representing the deviance at each iteration of stage A should be produced or not. |
ask |
logical variable specifying whether the user should be prompted before changing the plot page. |
main |
an optional character string used as the plot title. If set to '"detail"', the knots vector will be displayed on the plot. |
legend.pos |
the position of the legend within the panel. See legend for details. |
legend.text |
a character vector specifying the legend text. |
new.window |
logical variable specifying whether the plot should be shown in a new window or in the active one. |
wait |
time, in seconds, the system should wait before plotting a new
page. Ignored if |
n |
integer value (2, 3 or 4) specifying the order ( |
type |
character string specifying the type of plot required. Should be
set either to |
... |
further arguments to be passed to the
|
This method is provided in order to allow the user to plot the GeDS fits
contained in the GeDS-Class
objects.
Since in Stage A of the GeDS algorithm the knots of a linear spline fit are
sequentially located, one at a time, the user may wish to visually inspect
this process using the argument which
. The latter specifies a
particular iteration number (or a vector of such numbers) for which the
corresponding linear fit(s) should be plotted. The ask
and wait
arguments can help the user to manage these pages.
By means of ask
the user can determine for how long each page should
appear on the screen. Pages are sequentially replaced by pressing the enter
button.
Note that, in order to ensure stability, if the object was produced by the
function GGeDS
, plotting intermediate fits of stage A is
allowed only if n = 2
, in contrast to objects produced by
NGeDS
for which plotting intermediate results is allowed also
for n =
2 or 3 results.
The confidence intervals obtained by setting type = "NCI"
are
approximate local bands obtained considering the knots as fixed constants.
Hence the columns of the design matrix are seen as covariates and standard
methodology relying on the se.fit
option of predict.lm
or
predict.glm
is applied.
Setting type = "ACI"
, asymptotic confidence intervals are plotted.
This option is applicable only if the canonical link function has been used
in the fitting procedure.
################################################### # Generate a data sample for the response variable # Y and the single covariate X, assuming Normal noise set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Fit a Normal GeDS regression using NGeDS (Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2))) # Plot the final quadratic GeDS fit (red solid line) # with its control polygon (blue dashed line) plot(Gmod) # Plot the quadratic fit obtained from the linear fit at the 10th # iteration of stage A i.e. after 9 internal knots have been inserted # by the GeDS procedure plot(Gmod, which=10) # Generate plots of all the intermediate fits obtained # by running the GeDS procedure ## Not run: plot(Gmod, which=1:16) ## End(Not run) ################################################### # Generate a data sample for the response variable Y and the covariate # X assuming Poisson distributed error and a log link function set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N ,min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- exp(f_1(X)) # Generate Poisson distributed Y according to the mean model Y <- rpois(N,means) # Fit a Poisson GeDS regression model using GGeDS (Gmod2 <- GGeDS(Y ~ f(X), beta = 0.2, phi = 0.995, family = poisson(), Xextr = c(-2,2))) # similar plots as before, but for the linear fit plot(Gmod2, n = 2) plot(Gmod2, which = 10, n = 2) ## Not run: plot(Gmod2, which = 1:16, n = 2) plot(Gmod2, which = 1:16, n = 2, ask = T) ## End(Not run)
################################################### # Generate a data sample for the response variable # Y and the single covariate X, assuming Normal noise set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Fit a Normal GeDS regression using NGeDS (Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2))) # Plot the final quadratic GeDS fit (red solid line) # with its control polygon (blue dashed line) plot(Gmod) # Plot the quadratic fit obtained from the linear fit at the 10th # iteration of stage A i.e. after 9 internal knots have been inserted # by the GeDS procedure plot(Gmod, which=10) # Generate plots of all the intermediate fits obtained # by running the GeDS procedure ## Not run: plot(Gmod, which=1:16) ## End(Not run) ################################################### # Generate a data sample for the response variable Y and the covariate # X assuming Poisson distributed error and a log link function set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N ,min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- exp(f_1(X)) # Generate Poisson distributed Y according to the mean model Y <- rpois(N,means) # Fit a Poisson GeDS regression model using GGeDS (Gmod2 <- GGeDS(Y ~ f(X), beta = 0.2, phi = 0.995, family = poisson(), Xextr = c(-2,2))) # similar plots as before, but for the linear fit plot(Gmod2, n = 2) plot(Gmod2, which = 10, n = 2) ## Not run: plot(Gmod2, which = 1:16, n = 2) plot(Gmod2, which = 1:16, n = 2, ask = T) ## End(Not run)
Plots the component functions of a GeDSboost object fitted using
NGeDSboost
. If the model has a single base-learner, the plot
will be returned on the response scale. Otherwise, plots are produced on the
linear predictor scale. Note that only univariate base-learner plots are
returned, as representation of the boosted model as a single spline model is
available only for univariate base-learners (see Dimitrova et al. (2025)). In
addition since component-wise gradient boosting inherently performs base-learner
selection, you should only expect plots for the base-learners that where selected
across the boosting iterations.
## S4 method for signature 'GeDSboost,ANY' plot(x, n = 3L, ...)
## S4 method for signature 'GeDSboost,ANY' plot(x, n = 3L, ...)
x |
a |
n |
integer value (2, 3 or 4) specifying the order ( |
... |
further arguments to be passed to the
|
Dimitrova, D. S., Kaishev, V. K. and Saenz Guillen, E. L. (2025). GeDS: An R Package for Regression, Generalized Additive Models and Functional Gradient Boosting, based on Geometrically Designed (GeD) Splines. Manuscript submitted for publication.
Plots on the linear predictor scale the component functions of a GeDSgam
object fitted using NGeDSgam
.
## S4 method for signature 'GeDSgam,ANY' plot(x, base_learners = NULL, f = NULL, n = 3L, ...)
## S4 method for signature 'GeDSgam,ANY' plot(x, base_learners = NULL, f = NULL, n = 3L, ...)
x |
a |
base_learners |
either NULL or a vector of character string specifying the base-learners of the model for which predictions should be plotted. Note that single base-learner predictions are provided on the linear predictor scale. |
f |
(optional) specifies the underlying component function or generating process to which the model was fit. This parameter is useful if the user wishes to plot the specified function/process alongside the model fit and the data. |
n |
integer value (2, 3 or 4) specifying the order ( |
... |
further arguments to be passed to the
|
The function converts a GeDS fit which has a B-spline representation to a piecewise polynomial form.
PPolyRep(object, n = 3)
PPolyRep(object, n = 3)
object |
the |
n |
integer value (2, 3 or 4) specifying the order ( |
This function converts a selected GeDS fit from a GeDS-class
object represented in terms of B-splines into an object where the fit is
represented in terms of piecewise polynomials.
The function wraps polySpline
in order to let it
accept GeDS-class
objects as input. Hence the function provides
a useful link between the package GeDS and the package splines,
allowing the user to take advantage of the functions provided in the
splines package.
An object that inherits from classes "spline"
and
"polySpline"
. It is a list whose arguments are:
knots |
a vector of size |
coefficients |
a |
Let us note that the first rows of the matrix contain the
n
coefficients of the consecutive pieces of the piecewise
polynomial representation. The last
-th row is extraneous and it
appears as a result of the use of the function
polySpline
.
# Generate a data sample for the response variable # Y and the single covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only # a component non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Fit a Normal GeDS regression using NGeDS Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2)) # construct the PP representation of the cubic GeDS fit # and apply some functions of the package splines Polymod <- PPolyRep(Gmod, 4) require(splines) class(Polymod) splineKnots(Polymod) knots(Gmod, n = 4) plot(Polymod) # Generate a plot showing the PP representation # based on the same example knt <- splineKnots(Polymod) coeffs <- coef(Polymod) plot(Gmod, n = 4, legend.pos = FALSE, main = "Cubic Curves") cols <- sample(heat.colors(length(knt)), length(knt)) for(i in 1:(length(knt))){ curve(coeffs[i,1] + coeffs[i,2]*(x - knt[i])+ coeffs[i,3]*(x - knt[i])^2+ coeffs[i,4]*(x - knt[i])^3, add = TRUE, col = cols[i]) abline(v = knt[i]) }
# Generate a data sample for the response variable # Y and the single covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only # a component non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Fit a Normal GeDS regression using NGeDS Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2)) # construct the PP representation of the cubic GeDS fit # and apply some functions of the package splines Polymod <- PPolyRep(Gmod, 4) require(splines) class(Polymod) splineKnots(Polymod) knots(Gmod, n = 4) plot(Polymod) # Generate a plot showing the PP representation # based on the same example knt <- splineKnots(Polymod) coeffs <- coef(Polymod) plot(Gmod, n = 4, legend.pos = FALSE, main = "Cubic Curves") cols <- sample(heat.colors(length(knt)), length(knt)) for(i in 1:(length(knt))){ curve(coeffs[i,1] + coeffs[i,2]*(x - knt[i])+ coeffs[i,3]*(x - knt[i])^2+ coeffs[i,4]*(x - knt[i])^3, add = TRUE, col = cols[i]) abline(v = knt[i]) }
This is a user friendly method to compute predictions from GeDS objects.
## S3 method for class 'GeDS' predict(object, newdata, type = c("response", "link", "terms"), n = 3L, ...)
## S3 method for class 'GeDS' predict(object, newdata, type = c("response", "link", "terms"), n = 3L, ...)
object |
the |
newdata |
an optional |
type |
character string indicating the type of prediction required. By
default it is equal to |
n |
integer value (2, 3 or 4) specifying the order ( |
... |
potentially further arguments (required by the definition of the generic function). They are ignored, but with a warning. |
This is a method for the function predict
that allows
the user to handle GeDS-Class
objects.
In analogy with the function predict.glm
in the
stats package, the user can specify the scale on which the predictions
should be computed through the argument type
. If the predictions are
required to be on the scale of the response variable, the user should set
type = "response"
, which is the default. Alternatively if one wants
the predictions to be on the predictor scale, it is necessary to set
type = "link"
.
By specifying type = "terms"
, it is possible to inspect the predicted
values separately for each single independent variable which enter either the
GeD spline component or the parametric component of the predictor model. In
this case the returned result is a matrix whose columns correspond to the
terms supplied via newdata
or extracted from the object
.
As GeDS objects contain three different fits (linear, quadratic and cubic),
it is possible to specify the order for which GeDS predictions are required
via the input argument n
.
A numeric vector corresponding to the predicted values (if
type = "link"
or type = "response"
). If type = "terms"
a
numeric matrix with a column per term.
predict
for the standard definition;
GGeDS
for examples.
This method computes predictions from GeDSboost and GeDSgam objects. It is designed to be user-friendly and accommodate different orders of the GeDSboost or GeDSgam fits.
## S3 method for class 'GeDSboost' predict(object, newdata, n = 3L, base_learner = NULL, ...) ## S3 method for class 'GeDSgam' predict(object, newdata, n = 3L, base_learner = NULL, ...)
## S3 method for class 'GeDSboost' predict(object, newdata, n = 3L, base_learner = NULL, ...) ## S3 method for class 'GeDSgam' predict(object, newdata, n = 3L, base_learner = NULL, ...)
object |
the |
newdata |
an optional data frame for prediction. |
n |
the order of the GeDS fit ( |
base_learner |
either |
... |
potentially further arguments. |
Numeric vector of predictions (vector of means).
Gu, C. and Wahba, G. (1991). Minimizing GCV/GML Scores with Multiple Smoothing Parameters via the Newton Method. SIAM J. Sci. Comput., 12, 383–398.
## Gu and Wahba 4 univariate term example ## # Generate a data sample for the response variable # y and the covariates x0, x1 and x2; include a noise predictor x3 set.seed(123) N <- 400 f_x0x1x2 <- function(x0,x1,x2) { f0 <- function(x0) 2 * sin(pi * x0) f1 <- function(x1) exp(2 * x1) f2 <- function(x2) 0.2 * x2^11 * (10 * (1 - x2))^6 + 10 * (10 * x2)^3 * (1 - x2)^10 f <- f0(x0) + f1(x1) + f2(x2) return(f) } x0 <- runif(N, 0, 1) x1 <- runif(N, 0, 1) x2 <- runif(N, 0, 1) x3 <- runif(N, 0, 1) # Specify a model for the mean of y f <- f_x0x1x2(x0 = x0, x1 = x1, x2 = x2) # Add (Normal) noise to the mean of y y <- rnorm(N, mean = f, sd = 0.2) data <- data.frame(y = y, x0 = x0, x1 = x1, x2 = x2, x3 = x3) # Fit a GeDSgam model Gmodgam <- NGeDSgam(y ~ f(x0) + f(x1) + f(x2) + f(x3), data = data) # Check that the sum of the individual base-learner predictions equals the final # model prediction pred0 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x0)") pred1 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x2)") pred2 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x1)") pred3 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x3)") round(predict(Gmodgam, n = 2, newdata = data) - (mean(predict(Gmodgam, n = 2, newdata = data)) + pred0 + pred1 + pred2 + pred3), 12) pred0 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x0)") pred1 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x2)") pred2 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x1)") pred3 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x3)") round(predict(Gmodgam, n = 3, newdata = data) - (pred0 + pred1 + pred2 + pred3), 12) pred0 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x0)") pred1 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x2)") pred2 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x1)") pred3 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x3)") round(predict(Gmodgam, n = 4, newdata = data) - (pred0 + pred1 + pred2 + pred3), 12) # Plot GeDSgam partial fits to f(x0), f(x1), f(x2) par(mfrow = c(1,3)) for (i in 1:3) { # Plot the base learner plot(Gmodgam, n = 3, base_learners = paste0("f(x", i-1, ")"), col = "seagreen", cex.lab = 1.5, cex.axis = 1.5) # Add legend if (i == 2) { position <- "topleft" } else if (i == 3) { position <- "topright" } else { position <- "bottom" } legend(position, legend = c("GAM-GeDS Quadratic", paste0("f(x", i-1, ")")), col = c("seagreen", "darkgray"), lwd = c(2, 2), bty = "n", cex = 1.5) }
## Gu and Wahba 4 univariate term example ## # Generate a data sample for the response variable # y and the covariates x0, x1 and x2; include a noise predictor x3 set.seed(123) N <- 400 f_x0x1x2 <- function(x0,x1,x2) { f0 <- function(x0) 2 * sin(pi * x0) f1 <- function(x1) exp(2 * x1) f2 <- function(x2) 0.2 * x2^11 * (10 * (1 - x2))^6 + 10 * (10 * x2)^3 * (1 - x2)^10 f <- f0(x0) + f1(x1) + f2(x2) return(f) } x0 <- runif(N, 0, 1) x1 <- runif(N, 0, 1) x2 <- runif(N, 0, 1) x3 <- runif(N, 0, 1) # Specify a model for the mean of y f <- f_x0x1x2(x0 = x0, x1 = x1, x2 = x2) # Add (Normal) noise to the mean of y y <- rnorm(N, mean = f, sd = 0.2) data <- data.frame(y = y, x0 = x0, x1 = x1, x2 = x2, x3 = x3) # Fit a GeDSgam model Gmodgam <- NGeDSgam(y ~ f(x0) + f(x1) + f(x2) + f(x3), data = data) # Check that the sum of the individual base-learner predictions equals the final # model prediction pred0 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x0)") pred1 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x2)") pred2 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x1)") pred3 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x3)") round(predict(Gmodgam, n = 2, newdata = data) - (mean(predict(Gmodgam, n = 2, newdata = data)) + pred0 + pred1 + pred2 + pred3), 12) pred0 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x0)") pred1 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x2)") pred2 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x1)") pred3 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x3)") round(predict(Gmodgam, n = 3, newdata = data) - (pred0 + pred1 + pred2 + pred3), 12) pred0 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x0)") pred1 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x2)") pred2 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x1)") pred3 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x3)") round(predict(Gmodgam, n = 4, newdata = data) - (pred0 + pred1 + pred2 + pred3), 12) # Plot GeDSgam partial fits to f(x0), f(x1), f(x2) par(mfrow = c(1,3)) for (i in 1:3) { # Plot the base learner plot(Gmodgam, n = 3, base_learners = paste0("f(x", i-1, ")"), col = "seagreen", cex.lab = 1.5, cex.axis = 1.5) # Add legend if (i == 2) { position <- "topleft" } else if (i == 3) { position <- "topright" } else { position <- "bottom" } legend(position, legend = c("GAM-GeDS Quadratic", paste0("f(x", i-1, ")")), col = c("seagreen", "darkgray"), lwd = c(2, 2), bty = "n", cex = 1.5) }
Method for the generic function print
that allows to
print on screen the main information related to the fitted predictor model
that can be extracted from a GeDS-class
,
GeDSboost-class
or GeDSgam-class
object.
## S3 method for class 'GeDS' print(x, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'GeDSboost' print(x, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'GeDSgam' print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'GeDS' print(x, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'GeDSboost' print(x, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'GeDSgam' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
the |
digits |
number of digits to be printed. |
... |
potentially further arguments (required by the definition of the generic function). |
This method allows to print on screen basic information related to the fitted
predictor model such as the function call
, the number of internal
knots for the linear GeDS/FGB-GeDS/GAM-GeDS fit and the deviances for the
three (linear, quadratic and cubic) fitted predictor models embedded in the
GeDSboost-class
or GeDSgam-class
object.
This function returns (invisibly) the same input object, but adding
the slot Print
that contains the three sub-slots:
Nknots |
the number of internal knots of the linear GeDS/FGB-GeDS/GAM-GeDS fit |
Deviances |
the deviances of the three (linear, quadratic and cubic) GeDS/FGB-GeDS/GAM-GeDS fits |
Call |
the |
print
for the standard definition.
Functions that estimate the coefficients of a predictor model involving a spline component and possibly a parametric component applying (Iteratively Re-weighted) Least Squares (IR)LS iteration.
SplineReg_LM( X, Y, Z = NULL, offset = rep(0, length(X)), weights = rep(1, length(X)), InterKnots, n, extr = range(X), prob = 0.95, coefficients = NULL, only_pred = FALSE ) SplineReg_GLM( X, Y, Z, offset = rep(0, nobs), weights = rep(1, length(X)), InterKnots, n, extr = range(X), family, mustart, inits = NULL, etastart = NULL )
SplineReg_LM( X, Y, Z = NULL, offset = rep(0, length(X)), weights = rep(1, length(X)), InterKnots, n, extr = range(X), prob = 0.95, coefficients = NULL, only_pred = FALSE ) SplineReg_GLM( X, Y, Z, offset = rep(0, nobs), weights = rep(1, length(X)), InterKnots, n, extr = range(X), family, mustart, inits = NULL, etastart = NULL )
X |
a numeric vector containing |
Y |
a vector of size |
Z |
a design matrix with |
offset |
a vector of size |
weights |
an optional vector of ‘prior weights’ to be put on the observations in the fitting process in case the user requires weighted fitting. It is a vector of 1s by default. |
InterKnots |
a numeric vector containing the locations of the internal knots necessary to compute the B-splines. In GeDS these are the internal knots in a current iteration of stage A. |
n |
integer value specifying the order of the spline to be evaluated.
It should be 2 (linear spline), 3 (quadratic spline) or 4 (cubic spline).
Non-integer values will be passed to the function |
extr |
optional numeric vector of 2 elements representing the left-most
and right-most limits of the interval embedding the sample values of
|
prob |
the confidence level to be used for the confidence bands in the
|
coefficients |
optional vector of spline coefficients. If provided,
|
only_pred |
logical, if |
family |
a description of the error distribution and link function to be
used in the model. This can be a character string naming a family function, a
family function or the result of a call to a family function. See
|
mustart |
initial values for the vector of means in the IRLS estimation.
Must be a vector of length |
inits |
a numeric vector of length
|
etastart |
initial values for the predictor in the IRLS estimation
(alternative to providing either |
The functions estimate the coefficients of a predictor model with a spline
component (and possibly a parametric component) for a given, fixed order and
vector of knots of the spline and a specified distribution of the response
variable (from the Exponential Family). The functions SplineReg_LM
and
SplineReg_GLM
are based correspondingly on LS and IRLS and used
correspondingly in NGeDS
and GGeDS
, to estimate
the coefficients of the final GeDS fits of stage B, after their knots have
been positioned to coincide with the Greville abscissas of the knots of the
linear fit from stage A (see Dimitrova et al. 2023). Additional inference
related quantities are also computed (see Value below). The function
SplineReg_GLM
is also used to estimate the coefficients of the linear
GeDS fit of stage A within GGeDS
, whereas in
NGeDS
this estimation is performed internally leading to faster
R code.
In addition SplineReg_LM
computes some useful quantities among which
confidence intervals and the Control Polygon (see Section 2 of
Kaishev et al. 2016).
The confidence intervals contained in the output slot NCI
are
approximate local bands obtained considering the knots as fixed constants.
Hence the columns of the design matrix are seen as covariates and standard
methodology relying on the se.fit
option of predict.lm
or
predict.glm
is used. In the ACI
slot, asymptotic confidence
intervals are provided, following Kaishev et al (2006). If the variance
matrix is singular the Moore-Penrose pseudo-inverse is computed instead.
As mentioned, SplineReg_GLM
is intensively used in Stage A of the GeDS
algorithm implemented in GGeDS
and in order to make it as fast
as possible input data validation is mild. Hence it is expected that the user
checks carefully the input parameters before using SplineReg_GLM
. The
"Residuals
" in the output of this function are similar to the so
called “working residuals" in the glm
function.
"Residuals
" are the residuals used in the knot placement
procedure, i.e.
but
in contrast to glm
“working residuals", they are
computed using the final IRLS fitted . "
Residuals
"
are then used in locating the knots of the linear spline fit of Stage A.
In SplineReg_GLM
confidence intervals are not computed.
A list
containing:
Theta |
a vector containing the fitted coefficients of the spline regression component and the parametric component of the predictor model. |
Predicted |
a vector of |
Residuals |
a vector containing the normal regression residuals if
|
RSS |
the deviance for the fitted predictor model, defined as in
Dimitrova et al. (2023), which for |
NCI |
a list containing the lower ( |
Basis |
the matrix of B-spline regression functions and the covariates of the parametric part evaluated at the sample values of the covariate(s). |
Polygon |
a list containing x-y coordinates (" |
deviance |
a vector containing deviances computed at each IRLS step
(computed only with the |
temporary |
the result of the function |
ACI |
a list containing the lower ( |
Kaishev, V. K., Dimitrova, D. S., Haberman, S. & Verrall, R. J. (2006).
Geometrically designed, variable know regression splines: asymptotics and
inference (Statistical Research Paper No. 28).
London, UK: Faculty of Actuarial Science & Insurance, City University London.
URL: openaccess.city.ac.uk
Kaishev, V.K., Dimitrova, D.S., Haberman, S., & Verrall, R.J. (2016).
Geometrically designed, variable knot regression splines.
Computational Statistics, 31, 1079–1105.
DOI: doi:10.1007/s00180-015-0621-7
Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
Geometrically designed variable knot splines in generalized (non-)linear
models. Applied Mathematics and Computation, 436.
DOI: doi:10.1016/j.amc.2022.127493
NGeDS
, GGeDS
, Fitters
,
IRLSfit
, lm
and
glm.fit
.
These are computing engines called by NGeDS
and
GGeDS
, needed for the underlying fitting procedures.
UnivariateFitter( X, Y, Z = NULL, offset = rep(0, NROW(Y)), weights = rep(1, length(X)), beta = 0.5, phi = 0.5, min.intknots = 0, max.intknots = 300, q = 2, extr = range(X), show.iters = FALSE, tol = as.double(1e-12), stoptype = c("SR", "RD", "LR"), higher_order = TRUE, intknots_init = NULL, fit_init = NULL, only_pred = FALSE ) GenUnivariateFitter( X, Y, Z = NULL, offset = rep(0, NROW(Y)), weights = rep(1, length(X)), family = gaussian(), beta = 0.5, phi = 0.5, min.intknots = 0, max.intknots = 300, q = 2, extr = range(X), show.iters = F, tol = as.double(1e-12), stoptype = c("SR", "RD", "LR"), higher_order = TRUE )
UnivariateFitter( X, Y, Z = NULL, offset = rep(0, NROW(Y)), weights = rep(1, length(X)), beta = 0.5, phi = 0.5, min.intknots = 0, max.intknots = 300, q = 2, extr = range(X), show.iters = FALSE, tol = as.double(1e-12), stoptype = c("SR", "RD", "LR"), higher_order = TRUE, intknots_init = NULL, fit_init = NULL, only_pred = FALSE ) GenUnivariateFitter( X, Y, Z = NULL, offset = rep(0, NROW(Y)), weights = rep(1, length(X)), family = gaussian(), beta = 0.5, phi = 0.5, min.intknots = 0, max.intknots = 300, q = 2, extr = range(X), show.iters = F, tol = as.double(1e-12), stoptype = c("SR", "RD", "LR"), higher_order = TRUE )
X |
a numeric vector containing |
Y |
a vector of size |
Z |
a design matrix with |
offset |
a vector of size |
weights |
an optional vector of size |
beta |
numeric parameter in the interval |
phi |
numeric parameter in the interval |
min.intknots |
optional parameter allowing the user to set a minimum number of internal knots required. By default equal to zero. |
max.intknots |
optional parameter allowing the user to set a maximum
number of internal knots to be added by the GeDS estimation algorithm. By
default equal to the number of internal knots |
q |
numeric parameter which allows to fine-tune the stopping rule of
stage A of GeDS, by default equal to 2. See details in the description of
|
extr |
numeric vector of 2 elements representing the left-most and
right-most limits of the interval embedding the sample values of |
show.iters |
logical variable indicating whether or not to print
information at each step. By default equal to |
tol |
numeric value indicating the tolerance to be used in the knot placement steps in stage A. By default equal to 1E-12. See details below. |
stoptype |
a character string indicating the type of GeDS stopping rule
to be used. It should be either |
higher_order |
a logical that defines whether to compute the higher
order fits (quadratic and cubic) after stage A is run. Default is
|
intknots_init |
vector of initial internal knots from which to start the GeDS
Stage A iterations. See Section 3 of Kaishev et al. (2016). Default is |
fit_init |
A list containing fitted values |
only_pred |
logical, if |
family |
a description of the error distribution and link function to be
used in the model. This can be a character string naming a family function
(e.g. |
The functions UnivariateFitter
and GenUnivariateFitter
are in
general not intended to be used directly, they should be called through
NGeDS
and GGeDS
. However, in case there is a need
for multiple GeDS fitting (as may be the case e.g. in Monte Carlo simulations)
it may be efficient to use the fitters outside the main functions.
The argument tol
is used in the knot placement procedure of stage A of
the GeDS algorithm in order to check whether the current knot
is set at an acceptable location or not. If there exists a knot
such that
tol
, , then the
new knot is considered to be coalescent with an existing one, it is discarded
and the algorithm seeks alternative knot locations. By default it is equal to
1e-12.
See NGeDS
and GGeDS
, Kaishev et al. (2016) and
Dimitrova et al. (2023) for further details.
A GeDS-Class
object, but without the Formula
,
extcall
, terms
and znames
slots.
Kaishev, V.K., Dimitrova, D.S., Haberman, S., & Verrall, R.J. (2016).
Geometrically designed, variable knot regression splines.
Computational Statistics, 31, 1079–1105.
DOI: doi:10.1007/s00180-015-0621-7
Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
Geometrically designed variable knot splines in generalized (non-)linear
models.
Applied Mathematics and Computation, 436.
DOI: doi:10.1016/j.amc.2022.127493
# Examples similar to the ones # presented in NGeDS and in GGeDS # Generate a data sample for the response variable # Y and the covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N ,min = -2, max = 2)) # Specify a model for the mean of Y to include only # a component non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Fit a Normal GeDS regression model using the fitter function (Gmod <- UnivariateFitter(X, Y, beta = 0.6, phi = 0.995, extr = c(-2,2))) ############################################################## # second: very similar example, but based on Poisson data set.seed(123) X <- sort(runif(N , min = -2, max = 2)) means <- exp(f_1(X)) Y <- rpois(N,means) (Gmod2 <- GenUnivariateFitter(X, Y, beta = 0.2, phi = 0.995, family = poisson(), extr = c(-2,2))) # a plot showing quadratic and cubic fits, # in the predictor scale plot(X,log(Y), xlab = "x", ylab = expression(f[1](x))) lines(Gmod2, n = 3, col = "red") lines(Gmod2, n = 4, col = "blue", lty = 2) legend("topleft", c("Quadratic", "Cubic"), col = c("red", "blue"), lty = c(1,2))
# Examples similar to the ones # presented in NGeDS and in GGeDS # Generate a data sample for the response variable # Y and the covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N ,min = -2, max = 2)) # Specify a model for the mean of Y to include only # a component non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.1) # Fit a Normal GeDS regression model using the fitter function (Gmod <- UnivariateFitter(X, Y, beta = 0.6, phi = 0.995, extr = c(-2,2))) ############################################################## # second: very similar example, but based on Poisson data set.seed(123) X <- sort(runif(N , min = -2, max = 2)) means <- exp(f_1(X)) Y <- rpois(N,means) (Gmod2 <- GenUnivariateFitter(X, Y, beta = 0.2, phi = 0.995, family = poisson(), extr = c(-2,2))) # a plot showing quadratic and cubic fits, # in the predictor scale plot(X,log(Y), xlab = "x", ylab = expression(f[1](x))) lines(Gmod2, n = 3, col = "red") lines(Gmod2, n = 4, col = "blue", lty = 2) legend("topleft", c("Quadratic", "Cubic"), col = c("red", "blue"), lty = c(1,2))
This function plots the NGeDSboost
fit to the data at the
beginning of a given boosting iteration and then plots the subsequent
NGeDS
fit on the corresponding negative gradient.
Note: Applicable only for NGeDSboost
models with a single
univariate base-learner.
## S3 method for class 'GeDSboost' visualize_boosting(object, iters = NULL, final_fits = FALSE)
## S3 method for class 'GeDSboost' visualize_boosting(object, iters = NULL, final_fits = FALSE)
object |
a |
iters |
numeric, specifies the iteration(s) number. |
final_fits |
logical indicating whether the final linear, quadratic and cubic fits should be plotted. |
# Load packages library(GeDS) # Generate a data sample for the response variable # Y and the single covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.2) data = data.frame(X, Y) Gmodboost <- NGeDSboost(Y ~ f(X), data = data, normalize_data = TRUE) # Plot plot(X, Y, pch=20, col=c("darkgrey")) lines(X, sapply(X, f_1), col = "black", lwd = 2) lines(X, Gmodboost$predictions$pred_linear, col = "green4", lwd = 2) lines(X, Gmodboost$predictions$pred_quadratic, col="red", lwd=2) lines(X, Gmodboost$predictions$pred_cubic, col="purple", lwd=2) legend("topright", legend = c("Order 2 (degree=1)", "Order 3 (degree=2)", "Order 4 (degree=3)"), col = c("green4", "red", "purple"), lty = c(1, 1), lwd = c(2, 2, 2), cex = 0.75, bty="n", bg = "white") # Visualize boosting iterations + final fits par(mfrow=c(4,2)) visualize_boosting(Gmodboost, iters = 0:3, final_fits = TRUE) par(mfrow=c(1,1))
# Load packages library(GeDS) # Generate a data sample for the response variable # Y and the single covariate X set.seed(123) N <- 500 f_1 <- function(x) (10*x/(1+100*x^2))*4+4 X <- sort(runif(N, min = -2, max = 2)) # Specify a model for the mean of Y to include only a component # non-linear in X, defined by the function f_1 means <- f_1(X) # Add (Normal) noise to the mean of Y Y <- rnorm(N, means, sd = 0.2) data = data.frame(X, Y) Gmodboost <- NGeDSboost(Y ~ f(X), data = data, normalize_data = TRUE) # Plot plot(X, Y, pch=20, col=c("darkgrey")) lines(X, sapply(X, f_1), col = "black", lwd = 2) lines(X, Gmodboost$predictions$pred_linear, col = "green4", lwd = 2) lines(X, Gmodboost$predictions$pred_quadratic, col="red", lwd=2) lines(X, Gmodboost$predictions$pred_cubic, col="purple", lwd=2) legend("topright", legend = c("Order 2 (degree=1)", "Order 3 (degree=2)", "Order 4 (degree=3)"), col = c("green4", "red", "purple"), lty = c(1, 1), lwd = c(2, 2, 2), cex = 0.75, bty="n", bg = "white") # Visualize boosting iterations + final fits par(mfrow=c(4,2)) visualize_boosting(Gmodboost, iters = 0:3, final_fits = TRUE) par(mfrow=c(1,1))