Package 'GeDS'

Title:	Geometrically Designed Spline Regression
Description:	Spline Regression, Generalized Additive Models, and Component-wise Gradient Boosting, utilizing Geometrically Designed (GeD) Splines. GeDS regression is a non-parametric method inspired by geometric principles, for fitting spline regression models with variable knots in one or two independent variables. It efficiently estimates the number of knots and their positions, as well as the spline order, assuming the response variable follows a distribution from the exponential family. GeDS models integrate the broader category of Generalized (Non-)Linear Models, offering a flexible approach to modeling complex relationships. A description of the method can be found in Kaishev et al. (2016) <doi:10.1007/s00180-015-0621-7> and Dimitrova et al. (2023) <doi:10.1016/j.amc.2022.127493>. Further extending its capabilities, GeDS's implementation includes Generalized Additive Models (GAM) and Functional Gradient Boosting (FGB), enabling versatile multivariate predictor modeling, as discussed in the forthcoming work of Dimitrova et al. (2025).
Authors:	Dimitrina S. Dimitrova [aut], Vladimir K. Kaishev [aut], Andrea Lattuada [aut], Emilio L. Sáenz Guillén [aut, cre], Richard J. Verrall [aut]
Maintainer:	Emilio L. Sáenz Guillén <[email protected]>
License:	GPL-3
Version:	0.2.7
Built:	2025-03-11 08:45:28 UTC
Source:	https://github.com/emilioluissaenzguillen/geds

Help Index

GeDS
Barium-Ferrum-Arsenide Powder Diffraction Data
Fitter function for GeD Spline Regression for bivariate data
Base Learner Importance for GeDSboost objects
Coal Mining Disasters data
Coef method for GeDS objects
Coef method for GeDSboost, GeDSgam
k-fold cross-validation
Crystallographic Scattering Data
Derivative of GeDS objects
Deviance method for GeDS, GeDSboost, GeDSgam
Death counts in England and Wales
Defining the covariates for the spline component in a GeDS formula.
Formula for the predictor model
GeDS Class
GeDSboost Class
GeDSgam Class
Generalized Geometrically Designed Spline regression estimation
Defined integral of GeDS objects
IRLS Estimation
Knots method for GeDS, GeDSboost, GeDSgam
Lines method for GeDS objects.
Geometrically Designed Spline regression estimation
Component-wise gradient boosting with NGeDS base-learners
NGeDSgam: Local Scoring Algorithm with GeD Splines in Backfitting
Plot method for GeDS objects.
Plot method for GeDSboost objects.
Plot method for GeDSgam objects.
Piecewise Polynomial Spline Representation
Predict method for GeDS objects
Predict method for GeDSboost, GeDSgam
Print method for GeDS, GeDSboost, GeDSgam
Estimation of the coefficients of a predictor model with spline and possibly parametric components.
Functions used to fit GeDS objects with an univariate spline regression component
Visualize Boosting Iterations

Geometrically Designed Splines (GeDS) regression is a non-parametric method for fitting spline regression models with variable knots. The GeDS technique is inspired by geometric principles and falls within the domain of generalized non-linear models (GNM), which include generalized linear models (GLM) as a special case. GeDS regression is fitted based on a sample of $N$ observations of a response variable $y$ , dependent on a set of (currently up to two) covariates, assuming $y$ has a distribution from the exponential family. In addition, GeDS methodology is implemented both in the context of Generalized Additive Models (GAM) and Functional Gradient Boosting (FGB). On the one hand, GAM consist of an additive modeling technique where the impact of the predictor variables is captured through smooth (GeDS, in this case) functions. On the other hand, GeDS incorporates gradient boosting machine learning technique by implementing functional gradient descent algorithm to optimize general risk functions utilizing component-wise GeDS estimates.

Details

GeDS provides a novel solution to the spline regression problem and in particular, to the problem of estimating the number and position of the knots. The GeDS estimation method is based on: first, constructing a piecewise linear fit (spline fit of order 2) which captures the underlying functional shape of determined by the data (Stage A); second, approximating the latter fit through shape preserving (variation diminishing) spline fits of higher orders $n = 3$ , $n = 4$ , $\dots$ (i.e., degrees 2, 3, $\dots$ ) at stage B. As a result, GeDS simultaneously produces a linear, a quadratic and a cubic spline fit.

The GeDS method was originally developed by Kaishev et al. (2016) assuming the response variable $y$ to be normally distributed and a corresponding Mathematica code was provided.

The GeDS method was extended by Dimitrova et al. (2023) to cover any distribution from the exponential family. The GeDS R package presented here includes an enhanced R implementation of the original Normal GeDS Mathematica code due to Kaishev et al. (2016), implemented as the NGeDS function and a generalization of it in the function GGeDS which covers the case of any distribution from the exponential family.

The GeDS package allows also to fit two dimensional response surfaces and to construct multivariate (predictor) models with a GeD spline component and a parametric component (see the functions f, formula, NGeDS and GGeDS for details).

Dimitrova et al. (2025) have recently made significant enhancements to the GeDS methodology, by incorporating generalized additive models (GAM) and functional gradient boosting (FGB). On the one hand, generalized additive models are encompassed by implementing the local-scoring algorithm using normal GeD splines (i.e., NGeDS) as function smoothers within the backfitting iterations. This is implemented via the function NGeDSgam. On the other hand, the GeDS package incorporates the functional gradient descent algorithm by utilizing normal GeD splines (i.e., NGeDS) as base learners. This is implemented via the function NGeDSboost.

The outputs of both NGeDS and GGeDS functions are GeDS-class objects, while the outputs of NGeDSgam and NGeDSboost functions are GeDSgam-class and GeDSboost-class objects, respectively. GeDS-class, GeDSgam-class and GeDSboost-class objects contain second, third and fourth order spline fits. As described in Kaishev et al. (2016), Dimitrova et al. (2023) and Dimitrova et al. (2025), the "final" GeDS fit is the one minimizing the empirical deviance. Nevertheless, the user can choose to use any of the available fits.

The GeDS package also includes some datasets where GeDS regression proves to be very efficient and some user friendly functions that are designed to easily extract required information. Several methods are also provided to handle GeDS, GAM-GeDS and FGB-GeDS output results (see GeDS-class, GeDSgam-class and GeDSboost-class, respectively).

Throughout this document, we use the terms GeDS predictor model, GeDS regression and GeDS fit interchangeably.

Please report any issue arising or bug in the code to [email protected].

Package:	GeDS
Version:	0.2.6
Date:	2025-02-10
License:	GPL-3

Author(s)

Dimitrina S. Dimitrova <[email protected]>,#' Vladimir K. Kaishev <[email protected]>, Andrea Lattuada <[email protected]>, Emilio L. Sáenz Guillén <[email protected]> and Richard J. Verrall <[email protected]>

References

Kaishev, V.K., Dimitrova, D.S., Haberman, S., & Verrall, R.J. (2016). Geometrically designed, variable knot regression splines. Computational Statistics, 31, 1079–1105.
DOI: doi:10.1007/s00180-015-0621-7

Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023). Geometrically designed variable knot splines in generalized (non-)linear models. Applied Mathematics and Computation, 436.
DOI: doi:10.1016/j.amc.2022.127493

Dimitrova, D. S., Kaishev, V. K. and Saenz Guillen, E. L. (2025). GeDS: An R Package for Regression, Generalized Additive Models and Functional Gradient Boosting, based on Geometrically Designed (GeD) Splines. Manuscript submitted for publication.

Barium-Ferrum-Arsenide Powder Diffraction Data

Description

This dataset contains the results of a neutron diffraction experiment on Barium-Ferrum-Arsenide ( $\mathrm{Ba Fe_2 As_2}$ ) powder carried out by Kimber et al. (2009) and used in Kaishev et al. (2016). The neutron diffraction intensity was measured at 1,151 different dispersion angles in order to model the diffraction profile.

Usage

data(BaFe2As2)
data(BaFe2As2)

Format

A data.frame with 1151 cases and 2 variables:

angle: the dispersion angle, viewed as the independent variable.
intensity: the neutron diffraction intensity, viewed as the response variable.

Source

openaccess.city.ac.uk

References

Kimber, S.A.J., Kreyssig, A., Zhang, Y.Z., Jeschke, H.O., Valenti, R., Yokaichiya, F., Colombier, E., Yan, J., Hansen, T.C., Chatterji, T., McQueeney, R.J., Canfield, P.C., Goldman, A.I. and Argyriou, D.N. (2009). Similarities between structural distortions under pressure and chemical doping in superconducting $\mathrm{Ba Fe_2 As_2}$ . Nat Mater, 8, 471–475.

Kaishev, V.K., Dimitrova, D.S., Haberman, S. and Verrall, R.J. (2016). Geometrically designed, variable knot regression splines. Computational Statistics, 31, 1079–1105.
DOI: doi:10.1007/s00180-015-0621-7

Examples

## Not run: 
# to load the data
data('BaFe2As2')

# fit a GeDS regression and produce a simple plot of the result. See ?NGeDS
# c.f. Kaishev et al. (2016), section 4.2
(Gmod <- NGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.6, phi = 0.99,
               q = 3, show.iters = T))
plot(Gmod)

## End(Not run)

## Not run: 
# to load the data
data('BaFe2As2')

# fit a GeDS regression and produce a simple plot of the result. See ?NGeDS
# c.f. Kaishev et al. (2016), section 4.2
(Gmod <- NGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.6, phi = 0.99,
               q = 3, show.iters = T))
plot(Gmod)

## End(Not run)

Fitter function for GeD Spline Regression for bivariate data

Description

These are computing engines called by NGeDS and GGeDS, needed for the underlying fitting procedures.

Usage

BivariateFitter(
  X,
  Y,
  Z,
  W,
  weights = rep(1, length(X)),
  Indicator,
  beta = 0.5,
  phi = 0.99,
  min.intknots = 0,
  max.intknots = 300,
  q = 2,
  Xextr = range(X),
  Yextr = range(Y),
  show.iters = TRUE,
  tol = as.double(1e-12),
  stoptype = c("SR", "RD", "LR"),
  higher_order = TRUE,
  Xintknots = NULL,
  Yintknots = NULL
)

GenBivariateFitter(
  X,
  Y,
  Z,
  W,
  family = family,
  weights = rep(1, length(X)),
  Indicator,
  beta = 0.5,
  phi = 0.5,
  min.intknots = 0,
  max.intknots = 300,
  q = 2,
  Xextr = range(X),
  Yextr = range(Y),
  show.iters = TRUE,
  tol = as.double(1e-12),
  stoptype = c("SR", "RD", "LR"),
  higher_order = TRUE
)
BivariateFitter(
  X,
  Y,
  Z,
  W,
  weights = rep(1, length(X)),
  Indicator,
  beta = 0.5,
  phi = 0.99,
  min.intknots = 0,
  max.intknots = 300,
  q = 2,
  Xextr = range(X),
  Yextr = range(Y),
  show.iters = TRUE,
  tol = as.double(1e-12),
  stoptype = c("SR", "RD", "LR"),
  higher_order = TRUE,
  Xintknots = NULL,
  Yintknots = NULL
)

GenBivariateFitter(
  X,
  Y,
  Z,
  W,
  family = family,
  weights = rep(1, length(X)),
  Indicator,
  beta = 0.5,
  phi = 0.5,
  min.intknots = 0,
  max.intknots = 300,
  q = 2,
  Xextr = range(X),
  Yextr = range(Y),
  show.iters = TRUE,
  tol = as.double(1e-12),
  stoptype = c("SR", "RD", "LR"),
  higher_order = TRUE
)

Arguments

`X`	a numeric vector containing $N$ sample values of the first independent variable chosen to enter the spline regression component of the predictor model.
`Y`	a numeric vector containing $N$ sample values of the second independent variable chosen to enter the spline regression component of the predictor model.
`Z`	a vector of size $N$ containing the observed values of the response variable.
`W`	a design matrix with $N$ rows containing other covariates selected to enter the parametric component of the predictor model (see `formula`). If no such covariates are selected, it is set to `NULL` by default.
`weights`	an optional vector of size $N$ of ‘prior weights’ to be put on the observations in the fitting process in case the user requires weighted GeDS fitting. It is `NULL` by default.
`Indicator`	contingency table (i.e., frequency of observations) for the independent variables `X` and `Y`.
`beta`	numeric parameter in the interval $[0,1]$ tuning the knot placement in stage A of GeDS. See the description of `NGeDS` or `GGeDS`.
`phi`	numeric parameter in the interval $[0,1]$ specifying the threshold for the stopping rule (model selector) in stage A of GeDS. See also `stoptype` and details in the description of `NGeDS` or `GGeDS`.
`min.intknots`	optional parameter specifying the minimum number of internal knots required in Stage A's fit. Default is zero.
`max.intknots`	optional parameter allowing the user to set a maximum number of internal knots to be added in Stage A by the GeDS estimation algorithm. Default equals the number of internal knots $\kappa$ for the saturated GeDS model (i.e. $\kappa=N-2$ ).
`q`	numeric parameter which allows to fine-tune the stopping rule of stage A of GeDS, by default equal to 2. See details in the description of `NGeDS` or `GGeDS`.
`Xextr`	boundary knots in the `X` direction. By default equal to the range of `X`.
`Yextr`	boundary knots in the `Y` direction. By default equal to the range of `Y`.
`show.iters`	logical variable indicating whether or not to print fitting information at each step. Default is `FALSE`.
`tol`	numeric value indicating the tolerance to be used in checking whether two knots should be considered different during the knot placement steps in stage A.
`stoptype`	a character string indicating the type of GeDS stopping rule to be used. It should be either `"SR"`, `"RD"` or `"LR"`, partial match allowed. See details of `NGeDS` or `GGeDS`.
`higher_order`	a logical defining whether to compute the higher order fits (quadratic and cubic) after stage A is run. Default is `TRUE`.
`Xintknots`	a vector of starting internal knots in the `X` direction. Allows the user to begin Stage A's GeDS algorithm with a linear spline fit using a predefined vector of internal `X` knots, instead of starting with a straight line fit. Default is `NULL`.
`Yintknots`	a vector of starting internal knots in the `Y` direction. Allows the user to begin Stage A's GeDS algorithm with a linear spline fit using a predefined vector of internal `X` knots, instead of starting with a straight line fit. Default is `NULL`.
`family`	a description of the error distribution and link function to be used in the model. This can be a character string naming a family function (e.g. `"gaussian"`), the family function itself (e.g. `gaussian`) or the result of a call to a family function (e.g. `gaussian()`). See family for details on family functions.

Value

A GeDS-Class object, but without the Formula, extcall, terms and znames slots.

References

Base Learner Importance for GeDSboost objects

Description

S3 method for GeDSboost-class objects that calculates the in-bag risk reduction ascribable to each base-learner of an FGB-GeDS model. Essentially, it measures and aggregates the decrease in the empirical risk attributable to each base-learner for every time it is selected across the boosting iterations. This provides a measure on how much each base-learner contributes to the overall improvement in the model's accuracy, as reflectedp by the decrease in the empiral risk (loss function). This function is adapted from varimp and is compatible with the available mboost-package methods for varimp, including plot, print and as.data.frame.

Usage

## S3 method for class 'GeDSboost'
bl_imp(object, boosting_iter_only = FALSE, ...)
## S3 method for class 'GeDSboost'
bl_imp(object, boosting_iter_only = FALSE, ...)

Arguments

`object`	an object of class `GeDSboost-class`.
`boosting_iter_only`	logical value, if `TRUE` then base-learner in-bag risk reduction is only computed across boosting iterations, i.e., without taking into account a potential initial GeDS learner.
`...`	potentially further arguments.

Details

See varimp for details.

Value

An object of class varimp with available plot, print and as.data.frame methods.

References

Hothorn T., Buehlmann P., Kneib T., Schmid M. and Hofner B. (2022). mboost: Model-Based Boosting. R package version 2.9-7, https://CRAN.R-project.org/package=mboost.

Examples

library(GeDS)
library(TH.data)
set.seed(290875)
data("bodyfat", package = "TH.data")
data = bodyfat
Gmodboost <- NGeDSboost(formula = DEXfat ~ f(hipcirc) + f(kneebreadth) + f(anthro3a),
                        data = data, initial_learner = FALSE)
MSE_Gmodboost_linear <- mean((data$DEXfat - Gmodboost$predictions$pred_linear)^2)
MSE_Gmodboost_quadratic <- mean((data$DEXfat - Gmodboost$predictions$pred_quadratic)^2)
MSE_Gmodboost_cubic <- mean((data$DEXfat - Gmodboost$predictions$pred_cubic)^2)

# Print MSE
cat("\n", "MEAN SQUARED ERROR", "\n",
    "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n",
    "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n",
    "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")

# Base Learner Importance
bl_imp <- bl_imp(Gmodboost)
print(bl_imp)
plot(bl_imp)

library(GeDS)
library(TH.data)
set.seed(290875)
data("bodyfat", package = "TH.data")
data = bodyfat
Gmodboost <- NGeDSboost(formula = DEXfat ~ f(hipcirc) + f(kneebreadth) + f(anthro3a),
                        data = data, initial_learner = FALSE)
MSE_Gmodboost_linear <- mean((data$DEXfat - Gmodboost$predictions$pred_linear)^2)
MSE_Gmodboost_quadratic <- mean((data$DEXfat - Gmodboost$predictions$pred_quadratic)^2)
MSE_Gmodboost_cubic <- mean((data$DEXfat - Gmodboost$predictions$pred_cubic)^2)

# Print MSE
cat("\n", "MEAN SQUARED ERROR", "\n",
    "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n",
    "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n",
    "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")

# Base Learner Importance
bl_imp <- bl_imp(Gmodboost)
print(bl_imp)
plot(bl_imp)

Coal Mining Disasters data

Description

A dataset with 112 entries containing annual numbers of accidents due to disasters in British coal mines for years from 1850 to 1962, considered in Carlin et al. (1992) and also in Eilers and Marx (1996).

Usage

data(coalMining)
data(coalMining)

Format

A data.frame with 112 entries, corresponding to the years from 1850 to 1962. Each entry has:

accidents: number of severe accidents that have occurred each year.
years: year during which the accidents occurred.

Source

https://people.reed.edu/~jones/141/Coal.html

References

Carlin, B.P., Gelfand, A.E. and Smith, A.F.M. (1992). Hierarchical Bayesian analysis of changepoint problems. Applied Statistics, 41(2), 389–405.

Eilers, P.H.C. and Marx, B.D. (1996). Flexible Smoothing with B-splines and Penalties. Statistical Science, 11(2), 89–121.

Coef method for GeDS objects

Description

Methods for the functions coef and coefficients that allow to extract the estimated coefficients of a fitted GeDS regression from a GeDS-Class object.

Usage

## S3 method for class 'GeDS'
coef(object, n = 3L, onlySpline = TRUE, ...)

## S3 method for class 'GeDS'
coefficients(object, n = 3L, onlySpline = TRUE, ...)
## S3 method for class 'GeDS'
coef(object, n = 3L, onlySpline = TRUE, ...)

## S3 method for class 'GeDS'
coefficients(object, n = 3L, onlySpline = TRUE, ...)

Arguments

`object`	the `GeDS-class` object from which the coefficients of the selected GeDS regression should be extracted.
`n`	integer value (2, 3 or 4) specifying the order ( $=$ degree $+ 1$ ) of the GeDS fit whose coefficients should be extracted. By default equal to `3L`. Non-integer values will be passed to the function `as.integer`.
`onlySpline`	logical variable specifying whether only the coefficients for the GeDS component of a fitted multivariate regression model should be extracted or if, alternatively, also the coefficients of the parametric component should also be extracted.
`...`	potentially further arguments (required by the definition of the generic function). These will be ignored, but with a warning.

Details

These are simple methods for the functions coef and coefficients.

As GeDS-class objects contain three different fits (linear, quadratic and cubic), it is possible to specify the order of the fit for which GeDS regression coefficients are required via the input argument n.

As mentioned in the details of formula, the predictor model may be multivariate and it may include a (univariate or bivariate) GeD spline component whereas the remaining variables may be part of a parametric component. If the onlySpline argument is set to TRUE (the default value), only the coefficients corresponding to the GeD spline component of order n of the multivariate predictor model are extracted.

Value

A named vector containing the required coefficients of the fitted univariate or multivariate predictor model. The coefficients corresponding to the variables that enter the parametric component of the fitted multivariate predictor model are named as the variables themselves. The coefficients of the GeDS component are coded as "N" followed by the index of the corresponding B-spline.

Examples

# Generate a data sample for the response variable
# and the covariates
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N ,min = -2, max = 2))
Z <- runif(N)
# Specify a model for the mean of the response Y to be a superposition of
# a non-linear component f_1(X), a linear component 2*Z and a
# free term 1, i.e.
means <- f_1(X) + 2*Z + 1
# Add normal noise to the mean of y
Y <- rnorm(N, means, sd = 0.1)

# Fit to this sample a predictor model of the form f(X) + Z, where
# f(X) is the GeDS component and Z is the linear (additive) component
# see ?formula.GeDS for details
(Gmod <- NGeDS(Y ~ f(X) + Z, beta = 0.6, phi = 0.995, Xextr = c(-2,2)))

# Extract the GeD spline regression coefficients
coef(Gmod, n = 3)

# Extract all the coefficients, including the one for the linear component
coef(Gmod, onlySpline = FALSE, n = 3)

# Generate a data sample for the response variable
# and the covariates
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N ,min = -2, max = 2))
Z <- runif(N)
# Specify a model for the mean of the response Y to be a superposition of
# a non-linear component f_1(X), a linear component 2*Z and a
# free term 1, i.e.
means <- f_1(X) + 2*Z + 1
# Add normal noise to the mean of y
Y <- rnorm(N, means, sd = 0.1)

# Fit to this sample a predictor model of the form f(X) + Z, where
# f(X) is the GeDS component and Z is the linear (additive) component
# see ?formula.GeDS for details
(Gmod <- NGeDS(Y ~ f(X) + Z, beta = 0.6, phi = 0.995, Xextr = c(-2,2)))

# Extract the GeD spline regression coefficients
coef(Gmod, n = 3)

# Extract all the coefficients, including the one for the linear component
coef(Gmod, onlySpline = FALSE, n = 3)

Coef method for GeDSboost, GeDSgam

Description

Methods for the functions coef and coefficients that allow to extract the estimated coefficients of a GeDSboost-class or GeDSgam-class object.

Usage

## S3 method for class 'GeDSboost'
coef(object, n = 3L, ...)

## S3 method for class 'GeDSboost'
coefficients(object, n = 3L, ...)

## S3 method for class 'GeDSgam'
coef(object, n = 3L, ...)

## S3 method for class 'GeDSgam'
coefficients(object, n = 3L, ...)
## S3 method for class 'GeDSboost'
coef(object, n = 3L, ...)

## S3 method for class 'GeDSboost'
coefficients(object, n = 3L, ...)

## S3 method for class 'GeDSgam'
coef(object, n = 3L, ...)

## S3 method for class 'GeDSgam'
coefficients(object, n = 3L, ...)

Arguments

object

the GeDSboost-class or GeDSgam-class object from which the coefficients should be extracted.

n

integer value (2, 3 or 4) specifying the order ( $=$ degree $+ 1$ ) of the FGB-GeDS/GAM-GeDS fit whose coefficients should be extracted.

If n = 2L the piecewise polynomial coefficients of the univariate GeDS base-learners are provided. For bivariate GeDS base-learners, and if class(object) == "GeDSboost", the B-spline coefficients for each iteration where the corresponding bivariate base-learner was selected are provided. For bivariate base-learners, and if class(object) == "GeDSgam", the final local-scoring B-spline coefficients are provided.
If n = 3L or n = 4L B-spline coefficients are provided for both univariate and bivariate GeDS learners.

By default n is equal to 3L. Non-integer values will be passed to the function as.integer.

...

potentially further arguments (required by the definition of the generic function). These will be ignored, but with a warning.

Value

A named vector containing the required coefficients of the fitted multivariate predictor model.

k-fold cross-validation

Description

crossv_GeDS performs k-fold cross-validation for tuning the relevant parameters of the NGeDS, GGeDS, NGeDSgam, and NGeDSboost models.

Arguments

`formula`	a description of the structure of the model structure, including the dependent and independent variables.
`data`	a data frame containing the variables referenced in the formula.
`model_fun`	the GeDS model to be fitted, that is, `NGeDS`, `GGeDS`, `NGeDSgam` or `NGeDSboost`.
`parameters`	to tune via cross-validation. These are: `beta`, `phi` and `q` in the case of `NGeDS`, `GGeDS` and `NGeDSgam`. In addition, for `NGeDSboost`, `int.knots_init` and `shrinkage` can also be tuned. Default values are `int.knots_init_grid = c(0, 1, 2)`, `shrinkage_grid = c(0.1, 0.5, 1)`, `beta_grid = c(0.3, 0.5, 0.7)`, `phi_grid = c(0.9, 0.95, 0.99)`, `q_grid = c(2, 3))`.

Value

Two data frames, best_params and results. best_params contains the best combination of parameters according to the cross-validated MSE. results presents the cross-validated MSE and the average number of internal knots across the folds for each possible combination of parameters, given the input parameters. In the case of model_fun = NGeDSboost, it also provides the cross-validated number of boosting iterations.

Examples

###################################################
# Generate a data sample for the response variable
# Y and the single covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)
data = data.frame(X = X, Y = Y)

## Not run: 
## NGeDS
# Define different combinations of parameters to cross-validate
param = list(beta_grid = c(0.5),
             phi_grid = c(0.9, 0.95),
             q_grid = c(2))

cv_NGeDS <- crossv_GeDS(Y ~ f(X), data = data, NGeDS, n = 3L,
                        parameters = param)

print(cv_NGeDS$best_params)
View(cv_NGeDS$results)

## NGeDSboost
param = list(int.knots_init_grid = c(1, 2),
             shrinkage_grid = 1,
             beta_grid = c(0.3, 0.5),
             phi_grid = c(0.95, 0.99),
             q_grid = 2)

cv_NGeDSboost <- crossv_GeDS(Y ~ f(X), data = data, NGeDSboost, n = 2L,
                             n_folds = 2L, parameters = param)

print(cv_NGeDSboost$best_params)
View(cv_NGeDSboost$results)

## End(Not run)

###################################################
# Generate a data sample for the response variable
# Y and the single covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)
data = data.frame(X = X, Y = Y)

## Not run: 
## NGeDS
# Define different combinations of parameters to cross-validate
param = list(beta_grid = c(0.5),
             phi_grid = c(0.9, 0.95),
             q_grid = c(2))

cv_NGeDS <- crossv_GeDS(Y ~ f(X), data = data, NGeDS, n = 3L,
                        parameters = param)

print(cv_NGeDS$best_params)
View(cv_NGeDS$results)

## NGeDSboost
param = list(int.knots_init_grid = c(1, 2),
             shrinkage_grid = 1,
             beta_grid = c(0.3, 0.5),
             phi_grid = c(0.95, 0.99),
             q_grid = 2)

cv_NGeDSboost <- crossv_GeDS(Y ~ f(X), data = data, NGeDSboost, n = 2L,
                             n_folds = 2L, parameters = param)

print(cv_NGeDSboost$best_params)
View(cv_NGeDSboost$results)

## End(Not run)

Crystallographic Scattering Data

Description

This dataset contains crystallographic measurements obtained from a particle accelerator experiment. The measurements correspond to the function $F(Q)$ at various $Q$ values, which are used to analyze the scattering properties of an unknown crystalline material. The dataset is available in two versions based on the precision of the measurements:

- **'CrystalData10k'** (lower precision); - **'CrystalData300k'** (higher precision).

The goal of the experiment is to estimate $F(Q)$ from noisy data using a GeDS model and compute its Fourier transform, which provides valuable insights into the structure of the material.

Usage

data(CrystalData10k)

data(CrystalData300k)
data(CrystalData10k)

data(CrystalData300k)

Format

A data.frame with 1721 observations and 2 variables:

Q ( $\text{Å}^{-1}$ ): The scattering vector, measured in inverse angstroms ( $\text{Å}^{-1}$ ).
FQ (a.u.): The measured function $F(Q)$ , given in arbitrary units (a.u.).

Source

Data collected from a particle accelerator experiment.

Examples

## Not run: 
# Load the dataset (choose 10k or 300k version)
data('CrystalData10k')

# Fit a GeDS/GeDSboost model and compare how well the intensity peaks are captured
Gmod <- NGeDS(F_Q ~ f(Q), data = CrystalData10k, phi = 0.999, q = 3)
# for CrystalData300k set int.knots_init = 1, phi = 0.999, q = 4, instead
Gmodboost <- NGeDSboost(F_Q ~ f(Q), data = CrystalData10k, phi = 0.9975, q = 4) 

par(mfrow = c(1,2))
plot(Gmod, n = 2)
plot(Gmodboost, n = 2) 

## End(Not run)

## Not run: 
# Load the dataset (choose 10k or 300k version)
data('CrystalData10k')

# Fit a GeDS/GeDSboost model and compare how well the intensity peaks are captured
Gmod <- NGeDS(F_Q ~ f(Q), data = CrystalData10k, phi = 0.999, q = 3)
# for CrystalData300k set int.knots_init = 1, phi = 0.999, q = 4, instead
Gmodboost <- NGeDSboost(F_Q ~ f(Q), data = CrystalData10k, phi = 0.9975, q = 4) 

par(mfrow = c(1,2))
plot(Gmod, n = 2)
plot(Gmodboost, n = 2) 

## End(Not run)

Derivative of GeDS objects

Description

This function computes derivatives of a fitted GeDS regression model.

Usage

Derive(object, order = 1L, x, n = 3L)
Derive(object, order = 1L, x, n = 3L)

Arguments

`object`	the `GeDS-Class` object containing the GeDS fit which should be differentiated. It should be the result of fitting a univariate GeDS regression via `NGeDS` or `GGeDS`.
`order`	integer value indicating the order of differentiation required (e.g. first, second or higher derivatives). Note that `order` should be lower than `n` and that non-integer values will be passed to the function `as.integer`.
`x`	numeric vector containing values of the independent variable at which the derivatives of order `order` should be computed.
`n`	integer value (2, 3 or 4) specifying the order ( $=$ degree $+ 1$ ) of the GeDS fit to be differentiated. By default equal to `3L`.

Details

The function is based on splineDesign and it computes the exact derivative of the fitted GeDS regression.

The function uses the property that the $m$ -th derivative of a spline, $m= 1,2,...$ , expressed in terms of B-splines can be computed by differentiating the corresponding B-spline coefficients (see e.g. De Boor, 2001, Chapter X, formula (15)). Since the GeDS fit is a B-spline representation of the predictor, it cannot work on the response scale in the GNM (GLM) framework.

References

De Boor, C. (2001). A Practical Guide to Splines (Revised Edition). Springer, New York.

Examples


# Generate a data sample for the response variable
# Y and the covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only
# a component non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)

# Fit GeDS regression with only a spline component in the predictor model
Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2))

# Compute the second derivative of the cubic GeDS fit
# at the points 0, -1 and 1
Derive(Gmod, x = c(0, -1, 1), order = 2, n = 4)

# Generate a data sample for the response variable
# Y and the covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only
# a component non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)

# Fit GeDS regression with only a spline component in the predictor model
Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2))

# Compute the second derivative of the cubic GeDS fit
# at the points 0, -1 and 1
Derive(Gmod, x = c(0, -1, 1), order = 2, n = 4)

Deviance method for GeDS, GeDSboost, GeDSgam

Description

Method for the function deviance that allows the user to extract the value of the deviance corresponding to a selected GeDS, GeDSboost or GeDSgam fit from a GeDS-Class, GeDSboost-Class or GeDSgam-Class object.

Usage

## S3 method for class 'GeDS'
deviance(object, n = 3L, ...)

## S3 method for class 'GeDSboost'
deviance(object, n = 3L, ...)

## S3 method for class 'GeDSgam'
deviance(object, n = 3L, ...)
## S3 method for class 'GeDS'
deviance(object, n = 3L, ...)

## S3 method for class 'GeDSboost'
deviance(object, n = 3L, ...)

## S3 method for class 'GeDSgam'
deviance(object, n = 3L, ...)

Arguments

`object`	the `GeDS-class`, `GeDSboost-class` or `GeDSgam-class` object from which the deviance should be extracted.
`n`	integer value (2, 3 or 4) specifying the order ( $=$ degree $+ 1$ ) of the GeDS/GeDSboost/GeDSgam fit whose deviance should be extracted. By default equal to `3L`. Non-integer values will be passed to the function `as.integer`.
`...`	potentially further arguments (required by the definition of the generic function). These will be ignored, but with a warning.

Details

This is a method for the function deviance. As GeDS-class, GeDSboost-class and GeDSgam-class objects contain three different fits (linear, quadratic and cubic), it is possible to specify the order of the GeDS fit for which the deviance is required via the input argument n.

Value

A numeric value corresponding to the deviance of the selected GeDS/GeDSboost/GeDSgam fit.

Death counts in England and Wales

Description

The dataset consists of information about the mortality of the English and Welsh male population aggregated over the years 2000, 2001 and 2002.

Usage

data(EWmortality)
data(EWmortality)

Format

A data.frame with 109 entries and 3 variables: Age, Deaths and Exposure. Exposure is a mid-year estimate of the population exposed to risk.

Defining the covariates for the spline component in a GeDS formula.

Description

In general the GeDS predictor model may include a GeD spline regression component with respect to one or two independent variables and a parametric component in which the remaining covariates may enter as additive terms.

The function f is to be used in the formula argument of NGeDS or GGeDS in order to specify which independent variables (covariates) should be included in the GeD spline regression component of the predictor model.

Usage

f(x, xx = NULL, ...)
f(x, xx = NULL, ...)

Arguments

`x`	numeric vector containing $N$ sample values of the covariate chosen to enter the spline regression component of the predictor model.
`xx`	numeric vector containing $N$ sample values for the second covariate (in case `NGeDS` is run for two dimensions). It has to be either `NULL` (the default) or a vector of size $N$ , same as `x`.
`...`	further arguments. As GeDS currently allows for up to two covariates, specification of further arguments will return an error.

Note

This function is intended to be used only as part of the formula in a GeDS regression via NGeDS or GGeDS and not to be called in other cases by the user.

Examples

# Generate a data sample for the response variable Y and
# the covariates X, reg1, reg2 and off
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N ,min = -2, max = 2))
reg1 <- runif(500, min = -0.1, max = 0.1)
reg2 <- runif(500, min = -0.2, max = 0.2)
off <- runif(500, min = -1, max = 1)
# Specify a model for the mean of Y to include a component non linear
# in X defined by the function f_1 and a linear one in the other covariates
means <- f_1(X) + 2*reg1 + 0.5*reg2 + off
# Add Normal noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)

# Specify a formula that will be used to model Y as a
# function of X, reg1, reg2 and off.
# The covariate X is for the spline component modeled as GeDS,
# reg1 and reg2 enter linearly, off is an offset, i.e. no coefficient
# will be estimated for it
formula <- Y ~ f(X) + reg1 + reg2 + offset(off)

# Fit a GeDS model specified in formula using NGeDS
(Gmod <- NGeDS(formula, beta = 0.6, phi = 0.995, Xextr = c(-2,2)))


# Generate a data sample for the response variable Y and
# the covariates X, reg1, reg2 and off
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N ,min = -2, max = 2))
reg1 <- runif(500, min = -0.1, max = 0.1)
reg2 <- runif(500, min = -0.2, max = 0.2)
off <- runif(500, min = -1, max = 1)
# Specify a model for the mean of Y to include a component non linear
# in X defined by the function f_1 and a linear one in the other covariates
means <- f_1(X) + 2*reg1 + 0.5*reg2 + off
# Add Normal noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)

# Specify a formula that will be used to model Y as a
# function of X, reg1, reg2 and off.
# The covariate X is for the spline component modeled as GeDS,
# reg1 and reg2 enter linearly, off is an offset, i.e. no coefficient
# will be estimated for it
formula <- Y ~ f(X) + reg1 + reg2 + offset(off)

# Fit a GeDS model specified in formula using NGeDS
(Gmod <- NGeDS(formula, beta = 0.6, phi = 0.995, Xextr = c(-2,2)))

Formula for the predictor model

Description

A description of the structure of the predictor model fitted using NGeDS or GGeDS.

Usage

## S3 method for class 'GeDS'
formula(x, ...)
## S3 method for class 'GeDS'
formula(x, ...)

Arguments

`x`	fitted `GeDS-class` object, produced by `NGeDS` or `GGeDS`, from which the predictor model `formula` should be extracted.
`...`	unused in this case.

Details

In GeDS GNM (GLM) regression, implemented with NGeDS and GGeDS, the mean of the response variable, correspondingly transformed through an appropriate link function, is modeled using a potentially multivariate predictor model. The latter comprises two components: a GeD variable-knot spline regression involving up to two of the independent variables, and a parametric component for the remaining independent variables. The formula defines the structure of this potentially multivariate predictor.

The formulae that are input in NGeDS and GGeDS are similar to those input in lm or glm except that the function f should be specified in order to identify which of the covariates enter the GeD spline regression part of the predictor model. For example, if the predictor model is univariate and it links the transformed mean of y to x1, the predictor has only a GeD spline component and the formula should be in the form y ~ f(x1).

As noted, there may be additional independent variables, x2, x3, ... which may enter linearly into the parametric component of the predictor model and not be part of the GeD spline regression component. For example one may use the formula y ~ f(x1) + x2 + x3 which assumes a spline regression only between the transformed mean of y and x1, while x2 and x3 enter the predictor model linearly.

Both function NGeDS and function GGeDS, generate bivariate GeDS regression models. Therefore, if the functional dependence of the mean of the response variable y on x1 and x2 needs to be jointly modeled and there are no other covariates, the formula for the corresponding two dimensional predictor model should be specified as y ~ f(x1,x2).

Within the argument formula, similarly as in other R functions, it is possible to specify one or more offset variables, i.e. known terms with fixed regression coefficients equal to 1. These terms should be identified via the function offset.

GeDS Class

Description

A fitted GeDS object returned by the function NGeDS or GGeDS, inheriting the methods for class "GeDS". Methods for functions coef, knots, print, predict, plot, and lines are available.

Slots

Type: Character string indicating the type of regression performed. This can be "LM - Univ"/"LM - Biv" or "GLM - Univ"/"GLM - Biv", respectively corresponding to Normal univariate/bivariate GeDS (implemented by NGeDS), and to generalized (GNM-GLM) univariate/bivariate GeDS (implemented by GGeDS).
Linear.Knots: vector containing the locations of the knots of the second order GeD spline fit produced at stage A.
Quadratic.Knots: vector containing the locations of the knots of the third order GeD spline fit produced in stage B.
Cubic.knots: Vector containing the locations of the knots of the fourth order GeD spline fit produced in stage B.
Dev.Linear: deviance of the second order GeD spline fit, produced in stage A.
Dev.Quadratic: deviance of the third order GeD spline fit, produced in stage B.
Dev.Cubic: deviance of the fourth order GeD spline fit, produced in stage B.
RSS: vector containing the deviances of the second order spline fits computed at each stage A's GeDS iteration.
Linear: list containing the results from running SplineReg function to fit the second order spline fit of stage A.
Quadratic: list containing the results from running SplineReg function used to fit the third order spline fit in stage B.
Cubic: list containing the results from a SplineReg function used to fit the fourth order spline fit in stage B.
Stored: Matrix containing the knot locations estimated at each iteration of stage A.
Args: list containing the input arguments passed on the Fitters functions.
Call: call to the Fitters functions.
Nintknots: the final number of internal knots of the second order GeD spline fit produced in stage A.
iters: number of iterations performed during stage A of the GeDS fitting procedure.
Guesses: initial values for the coefficients used at each iteration of stage A in order to estimate the spline coefficients. Since the initial values are used only in the IRLS procedure, this slot is empty if the object is not created by GGeDS or GenUnivariateFitter functions.
Coefficients: matrix containing the fitted coefficients of the GeD spline regression component and the parametric component at each iteration of stage A.
deviance: vector containing the deviances of the second order spline fits computed at each IRLS iteration in stage A. Since the IRLS procedure is used only in GGeDS or GenUnivariateFitter, this slot is empty if the object is not created by one of these functions.
iterIrls: vector containing the numbers of IRLS iterations for all iterations of stage A cumulatively. Since the IRLS procedure is used only in GGeDS or GenUnivariateFitter, this slot is empty if the object is not created by one of these functions.
stopinfo: list of values providing information related to the stopping rule of stage A of GeDS. The sub-slots of stopinfo are phis, phis_star, oldintc and oldslp. The sub-slot phis is a vector containing the values of the ratios of deviances (or the difference of deviances if the LR stopping rule was chosen). The sub-slots phis_star, oldintc and oldslp are non-empty slots if the SR stopping rule was chosen. These respectively contain the values at each iteration of stage A of $\hat{\phi}_{\kappa}$ , $\hat{\gamma}_0$ and $\hat{\gamma}_1$ . See Dimitrova et al. (2023) for further details on these parameters.
Formula: the model formula.
extcall: call to the NGeDS or GGeDS functions.
terms: terms object containing information on the model frame.

References

GeDSboost Class

Description

A fitted GeDSboost object returned by the function NGeDSboost, inheriting the methods for class "GeDSboost". Methods for functions coef, knots, plot, print, predict, visualize_boosting, and bl_imp are available.

Slots

extcall

call to the NGeDSboost function.

formula

a formula object representing the model to be fitted.

args

a list containing the arguments passed to the NGeDSboost function. This list includes:

response: data.frame containing the response variable observations.
predictors: data.frame containing the observations corresponding to the predictor variables included in the model.
base_learners: description of model's base learners.
family: the statistical family; the possible options are
- mboost::Binomial(type = c("adaboost", "glm"), link = c("logit", "probit", "cloglog", "cauchit", "log"), ...)
- mboost::Gaussian()
- mboost::Poisson()
- mboost::GammaReg(nuirange = c(0, 100))
Other mboost families may be suitable; however, these have not yet been thoroughly tested and are therefore not recommended for use.
initial_learner: if TRUE a NGeDS or GGeDS fit was used as the initial learner; otherwise, the empirical risk minimizer corresponding to the selected family was employed.
int.knots_init: if initial_learner = TRUE, this corresponds to the maximum number of internal knots set in the NGeDS/GGeDS function before the initial learner fit.
shrinkage: shrinkage/step-length/learning rate utilized throughout the boosting iterations.
normalize_data: if TRUE, then response and predictors were standardized before running the FGB algorithm.
X_mean: mean of the predictor variables (only if normalize_data = TRUE, otherwise this is NULL).
X_sd: standard deviation of the predictors (only if normalize_data = TRUE, otherwise this is NULL).
Y_mean: mean of the response variable (only if normalize_data = TRUE, otherwise this is NULL).
Y_sd: standard deviation of the response variable (only if normalize_data = TRUE, otherwise this is NULL).

models

A list containing the 'model' generated at each boosting iteration. Each of these models includes:

best_bl: fit of the base learner that minimized the residual sum of squares (RSS) in fitting the gradient at the i-th boosting iteration.
Y_hat: model fitted values at the i-th boosting iteration.
base_learners: knots and polynomial coefficients for each of the base-learners at the i-th boosting iteration.

final_model

A list detailing the final GeDSboost model after the gradient descent algorithm is run:

model_name: the boosting iteration corresponding to the final model.
DEV: deviance of the final model.
Y_hat: fitted values.
base_learners: a list containing, for each base-learner, the intervals defined by the piecewise linear fit and its corresponding polynomial coefficients. It also includes the knots corresponding to each order fit, which result from computing the corresponding averaging knot location. See Kaishev et al. (2016) for details. If the number of internal knots of the final linear fit is less than $n-1$, the averaging knot location is not computed.
Linear.Fit/Quadratic.Fit/Cubic.Fit: final linear, quadratic and cubic fits in B-spline form. These include the same elements as Linear, Quadratic and Cubic in a GeDS-class object (see SplineReg for details).

predictions

a list containing the predicted values obtained for each of the fits (linear, quadratic and cubic).

internal_knots

a list detailing the internal knots obtained for each of the different order fits (linear, quadratic, and cubic).

References

GeDSgam Class

Description

A fitted GeDSgam object returned by the function NGeDSgam, inheriting the methods for class "GeDSgam". Methods for functions coef, knots, plot, print and predict are available.

Slots

extcall

call to the NGeDSgam function.

formula

a formula object representing the model to be fitted.

args

a list containing the arguments passed to the NGeDSgam function. This list includes:

response: data.frame containing the response variable observations.
predictors: data.frame containing the corresponding observations of the predictor variables included in the model.
base_learners: description of the model's base learners ('smooth functions').
family: the statistical family. The possible options are
- binomial(link = "logit", "probit", "cauchit", "log", "cloglog")
- gaussian(link = "identity", "log", "inverse")
- Gamma(link = "inverse", "identity", "log")
- inverse.gaussian(link = "1/mu^2", "inverse", "identity", "log")
- poisson(link = "log", "identity", "sqrt")
- quasi(link = "identity", variance = "constant")
- quasibinomial(link = "logit", "probit", "cloglog", "identity", "inverse", "log", "1/mu^2", "sqrt")
- quasipoisson(llink = "logit", "probit", "cloglog", "identity", "inverse", "log", "1/mu^2", "sqrt")
normalize_data: if TRUE, then response and predictors were standardized before running the local-scoring algorithm.
X_mean: mean of the predictor variables (only if normalize_data = TRUE).
X_sd: standard deviation of the predictors (only if normalize_data = TRUE, else is NULL).
Y_mean: mean of the response variable (only if normalize_data = TRUE, else is NULL).
Y_sd: standard deviation of the response variable (only if normalize_data = TRUE, else is NULL).

final_model

A list detailing the final GeDSgam model selected after running the local scoring algorithm. The chosen model minimizes deviance across all models generated by each local-scoring iteration. This list includes:

model_name: local-scoring iteration that yielded the "best" model. Note that when family = "gaussian", it will always correspond to iter1, as only one local-scoring iteration is conducted in this scenario. This occurs because, with family = "gaussian", the algorithm is tantamount to directly implementing backfitting.
DEV: the deviance for the fitted predictor model, defined as in Dimitrova et al. (2023), which for family = "gaussian" coincides with the Residual Sum of Squares.
Y_hat: fitted values.
- eta: additive predictor.
- mu: vector of means.
- z: adjusted dependent variable.
base_learners: a list containing, for each base-learner, the corresponding linear fit piecewise polynomial coefficients. It includes the knots for each order fit, resulting from computing the averaging knot location. Although if the number of internal knots of the final linear fit is less than $n-1$, the averaging knot location is not computed.
Linear.Fit: final model linear fit in B-spline form. See SplineReg for details.
Quadratic.Fit: quadratic fit obtained via Schoenberg variation diminishing spline approximation. See SplineReg for details.
Cubic.Fit: cubic fit obtained via Schoenberg variation diminishing spline approximation. See SplineReg for details.

predictions

A list containing the predicted values obtained for each of the fits (linear, quadratic, and cubic). Each of the predictions contains both the additive predictor eta and the vector of means mu.

internal_knots

A list detailing the internal knots obtained for the fits of different order (linear, quadratic, and cubic).

References

Generalized Geometrically Designed Spline regression estimation

Description

GGeDS constructs a Geometrically Designed (univariate or bivariate) variable knots spline regression model for the predictor in the context of Generalized (Non-)Linear Models. This is referred to as a GeDS model for a response with a distribution from the Exponential Family.

Usage

GGeDS(
  formula,
  family = gaussian(),
  data,
  weights,
  beta,
  phi = 0.99,
  min.intknots,
  max.intknots,
  q = 2L,
  Xextr = NULL,
  Yextr = NULL,
  show.iters = FALSE,
  stoptype = "SR",
  higher_order = TRUE
)
GGeDS(
  formula,
  family = gaussian(),
  data,
  weights,
  beta,
  phi = 0.99,
  min.intknots,
  max.intknots,
  q = 2L,
  Xextr = NULL,
  Yextr = NULL,
  show.iters = FALSE,
  stoptype = "SR",
  higher_order = TRUE
)

Arguments

`formula`	a description of the structure of the predictor model to be fitted, including the dependent and independent variables. See `formula` for details.
`family`	a description of the error distribution and link function to be used in the model. This can be a character string naming a family function (e.g. `"gaussian"`), the family function itself (e.g. `gaussian`) or the result of a call to a family function (e.g. `gaussian()`). See family for details on family functions.
`data`	an optional data frame, list or environment containing the variables of the predictor model. If the formula variables are not found in `data`, they are taken from `environment(formula)`, typically the environment from which `GGeDS` is called.
`weights`	an optional vector of ‘prior weights’ to be put on the observations during the fitting process in case the user requires weighted GeDS fitting. It is `NULL` by default.
`beta`	numeric parameter in the interval $[0,1]$ tuning the knot placement in stage A of GeDS. See details below.
`phi`	numeric parameter in the interval $[0,1]$ specifying the threshold for the stopping rule (model selector) in stage A of GeDS. See also `stoptype` and details below.
`min.intknots`	optional parameter allowing the user to set a minimum number of internal knots to be fit in stage A. By default equal to zero.
`max.intknots`	optional parameter allowing the user to set a maximum number of internal knots to be added by the stage A's GeDS estimation algorithm. By default equal to the number of knots for the saturated GeDS model (i.e. $N-2$ , where $N$ is the number of observations).
`q`	numeric parameter which allows to fine-tune the stopping rule of stage A of GeDS, by default equal to 2. See details below.
`Xextr`	numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the observations of the independent variable. See details.
`Yextr`	numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the observations of the second independent variable (if bivariate GeDS is run). See details.
`show.iters`	logical variable indicating whether or not to print information of the fit at each GeDS iteration. By default equal to `FALSE`.
`stoptype`	a character string indicating the type of GeDS stopping rule to be used. It should be either one of `"SR"`, `"RD"` or `"LR"`, partial match allowed. See details below.
`higher_order`	a logical that defines whether to compute the higher order fits (quadratic and cubic) after stage A is run. Default is `TRUE`.

Details

The GGeDS function extends the GeDS methodology, developed by Kaishev et al. (2016) and implemented in the NGeDS function for the Normal case, to the more general GNM (GLM) context, allowing for the response to have any distribution from the Exponential Family. Under the GeDS-GNM approach the (non-)linear predictor is viewed as a spline with variable knots that are estimated along with the regression coefficients and the order of the spline, using a two stage procedure. In stage A, a linear variable-knot spline is fitted to the data applying iteratively re-weighted least squares (see IRLSfit function). In stage B, a Schoenberg variation diminishing spline approximation to the fit from stage A is constructed, thus simultaneously producing spline fits of order 2, 3 and 4, all of which are included in the output, a GeDS-Class object. A detailed description of the underlying algorithm can be found in Dimitrova et al. (2023).

As noted in formula, the argument formula allows the user to specify predictor models with two components: a spline regression (non-parametric) component involving part of the independent variables identified through the function f, and an optional parametric component involving the remaining independent variables. For GGeDS only one or two independent variables are allowed for the spline component and arbitrary many independent variables for the parametric component of the predictor. Failure to specify the independent variable for the spline regression component through the function f will return an error. See formula.

The parameter beta tunes the placement of a new knot in stage A of the algorithm. At the beginning of each GeDS iteration, a second-order spline is fitted to the data. As follows, the 'working' residuals (see IRLSfit) are computed and grouped by their sign. A new knot is then placed at a location defined by the cluster that maximizes a certain measure. This measure is defined as a weighted linear combination of the range of the independent variable at each cluster and the mean of the absolute residuals within it. The parameter beta determines the weights in this measure correspondingly: beta and 1 - beta. The higher beta is, the more weight is put to the mean of the residuals and the less to the range of their corresponding x-values (see Kaishev et al., 2016, for further details).

The default values of beta are beta = 0.5 if the response is assumed to be Gaussian, beta = 0.2 if it is Poisson (or Quasipoisson), while if it is Binomial, Quasibinomial or Gamma beta = 0.1, which reflect our experience of running GeDS for different underlying functional dependencies.

The argument stoptype allows to choose between three alternative stopping rules for the knot selection in stage A of GeDS: "RD", that stands for Ratio of Deviances; "SR", that stands for Smoothed Ratio of deviances; and "LR", that stands for Likelihood Ratio. The latter is based on the difference of deviances rather than on their ratio as in the case of "RD" and "SR". Therefore "LR" can be viewed as a log likelihood ratio test performed at each iteration of the knot placement. In each of these cases the corresponding stopping criterion is compared with a threshold value phi (see below).

The argument phi provides a threshold value required for the stopping rule to exit the knot placement in stage A of GeDS. The higher the value of phi, the more knots are added under the "RD" and "SR" stopping rules contrary to the case of the stopping rule "LR" where the lower phi is, more knots are included in the spline regression. Further details for each of the three alternative stopping rules can be found in Dimitrova et al. (2023).

The argument q is an input parameter that fine-tunes the stopping rule in stage A. It specifies the number of consecutive iterations over which the deviance must exhibit stable convergence to terminate knot placement in stage A. Specifically, under any of the rules "RD", "SR" or "LR" the deviance at the current iteration is compared to the deviance computed q iterations before, i.e. before introducing the last q knots.

Value

A GeDS-Class object, i.e. a list of items that summarizes the main details of the fitted GeDS regression. See GeDS-Class for details. Some S3 methods are available in order to make these objects tractable, such as coef, deviance, knots, predict and print as well as S4 methods for lines and plot.

References

Examples

######################################################################
# Generate a data sample for the response variable Y and the covariate X
# assuming Poisson distributed error and log link function
# See section 4.1 in Dimitrova et al. (2023)
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- exp(f_1(X))

#############
## POISSON ##
#############
# Generate Poisson distributed Y according to the mean model
Y <- rpois(N, means)

# Fit a Poisson GeDS regression using GGeDS
(Gmod <- GGeDS(Y ~ f(X), beta = 0.2, phi = 0.99, q = 2, family = poisson(),
               Xextr = c(-2,2)))
# Plot the quadratic and cubic GeDS fits
plot(X, log(Y), xlab = "x", ylab = expression(f[1](x)))
lines(X, sapply(X, f_1), lwd = 2)
lines(Gmod, n = 3, col = "red")
lines(Gmod, n = 4, col = "blue", lty = 2)
legend("topleft",
       legend = expression(f[1](x), "Quadratic", "Cubic"),
       col = c("black", "red", "blue"),
       lty = c(1, 1, 2),
       lwd = c(2, 1, 1),
       bty = "n")

# Generate GeDS prediction at X=0, first on the response scale and then on
# the predictor scale
predict(Gmod, n = 3, newdata = data.frame(X = 0))
predict(Gmod, n = 3, newdata = data.frame(X = 0), type = "link")

# Apply some of the other available methods, e.g.
# knots, coefficients and deviance extractions for the
# quadratic GeDS fit
knots(Gmod)
coef(Gmod)
deviance(Gmod)

# the same but for the cubic GeDS fit
knots(Gmod, n = 4)
coef(Gmod, n = 4)
deviance(Gmod, n = 4)

###########
## GAMMA ##
###########
# Generate Gamma distributed Y according to the mean model
Y <- rgamma(N, shape = means, rate = 0.1)
# Fit a Gamma GeDS regression using GGeDS
Gmod <- GGeDS(Y ~ f(X), beta = 0.1, phi = 0.99, q = 2, family = Gamma(log),
              Xextr = c(-2,2))
plot(Gmod, f = function(x) exp(f_1(x))/0.1)

##############
## BINOMIAL ##
##############
# Generate Binomial distributed Y according to the mean model
eta <- f_1(X) - 4
means <- exp(eta)/(1+exp(eta))
Y <- rbinom(N, size = 50, prob = means) / 50
# Fit a Binomial GeDS regression using GGeDS
Gmod <- GGeDS(Y ~ f(X), beta = 0.1, phi = 0.99, family = "quasibinomial",
              Xextr = c(-2,2))
plot(Gmod, f = function(x) exp(f_1(x) - 4)/(1 + exp(f_1(x) - 4)))


##########################################
# A real data example
# See Dimitrova et al. (2023), Section 4.2

data("coalMining")
(Gmod2 <- GGeDS(formula = accidents ~ f(years), beta = 0.1, phi = 0.98,
                 family = poisson(), data = coalMining))
(Gmod3 <- GGeDS(formula = accidents ~ f(years), beta = 0.1, phi = 0.985,
                 family = poisson(), data = coalMining))
plot(coalMining$years, coalMining$accidents, type = "h", xlab = "Years",
     ylab = "Accidents")
lines(Gmod2, tr = exp, n = 4, col = "red")
lines(Gmod3, tr = exp, n = 4, col = "blue", lty = 2)
legend("topright", c("phi = 0.98","phi = 0.985"), col = c("red", "blue"),
       lty=c(1, 2))


## Not run: 
##########################################
# The same regression in the example of GeDS
# but assuming Gamma and Poisson responses
# See Dimitrova et al. (2023), Section 4.2

data('BaFe2As2')
(Gmod4 <- GGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.6, phi = 0.995, q = 3,
                family = Gamma(log), stoptype = "RD"))
plot(Gmod4)

(Gmod5 <- GGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.1, phi = 0.995, q = 3,
                family = poisson(), stoptype = "SR"))
plot(Gmod5)

## End(Not run)

##########################################
# Life tables
# See Dimitrova et al. (2023), Section 4.2

data(EWmortality)
attach(EWmortality)
(M1 <- GGeDS(formula = Deaths ~ f(Age) + offset(log(Exposure)),
              family = quasipoisson(), phi = 0.99, beta = 0.1, q = 3,
              stoptype = "LR"))

Exposure_init <- Exposure + 0.5 * Deaths
Rate <- Deaths / Exposure_init
(M2 <- GGeDS(formula = Rate ~ f(Age), weights = Exposure_init,
              family = quasibinomial(), phi = 0.99, beta = 0.1,
              q = 3, stoptype = "LR"))


op <- par(mfrow=c(2,2))
plot(Age, Deaths/Exposure, ylab = expression(mu[x]), xlab = "Age")
lines(M1, n = 3, tr = exp, lwd = 1, col = "red")
plot(Age, Rate, ylab = expression(q[x]), xlab = "Age")
lines(M2, n = 3, tr = quasibinomial()$linkinv, lwd = 1, col = "red")
plot(Age, log(Deaths/Exposure), ylab = expression(log(mu[x])), xlab = "Age")
lines(M1, n = 3, lwd = 1, col = "red")
plot(Age, quasibinomial()$linkfun(Rate), ylab = expression(logit(q[x])), xlab = "Age")
lines(M2, n = 3, lwd = 1, col = "red")
par(op)

#########################################
# bivariate example
set.seed(123)
doublesin <- function(x) {
# Adjusting the output to ensure it's positive
exp(sin(2*x[,1]) + sin(2*x[,2]))
}
X <- round(runif(400, min = 0, max = 3), 2)
Y <- round(runif(400, min = 0, max = 3), 2)
# Calculate lambda for Poisson distribution
lambda <- doublesin(cbind(X,Y))
# Generate Z from Poisson distribution
Z <- rpois(400, lambda)
data <- data.frame(X, Y, Z)

# Fit a Poisson GeDS regression using GGeDS
BivGeDS <- GGeDS(Z ~ f(X,Y), beta = 0.2, phi = 0.99, family = "poisson")

# Poisson mean deviance w.r.t data
deviance(BivGeDS, n = 2) # or sum(poisson()$dev.resids(Z, BivGeDS$Linear.Fit$Predicted, wt = 1))
deviance(BivGeDS, n = 3)
deviance(BivGeDS, n = 4)

# Poisson mean deviance w.r.t true function#' 
f_XY <- apply(cbind(X, Y), 1, function(row) doublesin(matrix(row, ncol = 2)))
mean(poisson()$dev.resids(f_XY, BivGeDS$Linear.Fit$Predicted, wt = 1))
mean(poisson()$dev.resids(f_XY, BivGeDS$Quadratic.Fit$Predicted, wt = 1))
mean(poisson()$dev.resids(f_XY, BivGeDS$Cubic.Fit$Predicted, wt = 1))

# Surface plot of the generating function (doublesin)
plot(BivGeDS, f = doublesin)
# Surface plot of the fitted model
plot(BivGeDS)

######################################################################
# Generate a data sample for the response variable Y and the covariate X
# assuming Poisson distributed error and log link function
# See section 4.1 in Dimitrova et al. (2023)
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- exp(f_1(X))

#############
## POISSON ##
#############
# Generate Poisson distributed Y according to the mean model
Y <- rpois(N, means)

# Fit a Poisson GeDS regression using GGeDS
(Gmod <- GGeDS(Y ~ f(X), beta = 0.2, phi = 0.99, q = 2, family = poisson(),
               Xextr = c(-2,2)))
# Plot the quadratic and cubic GeDS fits
plot(X, log(Y), xlab = "x", ylab = expression(f[1](x)))
lines(X, sapply(X, f_1), lwd = 2)
lines(Gmod, n = 3, col = "red")
lines(Gmod, n = 4, col = "blue", lty = 2)
legend("topleft",
       legend = expression(f[1](x), "Quadratic", "Cubic"),
       col = c("black", "red", "blue"),
       lty = c(1, 1, 2),
       lwd = c(2, 1, 1),
       bty = "n")

# Generate GeDS prediction at X=0, first on the response scale and then on
# the predictor scale
predict(Gmod, n = 3, newdata = data.frame(X = 0))
predict(Gmod, n = 3, newdata = data.frame(X = 0), type = "link")

# Apply some of the other available methods, e.g.
# knots, coefficients and deviance extractions for the
# quadratic GeDS fit
knots(Gmod)
coef(Gmod)
deviance(Gmod)

# the same but for the cubic GeDS fit
knots(Gmod, n = 4)
coef(Gmod, n = 4)
deviance(Gmod, n = 4)

###########
## GAMMA ##
###########
# Generate Gamma distributed Y according to the mean model
Y <- rgamma(N, shape = means, rate = 0.1)
# Fit a Gamma GeDS regression using GGeDS
Gmod <- GGeDS(Y ~ f(X), beta = 0.1, phi = 0.99, q = 2, family = Gamma(log),
              Xextr = c(-2,2))
plot(Gmod, f = function(x) exp(f_1(x))/0.1)

##############
## BINOMIAL ##
##############
# Generate Binomial distributed Y according to the mean model
eta <- f_1(X) - 4
means <- exp(eta)/(1+exp(eta))
Y <- rbinom(N, size = 50, prob = means) / 50
# Fit a Binomial GeDS regression using GGeDS
Gmod <- GGeDS(Y ~ f(X), beta = 0.1, phi = 0.99, family = "quasibinomial",
              Xextr = c(-2,2))
plot(Gmod, f = function(x) exp(f_1(x) - 4)/(1 + exp(f_1(x) - 4)))


##########################################
# A real data example
# See Dimitrova et al. (2023), Section 4.2

data("coalMining")
(Gmod2 <- GGeDS(formula = accidents ~ f(years), beta = 0.1, phi = 0.98,
                 family = poisson(), data = coalMining))
(Gmod3 <- GGeDS(formula = accidents ~ f(years), beta = 0.1, phi = 0.985,
                 family = poisson(), data = coalMining))
plot(coalMining$years, coalMining$accidents, type = "h", xlab = "Years",
     ylab = "Accidents")
lines(Gmod2, tr = exp, n = 4, col = "red")
lines(Gmod3, tr = exp, n = 4, col = "blue", lty = 2)
legend("topright", c("phi = 0.98","phi = 0.985"), col = c("red", "blue"),
       lty=c(1, 2))


## Not run: 
##########################################
# The same regression in the example of GeDS
# but assuming Gamma and Poisson responses
# See Dimitrova et al. (2023), Section 4.2

data('BaFe2As2')
(Gmod4 <- GGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.6, phi = 0.995, q = 3,
                family = Gamma(log), stoptype = "RD"))
plot(Gmod4)

(Gmod5 <- GGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.1, phi = 0.995, q = 3,
                family = poisson(), stoptype = "SR"))
plot(Gmod5)

## End(Not run)

##########################################
# Life tables
# See Dimitrova et al. (2023), Section 4.2

data(EWmortality)
attach(EWmortality)
(M1 <- GGeDS(formula = Deaths ~ f(Age) + offset(log(Exposure)),
              family = quasipoisson(), phi = 0.99, beta = 0.1, q = 3,
              stoptype = "LR"))

Exposure_init <- Exposure + 0.5 * Deaths
Rate <- Deaths / Exposure_init
(M2 <- GGeDS(formula = Rate ~ f(Age), weights = Exposure_init,
              family = quasibinomial(), phi = 0.99, beta = 0.1,
              q = 3, stoptype = "LR"))


op <- par(mfrow=c(2,2))
plot(Age, Deaths/Exposure, ylab = expression(mu[x]), xlab = "Age")
lines(M1, n = 3, tr = exp, lwd = 1, col = "red")
plot(Age, Rate, ylab = expression(q[x]), xlab = "Age")
lines(M2, n = 3, tr = quasibinomial()$linkinv, lwd = 1, col = "red")
plot(Age, log(Deaths/Exposure), ylab = expression(log(mu[x])), xlab = "Age")
lines(M1, n = 3, lwd = 1, col = "red")
plot(Age, quasibinomial()$linkfun(Rate), ylab = expression(logit(q[x])), xlab = "Age")
lines(M2, n = 3, lwd = 1, col = "red")
par(op)

#########################################
# bivariate example
set.seed(123)
doublesin <- function(x) {
# Adjusting the output to ensure it's positive
exp(sin(2*x[,1]) + sin(2*x[,2]))
}
X <- round(runif(400, min = 0, max = 3), 2)
Y <- round(runif(400, min = 0, max = 3), 2)
# Calculate lambda for Poisson distribution
lambda <- doublesin(cbind(X,Y))
# Generate Z from Poisson distribution
Z <- rpois(400, lambda)
data <- data.frame(X, Y, Z)

# Fit a Poisson GeDS regression using GGeDS
BivGeDS <- GGeDS(Z ~ f(X,Y), beta = 0.2, phi = 0.99, family = "poisson")

# Poisson mean deviance w.r.t data
deviance(BivGeDS, n = 2) # or sum(poisson()$dev.resids(Z, BivGeDS$Linear.Fit$Predicted, wt = 1))
deviance(BivGeDS, n = 3)
deviance(BivGeDS, n = 4)

# Poisson mean deviance w.r.t true function#' 
f_XY <- apply(cbind(X, Y), 1, function(row) doublesin(matrix(row, ncol = 2)))
mean(poisson()$dev.resids(f_XY, BivGeDS$Linear.Fit$Predicted, wt = 1))
mean(poisson()$dev.resids(f_XY, BivGeDS$Quadratic.Fit$Predicted, wt = 1))
mean(poisson()$dev.resids(f_XY, BivGeDS$Cubic.Fit$Predicted, wt = 1))

# Surface plot of the generating function (doublesin)
plot(BivGeDS, f = doublesin)
# Surface plot of the fitted model
plot(BivGeDS)

Defined integral of GeDS objects

Description

This function computes defined integrals of a fitted GeDS regression model.

Usage

Integrate(object = NULL, knots = NULL, coef = NULL, from, to, n = 3L)
Integrate(object = NULL, knots = NULL, coef = NULL, from, to, n = 3L)

Arguments

`object`	the `GeDS-class` object containing the GeDS fit which should be integrated. It should be the result of fitting a univariate GeDS regression via `NGeDS` or `GGeDS`. If this is provided, the `knots` and `coef` parameters will be automatically extracted from the `GeDS` object. If `object` is `NULL`, the user must provide the `knots` and `coef` vectors explicitly.
`knots`	a numeric vector of knots. This is required if `object` is `NULL`. If a `GeDS` object is provided, this parameter is ignored.
`coef`	a numeric vector of coefficients. This is required if `object` is `NULL`. If a `GeDS` object is provided, this parameter is ignored
`from`	optional numeric vector containing the lower limit(s) of integration. It should be either of size one or of the same size as the argument `to`. If left unspecified, by default it is set to the left-most limit of the interval embedding the observations of the independent variable.
`to`	numeric vector containing the upper limit(s) of integration.
`n`	integer value (2, 3 or 4) specifying the order ( $=$ degree $+ 1$ ) of the GeDS fit to be integrated. By default equal to `3L`. Non-integer values will be passed to the function `as.integer`.

Details

The function is based on the well known property (c.f. De Boor, 2001, Chapter X, formula (33)) that the integral of a linear combination of appropriately normalized B-splines is equal to the sum of its corresponding coefficients, noting that the GeDS regression is in fact such a linear combination.

Since the function is based on this property, it is designed to work only on the predictor scale in the GNM (GLM) framework.

If the argument from is a single value, then it is taken as the lower limit of integration for all the defined integrals required, whereas the upper limits of integration are the values contained in the argument to. If the arguments from and to are of similar size, the integrals (as many as the size) are computed by sequentially taking the pairs of values in the from and to vectors as limits of integration.

References

De Boor, C. (2001). A Practical Guide to Splines (Revised Edition). Springer, New York.

Examples


# Generate a data sample for the response variable
# Y and the single covariate X
# see Dimitrova et al. (2023), section 4.1
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only
# a component non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)
# Fit GeDS regression using NGeDS
Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = .995, Xextr = c(-2,2))
# Compute defined integrals (in TeX style) $\int_{1}^{-1} f(x)dx$
# and $\int_{1}^{1} f(x)dx$
# $f$ being the quadratic fit
Integrate(Gmod, from = 1, to = c(-1,1), n = 3)
# Compute defined integrals (in TeX style) $\int_{1}^{-1} f(x)dx$
# and $\int_{-1}^{1} f(x)dx$
# $f$ being the quadratic fit
Integrate(Gmod, from = c(1,-1), to = c(-1,1), n = 3)
## Not run:
## This gives an error
Integrate(Gmod, from = c(1,-1), to = c(1,1), n = 3)

# Generate a data sample for the response variable
# Y and the single covariate X
# see Dimitrova et al. (2023), section 4.1
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only
# a component non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)
# Fit GeDS regression using NGeDS
Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = .995, Xextr = c(-2,2))
# Compute defined integrals (in TeX style) $\int_{1}^{-1} f(x)dx$
# and $\int_{1}^{1} f(x)dx$
# $f$ being the quadratic fit
Integrate(Gmod, from = 1, to = c(-1,1), n = 3)
# Compute defined integrals (in TeX style) $\int_{1}^{-1} f(x)dx$
# and $\int_{-1}^{1} f(x)dx$
# $f$ being the quadratic fit
Integrate(Gmod, from = c(1,-1), to = c(-1,1), n = 3)
## Not run:
## This gives an error
Integrate(Gmod, from = c(1,-1), to = c(1,1), n = 3)

IRLS Estimation

Description

This function is an implementation of the IRLS estimation algorithm adjusted to the specific usage within the function SplineReg_GLM.

Usage

IRLSfit(
  x,
  y,
  weights = rep(1, nobs),
  mustart = NULL,
  offset = rep(0, nobs),
  family = gaussian(),
  control = list()
)
IRLSfit(
  x,
  y,
  weights = rep(1, nobs),
  mustart = NULL,
  offset = rep(0, nobs),
  family = gaussian(),
  control = list()
)

Arguments

`x`	a matrix of regression functions (e.g. B-splines and/or terms of the parametric part) evaluated at the sample values of the covariate(s).
`y`	a vector of size $N$ containing the observed values of the response variable $y$ .
`weights`	an optional vector of prior weights for the observations, used when weighted IRLS fitting is required. By default, this is a vector of 1s.
`mustart`	initial values for the vector of means of the response variable in the IRLS regression estimation. Must be a vector of length $N$ .
`offset`	a vector of size $N$ that can be used to specify a fixed covariate to be included in the predictor model avoiding the estimation of its corresponding regression coefficient. In the case that more than one covariate is fixed, the user should sum the corresponding coordinates of the fixed covariates to produce one common $N$ -vector of coordinates.
`family`	a description of the error distribution and link function to be used in the model. This can be a character string naming a family function (e.g. `"gaussian"`), the family function itself (e.g. `gaussian`) or the result of a call to a family function (e.g. `gaussian()`). See family for details on family functions.
`control`	a list of parameters for controlling the IRLS fitting process to be passed on to `glm.control`. See `glm.fit` for further details.

Details

This function is a slightly modified version of the glm.fit from the package stats to which we refer for further details. The difference in the inputs of IRLSfit and glm.fit is that the former admits initial values only for the vector of means.

The output from IRLSfit has some additional slots compared to glm.fit. We note that the slots weights, res2 and z contain values of the IRLS weights, “working residuals" and transformed responses computed after the last IRLS iteration, i.e. they are based on the estimated coefficients that are returned by IRLSfit.

The source code of IRLSfit contains also some commented lines that produce useful plots at each IRLS iteration. Normally, printing these plots is time consuming, but they could be run for inspection purposes.

Value

A list containing:

`coefficients`	a named vector containing the estimated regression coefficients;
`residuals`	the working residuals, which are the residuals from the final iteration of the IRLS fit. Cases with zero weights are omitted, and their working residuals are `NA`;
`res2`	the working residuals after the final IRLS iteration. They are used within the knot placement steps of stage A of GeDS;
`fitted.values`	the fitted mean values, obtained by transforming the predictor by the inverse of the link function;
`rank`	the numeric rank of the fitted linear model;
`family`	the `family` object used;
`linear.predictors`	the fitted predictor;
`deviance`	a vector containing the deviances obtained at each IRLS iteration;
`lastdeviance`	the deviance at the last IRLS iteration;
`null.deviance`	The deviance for the null model (see `glm` documentation);
`iter`	the number of IRLS iterations performed;
`weights`	the working weights after the last IRLS iteration;
`prior.weights`	the “prior weights" (see the `weights` argument);
`df.residual`	the residual degrees of freedom;
`df.null`	the residual degrees of freedom for the null model;
`y`	the vector of values of the response variable used in the fitting;
`z`	the transformed responses computed after the last IRLS iteration;
`converged`	logical. Was the IRLS algorithm judged to have converged?
`boundary`	logical. Is the fitted value on the boundary of the attainable values?

In addition, non-empty fits will have components qr, R and effects relating to the final weighted linear fit, see lm.fit documentation.

Knots method for GeDS, GeDSboost, GeDSgam

Description

Method for the generic function knots that allows the user to extract the vector of knots of a GeDS, GeDSboost or GeDSgam fit of a specified order contained in a GeDS-class, GeDSboost-class or GeDSgam-class object, respectively.

Usage

## S3 method for class 'GeDS'
knots(Fn, n = 3L, options = c("all", "internal"), ...)

## S3 method for class 'GeDSboost'
knots(Fn, n = 3L, options = c("all", "internal"), ...)

## S3 method for class 'GeDSgam'
knots(Fn, n = 3L, options = c("all", "internal"), ...)
## S3 method for class 'GeDS'
knots(Fn, n = 3L, options = c("all", "internal"), ...)

## S3 method for class 'GeDSboost'
knots(Fn, n = 3L, options = c("all", "internal"), ...)

## S3 method for class 'GeDSgam'
knots(Fn, n = 3L, options = c("all", "internal"), ...)

Arguments

`Fn`	the `GeDS-class`, `GeDSboost-class` or `GeDSgam-class` object from which the vector of knots for the specified GeDS, FGB-GeDS or GAM-GeDS fit should be extracted.
`n`	integer value (2, 3 or 4) specifying the order ( $=$ degree $+ 1$ ) of the GeDS, FGB-GeDS or GAM-GeDS fit whose knots should be extracted. By default equal to `3L`. Non-integer values will be passed to the function `as.integer`.
`options`	a character string specifying whether "`all`" knots, including the left-most and the right-most limits of the interval embedding the observations (the default) or only the "`internal`" knots should be extracted.
`...`	potentially further arguments (required for compatibility with the definition of the generic function). Currently ignored, but with a warning.

Details

This is a method for the function knots in the stats package.

As GeDS-class, GeDSboost-class and GeDSgam-class objects contain three different fits (linear, quadratic and cubic), it is possible to specify the order of the GeDS fit whose knots are required via the input argument n.

Value

A vector in which each element represents a knot of the GeDS/FGB-GeDS/GAM-GeDS fit of the required order.

Lines method for GeDS objects.

Description

Lines method for GeDS objects. Adds a GeDS curve to an existing plot.

Usage

## S4 method for signature 'GeDS'
lines(
  x,
  n = 3L,
  transform = function(x) x,
  onlySpline = TRUE,
  data = data.frame(),
  ...
)
## S4 method for signature 'GeDS'
lines(
  x,
  n = 3L,
  transform = function(x) x,
  onlySpline = TRUE,
  data = data.frame(),
  ...
)

Arguments

`x`	a `GeDS-Class` object from which the GeDS fit should be extracted.
`n`	integer value (2, 3 or 4) specifying the order ( $=$ degree $+ 1$ ) of the GeDS fit that should be plotted. By default equal to `3L`. Non-integer values will be passed to the function `as.integer`.
`transform`	a function that can be used to transform the scale of the Y axis. Typically it can be the inverse of the link function if the plot is on the scale of the response variable.
`onlySpline`	logical variable specifying whether only the spline component of the fitted GeDS predictor model should be plotted or alternatively also the parametric component (see `formula`) should be plotted.
`data`	an optional `data.frame`, `list` or `environment` containing values of the independent variables for which the GeDS predicted values should be plotted. If left empty the values are extracted from the object `x` itself.
`...`	further arguments to be passed to the default `lines` function.

Details

This method can be used to add a curve corresponding to a particular GeDS fit to an active plot.

As GeDS objects contain three different fits (linear, quadratic and cubic), it is possible to specify the order of the GeDS regression to be plotted via the input argument n.

Examples


# Generate a data sample for the response variable
# Y and the single covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)

# Fit a GeDS regression model using NGeDS
(Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2)))

# Plot the GeDS third order fit (the quadratic one)
# without its corresponding Polygon
plot(Gmod, type = "none")

# Add a curve corresponding to the second order fit (the linear one)
lines(Gmod, n = 2, col = "green", lwd = 2, lty = 3)

# Generate a data sample for the response variable
# Y and the single covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)

# Fit a GeDS regression model using NGeDS
(Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2)))

# Plot the GeDS third order fit (the quadratic one)
# without its corresponding Polygon
plot(Gmod, type = "none")

# Add a curve corresponding to the second order fit (the linear one)
lines(Gmod, n = 2, col = "green", lwd = 2, lty = 3)

Geometrically Designed Spline regression estimation

Description

NGeDS constructs a Geometrically Designed variable knots spline regression model referred to as a GeDS model, for a response having a Normal distribution.

Usage

NGeDS(
  formula,
  data,
  weights,
  beta = 0.5,
  phi = 0.99,
  min.intknots = 0,
  max.intknots = 500,
  q = 2,
  Xextr = NULL,
  Yextr = NULL,
  show.iters = FALSE,
  stoptype = "RD",
  higher_order = TRUE,
  intknots_init = NULL,
  fit_init = NULL,
  only_pred = FALSE
)
NGeDS(
  formula,
  data,
  weights,
  beta = 0.5,
  phi = 0.99,
  min.intknots = 0,
  max.intknots = 500,
  q = 2,
  Xextr = NULL,
  Yextr = NULL,
  show.iters = FALSE,
  stoptype = "RD",
  higher_order = TRUE,
  intknots_init = NULL,
  fit_init = NULL,
  only_pred = FALSE
)

Arguments

`formula`	a description of the structure of the model to be fitted, including the dependent and independent variables. See `formula` for details.
`data`	an optional data frame, list or environment containing the variables of the model. If not found in `data`, the variables are taken from `environment(formula)`, typically the environment from which `NGeDS` is called.
`weights`	an optional vector of ‘prior weights’ to be put on the observations in the fitting process in case the user requires weighted GeDS fitting. It should be `NULL` or a numeric vector of the same length as the response variable in the argument `formula`.
`beta`	numeric parameter in the interval $[0,1]$ tuning the knot placement in stage A of GeDS. See details.
`phi`	numeric parameter in the interval $[0,1]$ specifying the threshold for the stopping rule (model selector) in stage A of GeDS. See also `stoptype` and details below.
`min.intknots`	optional parameter allowing the user to set a minimum number of internal knots required. By default equal to zero.
`max.intknots`	optional parameter allowing the user to set a maximum number of internal knots to be added by the GeDS estimation algorithm. By default equal to the number of knots for the saturated GeDS model.
`q`	numeric parameter which allows to fine-tune the stopping rule of stage A of GeDS, by default equal to 2. See details.
`Xextr`	numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the observations of the first independent variable. See details.
`Yextr`	numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the observations of the second independent variable (if the bivariate GeDS is run). See details.
`show.iters`	logical variable indicating whether or not to print information at each step.
`stoptype`	a character string indicating the type of GeDS stopping rule to be used. It should be either one of `"SR"`, `"RD"` or `"LR"`, partial match allowed. See details.
`higher_order`	a logical that defines whether to compute the higher order fits (quadratic and cubic) after stage A is run. Default is `TRUE`.
`intknots_init`	vector of starting internal knots. Default is `NULL`.
`fit_init`	A list containing fitted values `pred`, along with corresponding `intknots` and `coef`, representing the initial fit from which to begin Stage A GeDS iteration (i.e. departing from step 2).
`only_pred`	logical, if `TRUE` only predictions are computed.

Details

The NGeDS function implements the GeDS methodology, recently developed by Kaishev et al. (2016) and extended in the GGeDS function for the more general GNM, (GLM) context, allowing for the response to have any distribution from the Exponential Family. Under the GeDS approach the (non-)linear predictor is viewed as a spline with variable knots which are estimated along with the regression coefficients and the order of the spline, using a two stage algorithm. In stage A, a linear variable-knot spline is fitted to the data applying iteratively least squares regression (see lm function). In stage B, a Schoenberg variation diminishing spline approximation to the fit from stage A is constructed, thus simultaneously producing spline fits of order 2, 3 and 4, all of which are included in the output, a GeDS-Class object.

As noted in formula, the argument formula allows the user to specify models with two components, a spline regression (non-parametric) component involving part of the independent variables identified through the function f and an optional parametric component involving the remaining independent variables. For NGeDS one or two independent variables are allowed for the spline component and arbitrary many independent variables for the parametric component. Failure to specify the independent variable for the spline regression component through the function f will return an error. See formula.

The parameter beta tunes the placement of a new knot in stage A of the algorithm. Once a current second-order spline is fitted to the data the regression residuals are computed and grouped by their sign. A new knot is placed at a location defined by the group for which a certain measure attains its maximum. The latter measure is defined as a weighted linear combination of the range of each group and the mean of the absolute residuals within it. The parameter beta determines the weights in this measure correspondingly as beta and 1 - beta. The higher it is, the more weight is put to the mean of the residuals and the less to the range of their corresponding x-values. The default value of beta is 0.5.

The argument stoptype allows to choose between three alternative stopping rules for the knot selection in stage A of GeDS, the "RD", that stands for Ratio of Deviances, the "SR", that stands for Smoothed Ratio of deviances and the "LR", that stands for Likelihood Ratio. The latter is based on the difference of deviances rather than on their ratio as in the case of "RD" and "SR". Therefore "LR" can be viewed as a log likelihood ratio test performed at each iteration of the knot placement. In each of these cases the corresponding stopping criterion is compared with a threshold value phi (see below).

The argument q is an input parameter that allows to fine-tune the stopping rule in stage A. It identifies the number of consecutive iterations over which the deviance should exhibit stable convergence so as the knot placement in stage A is terminated. More precisely, under any of the rules "RD", "SR", or "LR", the deviance at the current iteration is compared to the deviance computed q iterations before, i.e., before selecting the last q knots. Setting a higher q will lead to more knots being added before exiting stage A of GeDS.

Value

GeDS-Class object, i.e. a list of items that summarizes the main details of the fitted GeDS regression. See GeDS-Class for details. Some S3 methods are available in order to make these objects tractable, such as coef, deviance, knots, predict and print as well as S4 methods for lines and plot.

References

Examples


###################################################
# Generate a data sample for the response variable
# Y and the single covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)

# Fit a Normal GeDS regression using NGeDS
(Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2)))

# Apply some of the available methods, e.g.
# coefficients, knots and deviance extractions for the
# quadratic GeDS fit
# Note that the first call to the function knots returns
# also the left and right limits of the interval containing
# the data
coef(Gmod, n = 3)
knots(Gmod, n = 3)
knots(Gmod, n = 3, options = "internal")
deviance(Gmod, n = 3)

# Add a covariate, Z, that enters linearly
Z <- runif(N)
Y2 <- Y + 2*Z + 1
# Re-fit the data using NGeDS
(Gmod2 <- NGeDS(Y2 ~ f(X) + Z, beta = 0.6, phi = 0.995, Xextr = c(-2,2)))
coef(Gmod2, n = 3)
coef(Gmod2, onlySpline = FALSE, n = 3)

## Not run: 
##########################################
# Real data example
# See Kaishev et al. (2016), section 4.2
data('BaFe2As2')
(Gmod2 <- NGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.6, phi = 0.99, q = 3))
plot(Gmod2)

## End(Not run)

#########################################
# bivariate example
# See Dimitrova et al. (2023), section 5

# Generate a data sample for the response variable
# Z and the covariates X and Y assuming Normal noise
set.seed(123)
doublesin <- function(x){
 sin(2*x[,1])*sin(2*x[,2])
}

X <- (round(runif(400, min = 0, max = 3),2))
Y <- (round(runif(400, min = 0, max = 3),2))
Z <- doublesin(cbind(X,Y))
Z <- Z+rnorm(400, 0, sd = 0.1)
# Fit a two dimensional GeDS model using NGeDS
(BivGeDS <- NGeDS(Z ~ f(X, Y), phi = 0.9))

# Extract quadratic coefficients/knots/deviance
coef(BivGeDS, n = 3)
knots(BivGeDS, n = 3)
deviance(BivGeDS, n = 3)

# Surface plot of the generating function (doublesin)
plot(BivGeDS, f = doublesin)
# Surface plot of the fitted model
plot(BivGeDS)

###################################################
# Generate a data sample for the response variable
# Y and the single covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)

# Fit a Normal GeDS regression using NGeDS
(Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2)))

# Apply some of the available methods, e.g.
# coefficients, knots and deviance extractions for the
# quadratic GeDS fit
# Note that the first call to the function knots returns
# also the left and right limits of the interval containing
# the data
coef(Gmod, n = 3)
knots(Gmod, n = 3)
knots(Gmod, n = 3, options = "internal")
deviance(Gmod, n = 3)

# Add a covariate, Z, that enters linearly
Z <- runif(N)
Y2 <- Y + 2*Z + 1
# Re-fit the data using NGeDS
(Gmod2 <- NGeDS(Y2 ~ f(X) + Z, beta = 0.6, phi = 0.995, Xextr = c(-2,2)))
coef(Gmod2, n = 3)
coef(Gmod2, onlySpline = FALSE, n = 3)

## Not run: 
##########################################
# Real data example
# See Kaishev et al. (2016), section 4.2
data('BaFe2As2')
(Gmod2 <- NGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.6, phi = 0.99, q = 3))
plot(Gmod2)

## End(Not run)

#########################################
# bivariate example
# See Dimitrova et al. (2023), section 5

# Generate a data sample for the response variable
# Z and the covariates X and Y assuming Normal noise
set.seed(123)
doublesin <- function(x){
 sin(2*x[,1])*sin(2*x[,2])
}

X <- (round(runif(400, min = 0, max = 3),2))
Y <- (round(runif(400, min = 0, max = 3),2))
Z <- doublesin(cbind(X,Y))
Z <- Z+rnorm(400, 0, sd = 0.1)
# Fit a two dimensional GeDS model using NGeDS
(BivGeDS <- NGeDS(Z ~ f(X, Y), phi = 0.9))

# Extract quadratic coefficients/knots/deviance
coef(BivGeDS, n = 3)
knots(BivGeDS, n = 3)
deviance(BivGeDS, n = 3)

# Surface plot of the generating function (doublesin)
plot(BivGeDS, f = doublesin)
# Surface plot of the fitted model
plot(BivGeDS)

Component-wise gradient boosting with NGeDS base-learners

Description

NGeDSboost implements component-wise gradient boosting (Bühlmann and Yu (2003), Bühlmann and Hothorn (2007)) using normal GeD splines (i.e., fitted with NGeDS function) as base-learners (see Dimitrova et al. (2025)).

Usage

NGeDSboost(
  formula,
  data,
  weights = NULL,
  normalize_data = FALSE,
  family = mboost::Gaussian(),
  link = NULL,
  initial_learner = TRUE,
  int.knots_init = 2L,
  min_iterations,
  max_iterations,
  shrinkage = 1,
  phi_boost_exit = 0.99,
  q_boost = 2L,
  beta = 0.5,
  phi = 0.99,
  int.knots_boost = 500L,
  q = 2L,
  higher_order = TRUE,
  boosting_with_memory = FALSE
)
NGeDSboost(
  formula,
  data,
  weights = NULL,
  normalize_data = FALSE,
  family = mboost::Gaussian(),
  link = NULL,
  initial_learner = TRUE,
  int.knots_init = 2L,
  min_iterations,
  max_iterations,
  shrinkage = 1,
  phi_boost_exit = 0.99,
  q_boost = 2L,
  beta = 0.5,
  phi = 0.99,
  int.knots_boost = 500L,
  q = 2L,
  higher_order = TRUE,
  boosting_with_memory = FALSE
)

Arguments

`formula`	a description of the structure of the model to be fitted, including the dependent and independent variables. Unlike `NGeDS` and `GGeDS`, the formula specified allows for multiple additive GeD spline regression components (as well as linear components) to be included (e.g., `Y ~ f(X1) + f(X2) + X3`).
`data`	a data frame containing the variables referenced in the formula.
`weights`	an optional vector of ‘prior weights’ to be put on the observations during the fitting process. It should be `NULL` or a numeric vector of the same length as the response variable defined in the formula.
`normalize_data`	a logical that defines whether the data should be normalized (standardized) before fitting the baseline linear model, i.e., before running the FGB algorithm. Normalizing the data involves scaling the predictor variables to have a mean of 0 and a standard deviation of 1. Note that this process alters the scale and interpretation of the knots and coefficients estimated. Default is equal to `FALSE`.
`family`	determines the loss function to be optimized by the boosting algorithm. In case `initial_learner = FALSE` it also determines the corresponding empirical risk minimizer to be used as offset initial learner. By default, it is set to `mboost::Gaussian()`. Users can specify any `Family` object from the mboost package.
`link`	in case the `Family` object has not the desired link function you can specify it here.
`initial_learner`	a logical value. If set to `TRUE`, the model's initial learner will be a GeD spline. If set to `FALSE`, then the initial predictor will consist of the empirical risk minimizer corresponding to the specified `family`.
`int.knots_init`	optional parameter allowing the user to set a maximum number of internal knots to be added by the initial GeDS learner in case `initial_learner = TRUE`. Default is equal to `2L`.
`min_iterations`	optional parameter to manually set a minimum number of boosting iterations to be run. If not specified, it defaults to 0L.
`max_iterations`	optional parameter to manually set the maximum number of boosting iterations to be run. If not specified, it defaults to 100L. This setting serves as a fallback when the stopping rule, based on consecutive deviances and tuned by `phi_boost_exit` and `q_boost`, does not trigger an earlier termination (see Dimitrova et al. (2025)). Therefore, users can increase/decrease the number of boosting iterations, by increasing/decreasing the value `phi_boost_exit` and/or `q_boost`, or directly specify `max_iterations`.
`shrinkage`	numeric parameter in the interval $[0,1]$ defining the step size or shrinkage parameter. This controls the size of the steps taken in the direction of the gradient of the loss function. In other words, the magnitude of the update each new iteration contributes to the final model. Default is equal to `1`.
`phi_boost_exit`	numeric parameter in the interval $[0,1]$ specifying the threshold for the boosting iterations stopping rule. Default is equal to `0.99`.
`q_boost`	numeric parameter which allows to fine-tune the boosting iterations stopping rule, by default equal to `2L`.
`beta`	numeric parameter in the interval $[0,1]$ tuning the knot placement in stage A of GeDS. Default is equal to `0.5`. See details in `NGeDS`.
`phi`	numeric parameter in the interval $[0,1]$ specifying the threshold for the stopping rule (model selector) in stage A of GeDS. Default is equal to `0.99`. See details in `NGeDS`.
`int.knots_boost`	The maximum number of internal knots that can be added by the GeDS base-learners in each boosting iteration, effectively setting the value of `max.intknots` in `NGeDS` at each boosting iteration. Default is `500L`.
`q`	numeric parameter which allows to fine-tune the stopping rule of stage A of GeDS, by default equal to `2L`. See details in `NGeDS`.
`higher_order`	a logical that defines whether to compute the higher order fits (quadratic and cubic) after the FGB algorithm is run. Default is `TRUE`.
`boosting_with_memory`	logical value. If `TRUE`, boosting is performed taking into account previously fitted knots when fitting a GeDS learner at each new boosting iteration. If `boosting_with_memory` is `TRUE`, we recommend setting `int.knots_init = 1` and `int.knots_boost = 1`.

Details

The NGeDSboost function implements functional gradient boosting algorithm for some pre-defined loss function, using linear GeD splines as base learners. At each boosting iteration, the negative gradient vector is fitted through the base procedure encapsulated within the NGeDS function. The latter constructs a Geometrically Designed variable knots spline regression model for a response having a Normal distribution. The FGB algorithm yields a final linear fit. Higher order fits (quadratic and cubic) are then computed by calculating the Schoenberg’s variation diminishing spline (VDS) approximation of the linear fit.

On the one hand, NGeDSboost includes all the parameters of NGeDS, which in this case tune the base-learner fit at each boosting iteration. On the other hand, NGeDSboost includes some additional parameters proper to the FGB procedure. We describe the main ones as follows.

First, family allows to specify the loss function and corresponding risk function to be optimized by the boosting algorithm. If initial_learner = FALSE, the initial learner employed will be the empirical risk minimizer corresponding to the family chosen. If initial_learner = TRUE then the initial learner will be an NGeDS fit with maximum number of internal knots equal to int.knots_init.

shrinkage tunes the step length/shrinkage parameter which helps to control the learning rate of the model. In other words, when a new base learner is added to the ensemble, its contribution to the final prediction is multiplied by the shrinkage parameter. The smaller shrinkage is, the slower/more gradual the learning process will be, and viceversa.

The number of boosting iterations is controlled by a Ratio of Deviances stopping rule similar to the one presented for GGeDS. In the same way phi and q tune the stopping rule of GGeDS, phi_boost_exit and q_boost tune the stopping rule of NGeDSboost. The user can also manually control the number of boosting iterations through min_iterations and max_iterations.

Value

GeDSboost-Class object, i.e. a list of items that summarizes the main details of the fitted FGB-GeDS model. See GeDSboost-Class for details. Some S3 methods are available in order to make these objects tractable, such as coef, knots, print and predict. Also variable importance measures (bl_imp) and improved plotting facilities (visualize_boosting).

References

Friedman, J.H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29 (5), 1189–1232.
DOI: doi:10.1214/aos/1013203451

Bühlmann P., Yu B. (2003). Boosting With the L2 Loss. Journal of the American Statistical Association, 98(462), 324–339. doi:10.1198/016214503000125

Bühlmann P., Hothorn T. (2007). Boosting Algorithms: Regularization, Prediction and Model Fitting. Statistical Science, 22(4), 477 – 505.
DOI: doi:10.1214/07-STS242

Examples


################################# Example 1 #################################
# Generate a data sample for the response variable
# Y and the single covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.2)
data = data.frame(X, Y)

# Fit a Normal FGB-GeDS regression using NGeDSboost

Gmodboost <- NGeDSboost(Y ~ f(X), data = data)
MSE_Gmodboost_linear <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_linear)^2)
MSE_Gmodboost_quadratic <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_quadratic)^2)
MSE_Gmodboost_cubic <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_cubic)^2)

cat("\n", "MEAN SQUARED ERROR", "\n",
    "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n",
    "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n",
    "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")

# Compute predictions on new randomly generated data
X <- sort(runif(100, min = -2, max = 2))

pred_linear <- predict(Gmodboost, newdata = data.frame(X), n = 2L)
pred_quadratic <- predict(Gmodboost, newdata = data.frame(X), n = 3L)
pred_cubic <- predict(Gmodboost, newdata = data.frame(X), n = 4L)

MSE_Gmodboost_linear <- mean((sapply(X, f_1) - pred_linear)^2)
MSE_Gmodboost_quadratic <- mean((sapply(X, f_1) - pred_quadratic)^2)
MSE_Gmodboost_cubic <- mean((sapply(X, f_1) - pred_cubic)^2)
cat("\n", "MEAN SQUARED ERROR", "\n",
    "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n",
    "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n",
    "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")

## S3 methods for class 'GeDSboost'
# Print 
print(Gmodboost)
# Knots
knots(Gmodboost, n = 2L)
knots(Gmodboost, n = 3L)
knots(Gmodboost, n = 4L)
# Coefficients
coef(Gmodboost, n = 2L)
coef(Gmodboost, n = 3L)
coef(Gmodboost, n = 4L)
# Deviances
deviance(Gmodboost, n = 2L)
deviance(Gmodboost, n = 3L)
deviance(Gmodboost, n = 4L)

############################ Example 2 - Bodyfat ############################
library(TH.data)
data("bodyfat", package = "TH.data")

Gmodboost <- NGeDSboost(formula = DEXfat ~ age + f(hipcirc, waistcirc) + f(kneebreadth),
data = bodyfat, phi_boost_exit = 0.9, q_boost = 1, phi = 0.9, q = 1)

MSE_Gmodboost_linear <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_linear)^2)
MSE_Gmodboost_quadratic <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_quadratic)^2)
MSE_Gmodboost_cubic <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_cubic)^2)
# Comparison
cat("\n", "MSE", "\n",
    "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n",
    "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n",
    "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")

################################# Example 1 #################################
# Generate a data sample for the response variable
# Y and the single covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.2)
data = data.frame(X, Y)

# Fit a Normal FGB-GeDS regression using NGeDSboost

Gmodboost <- NGeDSboost(Y ~ f(X), data = data)
MSE_Gmodboost_linear <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_linear)^2)
MSE_Gmodboost_quadratic <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_quadratic)^2)
MSE_Gmodboost_cubic <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_cubic)^2)

cat("\n", "MEAN SQUARED ERROR", "\n",
    "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n",
    "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n",
    "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")

# Compute predictions on new randomly generated data
X <- sort(runif(100, min = -2, max = 2))

pred_linear <- predict(Gmodboost, newdata = data.frame(X), n = 2L)
pred_quadratic <- predict(Gmodboost, newdata = data.frame(X), n = 3L)
pred_cubic <- predict(Gmodboost, newdata = data.frame(X), n = 4L)

MSE_Gmodboost_linear <- mean((sapply(X, f_1) - pred_linear)^2)
MSE_Gmodboost_quadratic <- mean((sapply(X, f_1) - pred_quadratic)^2)
MSE_Gmodboost_cubic <- mean((sapply(X, f_1) - pred_cubic)^2)
cat("\n", "MEAN SQUARED ERROR", "\n",
    "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n",
    "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n",
    "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")

## S3 methods for class 'GeDSboost'
# Print 
print(Gmodboost)
# Knots
knots(Gmodboost, n = 2L)
knots(Gmodboost, n = 3L)
knots(Gmodboost, n = 4L)
# Coefficients
coef(Gmodboost, n = 2L)
coef(Gmodboost, n = 3L)
coef(Gmodboost, n = 4L)
# Deviances
deviance(Gmodboost, n = 2L)
deviance(Gmodboost, n = 3L)
deviance(Gmodboost, n = 4L)

############################ Example 2 - Bodyfat ############################
library(TH.data)
data("bodyfat", package = "TH.data")

Gmodboost <- NGeDSboost(formula = DEXfat ~ age + f(hipcirc, waistcirc) + f(kneebreadth),
data = bodyfat, phi_boost_exit = 0.9, q_boost = 1, phi = 0.9, q = 1)

MSE_Gmodboost_linear <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_linear)^2)
MSE_Gmodboost_quadratic <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_quadratic)^2)
MSE_Gmodboost_cubic <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_cubic)^2)
# Comparison
cat("\n", "MSE", "\n",
    "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n",
    "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n",
    "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")

NGeDSgam: Local Scoring Algorithm with GeD Splines in Backfitting

Description

Implements the Local Scoring Algorithm (Hastie and Tibshirani (1986)), applying normal linear GeD splines (i.e., NGeDS function) to fit the targets within each backfitting iteration. Higher order fits are computed by pursuing stage B of GeDS after the local-scoring algorithm is run.

Usage

NGeDSgam(
  formula,
  family = "gaussian",
  data,
  weights = NULL,
  offset = NULL,
  normalize_data = FALSE,
  min_iterations,
  max_iterations,
  phi_gam_exit = 0.99,
  q_gam = 2,
  beta = 0.5,
  phi = 0.99,
  internal_knots = 500,
  q = 2,
  higher_order = TRUE
)
NGeDSgam(
  formula,
  family = "gaussian",
  data,
  weights = NULL,
  offset = NULL,
  normalize_data = FALSE,
  min_iterations,
  max_iterations,
  phi_gam_exit = 0.99,
  q_gam = 2,
  beta = 0.5,
  phi = 0.99,
  internal_knots = 500,
  q = 2,
  higher_order = TRUE
)

Arguments

`formula`	a description of the model structure to be fitted, specifying both the dependent and independent variables. Unlike `NGeDS` and `GGeDS`, this formula supports multiple additive (normal) GeD spline regression components as well as linear components. For example, setting `formula = Y ~ f(X1) + f(X2) + X3` implies using a normal linear GeD spline as the smoother for `X1` and for `X2`, while for `X3` a linear model would be used.
`family`	a character string indicating the response variable distribution and link function to be used. Default is `"gaussian"`. This should be a character or a family object.
`data`	a data frame containing the variables referenced in the formula.
`weights`	an optional vector of ‘prior weights’ to be put on the observations during the fitting process. It should be `NULL` or a numeric vector of the same length as the response variable defined in the formula.
`offset`	a vector of size $N$ that can be used to specify a fixed component to be included in the linear predictor during fitting. In case more than one covariate is fixed, the user should sum the corresponding coordinates of the fixed covariates to produce one common $N$ -vector of coordinates.
`normalize_data`	a logical that defines whether the data should be normalized (standardized) before fitting the baseline linear model, i.e., before running the local-scoring algorithm. Normalizing the data involves scaling the predictor variables to have a mean of 0 and a standard deviation of 1. This process alters the scale and interpretation of the knots and coefficients estimated. Default is equal to `FALSE`.
`min_iterations`	optional parameter to manually set a minimum number of boosting iterations to be run. If not specified, it defaults to 0L.
`max_iterations`	optional parameter to manually set the maximum number of boosting iterations to be run. If not specified, it defaults to 100L. This setting serves as a fallback when the stopping rule, based on consecutive deviances and tuned by `phi_gam_exit` and `q_gam`, does not trigger an earlier termination (see Dimitrova et al. (2025)). Therefore, users can increase/decrease the number of boosting iterations, by increasing/decreasing the value `phi_gam_exit` and/or `q_gam`, or directly specify `max_iterations`.
`phi_gam_exit`	Convergence threshold for local-scoring and backfitting. Both algorithms stop when the relative change in the deviance is below this threshold. Default is `0.99`.
`q_gam`	numeric parameter which allows to fine-tune the stopping rule of the local-scoring and backfitting iterations. By default equal to `2L`.
`beta`	numeric parameter in the interval $[0,1]$ tuning the knot placement in stage A of GeDS, for each of the GeD spline components of the model. Default is equal to `0.5`. See details in `NGeDS`.
`phi`	numeric parameter in the interval $[0,1]$ specifying the threshold for the stopping rule (model selector) in stage A of GeDS, for each of the GeD spline components of the model. Default is equal to `0.99`. See details in `NGeDS`.
`internal_knots`	The maximum number of internal knots that can be added by the GeDS base-learners in each boosting iteration, effectively setting the value of `max.intknots` in `NGeDS` at each backfitting iteration. Default is `500L`.
`q`	numeric parameter which allows to fine-tune the stopping rule of stage A of GeDS, for each of the GeD spline components of the model. By default equal to `2L`. See details in `NGeDS`.
`higher_order`	a logical that defines whether to compute the higher order fits (quadratic and cubic) after the local-scoring algorithm is run. Default is `TRUE`.

Details

The NGeDSgam function employs the local scoring algorithm to fit a Generalized Additive Model (GAM). This algorithm iteratively fits weighted additive models by backfitting. Normal linear GeD splines, as well as linear learners, are supported as function smoothers within the backfitting algorithm. The local-scoring algorithm ultimately produces a linear fit. Higher order fits (quadratic and cubic) are then computed by calculating the Schoenberg’s variation diminishing spline (VDS) approximation of the linear fit.

On the one hand, NGeDSgam includes all the parameters of NGeDS, which in this case tune the function smoother fit at each backfitting iteration. On the other hand, NGeDSgam includes some additional parameters proper to the local-scoring procedure. We describe the main ones as follows.

The family chosen determines the link function, adjusted dependent variable and weights to be used in the local-scoring algorithm. The number of local-scoring and backfitting iterations is controlled by a Ratio of Deviances stopping rule similar to the one presented for GGeDS. In the same way phi and q tune the stopping rule of GGeDS, phi_boost_exit and q_boost tune the stopping rule of NGeDSgam. The user can also manually control the number of local-scoring iterations through min_iterations and max_iterations.

Value

GeDSgam-Class object, i.e. a list of items that summarizes the main details of the fitted GAM-GeDS model. See GeDSgam-Class for details. Some S3 methods are available in order to make these objects tractable, such as coef, knots, print and predict.

References

Hastie, T. and Tibshirani, R. (1986). Generalized Additive Models. Statistical Science 1 (3) 297 - 310.
DOI: doi:10.1214/ss/1177013604

Examples


# Load package
library(GeDS) 

data(airquality) 
data = na.omit(airquality)
data$Ozone <- data$Ozone^(1/3)

formula = Ozone ~ f(Solar.R) + f(Wind, Temp)
Gmodgam <- NGeDSgam(formula = formula, data = data,
phi = 0.8)
MSE_Gmodgam_linear <- mean((data$Ozone - Gmodgam$predictions$pred_linear)^2)
MSE_Gmodgam_quadratic <- mean((data$Ozone - Gmodgam$predictions$pred_quadratic)^2)
MSE_Gmodgam_cubic <- mean((data$Ozone - Gmodgam$predictions$pred_cubic)^2)

cat("\n", "MEAN SQUARED ERROR", "\n",
"Linear NGeDSgam:", MSE_Gmodgam_linear, "\n",
"Quadratic NGeDSgam:", MSE_Gmodgam_quadratic, "\n",
"Cubic NGeDSgam:", MSE_Gmodgam_cubic, "\n")

## S3 methods for class 'GeDSboost'
# Print 
print(Gmodgam)
# Knots
knots(Gmodgam, n = 2L)
knots(Gmodgam, n = 3L)
knots(Gmodgam, n = 4L)
# Coefficients
coef(Gmodgam, n = 2L)
coef(Gmodgam, n = 3L)
coef(Gmodgam, n = 4L)
# Deviances
deviance(Gmodgam, n = 2L)
deviance(Gmodgam, n = 3L)
deviance(Gmodgam, n = 4L)

# Load package
library(GeDS) 

data(airquality) 
data = na.omit(airquality)
data$Ozone <- data$Ozone^(1/3)

formula = Ozone ~ f(Solar.R) + f(Wind, Temp)
Gmodgam <- NGeDSgam(formula = formula, data = data,
phi = 0.8)
MSE_Gmodgam_linear <- mean((data$Ozone - Gmodgam$predictions$pred_linear)^2)
MSE_Gmodgam_quadratic <- mean((data$Ozone - Gmodgam$predictions$pred_quadratic)^2)
MSE_Gmodgam_cubic <- mean((data$Ozone - Gmodgam$predictions$pred_cubic)^2)

cat("\n", "MEAN SQUARED ERROR", "\n",
"Linear NGeDSgam:", MSE_Gmodgam_linear, "\n",
"Quadratic NGeDSgam:", MSE_Gmodgam_quadratic, "\n",
"Cubic NGeDSgam:", MSE_Gmodgam_cubic, "\n")

## S3 methods for class 'GeDSboost'
# Print 
print(Gmodgam)
# Knots
knots(Gmodgam, n = 2L)
knots(Gmodgam, n = 3L)
knots(Gmodgam, n = 4L)
# Coefficients
coef(Gmodgam, n = 2L)
coef(Gmodgam, n = 3L)
coef(Gmodgam, n = 4L)
# Deviances
deviance(Gmodgam, n = 2L)
deviance(Gmodgam, n = 3L)
deviance(Gmodgam, n = 4L)

Plot method for GeDS objects.

Description

Plot method for GeDS objects. Plots GeDS fits.

Usage

## S4 method for signature 'GeDS,ANY'
plot(
  x,
  f = NULL,
  which,
  DEV = FALSE,
  ask = FALSE,
  main,
  legend.pos = "topright",
  legend.text = NULL,
  new.window = FALSE,
  wait = 0.5,
  n = 3L,
  type = c("none", "Polygon", "NCI", "ACI"),
  ...
)
## S4 method for signature 'GeDS,ANY'
plot(
  x,
  f = NULL,
  which,
  DEV = FALSE,
  ask = FALSE,
  main,
  legend.pos = "topright",
  legend.text = NULL,
  new.window = FALSE,
  wait = 0.5,
  n = 3L,
  type = c("none", "Polygon", "NCI", "ACI"),
  ...
)

Arguments

`x`	a `GeDS-class` object from which the GeDS fit(s) should be extracted.
`f`	(optional) specifies the underlying function or generating process to which the model was fit. This parameter is useful if the user wishes to plot the specified function/process alongside the model fit and the data
`which`	a numeric vector specifying the iterations of stage A for which the corresponding GeDS fits should be plotted. It has to be a subset of `1:nrow(x$stored)`. See details.
`DEV`	logical variable specifying whether a plot representing the deviance at each iteration of stage A should be produced or not.
`ask`	logical variable specifying whether the user should be prompted before changing the plot page.
`main`	an optional character string used as the plot title. If set to '"detail"', the knots vector will be displayed on the plot.
`legend.pos`	the position of the legend within the panel. See legend for details.
`legend.text`	a character vector specifying the legend text.
`new.window`	logical variable specifying whether the plot should be shown in a new window or in the active one.
`wait`	time, in seconds, the system should wait before plotting a new page. Ignored if `ask = TRUE`.
`n`	integer value (2, 3 or 4) specifying the order ( $=$ degree $+ 1$ ) of the GeDS fit that should be plotted. By default equal to `3L`. Non-integer values will be passed to the function `as.integer`.
`type`	character string specifying the type of plot required. Should be set either to `"Polygon"` if the user wants to get also the control polygon of the GeDS fit, `"NCI"` or `"ACI"` if 95% confidence bands for the predictions should be plotted (see details) or `"none"` if only the fitted GeDS curve should be plotted. Applies only when plotting a univariate spline regression.
`...`	further arguments to be passed to the `plot.default` function.

Details

This method is provided in order to allow the user to plot the GeDS fits contained in the GeDS-Class objects.

Since in Stage A of the GeDS algorithm the knots of a linear spline fit are sequentially located, one at a time, the user may wish to visually inspect this process using the argument which. The latter specifies a particular iteration number (or a vector of such numbers) for which the corresponding linear fit(s) should be plotted. The ask and wait arguments can help the user to manage these pages.

By means of ask the user can determine for how long each page should appear on the screen. Pages are sequentially replaced by pressing the enter button.

Note that, in order to ensure stability, if the object was produced by the function GGeDS, plotting intermediate fits of stage A is allowed only if n = 2, in contrast to objects produced by NGeDS for which plotting intermediate results is allowed also for n = 2 or 3 results.

The confidence intervals obtained by setting type = "NCI" are approximate local bands obtained considering the knots as fixed constants. Hence the columns of the design matrix are seen as covariates and standard methodology relying on the se.fit option of predict.lm or predict.glm is applied.

Setting type = "ACI", asymptotic confidence intervals are plotted. This option is applicable only if the canonical link function has been used in the fitting procedure.

Examples

###################################################
# Generate a data sample for the response variable
# Y and the single covariate X, assuming Normal noise
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)

# Fit a Normal GeDS regression using NGeDS
(Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2)))

# Plot the final quadratic GeDS fit (red solid line)
# with its control polygon (blue dashed line)
plot(Gmod)

# Plot the quadratic fit obtained from the linear fit at the 10th
# iteration of stage A i.e. after 9 internal knots have been inserted
# by the GeDS procedure
plot(Gmod, which=10)

# Generate plots of all the intermediate fits obtained
# by running the GeDS procedure
## Not run: 
plot(Gmod, which=1:16)

## End(Not run)

###################################################
# Generate a data sample for the response variable Y and the covariate
# X assuming Poisson distributed error and a log link function

set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N ,min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- exp(f_1(X))
# Generate Poisson distributed Y according to the mean model
Y <- rpois(N,means)

# Fit a Poisson GeDS regression model using GGeDS
(Gmod2 <- GGeDS(Y ~ f(X), beta = 0.2, phi = 0.995, family = poisson(),
                Xextr = c(-2,2)))

# similar plots as before, but for the linear fit
plot(Gmod2, n = 2)
plot(Gmod2, which = 10, n = 2)
## Not run: 
plot(Gmod2, which = 1:16, n = 2)
plot(Gmod2, which = 1:16, n = 2, ask = T)

## End(Not run)

###################################################
# Generate a data sample for the response variable
# Y and the single covariate X, assuming Normal noise
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)

# Fit a Normal GeDS regression using NGeDS
(Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2)))

# Plot the final quadratic GeDS fit (red solid line)
# with its control polygon (blue dashed line)
plot(Gmod)

# Plot the quadratic fit obtained from the linear fit at the 10th
# iteration of stage A i.e. after 9 internal knots have been inserted
# by the GeDS procedure
plot(Gmod, which=10)

# Generate plots of all the intermediate fits obtained
# by running the GeDS procedure
## Not run: 
plot(Gmod, which=1:16)

## End(Not run)

###################################################
# Generate a data sample for the response variable Y and the covariate
# X assuming Poisson distributed error and a log link function

set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N ,min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- exp(f_1(X))
# Generate Poisson distributed Y according to the mean model
Y <- rpois(N,means)

# Fit a Poisson GeDS regression model using GGeDS
(Gmod2 <- GGeDS(Y ~ f(X), beta = 0.2, phi = 0.995, family = poisson(),
                Xextr = c(-2,2)))

# similar plots as before, but for the linear fit
plot(Gmod2, n = 2)
plot(Gmod2, which = 10, n = 2)
## Not run: 
plot(Gmod2, which = 1:16, n = 2)
plot(Gmod2, which = 1:16, n = 2, ask = T)

## End(Not run)

Plot method for GeDSboost objects.

Description

Plots the component functions of a GeDSboost object fitted using NGeDSboost. If the model has a single base-learner, the plot will be returned on the response scale. Otherwise, plots are produced on the linear predictor scale. Note that only univariate base-learner plots are returned, as representation of the boosted model as a single spline model is available only for univariate base-learners (see Dimitrova et al. (2025)). In addition since component-wise gradient boosting inherently performs base-learner selection, you should only expect plots for the base-learners that where selected across the boosting iterations.

Usage

## S4 method for signature 'GeDSboost,ANY'
plot(x, n = 3L, ...)
## S4 method for signature 'GeDSboost,ANY'
plot(x, n = 3L, ...)

Arguments

`x`	a `GeDSboost-class` object from which the GeDSboost fit should be extracted.
`n`	integer value (2, 3 or 4) specifying the order ( $=$ degree $+ 1$ ) of the FGB-GeDS fit to be extracted.
`...`	further arguments to be passed to the `plot.default` function.

References

Plot method for GeDSgam objects.

Description

Plots on the linear predictor scale the component functions of a GeDSgam object fitted using NGeDSgam.

Usage

## S4 method for signature 'GeDSgam,ANY'
plot(x, base_learners = NULL, f = NULL, n = 3L, ...)
## S4 method for signature 'GeDSgam,ANY'
plot(x, base_learners = NULL, f = NULL, n = 3L, ...)

Arguments

`x`	a `GeDSgam-class` object from which the GeDSgam fit(s) should be extracted.
`base_learners`	either NULL or a vector of character string specifying the base-learners of the model for which predictions should be plotted. Note that single base-learner predictions are provided on the linear predictor scale.
`f`	(optional) specifies the underlying component function or generating process to which the model was fit. This parameter is useful if the user wishes to plot the specified function/process alongside the model fit and the data.
`n`	integer value (2, 3 or 4) specifying the order ( $=$ degree $+ 1$ ) of the GAM-GeDS fit.
`...`	further arguments to be passed to the `plot.default` function.

Piecewise Polynomial Spline Representation

Description

The function converts a GeDS fit which has a B-spline representation to a piecewise polynomial form.

Usage

PPolyRep(object, n = 3)
PPolyRep(object, n = 3)

Arguments

`object`	the `GeDS-class` where the GeDS fit to be converted is found.
`n`	integer value (2, 3 or 4) specifying the order ( $=$ degree $+ 1$ ) of the GeDS fit which should be converted to a piecewise polynomial form. By default equal to `3L`. Non-integer values will be passed to the function `as.integer`.

Details

This function converts a selected GeDS fit from a GeDS-class object represented in terms of B-splines into an object where the fit is represented in terms of piecewise polynomials.

The function wraps polySpline in order to let it accept GeDS-class objects as input. Hence the function provides a useful link between the package GeDS and the package splines, allowing the user to take advantage of the functions provided in the splines package.

Value

An object that inherits from classes "spline" and "polySpline". It is a list whose arguments are:

`knots`	a vector of size $k + 2$ containing the complete set of knots (internal knots plus the limits of the interval) of the GeDS fit.
`coefficients`	a $(k + 2) \times n$ matrix containing the coefficients of the polynomials in the required piecewise polynomial representation.

Note

Let us note that the first $k+1$ rows of the matrix contain the n coefficients of the $k+1$ consecutive pieces of the piecewise polynomial representation. The last $(k+2)$ -th row is extraneous and it appears as a result of the use of the function polySpline.

Examples

# Generate a data sample for the response variable
# Y and the single covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only
# a component non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)

# Fit a Normal GeDS regression using NGeDS
Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2))

# construct the PP representation of the cubic GeDS fit
# and apply some functions of the package splines
Polymod <- PPolyRep(Gmod, 4)
require(splines)
class(Polymod)
splineKnots(Polymod)
knots(Gmod, n = 4)
plot(Polymod)


# Generate a plot showing the PP representation
# based on the same example
knt <- splineKnots(Polymod)
coeffs <- coef(Polymod)
plot(Gmod, n = 4, legend.pos = FALSE, main = "Cubic Curves")
cols <- sample(heat.colors(length(knt)), length(knt))
for(i in 1:(length(knt))){
  curve(coeffs[i,1] + coeffs[i,2]*(x - knt[i])+
          coeffs[i,3]*(x - knt[i])^2+
        coeffs[i,4]*(x - knt[i])^3,
        add = TRUE, col = cols[i])
  abline(v = knt[i])
}

# Generate a data sample for the response variable
# Y and the single covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only
# a component non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)

# Fit a Normal GeDS regression using NGeDS
Gmod <- NGeDS(Y ~ f(X), beta = 0.6, phi = 0.995, Xextr = c(-2,2))

# construct the PP representation of the cubic GeDS fit
# and apply some functions of the package splines
Polymod <- PPolyRep(Gmod, 4)
require(splines)
class(Polymod)
splineKnots(Polymod)
knots(Gmod, n = 4)
plot(Polymod)


# Generate a plot showing the PP representation
# based on the same example
knt <- splineKnots(Polymod)
coeffs <- coef(Polymod)
plot(Gmod, n = 4, legend.pos = FALSE, main = "Cubic Curves")
cols <- sample(heat.colors(length(knt)), length(knt))
for(i in 1:(length(knt))){
  curve(coeffs[i,1] + coeffs[i,2]*(x - knt[i])+
          coeffs[i,3]*(x - knt[i])^2+
        coeffs[i,4]*(x - knt[i])^3,
        add = TRUE, col = cols[i])
  abline(v = knt[i])
}

Predict method for GeDS objects

Description

This is a user friendly method to compute predictions from GeDS objects.

Usage

## S3 method for class 'GeDS'
predict(object, newdata, type = c("response", "link", "terms"), n = 3L, ...)
## S3 method for class 'GeDS'
predict(object, newdata, type = c("response", "link", "terms"), n = 3L, ...)

Arguments

`object`	the `GeDS-class` object for which the computation of the predicted values is required.
`newdata`	an optional `data.frame`, `list` or `environment` containing values of the independent variables for which predicted values of the predictor model (including the GeDS and the parametric components) should be computed. If left empty the values are extracted from the object `x` itself.
`type`	character string indicating the type of prediction required. By default it is equal to `"response"`, i.e. the result is on the scale of the response variable. See details for the other options.
`n`	integer value (2, 3 or 4) specifying the order ( $=$ degree $+ 1$ ) of the GeDS fit whose predicted values should be computed. By default equal to `3L`. Non-integer values will be passed to the function `as.integer`.
`...`	potentially further arguments (required by the definition of the generic function). They are ignored, but with a warning.

Details

This is a method for the function predict that allows the user to handle GeDS-Class objects.

In analogy with the function predict.glm in the stats package, the user can specify the scale on which the predictions should be computed through the argument type. If the predictions are required to be on the scale of the response variable, the user should set type = "response", which is the default. Alternatively if one wants the predictions to be on the predictor scale, it is necessary to set type = "link". By specifying type = "terms", it is possible to inspect the predicted values separately for each single independent variable which enter either the GeD spline component or the parametric component of the predictor model. In this case the returned result is a matrix whose columns correspond to the terms supplied via newdata or extracted from the object.

As GeDS objects contain three different fits (linear, quadratic and cubic), it is possible to specify the order for which GeDS predictions are required via the input argument n.

Value

A numeric vector corresponding to the predicted values (if type = "link" or type = "response"). If type = "terms" a numeric matrix with a column per term.

Predict method for GeDSboost, GeDSgam

Description

This method computes predictions from GeDSboost and GeDSgam objects. It is designed to be user-friendly and accommodate different orders of the GeDSboost or GeDSgam fits.

Usage

## S3 method for class 'GeDSboost'
predict(object, newdata, n = 3L, base_learner = NULL, ...)

## S3 method for class 'GeDSgam'
predict(object, newdata, n = 3L, base_learner = NULL, ...)
## S3 method for class 'GeDSboost'
predict(object, newdata, n = 3L, base_learner = NULL, ...)

## S3 method for class 'GeDSgam'
predict(object, newdata, n = 3L, base_learner = NULL, ...)

Arguments

`object`	the `GeDSboost-class` or `GeDSgam-class` object.
`newdata`	an optional data frame for prediction.
`n`	the order of the GeDS fit (`2L` for linear, `3L` for quadratic, and `4L` for cubic). Default is `3L`.
`base_learner`	either `NULL` or a `character` string specifying the base-learner of the model for which predictions should be computed. Note that single base-learner predictions are provided on the linear predictor scale.
`...`	potentially further arguments.

Value

Numeric vector of predictions (vector of means).

References

Gu, C. and Wahba, G. (1991). Minimizing GCV/GML Scores with Multiple Smoothing Parameters via the Newton Method. SIAM J. Sci. Comput., 12, 383–398.

Examples

## Gu and Wahba 4 univariate term example ##
# Generate a data sample for the response variable
# y and the covariates x0, x1 and x2; include a noise predictor x3
set.seed(123)
N <- 400
f_x0x1x2 <- function(x0,x1,x2) {
  f0 <- function(x0) 2 * sin(pi * x0)
  f1 <- function(x1) exp(2 * x1)
  f2 <- function(x2) 0.2 * x2^11 * (10 * (1 - x2))^6 + 10 * (10 * x2)^3 * (1 - x2)^10
  f <- f0(x0) + f1(x1) + f2(x2)
  return(f)
}
x0 <- runif(N, 0, 1)
x1 <- runif(N, 0, 1)
x2 <- runif(N, 0, 1)
x3 <- runif(N, 0, 1)
# Specify a model for the mean of y
f <- f_x0x1x2(x0 = x0, x1 = x1, x2 = x2)
# Add (Normal) noise to the mean of y
y <- rnorm(N, mean = f, sd = 0.2)
data <- data.frame(y = y, x0 = x0, x1 = x1, x2 = x2, x3 = x3)

# Fit a GeDSgam model
Gmodgam <- NGeDSgam(y ~ f(x0) + f(x1) + f(x2) + f(x3), data = data)
# Check that the sum of the individual base-learner predictions equals the final
# model prediction

pred0 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x0)")
pred1 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x2)")
pred2 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x1)")
pred3 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x3)")
round(predict(Gmodgam, n = 2, newdata = data) -
(mean(predict(Gmodgam, n = 2, newdata = data)) + pred0 + pred1 + pred2 + pred3), 12)

pred0 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x0)")
pred1 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x2)")
pred2 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x1)")
pred3 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x3)")

round(predict(Gmodgam, n = 3, newdata = data) - (pred0 + pred1 + pred2 + pred3), 12)

pred0 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x0)")
pred1 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x2)")
pred2 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x1)")
pred3 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x3)")

round(predict(Gmodgam, n = 4, newdata = data) - (pred0 + pred1 + pred2 + pred3), 12)

# Plot GeDSgam partial fits to f(x0), f(x1), f(x2)
par(mfrow = c(1,3))
for (i in 1:3) {
  # Plot the base learner
  plot(Gmodgam, n = 3, base_learners = paste0("f(x", i-1, ")"), col = "seagreen",
       cex.lab = 1.5, cex.axis = 1.5)
  # Add legend
  if (i == 2) {
    position <- "topleft"
    } else if (i == 3) {
      position <- "topright"
      } else {
        position <- "bottom"
      }
  legend(position, legend = c("GAM-GeDS Quadratic", paste0("f(x", i-1, ")")),
         col = c("seagreen", "darkgray"),
         lwd = c(2, 2),
         bty = "n",
         cex = 1.5)
}

## Gu and Wahba 4 univariate term example ##
# Generate a data sample for the response variable
# y and the covariates x0, x1 and x2; include a noise predictor x3
set.seed(123)
N <- 400
f_x0x1x2 <- function(x0,x1,x2) {
  f0 <- function(x0) 2 * sin(pi * x0)
  f1 <- function(x1) exp(2 * x1)
  f2 <- function(x2) 0.2 * x2^11 * (10 * (1 - x2))^6 + 10 * (10 * x2)^3 * (1 - x2)^10
  f <- f0(x0) + f1(x1) + f2(x2)
  return(f)
}
x0 <- runif(N, 0, 1)
x1 <- runif(N, 0, 1)
x2 <- runif(N, 0, 1)
x3 <- runif(N, 0, 1)
# Specify a model for the mean of y
f <- f_x0x1x2(x0 = x0, x1 = x1, x2 = x2)
# Add (Normal) noise to the mean of y
y <- rnorm(N, mean = f, sd = 0.2)
data <- data.frame(y = y, x0 = x0, x1 = x1, x2 = x2, x3 = x3)

# Fit a GeDSgam model
Gmodgam <- NGeDSgam(y ~ f(x0) + f(x1) + f(x2) + f(x3), data = data)
# Check that the sum of the individual base-learner predictions equals the final
# model prediction

pred0 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x0)")
pred1 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x2)")
pred2 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x1)")
pred3 <- predict(Gmodgam, n = 2, newdata = data, base_learner = "f(x3)")
round(predict(Gmodgam, n = 2, newdata = data) -
(mean(predict(Gmodgam, n = 2, newdata = data)) + pred0 + pred1 + pred2 + pred3), 12)

pred0 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x0)")
pred1 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x2)")
pred2 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x1)")
pred3 <- predict(Gmodgam, n = 3, newdata = data, base_learner = "f(x3)")

round(predict(Gmodgam, n = 3, newdata = data) - (pred0 + pred1 + pred2 + pred3), 12)

pred0 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x0)")
pred1 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x2)")
pred2 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x1)")
pred3 <- predict(Gmodgam, n = 4, newdata = data, base_learner = "f(x3)")

round(predict(Gmodgam, n = 4, newdata = data) - (pred0 + pred1 + pred2 + pred3), 12)

# Plot GeDSgam partial fits to f(x0), f(x1), f(x2)
par(mfrow = c(1,3))
for (i in 1:3) {
  # Plot the base learner
  plot(Gmodgam, n = 3, base_learners = paste0("f(x", i-1, ")"), col = "seagreen",
       cex.lab = 1.5, cex.axis = 1.5)
  # Add legend
  if (i == 2) {
    position <- "topleft"
    } else if (i == 3) {
      position <- "topright"
      } else {
        position <- "bottom"
      }
  legend(position, legend = c("GAM-GeDS Quadratic", paste0("f(x", i-1, ")")),
         col = c("seagreen", "darkgray"),
         lwd = c(2, 2),
         bty = "n",
         cex = 1.5)
}

Print method for GeDS, GeDSboost, GeDSgam

Description

Method for the generic function print that allows to print on screen the main information related to the fitted predictor model that can be extracted from a GeDS-class, GeDSboost-class or GeDSgam-class object.

Usage

## S3 method for class 'GeDS'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'GeDSboost'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'GeDSgam'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'GeDS'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'GeDSboost'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'GeDSgam'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

Arguments

`x`	the `GeDS-class`, `GeDSboost-class` or `GeDSgam-class` object for which the main information should be printed on screen.
`digits`	number of digits to be printed.
`...`	potentially further arguments (required by the definition of the generic function).

Details

This method allows to print on screen basic information related to the fitted predictor model such as the function call, the number of internal knots for the linear GeDS/FGB-GeDS/GAM-GeDS fit and the deviances for the three (linear, quadratic and cubic) fitted predictor models embedded in the GeDSboost-class or GeDSgam-class object.

Value

This function returns (invisibly) the same input object, but adding the slot Print that contains the three sub-slots:

`Nknots`	the number of internal knots of the linear GeDS/FGB-GeDS/GAM-GeDS fit
`Deviances`	the deviances of the three (linear, quadratic and cubic) GeDS/FGB-GeDS/GAM-GeDS fits
`Call`	the `call` to the function that produced the `x` object

Estimation of the coefficients of a predictor model with spline and possibly parametric components.

Description

Functions that estimate the coefficients of a predictor model involving a spline component and possibly a parametric component applying (Iteratively Re-weighted) Least Squares (IR)LS iteration.

Usage

SplineReg_LM(
  X,
  Y,
  Z = NULL,
  offset = rep(0, length(X)),
  weights = rep(1, length(X)),
  InterKnots,
  n,
  extr = range(X),
  prob = 0.95,
  coefficients = NULL,
  only_pred = FALSE
)

SplineReg_GLM(
  X,
  Y,
  Z,
  offset = rep(0, nobs),
  weights = rep(1, length(X)),
  InterKnots,
  n,
  extr = range(X),
  family,
  mustart,
  inits = NULL,
  etastart = NULL
)
SplineReg_LM(
  X,
  Y,
  Z = NULL,
  offset = rep(0, length(X)),
  weights = rep(1, length(X)),
  InterKnots,
  n,
  extr = range(X),
  prob = 0.95,
  coefficients = NULL,
  only_pred = FALSE
)

SplineReg_GLM(
  X,
  Y,
  Z,
  offset = rep(0, nobs),
  weights = rep(1, length(X)),
  InterKnots,
  n,
  extr = range(X),
  family,
  mustart,
  inits = NULL,
  etastart = NULL
)

Arguments

`X`	a numeric vector containing $N$ sample values of the covariate chosen to enter the spline regression component of the predictor model.
`Y`	a vector of size $N$ containing the observed values of the response variable $y$ .
`Z`	a design matrix with $N$ rows containing other covariates selected to enter the parametric component of the predictor model (see `formula`). If no such covariates are selected, it is set to `NULL` by default.
`offset`	a vector of size $N$ that can be used to specify a fixed covariate to be included in the predictor model avoiding the estimation of its corresponding regression coefficient. In case more than one covariate is fixed, the user should sum the corresponding coordinates of the fixed covariates to produce one common $N$ -vector of coordinates. The argument `offset` is particularly useful in `Splinereg_GLM` if the link function used is not the identity.
`weights`	an optional vector of ‘prior weights’ to be put on the observations in the fitting process in case the user requires weighted fitting. It is a vector of 1s by default.
`InterKnots`	a numeric vector containing the locations of the internal knots necessary to compute the B-splines. In GeDS these are the internal knots in a current iteration of stage A.
`n`	integer value specifying the order of the spline to be evaluated. It should be 2 (linear spline), 3 (quadratic spline) or 4 (cubic spline). Non-integer values will be passed to the function `as.integer`.
`extr`	optional numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the sample values of `X`. By default equal correspondingly to the smallest and largest values of `X`.
`prob`	the confidence level to be used for the confidence bands in the `SplineReg_LM` fit. See details below.
`coefficients`	optional vector of spline coefficients. If provided, `SplineReg` computes only the corresponding predicted values.
`only_pred`	logical, if `TRUE` only `Theta`, `Predicted`, `Residuals` and `RSS` will be computed.
`family`	a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. See `family` for details of family functions.
`mustart`	initial values for the vector of means in the IRLS estimation. Must be a vector of length $N$ .
`inits`	a numeric vector of length `length(InterKnots) + n + NCOL(Z)` providing initial values for the coefficients, to be used in the IRLS estimation (alternative to providing the `mustart` vector).
`etastart`	initial values for the predictor in the IRLS estimation (alternative to providing either `inits` or `mustart`). Must be a vector of length $N$ .

Details

The functions estimate the coefficients of a predictor model with a spline component (and possibly a parametric component) for a given, fixed order and vector of knots of the spline and a specified distribution of the response variable (from the Exponential Family). The functions SplineReg_LM and SplineReg_GLM are based correspondingly on LS and IRLS and used correspondingly in NGeDS and GGeDS, to estimate the coefficients of the final GeDS fits of stage B, after their knots have been positioned to coincide with the Greville abscissas of the knots of the linear fit from stage A (see Dimitrova et al. 2023). Additional inference related quantities are also computed (see Value below). The function SplineReg_GLM is also used to estimate the coefficients of the linear GeDS fit of stage A within GGeDS, whereas in NGeDS this estimation is performed internally leading to faster R code.

In addition SplineReg_LM computes some useful quantities among which confidence intervals and the Control Polygon (see Section 2 of Kaishev et al. 2016).

The confidence intervals contained in the output slot NCI are approximate local bands obtained considering the knots as fixed constants. Hence the columns of the design matrix are seen as covariates and standard methodology relying on the se.fit option of predict.lm or predict.glm is used. In the ACI slot, asymptotic confidence intervals are provided, following Kaishev et al (2006). If the variance matrix is singular the Moore-Penrose pseudo-inverse is computed instead.

As mentioned, SplineReg_GLM is intensively used in Stage A of the GeDS algorithm implemented in GGeDS and in order to make it as fast as possible input data validation is mild. Hence it is expected that the user checks carefully the input parameters before using SplineReg_GLM. The "Residuals" in the output of this function are similar to the so called “working residuals" in the glm function. "Residuals" are the residuals $r_i$ used in the knot placement procedure, i.e.

$r_i= (y_i - \hat{\mu}_i){d \mu_i \over d \eta_i },$

but in contrast to glm “working residuals", they are computed using the final IRLS fitted $\hat{\mu}_i$ . "Residuals" are then used in locating the knots of the linear spline fit of Stage A.

In SplineReg_GLM confidence intervals are not computed.

Value

A list containing:

`Theta`	a vector containing the fitted coefficients of the spline regression component and the parametric component of the predictor model.
`Predicted`	a vector of $N$ predicted mean values of the response variable computed at the sample values of the covariate(s).
`Residuals`	a vector containing the normal regression residuals if `SplineReg_LM` is called or the residuals described in Details if `SplineReg_GLM` is called.
`RSS`	the deviance for the fitted predictor model, defined as in Dimitrova et al. (2023), which for `SplineReg_LM` coincides with the Residual Sum of Squares.
`NCI`	a list containing the lower (`Low`) and upper (`Upp`) limits of the approximate confidence intervals computed at the sample values of the covariate(s). See details above.
`Basis`	the matrix of B-spline regression functions and the covariates of the parametric part evaluated at the sample values of the covariate(s).
`Polygon`	a list containing x-y coordinates ("`Kn`" and "`Thetas`") of the vertices of the Control Polygon, see Dimitrova et al. (2023).
`deviance`	a vector containing deviances computed at each IRLS step (computed only with the `SplineReg_GLM`).
`temporary`	the result of the function `lm` if `SplineReg_LM` is used or the output of the function `IRLSfit` (which is similar to the output from `glm.fit`), if `SplineReg_GLM` is used.
`ACI`	a list containing the lower (`Low`) and upper (`Upp`) limits of the asymptotic confidence intervals computed at the sample values of the covariate(s).

References

Kaishev, V. K., Dimitrova, D. S., Haberman, S. & Verrall, R. J. (2006). Geometrically designed, variable know regression splines: asymptotics and inference (Statistical Research Paper No. 28). London, UK: Faculty of Actuarial Science & Insurance, City University London.
URL: openaccess.city.ac.uk

Functions used to fit GeDS objects with an univariate spline regression component

Description

These are computing engines called by NGeDS and GGeDS, needed for the underlying fitting procedures.

Usage

UnivariateFitter(
  X,
  Y,
  Z = NULL,
  offset = rep(0, NROW(Y)),
  weights = rep(1, length(X)),
  beta = 0.5,
  phi = 0.5,
  min.intknots = 0,
  max.intknots = 300,
  q = 2,
  extr = range(X),
  show.iters = FALSE,
  tol = as.double(1e-12),
  stoptype = c("SR", "RD", "LR"),
  higher_order = TRUE,
  intknots_init = NULL,
  fit_init = NULL,
  only_pred = FALSE
)

GenUnivariateFitter(
  X,
  Y,
  Z = NULL,
  offset = rep(0, NROW(Y)),
  weights = rep(1, length(X)),
  family = gaussian(),
  beta = 0.5,
  phi = 0.5,
  min.intknots = 0,
  max.intknots = 300,
  q = 2,
  extr = range(X),
  show.iters = F,
  tol = as.double(1e-12),
  stoptype = c("SR", "RD", "LR"),
  higher_order = TRUE
)
UnivariateFitter(
  X,
  Y,
  Z = NULL,
  offset = rep(0, NROW(Y)),
  weights = rep(1, length(X)),
  beta = 0.5,
  phi = 0.5,
  min.intknots = 0,
  max.intknots = 300,
  q = 2,
  extr = range(X),
  show.iters = FALSE,
  tol = as.double(1e-12),
  stoptype = c("SR", "RD", "LR"),
  higher_order = TRUE,
  intknots_init = NULL,
  fit_init = NULL,
  only_pred = FALSE
)

GenUnivariateFitter(
  X,
  Y,
  Z = NULL,
  offset = rep(0, NROW(Y)),
  weights = rep(1, length(X)),
  family = gaussian(),
  beta = 0.5,
  phi = 0.5,
  min.intknots = 0,
  max.intknots = 300,
  q = 2,
  extr = range(X),
  show.iters = F,
  tol = as.double(1e-12),
  stoptype = c("SR", "RD", "LR"),
  higher_order = TRUE
)

Arguments

`X`	a numeric vector containing $N$ sample values of the covariate chosen to enter the spline regression component of the predictor model.
`Y`	a vector of size $N$ containing the observed values of the response variable $y$ .
`Z`	a design matrix with $N$ rows containing other covariates selected to enter the parametric component of the predictor model (see `formula`). If no such covariates are selected, it is set to `NULL` by default.
`offset`	a vector of size $N$ that can be used to specify a fixed covariate to be included in the predictor model avoiding the estimation of its corresponding regression coefficient. In case more than one covariate is fixed, the user should sum the corresponding coordinates of the fixed covariates to produce one common $N$ -vector of coordinates. The `offset` argument is particularly useful when using `GenUnivariateFitter` if the link function used is not the identity.
`weights`	an optional vector of size $N$ of ‘prior weights’ to be put on the observations in the fitting process in case the user requires weighted GeDS fitting. It is `NULL` by default.
`beta`	numeric parameter in the interval $[0,1]$ tuning the knot placement in stage A of GeDS. See the description of `NGeDS` or `GGeDS`.
`phi`	numeric parameter in the interval $[0,1]$ specifying the threshold for the stopping rule (model selector) in stage A of GeDS. See also `stoptype` and details in the description of `NGeDS` or `GGeDS`.
`min.intknots`	optional parameter allowing the user to set a minimum number of internal knots required. By default equal to zero.
`max.intknots`	optional parameter allowing the user to set a maximum number of internal knots to be added by the GeDS estimation algorithm. By default equal to the number of internal knots $\kappa$ for the saturated GeDS model (i.e. $\kappa=N-2$ ).
`q`	numeric parameter which allows to fine-tune the stopping rule of stage A of GeDS, by default equal to 2. See details in the description of `NGeDS` or `GGeDS`.
`extr`	numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the sample values of `X`. By default equal correspondingly to the smallest and largest values of `X`.
`show.iters`	logical variable indicating whether or not to print information at each step. By default equal to `FALSE`.
`tol`	numeric value indicating the tolerance to be used in the knot placement steps in stage A. By default equal to 1E-12. See details below.
`stoptype`	a character string indicating the type of GeDS stopping rule to be used. It should be either `"SR"`, `"RD"` or `"LR"`, partial match allowed. See details of `NGeDS` or `GGeDS`.
`higher_order`	a logical that defines whether to compute the higher order fits (quadratic and cubic) after stage A is run. Default is `TRUE`.
`intknots_init`	vector of initial internal knots from which to start the GeDS Stage A iterations. See Section 3 of Kaishev et al. (2016). Default is `NULL`.
`fit_init`	A list containing fitted values `pred`, along with corresponding `intknots` and `coef`, representing the initial fit from which to begin Stage A GeDS iteration (i.e. departing from step 2).
`only_pred`	logical, if `TRUE` only predictions are computed.
`family`	a description of the error distribution and link function to be used in the model. This can be a character string naming a family function (e.g. `"gaussian"`), the family function itself (e.g. `gaussian`) or the result of a call to a family function (e.g. `gaussian()`). See family for details on family functions.

Details

The functions UnivariateFitter and GenUnivariateFitter are in general not intended to be used directly, they should be called through NGeDS and GGeDS. However, in case there is a need for multiple GeDS fitting (as may be the case e.g. in Monte Carlo simulations) it may be efficient to use the fitters outside the main functions.

The argument tol is used in the knot placement procedure of stage A of the GeDS algorithm in order to check whether the current knot $\delta^*$ is set at an acceptable location or not. If there exists a knot $\delta_i$ such that $|\delta^* - \delta_i| <$ tol, $\delta^*$ , then the new knot is considered to be coalescent with an existing one, it is discarded and the algorithm seeks alternative knot locations. By default it is equal to 1e-12.

See NGeDS and GGeDS, Kaishev et al. (2016) and Dimitrova et al. (2023) for further details.

Value

A GeDS-Class object, but without the Formula, extcall, terms and znames slots.

References

Examples

# Examples similar to the ones
# presented in NGeDS and in GGeDS

# Generate a data sample for the response variable
# Y and the covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N ,min = -2, max = 2))
# Specify a model for the mean of Y to include only
# a component non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)

# Fit a Normal GeDS regression model using the fitter function
(Gmod <- UnivariateFitter(X, Y, beta = 0.6, phi = 0.995,
           extr = c(-2,2)))

##############################################################
# second: very similar example, but based on Poisson data
set.seed(123)
X <- sort(runif(N , min = -2, max = 2))
means <- exp(f_1(X))
Y <- rpois(N,means)
(Gmod2 <- GenUnivariateFitter(X, Y, beta = 0.2,
            phi = 0.995, family = poisson(), extr = c(-2,2)))

# a plot showing quadratic and cubic fits,
# in the predictor scale
plot(X,log(Y), xlab = "x", ylab = expression(f[1](x)))
lines(Gmod2, n = 3, col = "red")
lines(Gmod2, n = 4, col = "blue", lty = 2)
legend("topleft", c("Quadratic", "Cubic"),
       col = c("red", "blue"),
       lty = c(1,2))

# Examples similar to the ones
# presented in NGeDS and in GGeDS

# Generate a data sample for the response variable
# Y and the covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N ,min = -2, max = 2))
# Specify a model for the mean of Y to include only
# a component non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.1)

# Fit a Normal GeDS regression model using the fitter function
(Gmod <- UnivariateFitter(X, Y, beta = 0.6, phi = 0.995,
           extr = c(-2,2)))

##############################################################
# second: very similar example, but based on Poisson data
set.seed(123)
X <- sort(runif(N , min = -2, max = 2))
means <- exp(f_1(X))
Y <- rpois(N,means)
(Gmod2 <- GenUnivariateFitter(X, Y, beta = 0.2,
            phi = 0.995, family = poisson(), extr = c(-2,2)))

# a plot showing quadratic and cubic fits,
# in the predictor scale
plot(X,log(Y), xlab = "x", ylab = expression(f[1](x)))
lines(Gmod2, n = 3, col = "red")
lines(Gmod2, n = 4, col = "blue", lty = 2)
legend("topleft", c("Quadratic", "Cubic"),
       col = c("red", "blue"),
       lty = c(1,2))

Visualize Boosting Iterations

Description

This function plots the NGeDSboost fit to the data at the beginning of a given boosting iteration and then plots the subsequent NGeDS fit on the corresponding negative gradient. Note: Applicable only for NGeDSboost models with a single univariate base-learner.

Usage

## S3 method for class 'GeDSboost'
visualize_boosting(object, iters = NULL, final_fits = FALSE)
## S3 method for class 'GeDSboost'
visualize_boosting(object, iters = NULL, final_fits = FALSE)

Arguments

`object`	a `GeDSboost-class` object.
`iters`	numeric, specifies the iteration(s) number.
`final_fits`	logical indicating whether the final linear, quadratic and cubic fits should be plotted.

Examples

# Load packages
library(GeDS)

# Generate a data sample for the response variable
# Y and the single covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.2)
data = data.frame(X, Y)
Gmodboost <- NGeDSboost(Y ~ f(X), data = data, normalize_data = TRUE)

# Plot
plot(X, Y, pch=20, col=c("darkgrey"))
lines(X, sapply(X, f_1), col = "black", lwd = 2)
lines(X, Gmodboost$predictions$pred_linear, col = "green4", lwd = 2)
lines(X, Gmodboost$predictions$pred_quadratic, col="red", lwd=2)
lines(X, Gmodboost$predictions$pred_cubic, col="purple", lwd=2)
legend("topright",
legend = c("Order 2 (degree=1)", "Order 3 (degree=2)", "Order 4 (degree=3)"),
col = c("green4", "red", "purple"),
lty = c(1, 1),
lwd = c(2, 2, 2),
cex = 0.75,
bty="n",
bg = "white")
# Visualize boosting iterations + final fits
par(mfrow=c(4,2))
visualize_boosting(Gmodboost, iters = 0:3, final_fits = TRUE)
par(mfrow=c(1,1))

# Load packages
library(GeDS)

# Generate a data sample for the response variable
# Y and the single covariate X
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- f_1(X)
# Add (Normal) noise to the mean of Y
Y <- rnorm(N, means, sd = 0.2)
data = data.frame(X, Y)
Gmodboost <- NGeDSboost(Y ~ f(X), data = data, normalize_data = TRUE)

# Plot
plot(X, Y, pch=20, col=c("darkgrey"))
lines(X, sapply(X, f_1), col = "black", lwd = 2)
lines(X, Gmodboost$predictions$pred_linear, col = "green4", lwd = 2)
lines(X, Gmodboost$predictions$pred_quadratic, col="red", lwd=2)
lines(X, Gmodboost$predictions$pred_cubic, col="purple", lwd=2)
legend("topright",
legend = c("Order 2 (degree=1)", "Order 3 (degree=2)", "Order 4 (degree=3)"),
col = c("green4", "red", "purple"),
lty = c(1, 1),
lwd = c(2, 2, 2),
cex = 0.75,
bty="n",
bg = "white")
# Visualize boosting iterations + final fits
par(mfrow=c(4,2))
visualize_boosting(Gmodboost, iters = 0:3, final_fits = TRUE)
par(mfrow=c(1,1))

Package 'GeDS'

Help Index

GeDS

Description

Details

Author(s)

References

See Also

Barium-Ferrum-Arsenide Powder Diffraction Data

Description

Usage

Format

Source

References

Examples

Fitter function for GeD Spline Regression for bivariate data

Description

Usage

Arguments

Value

References

See Also

Base Learner Importance for GeDSboost objects

Description

Usage

Arguments

Details

Value

References

Examples

Coal Mining Disasters data

Description

Usage

Format

Source

References

Coef method for GeDS objects

Description

Usage

Arguments

Details

Value

See Also

Examples

Coef method for GeDSboost, GeDSgam

Description

Usage

Arguments

Value

See Also

k-fold cross-validation

Description

Arguments

Value

Examples

Crystallographic Scattering Data

Description

Usage

Format

Source

Examples

Derivative of GeDS objects

Description

Usage

Arguments

Details

References

Examples

Deviance method for GeDS, GeDSboost, GeDSgam

Description

Usage

Arguments

Details

Value

See Also

Death counts in England and Wales

Description

Usage

Format

Defining the covariates for the spline component in a GeDS formula.