Title: | Bayesian Model Averaging |
---|---|
Description: | Package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (cox regression). |
Authors: | Adrian Raftery [aut], Jennifer Hoeting [aut], Chris Volinsky [aut], Ian Painter [aut], Ka Yee Yeung [aut], Hana Sevcikova [cre] |
Maintainer: | Hana Sevcikova <[email protected]> |
License: | GPL (>= 2) |
Version: | 3.18.20 |
Built: | 2025-01-15 22:26:43 UTC |
Source: | https://github.com/hanase/bma |
Bayesian Model Averaging accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability.
bic.glm(x, ...) ## S3 method for class 'matrix' bic.glm(x, y, glm.family, wt = rep(1, nrow(x)), strict = FALSE, prior.param = c(rep(0.5, ncol(x))), OR = 20, maxCol = 30, OR.fix = 2, nbest = 150, dispersion = NULL, factor.type = TRUE, factor.prior.adjust = FALSE, occam.window = TRUE, call = NULL, ...) ## S3 method for class 'data.frame' bic.glm(x, y, glm.family, wt = rep(1, nrow(x)), strict = FALSE, prior.param = c(rep(0.5, ncol(x))), OR = 20, maxCol = 30, OR.fix = 2, nbest = 150, dispersion = NULL, factor.type = TRUE, factor.prior.adjust = FALSE, occam.window = TRUE, call = NULL, ...) ## S3 method for class 'formula' bic.glm(f, data, glm.family, wt = rep(1, nrow(data)), strict = FALSE, prior.param = c(rep(0.5, ncol(x))), OR = 20, maxCol = 30, OR.fix = 2, nbest = 150, dispersion = NULL, factor.type = TRUE, factor.prior.adjust = FALSE, occam.window = TRUE, na.action = na.omit, ...)
bic.glm(x, ...) ## S3 method for class 'matrix' bic.glm(x, y, glm.family, wt = rep(1, nrow(x)), strict = FALSE, prior.param = c(rep(0.5, ncol(x))), OR = 20, maxCol = 30, OR.fix = 2, nbest = 150, dispersion = NULL, factor.type = TRUE, factor.prior.adjust = FALSE, occam.window = TRUE, call = NULL, ...) ## S3 method for class 'data.frame' bic.glm(x, y, glm.family, wt = rep(1, nrow(x)), strict = FALSE, prior.param = c(rep(0.5, ncol(x))), OR = 20, maxCol = 30, OR.fix = 2, nbest = 150, dispersion = NULL, factor.type = TRUE, factor.prior.adjust = FALSE, occam.window = TRUE, call = NULL, ...) ## S3 method for class 'formula' bic.glm(f, data, glm.family, wt = rep(1, nrow(data)), strict = FALSE, prior.param = c(rep(0.5, ncol(x))), OR = 20, maxCol = 30, OR.fix = 2, nbest = 150, dispersion = NULL, factor.type = TRUE, factor.prior.adjust = FALSE, occam.window = TRUE, na.action = na.omit, ...)
x |
a matrix or data.frame of independent variables. |
y |
a vector of values for the dependent variable. |
f |
a formula |
data |
a data frame containing the variables in the model. |
glm.family |
a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See 'family' for details of family functions.) |
wt |
an optional vector of weights to be used. |
strict |
a logical indicating whether models with more likely submodels are
eliminated. |
prior.param |
a vector of values specifying the prior weights for each variable. |
OR |
a number specifying the maximum ratio for excluding models in Occam's window |
maxCol |
a number specifying the maximum number of columns in design matrix (including intercept) to be kept. |
OR.fix |
width of the window which keeps models after the leaps approximation
is done.
Because the leaps and bounds gives only an approximation to BIC,
there is a need to increase the window at this first "cut" so as to
assure that no good models are deleted.
The level of this cut is at |
nbest |
a number specifying the number of models of each size returned to
|
dispersion |
a logical value specifying whether dispersion should be
estimated or not. Default is |
factor.type |
a logical value specifying how variables of class "factor" are
handled.
A factor variable with d levels is turned into (d-1) dummy variables using a
treatment contrast.
If |
factor.prior.adjust |
a logical value specifying whether
the prior distribution on dummy variables for factors
should be adjusted when |
occam.window |
a logical value specifying if Occam's window should be used.
If set to |
call |
used internally |
na.action |
a function which indicates what should happen when data contain |
... |
unused |
Bayesian Model Averaging accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability.
bic.glm
returns an object of class bic.glm
The function summary
is used to print a summary of the results.
The function plot
is used to plot posterior distributions for the coefficients.
The function imageplot
generates an image of the models which were averaged over.
An object of class bic.glm
is a list containing at least the following components:
postprob |
the posterior probabilities of the models selected |
deviance |
the estimated model deviances |
label |
labels identifying the models selected |
bic |
values of BIC for the models |
size |
the number of independent variables in each of the models |
which |
a logical matrix with one row per model and one column per variable indicating whether that variable is in the model |
probne0 |
the posterior probability that each variable is non-zero (in percent) |
postmean |
the posterior mean of each coefficient (from model averaging) |
postsd |
the posterior standard deviation of each coefficient (from model averaging) |
condpostmean |
the posterior mean of each coefficient conditional on the variable being included in the model |
condpostsd |
the posterior standard deviation of each coefficient conditional on the variable being included in the model |
mle |
matrix with one row per model and one column per variable giving the maximum likelihood estimate of each coefficient for each model |
se |
matrix with one row per model and one column per variable giving the standard error of each coefficient for each model |
reduced |
a logical indicating whether any variables were dropped before model averaging |
dropped |
a vector containing the names of those variables dropped before model averaging |
call |
the matched call that created the bma.lm object |
If more than maxcol
variables are supplied, then bic.glm does stepwise
elimination of variables until maxcol
variables are reached.
bic.glm
handles factor variables according to the factor.type
parameter. If this is true then factor variables are kept in the model or dropped in
entirety. If false, then each dummy variable can be kept or dropped independently.
If bic.glm
is used with a formula that includes interactions between factor
variables, then bic.glm
will create a new factor variable to represent that
interaction, and this factor variable will be kept or dropped in entirety if
factor.type
is true.
This can create interpretation problems if any of the corresponding main effects are
dropped.
Many thanks to Sanford Weisberg for making source code for leaps available.
Chris Volinsky [email protected], Adrian Raftery [email protected], and Ian Painter [email protected]
Raftery, Adrian E. (1995). Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.
An earlier version, issued as Working Paper 94-12, Center for Studies in Demography and Ecology, University of Washington (1994) is available as a technical report from the Department of Statistics, University of Washington.
summary.bic.glm
,
print.bic.glm
,
plot.bic.glm
## Not run: ### logistic regression library("MASS") data(birthwt) y<- birthwt$lo x<- data.frame(birthwt[,-1]) x$race<- as.factor(x$race) x$ht<- (x$ht>=1)+0 x<- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl<- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) glm.out.FT <- bic.glm(x, y, strict = FALSE, OR = 20, glm.family="binomial", factor.type=TRUE) summary(glm.out.FT) imageplot.bma(glm.out.FT) glm.out.FF <- bic.glm(x, y, strict = FALSE, OR = 20, glm.family="binomial", factor.type=FALSE) summary(glm.out.FF) imageplot.bma(glm.out.FF) glm.out.TT <- bic.glm(x, y, strict = TRUE, OR = 20, glm.family="binomial", factor.type=TRUE) summary(glm.out.TT) imageplot.bma(glm.out.TT) glm.out.TF <- bic.glm(x, y, strict = TRUE, OR = 20, glm.family="binomial", factor.type=FALSE) summary(glm.out.TF) imageplot.bma(glm.out.TF) ## End(Not run) ## Not run: ### Gamma family library(survival) data(cancer) surv.t<- veteran$time x<- veteran[,-c(3,4)] x$celltype<- factor(as.character(x$celltype)) sel<- veteran$status == 0 x<- x[!sel,] surv.t<- surv.t[!sel] glm.out.va <- bic.glm(x, y=surv.t, glm.family=Gamma(link="inverse"), factor.type=FALSE) summary(glm.out.va) imageplot.bma(glm.out.va) plot(glm.out.va) ## End(Not run) ### Poisson family ### Yates (teeth) data. x<- rbind( c(0, 0, 0), c(0, 1, 0), c(1, 0, 0), c(1, 1, 1)) y<-c(4, 16, 1, 21) n<-c(1,1,1,1) models<- rbind( c(1, 1, 0), c(1, 1, 1)) glm.out.yates <- bic.glm( x, y, n, glm.family = poisson(), factor.type=FALSE) summary(glm.out.yates) ## Not run: ### Gaussian library(MASS) data(UScrime) f <- formula(log(y) ~ log(M)+So+log(Ed)+log(Po1)+log(Po2)+log(LF)+ log(M.F)+ log(Pop)+log(NW)+log(U1)+log(U2)+ log(GDP)+log(Ineq)+log(Prob)+log(Time)) glm.out.crime <- bic.glm(f, data = UScrime, glm.family = gaussian()) summary(glm.out.crime) # note the problems with the estimation of the posterior standard # deviation (compare with bicreg example) ## End(Not run)
## Not run: ### logistic regression library("MASS") data(birthwt) y<- birthwt$lo x<- data.frame(birthwt[,-1]) x$race<- as.factor(x$race) x$ht<- (x$ht>=1)+0 x<- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl<- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) glm.out.FT <- bic.glm(x, y, strict = FALSE, OR = 20, glm.family="binomial", factor.type=TRUE) summary(glm.out.FT) imageplot.bma(glm.out.FT) glm.out.FF <- bic.glm(x, y, strict = FALSE, OR = 20, glm.family="binomial", factor.type=FALSE) summary(glm.out.FF) imageplot.bma(glm.out.FF) glm.out.TT <- bic.glm(x, y, strict = TRUE, OR = 20, glm.family="binomial", factor.type=TRUE) summary(glm.out.TT) imageplot.bma(glm.out.TT) glm.out.TF <- bic.glm(x, y, strict = TRUE, OR = 20, glm.family="binomial", factor.type=FALSE) summary(glm.out.TF) imageplot.bma(glm.out.TF) ## End(Not run) ## Not run: ### Gamma family library(survival) data(cancer) surv.t<- veteran$time x<- veteran[,-c(3,4)] x$celltype<- factor(as.character(x$celltype)) sel<- veteran$status == 0 x<- x[!sel,] surv.t<- surv.t[!sel] glm.out.va <- bic.glm(x, y=surv.t, glm.family=Gamma(link="inverse"), factor.type=FALSE) summary(glm.out.va) imageplot.bma(glm.out.va) plot(glm.out.va) ## End(Not run) ### Poisson family ### Yates (teeth) data. x<- rbind( c(0, 0, 0), c(0, 1, 0), c(1, 0, 0), c(1, 1, 1)) y<-c(4, 16, 1, 21) n<-c(1,1,1,1) models<- rbind( c(1, 1, 0), c(1, 1, 1)) glm.out.yates <- bic.glm( x, y, n, glm.family = poisson(), factor.type=FALSE) summary(glm.out.yates) ## Not run: ### Gaussian library(MASS) data(UScrime) f <- formula(log(y) ~ log(M)+So+log(Ed)+log(Po1)+log(Po2)+log(LF)+ log(M.F)+ log(Pop)+log(NW)+log(U1)+log(U2)+ log(GDP)+log(Ineq)+log(Prob)+log(Time)) glm.out.crime <- bic.glm(f, data = UScrime, glm.family = gaussian()) summary(glm.out.crime) # note the problems with the estimation of the posterior standard # deviation (compare with bicreg example) ## End(Not run)
Bayesian Model Averaging for Cox proportional hazards models for censored survival data. This accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability.
bic.surv(x, ...) ## S3 method for class 'matrix' bic.surv(x, surv.t, cens, strict = FALSE, OR = 20, maxCol = 30, prior.param = c(rep(0.5, ncol(x))), OR.fix = 2, nbest = 150, factor.type = TRUE, factor.prior.adjust = FALSE, call = NULL, ...) ## S3 method for class 'data.frame' bic.surv(x, surv.t, cens, strict = FALSE, OR = 20, maxCol = 30, prior.param = c(rep(0.5, ncol(x))), OR.fix = 2, nbest = 150, factor.type = TRUE, factor.prior.adjust = FALSE, call = NULL, ...) ## S3 method for class 'formula' bic.surv(f, data, strict = FALSE, OR = 20, maxCol = 30, prior.param = c(rep(0.5, ncol(x))), OR.fix = 2, nbest = 150, factor.type = TRUE, factor.prior.adjust = FALSE, call = NULL, ...)
bic.surv(x, ...) ## S3 method for class 'matrix' bic.surv(x, surv.t, cens, strict = FALSE, OR = 20, maxCol = 30, prior.param = c(rep(0.5, ncol(x))), OR.fix = 2, nbest = 150, factor.type = TRUE, factor.prior.adjust = FALSE, call = NULL, ...) ## S3 method for class 'data.frame' bic.surv(x, surv.t, cens, strict = FALSE, OR = 20, maxCol = 30, prior.param = c(rep(0.5, ncol(x))), OR.fix = 2, nbest = 150, factor.type = TRUE, factor.prior.adjust = FALSE, call = NULL, ...) ## S3 method for class 'formula' bic.surv(f, data, strict = FALSE, OR = 20, maxCol = 30, prior.param = c(rep(0.5, ncol(x))), OR.fix = 2, nbest = 150, factor.type = TRUE, factor.prior.adjust = FALSE, call = NULL, ...)
x |
a matrix or data frame of independent variables. |
surv.t |
a vector of values for the dependent variable. |
cens |
a vector of indicators of censoring (0=censored 1=uncensored) |
f |
a survival model formula |
data |
a data frame containing the variables in the model. |
strict |
logical indicating whether models with more likely submodels are eliminated.
|
OR |
a number specifying the maximum ratio for excluding models in Occam's window |
maxCol |
a number specifying the maximum number of columns in design matrix (including intercept) to be kept. |
prior.param |
a vector of prior probabilities that parameters are non-zero. Default puts a prior of .5 on all parameters. Setting to 1 forces the variable into the model. |
OR.fix |
width of the window which keeps models after the leaps approximation is done.
Because the leaps and bounds gives only an approximation to BIC, there is a need to increase the window at
this first "cut" so as to ensure that no good models are deleted.
The level of this cut is at |
nbest |
a value specifying the number of models of each size returned to bic.glm by the modified leaps algorithm. |
factor.type |
a logical value specifying how variables of class "factor" are handled.
A factor variable with d levels is turned into (d-1) dummy variables using a treatment contrast.
If |
factor.prior.adjust |
a logical value specifying if the prior distribution on
dummy variables for factors should be adjusted when |
call |
used internally |
... |
unused |
Bayesian Model Averaging accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability.
bic.surv
averages of Cox regression models.
bic.surv
returns an object of class bic.surv
The function summary
is used to print a summary of the results.
The function plot
is used to plot posterior distributions for the
coefficients.
The function imageplot
generates an image of the models which were
averaged over.
An object of class bic.glm
is a list containing at least the following
components:
postprob |
the posterior probabilities of the models selected |
label |
labels identifying the models selected |
bic |
values of BIC for the models |
size |
the number of independent variables in each of the models |
which |
a logical matrix with one row per model and one column per variable indicating whether that variable is in the model |
probne0 |
the posterior probability that each variable is non-zero (in percent) |
postmean |
the posterior mean of each coefficient (from model averaging) |
postsd |
the posterior standard deviation of each coefficient (from model averaging) |
condpostmean |
the posterior mean of each coefficient conditional on the variable being included in the model |
condpostsd |
the posterior standard deviation of each coefficient conditional on the variable being included in the model |
mle |
matrix with one row per model and one column per variable giving the maximum likelihood estimate of each coefficient for each model |
se |
matrix with one row per model and one column per variable giving the standard error of each coefficient for each model |
reduced |
a logical indicating whether any variables were dropped before model averaging |
dropped |
a vector containing the names of those variables dropped before model averaging |
call |
the matched call that created the bma.lm object |
If more than maxcol
variables are supplied, then bic.surv does
stepwise elimination of variables until maxcol
variables are reached.
Many thanks to Sanford Weisberg for making source code for leaps available.
Chris Volinsky [email protected]; Adrian Raftery [email protected]; Ian Painter [email protected]
Volinsky, C.T., Madigan, D., Raftery, A.E. and Kronmal, R.A. (1997). "Bayesian Model Averaging in Proportional Hazard Models: Assessing the Risk of a Stroke." Applied Statistics 46: 433-448
summary.bic.surv
,
print.bic.surv
,
plot.bic.surv
## Not run: ## veteran data library(survival) data(cancer) test.bic.surv<- bic.surv(Surv(time,status) ~ ., data = veteran, factor.type = TRUE) summary(test.bic.surv, conditional=FALSE, digits=2) plot(test.bic.surv) imageplot.bma(test.bic.surv) ## End(Not run) ## pbc data x<- pbc[1:312,] surv.t<- x$time cens<- as.numeric((x$status == 2)) x<- x[,c("age", "albumin", "alk.phos", "ascites", "bili", "edema", "hepato", "platelet", "protime", "sex", "ast", "spiders", "stage", "trt", "copper")] ## Not run: x$bili<- log(x$bili) x$alb<- log(x$alb) x$protime<- log(x$protime) x$copper<- log(x$copper) x$ast<- log(x$ast) test.bic.surv<- bic.surv(x, surv.t, cens, factor.type=FALSE, strict=FALSE) summary(test.bic.surv) ## End(Not run)
## Not run: ## veteran data library(survival) data(cancer) test.bic.surv<- bic.surv(Surv(time,status) ~ ., data = veteran, factor.type = TRUE) summary(test.bic.surv, conditional=FALSE, digits=2) plot(test.bic.surv) imageplot.bma(test.bic.surv) ## End(Not run) ## pbc data x<- pbc[1:312,] surv.t<- x$time cens<- as.numeric((x$status == 2)) x<- x[,c("age", "albumin", "alk.phos", "ascites", "bili", "edema", "hepato", "platelet", "protime", "sex", "ast", "spiders", "stage", "trt", "copper")] ## Not run: x$bili<- log(x$bili) x$alb<- log(x$alb) x$protime<- log(x$protime) x$copper<- log(x$copper) x$ast<- log(x$ast) test.bic.surv<- bic.surv(x, surv.t, cens, factor.type=FALSE, strict=FALSE) summary(test.bic.surv) ## End(Not run)
Bayesian Model Averaging accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability.
bicreg(x, y, wt = rep(1, length(y)), strict = FALSE, OR = 20, maxCol = 31, drop.factor.levels = TRUE, nbest = 150)
bicreg(x, y, wt = rep(1, length(y)), strict = FALSE, OR = 20, maxCol = 31, drop.factor.levels = TRUE, nbest = 150)
x |
a matrix of independent variables |
y |
a vector of values for the dependent variable |
wt |
a vector of weights for regression |
strict |
logical. FALSE returns all models whose posterior model probability is within a factor of 1/OR of that of the best model. TRUE returns a more parsimonious set of models, where any model with a more likely submodel is eliminated. |
OR |
a number specifying the maximum ratio for excluding models in Occam's window |
maxCol |
a number specifying the maximum number of columns in the design matrix (including the intercept) to be kept. |
drop.factor.levels |
logical. Indicates whether factor levels can be individually dropped in the stepwise procedure to reduce the number of columns in the design matrix, or if a factor can be dropped only in its entirety. |
nbest |
a value specifying the number of models of each size returned to bic.glm by the leaps algorithm. The default is 150 (replacing the original default of 10). |
Bayesian Model Averaging accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to the approximate posterior model probabilities.
bicreg
returns an object of class bicreg
The function 'summary' is used to print a summary of the results. The function 'plot' is used to plot posterior distributions for the coefficients.
An object of class bicreg
is a list containing at least the following components:
postprob |
the posterior probabilities of the models selected |
namesx |
the names of the variables |
label |
labels identifying the models selected |
r2 |
R2 values for the models |
bic |
values of BIC for the models |
size |
the number of independent variables in each of the models |
which |
a logical matrix with one row per model and one column per variable indicating whether that variable is in the model |
probne0 |
the posterior probability that each variable is non-zero (in percent) |
postmean |
the posterior mean of each coefficient (from model averaging) |
postsd |
the posterior standard deviation of each coefficient (from model averaging) |
condpostmean |
the posterior mean of each coefficient conditional on the variable being included in the model |
condpostsd |
the posterior standard deviation of each coefficient conditional on the variable being included in the model |
ols |
matrix with one row per model and one column per variable giving the OLS estimate of each coefficient for each model |
se |
matrix with one row per model and one column per variable giving the standard error of each coefficient for each model |
reduced |
a logical indicating whether any variables were dropped before model averaging |
dropped |
a vector containing the names of those variables dropped before model averaging |
residvar |
residual variance for each model |
call |
the matched call that created the bicreg object |
Original Splus code developed by Adrian Raftery ([email protected]) and revised by Chris T. Volinsky. Translation to R by Ian Painter.
Raftery, Adrian E. (1995). Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.
summary.bicreg
, print.bicreg
, plot.bicreg
library(MASS) data(UScrime) x<- UScrime[,-16] y<- log(UScrime[,16]) x[,-2]<- log(x[,-2]) lma<- bicreg(x, y, strict = FALSE, OR = 20) summary(lma) plot(lma) imageplot.bma(lma)
library(MASS) data(UScrime) x<- UScrime[,-16] y<- log(UScrime[,16]) x[,-2]<- log(x[,-2]) lma<- bicreg(x, y, strict = FALSE, OR = 20) summary(lma) plot(lma) imageplot.bma(lma)
Helper function for MC3.REG which implements each step of the Metropolis-Hastings algorithm.
For.MC3.REG(i, g, Ys, Xs, PI, K, nu, lambda, phi, outs.list)
For.MC3.REG(i, g, Ys, Xs, PI, K, nu, lambda, phi, outs.list)
i |
the current iteration number. |
g |
a list containing the current state and the history of the Markov-Chain. This list is in the same form as the return value (see the 'value' section below):
|
Ys |
the vector of scaled responses. |
Xs |
the matrix of scaled covariates. |
PI |
a hyperparameter indicating the prior probability of an outlier. The default values are 0.1 if the data set has less than 50 observations, 0.02 otherwise. |
K |
a hyperparameter indicating the outlier inflation factor |
nu |
regression hyperparameter. Default value is 2.58 if r2 for the full model is less than 0.9 or 0.2 if r2 for the full model is greater than 0.9. |
lambda |
regression hyperparameter. Default value is 0.28 if r2 for the full model is less than 0.9 or 0.1684 if r2 for the full model is greater than 0.9. |
phi |
regression hyperparameter. Default value is 2.85 if r2 for the full model is less than 0.9 or 9.2 if r2 for the full model is greater than 0.9. |
outs.list |
a vector of all potential outlier locations
(e.g. |
This function implements a single Metropolis-Hastings step, choosing a proposal model, calculating the Bayes Factor between the current model and proposal model, and updating the current model to the proposal model if the step results in an update.
a list containing the current state and the history of the Markov-Chain, with components
flag |
a 0/1 number specifying whether the previous Metropolis-Hastings step resulted in a changed state or not. |
big.list |
a matrix containing the history of the Markov-Chain. Each row represents a unique model (combination of variables and outliers). The first column is the set of variables in the model (in binary form), the second column is the set of outliers in the model (in binary form), the third column is the log-posterior for the model (up to a constant) and the fourth column is the number of times that model has been visited. |
M0.var |
a logical vector specifying the variables in the current model. |
M0.out |
a logical vector specifying the outliers in the current model. |
M0.1 |
a number representing the variables in the current model in binary form. |
M0.2 |
a number represnting the outliers in the current model in binary form. |
outcnt |
the number of potential outliers |
The implementation here differs from the Splus implentation. The Splus implementation uses global variables to contain the state of the current model and the history of the Markov-Chain. This implentation passes the current state and history to the function and then returns the updated state.
Jennifer Hoeting [email protected] with the assistance of Gary Gadbury. Translation from Splus to R by Ian Painter [email protected].
Bayesian Model Averaging for Linear Regression Models Adrian E. Raftery, David Madigan, and Jennifer A. Hoeting (1997). Journal of the American Statistical Association, 92, 179-191.
A Method for Simultaneous Variable and Transformation Selection in Linear Regression Jennifer Hoeting, Adrian E. Raftery and David Madigan (2002). Journal of Computational and Graphical Statistics 11 (485-507)
A Method for Simultaneous Variable Selection and Outlier Identification in Linear Regression Jennifer Hoeting, Adrian E. Raftery and David Madigan (1996). Computational Statistics and Data Analysis, 22, 251-270
Earlier versions of these papers are available via the World Wide Web using the url: https://www.stat.colostate.edu/~jah/papers/
MC3.REG
, MC3.REG.choose
, MC3.REG.logpost
Function to evaluate Bayes factors and account for model uncertainty in generalized linear models.
glib(x, ...) ## S3 method for class 'matrix' glib(x, y, n = rep(1, nrow(x)), error = "poisson", link = "log", scale = 1, models = NULL, phi = c(1, 1.65, 5), psi = 1, nu = 0, pmw = rep(1, nrow(models)), glimest = TRUE, glimvar = FALSE, output.priorvar = FALSE, post.bymodel = TRUE, output.postvar = FALSE, priormean = NULL, priorvar = NULL, nbest = 150, call = NULL, ...) ## S3 method for class 'data.frame' glib(x, y, n = rep(1, nrow(x)), error = "poisson", link = "log", scale = 1, models = NULL, phi = c(1, 1.65, 5), psi = 1, nu = 0, pmw = rep(1, nrow(models)), glimest = TRUE, glimvar = FALSE, output.priorvar = FALSE, post.bymodel = TRUE, output.postvar = FALSE, priormean = NULL, priorvar = NULL, nbest = 150, call = NULL, ...) ## S3 method for class 'bic.glm' glib(x, scale = 1, phi = 1, psi = 1, nu = 0, glimest = TRUE, glimvar = FALSE, output.priorvar = FALSE, post.bymodel = TRUE, output.postvar = FALSE, priormean = NULL, priorvar = NULL, call = NULL, ...) as.bic.glm(g, ...) ## S3 method for class 'glib' as.bic.glm( g, index.phi=1, ...)
glib(x, ...) ## S3 method for class 'matrix' glib(x, y, n = rep(1, nrow(x)), error = "poisson", link = "log", scale = 1, models = NULL, phi = c(1, 1.65, 5), psi = 1, nu = 0, pmw = rep(1, nrow(models)), glimest = TRUE, glimvar = FALSE, output.priorvar = FALSE, post.bymodel = TRUE, output.postvar = FALSE, priormean = NULL, priorvar = NULL, nbest = 150, call = NULL, ...) ## S3 method for class 'data.frame' glib(x, y, n = rep(1, nrow(x)), error = "poisson", link = "log", scale = 1, models = NULL, phi = c(1, 1.65, 5), psi = 1, nu = 0, pmw = rep(1, nrow(models)), glimest = TRUE, glimvar = FALSE, output.priorvar = FALSE, post.bymodel = TRUE, output.postvar = FALSE, priormean = NULL, priorvar = NULL, nbest = 150, call = NULL, ...) ## S3 method for class 'bic.glm' glib(x, scale = 1, phi = 1, psi = 1, nu = 0, glimest = TRUE, glimvar = FALSE, output.priorvar = FALSE, post.bymodel = TRUE, output.postvar = FALSE, priormean = NULL, priorvar = NULL, call = NULL, ...) as.bic.glm(g, ...) ## S3 method for class 'glib' as.bic.glm( g, index.phi=1, ...)
x |
an |
g |
an object of type |
y |
a vector of values for the dependent variable |
n |
an optional vector of weights to be used. |
error |
a string indicating the error family to use. Currently "gaussian", "gamma", "inverse gaussian", "binomial" and "poisson" are implemented. |
link |
a string indicating the link to use. Currently "identity", "log", "logit", "probit", "sqrt", "inverse" and "loglog" are implemented. |
scale |
the scale factor for the model. May be either a numeric constant or a string specifying the estimation, either "deviance" or "pearson". The default value is 1 for "binomial" and "poisson" error structures, and "pearson" for the others. |
models |
an optional matrix representing the models to be averaged over.
|
phi |
a vector of phi values. Default: |
psi |
a scalar prior parameter. Default: |
nu |
a scalar prior parameter. Default: 0 |
pmw |
a vector of prior model weights. These must be positive, but do not have to sum to one.
The prior model probabilities are given by |
glimest |
a logical value specifying whether to output estimates and standard errors for each model. |
glimvar |
a logical value specifying whether glim variance matrices are output for each model. |
output.priorvar |
a logical value specifying whether the prior variance is output for each model and value of phi combination. |
post.bymodel |
a logical value specifying whether to output the posterior mean and sd for each model and value of phi combination. |
output.postvar |
a logical value specifying whether to output the posterior variance matrix for each model and value of phi combination. |
priormean |
an optional vector of length p+1 containing a user specified prior mean on the variables (including the intercept), where p=number of independent variables. |
priorvar |
an optional matrix containing a user specified prior variance matrix, a (p+1) x (p+1) matrix. Default has the prior variance estimated as in Raftery(1996). |
nbest |
an integer giving the number of best models of each size to be returned by bic.glm if |
call |
the call to the function |
index.phi |
an index to the value of phi to use when converting a |
... |
unused |
Function to evaluate Bayes factors and account for model
uncertainty in generalized linear models.
This also calculates posterior distributions from a set of reference
proper priors.
as.bic.glm
creates a 'bic.glm' object from a 'glib' object.
glib
returns an object of type glib
, which is a list
containing the following items:
inputs |
a list echoing the inputs (x,y,n,error,link,models,phi,psi,nu) |
bf |
a list containing the model comparison results:
|
posterior |
a list containing the Bayesian model mixing results:
|
glim.est |
a list containing the GLIM estimates for the different models:
|
posterior.bymodel |
a list containing model-specific posterior means and sds:
|
prior |
a list containing the prior distributions:
|
models |
an array containing the models used. |
glm.out |
an object of type 'bic.glm' containing the results of
any call to |
call |
the call to the function |
The outputs controlled by glimvar, output.priorvar and output.postvar can take up a lot of space, which is why these control parameters are F by default.
Original Splus code developed by Adrian Raftery [email protected] and revised by Chris T. Volinsky. Translation to R by Ian S. Painter.
Raftery, A.E. (1988). Approximate Bayes factors for generalized linear models. Technical Report no. 121, Department of Statistics, University of Washington.
Raftery, Adrian E. (1995). Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.
Raftery, A.E. (1996). Approximate Bayes factors and accounting for model uncertainty in generalized linear models. Biometrika (83: 251-266).
## Not run: ### Finney data data(vaso) x<- vaso[,1:2] y<- vaso[,3] n<- rep(1,times=length(y)) finney.models<- rbind( c(1, 0), c(0, 1), c(1, 1)) finney.glib <- glib (x,y,n, error="binomial", link="logit", models=finney.models, glimvar=TRUE, output.priorvar=TRUE, output.postvar=TRUE) summary(finney.glib) finney.bic.glm<- as.bic.glm(finney.glib) plot(finney.bic.glm,mfrow=c(2,1)) ## End(Not run) ### Yates (teeth) data. x<- rbind( c(0, 0, 0), c(0, 1, 0), c(1, 0, 0), c(1, 1, 1)) y<-c(4, 16, 1, 21) n<-c(1,1,1,1) models<- rbind( c(1, 1, 0), c(1, 1, 1)) glib.yates <- glib ( x, y, n, models=models, glimvar=TRUE, output.priorvar=TRUE, output.postvar=TRUE) summary(glib.yates) ## Not run: ### logistic regression with no models specified library("MASS") data(birthwt) y<- birthwt$lo x<- data.frame(birthwt[,-1]) x$race<- as.factor(x$race) x$ht<- (x$ht>=1)+0 x<- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl<- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) glib.birthwt<- glib(x,y, error="binomial", link = "logit") summary(glib.birthwt) glm.birthwt<- as.bic.glm(glib.birthwt) imageplot.bma(glm.birthwt) plot(glm.birthwt) ## End(Not run)
## Not run: ### Finney data data(vaso) x<- vaso[,1:2] y<- vaso[,3] n<- rep(1,times=length(y)) finney.models<- rbind( c(1, 0), c(0, 1), c(1, 1)) finney.glib <- glib (x,y,n, error="binomial", link="logit", models=finney.models, glimvar=TRUE, output.priorvar=TRUE, output.postvar=TRUE) summary(finney.glib) finney.bic.glm<- as.bic.glm(finney.glib) plot(finney.bic.glm,mfrow=c(2,1)) ## End(Not run) ### Yates (teeth) data. x<- rbind( c(0, 0, 0), c(0, 1, 0), c(1, 0, 0), c(1, 1, 1)) y<-c(4, 16, 1, 21) n<-c(1,1,1,1) models<- rbind( c(1, 1, 0), c(1, 1, 1)) glib.yates <- glib ( x, y, n, models=models, glimvar=TRUE, output.priorvar=TRUE, output.postvar=TRUE) summary(glib.yates) ## Not run: ### logistic regression with no models specified library("MASS") data(birthwt) y<- birthwt$lo x<- data.frame(birthwt[,-1]) x$race<- as.factor(x$race) x$ht<- (x$ht>=1)+0 x<- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl<- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) glib.birthwt<- glib(x,y, error="binomial", link = "logit") summary(glib.birthwt) glm.birthwt<- as.bic.glm(glib.birthwt) imageplot.bma(glm.birthwt) plot(glm.birthwt) ## End(Not run)
This function implements the iterated Bayesian Model Averaging method for variable selection. This method works by making repeated calls to a Bayesian model averaging procedure, iterating through the variables in a fixed order. After each call to the Bayesian model averaging procedure only those variables which have posterior probability greater than a specified threshold are retained, those variables whose posterior probabilities do not meet the threshold are replaced with the next set of variables. The order in which the variables are to be considered is usually determined on the basis of the some measure of goodness of fit calculated univariately for each variable.
iBMA.glm(x, ...) iBMA.bicreg(x, ...) iBMA.surv(x, ...) ## S3 method for class 'matrix' iBMA.glm(x, Y, wt = rep(1, nrow(X)), thresProbne0 = 5, glm.family, maxNvar = 30, nIter = 100, verbose = FALSE, sorted = FALSE, factor.type = TRUE, ...) ## S3 method for class 'matrix' iBMA.glm(x, Y, wt = rep(1, nrow(X)), thresProbne0 = 5, glm.family, maxNvar = 30, nIter = 100, verbose = FALSE, sorted = FALSE, factor.type = TRUE, ...) ## S3 method for class 'iBMA.intermediate.glm' iBMA.glm(x, nIter = NULL, verbose = NULL, ...) ## S3 method for class 'matrix' iBMA.bicreg(x, Y, wt = rep(1, nrow(X)), thresProbne0 = 5, maxNvar = 30, nIter = 100, verbose = FALSE, sorted = FALSE, ...) ## S3 method for class 'data.frame' iBMA.bicreg(x, Y, wt = rep(1, nrow(X)), thresProbne0 = 5, maxNvar = 30, nIter = 100, verbose = FALSE, sorted = FALSE, ...) ## S3 method for class 'iBMA.intermediate.bicreg' iBMA.bicreg(x, nIter = NULL, verbose = NULL, ...) ## S3 method for class 'matrix' iBMA.surv(x, surv.t, cens, wt = rep(1, nrow(X)), thresProbne0 = 5, maxNvar = 30, nIter = 100, verbose = FALSE, sorted = FALSE, factor.type = TRUE, ...) ## S3 method for class 'data.frame' iBMA.surv(x, surv.t, cens, wt = rep(1, nrow(X)), thresProbne0 = 5, maxNvar = 30, nIter = 100, verbose = FALSE, sorted = FALSE, factor.type = TRUE, ...) ## S3 method for class 'iBMA.intermediate.surv' iBMA.surv(x, nIter = NULL,verbose = NULL, ...)
iBMA.glm(x, ...) iBMA.bicreg(x, ...) iBMA.surv(x, ...) ## S3 method for class 'matrix' iBMA.glm(x, Y, wt = rep(1, nrow(X)), thresProbne0 = 5, glm.family, maxNvar = 30, nIter = 100, verbose = FALSE, sorted = FALSE, factor.type = TRUE, ...) ## S3 method for class 'matrix' iBMA.glm(x, Y, wt = rep(1, nrow(X)), thresProbne0 = 5, glm.family, maxNvar = 30, nIter = 100, verbose = FALSE, sorted = FALSE, factor.type = TRUE, ...) ## S3 method for class 'iBMA.intermediate.glm' iBMA.glm(x, nIter = NULL, verbose = NULL, ...) ## S3 method for class 'matrix' iBMA.bicreg(x, Y, wt = rep(1, nrow(X)), thresProbne0 = 5, maxNvar = 30, nIter = 100, verbose = FALSE, sorted = FALSE, ...) ## S3 method for class 'data.frame' iBMA.bicreg(x, Y, wt = rep(1, nrow(X)), thresProbne0 = 5, maxNvar = 30, nIter = 100, verbose = FALSE, sorted = FALSE, ...) ## S3 method for class 'iBMA.intermediate.bicreg' iBMA.bicreg(x, nIter = NULL, verbose = NULL, ...) ## S3 method for class 'matrix' iBMA.surv(x, surv.t, cens, wt = rep(1, nrow(X)), thresProbne0 = 5, maxNvar = 30, nIter = 100, verbose = FALSE, sorted = FALSE, factor.type = TRUE, ...) ## S3 method for class 'data.frame' iBMA.surv(x, surv.t, cens, wt = rep(1, nrow(X)), thresProbne0 = 5, maxNvar = 30, nIter = 100, verbose = FALSE, sorted = FALSE, factor.type = TRUE, ...) ## S3 method for class 'iBMA.intermediate.surv' iBMA.surv(x, nIter = NULL,verbose = NULL, ...)
x |
a matrix or data.frame of independent variables,
or else an object of class |
Y |
a vector of values for the dependent variable. |
surv.t |
a vector of survival times. |
cens |
a vector of indicators of censoring (0=censored 1=uncensored) |
wt |
an optional vector of weights to be used. |
thresProbne0 |
a number giving the probability threshold for including variables as a percent. |
glm.family |
glm family. |
maxNvar |
a number giving the maximum number of variables to be considered in a model. |
nIter |
a number giving the maximum number of iterations that should be run. |
verbose |
a logical value specifying if verbose output should be produced or not |
sorted |
a logical value specifying if the variables have been sorted or not. If |
factor.type |
a logical value specifying how variables of class "factor" are handled. A factor variable with d levels is turned into (d-1) dummy variables using a treatment contrast. If 'factor.type = TRUE', models will contain either all or none of these dummy variables. If 'factor.type = FALSE', models are free to select the dummy variables independently. In this case, factor.prior.adjust determines the prior on these variables. |
... |
other parameters to be passed to |
These methods can be run in a 'batch' mode by setting nIter
to be larger than the number of variables.
Alternatively, if nIter
is set to be small, the procedure may return before all of the variables have been examined.
In this case the returned result of the call will be of class 'iBMA.X.intermediate', and if iBMA.X is called with this result as the input, nIter
more iterations will be run.
If on any iteration there are no variables that have posterior probability less than the threshold, the variable with the lowest posterior probability is dropped.
An object of either type iBMA.X, or of type iBMA.X.intermediate, where 'X' is either 'glm', 'bicreg' or 'surv'. Objects of type 'iBMA.X.intermediate' consist of a list with components for each parameter passed into iBMA.X as well as the following components:
sortedX |
a matrix or data.frame containing the sorted variables. |
call |
the matched call. |
initial.order |
the inital ordering of the variables. |
nVar |
the number of variables. |
currentSet |
a vector specifying the set of variables currently selected. |
nextVar |
the next variable to be examined |
current.probne0 |
the posterior probabilities for inclusion for each of the variables in the current set of variables. |
maxProbne0 |
the maximum posterior probability calculated for each variable |
nTimes |
the number of times each variable has been included in the set of selected variables |
currIter |
the current iteration number |
new.vars |
the set of variables that will be added to the current set during the next iteration |
first.in.model |
a vector of numbers giving the iteration number that each variable was first examined in. A value of NA indicates that a variable has not yet been examined. |
iter.dropped |
a vector giving the iteration number in which each variable was dropped from the current set. A value of NA indicates that a variable has not yet been dropeed. |
Objects of the type iBMA.glm contain in addition to all of these elements the following components:
nIterations |
the total number of iterations that were run |
selected |
the set of variables that were selected (in terms of the initial ordering of the variables) |
bma |
an object of type 'bic.X' containing the results of the Bayesian model averaging run on the selected set of variables. |
The parameters verbose
and nIter
can be changed between sets of iterations.
The parameter sorted
specifies if the variables should be sorted prior to iteration, if sorted
is set to FALSE
then the variables are sorted according to the decreasing single variable model R2 values for iBMA.bicreg or the single variable model increasing Chi-sq P-values for iBMA.glm and iBMA.surv.
Subsequent reference to variables is in terms of this ordered set of variables.
It is possible to obtain degenerate results when using a large number of predictor variables in linear regression. This problem is much less common with logistic regression and survival analysis.
Ka Yee Yeung, [email protected], Adrian Raftery [email protected], Ian Painter [email protected]
Yeung, K.Y., Bumgarner, R.E. and Raftery, A.E. (2005). ‘ Bayesian Model Averaging: Development of an improved multi-class, gene selection and classification tool for microarray data.’ Bioinformatics, 21(10), 2394-2402
bic.glm
,
bicreg
,
bic.surv
,
summary.iBMA.bicreg
,
print.iBMA.bicreg
,
orderplot.iBMA.bicreg
## Not run: ############ iBMA.glm library("MASS") data(birthwt) y<- birthwt$lo x<- data.frame(birthwt[,-1]) x$race<- as.factor(x$race) x$ht<- (x$ht>=1)+0 x<- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl<- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) ### add 41 columns of noise noise<- matrix(rnorm(41*nrow(x)), ncol=41) colnames(noise)<- paste('noise', 1:41, sep='') x<- cbind(x, noise) iBMA.glm.out<- iBMA.glm( x, y, glm.family="binomial", factor.type=FALSE, verbose = TRUE, thresProbne0 = 5 ) summary(iBMA.glm.out) ## End(Not run) ## Not run: ################## iBMA.surv library(survival) data(cancer) surv.t<- veteran$time cens<- veteran$status veteran$time<- NULL veteran$status<- NULL lvet<- nrow(veteran) invlogit<- function(x) exp(x)/(1+exp(x)) # generate random noise, 34 uniform variables # and 10 factors each with 4 levels X <- data.frame(matrix(runif(lvet*34), ncol=34), matrix(letters[1:6][(rbinom(10*lvet, 3, .5))+1], ncol = 10)) colnames(X) <- c(paste("u",1:34, sep=""),paste("C",1:10, sep="")) for(i in 35:44) X[,i] <- as.factor(X[,i]) veteran_plus_noise<- cbind(veteran, X) test.iBMA.surv <- iBMA.surv(x = veteran_plus_noise, surv.t = surv.t, cens = cens, thresProbne0 = 5, maxNvar = 30, factor.type = TRUE, verbose = TRUE, nIter = 100) test.iBMA.surv summary(test.iBMA.surv) ## End(Not run) ## Not run: ############ iBMA.bicreg ... degenerate example library(MASS) data(UScrime) UScrime$M<- log(UScrime$M); UScrime$Ed<- log(UScrime$Ed); UScrime$Po1<- log(UScrime$Po1); UScrime$Po2<- log(UScrime$Po2); UScrime$LF<- log(UScrime$LF); UScrime$M.F<- log(UScrime$M.F) UScrime$Pop<- log(UScrime$Pop); UScrime$NW<- log(UScrime$NW); UScrime$U1<- log(UScrime$U1); UScrime$U2<- log(UScrime$U2); UScrime$GDP<- log(UScrime$GDP); UScrime$Ineq<- log(UScrime$Ineq) UScrime$Prob<- log(UScrime$Prob); UScrime$Time<- log(UScrime$Time) noise<- matrix(rnorm(35*nrow(UScrime)), ncol=35) colnames(noise)<- paste('noise', 1:35, sep='') UScrime_plus_noise<- cbind(UScrime, noise) y<- UScrime_plus_noise$y UScrime_plus_noise$y <- NULL # run 2 iterations and examine results iBMA.bicreg.crime <- iBMA.bicreg( x = UScrime_plus_noise, Y = y, thresProbne0 = 5, verbose = TRUE, maxNvar = 30, nIter = 2) summary(iBMA.bicreg.crime) orderplot(iBMA.bicreg.crime) ## End(Not run) ## Not run: # run from current state until completion iBMA.bicreg.crime <- iBMA.bicreg( iBMA.bicreg.crime, nIter = 200) summary(iBMA.bicreg.crime) orderplot(iBMA.bicreg.crime) ## End(Not run) set.seed(0) x <- matrix( rnorm(50*30), 50, 30) lp <- apply( x[,1:5], 1, sum) iBMA.bicreg.ex <- iBMA.bicreg( x = x, Y = lp, thresProbne0 = 5, maxNvar = 20) explp <- exp(lp) prob <- explp/(1+explp) y=rbinom(n=length(prob),prob=prob,size=1) iBMA.glm.ex <- iBMA.glm( x = x, Y = y, glm.family = "binomial", factor.type = FALSE, thresProbne0 = 5, maxNvar = 20) cat("\n\n CAUTION: iBMA.bicreg can give degenerate results when") cat(" the number of predictor variables is large\n\n")
## Not run: ############ iBMA.glm library("MASS") data(birthwt) y<- birthwt$lo x<- data.frame(birthwt[,-1]) x$race<- as.factor(x$race) x$ht<- (x$ht>=1)+0 x<- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl<- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) ### add 41 columns of noise noise<- matrix(rnorm(41*nrow(x)), ncol=41) colnames(noise)<- paste('noise', 1:41, sep='') x<- cbind(x, noise) iBMA.glm.out<- iBMA.glm( x, y, glm.family="binomial", factor.type=FALSE, verbose = TRUE, thresProbne0 = 5 ) summary(iBMA.glm.out) ## End(Not run) ## Not run: ################## iBMA.surv library(survival) data(cancer) surv.t<- veteran$time cens<- veteran$status veteran$time<- NULL veteran$status<- NULL lvet<- nrow(veteran) invlogit<- function(x) exp(x)/(1+exp(x)) # generate random noise, 34 uniform variables # and 10 factors each with 4 levels X <- data.frame(matrix(runif(lvet*34), ncol=34), matrix(letters[1:6][(rbinom(10*lvet, 3, .5))+1], ncol = 10)) colnames(X) <- c(paste("u",1:34, sep=""),paste("C",1:10, sep="")) for(i in 35:44) X[,i] <- as.factor(X[,i]) veteran_plus_noise<- cbind(veteran, X) test.iBMA.surv <- iBMA.surv(x = veteran_plus_noise, surv.t = surv.t, cens = cens, thresProbne0 = 5, maxNvar = 30, factor.type = TRUE, verbose = TRUE, nIter = 100) test.iBMA.surv summary(test.iBMA.surv) ## End(Not run) ## Not run: ############ iBMA.bicreg ... degenerate example library(MASS) data(UScrime) UScrime$M<- log(UScrime$M); UScrime$Ed<- log(UScrime$Ed); UScrime$Po1<- log(UScrime$Po1); UScrime$Po2<- log(UScrime$Po2); UScrime$LF<- log(UScrime$LF); UScrime$M.F<- log(UScrime$M.F) UScrime$Pop<- log(UScrime$Pop); UScrime$NW<- log(UScrime$NW); UScrime$U1<- log(UScrime$U1); UScrime$U2<- log(UScrime$U2); UScrime$GDP<- log(UScrime$GDP); UScrime$Ineq<- log(UScrime$Ineq) UScrime$Prob<- log(UScrime$Prob); UScrime$Time<- log(UScrime$Time) noise<- matrix(rnorm(35*nrow(UScrime)), ncol=35) colnames(noise)<- paste('noise', 1:35, sep='') UScrime_plus_noise<- cbind(UScrime, noise) y<- UScrime_plus_noise$y UScrime_plus_noise$y <- NULL # run 2 iterations and examine results iBMA.bicreg.crime <- iBMA.bicreg( x = UScrime_plus_noise, Y = y, thresProbne0 = 5, verbose = TRUE, maxNvar = 30, nIter = 2) summary(iBMA.bicreg.crime) orderplot(iBMA.bicreg.crime) ## End(Not run) ## Not run: # run from current state until completion iBMA.bicreg.crime <- iBMA.bicreg( iBMA.bicreg.crime, nIter = 200) summary(iBMA.bicreg.crime) orderplot(iBMA.bicreg.crime) ## End(Not run) set.seed(0) x <- matrix( rnorm(50*30), 50, 30) lp <- apply( x[,1:5], 1, sum) iBMA.bicreg.ex <- iBMA.bicreg( x = x, Y = lp, thresProbne0 = 5, maxNvar = 20) explp <- exp(lp) prob <- explp/(1+explp) y=rbinom(n=length(prob),prob=prob,size=1) iBMA.glm.ex <- iBMA.glm( x = x, Y = y, glm.family = "binomial", factor.type = FALSE, thresProbne0 = 5, maxNvar = 20) cat("\n\n CAUTION: iBMA.bicreg can give degenerate results when") cat(" the number of predictor variables is large\n\n")
Creates an image of the models selected using bicreg
, bic.glm
or bic.surv
.
imageplot.bma(bma.out, color = c("red", "blue", "#FFFFD5"), order = c("input", "probne0", "mds"), ...)
imageplot.bma(bma.out, color = c("red", "blue", "#FFFFD5"), order = c("input", "probne0", "mds"), ...)
bma.out |
An object of type 'bicreg', 'bic.glm' or 'bic.surv' |
color |
A vector of colors of length 3, or a string with value "default" or "blackandwhite", representing the colors to use for the plot. The first color is the color to use when the variable estimate is positive, the second color is the color to use when the variable estimate is negative, and the third color is the color to use when the variable is not included in the model. The value "default" is available for backward compatibility with the first version of |
order |
The order in which to show the variables. The value "input" keeps the order as found in the object, the value "probne0" orders the variables in terms of probability of inclusion, and the value "mds" orders the variables using (single) multidimensional scaling |
... |
Other parameters to be passed to the |
Creates an image of the models selected using bicreg
, bic.glm
or bic.surv
. The image displays inclusion and exclusion of variables within models using separate colors. By default the color for inclusion depends on whether the variable estimate for each model is positive or negative.
If the factor.type == TRUE
option is set in the bma object being displayed, then imageplot.bma
displays only inclusion and exclusion of models, with the color not linked to variable estimates.
The option color = "mds"
is useful for observing variables with linked behavior, it attemps to order the variables in such a way as to keep variabiles with linked behavior (for example, one variabile is only included in a model when another variabile is not included in the model) close together.
This option uses multidimensional scaling on one dimension using Kendall's tau statistic calculated on two-by-two tables of pairwise comparisons of variable inclusion/exclusion from the selected models.
Adrian E. Raftery [email protected] and Hana Sevcikova [email protected]
Clyde, M. (1999) Bayesian Model Averaging and Model Search Strategies (with discussion). In Bayesian Statistics 6. J.M. Bernardo, A.P. Dawid, J.O. Berger, and A.F.M. Smith eds. Oxford University Press, pages 157-185.
# logistic regression using bic.glm library("MASS") data(birthwt) y<- birthwt$lo x<- data.frame(birthwt[,-1]) x$race<- as.factor(x$race) x$ht<- (x$ht>=1)+0 x<- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl<- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) glm.out1<- bic.glm(x, y, strict = TRUE, OR = 20, glm.family="binomial") imageplot.bma(glm.out1) ## Not run: # logistic regression using glib library("MASS") data(birthwt) y<- birthwt$lo x<- data.frame(birthwt[,-1]) x$race<- as.factor(x$race) x$ht<- (x$ht>=1)+0 x<- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl<- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) glib.birthwt<- glib(x,y, error="binomial", link = "logit") glm.birthwt<- as.bic.glm(glib.birthwt) imageplot.bma(glm.birthwt, order = "mds") ## End(Not run)
# logistic regression using bic.glm library("MASS") data(birthwt) y<- birthwt$lo x<- data.frame(birthwt[,-1]) x$race<- as.factor(x$race) x$ht<- (x$ht>=1)+0 x<- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl<- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) glm.out1<- bic.glm(x, y, strict = TRUE, OR = 20, glm.family="binomial") imageplot.bma(glm.out1) ## Not run: # logistic regression using glib library("MASS") data(birthwt) y<- birthwt$lo x<- data.frame(birthwt[,-1]) x$race<- as.factor(x$race) x$ht<- (x$ht>=1)+0 x<- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl<- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) glib.birthwt<- glib(x,y, error="binomial", link = "logit") glm.birthwt<- as.bic.glm(glib.birthwt) imageplot.bma(glm.birthwt, order = "mds") ## End(Not run)
Performs Bayesian simultaneous variable selection and outlier identification (SVO) via Markov chain Monte Carlo model composition (MC3).
MC3.REG(all.y, all.x, num.its, M0.var= , M0.out= , outs.list= , outliers = TRUE, PI=.1*(length(all.y) <50) + .02*(length(all.y) >= 50), K=7, nu= , lambda= , phi= )
MC3.REG(all.y, all.x, num.its, M0.var= , M0.out= , outs.list= , outliers = TRUE, PI=.1*(length(all.y) <50) + .02*(length(all.y) >= 50), K=7, nu= , lambda= , phi= )
all.y |
a vector of responses |
all.x |
a matrix of covariates |
num.its |
the number of iterations of the Markov chain sampler |
M0.var |
a logical vector specifying the starting model. For example, if you have 3 predictors and the starting model is X1 and X3, then |
M0.out |
a logical vector specifying the starting model outlier set. The default value is a logical vector of |
outs.list |
a vector of all potential outlier locations (e.g. |
outliers |
a logical parameter indicating whether outliers are to be included. If |
PI |
a hyperparameter indicating the prior probability of an outlier. The default values are 0.1 if the data set has less than 50 observations, 0.02 otherwise. |
K |
a hyperparameter indicating the outlier inflation factor |
nu |
regression hyperparameter. Default value is 2.58 if r2 for the full model is less than 0.9 or 0.2 if r2 for the full model is greater than 0.9. |
lambda |
regression hyperparameter. Default value is 0.28 if r2 for the full model is less than 0.9 or 0.1684 if r2 for the full model is greater than 0.9. |
phi |
regression hyperparameter. Default value is 2.85 if r2 for the full model is less than 0.9 or 9.2 if r2 for the full model is greater than 0.9. |
Performs Bayesian simultaneous variable and outlier selection using Monte Carlo Markov Chain Model Choice (MC3). Potential models are visited using a Metropolis-Hastings algorithm on the integrated likelihood. At the end of the chain exact posterior probabilities are calculated for each model visited.
An object of class mc3
. Print and summary methods exist for this class.
Objects of class mc3
are a list consisting of at least
post.prob |
The posterior probabilities of each model visited. |
variables |
An indicator matrix of the variables in each model. |
outliers |
An indicator matrix of the outliers in each model, if outliers were selected. |
visit.count |
The number of times each model was visited. |
outlier.numbers |
An index showing which outliers were eligable for selection. |
var.names |
The names of the variables. |
n.models |
The number of models visited. |
PI |
The value of PI used. |
K |
The value of K used. |
nu |
The value of nu used. |
lambda |
The value of lambda used. |
phi |
The value of phi used. |
call |
The function call. |
The default values for nu
, lambda
and phi
are recommended when the R2 value for the full model with all outliers is less than 0.9.
If PI
is set too high it is possible to generate sub models which are singular, at which point the function will crash.
The implementation of this function is different from that used in the Splus function. In particular, variables which were global are now passed between functions.
Jennifer Hoeting [email protected] with the assistance of Gary Gadbury. Translation from Splus to R by Ian S. Painter.
Bayesian Model Averaging for Linear Regression Models Adrian E. Raftery, David Madigan, and Jennifer A. Hoeting (1997). Journal of the American Statistical Association, 92, 179-191.
A Method for Simultaneous Variable and Transformation Selection in Linear Regression Jennifer Hoeting, Adrian E. Raftery and David Madigan (2002). Journal of Computational and Graphical Statistics 11 (485-507)
A Method for Simultaneous Variable Selection and Outlier Identification in Linear Regression Jennifer Hoeting, Adrian E. Raftery and David Madigan (1996). Computational Statistics and Data Analysis, 22, 251-270
Earlier versions of these papers are available via the World Wide Web using the url: https://www.stat.colostate.edu/~jah/papers/
## Not run: # Example 1: Scottish hill racing data. data(race) b<- out.ltsreg(race[,-1], race[,1], 2) races.run1<-MC3.REG(race[,1], race[,-1], num.its=20000, c(FALSE,TRUE), rep(TRUE,length(b)), b, PI = .1, K = 7, nu = .2, lambda = .1684, phi = 9.2) races.run1 summary(races.run1) ## End(Not run) # Example 2: Crime data library(MASS) data(UScrime) y.crime.log<- log(UScrime$y) x.crime.log<- UScrime[,-ncol(UScrime)] x.crime.log[,-2]<- log(x.crime.log[,-2]) crime.run1<-MC3.REG(y.crime.log, x.crime.log, num.its=2000, rep(TRUE,15), outliers = FALSE) crime.run1[1:25,] summary(crime.run1)
## Not run: # Example 1: Scottish hill racing data. data(race) b<- out.ltsreg(race[,-1], race[,1], 2) races.run1<-MC3.REG(race[,1], race[,-1], num.its=20000, c(FALSE,TRUE), rep(TRUE,length(b)), b, PI = .1, K = 7, nu = .2, lambda = .1684, phi = 9.2) races.run1 summary(races.run1) ## End(Not run) # Example 2: Crime data library(MASS) data(UScrime) y.crime.log<- log(UScrime$y) x.crime.log<- UScrime[,-ncol(UScrime)] x.crime.log[,-2]<- log(x.crime.log[,-2]) crime.run1<-MC3.REG(y.crime.log, x.crime.log, num.its=2000, rep(TRUE,15), outliers = FALSE) crime.run1[1:25,] summary(crime.run1)
Helper function to MC3.REG that chooses the proposal model for a Metropolis-Hastings step.
MC3.REG.choose(M0.var, M0.out)
MC3.REG.choose(M0.var, M0.out)
M0.var |
a logical vector specifying the variables in the current model. |
M0.out |
a logical vector specifying the outliers in the current model. |
A list representing the proposal model, with components
var |
a logical vector specifying the variables in the proposal model. |
out |
a logical vector specifying the outliers in the proposal model. |
The implementation here differs from the Splus implentation. The Splus implementation uses global variables to contain the state of the current model and the history of the Markov-Chain. This implentation passes the current state and history to the function and then returns the updated state.
Jennifer Hoeting [email protected] with the assistance of Gary Gadbury. Translation from Splus to R by Ian Painter [email protected].
Bayesian Model Averaging for Linear Regression Models Adrian E. Raftery, David Madigan, and Jennifer A. Hoeting (1997). Journal of the American Statistical Association, 92, 179-191.
A Method for Simultaneous Variable and Transformation Selection in Linear Regression Jennifer Hoeting, Adrian E. Raftery and David Madigan (2002). Journal of Computational and Graphical Statistics 11 (485-507)
A Method for Simultaneous Variable Selection and Outlier Identification in Linear Regression Jennifer Hoeting, Adrian E. Raftery and David Madigan (1996). Computational Statistics and Data Analysis, 22, 251-270
Earlier versions of these papers are available via the World Wide Web using the url: https://www.stat.colostate.edu/~jah/papers/
MC3.REG
, For.MC3.REG
, MC3.REG.logpost
Helper function to MC3.REG that calculates the posterior model probability (up to a constant).
MC3.REG.logpost(Y, X, model.vect, p, i, K, nu, lambda, phi)
MC3.REG.logpost(Y, X, model.vect, p, i, K, nu, lambda, phi)
Y |
the vector of scaled responses. |
X |
the matrix of scaled covariates. |
model.vect |
logical vector indicating which variables are to be included in the model |
p |
number of variables in model.vect |
i |
vector of possible outliers |
K |
a hyperparameter indicating the outlier inflation factor |
nu |
regression hyperparameter. Default value is 2.58 if r2 for the full model is less than 0.9 or 0.2 if r2 for the full model is greater than 0.9. |
lambda |
regression hyperparameter. Default value is 0.28 if r2 for the full model is less than 0.9 or 0.1684 if r2 for the full model is greater than 0.9. |
phi |
regression hyperparameter. Default value is 2.85 if r2 for the full model is less than 0.9 or 9.2 if r2 for the full model is greater than 0.9. |
The log-posterior distribution for the model (up to a constant).
The implementation here differs from the Splus implentation. The Splus implementation uses global variables to contain the state of the current model and the history of the Markov-Chain. This implentation passes the current state and history to the function and then returns the updated state.
Jennifer Hoeting [email protected] with the assistance of Gary Gadbury. Translation from Splus to R by Ian Painter [email protected].
Bayesian Model Averaging for Linear Regression Models Adrian E. Raftery, David Madigan, and Jennifer A. Hoeting (1997). Journal of the American Statistical Association, 92, 179-191.
A Method for Simultaneous Variable and Transformation Selection in Linear Regression Jennifer Hoeting, Adrian E. Raftery and David Madigan (2002). Journal of Computational and Graphical Statistics 11 (485-507)
A Method for Simultaneous Variable Selection and Outlier Identification in Linear Regression Jennifer Hoeting, Adrian E. Raftery and David Madigan (1996). Computational Statistics and Data Analysis, 22, 251-270
Earlier versions of these papers are available via the World Wide Web using the url: https://www.stat.colostate.edu/~jah/papers/
MC3.REG
, For.MC3.REG
, MC3.REG.choose
This function displays a plot showing the selection and rejection of variables being considered in an iterated Bayesian model averaging variable selection procedure.
orderplot(x, ...)
orderplot(x, ...)
x |
an object of type iBMA.glm, iBMA.bicreg, iBMA.surv, iBMA.intermediate.glm, iBMA.intermediate.bicreg or iBMA.intermediate.surv. |
... |
other parameters to be passed to plot.default |
The x-axis represents iterations, the y-axis variables. For each variable, a dot in the far left indicates that the variable has not yet been examined, a black line indicates the variable has been examined and dropped, the start of the line represents when the variable was first examined, the end represents when the variable was dropped. A blue line represents a variable that is still in the selected set of variables. If the iterations have completed then the blue lines end with blue dots, representing the final set of variables selected.
Ian Painter [email protected]
## Not run: ############ iBMA.glm library("MASS") data(birthwt) y<- birthwt$lo x<- data.frame(birthwt[,-1]) x$race<- as.factor(x$race) x$ht<- (x$ht>=1)+0 x<- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl<- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) ### add 41 columns of noise noise<- matrix(rnorm(41*nrow(x)), ncol=41) colnames(noise)<- paste('noise', 1:41, sep='') x<- cbind(x, noise) iBMA.glm.out<- iBMA.glm(x, y, glm.family="binomial", factor.type=FALSE, verbose = TRUE, thresProbne0 = 5 ) orderplot(iBMA.glm.out) ## End(Not run)
## Not run: ############ iBMA.glm library("MASS") data(birthwt) y<- birthwt$lo x<- data.frame(birthwt[,-1]) x$race<- as.factor(x$race) x$ht<- (x$ht>=1)+0 x<- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl<- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) ### add 41 columns of noise noise<- matrix(rnorm(41*nrow(x)), ncol=41) colnames(noise)<- paste('noise', 1:41, sep='') x<- cbind(x, noise) iBMA.glm.out<- iBMA.glm(x, y, glm.family="binomial", factor.type=FALSE, verbose = TRUE, thresProbne0 = 5 ) orderplot(iBMA.glm.out) ## End(Not run)
Function to identify potential outliers
out.ltsreg(x, y, delta)
out.ltsreg(x, y, delta)
x |
the design matrix |
y |
observations |
delta |
the threshold set by the user. Standardized residuals from least trimmed squares regression that are larger than delta are identified as potential outliers |
A 0/1 vector indicating whether each observation is a potential outlier. The function was designed for use with the variable and outlier selection function MC3.REG
Jennifer A. Hoeting
Displays plots of the posterior distributions of the coefficients generated by Bayesian model averaging over linear regression, generalized linear and survival analysis models.
## S3 method for class 'bicreg' plot(x, e = 1e-04, mfrow = NULL, include = 1:x$n.vars, include.intercept = TRUE, ...) ## S3 method for class 'bic.glm' plot(x, e = 1e-04, mfrow = NULL, include = 1:length(x$namesx), ...) ## S3 method for class 'bic.surv' plot(x, e = 1e-04, mfrow = NULL, include = 1:length(x$namesx), ...)
## S3 method for class 'bicreg' plot(x, e = 1e-04, mfrow = NULL, include = 1:x$n.vars, include.intercept = TRUE, ...) ## S3 method for class 'bic.glm' plot(x, e = 1e-04, mfrow = NULL, include = 1:length(x$namesx), ...) ## S3 method for class 'bic.surv' plot(x, e = 1e-04, mfrow = NULL, include = 1:length(x$namesx), ...)
x |
object of type bicreg, bic.glm or bic.surv. |
e |
optional numeric value specifying the range over which the distributions are to be graphed. |
mfrow |
optional vector specifying the layout for each set of graphs |
include |
optional numerical vector specifying which variables to graph (excluding intercept) |
include.intercept |
optional logical value, if true the posterior distribution of the intercept is incuded in the plots |
... |
other parameters to be passed to |
Produces a plot of the posterior distribuion of the coefficients produced by model averaging. The posterior probability that the coefficient is zero is represented by a solid line at zero, with height equal to the probability. The nonzero part of the distribution is scaled so that the maximum height is equal to the probability that the coefficient is nonzero.
The parameter e
specifies the range over which the distributions are to be graphed by specifying the tail probabilities that dictate the range to plot over.
Ian Painter [email protected]
Hoeting, J.A., Raftery, A.E. and Madigan, D. (1996). A method for simultaneous variable selection and outlier identification in linear regression. Computational Statistics and Data Analysis, 22, 251-270.
library(MASS) data(UScrime) x<- UScrime[,-16] y<- log(UScrime[,16]) x[,-2]<- log(x[,-2]) plot( bicreg(x, y))
library(MASS) data(UScrime) x<- UScrime[,-16] y<- log(UScrime[,16]) x[,-2]<- log(x[,-2]) plot( bicreg(x, y))
Bayesian Model Averaging (BMA) accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability. This function predicts the response resulting from a BMA generalized linear model from given data.
## S3 method for class 'bic.glm' predict( object, newdata, ...)
## S3 method for class 'bic.glm' predict( object, newdata, ...)
object |
a fitted object inheriting from class |
newdata |
a data frame containing observations on variables from which the predictor variables are to be selected or constructed from a formula. |
... |
ignored (for compatibility with generic function). |
The predicted values from the BMA model for each observation in newdata.
## Not run: # Example 1 (Gaussian) library(MASS) data(UScrime) f <- formula(log(y) ~ log(M)+So+log(Ed)+log(Po1)+log(Po2)+ log(LF)+log(M.F)+log(Pop)+log(NW)+log(U1)+log(U2)+ log(GDP)+log(Ineq)+log(Prob)+log(Time)) bic.glm.crimeT <- bic.glm(f, data = UScrime, glm.family = gaussian()) predict(bic.glm.crimeT, newdata = UScrime) bic.glm.crimeF <- bic.glm(f, data = UScrime, glm.family = gaussian(), factor.type = FALSE) predict(bic.glm.crimeF, newdata = UScrime) ## End(Not run) ## Not run: # Example 2 (binomial) library(MASS) data(birthwt) y <- birthwt$lo x <- data.frame(birthwt[,-1]) x$race <- as.factor(x$race) x$ht <- (x$ht>=1)+0 x <- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl <- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) bic.glm.bwT <- bic.glm(x, y, strict = FALSE, OR = 20, glm.family="binomial", factor.type=TRUE) predict( bic.glm.bwT, newdata = x) bic.glm.bwF <- bic.glm(x, y, strict = FALSE, OR = 20, glm.family="binomial", factor.type=FALSE) predict( bic.glm.bwF, newdata = x) ## End(Not run) ## Not run: # Example 3 (Gaussian) library(MASS) data(anorexia) anorexia.formula <- formula(Postwt ~ Prewt+Treat+offset(Prewt)) bic.glm.anorexiaF <- bic.glm( anorexia.formula, data=anorexia, glm.family="gaussian", factor.type=FALSE) predict( bic.glm.anorexiaF, newdata=anorexia) bic.glm.anorexiaT <- bic.glm( anorexia.formula, data=anorexia, glm.family="gaussian", factor.type=TRUE) predict( bic.glm.anorexiaT, newdata=anorexia) ## End(Not run) ## Not run: # Example 4 (Gamma) library(survival) data(cancer) surv.t <- veteran$time x <- veteran[,-c(3,4)] x$celltype <- factor(as.character(x$celltype)) sel<- veteran$status == 0 x <- x[!sel,] surv.t <- surv.t[!sel] bic.glm.vaT <- bic.glm(x, y=surv.t, glm.family=Gamma(link="inverse"), factor.type=TRUE) predict( bic.glm.vaT, x) bic.glm.vaF <- bic.glm(x, y=surv.t, glm.family=Gamma(link="inverse"), factor.type=FALSE) predict( bic.glm.vaF, x) ## End(Not run) # Example 5 (poisson - Yates teeth data) x <- rbind.data.frame(c(0, 0, 0), c(0, 1, 0), c(1, 0, 0), c(1, 1, 1)) y <- c(4, 16, 1, 21) n <- c(1,1,1,1) bic.glm.yatesF <- bic.glm( x, y, glm.family=poisson(), weights=n, factor.type=FALSE) predict( bic.glm.yatesF, x) ## Not run: # Example 6 (binomial - Venables and Ripley) ldose <- rep(0:5, 2) numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16) sex <- factor(rep(c("M", "F"), c(6, 6))) SF <- cbind(numdead, numalive=20-numdead) budworm <- cbind.data.frame(ldose = ldose, numdead = numdead, sex = sex, SF = SF) budworm.formula <- formula(SF ~ sex*ldose) bic.glm.budwormF <- bic.glm( budworm.formula, data=budworm, glm.family="binomial", factor.type=FALSE) predict(bic.glm.budwormF, newdata=budworm) bic.glm.budwormT <- bic.glm( budworm.formula, data=budworm, glm.family="binomial", factor.type=TRUE) predict(bic.glm.budwormT, newdata=budworm) ## End(Not run)
## Not run: # Example 1 (Gaussian) library(MASS) data(UScrime) f <- formula(log(y) ~ log(M)+So+log(Ed)+log(Po1)+log(Po2)+ log(LF)+log(M.F)+log(Pop)+log(NW)+log(U1)+log(U2)+ log(GDP)+log(Ineq)+log(Prob)+log(Time)) bic.glm.crimeT <- bic.glm(f, data = UScrime, glm.family = gaussian()) predict(bic.glm.crimeT, newdata = UScrime) bic.glm.crimeF <- bic.glm(f, data = UScrime, glm.family = gaussian(), factor.type = FALSE) predict(bic.glm.crimeF, newdata = UScrime) ## End(Not run) ## Not run: # Example 2 (binomial) library(MASS) data(birthwt) y <- birthwt$lo x <- data.frame(birthwt[,-1]) x$race <- as.factor(x$race) x$ht <- (x$ht>=1)+0 x <- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl <- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) bic.glm.bwT <- bic.glm(x, y, strict = FALSE, OR = 20, glm.family="binomial", factor.type=TRUE) predict( bic.glm.bwT, newdata = x) bic.glm.bwF <- bic.glm(x, y, strict = FALSE, OR = 20, glm.family="binomial", factor.type=FALSE) predict( bic.glm.bwF, newdata = x) ## End(Not run) ## Not run: # Example 3 (Gaussian) library(MASS) data(anorexia) anorexia.formula <- formula(Postwt ~ Prewt+Treat+offset(Prewt)) bic.glm.anorexiaF <- bic.glm( anorexia.formula, data=anorexia, glm.family="gaussian", factor.type=FALSE) predict( bic.glm.anorexiaF, newdata=anorexia) bic.glm.anorexiaT <- bic.glm( anorexia.formula, data=anorexia, glm.family="gaussian", factor.type=TRUE) predict( bic.glm.anorexiaT, newdata=anorexia) ## End(Not run) ## Not run: # Example 4 (Gamma) library(survival) data(cancer) surv.t <- veteran$time x <- veteran[,-c(3,4)] x$celltype <- factor(as.character(x$celltype)) sel<- veteran$status == 0 x <- x[!sel,] surv.t <- surv.t[!sel] bic.glm.vaT <- bic.glm(x, y=surv.t, glm.family=Gamma(link="inverse"), factor.type=TRUE) predict( bic.glm.vaT, x) bic.glm.vaF <- bic.glm(x, y=surv.t, glm.family=Gamma(link="inverse"), factor.type=FALSE) predict( bic.glm.vaF, x) ## End(Not run) # Example 5 (poisson - Yates teeth data) x <- rbind.data.frame(c(0, 0, 0), c(0, 1, 0), c(1, 0, 0), c(1, 1, 1)) y <- c(4, 16, 1, 21) n <- c(1,1,1,1) bic.glm.yatesF <- bic.glm( x, y, glm.family=poisson(), weights=n, factor.type=FALSE) predict( bic.glm.yatesF, x) ## Not run: # Example 6 (binomial - Venables and Ripley) ldose <- rep(0:5, 2) numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16) sex <- factor(rep(c("M", "F"), c(6, 6))) SF <- cbind(numdead, numalive=20-numdead) budworm <- cbind.data.frame(ldose = ldose, numdead = numdead, sex = sex, SF = SF) budworm.formula <- formula(SF ~ sex*ldose) bic.glm.budwormF <- bic.glm( budworm.formula, data=budworm, glm.family="binomial", factor.type=FALSE) predict(bic.glm.budwormF, newdata=budworm) bic.glm.budwormT <- bic.glm( budworm.formula, data=budworm, glm.family="binomial", factor.type=TRUE) predict(bic.glm.budwormT, newdata=budworm) ## End(Not run)
Bayesian Model Averaging (BMA) accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability. This function predicts the response resulting from a BMA linear model from given data.
## S3 method for class 'bicreg' predict( object, newdata, quantiles, ...)
## S3 method for class 'bicreg' predict( object, newdata, quantiles, ...)
object |
a fitted object inheriting from class |
newdata |
a data frame containing observations on variables from which the predictor variables are to be selected or constructed from a formula. |
quantiles |
The quantiles for which a predictive estimate is
desired. The default is |
... |
ignored (for compatibility with generic function). |
The predicted response values from the BMA model for each observation in newdata.
library(MASS) # Example 1 data(UScrime) x <- UScrime[,-16] y <- log(UScrime[,16]) x[,-2]<- log(x[,-2]) crimeBMA <- bicreg(x, y, strict = FALSE, OR = 20) predict( crimeBMA, x) # Example 2 (Venables and Ripley) npkBMA <- bicreg( x = npk[, c("block","N","K")], y=npk$yield) predict( npkBMA, newdata = npk) # Example 3 (Venables and Ripley) gasPRbma <- bicreg( x = whiteside[,c("Insul", "Temp")], y = whiteside$Gas) predict( gasPRbma, newdata = whiteside)
library(MASS) # Example 1 data(UScrime) x <- UScrime[,-16] y <- log(UScrime[,16]) x[,-2]<- log(x[,-2]) crimeBMA <- bicreg(x, y, strict = FALSE, OR = 20) predict( crimeBMA, x) # Example 2 (Venables and Ripley) npkBMA <- bicreg( x = npk[, c("block","N","K")], y=npk$yield) predict( npkBMA, newdata = npk) # Example 3 (Venables and Ripley) gasPRbma <- bicreg( x = whiteside[,c("Insul", "Temp")], y = whiteside$Gas) predict( gasPRbma, newdata = whiteside)
The record-winning times for 35 hill races in Scotland, as reported by Atkinson (1986).
data(race)
data(race)
data.frame
The distance travelled and the height climbed in each race is also given. The data contains a known error - Atkinson (1986) reports that the record for Knock Hill (observation 18) should actually be 18 minutes rather than 78 minutes.
Description
Name of race
Distance covered in miles
Elevation climbed during race in feet
Record time for race in minutes
http://www.statsci.org/data/general/hills.html
Atkison, A.C., Comments on "Influential Observations, High Leverage Points, and Outliers in Linear Regression", Statistical Science, 1 (1986) 397-402
summary
and print
methods for Bayesian model averaging objects.
## S3 method for class 'bicreg' summary(object, n.models = 5, digits = max(3, getOption("digits") - 3), conditional = FALSE, display.dropped = FALSE, ...) ## S3 method for class 'bic.glm' summary(object, n.models = 5, digits = max(3, getOption("digits") - 3), conditional = FALSE, display.dropped = FALSE, ...) ## S3 method for class 'bic.surv' summary(object, n.models = 5, digits = max(3, getOption("digits") - 3), conditional = FALSE, display.dropped = FALSE, ...) ## S3 method for class 'glib' summary(object, n.models = 5, digits = max(3, getOption("digits") - 3), conditional = FALSE, index.phi=1, ...) ## S3 method for class 'mc3' summary(object, n.models = 5, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'bicreg' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'bic.glm' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'bic.surv' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'mc3' print(x, digits = max(3, getOption("digits") - 3), n.models = nrow(x$variables), ...)
## S3 method for class 'bicreg' summary(object, n.models = 5, digits = max(3, getOption("digits") - 3), conditional = FALSE, display.dropped = FALSE, ...) ## S3 method for class 'bic.glm' summary(object, n.models = 5, digits = max(3, getOption("digits") - 3), conditional = FALSE, display.dropped = FALSE, ...) ## S3 method for class 'bic.surv' summary(object, n.models = 5, digits = max(3, getOption("digits") - 3), conditional = FALSE, display.dropped = FALSE, ...) ## S3 method for class 'glib' summary(object, n.models = 5, digits = max(3, getOption("digits") - 3), conditional = FALSE, index.phi=1, ...) ## S3 method for class 'mc3' summary(object, n.models = 5, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'bicreg' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'bic.glm' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'bic.surv' print(x, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'mc3' print(x, digits = max(3, getOption("digits") - 3), n.models = nrow(x$variables), ...)
object |
object of type 'bicreg', 'bic.glm', 'bic.surv', 'glib' or 'mc3' |
x |
object of type 'bicreg', 'bic.glm', 'bic.surv', 'glib' or 'mc3' |
n.models |
optional number specifying the number of models to display in summary |
digits |
optional number specifying the number of digits to display |
conditional |
optional logical value specifying whether to display conditional expectation and standard deviation |
display.dropped |
optional logical value specifying whether to display the names of any variables dropped before model averaging takes place |
index.phi |
optional number specifying which value of phi to use if multiple values of phi were run. Applies to |
... |
other parameters to be passed to |
The print methods display a view similar to print.lm
or print.glm
.
The summary methods display a view specific to model averaging.
The summary function does not create a summary object (unlike summary.lm
or summary.glm
), instead it directly prints the summary. Note that no calculations are done to create the summary.
Ian Painter [email protected]
# logistic regression library("MASS") data(birthwt) y<- birthwt$lo x<- data.frame(birthwt[,-1]) x$race<- as.factor(x$race) x$ht<- (x$ht>=1)+0 x<- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl<- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) glm.out1<- bic.glm(x, y, OR = 20, glm.family="binomial", factor.type=TRUE) summary(glm.out1, conditional = TRUE)
# logistic regression library("MASS") data(birthwt) y<- birthwt$lo x<- data.frame(birthwt[,-1]) x$race<- as.factor(x$race) x$ht<- (x$ht>=1)+0 x<- x[,-9] x$smoke <- as.factor(x$smoke) x$ptl<- as.factor(x$ptl) x$ht <- as.factor(x$ht) x$ui <- as.factor(x$ui) glm.out1<- bic.glm(x, y, OR = 20, glm.family="binomial", factor.type=TRUE) summary(glm.out1, conditional = TRUE)
summary
and print
methods for iterated Bayesian model averaging objects.
## S3 method for class 'iBMA.glm' summary(object, ...) ## S3 method for class 'iBMA.bicreg' summary(object, ...) ## S3 method for class 'iBMA.surv' summary(object, ...) ## S3 method for class 'iBMA.glm' print(x, ...) ## S3 method for class 'iBMA.bicreg' print(x, ...) ## S3 method for class 'iBMA.surv' print(x, ...) ## S3 method for class 'iBMA.intermediate.glm' summary(object, ...) ## S3 method for class 'iBMA.intermediate.bicreg' summary(object, ...) ## S3 method for class 'iBMA.intermediate.surv' summary(object, ...) ## S3 method for class 'iBMA.intermediate.glm' print(x, ...) ## S3 method for class 'iBMA.intermediate.bicreg' print(x, ...) ## S3 method for class 'iBMA.intermediate.surv' print(x, ...)
## S3 method for class 'iBMA.glm' summary(object, ...) ## S3 method for class 'iBMA.bicreg' summary(object, ...) ## S3 method for class 'iBMA.surv' summary(object, ...) ## S3 method for class 'iBMA.glm' print(x, ...) ## S3 method for class 'iBMA.bicreg' print(x, ...) ## S3 method for class 'iBMA.surv' print(x, ...) ## S3 method for class 'iBMA.intermediate.glm' summary(object, ...) ## S3 method for class 'iBMA.intermediate.bicreg' summary(object, ...) ## S3 method for class 'iBMA.intermediate.surv' summary(object, ...) ## S3 method for class 'iBMA.intermediate.glm' print(x, ...) ## S3 method for class 'iBMA.intermediate.bicreg' print(x, ...) ## S3 method for class 'iBMA.intermediate.surv' print(x, ...)
object |
object of type |
x |
object of type |
... |
other parameters to be passed to |
These methods provide concise and summarized information about the variables that have been examined up to the last iteration. If the result is a final result then the methods also display the results of calling print or summary on the Bayesian model average object for the final set of variables.
The summary function does not create a summary object
(unlike summary.lm
or summary.glm
).
Instead it directly prints the summary.
Note that no calculations are done to create the summary.
Ian Painter [email protected]
Finney's data on vaso-contriction in the skin of the digits.
The vaso
data frame has 39 rows and 3 columns.
data(vaso)
data(vaso)
This data frame contains the following columns:
volume
rate
response: 0= nonoccurrence, 1= occurrence
Atkinson, A.C. and Riani, M. (2000), Robust Diagnostic Regression Analysis, First Edition. New York: Springer, Table A.23