Title: | Bayesian Longitudinal Regularized Semiparametric Quantile Mixed Models |
---|---|
Description: | Our recently developed fully Bayesian semiparametric quantile mixed-effect model for high-dimensional longitudinal studies with heterogeneous observations can be implemented through this package. This model can distinguish between time-varying interactions and constant-effect-only cases to avoid model misspecifications. Facilitated by spike-and-slab priors, this model leads to superior performance in estimation, identification and statistical inference. In particular, robust Bayesian inferences in terms of valid Bayesian credible intervals on both parametric and nonparametric effects can be validated on finite samples. The Markov chain Monte Carlo algorithms of the proposed and alternative models are efficiently implemented in 'C++'. |
Authors: | Kun Fan [aut, cre], Cen Wu [aut] |
Maintainer: | Kun Fan <[email protected]> |
License: | GPL-2 |
Version: | 0.1.0 |
Built: | 2024-11-25 16:31:20 UTC |
Source: | https://github.com/kunfa/blend |
This package extends sparse Bayesian quantile mixed models to nonlinear longitudinal interactions. The proposed Bayesian semiparametric quantile model is robust not only to outliers and heavy-tailed distributions of the response variable, but also to misspecification of interaction effects whose true form is not nonlinear. We have developed a Gibbs sampler with spike-and-slab priors to promote sparse identification of the appropriate forms of main and interaction effects. In addition to the default method, users can choose whether the selection structure separates constant and varying effects, and can opt for methods without spike-and-slab priors or for non-robust methods. In total, Blend provides 8 methods (4 robust and 4 non-robust) under the random intercept and slope model. All of these methods are proposed here for the first time. See the Details below for how to configure the method used.
The user-friendly, integrated interface Blend() lets users choose the fitting method by specifying the following parameters:
robust: | whether to use robust methods for modelling. |
quant: | to specify different quantiles when using robust methods. |
structural: | whether to incorporate structural identification (separation of constant and varying effects). |
sparse: | whether to use the spike-and-slab priors to impose sparsity. |
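The count of 8 methods is consistent with crossing the three on/off switches listed above. The short sketch below only enumerates those combinations; it does not call Blend(), and the object name `configs` is hypothetical. The values of `sparse` are written as character strings to mirror the usage `sparse = "TRUE"`.

```r
# Enumerate every combination of the three method switches.
# `configs` is a hypothetical name used only for this illustration.
configs <- expand.grid(
  robust     = c(TRUE, FALSE),
  structural = c(TRUE, FALSE),
  sparse     = c("TRUE", "FALSE"),
  stringsAsFactors = FALSE
)

# One row per method configuration.
print(configs)
nrow(configs)         # total number of configurations
sum(configs$robust)   # number of robust configurations
```

Each row corresponds to one set of arguments that could be passed on to Blend() alongside the data.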
The function Blend() returns a Blend object that contains the posterior estimates of the coefficients and other information used by selection(). S3 generic functions selection() and print() are implemented for Blend objects. selection() takes a Blend object and returns the variable selection results.
Fan, K., Ren, J., Ma, S. and Wu, C. (2024). Bayesian Regularized Semiparametric Quantile Mixed Models in Longitudinal Studies. (submitted)
Fan, K., Subedi, S., Yang, G., Lu, X., Ren, J., and Wu, C. (2024). Is Seeing Believing? A Practitioner’s Perspective on High-Dimensional Statistical Inference in Cancer Genomics Studies. Entropy, 26(9), 794.
Ren, J., Zhou, F., Li, X., Ma, S., Jiang, Y. and Wu, C. (2023). Robust Bayesian variable selection for gene-environment interactions. Biometrics, 79(2), 684-694 doi:10.1111/biom.13670
Zhou, F., Ren, J., Ma, S. and Wu, C. (2023). The Bayesian regularized quantile varying coefficient model. Computational Statistics & Data Analysis, 187, 107808.
Zhou, F., Lu, X., Ren, J., Fan, K., Ma, S., & Wu, C. (2022). Sparse group variable selection for gene–environment interactions in the longitudinal study. Genetic epidemiology, 46(5-6), 317-340.
Zhou, F., Ren, J., Li, G., Jiang, Y., Li, X., Wang, W. and Wu, C. (2019). Penalized Variable Selection for Lipid-Environment Interactions in a Longitudinal Lipidomics Study. Genes, 10(12), 1002 doi:10.3390/genes10121002
Ren, J., Zhou, F., Li, X., Ma, S., Jiang, Y. and Wu, C. (2020). roben: Robust Bayesian Variable Selection for Gene-Environment Interactions. R package version 0.1.1. https://CRAN.R-project.org/package=roben
Zhou, F., Ren, J., Lu, X., Ma, S. and Wu, C. (2021). Gene–Environment Interaction: a Variable Selection Perspective. Epistasis. Methods in Molecular Biology. 2212:191–223 doi:10.1007/978-1-0716-0947-7_13
Ren, J., Zhou, F., Li, X., Chen, Q., Zhang, H., Ma, S., Jiang, Y. and Wu, C. (2020) Semi-parametric Bayesian variable selection for gene-environment interactions. Statistics in Medicine, 39: 617– 638 doi:10.1002/sim.8434
Ren, J., Zhou, F., Li, X., Wu, C. and Jiang, Y. (2019) spinBayes: Semi-Parametric Gene-Environment Interaction via Bayesian Variable Selection. R package version 0.1.0. https://CRAN.R-project.org/package=spinBayes
Wu, C., Jiang, Y., Ren, J., Cui, Y. and Ma, S. (2018). Dissecting gene-environment interactions: A penalized robust approach accounting for hierarchical structures. Statistics in Medicine, 37:437–456 doi:10.1002/sim.7518
Wu, C., Cui, Y., and Ma, S. (2014). Integrative analysis of gene–environment interactions under a multi–response partially linear varying coefficient model. Statistics in Medicine, 33(28), 4988–4998 doi:10.1002/sim.6287
Wu, C., Zhong, P.S. and Cui, Y. (2013). High dimensional variable selection for gene-environment interactions. Technical Report. Michigan State University.
fit a Bayesian longitudinal regularized semi-parametric quantile mixed model
Blend( y, x, t, J, kn, degree, iterations = 10000, burn.in = NULL, robust = TRUE, quant = 0.5, sparse = "TRUE", structural = TRUE )
y |
the vector of the repeatedly measured response variable. The current version of Blend only supports continuous responses. |
x |
the matrix of repeatedly measured predictors (genetic factors), including the intercept. Each row is the observation vector for one measurement. |
t |
the vector of scheduled time points. |
J |
the vector of the numbers of repeated measurements for each subject. |
kn |
the number of interior knots for B-spline. |
degree |
the degree of the B-spline basis. |
iterations |
the number of MCMC iterations. |
burn.in |
the number of iterations for burn-in. |
robust |
logical flag. If TRUE, robust methods will be used. |
quant |
the quantile level (between 0 and 1) to use when robust methods are applied. |
sparse |
logical flag. If TRUE, spike-and-slab priors will be used to shrink coefficients of irrelevant covariates to zero exactly. |
structural |
logical flag. If TRUE, the coefficient functions with varying effects and constant effects will be penalized separately. |
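To make the expected shapes of the inputs concrete, the sketch below builds a small toy data set. It is only an illustration under assumptions not stated in the argument descriptions: that `t` has one entry per observation, that the intercept is the first column of `x`, and that rows are stacked subject by subject. The sizes and the random values are hypothetical, and the sketch does not call Blend().

```r
set.seed(3)

# Hypothetical study: 4 subjects with unequal numbers of repeated measurements.
J <- c(3, 5, 4, 5)        # number of repeated measurements per subject
N <- sum(J)               # total number of observations (rows)

# Assumption: one scheduled time point per observation, stacked by subject.
t <- unlist(lapply(J, seq_len))

# Assumption: the intercept is the first column of the predictor matrix.
p <- 6                    # hypothetical number of genetic factors
x <- cbind(1, matrix(rnorm(N * p), nrow = N, ncol = p))

# Continuous response, one value per observation.
y <- rnorm(N)

# The pieces must agree on the total number of observations.
c(length_y = length(y), rows_x = nrow(x), length_t = length(t), subjects = length(J))
```

The key consistency check is that `sum(J)` equals both `length(y)` and `nrow(x)`.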
Consider the data model described in "data". The basis expansion and change of basis with B-splines are done automatically: each coefficient function is expanded on a B-spline basis and split into a constant part and a varying part.
q=kn+degree+1 is the number of basis functions. By default, kn=degree=2. Users can change kn and degree to any other positive integers.
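The relation q=kn+degree+1 can be checked with the `splines` package that ships with R. This is a sketch only: the knot placement at equally spaced quantiles and the use of `splines::bs()` are assumptions for illustration, not a description of how Blend builds its basis internally.

```r
library(splines)

kn     <- 2   # number of interior knots
degree <- 2   # degree of the B-spline basis
q      <- kn + degree + 1

# Hypothetical time grid and interior knots at equally spaced quantiles.
t     <- seq(0, 1, length.out = 50)
knots <- quantile(t, probs = seq_len(kn) / (kn + 1))

# With intercept = TRUE the basis has length(knots) + degree + 1 columns.
B <- bs(t, knots = knots, degree = degree, intercept = TRUE)

dim(B)   # rows = time points, columns = basis functions
q        # matches the number of columns of B
```

Increasing `kn` or `degree` by one adds one basis function, and therefore one spline coefficient per coefficient function.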
When 'structural=TRUE' (default), the coefficient functions with varying effects and constant effects will be penalized separately. Otherwise, the coefficient functions with varying effects and constant effects will be penalized together.
When 'sparse="TRUE"' (default), spike-and-slab priors are imposed on individual and/or group levels to identify important constant and varying effects. Otherwise, Laplacian shrinkage will be used.
When 'robust=TRUE' (default), the random error is assumed to follow a Laplace distribution, which leads to a Bayesian formulation of quantile regression. If 'robust=FALSE', the random error follows a normal distribution.
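The link between a Laplace-type error and quantile regression can be illustrated with the check loss. Under an asymmetric Laplace likelihood, which is the standard choice in Bayesian quantile regression, maximizing the likelihood in the location parameter is equivalent to minimizing the check loss. The sketch below shows that the minimizer is close to the sample quantile on heavy-tailed data. The function name `check_loss` and the simulated errors are hypothetical, and this is not Blend's internal code.

```r
# Check loss for quantile level `quant`: u * (quant - I(u < 0)).
check_loss <- function(u, quant) u * (quant - (u < 0))

set.seed(1)
e     <- rt(500, df = 3)   # heavy-tailed errors (hypothetical data)
quant <- 0.25

# Total check loss as a function of a candidate location m.
obj <- function(m) sum(check_loss(e - m, quant))

# Minimize the convex objective over the range of the data.
fit <- optimize(obj, interval = range(e))

# The minimizer should be close to the empirical quantile.
c(minimizer = fit$minimum, sample_quantile = unname(quantile(e, quant)))
```

With `quant = 0.5` the check loss is half the absolute error, so the same construction targets the median.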
Please check the references for more details about the prior distributions.
an object of class ‘Blend’ is returned, which is a list with components:
posterior |
the posteriors of coefficients. |
coefficient |
the estimated coefficients. |
burn.in |
the total number of burn-ins. |
iterations |
the total number of iterations. |
data(dat)

## default method
fit = Blend(y,x,t,J,kn,degree)
fit$coefficient

## alternative: robust non-structural
fit = Blend(y,x,t,J,kn,degree, structural=FALSE)
fit$coefficient

## alternative: non-robust structural
fit = Blend(y,x,t,J,kn,degree, robust=FALSE)
fit$coefficient

## alternative: non-robust non-structural
fit = Blend(y,x,t,J,kn,degree, robust=FALSE, structural=FALSE)
fit$coefficient
calculate 95% coverage for varying effects and constant effects under example data
Coverage(x)
x |
Blend object. |
coverage
data(dat)
fit = Blend(y,x,t,J,kn,degree)
Coverage(fit)
Simulated gene expression data for demonstrating the features of Blend.
The data object includes the components y, x, t, J, kn and degree.
The data and model setting
Consider a longitudinal study in which every subject is measured repeatedly over time. The response is recorded for each subject at each scheduled time point, and the genetic factors observed at each measurement are collected in a vector of predictors. A covariate vector associated with the random effects and a vector of random effects are specified according to the random intercept and slope model. In the resulting semi-parametric quantile mixed-effects model, the fixed effects include (a) the varying intercept and (b) the varying coefficients of the genetic factors; the random effects and a random error complete the model.
The varying intercept and the varying coefficients for the genetic factors are further expressed through B-spline basis expansions, and the random-effects term of the random intercept and slope model can be rewritten in an equivalent form.
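The model described in this section can be written compactly. The notation below is an assumption chosen for illustration; the symbols are not taken from the package documentation. Here i indexes subjects, j indexes repeated measurements, m is the number of genetic factors, and q is the number of B-spline basis functions.

```latex
% Assumed notation (illustrative only, not the package's own symbols).
\begin{align*}
Y_{ij} &= \alpha_0(t_{ij}) + \sum_{k=1}^{m} \beta_k(t_{ij})\, X_{ijk}
          + Z_{ij}^{\top} \zeta_i + \varepsilon_{ij}, \\
\alpha_0(t) &\approx \sum_{l=1}^{q} B_l(t)\, \gamma_{0l}, \qquad
\beta_k(t) \approx \sum_{l=1}^{q} B_l(t)\, \gamma_{kl}, \quad k = 1, \dots, m .
\end{align*}
% B_1, ..., B_q : B-spline basis functions, with q = kn + degree + 1.
% Z_{ij}        : 2 x 1 covariate of the random intercept and slope model,
%                 for example (1, t_{ij})^T (an assumption).
% \zeta_i       : 2 x 1 vector of random effects for subject i.
% For quantile level \tau, the \tau-th quantile of \varepsilon_{ij} is zero.
```

Under this form, a coefficient function is constant when its spline expansion reduces to a constant, which is the distinction between constant and varying effects that structural identification targets.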
data(dat)
length(y)
dim(x)
length(t)
length(J)
print(t)
print(J)
print(kn)
print(degree)
plot the identified varying effects
plot_Blend(x, sparse, prob=0.95)
x |
Blend object. |
sparse |
sparsity. |
prob |
probability for the credible interval, between 0 and 1. For example, prob=0.95 gives a 95% credible interval. |
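The role of `prob` can be illustrated with a pointwise credible band for a varying effect. The sketch below is an assumption-laden illustration, not the code behind plot_Blend(): the posterior draws `gamma_draws` are simulated rather than taken from a Blend object, and the basis is built with `splines::bs()`.

```r
library(splines)
set.seed(4)

kn <- 2; degree <- 2
q  <- kn + degree + 1

# Hypothetical time grid and B-spline basis (40 x q).
grid <- seq(0, 1, length.out = 40)
B    <- bs(grid, df = q, degree = degree, intercept = TRUE)

# Hypothetical posterior draws of the q spline coefficients (1000 x q).
n_draws     <- 1000
gamma_draws <- matrix(
  rnorm(n_draws * q, mean = rep(seq_len(q), each = n_draws), sd = 0.3),
  nrow = n_draws, ncol = q
)

# Each row is one posterior draw of the coefficient curve on the grid.
curves <- gamma_draws %*% t(B)

# Pointwise equal-tailed credible band with coverage `prob`.
prob  <- 0.95
alpha <- (1 - prob) / 2
band  <- apply(curves, 2, quantile, probs = c(alpha, 0.5, 1 - alpha))

dim(band)   # 3 rows (lower, median, upper) by 40 grid points
```

A smaller `prob` narrows the band; `matplot(grid, t(band), type = "l")` would draw the three curves.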
plot
data(dat)
fit = Blend(y,x,t,J,kn,degree)
plot_Blend(fit,sparse=TRUE)
Variable selection for a Blend object
selection(obj, sparse)
obj |
Blend object. |
sparse |
logical flag. If TRUE, spike-and-slab priors will be used to shrink coefficients of irrelevant covariates to zero exactly. |
If sparse = TRUE, the median probability model (MPM) (Barbieri and Berger, 2004) is used to identify predictors that are significantly associated with the response variable. Otherwise, variable selection is based on the 95% credible interval. See the references for more details about the variable selection.
an object of class ‘selection’ is returned, which is a list with components:
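The two selection rules can be sketched on simulated MCMC output. This is an illustration only: the matrices `indicator` and `draws` are hypothetical stand-ins for posterior samples and are not the structures stored in a Blend object. The median probability model keeps a predictor when its posterior inclusion probability exceeds 0.5; the interval rule keeps a predictor when its 95% credible interval excludes zero.

```r
set.seed(2)
n_iter <- 2000

# Hypothetical spike-and-slab inclusion indicators (1 = in the slab).
indicator <- cbind(
  x1 = rbinom(n_iter, 1, 0.95),
  x2 = rbinom(n_iter, 1, 0.10),
  x3 = rbinom(n_iter, 1, 0.70)
)

# Median probability model: posterior inclusion probability above 0.5.
incl_prob    <- colMeans(indicator)
mpm_selected <- names(incl_prob)[incl_prob > 0.5]

# Hypothetical posterior draws from a fit without spike-and-slab priors.
draws <- cbind(
  x1 = rnorm(n_iter, mean = 1.0, sd = 0.2),
  x2 = rnorm(n_iter, mean = 0.0, sd = 0.2),
  x3 = rnorm(n_iter, mean = 0.8, sd = 0.2)
)

# Interval rule: keep predictors whose 95% credible interval excludes zero.
ci          <- apply(draws, 2, quantile, probs = c(0.025, 0.975))
ci_selected <- colnames(draws)[ci[1, ] > 0 | ci[2, ] < 0]

mpm_selected
ci_selected
```

The two rules need not agree in general, since the interval rule depends on the posterior spread while the MPM depends only on how often a coefficient is included.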
method |
posterior samples from the MCMC |
indices |
a list of indices and names of selected variables |
summary |
a summary of selected variables |
Ren, J., Zhou, F., Li, X., Ma, S., Jiang, Y. and Wu, C. (2023). Robust Bayesian variable selection for gene-environment interactions. Biometrics, 79(2), 684-694 doi:10.1111/biom.13670
Barbieri, M.M. and Berger, J.O. (2004). Optimal predictive model selection. Ann. Statist, 32(3):870–897
data(dat)

## sparse
fit = Blend(y,x,t,J,kn,degree)
selected=selection(fit,sparse=TRUE)
selected

## non-sparse
fit = Blend(y,x,t,J,kn,degree,sparse="FALSE")
selected=selection(fit,sparse=FALSE)
selected