Title: | Tools for Regression Using Pre-Computed Summary Statistics |
---|---|
Description: | Defines functions to describe regression models using only pre-computed summary statistics (i.e. means, variances, and covariances) in place of individual participant data. Possible models include linear models for linear combinations, products, and logical combinations of phenotypes. Implements methods presented in Wolf et al. (2021) <doi:10.3389/fgene.2021.745901>, Wolf et al. (2020) <doi:10.1142/9789811215636_0063>, and Gasdaska et al. (2019) <doi:10.1142/9789813279827_0036>. |
Authors: | Jack Wolf [aut, cre, cph] , R Core Team and contributors worldwide [cph, aut] (Author and copyright holder of modified 'stats' fragments) |
Maintainer: | Jack Wolf <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.1.9000 |
Built: | 2024-11-01 03:08:28 UTC |
Source: | https://github.com/jackmwolf/pcsstools |
Compute an analysis of variance table for one or more linear models fitted using PCSS.
## S3 method for class 'pcsslm'
anova(object, ...)
## S3 method for class 'pcsslmlist'
anova(object, ..., scale = 0, test = "F")
object, ... |
objects of class |
scale |
numeric. An estimate of the noise variance |
test |
a character string specifying the test statistic to be used. Can
be one of |
An object of class "anova" inheriting from class "data.frame".
R Core Team and contributors worldwide. Modified by Jack Wolf
approx_and approximates the linear model for a conjunction of m phenotypes as a function of a set of predictors.
approx_and(
  means,
  covs,
  n,
  predictors,
  add_intercept = TRUE,
  verbose = FALSE,
  response_assumption = "binary",
  ...
)
means |
vector of predictor and response means with the last m means being the means of m binary responses to combine in a logical AND statement. |
covs |
a matrix of the covariance of all model predictors and the
responses with the order of rows/columns corresponding to the order of
|
n |
sample size. |
predictors |
list of objects of class |
add_intercept |
logical. Should the linear model add an intercept term? |
verbose |
should output be printed to console? |
response_assumption |
character. Either |
... |
additional arguments |
an object of class "pcsslm".
An object of class "pcsslm" is a list containing at least the following components:
call |
the matched call |
terms |
the |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error. |
df |
degrees of freedom, a 3-vector |
fstatistic |
a 3-vector with the value of the F-statistic with its numerator and denominator degrees of freedom. |
r.squared |
|
adj.r.squared |
the above |
cov.unscaled |
a |
Sum Sq |
a 3-vector with the model's Sum of Squares Regression (SSR), Sum of Squares Error (SSE), and Sum of Squares Total (SST). |
Wolf JM, Westra J, Tintle N (2021). “Using Summary Statistics to Model Multiplicative Combinations of Initially Analyzed Phenotypes With a Flexible Choice of Covariates.” Frontiers in Genetics, 12, 1962. ISSN 1664-8021, doi:10.3389/fgene.2021.745901, https://www.frontiersin.org/articles/10.3389/fgene.2021.745901/full.
Approximate the mean of Y conditional on X
approx_conditional(means, covs, response, n)
means |
Vector of the mean of X and the mean of Y |
covs |
Matrix of covariances for X and Y |
response |
Character. If "binary" truncates means to interval [0, 1]. If "continuous" does not restrict. |
n |
Sample size |
A list of length 2 consisting of 2 functions that give the estimated conditional mean and conditional variance of Y as a function of X
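As a rough illustration of the idea, the base R sketch below builds a conditional mean function and a conditional variance function from the means and covariances of X and Y. It uses the usual best linear predictor formulas and truncates the mean to [0, 1] for a binary response. The helper name `sketch_conditional` and the specific formulas are assumptions made for illustration; they are not taken from the package source, and the sample size argument is left out.

```r
# Hypothetical helper (not part of pcsstools): linear approximation of
# E[Y | X = x] and Var[Y | X = x] from the means and covariances of (X, Y).
sketch_conditional <- function(means, covs, response = "continuous") {
  slope <- covs[1, 2] / covs[1, 1]
  cond_mean <- function(x) {
    out <- means[[2]] + slope * (x - means[[1]])
    # For a binary response the conditional mean is a probability.
    if (response == "binary") out <- pmin(pmax(out, 0), 1)
    out
  }
  cond_var <- function(x) {
    if (response == "binary") {
      p <- cond_mean(x)
      p * (1 - p)  # Bernoulli variance at the truncated mean
    } else {
      # Residual variance of the linear predictor (constant in x).
      rep(covs[2, 2] - slope * covs[1, 2], length(x))
    }
  }
  list(cond_mean, cond_var)
}

means <- c(x = 0, y = 0.3)
covs <- matrix(c(1, 0.2, 0.2, 0.21), nrow = 2)
fns <- sketch_conditional(means, covs, response = "binary")
fns[[1]](c(-5, 0, 1))  # the first value is truncated at 0
```

Returning two closures mirrors the documented return shape (a list of two functions of X), so downstream code can evaluate them at any value of the predictor.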
approx_mult_prod recursively estimates the covariances and means of a set of responses. Estimates are approximated using all unique response orderings and aggregated.
approx_mult_prod(
  means,
  covs,
  n,
  response,
  predictors,
  responses,
  verbose = FALSE
)
means |
a vector of predictor and response means with all response means at the end of the vector. |
covs |
covariance matrix of all predictors and responses with column
and row order corresponding to the order of |
n |
sample size (an integer). |
response |
a string. Currently supports |
predictors, responses |
lists of objects of class |
verbose |
logical. |
A list containing the following elements:
means |
a vector of the (approximated) means of all predictors and the product of responses |
covs |
a matrix of (approximated) covariances between all predictors and the product of responses |
Wolf JM, Westra J, Tintle N (2021). “Using Summary Statistics to Model Multiplicative Combinations of Initially Analyzed Phenotypes With a Flexible Choice of Covariates.” Frontiers in Genetics, 12, 1962. ISSN 1664-8021, doi:10.3389/fgene.2021.745901, https://www.frontiersin.org/articles/10.3389/fgene.2021.745901/full.
approx_or approximates the linear model for a disjunction of m phenotypes as a function of a set of predictors.
approx_or(
  means,
  covs,
  n,
  predictors,
  add_intercept = TRUE,
  verbose = FALSE,
  response_assumption = "binary",
  ...
)
means |
vector of predictor and response means with the last m means being the means of m binary responses to combine in a logical OR statement. |
covs |
a matrix of the covariance of all model predictors and the
responses with the order of rows/columns corresponding to the order of
|
n |
sample size. |
predictors |
list of objects of class |
add_intercept |
logical. Should the linear model add an intercept term? |
verbose |
should output be printed to console? |
response_assumption |
character. Either |
... |
additional arguments |
an object of class "pcsslm".
An object of class "pcsslm" is a list containing at least the following components:
call |
the matched call |
terms |
the |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error. |
df |
degrees of freedom, a 3-vector |
fstatistic |
a 3-vector with the value of the F-statistic with its numerator and denominator degrees of freedom. |
r.squared |
|
adj.r.squared |
the above |
cov.unscaled |
a |
Sum Sq |
a 3-vector with the model's Sum of Squares Regression (SSR), Sum of Squares Error (SSE), and Sum of Squares Total (SST). |
Wolf JM, Westra J, Tintle N (2021). “Using Summary Statistics to Model Multiplicative Combinations of Initially Analyzed Phenotypes With a Flexible Choice of Covariates.” Frontiers in Genetics, 12, 1962. ISSN 1664-8021, doi:10.3389/fgene.2021.745901, https://www.frontiersin.org/articles/10.3389/fgene.2021.745901/full.
Approximate summary statistics for a product of phenotypes and a set of predictors
approx_prod_stats(means, covs, n, response, predictors)
means |
Vector of means of predictors and the two phenotypes to be multiplied |
covs |
Covariance matrix of all predictors and the two phenotypes |
n |
Sample size |
response |
character. Either "binary" or "continuous". |
predictors |
a list of elements of class predictor |
A list with the predicted covariance matrix of all predictors and the product and the means of all predictors and the product.
Approximate the covariance of one response with an arbitrary product of responses.
approx_response_cov_recursive(
  ids,
  r_covs,
  r_means,
  n,
  responses,
  response,
  verbose = FALSE
)
ids |
Column ids of responses to use. First is taken alone while 2nd to last are to be multiplied |
r_covs |
Response covariance matrix |
r_means |
Response means (vector) |
n |
Sample size |
responses |
List of lists with elements of class predictor |
response |
Character, Either "binary" or "continuous" |
verbose |
logical |
A vector with the approximated covariance, and approximated mean and variance of the product
calculate_lm describes the linear model of the last listed variable in means and covs as a function of all other variables in means and covs.
calculate_lm(
  means,
  covs,
  n,
  add_intercept = FALSE,
  keep_pcss = FALSE,
  terms = NULL
)
means |
a vector of means of all model predictors and the response with the last element the response mean. |
covs |
a matrix of the covariance of all model predictors and the
response with the order of rows/columns corresponding to the order of
|
n |
sample size |
add_intercept |
logical. If |
keep_pcss |
logical. If |
terms |
terms |
an object of class "pcsslm".
An object of class "pcsslm" is a list containing at least the following components:
call |
the matched call |
terms |
the |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error. |
df |
degrees of freedom, a 3-vector |
fstatistic |
a 3-vector with the value of the F-statistic with its numerator and denominator degrees of freedom. |
r.squared |
|
adj.r.squared |
the above |
cov.unscaled |
a |
Sum Sq |
a 3-vector with the model's Sum of Squares Regression (SSR), Sum of Squares Error (SSE), and Sum of Squares Total (SST). |
Wolf JM, Westra J, Tintle N (2021). “Using Summary Statistics to Model Multiplicative Combinations of Initially Analyzed Phenotypes With a Flexible Choice of Covariates.” Frontiers in Genetics, 12, 1962. ISSN 1664-8021, doi:10.3389/fgene.2021.745901, https://www.frontiersin.org/articles/10.3389/fgene.2021.745901/full.
Wolf JM, Barnard M, Xia X, Ryder N, Westra J, Tintle N (2020). “Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks.” Pacific Symposium on Biocomputing, 25, 719–730. ISSN 2335-6928, doi:10.1142/9789811215636_0063, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907735/.
Gasdaska A, Friend D, Chen R, Westra J, Zawistowski M, Lindsey W, Tintle N (2019). “Leveraging summary statistics to make inferences about complex phenotypes in large biobanks.” Pacific Symposium on Biocomputing, 24, 391–402. ISSN 2335-6928, doi:10.1142/9789813279827_0036, https://pubmed.ncbi.nlm.nih.gov/30963077/.
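The underlying computation can be illustrated in base R. The sketch below recovers ordinary least squares coefficients and the residual standard error from only the means, the covariance matrix, and the sample size, then checks them against lm() on simulated data. The helper `sketch_lm_from_pcss` is hypothetical, always includes an intercept, and is not the package's implementation.

```r
# Hypothetical helper: OLS with an intercept from summary statistics,
# with the response listed last in `means` and `covs`.
sketch_lm_from_pcss <- function(means, covs, n) {
  p <- length(means) - 1                      # number of predictors
  Sxx <- covs[1:p, 1:p, drop = FALSE]
  Sxy <- covs[1:p, p + 1]
  slopes <- solve(Sxx, Sxy)
  intercept <- means[p + 1] - sum(slopes * means[1:p])
  # SSE = (n - 1) * (Var(y) - Sxy' Sxx^{-1} Sxy)
  sse <- (n - 1) * (covs[p + 1, p + 1] - sum(Sxy * slopes))
  sigma <- sqrt(sse / (n - p - 1))
  list(coefficients = c(intercept, slopes), sigma = sigma)
}

set.seed(1)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(100)

fit_pcss <- sketch_lm_from_pcss(colMeans(dat), cov(dat), nrow(dat))
fit_lm <- lm(y ~ x1 + x2, data = dat)

# Both comparisons should report TRUE (up to numerical tolerance).
all.equal(unname(fit_pcss$coefficients), unname(coef(fit_lm)))
all.equal(fit_pcss$sigma, summary(fit_lm)$sigma)
```

The point of the comparison is that no individual-level rows are needed once the means and covariances are known.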
calculate_lm_combo describes the linear model for a linear combination of responses as a function of a set of predictors.
calculate_lm_combo(means, covs, n, phi, m = length(phi), add_intercept, ...)
means |
a vector of means of all model predictors and the response with
the last |
covs |
a matrix of the covariance of all model predictors and the
responses with the order of rows/columns corresponding to the order of
|
n |
sample size. |
phi |
vector of linear combination weights with one entry per response variable. |
m |
number of responses to combine. Defaults to |
add_intercept |
logical. If |
... |
additional arguments |
an object of class "pcsslm".
An object of class "pcsslm" is a list containing at least the following components:
call |
the matched call |
terms |
the |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error. |
df |
degrees of freedom, a 3-vector |
fstatistic |
a 3-vector with the value of the F-statistic with its numerator and denominator degrees of freedom. |
r.squared |
|
adj.r.squared |
the above |
cov.unscaled |
a |
Sum Sq |
a 3-vector with the model's Sum of Squares Regression (SSR), Sum of Squares Error (SSE), and Sum of Squares Total (SST). |
Wolf JM, Barnard M, Xia X, Ryder N, Westra J, Tintle N (2020). “Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks.” Pacific Symposium on Biocomputing, 25, 719–730. ISSN 2335-6928, doi:10.1142/9789811215636_0063, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907735/.
Gasdaska A, Friend D, Chen R, Westra J, Zawistowski M, Lindsey W, Tintle N (2019). “Leveraging summary statistics to make inferences about complex phenotypes in large biobanks.” Pacific Symposium on Biocomputing, 24, 391–402. ISSN 2335-6928, doi:10.1142/9789813279827_0036, https://pubmed.ncbi.nlm.nih.gov/30963077/.
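The means and covariances of a linear combination follow exactly from those of its components, which is why no approximation is needed in this case. The base R sketch below applies that transformation and checks it against statistics computed from individual-level data. The helper `sketch_combo_pcss` is hypothetical and is only meant to show the algebra, not how the package is written.

```r
# Hypothetical helper: summary statistics for the predictors plus the combined
# response sum(phi * y), with the m responses listed last.
sketch_combo_pcss <- function(means, covs, phi) {
  m <- length(phi)
  p <- length(means) - m
  # A maps (x_1..x_p, y_1..y_m) to (x_1..x_p, phi' y).
  A <- rbind(
    cbind(diag(p), matrix(0, p, m)),
    c(rep(0, p), phi)
  )
  list(means = drop(A %*% means), covs = A %*% covs %*% t(A))
}

set.seed(2)
dat <- data.frame(x = rnorm(50), y1 = rnorm(50), y2 = rnorm(50))
phi <- c(1, -1)
out <- sketch_combo_pcss(colMeans(dat), cov(dat), phi)

# Same statistics computed directly from the individual-level data.
direct <- data.frame(x = dat$x, combo = dat$y1 - dat$y2)
all.equal(unname(out$means), unname(colMeans(direct)))
all.equal(unname(out$covs), unname(cov(direct)))
```

The transformed statistics can then be passed to any routine that fits a linear model with the response listed last.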
Check that independent and dependent variables are accounted for through PCSS
check_terms(xterms, yterms, pcssterms, pcsstype)
xterms, yterms |
character vector of model's independent variables or variables combined to the dependent variable |
pcssterms |
character vector of variables with provided PCSS |
pcsstype |
character describing the PCSS being checked. Either
|
No return value, called for side effects
Extract independent variables from a formula
extract_predictors(formula = formula())
formula |
an object of class |
A list with a character vector of all predictors and a logical value indicating whether the model includes an intercept term.
Extract dependent variables from a formula as a string
extract_response(formula = formula())
formula |
an object of class |
a character vector of all responses
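For orientation, the same kinds of information can be pulled out of a formula with base R alone. The sketch below is not the package's code and the exact return format of the two extraction helpers above may differ; it only shows which standard tools expose the predictors, the intercept flag, and the variables on the left-hand side.

```r
f <- y1 + y2 ~ g1 + x1

# Independent variables and intercept flag via the terms object.
tt <- terms(f)
predictors <- attr(tt, "term.labels")
has_intercept <- attr(tt, "intercept") == 1

# Dependent variables: every variable name on the left-hand side.
responses <- all.vars(f[[2]])

predictors     # "g1" "x1"
has_intercept  # TRUE
responses      # "y1" "y2"
```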
Approximate the partial correlation of Y and Z given X
get_pcor(covs, cors = cov2cor(covs))
covs |
Covariance matrix of X, Y, and Z. |
cors |
Correlation matrix of X, Y, and Z. |
Approximated partial correlation of the later two terms given the first
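A minimal sketch of a first-order partial correlation in base R follows, assuming the variables are ordered (X, Y, Z) as in the argument descriptions above. The helper `sketch_pcor` is hypothetical and uses the textbook formula; whether the package computes the quantity in exactly this way is not stated here.

```r
# Hypothetical helper: partial correlation of Y and Z given X,
# with variables ordered (X, Y, Z) in the correlation matrix.
sketch_pcor <- function(cors) {
  r_xy <- cors[1, 2]
  r_xz <- cors[1, 3]
  r_yz <- cors[2, 3]
  (r_yz - r_xy * r_xz) / sqrt((1 - r_xy^2) * (1 - r_xz^2))
}

set.seed(3)
x <- rnorm(200)
y <- x + rnorm(200)
z <- x + rnorm(200)

pc <- sketch_pcor(cor(cbind(x, y, z)))

# Equivalent definition: correlate the residuals after regressing each on X.
pc_resid <- cor(resid(lm(y ~ x)), resid(lm(z ~ x)))
all.equal(pc, pc_resid)
```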
guess_response takes a character vector of the dependent variable from a formula object and identifies which function separates the individual variables that make up the response. It then returns the model_* function to model the appropriate response using PCSS.
guess_response(response = character())
response |
character. Output of |
A character. Either "model_combo", "model_product", "model_or", "model_and", or "model_singular".
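The dispatch idea can be sketched in base R by inspecting the top-level operator on the left-hand side of a formula. The helper `sketch_guess` below is hypothetical: it works on a formula object rather than a character vector, looks only at the outermost operator, and is meant as an illustration of the mapping rather than a copy of the package's logic.

```r
# Hypothetical helper: map the outermost operator of the left-hand side
# to the name of a model_* function.
sketch_guess <- function(formula) {
  lhs <- formula[[2]]
  op <- if (is.call(lhs)) as.character(lhs[[1]]) else "none"
  switch(op,
    "+" = "model_combo",
    "*" = "model_product",
    "|" = "model_or",
    "&" = "model_and",
    "model_singular"  # default: a single response variable
  )
}

sketch_guess(y1 + y2 ~ g1)       # "model_combo"
sketch_guess(y4 * y5 ~ g1)       # "model_product"
sketch_guess(y4 | y5 ~ g1 + x1)  # "model_or"
sketch_guess(y1 ~ g1)            # "model_singular"
```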
Lists all permutations of 1,2,...,m unique up to the first two elements
make_permutations(m)
m |
number of elements to permute |
A list of vectors of permutations of 1,2,...,m.
model_and approximates the linear model for the conjunction of m phenotypes as a function of a set of predictors.
model_and(formula, n, means, covs, predictors, ...)
formula |
an object of class |
n |
sample size. |
means |
named vector of predictor and response means. |
covs |
named matrix of the covariance of all model predictors and the responses. |
predictors |
named list of objects of class |
... |
additional arguments |
an object of class "pcsslm".
An object of class "pcsslm" is a list containing at least the following components:
call |
the matched call |
terms |
the |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error. |
df |
degrees of freedom, a 3-vector |
fstatistic |
a 3-vector with the value of the F-statistic with its numerator and denominator degrees of freedom. |
r.squared |
|
adj.r.squared |
the above |
cov.unscaled |
a |
Sum Sq |
a 3-vector with the model's Sum of Squares Regression (SSR), Sum of Squares Error (SSE), and Sum of Squares Total (SST). |
Wolf JM, Westra J, Tintle N (2021). “Using Summary Statistics to Model Multiplicative Combinations of Initially Analyzed Phenotypes With a Flexible Choice of Covariates.” Frontiers in Genetics, 12, 1962. ISSN 1664-8021, doi:10.3389/fgene.2021.745901, https://www.frontiersin.org/articles/10.3389/fgene.2021.745901/full.
ex_data <- pcsstools_example[c("g1", "x1", "y4", "y5")]
head(ex_data)
means <- colMeans(ex_data)
covs <- cov(ex_data)
n <- nrow(ex_data)
predictors <- list(
  g1 = new_predictor_snp(maf = mean(ex_data$g1) / 2),
  x1 = new_predictor_normal(mean = mean(ex_data$x1), sd = sd(ex_data$x1))
)
model_and(
  y4 & y5 ~ g1 + x1,
  means = means, covs = covs, n = n, predictors = predictors
)
summary(lm(y4 & y5 ~ g1 + x1, data = ex_data))
model_combo calculates the linear model for a linear combination of phenotypes as a function of a set of predictors.
model_combo(formula, phi, n, means, covs, ...)
formula |
an object of class |
phi |
named vector of linear weights for each variable in the
dependent variable in |
n |
sample size. |
means |
named vector of predictor and response means. |
covs |
named matrix of the covariance of all model predictors and the responses. |
... |
additional arguments |
an object of class "pcsslm".
An object of class "pcsslm" is a list containing at least the following components:
call |
the matched call |
terms |
the |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error. |
df |
degrees of freedom, a 3-vector |
fstatistic |
a 3-vector with the value of the F-statistic with its numerator and denominator degrees of freedom. |
r.squared |
|
adj.r.squared |
the above |
cov.unscaled |
a |
Sum Sq |
a 3-vector with the model's Sum of Squares Regression (SSR), Sum of Squares Error (SSE), and Sum of Squares Total (SST). |
Wolf JM, Barnard M, Xia X, Ryder N, Westra J, Tintle N (2020). “Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks.” Pacific Symposium on Biocomputing, 25, 719–730. ISSN 2335-6928, doi:10.1142/9789811215636_0063, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907735/.
Gasdaska A, Friend D, Chen R, Westra J, Zawistowski M, Lindsey W, Tintle N (2019). “Leveraging summary statistics to make inferences about complex phenotypes in large biobanks.” Pacific Symposium on Biocomputing, 24, 391–402. ISSN 2335-6928, doi:10.1142/9789813279827_0036, https://pubmed.ncbi.nlm.nih.gov/30963077/.
ex_data <- pcsstools_example[c("g1", "x1", "x2", "x3", "y1", "y2", "y3")]
head(ex_data)
means <- colMeans(ex_data)
covs <- cov(ex_data)
n <- nrow(ex_data)
phi <- c("y1" = 1, "y2" = -1, "y3" = 0.5)
model_combo(
  y1 + y2 + y3 ~ g1 + x1 + x2 + x3,
  phi = phi, n = n, means = means, covs = covs
)
summary(lm(y1 - y2 + 0.5 * y3 ~ g1 + x1 + x2 + x3, data = ex_data))
model_or approximates the linear model for a disjunction of m phenotypes as a function of a set of predictors.
model_or(formula, n, means, covs, predictors, ...)
formula |
an object of class |
n |
sample size. |
means |
named vector of predictor and response means. |
covs |
named matrix of the covariance of all model predictors and the responses. |
predictors |
named list of objects of class |
... |
additional arguments |
an object of class "pcsslm".
An object of class "pcsslm" is a list containing at least the following components:
call |
the matched call |
terms |
the |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error. |
df |
degrees of freedom, a 3-vector |
fstatistic |
a 3-vector with the value of the F-statistic with its numerator and denominator degrees of freedom. |
r.squared |
|
adj.r.squared |
the above |
cov.unscaled |
a |
Sum Sq |
a 3-vector with the model's Sum of Squares Regression (SSR), Sum of Squares Error (SSE), and Sum of Squares Total (SST). |
Wolf JM, Westra J, Tintle N (2021). “Using Summary Statistics to Model Multiplicative Combinations of Initially Analyzed Phenotypes With a Flexible Choice of Covariates.” Frontiers in Genetics, 12, 1962. ISSN 1664-8021, doi:10.3389/fgene.2021.745901, https://www.frontiersin.org/articles/10.3389/fgene.2021.745901/full.
ex_data <- pcsstools_example[c("g1", "x1", "y4", "y5")]
head(ex_data)
means <- colMeans(ex_data)
covs <- cov(ex_data)
n <- nrow(ex_data)
predictors <- list(
  g1 = new_predictor_snp(maf = mean(ex_data$g1) / 2),
  x1 = new_predictor_normal(mean = mean(ex_data$x1), sd = sd(ex_data$x1))
)
model_or(
  y4 | y5 ~ g1 + x1,
  means = means, covs = covs, n = n, predictors = predictors
)
summary(lm(y4 | y5 ~ g1 + x1, data = ex_data))
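One way to see why a disjunction of binary phenotypes can be handled with product-style summary statistics is De Morgan's identity. The base R sketch below uses simulated 0/1 variables (not the package's example data) to show the identity and the resulting expression for the mean of the disjunction. It illustrates the algebra only and makes no claim about the package's internal steps.

```r
set.seed(4)
y4 <- rbinom(100, 1, 0.3)
y5 <- rbinom(100, 1, 0.5)

# De Morgan: (y4 OR y5) = 1 - (1 - y4) * (1 - y5) for 0/1 variables.
or_direct <- as.numeric(y4 | y5)
or_product <- 1 - (1 - y4) * (1 - y5)
identical(or_direct, or_product)

# The mean of the disjunction therefore needs only the two means and the
# covariance, since mean(y4 * y5) = cov * (n - 1) / n + mean(y4) * mean(y5).
n <- length(y4)
mean_prod <- cov(y4, y5) * (n - 1) / n + mean(y4) * mean(y5)
all.equal(mean(or_direct), mean(y4) + mean(y5) - mean_prod)
```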
model_prcomp calculates the linear model for the mth principal component score of a set of phenotypes as a function of a set of predictors.
model_prcomp(
  formula,
  comp = 1,
  n,
  means,
  covs,
  center = FALSE,
  standardize = FALSE,
  ...
)
formula |
an object of class |
comp |
integer indicating which principal component score to analyze. Must be less than or equal to the total number of phenotypes. |
n |
sample size. |
means |
named vector of predictor and response means. |
covs |
named matrix of the covariance of all model predictors and the responses. |
center |
logical. Should the dependent variables be centered before principal components are calculated? |
standardize |
logical. Should the dependent variables be standardized before principal components are calculated? |
... |
additional arguments |
an object of class "pcsslm".
An object of class "pcsslm" is a list containing at least the following components:
call |
the matched call |
terms |
the |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error. |
df |
degrees of freedom, a 3-vector |
fstatistic |
a 3-vector with the value of the F-statistic with its numerator and denominator degrees of freedom. |
r.squared |
|
adj.r.squared |
the above |
cov.unscaled |
a |
Sum Sq |
a 3-vector with the model's Sum of Squares Regression (SSR), Sum of Squares Error (SSE), and Sum of Squares Total (SST). |
Wolf JM, Barnard M, Xia X, Ryder N, Westra J, Tintle N (2020). “Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks.” Pacific Symposium on Biocomputing, 25, 719–730. ISSN 2335-6928, doi:10.1142/9789811215636_0063, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907735/.
ex_data <- pcsstools_example[c("g1", "x1", "x2", "y1", "y2", "y3")]
head(ex_data)
means <- colMeans(ex_data)
covs <- cov(ex_data)
n <- nrow(ex_data)
model_prcomp(
  y1 + y2 + y3 ~ g1 + x1 + x2,
  comp = 1, n = n, means = means, covs = covs
)
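A principal component score is a linear combination of the responses whose weights come from the eigenvectors of the response covariance matrix, so the weights themselves can be obtained from summary statistics. The base R sketch below demonstrates this on simulated data (the matrix `Y` is made up for illustration) and compares the result with prcomp(). It centers the responses for the comparison, and it is an assumption that this matches what the package does internally.

```r
set.seed(5)
Y <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, c("y1", "y2", "y3")))
Y[, 2] <- Y[, 2] + Y[, 1]

# Principal component weights from the response covariance matrix alone.
eig <- eigen(cov(Y))
phi <- eig$vectors[, 1]  # weights for the first component

# The score is a linear combination of the centered responses; its variance
# equals the leading eigenvalue.
score <- drop(scale(Y, center = TRUE, scale = FALSE) %*% phi)
all.equal(var(score), eig$values[1])

# The weights agree with prcomp() up to the sign of the component.
pc <- prcomp(Y)
all.equal(abs(unname(pc$rotation[, 1])), abs(phi))
```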
model_product approximates the linear model for the product of m phenotypes as a function of a set of predictors.
model_product(
  formula,
  n,
  means,
  covs,
  predictors,
  responses = NULL,
  response = "continuous",
  ...
)
formula |
an object of class |
n |
sample size. |
means |
named vector of predictor and response means. |
covs |
named matrix of the covariance of all model predictors and the responses. |
predictors |
named list of objects of class |
responses |
named list of objects of class |
response |
character. Describe distribution of all product terms.
Either |
... |
additional arguments |
an object of class "pcsslm".
An object of class "pcsslm" is a list containing at least the following components:
call |
the matched call |
terms |
the |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error. |
df |
degrees of freedom, a 3-vector |
fstatistic |
a 3-vector with the value of the F-statistic with its numerator and denominator degrees of freedom. |
r.squared |
|
adj.r.squared |
the above |
cov.unscaled |
a |
Sum Sq |
a 3-vector with the model's Sum of Squares Regression (SSR), Sum of Squares Error (SSE), and Sum of Squares Total (SST). |
Wolf JM, Westra J, Tintle N (2021). “Using Summary Statistics to Model Multiplicative Combinations of Initially Analyzed Phenotypes With a Flexible Choice of Covariates.” Frontiers in Genetics, 12, 1962. ISSN 1664-8021, doi:10.3389/fgene.2021.745901, https://www.frontiersin.org/articles/10.3389/fgene.2021.745901/full.
ex_data <- pcsstools_example[c("g1", "g2", "g3", "x1", "y4", "y5", "y6")]
head(ex_data)
means <- colMeans(ex_data)
covs <- cov(ex_data)
n <- nrow(ex_data)
predictors <- list(
  g1 = new_predictor_snp(maf = mean(ex_data$g1) / 2),
  g2 = new_predictor_snp(maf = mean(ex_data$g2) / 2),
  g3 = new_predictor_snp(maf = mean(ex_data$g3) / 2),
  x1 = new_predictor_normal(mean = mean(ex_data$x1), sd = sd(ex_data$x1))
)
responses <- lapply(means[c("y4", "y5", "y6")], new_predictor_binary)
model_product(
  y4 * y5 * y6 ~ g1 + g2 + g3 + x1,
  means = means, covs = covs, n = n,
  predictors = predictors, responses = responses, response = "binary"
)
summary(lm(y4 * y5 * y6 ~ g1 + g2 + g3 + x1, data = ex_data))
model_singular calculates the linear model for a singular phenotype as a function of a set of predictors.
model_singular(formula, n, means, covs, ...)
formula |
an object of class |
n |
sample size. |
means |
named vector of predictor and response means. |
covs |
named matrix of the covariance of all model predictors and the responses. |
... |
additional arguments |
an object of class "pcsslm".
An object of class "pcsslm" is a list containing at least the following components:
call |
the matched call |
terms |
the |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error. |
df |
degrees of freedom, a 3-vector |
fstatistic |
a 3-vector with the value of the F-statistic with its numerator and denominator degrees of freedom. |
r.squared |
|
adj.r.squared |
the above |
cov.unscaled |
a |
Sum Sq |
a 3-vector with the model's Sum of Squares Regression (SSR), Sum of Squares Error (SSE), and Sum of Squares Total (SST). |
Wolf JM, Barnard M, Xia X, Ryder N, Westra J, Tintle N (2020). “Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks.” Pacific Symposium on Biocomputing, 25, 719–730. ISSN 2335-6928, doi:10.1142/9789811215636_0063, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907735/.
ex_data <- pcsstools_example[c("g1", "x1", "y1")]
means <- colMeans(ex_data)
covs <- cov(ex_data)
n <- nrow(ex_data)
model_singular(
  y1 ~ g1 + x1,
  n = n, means = means, covs = covs
)
summary(lm(y1 ~ g1 + x1, data = ex_data))
Create an object of class "predictor"
new_predictor(
  f = function() {
  },
  predictor_type = character(),
  lb,
  ub,
  support
)
f |
a function that gives the probability mass/distribution function of a random variable. |
predictor_type |
a character describing the random variable. Either "discrete" or "continuous". |
lb, ub |
if |
support |
if |
an object of class "predictor".
new_predictor_normal, new_predictor_snp, and new_predictor_binary.
new_predictor(
  f = function(x0) dnorm(x0, mean = 0, sd = 1),
  predictor_type = "continuous",
  lb = -Inf,
  ub = Inf
)
new_predictor_binary calls new_predictor.
new_predictor_binary(p)
p |
probability of success (predictor mean) |
an object of class "predictor".
new_predictor_binary(p = 0.75)
new_predictor_normal calls new_predictor.
new_predictor_normal(mean, sd)
mean |
predictor mean (double). |
sd |
predictor standard deviation (double) |
an object of class "predictor".
new_predictor_normal(mean = 10, sd = 1)
new_predictor_snp calls new_predictor.
new_predictor_snp(maf)
maf |
minor allele frequency |
an object of class "predictor".
new_predictor_snp(maf = 0.3)
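To make the role of the minor allele frequency concrete, the base R sketch below writes out a genotype distribution for a SNP coded as 0, 1, or 2 copies of the minor allele. It assumes a Binomial(2, maf) distribution (Hardy-Weinberg equilibrium), which is consistent with the examples elsewhere in this manual that set maf to half the mean genotype. That distributional choice is an assumption here, not something confirmed from the package source.

```r
maf <- 0.3

# Assumed genotype distribution: Binomial(2, maf) on the support {0, 1, 2}.
snp_pmf <- function(g) dbinom(g, size = 2, prob = maf)
support <- 0:2

probs <- snp_pmf(support)
probs                 # 0.49 0.42 0.09
sum(support * probs)  # mean genotype, equal to 2 * maf = 0.6
```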
pcsslm approximates a linear model of a combination of variables using precomputed summary statistics.
pcsslm(formula, pcss = list(), ...)
formula |
an object of class formula whose dependent variable is a
combination of variables and logical | operators.
All model terms must have appropriate PCSS in |
pcss |
a list of precomputed summary statistics. In all cases, this
should include |
... |
additional arguments. See Details for more information. |
pcsslm parses the input formula's dependent variable for functions such as sums (+), products (*), or logical operators (| and &). It then models the combination of variables using one of model_combo, model_product, model_or, model_and, or model_prcomp.
Different precomputed summary statistics are needed inside pcss depending on the function that combines the dependent variable.
For linear combinations (and principal component analysis), only n, means, and covs are required.
For products and logical combinations, the additional items predictors and responses are required. These are named lists of objects of class predictor generated by new_predictor, with a predictor object for each independent variable in predictors and each dependent variable in responses. However, if modeling the product or logical combination of only two variables, responses can be NULL without consequence.
If modeling a principal component score of a set of variables, include the argument comp, where comp is an integer indicating which principal component score to analyze. Optional logical arguments center and standardize determine if responses should be centered and standardized before principal components are calculated.
If modeling a linear combination, include the argument phi, a named vector of linear weights for each variable in the dependent variable in formula.
If modeling a product, include the argument response, a character equal to either "continuous" or "binary". If "binary", specialized approximations are performed to estimate means and variances.
an object of class "pcsslm".
An object of class "pcsslm" is a list containing at least the following components:
call |
the matched call |
terms |
the |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error. |
df |
degrees of freedom, a 3-vector |
fstatistic |
a 3-vector with the value of the F-statistic with its numerator and denominator degrees of freedom. |
r.squared |
|
adj.r.squared |
the above |
cov.unscaled |
a |
Sum Sq |
a 3-vector with the model's Sum of Squares Regression (SSR), Sum of Squares Error (SSE), and Sum of Squares Total (SST). |
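The components above are related to one another. The base R sketch below assumes they follow the usual summary.lm definitions, which this page does not state explicitly. It uses a hypothetical simulated data set and an ordinary lm fit to show how r.squared, adj.r.squared, sigma, and fstatistic can be derived from SSR, SSE, and SST.

```r
# Hypothetical simulated data and an ordinary lm fit for comparison
set.seed(3)
n <- 100
x <- rnorm(n)
g <- rbinom(n, 2, 0.25)
y <- 1 + 0.5 * x - 0.2 * g + rnorm(n)
fit <- lm(y ~ x + g)
s   <- summary(fit)

# Sums of squares
sst <- sum((y - mean(y))^2)   # total
sse <- sum(residuals(fit)^2)  # error
ssr <- sst - sse              # regression

p      <- 2                   # number of slopes (intercept excluded)
df_err <- n - p - 1           # residual degrees of freedom

r2      <- ssr / sst
adj_r2  <- 1 - (sse / df_err) / (sst / (n - 1))
sigma   <- sqrt(sse / df_err)
f_value <- (ssr / p) / (sse / df_err)

# Each derived quantity matches the corresponding summary.lm component
all.equal(r2, s$r.squared)
all.equal(adj_r2, s$adj.r.squared)
all.equal(sigma, s$sigma)
all.equal(f_value, unname(s$fstatistic["value"]))
```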
Wolf JM, Westra J, Tintle N (2021). “Using Summary Statistics to Model Multiplicative Combinations of Initially Analyzed Phenotypes With a Flexible Choice of Covariates.” Frontiers in Genetics, 12, 1962. ISSN 1664-8021, doi:10.3389/fgene.2021.745901, https://www.frontiersin.org/articles/10.3389/fgene.2021.745901/full.
Wolf JM, Barnard M, Xia X, Ryder N, Westra J, Tintle N (2020). “Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks.” Pacific Symposium on Biocomputing, 25, 719–730. ISSN 2335-6928, doi:10.1142/9789811215636_0063, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907735/.
Gasdaska A, Friend D, Chen R, Westra J, Zawistowski M, Lindsey W, Tintle N (2019). “Leveraging summary statistics to make inferences about complex phenotypes in large biobanks.” Pacific Symposium on Biocomputing, 24, 391–402. ISSN 2335-6928, doi:10.1142/9789813279827_0036, https://pubmed.ncbi.nlm.nih.gov/30963077/.
model_combo, model_product, model_or, model_and, and model_prcomp.
## Principal Component Analysis
ex_data <- pcsstools_example[c("g1", "x1", "y1", "y2", "y3")]
pcss <- list(
  means = colMeans(ex_data),
  covs = cov(ex_data),
  n = nrow(ex_data)
)
pcsslm(y1 + y2 + y3 ~ g1 + x1, pcss = pcss, comp = 1)

## Linear combination of variables
ex_data <- pcsstools_example[c("g1", "g2", "y1", "y2")]
pcss <- list(
  means = colMeans(ex_data),
  covs = cov(ex_data),
  n = nrow(ex_data)
)
pcsslm(y1 + y2 ~ g1 + g2, pcss = pcss, phi = c(1, -1))
summary(lm(y1 - y2 ~ g1 + g2, data = ex_data))

## Product of variables
ex_data <- pcsstools_example[c("g1", "x1", "y4", "y5", "y6")]
pcss <- list(
  means = colMeans(ex_data),
  covs = cov(ex_data),
  n = nrow(ex_data),
  predictors = list(
    g1 = new_predictor_snp(maf = mean(ex_data$g1) / 2),
    x1 = new_predictor_normal(mean = mean(ex_data$x1), sd = sd(ex_data$x1))
  ),
  responses = lapply(
    colMeans(ex_data)[3:length(colMeans(ex_data))],
    new_predictor_binary
  )
)
pcsslm(y4 * y5 * y6 ~ g1 + x1, pcss = pcss, response = "binary")
summary(lm(y4 * y5 * y6 ~ g1 + x1, data = ex_data))

## Disjunct (OR statement) of variables
ex_data <- pcsstools_example[c("g1", "x1", "y4", "y5")]
pcss <- list(
  means = colMeans(ex_data),
  covs = cov(ex_data),
  n = nrow(ex_data),
  predictors = list(
    g1 = new_predictor_snp(maf = mean(ex_data$g1) / 2),
    x1 = new_predictor_normal(mean = mean(ex_data$x1), sd = sd(ex_data$x1))
  )
)
pcsslm(y4 | y5 ~ g1 + x1, pcss = pcss)
summary(lm(y4 | y5 ~ g1 + x1, data = ex_data))
A dataset containing simulated genetic data with 3 SNPs, 3 continuous covariates, and 6 phenotypes (continuous and binary).
pcsstools_example
A data frame with 1000 rows and 12 columns:
Minor allele counts at three sites
Continuous covariates
Continuous phenotypes
Binary phenotypes
Prints a linear model fit through pre-computed summary statistics.
## S3 method for class 'pcsslm'
print(
  x,
  digits = max(3L, getOption("digits") - 3L),
  symbolic.cor = x$symbolic.cor,
  signif.stars = getOption("show.signif.stars"),
  ...
)
x |
an object of class |
digits |
the number of significant digits to use when printing. |
symbolic.cor |
logical. If |
signif.stars |
logical. If |
... |
further arguments passed to or from other methods. |
an object of class "pcsslm".
An object of class "pcsslm" is a list containing at least the following components:
call |
the matched call |
terms |
the |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error. |
df |
degrees of freedom, a 3-vector |
fstatistic |
a 3-vector with the value of the F-statistic with its numerator and denominator degrees of freedom. |
r.squared |
|
adj.r.squared |
the above |
cov.unscaled |
a |
Sum Sq |
a 3-vector with the model's Sum of Squares Regression (SSR), Sum of Squares Error (SSE), and Sum of Squares Total (SST). |
R Core Team and contributors worldwide. Modified by Jack Wolf