Package 'tehtuner' reference manual

Title:	Fit and Tune Models to Detect Treatment Effect Heterogeneity
Description:	Implements methods to fit Virtual Twins models (Foster et al. (2011) <doi:10.1002/sim.4322>) for identifying subgroups with differential treatment effects in clinical trials. Tuning parameters are selected with the methods proposed in Wolf et al. (2022) <doi:10.1177/17407745221095855>, which control the probability of falsely detecting a differential effect when the conditional average treatment effect is uniform across the study population.
Authors:	Jack Wolf [aut, cre]
Maintainer:	Jack Wolf <[email protected]>
License:	GPL (>= 3)
Version:	0.3.2
Built:	2025-02-28 04:47:44 UTC
Source:	https://github.com/jackmwolf/tehtuner

Get the MNPP for the Step 2 model

Description

Finds the minimal null penalty parameter (MNPP): the lowest penalty parameter for which the Step 2 model fit to the estimated CATEs from Step 1 is constant for all subjects.

Usage

get_mnpp(z, data, step2, Trt, Y, threshold)

Arguments

`z`	a numeric vector of estimated CATEs from Step 1
`data`	a data frame containing a response, binary treatment indicators, and covariates.
`step2`	a character string specifying the Step 2 model. Supports "`lasso`", "`rtree`", "`classtree`", or "`ctree`".
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.
`threshold`	used only when `step2 = "classtree"`. The cutoff against which each estimated individual treatment effect from Step 1 is compared: TRUE if the estimate is higher, FALSE if it is lower.

Get the MNPP for a Classification Tree

Description

Finds the lowest complexity parameter for a null classification tree fit.

Usage

get_mnpp.classtree(z, data, Trt, Y, threshold)

Arguments

`z`	a numeric vector of estimated CATEs from Step 1
`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.
`threshold`	used only when `step2 = "classtree"`. The cutoff against which each estimated individual treatment effect from Step 1 is compared: TRUE if the estimate is higher, FALSE if it is lower.

Value

the MNPP

Get the MNPP for a Conditional Inference Tree

Description

Finds the lowest test statistic threshold that yields a null conditional inference tree.

Usage

get_mnpp.ctree(z, data, Trt, Y)

Arguments

`z`	a numeric vector of estimated CATEs from Step 1
`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.

Value

the MNPP

Get the MNPP for a Model fit via Lasso

Description

Finds the lowest penalty parameter for a null lasso model.

Usage

get_mnpp.lasso(z, data, Trt, Y)

Arguments

`z`	a numeric vector of estimated CATEs from Step 1
`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.
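
For the lasso, the lowest penalty that gives a null (intercept-only) model has a closed form under the usual objective. The base R sketch below illustrates the idea. The names `lasso_null_penalty`, `x`, and `z` are hypothetical, and the scaling convention (a glmnet-style objective with predictors standardized using the 1/n variance) is an assumption; it is not necessarily how `get_mnpp.lasso` computes the MNPP.

```r
# Smallest lasso penalty that shrinks every coefficient to zero.
# Assumes the objective (1 / (2n)) * RSS + lambda * sum(abs(beta)),
# with predictors centered and scaled using the 1/n variance.
lasso_null_penalty <- function(x, z) {
  n <- nrow(x)
  x_centered <- sweep(x, 2, colMeans(x))
  x_sd <- sqrt(colSums(x_centered^2) / n)
  x_std <- sweep(x_centered, 2, x_sd, "/")
  # Largest absolute covariance between a predictor and the centered response
  max(abs(crossprod(x_std, z - mean(z)))) / n
}

set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3)  # hypothetical covariates
z <- 0.5 * x[, 1] + rnorm(200)         # hypothetical Step 1 CATE estimates
lambda_max <- lasso_null_penalty(x, z)
```

Any penalty at or above `lambda_max` leaves only the intercept in the model, so the fitted CATE is constant across subjects.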

Get the MNPP for a Regression Tree

Description

Finds the lowest complexity parameter for a null regression tree fit.

Usage

get_mnpp.rtree(z, data, Trt, Y)

Arguments

`z`	a numeric vector of estimated CATEs from Step 1
`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.

Value

the MNPP

Permute a dataset under the null hypothesis and get the MNPP

Description

Permutes the data set under the null hypothesis of a constant treatment effect and returns the MNPP for the permuted data.

Usage

get_theta_null(data, Trt, Y, zbar, step1, step2, threshold, ...)

Arguments

`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.
`zbar`	the estimated marginal treatment effect
`step1`	a character string specifying the Step 1 model. Supports "`lasso`", "`mars`", "`randomforest`", or "`superlearner`".
`step2`	a character string specifying the Step 2 model. Supports "`lasso`", "`rtree`", "`classtree`", or "`ctree`".
`threshold`	used only when `step2 = "classtree"`. The cutoff against which each estimated individual treatment effect from Step 1 is compared: TRUE if the estimate is higher, FALSE if it is lower.
`...`	additional arguments to the Step 1 model call.

Value

the MNPP for the permuted data set

Get the appropriate Step 1 estimation function associated with a method

Description

Get the appropriate Step 1 estimation function associated with a method

Usage

get_vt1(step1)

Arguments

`step1`	a character string specifying the Step 1 model. Supports "`lasso`", "`mars`", "`randomforest`", or "`superlearner`".

Value

a function that estimates the CATE through Step 1 of Virtual Twins

Get the appropriate Step 2 estimation function associated with a method

Description

Get the appropriate Step 2 estimation function associated with a method

Usage

get_vt2(step2)

Arguments

`step2`	a character string specifying the Step 2 model. Supports "`lasso`", "`rtree`", "`classtree`", or "`ctree`".

Value

a function that fits a model for the CATE through Step 2 of Virtual Twins

Generate a dataset with permuted treatment indicators

Description

Sets the marginal treatment effect to zero and then permutes all treatment indicators.

Usage

permute(data, Trt, Y, zbar)

Arguments

`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.
`zbar`	the estimated marginal treatment effect

Value

a permuted dataset of the same size as data
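
The permutation step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: `permute_sketch` and the toy data are hypothetical, and "setting the marginal treatment effect to zero" is assumed here to mean subtracting `zbar` from the responses of treated subjects (coded 1) before shuffling the treatment labels.

```r
# Hypothetical re-implementation for illustration only
permute_sketch <- function(data, Trt, Y, zbar) {
  out <- data
  treated <- out[[Trt]] == 1
  # Assumption: remove the marginal effect from treated subjects' responses
  out[[Y]][treated] <- out[[Y]][treated] - zbar
  # Shuffle the treatment labels, breaking any link with the covariates
  out[[Trt]] <- sample(out[[Trt]])
  out
}

set.seed(42)
toy <- data.frame(
  Trt = rep(c(0, 1), each = 50),
  Y = rnorm(100) + rep(c(0, 2), each = 50),
  V1 = rnorm(100)
)
# Difference in arm means as the estimated marginal treatment effect
zbar <- mean(toy$Y[toy$Trt == 1]) - mean(toy$Y[toy$Trt == 0])
toy_perm <- permute_sketch(toy, Trt = "Trt", Y = "Y", zbar = zbar)
```

The permuted data keep the same covariates and the same number of treated subjects, but treatment assignment no longer carries information about the response.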

Print an object of class tunevt

Description

Prints a Virtual Twins model for the conditional average treatment effect with a tuned Step 2 model.

Usage

## S3 method for class 'tunevt'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

Arguments

`x`	an object of class `tunevt`
`digits`	the number of significant digits to use when printing.
`...`	further arguments passed to or from other methods.

Value

An object of class "tunevt".

An object of class "tunevt" is a list containing at least the following components:

`call`	the matched call
`vtmod`	the model estimated by the given `step2` procedure fit with the permuted tuning parameter for the estimated CATEs from the `step1` model. See `vt2_lasso`, `vt2_rtree`, or `vt2_ctree` for specifics.
`mnpp`	the MNPP for the estimated CATEs from Step 1.
`theta_null`	a vector of the MNPPs from each permutation under the null hypothesis.
`pvalue`	the probability of observing an MNPP as extreme as or more extreme than the observed MNPP under the null hypothesis of no effect heterogeneity.
`z`	if `keepz = TRUE`, the estimated CATEs from the `step1` model.

Simulated example data

Description

Simulated data from a clinical trial with heterogeneous treatment effects where the CATE was a function of V1 and V9.

Usage

tehtuner_example

Format

A data frame with 1000 rows and 12 columns:

Trt: Binary treatment indicator
Y: Continuous response
V1,V2,V3,V4,V5,V6,V7,V8: Continuous covariates
V9,V10: Binary covariates

Test if a Value Gives a Null Conditional Inference Tree

Description

Fits a conditional inference tree with minimal test statistic theta and tests if the tree has more than one terminal node.

Usage

test_null_theta_ctree(theta, z, data, Trt, Y)

Arguments

`theta`	a positive double
`z`	a numeric vector of estimated CATEs from Step 1
`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.

Value

a logical value. `TRUE` if `theta` is large enough to give a null conditional inference tree, `FALSE` otherwise.
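
Because a larger `theta` makes splits harder to accept, a check of this kind can drive a search for the smallest `theta` that gives a null tree. The sketch below uses bisection and assumes the check is monotone in `theta`. Here `is_null_at` is a hypothetical stand-in for a call to `test_null_theta_ctree` with the data arguments fixed, and bisection is an illustrative choice, not necessarily the search the package performs.

```r
# Hypothetical stand-in: TRUE once theta is large enough for a null tree
is_null_at <- function(theta) theta >= 0.37

# Bisection for the smallest value at which a monotone predicate is TRUE
smallest_true <- function(pred, lower, upper, tol = 1e-6) {
  stopifnot(!pred(lower), pred(upper))
  while (upper - lower > tol) {
    mid <- (lower + upper) / 2
    if (pred(mid)) upper <- mid else lower <- mid
  }
  upper
}

mnpp_sketch <- smallest_true(is_null_at, lower = 0, upper = 1)
```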

Estimate the penalty parameter for Step 2 of Virtual Twins

Description

Permutes data under the null hypothesis of a constant treatment effect and calculates the MNPP on each permuted data set. The 1 - alpha0 quantile of the resulting MNPP distribution is returned.

Usage

tune_theta(
  data,
  Trt,
  Y,
  zbar,
  step1,
  step2,
  threshold,
  alpha0,
  p_reps,
  parallel,
  ...
)

Arguments

`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.
`zbar`	the estimated marginal treatment effect
`step1`	a character string specifying the Step 1 model. Supports "`lasso`", "`mars`", "`randomforest`", or "`superlearner`".
`step2`	a character string specifying the Step 2 model. Supports "`lasso`", "`rtree`", "`classtree`", or "`ctree`".
`threshold`	used only when `step2 = "classtree"`. The cutoff against which each estimated individual treatment effect from Step 1 is compared: TRUE if the estimate is higher, FALSE if it is lower.
`alpha0`	the nominal Type I error rate.
`p_reps`	the number of permutations to run.
`parallel`	Should the loop over replications be parallelized? If `FALSE`, then no, if `TRUE`, then yes. Note that running in parallel requires a parallel backend that must be registered before performing the computation. See the `foreach` documentation for more details.
`...`	additional arguments to the Step 1 model call.

Value

the estimated penalty parameter
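
The quantile step can be illustrated with base R. In this sketch the vector `theta_null` is simulated and stands in for the MNPPs computed on the permuted data sets; the use of `quantile()` with its default interpolation is an assumption about the exact quantile definition.

```r
# Hypothetical null MNPPs, one per permuted data set
set.seed(7)
theta_null <- rexp(100, rate = 10)
alpha0 <- 0.2

# Tuned penalty: the 1 - alpha0 quantile of the null MNPPs
theta_hat <- unname(quantile(theta_null, probs = 1 - alpha0))

# Share of null data sets whose MNPP exceeds the tuned penalty
exceed_rate <- mean(theta_null > theta_hat)
```

A permuted data set whose MNPP exceeds `theta_hat` would yield a non-null Step 2 model at that penalty, so `exceed_rate` approximates the Type I error rate and should be close to `alpha0`.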

Fit a tuned Virtual Twins model

Description

tunevt fits a Virtual Twins model to estimate factors and subgroups associated with differential treatment effects while controlling the Type I error rate of falsely detecting at least one heterogeneous effect when the treatment effect is uniform across the study population.

Usage

tunevt(
  data,
  Y = "Y",
  Trt = "Trt",
  step1 = "randomforest",
  step2 = "rtree",
  alpha0,
  p_reps,
  threshold = NA,
  keepz = FALSE,
  parallel = FALSE,
  ...
)

Arguments

`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Y`	a string specifying the name of the column of `data` that contains the response.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`step1`	a character string specifying the Step 1 model. Supports "`lasso`", "`mars`", "`randomforest`", or "`superlearner`".
`step2`	a character string specifying the Step 2 model. Supports "`lasso`", "`rtree`", "`classtree`", or "`ctree`".
`alpha0`	the nominal Type I error rate.
`p_reps`	the number of permutations to run.
`threshold`	used only when `step2 = "classtree"`. The cutoff against which each estimated individual treatment effect from Step 1 is compared: TRUE if the estimate is higher, FALSE if it is lower.
`keepz`	logical. Should the estimated CATE from Step 1 be returned?
`parallel`	Should the loop over replications be parallelized? If `FALSE`, then no, if `TRUE`, then yes. Note that running in parallel requires a parallel backend that must be registered before performing the computation. See the `foreach` documentation for more details.
`...`	additional arguments to the Step 1 model call.

Details

Virtual Twins is a two-step approach to detecting differential treatment effects. Subjects' conditional average treatment effects (CATEs) are first estimated in Step 1 using a flexible model. Then, a simple and interpretable model is fit in Step 2 to model either (1) the expected value of these estimated CATEs if step2 is equal to "lasso", "rtree", or "ctree" or (2) the probability that the CATE is greater than a specified threshold if step2 is equal to "classtree".

The Step 2 model depends on a tuning parameter. This parameter is selected to control the Type I error rate by permuting the data under the null hypothesis of a constant treatment effect and, for each permuted data set, identifying the minimal null penalty parameter (MNPP), which is the smallest penalty parameter that yields a Step 2 model with no covariate effects. The 1-alpha0 quantile of the distribution of the MNPP is then used to fit the Step 2 model on the original data.
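
The two steps can be sketched in base R on simulated data. This is a simplified illustration, not the package's implementation: ordinary least squares (`lm`) is swapped in for both the flexible Step 1 model and the penalized Step 2 model, no tuning is performed, and the data and variable names are hypothetical.

```r
set.seed(123)
n <- 400
dat <- data.frame(
  Trt = rbinom(n, 1, 0.5),
  V1 = rnorm(n),
  V2 = rnorm(n)
)
# Hypothetical truth: the treatment effect is 1 + 2 * V1
dat$Y <- dat$V2 + dat$Trt * (1 + 2 * dat$V1) + rnorm(n)

# Step 1: outcome model with treatment-by-covariate interactions
fit1 <- lm(Y ~ Trt * (V1 + V2), data = dat)

# Predict each subject's response under both arms (the "virtual twins")
pred_treated <- predict(fit1, newdata = transform(dat, Trt = 1))
pred_control <- predict(fit1, newdata = transform(dat, Trt = 0))
z <- pred_treated - pred_control  # estimated CATEs

# Step 2: simple, interpretable model for the estimated CATEs
step2_data <- data.frame(z = z, V1 = dat$V1, V2 = dat$V2)
fit2 <- lm(z ~ V1 + V2, data = step2_data)
coef(fit2)
```

In this toy setting the Step 2 coefficient on `V1` should be near 2 and the coefficient on `V2` near 0, matching the simulated effect modifier.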

Value

An object of class "tunevt".

An object of class "tunevt" is a list containing at least the following components:

`call`	the matched call
`vtmod`	the model estimated by the given `step2` procedure fit with the permuted tuning parameter for the estimated CATEs from the `step1` model. See `vt2_lasso`, `vt2_rtree`, or `vt2_ctree` for specifics.
`mnpp`	the MNPP for the estimated CATEs from Step 1.
`theta_null`	a vector of the MNPPs from each permutation under the null hypothesis.
`pvalue`	the probability of observing an MNPP as extreme as or more extreme than the observed MNPP under the null hypothesis of no effect heterogeneity.
`z`	if `keepz = TRUE`, the estimated CATEs from the `step1` model.
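
The relationship between `mnpp`, `theta_null`, and `pvalue` can be illustrated with a few made-up numbers. The sketch assumes that "more extreme" means larger and that the p-value is the plain proportion of null MNPPs at or above the observed one; the package may use a different convention (for example, a finite-sample correction).

```r
theta_null <- c(0.02, 0.05, 0.01, 0.08, 0.03)  # hypothetical null MNPPs
mnpp <- 0.06                                   # hypothetical observed MNPP

# Proportion of null MNPPs at least as large as the observed MNPP
pvalue <- mean(theta_null >= mnpp)
```

Here one of the five null MNPPs is at least as large as the observed value, so `pvalue` is 0.2.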

References

Foster JC, Taylor JM, Ruberg SJ (2011). “Subgroup identification from randomized clinical trial data.” Statistics in Medicine, 30(24), 2867–2880. ISSN 0277-6715, doi:10.1002/sim.4322.

Wolf JM, Koopmeiners JS, Vock DM (2022). “A permutation procedure to detect heterogeneous treatment effects in randomized clinical trials while controlling the type I error rate.” Clinical Trials, 19(5), 512–521. ISSN 1740-7745, doi:10.1177/17407745221095855.

Deng C, Wolf JM, Vock DM, Carroll DM, Hatsukami DK, Leng N, Koopmeiners JS (2023). “Practical guidance on modeling choices for the virtual twins method.” Journal of Biopharmaceutical Statistics. doi:10.1080/10543406.2023.2170404.

Examples

data(tehtuner_example)
# Low p_reps for example use only
tunevt(
  tehtuner_example, step1 = "lasso", step2 = "rtree",
  alpha0 = 0.2, p_reps = 5
)


Check if alpha0 is a valid input to tunevt

Description

Check if alpha0 is a valid input to tunevt

Usage

validate_alpha0(data, alpha0)

Arguments

`data`	a data frame containing a response, binary treatment indicators, and covariates.
`alpha0`	the nominal Type I error rate.

Value

TRUE if alpha0 is a valid input. Errors otherwise.

Check if p_reps is a valid input to tunevt

Description

Check if p_reps is a valid input to tunevt

Usage

validate_p_reps(data, p_reps)

Arguments

`data`	a data frame containing a response, binary treatment indicators, and covariates.
`p_reps`	the number of permutations to run.

Value

TRUE if p_reps is a valid input. Errors otherwise.

Check if Trt is a valid input to tunevt

Description

Check if Trt is a valid input to tunevt

Usage

validate_Trt(data, Trt)

Arguments

`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.

Value

TRUE if Trt is a valid input. Errors otherwise.

Check if Y is a valid input to tunevt

Description

Check if Y is a valid input to tunevt

Usage

validate_Y(data, Y)

Arguments

`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Y`	a string specifying the name of the column of `data` that contains the response.

Value

TRUE if Y is a valid input. Errors otherwise.

Estimate the CATE Using the Lasso for Step 1 of Virtual Twins

Description

Estimate the CATE Using the Lasso for Step 1 of Virtual Twins

Usage

vt1_lasso(data, Trt, Y, ...)

Arguments

`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.
`...`	additional arguments to `cv.glmnet`

Value

Estimated CATEs for each subject in data.

Estimate the CATE Using MARS for Step 1 of Virtual Twins

Description

Estimate the CATE Using MARS for Step 1 of Virtual Twins

Usage

vt1_mars(data, Trt, Y, ...)

Arguments

`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.
`...`	additional arguments to `earth`

Value

Estimated CATEs for each subject in data.

Estimate the CATE Using a Random Forest for Step 1 of Virtual Twins

Description

Estimate the CATE Using a Random Forest for Step 1 of Virtual Twins

Usage

vt1_rf(data, Trt, Y, ...)

Arguments

`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.
`...`	additional arguments to `rfsrc`

Value

Estimated CATEs for each subject in data.

Estimate the CATE Using Super Learner for Step 1 of Virtual Twins

Description

Estimate the CATE Using Super Learner for Step 1 of Virtual Twins

Usage

vt1_super(data, Trt, Y, SL.library, ...)

Arguments

`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.
`SL.library`	Either a character vector of prediction algorithms or a list containing character vectors. See `SuperLearner` for more details.
`...`	additional arguments to `SuperLearner`

Value

Estimated CATEs for each subject in data.

Estimate the CATE using a classification tree for Step 2

Description

Estimate the CATE using a classification tree for Step 2

Usage

vt2_classtree(z, data, Trt, Y, theta, threshold)

Arguments

`z`	a numeric vector of estimated CATEs from Step 1
`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.
`theta`	tree complexity parameter (`cp`)
`threshold`	used only when `step2 = "classtree"`. The cutoff against which each estimated individual treatment effect from Step 1 is compared: TRUE if the estimate is higher, FALSE if it is lower.

Value

an object of class rpart. See rpart.object.

Estimate the CATE using a conditional inference tree for Step 2

Description

Estimate the CATE using a conditional inference tree for Step 2

Usage

vt2_ctree(z, data, Trt, Y, theta)

Arguments

`z`	a numeric vector of estimated CATEs from Step 1
`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.
`theta`	the value of the test statistic that must be exceeded in order to implement a split (`mincriterion`)

Value

An object of class `BinaryTree`. See `BinaryTree-class`.

Estimate the CATE using the Lasso for Step 2

Description

Estimate the CATE using the Lasso for Step 2

Usage

vt2_lasso(z, data, Trt, Y, theta)

Arguments

`z`	a numeric vector of estimated CATEs from Step 1
`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.
`theta`	lasso penalty parameter (`lambda`)

Value

a list of length 3 containing the following elements:

`mod`	an object of class `glmnet`. See `glmnet`.
`coefficients`	coefficients associated with the penalty parameter `theta`.
`fitted.values`	predicted values associated with the penalty parameter `theta`.

Estimate the CATE using a regression tree for Step 2

Description

Estimate the CATE using a regression tree for Step 2

Usage

vt2_rtree(z, data, Trt, Y, theta)

Arguments

`z`	a numeric vector of estimated CATEs from Step 1
`data`	a data frame containing a response, binary treatment indicators, and covariates.
`Trt`	a string specifying the name of the column of `data` that contains the treatment indicators.
`Y`	a string specifying the name of the column of `data` that contains the response.
`theta`	tree complexity parameter (`cp`)

Value

an object of class rpart. See rpart.object.
