Function that uses a random walk or a phylogenetic walk to obtain training datasets

Usage

get_bags(
  pheno_mat,
  dist_mat_or_tree,
  bagging = "phylogenetic_walk",
  misslabel_mat = FALSE,
  bag_size = NA,
  no_rf = 1,
  max_per_bag = NA
)

Arguments

pheno_mat

Data frame that contains unique indexes in the first column and the phenotype classes in the second column. The unique indexes should contain only letters, numbers and the special characters "_" and ".".
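As a rough illustration of the expected layout, the sketch below builds a small toy data frame and checks the two constraints described above. The column names, strain indexes, and class labels are invented for this example and are not part of the package.

```r
# Hypothetical toy phenotype table: unique strain indexes in column 1,
# phenotype classes in column 2 (all names and values are invented).
pheno_mat_toy <- data.frame(
  ids = c("strain_1", "strain_2", "strain.3", "strain_4"),
  pheno = c("class_a", "class_a", "class_b", "class_b"),
  stringsAsFactors = FALSE
)

# Indexes may contain only letters, digits, "_" and ".".
ids_ok <- grepl("^[A-Za-z0-9_.]+$", pheno_mat_toy[[1]])

# Indexes must be unique; anyDuplicated() returns 0 when there are no repeats.
ids_unique <- anyDuplicated(pheno_mat_toy[[1]]) == 0
```

If either check fails on real data, renaming the offending indexes before calling get_bags is probably the simplest fix.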

dist_mat_or_tree

Either a distance matrix that contains phylogenetic distances or a phylogenetic tree as a phylo object.

bagging

The algorithm used to bootstrap the original dataset. Choose either "phylogenetic_walk" or "random_walk". Default: "phylogenetic_walk"

misslabel_mat

Matrix that indicates which strains were mislabeled. This argument is only used internally in aurora_pheno. Default: FALSE

bag_size

The size of the bag for each class. Default: NA. If NA, then bag_size is calculated as 5 times the number of strains in the class with the fewest strains. Provide the size as one number per class, e.g., c(50, 50, 50) for a phenotype with three classes.
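The default rule can be sketched in a few lines of base R. This is only an illustration of the arithmetic, not the package's internal code; the class labels and counts are invented, and it assumes the same computed size is applied to every class.

```r
# Hypothetical phenotype labels: 12 strains in class A, 30 in B, 7 in C.
pheno_classes <- c(rep("A", 12), rep("B", 30), rep("C", 7))

# Number of strains per class.
class_counts <- table(pheno_classes)

# Default rule: 5 times the size of the smallest class.
default_size <- 5 * min(class_counts)

# Assumption: the same size is used for each class.
bag_size_default <- rep(default_size, length(class_counts))
names(bag_size_default) <- names(class_counts)
```

Here the smallest class has 7 strains, so each class would get a bag of 35 strains. To override the default, pass one number per class, as in c(50, 50, 50).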

no_rf

Number of bags that the function returns. Default: 1

max_per_bag

Maximum number of times a strain can be repeated in the bag. Default: NA. If NA, then max_per_bag is calculated so that no strain exceeds 20% of the bag of each class.
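A minimal sketch of the 20% rule follows. The rounding behaviour (rounding down) and the per-class form of the result are assumptions made for illustration; the package may compute the cap differently. The bag contents are invented.

```r
# Bag sizes for a phenotype with three classes.
bag_size <- c(50, 50, 50)

# Assumed cap: at most 20% of each class bag, rounded down.
# Integer division by 5 is equivalent to taking 20% and flooring.
max_per_bag_sketch <- bag_size %/% 5

# Check a hypothetical class bag against the cap: count how often each
# strain index appears and compare the largest count with the limit.
bag_class_1 <- c(rep("strain_1", 10), rep("strain_2", 8), rep("strain_3", 32))
within_cap <- max(table(bag_class_1)) <= max_per_bag_sketch[1]
```

In this toy bag, strain_3 appears 32 times, which is above the cap of 10, so the check flags the bag as violating the limit.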

Value

The output of this function is a list of strain indexes selected by "phylogenetic_walk" or "random_walk". The list contains as many bags as the no_rf argument specifies. Additionally, the output contains a count that shows how many times each strain was selected in the bags.

Examples

if (FALSE) { # \dontrun{
  data(tree_reuteri)
  data(pheno_mat_reuteri)

  # the tree is passed through the dist_mat_or_tree argument
  get_bags(pheno_mat = pheno_mat,
           dist_mat_or_tree = tree,
           no_rf = 100) # generates 100 bags using phylogenetic_walk
} # }