
Function that uses a random walk or a phylogenetic walk to obtain training datasets
Source: R/get_bags.R
get_bags.Rd
Function that uses a random walk or a phylogenetic walk to obtain training datasets
Usage
get_bags(
  pheno_mat,
  dist_mat_or_tree,
  bagging = "phylogenetic_walk",
  misslabel_mat = FALSE,
  bag_size = NA,
  no_rf = 1,
  max_per_bag = NA
)
Arguments
- pheno_mat
Data frame that contains unique indexes in the first column and the phenotype classes in the second column. The unique indexes should contain only letters, numbers, and the special characters "_" and ".".
- dist_mat_or_tree
Either a distance matrix that contains phylogenetic distances or a phylogenetic tree as a phylo object.
- bagging
The algorithm used to bootstrap the original dataset. Choose either "phylogenetic_walk" or "random_walk". Default: "phylogenetic_walk"
- misslabel_mat
Matrix that indicates which strains were mislabeled. This argument is only used internally in aurora_pheno. Default: FALSE
- bag_size
The size of the bag for each class. Default: NA. If NA, then bag_size is calculated as 5 times the number of strains in the class with the fewest strains. Provide the size as one number per class, e.g., c(50, 50, 50) for a phenotype with three classes.
- no_rf
Number of bags that the function returns. Default: 1
- max_per_bag
Maximum number of times a strain can be repeated in the bag. Default: NA. If NA, then max_per_bag is calculated so that no strain makes up more than 20% of the bag of each class.
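
The argument descriptions above can be illustrated with a small base R sketch that builds a toy pheno_mat, checks its indexes, and works out what the NA defaults of bag_size and max_per_bag would come to. This is not the package's internal code. The toy data, the variable names, the reading of the 20% cap as an integer division by 5, and the choice to repeat the default bag size once per class are all assumptions made for illustration.

```r
# Toy phenotype table (hypothetical data): unique strain indexes in column 1,
# phenotype classes in column 2.
pheno_mat <- data.frame(
  ids = c("strain_1", "strain_2", "strain_3", "strain.4",
          "strain.5", "strain_6", "strain_7"),
  pheno = c("A", "A", "A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Indexes must be unique and contain only letters, numbers, "_" and ".".
ids <- pheno_mat[[1]]
ids_ok <- !anyDuplicated(ids) && all(grepl("^[A-Za-z0-9_.]+$", ids))

# Default bag size as described: 5 times the number of strains
# in the class with the fewest strains (class B has 3 strains here).
class_counts <- table(pheno_mat[[2]])
default_bag_size <- 5 * min(class_counts)   # 15

# Assumption: one bag size per class, all equal to the default.
bag_size <- rep(default_bag_size, length(class_counts))

# Assumption: "no strain exceeds 20% of the bag" is read as
# an integer division of the bag size by 5.
default_max_per_bag <- default_bag_size %/% 5   # 3
```

Under these assumptions, the values could be passed explicitly as bag_size = bag_size and max_per_bag = default_max_per_bag instead of relying on NA, which makes the bag composition easier to reason about when classes are strongly imbalanced.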