Title: | A Pipeline for Meta-Genome Wide Association |
---|---|
Description: | Correlates variation within the meta-genome with phenotypic variation in a target species through association studies. Follows the pipeline described in Chaston, J.M. et al. (2014) <doi:10.1128/mBio.01631-14>. |
Authors: | Corinne Sexton [aut], John Chaston [aut, cre], Hayden Smith [ctb] |
Maintainer: | John Chaston <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.0.4 |
Built: | 2024-11-05 03:57:52 UTC |
Source: | https://github.com/johnchaston/magnamwar |
A list created by passing the output of OrthoMCL clusters to the FormatAfterOrtho function.
after_ortho_format
List of 2: (1) presence absence matrix, (2) protein ids:
matrix showing taxa presence/absence in OG
matrix listing protein_id contained in each OG
A list created by passing the groups file from a local OrthoMCL run to the FormatAfterOrtho function with format = 'groups'.
after_ortho_format_grps
List of 2: (1) presence absence matrix, (2) protein ids:
matrix showing taxa presence/absence in OG
matrix listing protein_id contained in each OG
Main function for analyzing the statistical association of OG (orthologous group) presence with phenotype data
AnalyzeOrthoMCL(mcl_data, pheno_data, model, species_name, resp = NULL, fix2 = NULL, rndm1 = NULL, rndm2 = NULL, multi = 1, time = NULL, event = NULL, time2 = NULL, startnum = 1, stopnum = "end", output_dir = NULL, sig_digits = NULL, princ_coord = 0)
mcl_data |
output of FormatAfterOrtho; a list of matrices; (1) a presence/absence matrix of taxa per OG, (2) a list of the specific protein ids within each OG |
pheno_data |
a data frame of phenotypic data with specific column names used to specify response variable as well as other fixed and random effects |
model |
the model to fit, one of: 'lm' (linear model with gene presence as a fixed effect); linear mixed-effects models with gene presence as a fixed effect and additional variables, namely 'lmeR1' (one random effect), 'lmeR2ind' (two independent random effects), 'lmeR2nest' (two random effects with rndm2 nested in rndm1), or 'lmeF2' (two independent random effects and one additional fixed effect); 'wx' (Wilcoxon test with gene presence as the grouping variable); or survival tests that support multiple cores, namely 'survmulti' (two random effects) and 'survmulticensor' (two times plus an additional fixed variable) |
species_name |
Column name in pheno_data containing 4-letter species designations |
resp |
Column name in pheno_data containing response variable |
fix2 |
Column name in pheno_data containing second fixed effect |
rndm1 |
Column name in pheno_data containing first random variable |
rndm2 |
Column name in pheno_data containing second random variable |
multi |
(can only be used with survival tests) Number of cores |
time |
(can only be used with survival tests) Column name in pheno_data containing first time |
event |
(can only be used with survival tests) Column name in pheno_data containing event |
time2 |
(can only be used with survival tests) Column name in pheno_data containing second time |
startnum |
number of the test to start on |
stopnum |
number of the test to stop on (defaults to "end") |
output_dir |
(survival tests only) directory where small output files are placed before they are combined with SurvAppendMatrix. A directory must be specified to write these small files; otherwise results are returned only as a matrix |
sig_digits |
number of digits to display for p-values and means of data; defaults to NULL (no rounding) |
princ_coord |
the number of principal coordinates to include in the model as fixed effects (1, 2, or 3); if a decimal is given, as many principal coordinates as are needed to account for that proportion of the variance are included in the analysis |
A matrix with the following columns: OG, p-values, Bonferroni-corrected p-values, mean phenotype of OG-containing taxa, mean phenotype of OG-lacking taxa, taxa included in OG, taxa not included in OG
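The columns of this matrix can be pictured with a minimal base-R sketch of the simplest case ('lm'). The sketch assumes a toy presence/absence matrix with OGs as rows and 4-letter taxa as columns. The object names (`pa`, `pheno`, `analyze_og`) are hypothetical. It illustrates per-OG testing with a Bonferroni correction and is not MAGNAMWAR's implementation.

```r
# Minimal sketch (not MAGNAMWAR's code): test each OG's presence against the
# phenotype with lm(), then Bonferroni-correct across OGs.
# All object names and data below are hypothetical toy examples.

# Presence/absence matrix: rows = OGs, columns = 4-letter taxa (1 = present)
pa <- matrix(c(1, 1, 0, 0,
               1, 0, 1, 0),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("OG_a", "OG_b"),
                             c("AAAA", "BBBB", "CCCC", "DDDD")))

# Phenotype data: three observations per taxon
pheno <- data.frame(
  Treatment = rep(colnames(pa), each = 3),
  RespVar   = c(10, 11, 12, 9, 10, 11, 4, 5, 6, 3, 4, 5),
  stringsAsFactors = FALSE
)

analyze_og <- function(og) {
  taxa_with_og <- colnames(pa)[pa[og, ] == 1]
  # 1 if the observation's taxon carries the OG, else 0 (modifies a local copy)
  pheno$presence <- as.integer(pheno$Treatment %in% taxa_with_og)
  fit <- lm(RespVar ~ presence, data = pheno)
  data.frame(
    OG           = og,
    p_value      = summary(fit)$coefficients["presence", "Pr(>|t|)"],
    mean_present = mean(pheno$RespVar[pheno$presence == 1]),
    mean_absent  = mean(pheno$RespVar[pheno$presence == 0]),
    stringsAsFactors = FALSE
  )
}

res <- do.call(rbind, lapply(rownames(pa), analyze_og))
# Bonferroni correction across all OGs tested
res$bonferroni <- p.adjust(res$p_value, method = "bonferroni")
res
```

Coding presence as 0/1 keeps the coefficient name predictable ("presence"), so its p-value can be read directly from the coefficient table.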
#Linear Model
## Not run:
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'lm', 'Treatment', resp='RespVar')
## End(Not run)
# the rest of the examples are not run for time's sake
#Linear Mixed Effect with one random effect
## Not run:
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'lmeR1', 'Treatment', resp='RespVar', rndm1='Experiment')
## End(Not run)
#Linear Mixed Effect with two independent random effects
## Not run:
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'lmeR2ind', 'Treatment', resp='RespVar', rndm1='Experiment', rndm2='Vial')
## End(Not run)
#Linear Mixed Effect with rndm2 nested in rndm1
## Not run:
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'lmeR2nest', 'Treatment', resp='RespVar', rndm1='Experiment', rndm2='Vial')
## End(Not run)
#Linear Mixed Effect with two independent random effects and one additional fixed effect
## Not run:
mcl_mtrx3 <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'lmeF2', 'Treatment', resp='RespVar', fix2='Treatment', rndm1='Experiment', rndm2='Vial', princ_coord = 4)
## End(Not run)
#Wilcoxon Test
## Not run:
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'wx', 'Treatment', resp='RespVar')
## End(Not run)
# ~ 5 minutes
#Survival with two independent random effects, run on multiple cores
## Not run:
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, starv_pheno_data, 'TRT', model='survmulti', time='t2', event='event', rndm1='EXP', rndm2='VIAL', multi=1)
## End(Not run)
# ~ 5 minutes
#Survival with two independent random effects and one additional fixed effect,
#including drops on multi cores
## Not run:
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, starv_pheno_data, 'TRT', model='survmulticensor', time='t1', time2='t2', event='event', rndm1='EXP', rndm2='VIAL', fix2='BACLO', multi=1)
## End(Not run)
#to be appended with SurvAppendMatrix
Function to show principal component statistics based on the OrthoMCL presence/absence groupings.
CalculatePrincipalCoordinates(mcl_data)
mcl_data |
output of FormatAfterOrtho; a list of 2: (1) a binary matrix indicating the presence/absence of genes in each OG and (2) a list of the specific protein ids within each OG |
returns a named list of principal components and accompanying proportion of variance for each
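One way to obtain principal coordinates and their variance shares is classical multidimensional scaling of the taxa. The sketch below computes Euclidean distances between taxa from a toy presence/absence matrix (OGs as rows, taxa as columns) and uses base R's `cmdscale()`. The distance measure MAGNAMWAR actually uses is not stated here, so treat that choice as an assumption. The sketch also shows how a decimal `princ_coord` value could be turned into a number of coordinates.

```r
# Sketch (assumption): principal coordinates of taxa via classical MDS on
# Euclidean distances between their OG presence/absence profiles.
pa <- matrix(c(1, 1, 0, 0, 1,
               1, 0, 1, 0, 1,
               0, 1, 1, 1, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("OG1", "OG2", "OG3"),
                             c("AAAA", "BBBB", "CCCC", "DDDD", "EEEE")))

d <- dist(t(pa))                        # distances between taxa (columns of pa)
pcoa <- cmdscale(d, k = 2, eig = TRUE)  # first two principal coordinates

# Proportion of variance per coordinate (tiny negative eigenvalues set to 0)
ev <- pmax(pcoa$eig, 0)
prop <- ev / sum(ev)

# Decimal princ_coord: smallest number of coordinates reaching that fraction
princ_coord <- 0.8
n_needed <- which(cumsum(prop) >= princ_coord)[1]

round(pcoa$points, 3)
round(prop, 3)
n_needed
```

With Euclidean distances, classical scaling gives the same configuration as a PCA of the centred matrix. That equivalence may explain why the return value is described in terms of principal components.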
CalculatePrincipalCoordinates(after_ortho_format)
Formats the OrthoMCL output file, whether from a local run or from www.orthomcl.org, for use in AnalyzeOrthoMCL
FormatAfterOrtho(file, format = "ortho")
file |
Path to the OrthoMCL output file |
format |
the method by which the file was obtained: 'ortho' (the default) for output from orthomcl.org, or 'groups' for output from a local run of the OrthoMCL software |
a list of matrices; (1) a presence/absence matrix of taxa per OG, (2) a list of the specific protein ids within each OG
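For the 'groups' format, the conversion can be pictured with the sketch below. It assumes the usual OrthoMCL groups-file layout, one group per line written as `group_name: taxon|protein_id taxon|protein_id ...`. The helper name `parse_groups` and the toy lines are hypothetical, and MAGNAMWAR's own parser may differ in detail.

```r
# Sketch (hypothetical helper, not MAGNAMWAR's parser): turn OrthoMCL
# groups-file lines into (1) a presence/absence matrix and (2) protein ids per OG.
parse_groups <- function(lines) {
  og_names <- sub(":.*$", "", lines)                 # text before the first ':'
  members <- strsplit(trimws(sub("^[^:]*:", "", lines)), "[[:space:]]+")
  names(members) <- og_names
  # taxon code = part of each "taxon|protein_id" entry before the '|'
  taxa_of <- lapply(members, function(m) unique(sub("\\|.*$", "", m)))
  all_taxa <- sort(unique(unlist(taxa_of)))
  # rows = OGs, columns = taxa, 1 = taxon has at least one protein in the OG
  pa <- t(vapply(taxa_of, function(tx) as.integer(all_taxa %in% tx),
                 integer(length(all_taxa))))
  colnames(pa) <- all_taxa
  # protein id = part after the first '|'
  ids <- lapply(members, function(m) sub("^[^|]*\\|", "", m))
  list(pa, ids)
}

lines <- c("grp1: AAAA|p001 BBBB|p002 BBBB|p003",
           "grp2: AAAA|p004 CCCC|p005")
res <- parse_groups(lines)
res[[1]]
res[[2]]
```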
file <- system.file('extdata', 'orthologGroups.txt', package='MAGNAMWAR')
after_ortho_format <- FormatAfterOrtho(file)
file_grps <- system.file('extdata', 'groups_example_r.txt', package='MAGNAMWAR')
after_ortho_format_grps <- FormatAfterOrtho(file_grps, format = 'groups')
Creates the composite fasta file for use in running OrthoMCL and/or submitting to www.orthomcl.org
FormatMCLFastas(fa_dir, genbnk_id = 4)
fa_dir |
Path to the directory where all raw GenBank files are stored. Note: each file name must be a 4-letter code representing the species, with a '.fasta' file extension |
genbnk_id |
(Only necessary for the deprecated version of fasta headers) The index of the sequence ID in the GenBank pipe-separated annotation line (default: 4) |
Returns nothing, but prints the path to the final OrthoMCL compatible fasta file
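OrthoMCL expects FASTA headers of the form `>taxon|sequence_id`. The sketch below shows one way such headers could be built from the 4-letter file-name code. It makes two assumptions: 'new' headers start with the sequence ID, and in deprecated GenBank headers the ID is the `genbnk_id`-th pipe-separated field (4 by default, as in the usage above). The helper names `format_header` and `relabel_fasta` are hypothetical, and the package's own rules may differ.

```r
# Sketch (hypothetical helpers): rewrite FASTA headers to ">taxon|id"
format_header <- function(header, taxon, genbnk_id = NULL) {
  h <- sub("^>", "", header)
  id <- if (is.null(genbnk_id)) {
    strsplit(h, "[[:space:]]+")[[1]][1]              # first word of a new-style header
  } else {
    strsplit(h, "|", fixed = TRUE)[[1]][genbnk_id]   # n-th field of a GI-style header
  }
  paste0(">", taxon, "|", id)
}

relabel_fasta <- function(fasta_lines, taxon, genbnk_id = NULL) {
  is_header <- startsWith(fasta_lines, ">")          # sequence lines stay unchanged
  fasta_lines[is_header] <- vapply(fasta_lines[is_header], format_header,
                                   character(1), taxon = taxon,
                                   genbnk_id = genbnk_id, USE.NAMES = FALSE)
  fasta_lines
}

# Taxon code taken from a file named like "AAAA.fasta"
taxon <- sub("\\.fasta$", "", basename("fasta_dir/AAAA.fasta"))
fa <- c(">WP_000001.1 hypothetical protein", "MKT", ">WP_000002.1 kinase", "MAA")
out <- relabel_fasta(fa, taxon)
old <- relabel_fasta(">gi|12345|ref|WP_000003.1| transporter", taxon, genbnk_id = 4)
out
old
```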
## Not run:
dir <- system.file('extdata', 'fasta_dir', package='MAGNAMWAR')
dir <- paste(dir,'/',sep='')
formatted_file <- FormatMCLFastas(dir)
## End(Not run)
A data frame containing the final results of statistical analysis with protein ids, annotations, and sequences added.
joined_mtrx
A data frame with 17 rows and 11 variables:
taxa cluster id, as defined by OrthoMCL
p-value, based on presence absence
Bonferroni p-value, corrected by number of tests
mean of all taxa phenotypes in that OG
mean of all taxa phenotypes not in that OG
taxa in that cluster
taxa not in that cluster
randomly selected representative taxon from the cluster
protein id, from the randomly selected representative taxon
fasta annotation, from the randomly selected representative taxon
AA sequence, from the randomly selected representative taxon
A data frame containing the final results of statistical analysis with protein ids, annotations, and sequences added.
joined_mtrx_grps
A data frame with 10 rows and 11 variables:
taxa cluster id, as defined by OrthoMCL
p-value, based on presence absence
Bonferroni p-value, corrected by number of tests
mean of all taxa phenotypes in that OG
mean of all taxa phenotypes not in that OG
taxa in that cluster
taxa not in that cluster
randomly selected representative taxon from the cluster
protein id, from the randomly selected representative taxon
fasta annotation, from the randomly selected representative taxon
AA sequence, from the randomly selected representative taxon
Joins the OrthoMCL output matrix to representative sequences
JoinRepSeq(mcl_data, fa_dir, mcl_mtrx, fastaformat = "new")
mcl_data |
output of FormatAfterOrtho; a list of matrices; (1) a presence/absence matrix of taxa per OG, (2) a list of the specific protein ids within each OG |
fa_dir |
Path to the directory where all raw GenBank files are stored. Note: each file name must be a 4-letter code representing the species, with a '.fasta' file extension |
mcl_mtrx |
OrthoMCL output matrix from AnalyzeOrthoMCL() |
fastaformat |
fasta header format: 'new' (no GI numbers included; the default) or 'old' |
Returns the original OrthoMCL output matrix with additional columns: representative sequence taxon, representative sequence id, representative sequence annotation, representative sequence
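The representative-sequence columns can be pictured as one randomly drawn member per OG. The sketch uses hypothetical protein entries written as `taxon|protein_id` and a hypothetical helper `pick_rep`. It fixes the random seed so the draw is reproducible. Unlike JoinRepSeq, it does not look up annotations or sequences.

```r
# Sketch (hypothetical data and helper): draw one representative member per OG
ids <- list(OG_a = c("AAAA|p001", "BBBB|p002"),
            OG_b = c("CCCC|p003"))

pick_rep <- function(members) {
  # draw one index uniformly at random, then split "taxon|protein_id"
  chosen <- members[sample.int(length(members), 1)]
  parts <- strsplit(chosen, "|", fixed = TRUE)[[1]]
  c(rep_taxon = parts[1], rep_id = parts[2])
}

set.seed(1)   # reproducible draw
reps <- t(vapply(ids, pick_rep, c(rep_taxon = "", rep_id = "")))
reps
```

The resulting rows could then be merged onto the results matrix by OG name, for example with `merge()`.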
## Not run:
dir <- system.file('extdata', 'fasta_dir', package='MAGNAMWAR')
dir <- paste(dir,'/',sep='')
joined_mtrx_grps <- JoinRepSeq(after_ortho_format_grps, dir, mcl_mtrx_grps, fastaformat = 'old')
## End(Not run)
Manhattan plot that graphs all p-values for taxa.
ManhatGrp(mcl_data, mcl_mtrx, tree = NULL)
mcl_data |
FormatAfterOrtho output |
mcl_mtrx |
output of AnalyzeOrthoMCL() |
tree |
optional tree file, used to order taxa along the x axis |
a Manhattan plot
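The package's ManhatGrp example notes that the line of significance defaults to -log10((.05)/dim(pdgs)[1]). That is a Bonferroni-style cutoff, 0.05 divided by the number of rows (here assumed to be one test per row). A minimal base-R sketch of that computation on hypothetical p-values follows. It is not ManhatGrp itself and ignores the tree-based ordering of taxa.

```r
# Sketch (hypothetical p-values): Bonferroni-style significance line for a
# Manhattan-type plot, -log10(0.05 / number of tests)
p <- c(OG1 = 0.2, OG2 = 1e-5, OG3 = 0.03, OG4 = 0.5)
threshold <- -log10(0.05 / length(p))
significant <- names(p)[p < 0.05 / length(p)]   # points above the line

if (interactive()) {   # draw only in an interactive session
  plot(seq_along(p), -log10(p), pch = 19, xaxt = "n",
       xlab = "OG", ylab = "-log10(p)")
  axis(1, at = seq_along(p), labels = names(p))
  abline(h = threshold, lty = 2)
}
significant
```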
ManhatGrp(after_ortho_format, mcl_mtrx)
# the line of significance defaults to -log10((.05)/dim(pdgs)[1])
A matrix containing the final results of statistical analysis.
mcl_mtrx
A matrix with 17 rows and 7 variables:
taxa cluster id, as defined by OrthoMCL
p-value, based on presence absence
Bonferroni p-value, corrected by number of tests
mean of all taxa phenotypes in that OG
mean of all taxa phenotypes not in that OG
taxa in that cluster
taxa not in that cluster
A matrix containing the final results of statistical analysis.
mcl_mtrx_grps
A matrix with 10 rows and 7 variables:
taxa cluster id, as defined by OrthoMCL
p-value, based on presence absence
Bonferroni p-value, corrected by number of tests
mean of all taxa phenotypes in that OG
mean of all taxa phenotypes not in that OG
taxa in that cluster
taxa not in that cluster
Bar plot of PDG vs phenotype data with presence of taxa in PDG indicated by color
PDGPlot(data, mcl_matrix, OG = "NONE", species_colname, data_colname, xlab = "Taxa", ylab = "Data", ylimit = NULL, tree = NULL, order = NULL, main_title = NULL)
data |
R object of phenotype data |
mcl_matrix |
AnalyzeOrthoMCL output |
OG |
optional parameter, a string with the name of chosen group (OG) to be colored |
species_colname |
name of column in phenotypic data file with taxa designations |
data_colname |
name of column in phenotypic data file with data observations |
xlab |
string to label barplot's x axis |
ylab |
string to label barplot's y axis |
ylimit |
optional parameter to limit y axis |
tree |
optional path to a tree file (defaults to NULL); if given, taxa are ordered by phylogenetic distribution, otherwise alphabetically |
order |
vector giving the order of taxa names across the x axis (defaults to alphabetical ordering) |
main_title |
string for title of the plot (defaults to OG) |
a barplot with taxa vs phenotypic data complete with standard error bars
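The bar heights and error bars are per-taxon means and standard errors. The sketch below computes that summary, assuming the standard error is sd/sqrt(n) over a taxon's observations (the package's exact formula is not stated here). The column names are borrowed from `pheno_data`, but the data and the OG membership (`taxa_with_og`) are hypothetical.

```r
# Sketch (assumption): per-taxon mean and standard error (sd / sqrt(n))
pheno <- data.frame(Treatment = rep(c("AAAA", "BBBB"), each = 4),
                    RespVar   = c(2, 4, 6, 8, 1, 1, 1, 1),
                    stringsAsFactors = FALSE)

groups <- split(pheno$RespVar, pheno$Treatment)   # alphabetical taxon order
means <- vapply(groups, mean, numeric(1))
ses   <- vapply(groups, function(x) sd(x) / sqrt(length(x)), numeric(1))

# Taxa carrying the chosen OG (hypothetical) get a different colour
taxa_with_og <- "AAAA"
cols <- ifelse(names(means) %in% taxa_with_og, "black", "grey")

if (interactive()) {   # draw only in an interactive session
  mids <- barplot(means, col = cols, ylim = c(0, max(means + ses)))
  arrows(mids, means - ses, mids, means + ses, angle = 90, code = 3, length = 0.05)
}
```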
PDGPlot(pheno_data, mcl_mtrx, 'OG5_126778', 'Treatment', 'RespVar', ylimit=12)
Bar plot showing the number of PDGs versus the number of OGs (clustered orthologous groups) in a PDG
PDGvOG(mcl_data, num = 40, ...)
mcl_data |
FormatAfterOrtho output |
num |
an integer indicating where the x axis should end and be compiled |
... |
args to be passed to barplot |
a barplot with heights determined by the second column, and the first column abbreviated to accommodate visual spacing
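The counts behind this bar plot can be sketched as two nested tables on a toy matrix. The sketch assumes that a PDG groups OGs sharing an identical presence/absence pattern across taxa. The data are hypothetical, and the `num` cut-off is not reproduced.

```r
# Sketch (assumption: a PDG = OGs with identical taxon presence/absence patterns)
pa <- matrix(c(1, 1, 0,
               1, 1, 0,
               0, 1, 1,
               1, 1, 0,
               0, 1, 1,
               1, 0, 0,
               0, 0, 1),
             ncol = 3, byrow = TRUE,
             dimnames = list(paste0("OG", 1:7), c("AAAA", "BBBB", "CCCC")))

pattern <- apply(pa, 1, paste, collapse = "")  # one pattern string per OG
ogs_per_pdg <- table(pattern)                  # number of OGs in each PDG
pdgs_by_size <- table(ogs_per_pdg)             # number of PDGs of each size

if (interactive()) {   # draw only in an interactive session
  barplot(pdgs_by_size, xlab = "OGs in PDG", ylab = "Number of PDGs")
}
pdgs_by_size
```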
PDGvOG(after_ortho_format_grps,2)
A subset of the TAG content of fruit flies, collected in the Chaston Lab, to be used as a brief example for tests in AnalyzeOrthoMCL.
pheno_data
A data frame with 586 rows and 4 variables:
4-letter taxa designation of associated bacteria
response variable, TAG content
random effect variable, vial number of flies
random effect variable, experiment number of flies
Presents data for each taxon, including standard error bars, next to a phylogenetic tree.
PhyDataError(phy, data, mcl_matrix, species_colname, data_colname, color = NULL, OG = NULL, xlabel = "xlabel", ...)
phy |
Path to tree file |
data |
R object of phenotype data |
mcl_matrix |
AnalyzeOrthoMCL output |
species_colname |
name of column in data file with taxa designations |
data_colname |
name of column in data file with data observations |
color |
optional parameter (defaults to NULL); assigns colors to individual taxa from a provided file (format: Taxa | Color) |
OG |
optional parameter (defaults to NULL); a string with the name of the chosen group to be colored |
xlabel |
string to label barplot's x axis |
... |
arguments passed on to other methods, such as parameters of the barplot() function |
A phylogenetic tree with a barplot of the data (with standard error bars) provided matched by taxa.
file <- system.file('extdata', 'muscle_tree2.dnd', package='MAGNAMWAR')
PhyDataError(file, pheno_data, mcl_mtrx, species_colname = 'Treatment', data_colname = 'RespVar', OG='OG5_126778', xlabel='TAG Content')
Print all protein sequences and annotations in a given OG
PrintOGSeqs(after_ortho, OG, fasta_dir, out_dir = NULL, outfile = "none")
after_ortho |
output from FormatAfterOrtho |
OG |
name of OG |
fasta_dir |
directory containing the fasta files |
out_dir |
complete path to output directory |
outfile |
name of file that will be written to |
A fasta file with all protein sequences and ids for a given OG
## Not run:
OG <- 'OG5_126968'
dir <- system.file('extdata', 'fasta_dir', package='MAGNAMWAR')
dir <- paste(dir,'/',sep='')
PrintOGSeqs(after_ortho_format, OG, dir)
## End(Not run)
Makes a Q-Q plot of the p-values obtained through AnalyzeOrthoMCL
QQPlotter(mcl_mtrx)
mcl_mtrx |
matrix generated by AnalyzeOrthoMCL |
a Q-Q plot of the p-values obtained through AnalyzeOrthoMCL
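A Q-Q plot of p-values compares the sorted observed -log10(p) values with those expected under a uniform null distribution. The base-R sketch below uses hypothetical p-values and takes the expected quantiles from `ppoints()`. The plotting positions QQPlotter itself uses are not stated here.

```r
# Sketch (hypothetical p-values): observed vs expected -log10(p) under a uniform null
p <- c(0.9, 0.5, 0.04, 0.2, 0.001)

observed <- sort(-log10(p), decreasing = TRUE)
expected <- sort(-log10(ppoints(length(p))), decreasing = TRUE)

if (interactive()) {   # draw only in an interactive session
  plot(expected, observed, pch = 19,
       xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
  abline(0, 1, lty = 2)   # points on this line match the null expectation
}
cbind(expected, observed)
```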
QQPlotter(mcl_mtrx)
Useful for reformatting RAST files to GBK format
RASTtoGBK(input_fasta, input_reference, out_name_path)
input_fasta |
path to input fasta file |
input_reference |
path to a .csv file, downloaded from RAST in Excel format and then saved as .csv (the tab-delimited version has compatibility problems) |
out_name_path |
name and path of the file to write to |
## Not run:
lfrc_fasta <- system.file('extdata', 'RASTtoGBK//lfrc.fasta', package='MAGNAMWAR')
lfrc_reference <- system.file('extdata', 'RASTtoGBK//lfrc_lookup.csv', package='MAGNAMWAR')
lfrc_path <- system.file('extdata', 'RASTtoGBK//lfrc_out.fasta', package='MAGNAMWAR')
RASTtoGBK(lfrc_fasta,lfrc_reference,lfrc_path)
## End(Not run)
A subset of the starvation rate data of fruit flies, collected in the Chaston Lab, to be used as a brief example for survival tests in AnalyzeOrthoMCL.
starv_pheno_data
A matrix with 543 rows and 7 variables:
random effect variable, experiment number of flies
random effect variable, vial number of flies
fixed effect variable, loss of bacteria in flies
4-letter taxa designation of associated bacteria
time 1
time 2
event
Appends all .csv files output by AnalyzeOrthoMCL into one matrix.
SurvAppendMatrix(work_dir, out_name = "surv_matrix.csv", out_dir = NULL)
work_dir |
the directory where the output files of AnalyzeOrthoMCL are located |
out_name |
file name of the output matrix |
out_dir |
the directory where the output matrix is placed |
A csv file containing a matrix with the following columns: OG, p-values, Bonferroni-corrected p-values, mean phenotype of OG-containing taxa, mean phenotype of OG-lacking taxa, taxa included in OG, taxa not included in OG
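The appending step amounts to reading every .csv file in one directory and stacking the rows. The sketch below uses hypothetical per-chunk files with identical columns, written to a temporary directory so it runs anywhere. It is not SurvAppendMatrix's code and skips any checks the package may perform.

```r
# Sketch (hypothetical files): stack per-chunk .csv outputs into one matrix
work_dir <- file.path(tempdir(), "surv_outputs")
dir.create(work_dir, showWarnings = FALSE)

# Two toy chunk files with identical columns
write.csv(data.frame(OG = "OG1", p_value = 0.01),
          file.path(work_dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(OG = "OG2", p_value = 0.20),
          file.path(work_dir, "part2.csv"), row.names = FALSE)

# Read every .csv in the directory (alphabetical order) and bind the rows
files <- list.files(work_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))

out_file <- file.path(tempdir(), "surv_matrix.csv")   # mirrors the default out_name
write.csv(combined, out_file, row.names = FALSE)
combined
```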
## Not run:
file <- system.file('extdata', 'outputs', package='MAGNAMWAR')
directory <- paste(file, '/', sep = '')
SurvAppendMatrix(directory)
## End(Not run)
Writes a tab separated version of the analyzed OrthoMCL data with or without the joined representative sequences
WriteMCL(mtrx, filename)
mtrx |
Matrix derived from AnalyzeOrthoMCL |
filename |
File name to save final output |
The path to the written file
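Writing a tab-separated copy of a results table needs only base R. The sketch below writes a hypothetical two-row table to a temporary file and reads it back to check the round trip. WriteMCL's exact options, such as quoting, are not stated here.

```r
# Sketch (hypothetical table): write a tab-separated file and read it back
mtrx <- data.frame(OG = c("OG1", "OG2"), p_value = c(0.01, 0.2),
                   stringsAsFactors = FALSE)

path <- file.path(tempdir(), "matrix.tsv")
write.table(mtrx, path, sep = "\t", quote = FALSE, row.names = FALSE)

back <- read.delim(path, stringsAsFactors = FALSE)   # read.delim defaults to sep = "\t"
back
```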
## Not run:
WriteMCL(mcl_mtrx, 'matrix.tsv') #mcl_mtrx previously derived from AnalyzeOrthoMCL() or JoinRepSeq()
## End(Not run)