InformMe.jl Documentation
Functions
InformMe.fastaToCpG
— Method
This function analyzes a reference genome to find and store the locations of all CpG sites within each chromosome, and to compute the CpG density at each CpG site as well as the distances between neighboring CpG sites. A 1-based coordinate system is used, in which the first base is assigned to position 1 and the location of a CpG site is defined by the position of the C nucleotide on the forward strand of the reference genome.
This function must be run ONLY ONCE before proceeding with analysis of BAM files.
USAGE (default):
fastaToCpG(FASTAfilename)
USAGE (optional):
Example of optional usage with additional input parameters.
fastaToCpG(FASTAfilename,outdir="/path/to/outputdir/")
MANDATORY INPUT:
FASTAfilename
Full path of the FASTA-formatted reference genome to which
the available BAM files have been aligned.
OPTIONAL INPUTS:
outdir
Path where the output will be stored.
Default value: "./"
wsize
Window size used in CpG density calculations.
Default value: 1000
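The 1-based CpG convention described above can be illustrated with a minimal sketch. The helpers `cpg_positions` and `cpg_density` are hypothetical and are not part of InformMe. The density definition used here (number of CpG sites within a window of `wsize` bases centered on each site, divided by `wsize`) is an assumption made for illustration only; the package may compute density differently.

```julia
# Hypothetical helpers (not part of InformMe) illustrating the 1-based CpG convention.

# Return the 1-based position of the C in every "CG" dinucleotide of `seq`.
function cpg_positions(seq::AbstractString)
    chars = collect(uppercase(seq))
    return [i for i in 1:length(chars)-1 if chars[i] == 'C' && chars[i+1] == 'G']
end

# Assumed density definition: CpG sites within a window of `wsize` bases
# centered on each site, divided by `wsize`.
function cpg_density(positions::Vector{Int}, wsize::Int)
    half = div(wsize, 2)
    return [count(p -> abs(p - c) <= half, positions) / wsize for c in positions]
end

positions = cpg_positions("ACGTCGCG")   # C of each CG on the forward strand
distances = diff(positions)             # distances between neighboring CpG sites
densities = cpg_density(positions, 4)   # toy window size, not the default of 1000
```

In this toy sequence the CpG sites are reported at the position of the C, so the first site is at position 2 rather than 1 or 3.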
InformMe.convertBAMtoBits
— Method
Runs the entire MethToBits pipeline.
This function depends on a working installation of SAMtools that is on the system PATH.
Before running this function, fastaToCpG must be run ONCE.
USAGE (default):
convertBAMtoBits(bamFilenames,phenoName)
USAGE (optional):
Example of optional usage with additional input parameters.
convertBAMtoBits(bamFilenames,phenoName; reference_path="/path/to/ref")
MANDATORY INPUTS:
bamFilenames
A comma-separated string listing the input BAM file
names without the ".bam" extension. These
files must be sorted from the least to the greatest base
pair position along the reference sequence and must be
indexed (i.e., the associated BAI file must be available).
The file names must not contain "." characters, but can
contain "_" instead. Moreover, each file name should be
unique.
phenoName
A string which will be the unique identifier of this
sample/model that is built.
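The constraints on `bamFilenames` listed above can be checked before starting a long run. The following is a minimal sketch, assuming a hypothetical helper `parse_bam_prefixes` that is not part of InformMe. Stripping whitespace around each name is a convenience of this sketch, not documented behavior of the package.

```julia
# Hypothetical validator (not part of InformMe) for the `bamFilenames` string.
function parse_bam_prefixes(bamFilenames::AbstractString)
    # Split on commas; whitespace stripping is an assumption of this sketch.
    prefixes = String.(strip.(split(bamFilenames, ',')))
    for p in prefixes
        isempty(p) && error("empty BAM file name in list")
        occursin(".", p) && error("BAM file name must not contain '.': $p")
    end
    allunique(prefixes) || error("BAM file names must be unique")
    return prefixes
end

# Placeholder file names, given without the ".bam" extension.
prefixes = parse_bam_prefixes("normal_rep1,normal_rep2")
```

The check does not verify that the files are sorted or indexed; that still has to be done with SAMtools.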
OPTIONAL INPUTS:
reference_path
Path to the root subdirectory where the outputs of this
function are stored.
Default value: "./genome/"
bamfile_path
Path to the subdirectory where the BAM file is located.
Default value: "./indexedBAMfiles/"
matrices_path
Path to the subdirectory where the output of this function
is stored.
Default value: "./matrices/"
estimation_path
A string that specifies the path to the directory that
contains the results of parameter estimation performed
by estParamsForChr.jl.
Default value: "./estimation/"
outdir
A string that specifies the path of the directory in which
the methylation analysis results are written.
Default value: "./output/"
pairedEnds
Flag for paired end read support. A value of true indicates
that the sequencer employed paired end reads, whereas a
value of false indicates that the sequencer employed single
end reads.
Default value: true
numBasesToTrim
A vector of integers specifying how many bases should be
trimmed from the beginning of each read. If the vector
contains two integers, then the first integer specifies
how many bases to trim from the first read in a read pair,
whereas the second integer specifies how many bases should
be trimmed from the second read in the pair. If the
vector contains one integer, then all reads will have
that number of bases trimmed from the beginning of the
read. If no bases are to be trimmed, then this input
must be set to 0.
Default value: 0
regionSize
The size of the genomic regions for which methylation
information is produced (in number of base pairs).
Default value: 3000
minCpGsReqToModel
The minimum number of CpG sites within a genomic region
required to perform statistical estimation.
Default value: 10
boundaryConditions
Flag to decide if boundary conditions should be estimated
freely in MLE.
Default value: false
MSIflag
Flag that determines whether this function performs
computation of the methylation sensitivity index (MSI).
false: no MSI computation.
true: allow MSI computation.
Default value: false
ESIflag
Flag that determines whether this function performs
computation of the entropic sensitivity index (ESI).
false: no ESI computation.
true: allow ESI computation.
Default value: false
MCflag
Flag that determines whether this function performs
computation of turnover ratios, CpG entropies, capacities,
and relative dissipated energies of methylation
channels (MCs).
false: no MC computations.
true: allow MC computations.
Default value: false
chr_nums
A vector with the chromosomes to be processed (without
"chr" string).
Default value: 1:22
numProcessors
The number of processors to use in the computations.
Note that Julia must be started as "julia -p 4" if
four worker processes are desired. The nprocs() function
reports the number of available Julia processes, and
all of them are used by default.
Default value: nprocs()
The default values of regionSize and minCpGsReqToModel should only be changed by an expert with a detailed understanding of the code and the methods used.
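The one-value and two-value forms of `numBasesToTrim` described above can be sketched as follows. The helpers `trim_counts` and `trim_read` are hypothetical and only illustrate the documented interpretation; they are not the package's implementation.

```julia
# Hypothetical helper (not part of InformMe): expand `numBasesToTrim` into
# (bases trimmed from the first read, bases trimmed from the second read).
function trim_counts(numBasesToTrim)
    v = numBasesToTrim isa Integer ? [numBasesToTrim] : collect(numBasesToTrim)
    if length(v) == 1
        return (v[1], v[1])        # one value applies to all reads
    elseif length(v) == 2
        return (v[1], v[2])        # separate values for read 1 and read 2
    else
        error("numBasesToTrim must contain one or two integers")
    end
end

# Drop the first `n` bases of a read sequence (ASCII sequence assumed).
trim_read(seq::AbstractString, n::Integer) = seq[n+1:end]

first_trim, second_trim = trim_counts([5, 10])
trimmed = trim_read("ACGTACGTAC", first_trim)
```

With the default value of 0, both reads in a pair are left untrimmed.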
InformMe.matrixFromBam
— Method
This function processes a BAM file of reads aligned to a reference genome and produces methylation information for nonoverlapping genomic regions (each containing the same number of base pairs) in a given chromosome. The final output for each genomic region is a matrix with values -1, 0, and 1. Each row of the matrix is a methylation read, whereas each column represents a CpG site within the genomic region. A value of -1 indicates that no methylation information is available for the CpG site, 0 indicates that the CpG site is unmethylated, and 1 indicates that the CpG site is methylated.
This function depends on a working installation of SAMtools that is on the system PATH.
Before running this function, fastaToCpG must be run ONCE.
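To make the -1/0/1 encoding concrete, here is a small sketch with a made-up matrix for a single genomic region. The summary quantities (`coverage`, `methylated`, `fraction`) are illustrative names chosen for this example and are not outputs of matrixFromBam.

```julia
# Toy matrix for one genomic region: 3 reads (rows) x 4 CpG sites (columns).
# -1 = no information, 0 = unmethylated, 1 = methylated.
reads = [ 1  0 -1 -1;
         -1  0  1  1;
          1  1 -1  0]

# Hypothetical per-site summaries (not part of InformMe).
coverage   = vec(sum(reads .!= -1, dims=1))   # reads with information at each site
methylated = vec(sum(reads .== 1, dims=1))    # methylated observations at each site
fraction   = methylated ./ coverage           # NaN would result where coverage is 0
```

Entries equal to -1 are excluded from both counts, so they do not bias the per-site methylation fraction.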
USAGE (default):
matrixFromBam(bam_prefix,chr_num)
USAGE (optional):
Example of optional usage with additional input parameters.
matrixFromBam(bam_prefix,chr_num; reference_path="/path/to/ref")
MANDATORY INPUTS:
bam_prefix
Prefix of the BAM file (without the .bam extension). This
file must be sorted from the least to the greatest base
pair position along the reference sequence and must be
indexed (i.e., the associated BAI file must be available).
The file name must not contain "." characters, but can
contain "_" instead. Moreover, the file name should be
unique.
chr_num
Number representing the chromosome to be processed.
OPTIONAL INPUTS:
reference_path
Path to the root subdirectory where the outputs of this
function are stored.
Default value: "./genome/"
bamfile_path
Path to the subdirectory where the BAM file is located.
Default value: "./indexedBAMfiles/"
matrices_path
Path to the subdirectory where the output of this function
is stored.
Default value: "./matrices/"
pairedEnds
Flag for paired end read support. A value of true indicates
that the sequencer employed paired end reads, whereas a
value of false indicates that the sequencer employed single
end reads.
Default value: true
numBasesToTrim
A vector of integers specifying how many bases should be
trimmed from the beginning of each read. If the vector
contains two integers, then the first integer specifies
how many bases to trim from the first read in a read pair,
whereas the second integer specifies how many bases should
be trimmed from the second read in the pair. If the
vector contains one integer, then all reads will have
that number of bases trimmed from the beginning of the
read. If no bases are to be trimmed, then this input
must be set to 0.
Default value: 0
regionSize
The size of the genomic regions for which methylation
information is produced (in number of base pairs).
Default value: 3000
minCpGsReqToModel
The minimum number of CpG sites within a genomic region
required to perform statistical estimation.
Default value: 10
The default values of regionSize and minCpGsReqToModel should only be changed by an expert with a detailed understanding of the code and the methods used.
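The interplay of `regionSize` and `minCpGsReqToModel` can be sketched as follows. The helper `regions_to_model` is hypothetical and not part of InformMe. It assumes that the first region starts at base 1 and that regions are indexed consecutively, which may differ from the package's internal bookkeeping.

```julia
# Hypothetical helper (not part of InformMe): group 1-based CpG positions into
# nonoverlapping regions of `regionSize` bases (region 1 assumed to start at
# base 1) and return the regions holding at least `minCpGsReqToModel` sites.
function regions_to_model(positions::Vector{Int};
                          regionSize::Int=3000, minCpGsReqToModel::Int=10)
    counts = Dict{Int,Int}()
    for p in positions
        region = div(p - 1, regionSize) + 1
        counts[region] = get(counts, region, 0) + 1
    end
    return sort([r for (r, n) in counts if n >= minCpGsReqToModel])
end

# Made-up CpG positions: 12 sites in region 1 and 2 sites in region 2.
positions = vcat(collect(100:100:1200), [3500, 4000])
eligible = regions_to_model(positions)
```

With the default values, only the first region in this toy example has enough CpG sites for statistical estimation.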
InformMe.estParamsForChr
— Method
This function takes a list of BAM files (which correspond to the same phenotype) and performs statistical model estimation within a specific chromosome of interest. The function can be used on a computing cluster to split the work of model estimation into many independent parallel jobs. This is performed only after matrixFromBam has been run.
USAGE (default):
estParamsForChr(mat_files,prefix,matrices_path,reference_path,chr_num)
USAGE (optional):
Example of optional usage with additional input parameters.
estParamsForChr(mat_files,prefix,matrices_path,reference_path,chr_num, regionSize=2000)
MANDATORY INPUTS:
mat_files
All the .mat files to be included in the model. This can be a
single .mat file or multiple files in the form of a
comma-separated list of files.
prefix
A string that specifies the name of the modeled phenotype.
The output files produced will contain this prefix.
matrices_path
A string that specifies the path to the directory
where the output will be stored.
reference_path
A string that specifies the path to the directory that
contains the results of analysis of the reference genome
performed by fastaToCpG as well as the results of
methylation calling performed by matrixFromBam.
chr_num
Chromosome number 1 to 22 (in humans) specifying the
chromosome for which statistical estimation must be
performed.
OPTIONAL INPUTS:
regionSize
The size of the genomic region used for parameter
estimation (in number of base pairs).
Default value: 3000
boundaryConditions
Flag to decide if boundary conditions should be estimated
freely in MLE.
Default value: false
The default value of regionSize should only be changed by an expert with a detailed understanding of the code and the methods used.
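Because estimation is done per chromosome, the work can be expressed as one call per chromosome. The sketch below only builds the call strings, following the default usage signature shown above; it does not load InformMe or submit anything. All argument values are placeholders, and how each call is handed to a cluster scheduler is left open.

```julia
# Placeholder argument values; replace with real ones for an actual run.
mat_files      = "sample_rep1,sample_rep2"
prefix         = "sample"
matrices_path  = "./matrices/"
reference_path = "./genome/"

# One estParamsForChr call per chromosome, written out as text so that each
# line can be submitted as an independent job.
calls = ["estParamsForChr(\"$mat_files\",\"$prefix\",\"$matrices_path\",\"$reference_path\",$chr)"
         for chr in 1:22]

foreach(println, calls[1:2])   # preview the first two jobs
```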
InformMe.methAnalysisForChr
— Method
This function performs methylation analysis of a given chromosome in a single phenotype. The function can be used on a computing cluster to split the analysis work into many independent parallel jobs. This is performed only after estParamsForChr in the Modeling subdirectory has been run to build the Ising models for the phenotype.
USAGE (default):
methAnalysisForChr(prefix,chr_num,reference_path,estimation_path)
USAGE (optional):
Example of optional usage with additional input parameters.
methAnalysisForChr(prefix,chr_num,reference_path,estimation_path, outdir="/path/to/output")
MANDATORY INPUTS:
prefix
A string that specifies the name of the phenotype to be
analyzed.
chr_num
Chromosome number (1 to 22 in humans) specifying the
chromosome for which methylation analysis must be
performed.
reference_path
A string that specifies the path to the directory that
contains the results of analysis of the reference genome
performed by fastaToCpG as well as the results of
methylation calling performed by matrixFromBam.jl.
estimation_path
A string that specifies the path to the directory that
contains the results of parameter estimation performed
by estParamsForChr.jl.
OPTIONAL INPUTS:
outdir
A string that specifies the path of the directory in which
the methylation analysis results are written.
Default value: "./results/"
MSIflag
Flag that determines whether this function performs
computation of the methylation sensitivity index (MSI).
false: no MSI computation.
true: allow MSI computation.
Default value: false
ESIflag
Flag that determines whether this function performs
computation of the entropic sensitivity index (ESI).
false: no ESI computation.
true: allow ESI computation.
Default value: false
MCflag
Flag that determines whether this function performs
computation of turnover ratios, CpG entropies, capacities,
and relative dissipated energies of methylation
channels (MCs).
false: no MC computations.
true: allow MC computations.
Default value: false
regionSize
The size of the genomic regions used for parameter
estimation (in number of base pairs).
Default value: 3000
subregionSize
The size of the subregions of a genomic region used
for methylation analysis (in number of base pairs).
The ratio regionSize/subregionSize must be an integer.
Default value: 150
The default values of regionSize and subregionSize should only be changed by an expert with a detailed understanding of the code and the methods used.
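The requirement that regionSize/subregionSize be an integer can be sketched as follows. The helper `subregion_bounds` is hypothetical and not part of InformMe. It assumes 1-based inclusive coordinates, in line with the coordinate convention stated for fastaToCpG.

```julia
# Hypothetical helper (not part of InformMe): 1-based inclusive bounds of the
# subregions inside one region that starts at `regionStart`.
function subregion_bounds(regionStart::Int;
                          regionSize::Int=3000, subregionSize::Int=150)
    regionSize % subregionSize == 0 ||
        error("regionSize/subregionSize must be an integer")
    n = div(regionSize, subregionSize)
    return [(regionStart + (k - 1) * subregionSize,
             regionStart + k * subregionSize - 1) for k in 1:n]
end

bounds = subregion_bounds(1)   # default sizes: 3000 / 150 = 20 subregions
```

With the default values each region is tiled exactly by 20 subregions; a subregion size such as 170 would leave a remainder and is rejected by the check.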
InformMe.makeBedsForMethAnalysis
— Method
This function makes BED files for the methylation analysis results obtained by means of methAnalysisForChr for a single phenotype.
USAGE (default):
makeBedsForMethAnalysis(prefix,analysis_path,reference_path)
USAGE (optional):
Example of optional usage with additional input parameters.
makeBedsForMethAnalysis(prefix,analysis_path,reference_path, outdir="/path/to/output")
MANDATORY INPUTS:
prefix
A string that specifies the name of the phenotype.
analysis_path
A string that specifies the path of the directory in which
the model was constructed.
reference_path
A string that specifies the path to the directory that
contains the results of analysis of the reference genome
performed by fastaToCpG as well as the results of
methylation calling performed by matrixFromBam.jl.
OPTIONAL INPUTS:
chrs
A vector of strings for the chromosomes to output to the
final bed files.
Default value: `[string("chr",i) for i=1:22]`
outdir
A string that specifies the path of the directory in which
the output BED files are written.
Default value: "./"
MSIflag
Flag that determines whether this function performs
computation of the methylation sensitivity index (MSI).
false: no MSI computation.
true: allow MSI computation.
Default value: false
ESIflag
Flag that determines whether this function performs
computation of the entropic sensitivity index (ESI).
false: no ESI computation.
true: allow ESI computation.
Default value: false
MCflag
Flag that determines whether this function performs
computation of turnover ratios, CpG entropies, capacities,
and relative dissipated energies of methylation
channels (MCs).
false: no MC computations.
true: allow MC computations.
Default value: false
thresh
A scalar used as a threshold in methylation-based
classification.
Default value: 0.4
regionSize
The size of the genomic regions used for parameter
estimation (in number of base pairs).
Default value: 3000
subregionSize
The size of the subregions of a genomic region used
for methylation analysis (in number of base pairs).
The ratio regionSize/subregionSize must be an integer.
Default value: 150
The default values of thresh, regionSize, and subregionSize should only be changed by an expert with a detailed understanding of the code and the methods used.
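The default `chrs` value and the coordinate shift needed when writing BED-style intervals can be sketched as follows. BED intervals use a 0-based start and an exclusive end, whereas this documentation uses 1-based positions. The helper `bed_line` and the four-column layout with a numeric value are assumptions for illustration; the actual columns written by makeBedsForMethAnalysis are not specified here.

```julia
# Default chromosome names, as given in the documentation of `chrs`.
chrs = [string("chr", i) for i = 1:22]

# Hypothetical helper (not part of InformMe): format a 1-based, inclusive
# interval as a tab-separated BED-style line (0-based start, exclusive end).
bed_line(chr, first_base, last_base, value) =
    join((chr, first_base - 1, last_base, value), '\t')

# A made-up value for the first 150-base subregion of chr1.
line = bed_line(chrs[1], 1, 150, 0.42)
```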
InformMe.diffMethAnalysisToBed
— Function
This function makes BED files for the differential version of the methylation analysis results obtained by means of methAnalysisForChr applied to two distinct phenotypes.
USAGE (default):
makeBedsForDiffMethAnalysis(prefix_1,prefix_2,analysis_path_1, analysis_path_2,reference_path)
MANDATORY INPUTS:
prefix_X
Strings specifying the names of the two phenotypes used
for differential methylation analysis: prefix_1 names the
first phenotype and prefix_2 names the second. Both
phenotypes must have already been analyzed with
methAnalysisForChr.jl.
analysis_path_X
A string that specifies the path of the directory in which
the methylation analysis results obtained by
methAnalysisForChr.jl are stored.
reference_path
A string that specifies the path to the directory that
contains the results of analysis of the reference genome
performed by fastaToCpG as well as the results of
methylation calling performed by matrixFromBam.jl.
OPTIONAL INPUTS:
chrs
A vector of strings of chromosomes to be output to the final
bed files.
Default value: `[string("chr",i) for i=1:22]`
outdir
A string that specifies the path of the directory in which
the output BED files are written.
Default value: "./makeBedsForDiffMethAnalysis_out/"
MSIflag
Flag that determines whether this function performs
computation of the methylation sensitivity index (MSI).
false: no MSI computation.
true: allow MSI computation.
Default value: false
ESIflag
Flag that determines whether this function performs
computation of the entropic sensitivity index (ESI).
false: no ESI computation.
true: allow ESI computation.
Default value: false
MCflag
Flag that determines whether this function performs
computation of turnover ratios, CpG entropies, capacities,
and relative dissipated energies of methylation
channels (MCs).
false: no MC computations.
true: allow MC computations.
Default value: false
regionSize
The size of the genomic regions used for parameter
estimation (in number of base pairs).
Default value: 3000
subregionSize
The size of the subregions of a genomic region used
for methylation analysis (in number of base pairs).
The ratio regionSize/subregionSize must be an integer.
Default value: 150
minNumCpG
The minimum number of CpG sites within an analysis
subregion required for performing full methylation-based
differential classification.
Default value: 2
thresh
A scalar used as a threshold in methylation-based
differential classification.
Default value: 0.55
threshDMU
A 1x6 vector containing threshold values used for
methylation-based differential classification.
Default value: [-1,-0.55,-0.1,0.1,0.55,1]
threshDEU
A 1x8 vector containing threshold values used for
entropy-based differential classification.
Default value: [-1,-0.5,-0.3,-0.05,0.05,0.3,0.5,1]
The default values of regionSize, subregionSize, minNumCpG, thresh, threshDMU, and threshDEU should only be changed by an expert with a detailed understanding of the code and the methods used.
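One plausible reading of the threshold vectors is that consecutive entries delimit classification intervals, so the six values of `threshDMU` define five intervals. The sketch below illustrates that reading only. The helper `interval_index` is hypothetical, and the choice of which interval owns a boundary value is an assumption; the package's actual classification rule is not documented in this section.

```julia
# Default thresholds, as listed in the documentation of `threshDMU`.
threshDMU = [-1, -0.55, -0.1, 0.1, 0.55, 1]

# Hypothetical helper (not part of InformMe): index of the interval
# [edges[k], edges[k+1]) that contains `x`; the last interval also includes
# its upper edge. The boundary convention is an assumption.
function interval_index(x::Real, edges::AbstractVector{<:Real})
    first(edges) <= x <= last(edges) || error("value outside threshold range")
    return clamp(searchsortedlast(edges, x), 1, length(edges) - 1)
end

class_mid  = interval_index(0.0, threshDMU)    # falls in the central interval
class_high = interval_index(0.6, threshDMU)    # falls in the top interval
```

The same helper applies unchanged to the eight-value `threshDEU` vector, which under this reading defines seven intervals.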