MACS: Model-based Analysis of ChIP-Seq
Introduction
With the improvement of sequencing techniques, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) is becoming a popular way to study genome-wide protein-DNA interactions. To address the lack of powerful ChIP-Seq analysis methods, we presented the Model-based Analysis of ChIP-Seq (MACS) for identifying transcription factor binding sites. MACS captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining the information of both sequencing tag position and orientation. MACS can be easily used for ChIP-Seq data alone, or with a control sample to increase specificity. Moreover, as a general peak caller, MACS can also be applied to any 'DNA enrichment assay' if the question to be asked is simply: where can we find significantly more read coverage than the random background?
MACS can be applied to scenarios other than calling enriched regions from ChIP-seq data. MACS-1.4.2 and older versions can be used to identify differential read-enriched regions from two ChIP-seq samples by viewing one of the samples as a control, so that peak regions correspond to more enriched regions in the other sample.
Recent Changes for MACS (2.2.4)
2.2.4
2.1.4
Install
Please check the file 'INSTALL.md' in the distribution.
Usage
Example for regular peak calling: macs2 callpeak -t ChIP.bam -c Control.bam -f BAM -g hs -n test -B -q 0.01
Example for broad peak calling: macs2 callpeak -t ChIP.bam -c Control.bam --broad -g hs --broad-cutoff 0.1
There are twelve functions available in MACS2 serving as sub-commands.
Subcommand | Description |
---|---|
callpeak | Main MACS2 Function to call peaks from alignment results. |
bdgpeakcall | Call peaks from bedGraph output. |
bdgbroadcall | Call broad peaks from bedGraph output. |
bdgcmp | Comparing two signal tracks in bedGraph format. |
bdgopt | Operate the score column of bedGraph file. |
cmbreps | Combine BEDGraphs of scores from replicates. |
bdgdiff | Differential peak detection based on paired four bedGraph files. |
filterdup | Remove duplicate reads, then save in BED/BEDPE format. |
predictd | Predict d or fragment size from alignment results. |
pileup | Pileup aligned reads (single-end) or fragments (paired-end) |
randsample | Randomly choose a number/percentage of total reads. |
refinepeak | Take raw reads alignment, refine peak summits. |
We only cover the callpeak module in this document. Please use macs2 COMMAND -h to see a detailed description of each option of each module.
Call peaks
This is the main function in MACS2. It can be invoked by the 'macs2 callpeak' command. If you type this command without parameters, you will see a full description of the command-line options. Here we only list the essential options.
Essential Options
-t/--treatment FILENAME
This is the only REQUIRED parameter for MACS. The file can be in any supported format specified by the --format option. Check --format for details. If you have more than one alignment file, you can specify them as -t A B C. MACS will pool all these files together.
-c/--control
The control or mock data file. Please follow the same directions as for -t/--treatment.
-n/--name
The name string of the experiment. MACS will use this string NAME to create output files like NAME_peaks.xls, NAME_negative_peaks.xls, NAME_peaks.bed, NAME_summits.bed, NAME_model.r and so on. So please avoid any conflict between these filenames and your existing files.
--outdir
MACS2 will save all output files into the folder specified by this option.
-f/--format FORMAT
The format of the tag file can be ELAND, BED, ELANDMULTI, ELANDEXPORT, ELANDMULTIPET (for paired-end tags), SAM, BAM, BOWTIE, BAMPE or BEDPE. The default is AUTO, which allows MACS to decide the format automatically. AUTO is also useful when you combine files of different formats. Note that MACS can't detect the BAMPE or BEDPE format with AUTO, so you have to specify the format explicitly for BAMPE and BEDPE.
Nowadays, the most common formats are BED or BAM/SAM.
BED
The BED format is described on the UCSC genome browser website.
The essential columns in BED format input are the 1st column (chromosome name), the 2nd (start position), the 3rd (end position), and the 6th (strand).
Note that, for BED format, the 6th column of strand information is required by MACS. Please also note that the coordinates in BED format are zero-based and half-open (http://genome.ucsc.edu/FAQ/FAQtracks#tracks1).
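As a small illustration of the zero-based, half-open convention, the sketch below converts a BED interval to 1-based inclusive coordinates (the convention used by NAME_peaks.xls). The function names are hypothetical helpers for this example and are not part of MACS.

```python
def bed_to_one_based(start, end):
    """Convert a 0-based half-open BED interval to 1-based inclusive."""
    # The start moves up by one; the end is unchanged because the
    # half-open end already equals the last included 1-based position.
    return start + 1, end


def bed_length(start, end):
    """Length in bp of a 0-based half-open interval."""
    return end - start


# The first 100 bases of a chromosome are written as 0..100 in BED.
print(bed_to_one_based(0, 100))  # (1, 100)
print(bed_length(0, 100))        # 100
```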
BAM/SAM
If the format is BAM/SAM, please check the definition at http://samtools.sourceforge.net/samtools.shtml. If the BAM file is generated for paired-end data, MACS will only keep the left mate (5' end) tag. However, when the format BAMPE is specified, MACS will use the real fragments inferred from the alignment results for read pileup.
BEDPE or BAMPE
A special mode will be triggered when the format is specified as 'BAMPE' or 'BEDPE'. In this mode, MACS2 will process the BAM or BED files as paired-end data. Instead of building a bimodal distribution of plus and minus strand reads to predict the fragment size, MACS2 will use the actual insert sizes of read pairs to build the fragment pileup.
The BAMPE format is just a BAM format containing paired-end alignment information, such as that produced by BWA or BOWTIE.
The BEDPE format is a simplified and more flexible BED format, which only contains the first three columns defining the chromosome name and the left and right positions of the fragment from paired-end sequencing. Please note that this is NOT the same format used by BEDTOOLS, and the BEDTOOLS version of BEDPE is actually not in a standard BED format. You can use the MACS2 subcommand randsample to convert a BAM file containing paired-end information to a BEDPE format file.
-g/--gsize
PLEASE assign this parameter to fit your needs!
It's the mappable genome size or effective genome size, which is defined as the genome size that can be sequenced. Because of the repetitive features on the chromosomes, the actual mappable genome size will be smaller than the original size, about 90% or 70% of the genome size. The default, hs (2.7e9), is recommended for the human genome. Here are all the precompiled parameters for effective genome size:
- hs: 2.7e9
- mm: 1.87e9
- ce: 9e7
- dm: 1.2e8
Users may want to use k-mer tools to simulate the mapping of X bp long reads to the target genome and find the ideal effective genome size. However, usually by taking away the simple repeats and Ns from the total genome, one can get an approximate effective genome size. A slight difference in the number won't cause a big difference in peak calls, because this number is used to estimate a genome-wide noise level, which is usually the least significant one compared with the local biases modeled by MACS.
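The "take away the Ns and simple repeats" approximation can be sketched as a simple base count. This is only an illustration: the function name is hypothetical, and treating lowercase (soft-masked) bases as repeats is an assumption about how the FASTA file was prepared, not something MACS does itself.

```python
def approx_effective_size(fasta_text, exclude_softmasked=False):
    """Count bases that are not N, optionally skipping lowercase bases.

    Assumption: lowercase letters mark soft-masked repeats, as in many
    genome FASTA distributions.
    """
    total = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # skip FASTA header lines
        for base in line.strip():
            if base in "Nn":
                continue
            if exclude_softmasked and base.islower():
                continue
            total += 1
    return total


toy_fasta = ">chrToy\nACGTNNNNacgtACGT\nNNACGTacNN\n"
print(approx_effective_size(toy_fasta))                           # 18
print(approx_effective_size(toy_fasta, exclude_softmasked=True))  # 12
```

The resulting number could then be passed to -g/--gsize in place of one of the precompiled shortcuts.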
-s/--tsize
The size of sequencing tags. If you don't specify it, MACS will try to use the first 10 sequences from your input treatment file to determine the tag size. Specifying it will override the automatically determined tag size.
-q/--qvalue
The q-value (minimum FDR) cutoff to call significant regions. The default is 0.05. For broad marks, you can try 0.05 as the cutoff. Q-values are calculated from p-values using the Benjamini-Hochberg procedure.
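For readers unfamiliar with the Benjamini-Hochberg procedure, here is a minimal textbook sketch of how a list of p-values is turned into q-values. It is a generic implementation for illustration only; it is not the code MACS2 uses internally, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values)."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values never
    # increase as p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


pvals = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(pvals, benjamini_hochberg(pvals)):
    print(f"p={p:<6} q={q:.3f}")
```

With a cutoff of -q 0.05, all four toy p-values above would pass, since each adjusted value is at most 0.04.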
-p/--pvalue
The p-value cutoff. If -p is specified, MACS2 will use the p-value instead of the q-value.
--min-length, --max-gap
These two options can be used to fine-tune the peak calling behavior by specifying the minimum length of a called peak and the maximum allowed gap between two nearby regions to be merged. In other words, a called peak has to be longer than min-length, and if the distance between two nearby peaks is smaller than max-gap, they will be merged into one. If they are not set, MACS2 will set the DEFAULT value for min-length to the predicted fragment size d, and the DEFAULT value for max-gap to the detected read length. Note that if you set a min-length value smaller than the fragment size, it may have NO effect on the result. For BROAD peak calling, try a large value such as 500 bp. You can also use the --cutoff-analysis option with the default setting, and check the column 'avelpeak' under different cutoff values to decide on a reasonable min-length value.
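The combined effect of the two options can be pictured as "merge first, then filter by length". The sketch below is a simplified illustration with a hypothetical function name; in particular, whether the boundaries are inclusive (<= and >=) is an assumption made here and may differ from MACS2's own implementation.

```python
def merge_and_filter(regions, max_gap, min_length):
    """Merge regions separated by at most max_gap, then drop short ones.

    regions: iterable of (start, end) tuples on one chromosome.
    """
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous region: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged if e - s >= min_length]


candidates = [(100, 180), (200, 400), (1000, 1050)]
print(merge_and_filter(candidates, max_gap=30, min_length=200))
# [(100, 400)]
```

The first two candidates are 20 bp apart and are merged into a 300 bp peak; the third is only 50 bp long and is discarded.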
--nolambda
With this flag on, MACS will use the background lambda as the local lambda. This means MACS will not consider the local bias at peak candidate regions.
--slocal, --llocal
These two parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as the local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill a significant peak.
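The idea of taking the maximum over several lambda estimates can be sketched as follows. This is a simplified illustration, not MACS2's actual code: the function name and arguments are hypothetical, control read counts are assumed to be already scaled to the treatment depth, and each window's lambda is approximated as the read count scaled to a fragment-sized window of length d.

```python
def local_lambda(bg_lambda, count_slocal, count_llocal, d,
                 slocal=1000, llocal=10000, nolambda=False):
    """Pick the local lambda for one candidate peak.

    count_slocal / count_llocal: control reads falling in the small and
    large windows centered on the candidate region (assumed inputs).
    """
    if nolambda:
        # --nolambda: ignore local bias and use the genome-wide background.
        return bg_lambda
    lambda_small = count_slocal * d / slocal
    lambda_large = count_llocal * d / llocal
    return max(bg_lambda, lambda_small, lambda_large)


# A sharp spike in the control within the small window dominates the estimate.
print(local_lambda(0.5, count_slocal=10, count_llocal=40, d=200))  # 2.0
print(local_lambda(0.5, 10, 40, 200, nolambda=True))               # 0.5
```

This also shows why a very small window can "kill" a peak: a few control reads concentrated in it inflate the local lambda that the treatment signal is tested against.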
--nomodel
While on, MACS will bypass building the shifting model.
--extsize
When --nomodel is set, MACS uses this parameter to extend reads in the 5'->3' direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
--shift
Note, this is NOT the legacy --shiftsize option, which is replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to anything other than the default value (0). When --nomodel is set, MACS will use this value to move cutting ends (5') and then apply --extsize from the 5' to 3' direction to extend them to fragments. When this value is negative, ends will be moved in the 3'->5' direction, otherwise in the 5'->3' direction. It is recommended to keep the default 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNAseI-Seq datasets. Note that you can't set values other than 0 if the format is BAMPE or BEDPE for paired-end data. The default is 0.
Here are some examples for combining --shift and --extsize:
- To find enriched cutting sites, such as in some DNAse-Seq datasets: in this case, all 5' ends of sequenced reads should be extended in both directions to smooth the pileup signals. If the wanted smoothing window is 200 bp, then use --nomodel --shift -100 --extsize 200.
- For certain nucleosome-seq data, we need to pile up the centers of nucleosomes using a half-nucleosome size for wavelet analysis (e.g. NPS algorithm). Since the DNA wrapped on a nucleosome is about 147 bp, this option can be used: --nomodel --shift 37 --extsize 73.
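The two examples can be checked with a little coordinate arithmetic. The sketch below models "move the 5' end by --shift, then extend by --extsize in the 5'->3' direction" for a single read; the function name is hypothetical, and exact off-by-one handling on the minus strand is simplified.

```python
def read_to_fragment(five_prime, strand, shift=0, extsize=200):
    """Return the (start, end) interval piled up for one read."""
    if strand == "+":
        # 5'->3' runs toward larger coordinates on the plus strand.
        start = five_prime + shift
        return start, start + extsize
    # On the minus strand, 5'->3' runs toward smaller coordinates.
    end = five_prime - shift
    return end - extsize, end


# DNAse-style smoothing: a 200 bp window centered on the cutting site.
print(read_to_fragment(1000, "+", shift=-100, extsize=200))  # (900, 1100)
print(read_to_fragment(1000, "-", shift=-100, extsize=200))  # (900, 1100)

# Nucleosome centers: a 73 bp window around the middle of a ~147 bp fragment.
print(read_to_fragment(1000, "+", shift=37, extsize=73))     # (1037, 1110)
```

In the nucleosome case, the window (1037, 1110) is centered near 1073, which is about half of 147 bp downstream of the 5' end at 1000.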
--keep-dup
It controls the MACS behavior towards duplicate tags at the exact same location, that is, the same coordinates and the same strand. The 'auto' option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the 'all' option keeps every tag. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location. Default: 1
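The integer and 'all' settings can be illustrated with a short filter over (chromosome, position, strand) tuples. This is only a sketch with a hypothetical function name; the 'auto' setting, which derives the limit from a binomial test, is deliberately not implemented here.

```python
from collections import Counter


def filter_duplicates(tags, keep_dup=1):
    """Keep at most keep_dup tags per (chrom, position, strand).

    keep_dup may be a positive integer or the string 'all'.
    """
    if keep_dup == "all":
        return list(tags)
    seen = Counter()
    kept = []
    for tag in tags:
        seen[tag] += 1
        if seen[tag] <= keep_dup:
            kept.append(tag)
    return kept


tags = [("chr1", 100, "+")] * 3 + [("chr1", 100, "-"), ("chr1", 200, "+")]
print(len(filter_duplicates(tags)))              # 3
print(len(filter_duplicates(tags, keep_dup=2)))  # 4
print(len(filter_duplicates(tags, "all")))       # 5
```

Note that the tag on the minus strand at position 100 is not a duplicate of the plus-strand tags, because the strand differs.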
--broad
When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS. DEFAULT: False
--broad-cutoff
Cutoff for the broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff; otherwise, it's a q-value cutoff. DEFAULT: 0.1
--scale-to <large|small>
When set to 'large', linearly scale the smaller dataset to the same depth as the larger dataset. By default, or when set to 'small', the larger dataset will be scaled towards the smaller dataset. Beware that scaling up small data would cause more false positives.
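The linear scaling described above amounts to choosing a target depth and a multiplier for each dataset. The following sketch uses a hypothetical function name and takes read counts as inputs; it only illustrates the arithmetic, not how MACS2 applies the factors internally.

```python
def scaling_factors(n_treatment, n_control, scale_to="small"):
    """Return (treatment_factor, control_factor) for linear scaling."""
    if scale_to == "large":
        target = max(n_treatment, n_control)
    else:
        target = min(n_treatment, n_control)
    return target / n_treatment, target / n_control


# 20 million treatment reads against 10 million control reads.
print(scaling_factors(20_000_000, 10_000_000))           # (0.5, 1.0)
print(scaling_factors(20_000_000, 10_000_000, "large"))  # (1.0, 2.0)
```

With 'large', the control signal is doubled, which also doubles its noise; this is the reason for the warning about false positives.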
-B/--bdg
If this flag is on, MACS will store the fragment pileup and control lambda in bedGraph files. The bedGraph files will be stored in the current directory, named NAME_treat_pileup.bdg for treatment data and NAME_control_lambda.bdg for local lambda values from control.
--call-summits
MACS will now reanalyze the shape of the signal profile (p or q-score depending on the cutoff setting) to deconvolve subpeaks within each peak called from the general procedure. It's highly recommended for detecting adjacent binding events. When used, the output subpeaks of a big peak region will have the same peak boundaries but different scores and peak summit positions.
--buffer-size
MACS uses a buffer size for incrementally increasing the internal array size to store read alignment information for each chromosome or contig. With a larger buffer size, MACS can run faster but will waste more memory if certain chromosomes/contigs have very few reads. In most cases, the default value 100000 works fine. However, if there are a large number of chromosomes/contigs in your alignment and reads per chromosome/contig are few, it's recommended to specify a smaller buffer size in order to decrease memory usage (but it will take longer to read the alignment files). The minimum memory requested for reading an alignment file is about # of CHROMOSOME * BUFFER_SIZE * 8 Bytes. DEFAULT: 100000
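The memory formula above is easy to evaluate for a given assembly. The helper below is a hypothetical convenience function that applies it directly; the contig counts are made-up examples.

```python
def min_read_memory_bytes(n_chromosomes, buffer_size=100000):
    """Approximate minimum memory: # of CHROMOSOME * BUFFER_SIZE * 8 bytes."""
    return n_chromosomes * buffer_size * 8


# A genome with 25 chromosomes and the default buffer size.
print(min_read_memory_bytes(25) / 1e6, "MB")             # 20.0 MB

# A fragmented assembly with 50000 contigs.
print(min_read_memory_bytes(50_000) / 1e9, "GB")         # 40.0 GB
print(min_read_memory_bytes(50_000, 1000) / 1e6, "MB")   # 400.0 MB
```

For the fragmented assembly, lowering --buffer-size from 100000 to 1000 reduces the estimate by a factor of 100.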
Output files
NAME_peaks.xls is a tabular file which contains information about called peaks. You can open it in Excel and sort/filter using Excel functions. The information includes:
- chromosome name
- start position of peak
- end position of peak
- length of peak region
- absolute peak summit position
- pileup height at peak summit
- -log10(pvalue) for the peak summit (e.g. if pvalue = 1e-10, this value should be 10)
- fold enrichment for this peak summit against a random Poisson distribution with local lambda
- -log10(qvalue) at peak summit
Coordinates in the XLS file are 1-based, which is different from BED format. When --broad is enabled for broad peak calling, the pileup, p-value, q-value, and fold change in the XLS file will be the mean value across the entire peak region, since the peak summit won't be called in broad peak calling mode.
NAME_peaks.narrowPeak is a BED6+4 format file which contains the peak locations together with peak summit, p-value, and q-value. The definitions of some specific columns are:
- 5th: integer score for display. It's calculated as int(-10*log10pvalue) or int(-10*log10qvalue) depending on whether -p (pvalue) or -q (qvalue) is used as the score cutoff. Please note that currently this value might be out of the [0-1000] range defined in the UCSC ENCODE narrowPeak format. You can let the value saturate at 1000 (i.e. p/q-value = 10^-100) by using the following awk one-liner: awk -v OFS="\t" '{$5=$5>1000?1000:$5} {print}' NAME_peaks.narrowPeak
- 7th: fold-change at peak summit
- 8th: -log10pvalue at peak summit
- 9th: -log10qvalue at peak summit
- 10th: relative summit position to peak start
The file can be loaded directly to the UCSC genome browser. Remove the beginning track line if you want to analyze it by other tools.
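For those who prefer Python to awk, the score saturation for the 5th column can be sketched as below. Both function names are hypothetical; the first mirrors the score definition using the -log10 value from the 8th or 9th column, and the second caps the 5th field of one tab-separated narrowPeak line.

```python
def display_score(neg_log10_value, cap=1000):
    """Integer display score from a -log10(p) or -log10(q) value, capped."""
    return min(int(10 * neg_log10_value), cap)


def cap_narrowpeak_line(line, cap=1000):
    """Saturate the 5th column of one narrowPeak line at cap."""
    fields = line.rstrip("\n").split("\t")
    fields[4] = str(min(int(fields[4]), cap))
    return "\t".join(fields)


print(display_score(5.0))    # 50
print(display_score(120.0))  # 1000

# Made-up example record; the values are for illustration only.
record = "chr1\t100\t500\tpeak_1\t1532\t.\t25.3\t153.2\t150.1\t200"
print(cap_narrowpeak_line(record))
```

Applied to every non-track line of NAME_peaks.narrowPeak, the second function has the same effect as the awk one-liner.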
NAME_summits.bed is in BED format and contains the peak summit locations for every peak. The 5th column in this file is the same as in the narrowPeak file. If you want to find the motifs at the binding sites, this file is recommended. The file can be loaded directly to the UCSC genome browser. Remove the beginning track line if you want to analyze it by other tools.
NAME_peaks.broadPeak is in BED6+3 format, which is similar to the narrowPeak file except that it lacks the 10th column for annotating peak summits. This file and the gappedPeak file will only be available when --broad is enabled. Since the peak summit won't be called in broad peak calling mode, the values in the 5th and 7th-9th columns are the mean values across all positions in the peak region. Refer to narrowPeak if you want to fix the value issue in the 5th column.
NAME_peaks.gappedPeak is in BED12+3 format and contains both the broad region and narrow peaks. The 5th column is the score for showing grey levels on the UCSC browser, as in narrowPeak. The 7th column is the start of the first narrow peak in the region, and the 8th column is the end. The 9th column should be the RGB color key; however, we keep 0 here to use the default color, so change it if you want. The 10th column tells how many blocks there are, including the starting 1 bp and ending 1 bp of broad regions. The 11th column shows the length of each block and the 12th the start of each block. 13th: fold-change, 14th: -log10pvalue, 15th: -log10qvalue. The file can be loaded directly to the UCSC genome browser. Refer to narrowPeak if you want to fix the value issue in the 5th column.
NAME_model.r is an R script which you can use to produce a PDF image of the model based on your data. Run it with: $ Rscript NAME_model.r
Then a PDF file NAME_model.pdf will be generated in your current directory. Note that R is required to draw this figure.
The NAME_treat_pileup.bdg and NAME_control_lambda.bdg files are in bedGraph format, which can be imported to the UCSC genome browser or be converted into even smaller bigWig files. NAME_treat_pileup.bdg contains the pileup signals (normalized according to the --scale-to option) from the ChIP/treatment sample. NAME_control_lambda.bdg contains local biases estimated for each genomic location from the control sample, or from the treatment sample when the control sample is absent. The subcommand bdgcmp can be used to compare these two files and make a bedGraph file of scores such as p-value, q-value, log-likelihood, and log fold changes.
Other useful links
Tips of fine-tuning peak calling
There are several subcommands within the MACSv2 package to fine-tune or customize your analysis:
- bdgcmp can be used on *_treat_pileup.bdg and *_control_lambda.bdg, or on bedGraph files from other resources, to calculate the score track.
- bdgpeakcall can be used on *_treat_pvalue.bdg, on the file generated by bdgcmp, or on a bedGraph file from other resources to call peaks with a given cutoff, a maximum gap between nearby mergeable peaks, and a minimum peak length. bdgbroadcall works similarly to bdgpeakcall; however, it will output _broad_peaks.bed in BED12 format.
- The differential calling tool, bdgdiff, can be used on 4 bedGraph files which are scores between treatment 1 and control 1, treatment 2 and control 2, treatment 1 and treatment 2, and treatment 2 and treatment 1. It will output consistent and unique sites according to parameter settings for minimum length, maximum gap and cutoff.
- You can combine subcommands to do step-by-step peak calling. Read the details on the MACS2 wiki page.