.. EpiMapper documentation master file, created by sphinx-quickstart on Thu Nov 23 12:56:02 2023. You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. .. image:: content/figures/oslo-universitetssykehus-ous-logo.png :width: 150 :target: https://www.oslo-universitetssykehus.no/ .. image:: content/figures/uio_segl_a.png :width: 150 :target: https://www.uio.no/ .. image:: content/figures/helse-sor-ost.jpg :width: 150 :target: https://www.helse-sorost.no/ .. image:: content/figures/ahhus.png :width: 150 :target: https://www.ahus.no/ ================= EpiMapper ================= A Python package containing a full data analysis pipeline of ChIP-seq, CUT&RUN, ATAC-seq and CUT&Tag data. Contents: --------- .. toctree:: :maxdepth: 1 content/Installation content/fastqc content/bowtie2_alignment content/remove_duplicates content/fragment_length content/filtering content/spike_in_calibration content/peak_calling content/heatmap content/differential_analysis content/Demos content/Required_files_and_where_to_find_them content/further_analysis_example **positional arguments**: - **task**: Pipeline task to run **optional arguments**: - **-h, --help**: Show this help message and exits Usage: epimapper [] Tasks available for using: -------------------------------- - **fastqc**: Performs quality control on raw reads fastq files from high-thoughput sequencing. - **bowtie2_alignment**: Mapping reads to a reference genome with Bowtie2 alignment of fastq sequencing reads files from high-thoughput sequecing, and visulizing results. - **remove_duplicates**: Remove duplicated reads mapped to the same place in of a reference genome during alignment, and visulizing results. - **fragment_length**: Evaluation of mapped fragment length distribution of input SAM files exported from high-thoughput sequencing alignment, and visulizing results. - **spike_in_calibration**: Removes experimental bias by normalizing fragment counts based on sequencing depth to a spike-in genome and visulizes results. - **peak_calling**: Finds enriched regions/calls for peaks from chromatin profiling data with SEACR or MACS2, then visulizes results. - **heatmap**: Visualizes the enrichment of target protein in predefined genomic regions and peaks by creating heatmaps. - **differential_analysis**:Preforms differntial analysis on enriched reagions/peaks before annotating the stastitically significant changes to spesific genomic reagions and visulizing the results. Files needed for a complete run of the pipeline: ------------------------------------------------ **Required**: - **FASTQ**: Text files that contains the sequence data from next generation sequencing. The files must be stored togheter in one single directory. - **Chromosome Sizes**: Text file containin one column with information the chromosome sizes of the genome. - **Genome Blacklist**: A BED file conaining genome that should be avoided in the data analysis (i.e highly repetative regions) - **Genome RefFlat Reference**: A text (.txt) file containing the reference genome in RefFlat format. *either* - **Bowtie2 Indexing Files**: Input file folder of Bowtie2 reference genome indexing files. *or* - **Reference Genome FASTA**: Input reference genome FASTA file for the creation of Bowtie2 reference genome. **Optional**: - **Enchancher**: A sorted BED file containing the enhancer regions. This will be used as a refereence for annotation of the differntially bound marks. Name requirements for files: -------------------------------- All input FASTQ files must follow a naming protocal, this is to ensure smooth transitions between the pipeline´s functions. The naming protocal is as following: [SAMPEL-NAME]_rep[REPLICATION-NUMBER]_[R1/R2]_[(TECHNICAL-REPLICATE)].fastq The sampl name section must not contain underscore "_" or dot "." in the name. If there are none technical replicates, just skip the number. Some examples: - H4K3me3_rep1_R2.fastq - ZNF143-Control-48h_rep1_R1.fastq - IgG_rep1_R2_001.fastq - Control-sample_rep2_R2.fastq - H4K27me3_rep1_R2_3.fastq - Any-thing-you-want-except-underscore-or-dot_rep2_R2.fastq Output created by the pipeline -------------------------------- After a complete run of the EpiMapper pipeline you will be left with several newly created folders following the directory structure below. .. image:: content/figures/output.png :width: 550 As the figure shows, and to make it eaiser for the user the main output directory "Epimapper" contains all the output from every subfunction. Please see each functions´documentation page to gain further knowledge about the output file of each repective function.