=====================
spike_in_calibration
=====================

.. contents::
    :local:

Removes experimental bias by normalizing fragment counts based on the sequencing depth of the spike-in genome alignment, and visualizes the results.

Arguments
==========

**Required arguments:**

- ``-b, --bed``: Input folder containing the filtered BED files to normalize.
- ``-ss, --sam_spike_in``: Input folder containing the SAM files produced by alignment to a spike-in genome.
- ``-cs, --chromosome_sizes``: Input file with sorted chromosome size information.


**Optional arguments:**

- ``-tbl, --fragment_table ALIGNMENT SUMMARY TABLE``: Input CSV file containing the columns ``Sample``, ``Replication``, ``SequencingDepth``, ``MappedFragments``, ``AlignmentRate``, ``MappedFragments_SpikeIn`` and ``AlignmentRate_SpikeIn``, with the corresponding information for each sample. Default: ``bowtie2_alignment_ref_and_spike_in.csv``, as exported by this pipeline's ``bowtie2_alignment`` function.
- ``-o, --out_dir``: Output directory, default = current working directory.

Example usage
==============

The function assumes that the ``bowtie2_alignment_ref_and_spike_in.csv`` file is present in ``$out_dir/Epimapper/summary_tables``. It is therefore important to use the same output directory (``-o/--out_dir``) as the one used for the spike-in alignment, so that the function can find the table.

.. code-block:: bash
    
    $ epimapper spike_in_calibration -b /Users/me/results/Epimapper/alignment/bed -ss /Users/me/results/Epimapper/alignment/sam_spike_in -cs /Users/me/in_folder/hg38_chromosome_sizes.txt -o /Users/me/results

If you want a different output directory, you can pass the path to the table explicitly with ``-tbl/--fragment_table``:

.. code-block:: bash
    
    $ epimapper spike_in_calibration -b /Users/me/results/Epimapper/alignment/bed -tbl /Users/me/results/Epimapper/summary_tables/bowtie2_alignment_ref_and_spike_in.csv -ss /Users/me/results/Epimapper/alignment/sam_spike_in -cs /Users/me/in_folder/hg38_chromosome_sizes.txt -o /Users/me/results 


If you have not used this pipeline's ``bowtie2_alignment`` to perform the reference genome and spike-in alignments, you must manually create a summary table containing the columns ``Sample``, ``Replication``, ``SequencingDepth``, ``MappedFragments``, ``AlignmentRate``, ``MappedFragments_SpikeIn`` and ``AlignmentRate_SpikeIn``, with the corresponding information for each sample.
Using the pipeline as a whole avoids this manual work and is therefore recommended. A manually created table is passed with ``-tbl/--fragment_table``:


.. code-block:: bash
    
    $ epimapper spike_in_calibration -b /Users/me/results/Epimapper/alignment/bed -tbl /Users/me/results/my_table.csv -ss /Users/me/results/sam_spike_in -cs /Users/me/in_folder/hg38_chromosome_sizes.txt -o /Users/me/results 
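
A manually created table such as ``my_table.csv`` could be written with a short script. The following is a minimal sketch, not part of EpiMapper: the helper name ``write_summary_table`` is hypothetical, the sample values are placeholders, and the value formats (for example, whether alignment rates are percentages) are assumptions that should be checked against a table exported by ``bowtie2_alignment``. Only the column names are taken from this page.

.. code-block:: python

    import csv

    # Column names required by spike_in_calibration, as listed on this page.
    COLUMNS = [
        "Sample",
        "Replication",
        "SequencingDepth",
        "MappedFragments",
        "AlignmentRate",
        "MappedFragments_SpikeIn",
        "AlignmentRate_SpikeIn",
    ]


    def write_summary_table(rows, path):
        """Write one CSV row per sample, with the header in the expected order.

        Hypothetical helper: each row is a dict keyed by the names in COLUMNS.
        """
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=COLUMNS)
            writer.writeheader()
            for row in rows:
                # Fail early instead of silently writing empty cells.
                missing = [name for name in COLUMNS if name not in row]
                if missing:
                    raise ValueError(f"row is missing columns: {missing}")
                writer.writerow(row)


    # Placeholder values for illustration only; replace with your own numbers.
    example_rows = [
        {
            "Sample": "H3K4me3",
            "Replication": "rep1",
            "SequencingDepth": 2000000,
            "MappedFragments": 1900000,
            "AlignmentRate": 95.0,
            "MappedFragments_SpikeIn": 5000,
            "AlignmentRate_SpikeIn": 0.25,
        },
    ]

    if __name__ == "__main__":
        write_summary_table(example_rows, "my_table.csv")

The column check raises an error when a required column is absent, which is usually easier to diagnose than a table that fails later in the pipeline.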


Output
=======

Like all the other functions in the EpiMapper Python package, this function creates a main ``Epimapper`` output directory if one is not already present in the chosen output directory. It also creates a ``bedgraph`` folder to store the spike-in normalized files, a summary table, and a PNG figure with boxplots of the spike-in scaling factors and normalized fragment counts.

.. code-block:: bash

    Epimapper
    |- alignment
    |   |- bedgraph
    |   |   |- "sample-name".fragments.normalized.bedgraph
    |- summary_tables
    |   |- spike_in_calibration_summary.csv 
    |   |- spike_in_calibration.png
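
The scaling factors reported in the summary table can be illustrated with a small example. This page does not state the exact formula EpiMapper uses, so the sketch below assumes one common convention in spike-in workflows: the scaling factor is an arbitrary constant divided by the number of fragments mapped to the spike-in genome, and fragment counts are multiplied by that factor. The function names and the constant ``10_000`` are assumptions made for illustration.

.. code-block:: python

    def spike_in_scale_factor(spike_in_fragments, constant=10_000):
        """Return constant / spike-in fragment count (assumed convention)."""
        if spike_in_fragments <= 0:
            raise ValueError("spike-in fragment count must be positive")
        return constant / spike_in_fragments


    def normalize_counts(counts, scale_factor):
        """Multiply each raw fragment count by the sample's scaling factor."""
        return [count * scale_factor for count in counts]


    if __name__ == "__main__":
        # Placeholder numbers: a sample with 5000 spike-in fragments.
        factor = spike_in_scale_factor(5000)
        print(factor)
        print(normalize_counts([10, 25], factor))

Under this convention, a sample with fewer spike-in fragments receives a larger scaling factor, so samples become comparable relative to the constant amount of spike-in material.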