Skip to content

baudisgroup/tum2pop-mapping

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

tum2pop-mapping

Population origin mapping from cancer SNP profile into 5 continental groups as defined in 1000 Genomes Project. This tool supports mapping from B-allele frequency data generated with 9 Affymetrix SNP array platforms as input and a population assignment to one of the five continental groups -- AFR(Africa), EUR (Europe), AMR(south America), EAS (East Asia), SAS (SAS). The currently supported genome version is GRCh37 (hg19). A mapping to other genome versions is planned.

Docker version installation

The easiest way is to use docker application. First, install Docker application:

docker pull qingyao/tum2pop

Usage

First, you would like to create a working directory $hostdir (use absolute path) to place your input files and to receive the output from the pipeline.

docker run -it --rm --mount type=bind,source=$hostdir,target=/data qingyao/tum2pop

Example

You need to download the /test folder [here] (https://github.com/baudisgroup/tum2pop-mapping/tree/master/test_data) and copy the absolute path as $test_dir.

docker run -it --rm --mount type=bind,source=$test_dir,target=/data qingyao/tum2pop
Rscript --vanilla run_pop.r -i BAF -o CONT -p Mapping250K_Nsp

After entering the interactive mode of the container, you can place your input files in $hostdir/input directory. Then:

Rscript --vanilla run_pop.r <parameters>

Then you will receive in the /results folder under $hostdir your mapping results.

Options

-i --input TEXT input as B allele frequency file format (BAF), or genotype calling format (GC) -p --platform TEXT SNP array platform -o --output TEXT output as 6 theoretical fractions (FRAC), or standard output as ratio of 5 continents and a voting result (CONT)

The current pipeline supports 9 SNP array platforms from Affymetrix:

  • Mapping10K_Xba142
  • Mapping50K_Hind240
  • Mapping50K_Xba240
  • Mapping250K_Nsp
  • Mapping250K_Sty
  • GenomeWideSNP_5
  • GenomeWideSNP_6
  • CytoScan750K_Array
  • CytoScanHD_Array

The input file should be tab separated. There should be 4 columns: ID (SNP ID or simply indicating row number), chromosome (1-23), nucleotide base position, and a value column (a number within 0-1 if BAF format, or AA/AB/BB if GC format).

Example for BAF input format:

ID CHRO BASEPOS VALUE SNP_A-2131660 1 1220751 0.3487 SNP_A-1967418 1 2302812 0.9451 SNP_A-1969580 1 2398125 1.0000 SNP_A-4263484 1 2622185 0.4612 . . . .

Example for GC input format:

ID CHRO BASEPOS VALUE SNP_1 1 1220751 AB SNP_2 1 2302812 BB SNP_3 1 2398125 BB SNP_4 1 2622185 AB . . . .

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • OpenEdge ABL 96.6%
  • q 3.4%