"Invalid format: plink" after filtering plink dosage files and saving output as plink2 fileset (pgen, pvar, psam) #398

mathewrm · 2024-12-17T07:13:45Z


Hi all,

I have PLINK dosage files (e.g., CHT1_2_chr1.plink.dosage.gz), one for each chromosome (1 to 22), containing genetic data for two cohorts, CHT_1 and CHT_2. In the same directory, there are corresponding .fam and .map files for each chromosome. Initially, I put the file paths of these per-chromosome files, which contain both cohorts, in the "path_prefix" column of the samplesheet. On running pgsc_calc, I then got an error message saying that 2 sample sets had been detected.

So I wrote a script to keep only the samples from CHT_1, the cohort I am interested in. These samples have the string "CHT1" in the first column of the .fam file. Here is the code:

---

module load plink2

input_dir="/path/to/location/of/dosage_files/"
output_dir="/path/to/destination/of/output/files/"
mkdir -p "$output_dir"

for chr in {1..22}; do
    input_file="${input_dir}/CHT1_2_chr${chr}.plink.dosage.gz"
    fam_file="${input_dir}/CHT1_2_chr${chr}.plink.fam"
    temp_prefix="${output_dir}/temp_chr${chr}"
    output_prefix="${output_dir}/CHT1_chr${chr}"

    # Convert dosage file to PLINK 2 format
    plink2 --import-dosage "$input_file" \
           --fam "$fam_file" \
           --make-pgen \
           --out "$temp_prefix"

    # Filter for CHT1 samples (first .fam column matches "CHT1")
    plink2 --pfile "$temp_prefix" \
           --keep-fam <(awk '$1 ~ /CHT1/' "$fam_file") \
           --make-pgen \
           --out "$output_prefix"

    # Clean up temporary files
    rm "${temp_prefix}.pgen" "${temp_prefix}.pvar" "${temp_prefix}.psam"
done

---
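The filtering step in the script selects on the first column of the .fam file. As a minimal sketch of the same selection, the snippet below writes the matching IDs to a regular keep-list file instead of a process substitution, so the list can be inspected before it is handed to plink2. The toy .fam contents and the file names (toy.fam, keep_fids.txt) are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy .fam file (6 whitespace-separated columns); the IDs are made up.
workdir="$(mktemp -d)"
cat > "${workdir}/toy.fam" <<'EOF'
CHT1_001 CHT1_001 0 0 1 -9
CHT2_001 CHT2_001 0 0 2 -9
CHT1_002 CHT1_002 0 0 2 -9
EOF

# Keep the first-column values that contain "CHT1", one per line, deduplicated.
awk '$1 ~ /CHT1/ { print $1 }' "${workdir}/toy.fam" | sort -u > "${workdir}/keep_fids.txt"

# Inspect the list before using it, e.g. with:
#   plink2 --pfile <prefix> --keep-fam keep_fids.txt --make-pgen --out <new_prefix>
n_keep="$(wc -l < "${workdir}/keep_fids.txt" | tr -d ' ')"
echo "samples to keep: ${n_keep}"
cat "${workdir}/keep_fids.txt"
```

Comparing the count against the expected cohort size is a cheap way to catch a pattern that matches too much or too little.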

Then I ran pgsc_calc as follows, with the "path_prefix" column of the samplesheet pointing to the plink2 fileset (pgen, pvar, psam), without the extensions:

---

nextflow run main.nf \
   -profile singularity \
   --input samplesheet.csv --target_build GRCh38 \
   --scorefile "/path/to/PGS_scoring_file/PGS003727.txt.gz"

---

Then I get the following message: "Invalid format: plink"

I also keep getting this warning: "WARNING: Could not load nf-core/config profiles: https://raw.githubusercontent.com/nf-core/configs/master/nfcore_custom.config" (I got it on the first run too. Could it be because I am running the pipeline in offline mode?)

System Information

Version: 24.10.2 build 5932
Created: 27-11-2024 21:23 UTC (28-11-2024 07:23 AEDT)
System: Linux 3.10.0-1160.119.1.el7.x86_64
Runtime: Groovy 4.0.23 on OpenJDK 64-Bit Server VM 21.0.1+12-29
Encoding: UTF-8 (UTF-8)

PBS executor
apptainer/singularity 3.11.4

Answered by smlmbrt


smlmbrt · 2024-12-17T10:01:12Z

Maintainer

@mathewrm, if you're supplying .pgen files, the format in the samplesheet needs to be pfile (see docs).
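For illustration, here is a sketch of a samplesheet for per-chromosome pgen files, plus a quick check of its format column. The column names (sampleset, path_prefix, chrom, format) and the accepted values (pfile, bfile, vcf) are assumptions based on the pgsc_calc documentation and should be confirmed there; the sampleset name and paths are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

sheet="$(mktemp)"
# Placeholder rows; the last row contains a typo (a space inside the format value).
cat > "$sheet" <<'EOF'
sampleset,path_prefix,chrom,format
cohort1,/path/to/CHT1_chr1,1,pfile
cohort1,/path/to/CHT1_chr2,2,p file
EOF

# Report every data row whose format value is not one of the assumed accepted values.
bad_rows="$(awk -F',' 'NR > 1 && $4 !~ /^(pfile|bfile|vcf)$/ { printf "line %d: \"%s\"\n", NR, $4 }' "$sheet")"

if [ -n "$bad_rows" ]; then
    echo "unexpected format values:"
    echo "$bad_rows"
else
    echo "all format values look valid"
fi
```

Printing the value in quotes makes stray whitespace visible, which is easy to miss when reading the CSV by eye.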

2 replies

mathewrm Dec 17, 2024
Author

@smlmbrt This solves it. The error was due to a typo: a space between p and file. Your response made me look twice!

On running the code, I get several errors which I can't make much sense of:

executor > local (3)
[- ] PGS…CALC:INPUT_CHECK:COMBINE_SCOREFILES | 0 of 1
[- ] PGS…C:MAKE_COMPATIBLE:PLINK2_RELABELBIM -
[23/84593e] PGS…_RELABELPVAR (cohort1 chromosome 1) | 3 of 22, failed: 2
[- ] PGS…PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF -
[- ] PGS…CCALC:PGSCCALC:MATCH:MATCH_VARIANTS -
[- ] PGS…SCCALC:PGSCCALC:MATCH:MATCH_COMBINE -
[- ] PGS…C:PGSCCALC:APPLY_SCORE:PLINK2_SCORE -
[- ] PGS…GSCCALC:APPLY_SCORE:SCORE_AGGREGATE -
[- ] PGS…SCCALC:PGSCCALC:REPORT:SCORE_REPORT -
[- ] PGS…CCALC:PGSCCALC:DUMPSOFTWAREVERSIONS -
Execution cancelled -- Finishing pending tasks before exit
-[pgscatalog/pgsc_calc] Pipeline completed with errors-
ERROR ~ Error executing process > 'PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELPVAR (cohort1 chromosome 2)'

Caused by:
Process PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELPVAR (cohort1 chromosome 2) terminated with an error exit status (255)

Command executed:

plink2
--threads 2
--memory 8192
--freq
--missing vcols=fmissdosage,fmiss
--new-id-max-allele-len 100 missing --allow-extra-chr
--set-all-var-ids '@:#:$r:$a'
--var-id-multi @:#
--pfile CHT_1_chr2
--make-just-pvar zs cols="-xheader,-maybequal,-maybefilter,-maybeinfo,-maybecm"
--out GRCh38_cohort1_2

#-a: cross platform (mac, linux) method of preserving symlinks
#|| true: if file exists, ignore error, will be handled by includeInputs

cp -a CHT_1_chr2.pgen GRCh38_cohort1_2.pgen || true
cp -a CHT_1_chr2.psam GRCh38_cohort1_2.psam || true

gzip GRCh38_cohort1_2.vmiss
gzip GRCh38_cohort1_2.afreq

cat <<-END_VERSIONS > versions.yml
PLINK2_RELABELPVAR:
plink2: $(plink2 --version 2>&1 | sed 's/^PLINK v//; s/ 64.*$//' )
END_VERSIONS

Command exit status:
255

Command output:
(empty)

Command error:
FATAL: container creation failed: mount /proc/self/fd/4->/scratch/state/singularity/mnt/session/rootfs error: while mounting image /proc/self/fd/4: failed to find loop device: could not attach image file to loop device: no loop devices available
cp: ‘.command.out’ and ‘.command.out’ are the same file
cp: ‘.command.err’ and ‘.command.err’ are the same file
cp: cannot stat ‘.command.trace’: No such file or directory

Work dir:
/XXXXXXXXXXX/pgsc_calc2/work/c5/7c8bf42def4baec9e74ae0a861ee52

Container:
/XXXXXXXXX/pgsc_calc2/work/singularity/ghcr.io-pgscatalog-plink2-2.00a5.10-singularity.img

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details

nebfield Dec 18, 2024
Maintainer

Your computer is having trouble creating singularity containers:

 FATAL: container creation failed: mount /proc/self/fd/4->/scratch/state/singularity/mnt/session/rootfs error: while mounting image /proc/self/fd/4: failed to find loop device: could not attach image file to loop device: no loop devices available

Are you able to run the test profile with singularity? e.g. nextflow run pgscatalog/pgsc_calc -r v2.0.0 -profile test,singularity

If you can't run the test profile, it means there's a problem with your computer's singularity configuration. In that case you might need to talk to an HPC admin or try the conda profile.
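When several tasks fail at once, it can help to confirm that they all share this cause. A rough sketch that scans the .command.err files under a Nextflow work directory for the loop-device message is shown below. The directory layout and file contents are fabricated for the demo; on a real run, work_dir would point at your own work directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a fake work directory with two task folders (demo only).
work_dir="$(mktemp -d)"
mkdir -p "${work_dir}/aa/task1" "${work_dir}/bb/task2"
# Abridged from the error message quoted in this thread.
echo "FATAL: container creation failed: could not attach image file to loop device: no loop devices available" \
    > "${work_dir}/aa/task1/.command.err"
echo "some unrelated message" > "${work_dir}/bb/task2/.command.err"

pattern="no loop devices available"

# List stderr files mentioning the pattern; '|| true' keeps 'set -e' happy when nothing matches.
n_hit="$( { find "$work_dir" -name '.command.err' -exec grep -lF -- "$pattern" {} + || true; } | wc -l | tr -d ' ')"
n_all="$(find "$work_dir" -name '.command.err' | wc -l | tr -d ' ')"

echo "${n_hit} of ${n_all} tasks mention: ${pattern}"
```

If every failed task shows the same message, the problem is most likely in the host's singularity setup rather than in the input data.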
