dataprep result is empty #232

Open
Seongmin-Jang-1165 opened this issue Nov 20, 2024 · 4 comments

Comments

@Seongmin-Jang-1165

Hello developer!

I ran xpore dataprep with my direct RNA-seq data generated with the SQK-RNA004 kit, but the output file is empty and I cannot identify what the problem is.

Can you advise me on this problem?

run log.out.txt

@yuukiiwa
Collaborator

Hi @Seongmin-Jang-1165,

Can you provide the following, please?

first 10 lines from xpore dataprep eventalign_index

head eventalign.index

first 10 lines from nanopolish eventalign.txt

head eventalign.txt

first 10 lines from the gtf file

head [annotation.gtf]

Thanks!

Best wishes,
Yuk Kei
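
The three excerpts requested above can also be collected in one go. This is a minimal sketch, assuming Python is available in the same environment as xpore; the function names `first_lines` and `report` and the file names in the usage note are placeholders, not xpore conventions.

```python
from itertools import islice


def first_lines(path, n=10):
    """Return the first n lines of a text file, like `head -n`."""
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


def report(paths, n=10):
    """Build a paste-ready report; unreadable files are noted instead of raising."""
    chunks = []
    for path in paths:
        chunks.append(f"==> {path} <==")
        try:
            chunks.extend(first_lines(path, n))
        except OSError as exc:
            chunks.append(f"(could not read: {exc.strerror})")
    return "\n".join(chunks)
```

Usage: `print(report(["eventalign.index", "eventalign.txt", "annotation.gtf"]))`, with the paths replaced by the actual files.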

@Moretta1

Moretta1 commented Dec 22, 2024

Update (Dec 23):

Hi, I solved the error described below by running xpore as a Python module:
python -m xxx.xxx.xxx(pth of xpore code).xpore dataprep --eventalign eventalign_file.txt --out_dir output_pth

============================================================================

Hi, I ran into the same problem.

I ran the following steps before running xpore-dataprep.

Dataset: mm39, WT and KO; KO is used as the example here.

pre-processing:

1. multi-fast5 to single-fast5:

multi_to_single_fast5 -i demo/guppy -s demo/guppy_single -t 40 --recursive

2. basecalling:

guppy_basecaller -i /data/fast5_data/mm_WT/single_fast5/ -s ko.guppy/ --config ~/nanopore_methods/ont-guppy/data/rna_r9.4.1_70bps_hac.cfg -r --num_callers 4 --cpu_threads_per_caller 2 --device auto
cat ko.guppy/pass/*.fastq > ko.fastq

3. minimap2 generates .sam file:

minimap2 -ax map-ont -k 14 GRCm38.transcripts.fa -t 25 --secondary=no /data/fast5_data/mm_KO/ko.fastq -o /data/fast5_data/mm_KO/ko.sam

4. samtools generates the sorted .bam file:

samtools view -@ 30 -F 2048 -F 4 -b ko.sam | samtools sort -O BAM -@ 20  -o ko.bam

samtools index -@ 16 ko.bam  # generate index
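
The two `-F` filters in step 4 exclude reads by SAM flag bit: 4 marks an unmapped read and 2048 marks a supplementary alignment, as defined in the SAM specification. The sketch below only illustrates the bit arithmetic; the helper name `is_excluded` is made up for this example. A single combined mask, `-F 2052`, expresses the same intent without depending on how a given samtools version combines repeated `-F` options.

```python
UNMAPPED = 0x4          # SAM flag bit 4: read is unmapped
SUPPLEMENTARY = 0x800   # SAM flag bit 2048: supplementary alignment

EXCLUDE_MASK = UNMAPPED | SUPPLEMENTARY  # 4 + 2048 = 2052


def is_excluded(flag, mask=EXCLUDE_MASK):
    """True if any bit in mask is set in flag (the rule -F applies)."""
    return (flag & mask) != 0
```

For instance, a reverse-strand primary alignment (flag 16) is kept, while a reverse-strand supplementary alignment (flag 2064) is dropped.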

5. nanopolish

first generate index: nanopolish index -d <PATH/TO/FAST5_DIR> <PATH/TO/FASTQ_FILE>

nanopolish index -d single_fast5/ ko.fastq > index.log 2>&1

then eventalign:

nanopolish eventalign --read ko.fastq \
--bam ko.bam \
--genome ~/reference_fa/mm10/GRCm38.transcripts.fa \
--scale-events \
--signal-index \
--summary ko_summary.txt \
--threads 50 \
> ~/nanopore_methods/xpore/nanopolish_files/ko_eventalign.log \
2>&1
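
Because the eventalign command above sends both stdout and stderr (`2>&1`) into one file, any message nanopolish prints would land in the middle of the event table. A rough sanity check follows, assuming the tab-separated 15-column layout shown in the `head` output further down in this comment; `check_eventalign` is a hypothetical helper, not part of nanopolish or xpore.

```python
EXPECTED_HEADER = [
    "contig", "position", "reference_kmer", "read_index", "strand",
    "event_index", "event_level_mean", "event_stdv", "event_length",
    "model_kmer", "model_mean", "model_stdv", "standardized_level",
    "start_idx", "end_idx",
]


def check_eventalign(path, max_rows=100000):
    """Return (header_ok, bad_line_numbers) for the first max_rows data rows."""
    bad = []
    with open(path, "r", errors="replace") as handle:
        header = handle.readline().rstrip("\n").split("\t")
        header_ok = header == EXPECTED_HEADER
        # Data rows start on line 2; flag any row with the wrong field count.
        for lineno, line in enumerate(handle, start=2):
            if lineno - 1 > max_rows:
                break
            if len(line.rstrip("\n").split("\t")) != len(EXPECTED_HEADER):
                bad.append(lineno)
    return header_ok, bad
```

A result such as `(True, [])` means the scanned rows look like a clean table; any listed line numbers are worth inspecting by hand.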

xpore processing:

# For mm_WT, it follows the same previous steps,
# here I just run one set for checking whether it could work.

xpore-dataprep --eventalign /data/fast5_data/mm_WT/wt_eventalign.txt \
    --summary /data/fast5_data/mm_WT/wt_summary.txt \
    --out_dir ~/nanopore_methods/xpore/wt/ \
    --n_processes 4 --readcount_max 20000 > ~/nanopore_methods/xpore/wt/xpore_dataprep.log 2>&1

The output of the nanopolish eventalign step looks like this:

(/home/rlwang/m6a) rlwang@dell-tower-server:/data/fast5_data/mm_WT$ head wt_eventalign.txt -n 3
contig  position        reference_kmer  read_index      strand  event_index     event_level_mean    event_stdv       event_length    model_kmer      model_mean      model_stdv      standardized_level  start_idx        end_idx
ENSMUST00000130201      548     TGTTA   20      t       5       104.38  2.869   0.00697 TGTTA   106.43       7.49    -0.23   16780   16801
ENSMUST00000130201      548     TGTTA   20      t       6       110.41  6.096   0.00963 TGTTA   106.43       7.49    0.44    16751   16780
(/home/rlwang/m6a) rlwang@dell-tower-server:/data/fast5_data/mm_WT$ head wt_summary.txt -n 3
read_index      read_name       fast5_path      model_name      strand  num_events      num_steps       num_skips       num_stays       total_duration  shift   scale   drift   var
20      571a5dee-2649-41de-8bb3-c65aae7359f6    /data/fast5_data/mm_WT/single_fast5/all_fast5/571a5dee-2649-41de-8bb3-c65aae7359f6.fast5             template        453     226     4       222     2.61    -2.967  0.903   0.000   1.423
37      0ce4f4e3-19aa-4aed-a8bf-86f3ec729fce    /data/fast5_data/mm_WT/single_fast5/all_fast5/0ce4f4e3-19aa-4aed-a8bf-86f3ec729fce.fast5             template        1929    951     20      957     11.67   4.012   0.955   0.000   1.266
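
Both files carry a `read_index` column, so one quick consistency check is whether every `read_index` seen in the eventalign file also has a row in the summary file. The following is a minimal sketch using only the standard library, assuming both files are tab-separated with the headers shown above. The function names are illustrative, and this is not how xpore itself joins the files.

```python
import csv


def column_values(path, column, limit=None):
    """Collect the distinct values of one column from a tab-separated file."""
    values = set()
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for i, row in enumerate(reader):
            if limit is not None and i >= limit:
                break
            values.add(row[column])
    return values


def missing_read_indices(eventalign_path, summary_path, limit=None):
    """read_index values present in eventalign but absent from the summary."""
    in_events = column_values(eventalign_path, "read_index", limit)
    in_summary = column_values(summary_path, "read_index")
    return in_events - in_summary
```

The optional `limit` caps how many eventalign rows are scanned, which keeps a first check fast on large files. An empty result set means every scanned event row has a matching summary row.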

Then, when I ran the xpore dataprep command, I got the following outputs:

(/home/rlwang/m6a) rlwang@dell-tower-server:~/nanopore_methods/xpore/wt$ ls
eventalign.hdf5  eventalign.log  xpore_dataprep.log
(/home/rlwang/m6a) rlwang@dell-tower-server:~/nanopore_methods/xpore/wt$ du -sh *
4.0K    eventalign.hdf5
0       eventalign.log
36K     xpore_dataprep.log
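
A 4.0K `eventalign.hdf5` next to a 0-byte `eventalign.log` suggests that little or no data was written before the run stopped. The same check can be scripted. This is a small sketch, and the 4096-byte threshold is an arbitrary assumption for illustration, not an xpore rule.

```python
import os


def suspicious_outputs(out_dir, min_bytes=4096):
    """Map each regular file in out_dir whose size is <= min_bytes to its size."""
    flagged = {}
    for name in sorted(os.listdir(out_dir)):
        path = os.path.join(out_dir, name)
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size <= min_bytes:
                flagged[name] = size
    return flagged
```

Note that `du` reports disk usage in blocks while `os.path.getsize` reports the byte length, so the two numbers can differ for small files.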

The end of xpore_dataprep.log looks like this:

(base) rlwang@dell-tower-server:~/nanopore_methods/xpore/wt$ ls
eventalign.hdf5  eventalign.log  xpore_dataprep.log
(base) rlwang@dell-tower-server:~/nanopore_methods/xpore/wt$ tail -f xpore_dataprep.log 
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "/home/rlwang/m6a/lib/python3.6/site-packages/pandas/core/indexing.py", line 1099, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/home/rlwang/m6a/lib/python3.6/site-packages/pandas/core/indexing.py", line 1037, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
  File "/home/rlwang/m6a/lib/python3.6/site-packages/pandas/core/indexing.py", line 1240, in _get_listlike_indexer
    indexer, keyarr = ax._convert_listlike_indexer(key)
  File "/home/rlwang/m6a/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 2400, in _convert_listlike_indexer
    raise KeyError(f"{keyarr[mask]} not in index")
KeyError: "['08be73d2-2dcc-4c45-b572-8ce3b807c2a1'] not in index"

However, this read ID is present in the wt_summary.txt file:

(base) rlwang@dell-tower-server:/data/fast5_data/mm_WT$ grep '08be73d2-2dcc-4c45-b572-8ce3b807c2a1' wt_summary.txt
6	08be73d2-2dcc-4c45-b572-8ce3b807c2a1	/data/fast5_data/mm_WT/single_fast5/all_fast5/08be73d2-2dcc-4c45-b572-8ce3b807c2a1.fast5		template	2467	1262	25	1179	15.35	1.159	0.939	0.000	1.289
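
The `grep` shows the read name under `read_index` 6 in the summary. A natural follow-up is to check whether the eventalign file contains any events for that `read_index`. Here is a sketch, assuming both files are tab-separated with the headers shown earlier in this comment; `read_index_for` and `count_events` are hypothetical helpers.

```python
import csv


def read_index_for(summary_path, read_name):
    """Return the read_index recorded for read_name, or None if absent."""
    with open(summary_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if row["read_name"] == read_name:
                return row["read_index"]
    return None


def count_events(eventalign_path, read_index):
    """Count eventalign rows whose read_index column equals read_index."""
    with open(eventalign_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return sum(1 for row in reader if row["read_index"] == read_index)
```

If `count_events` returns 0 for a read that the summary lists, the two files may come from different runs, or the eventalign output may be incomplete. That is a hypothesis to test, not a confirmed cause of the KeyError.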

@yuukiiwa
Collaborator

yuukiiwa commented Jan 3, 2025

Hi @Seongmin-Jang-1165,

Sorry for the delayed reply! Just came back from vacation.

Can you update xpore, please? xpore-dataprep is deprecated.

Also, you will need to indicate the RNA004 kmer model when you get to the xpore diffmod step:
https://github.com/GoekeLab/xpore/blob/RNA004_kmer_model/xpore/diffmod/RNA004_5mer_model.txt

Thanks!

Best wishes,
Yuk Kei
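
Before and after updating, it can help to confirm which xpore installation is actually being picked up. The following small sketch uses only the standard library. It assumes Python 3.8 or later for `importlib.metadata` (the traceback earlier in this thread comes from a Python 3.6 environment, where that module is unavailable), and the function name `describe_install` is illustrative.

```python
import shutil
from importlib import metadata


def describe_install(package="xpore", commands=("xpore", "xpore-dataprep")):
    """Report the installed package version and where each command resolves."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # package is not installed in this environment
    return {
        "version": version,
        "commands": {cmd: shutil.which(cmd) for cmd in commands},
    }
```

A command that maps to `None` is not on the current `PATH`; a command that resolves to a different environment than expected points to a mixed installation.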

@Seongmin-Jang-1165
Author

@yuukiiwa Sorry for the late reply.

I've attached the information that you requested.

Also, can you tell me how to indicate the RNA004 model when I run xpore diffmod? Is there specific code for this?

Xpore_code.txt
