Problem when using infer_rec #71

Open
super-tian opened this issue Dec 27, 2024 · 4 comments

@super-tian

A problem occurs when running inference on a single image with infer_rec + svtrv2:

Inference code:

from tools.infer_rec import OpenRecognizer
import yaml
from PIL import Image

config = yaml.load(open('/home/server/python/other_work/OCR/OpenOCR/models/svtrv2_smtr_gtc_ch/config.yml', 'r', encoding='utf-8'), Loader=yaml.FullLoader)
ocr = OpenRecognizer(config=config)
image = Image.open("/home/server/python/other_work/OCR/OpenOCR/test.jpg")
ocr(img_numpy=image)

[screenshot not preserved]

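My guess is that both CTC and GTC decoding are being used without any fusion or weighting, so a single image produces two results. The final part of the output also looks wrong:

[two screenshots not preserved]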

@Topdu
Owner

Topdu commented Dec 27, 2024

The following changes to the configuration file are required for svtrv2 to perform inference:

...
  Decoder:
    name: GTCDecoder
    infer_gtc: False # True to False
    detach: False
    gtc_decoder:
...

Loss:
  name: CTCLoss # GTCLoss to CTCLoss
  # ctc_weight: 0.1 # delete
  # gtc_loss: # delete
  #   name: SMTRLoss # delete

PostProcess:
  name: CTCLabelDecode # GTCLabelDecode to CTCLabelDecode
  # gtc_label_decode: # delete
    # name: SMTRLabelDecode # delete
    # next_mode: *next # delete
  character_dict_path: *character_dict_path
  use_space_char: *use_space_char

Metric:
  name: RecMetric # RecGTCMetric to RecMetric
  main_indicator: acc
  # is_filter: True

...

Eval:
  dataset:
    name: RatioDataSetTVResize
    ds_width: True
    padding: False
    data_dir_list: ['../benchmark_bctr/benchmark_bctr_test/scene_test']
    transforms:
      - DecodeImagePIL: # load image
          img_mode: RGB
      - CTCLabelEncode: # GTCLabelEncode to CTCLabelEncode
          # gtc_label_encode: # delete
            # name: ARLabelEncode # delete
          character_dict_path: *character_dict_path
          use_space_char: *use_space_char
          max_text_length: *max_text_length
      - KeepKeys:
          # keep_keys: ['image', 'label', 'length', 'ctc_label', 'ctc_length'] to
          keep_keys: ['image', 'label', 'length']
  sampler:
    name: RatioSampler
    scales: [[128, 32]] # w, h
    # divided_factor: ensures the width and height are divisible by the downsampling factor
    first_bs: *bs
    fix_bs: false
    divided_factor: [4, 16] # w, h
    is_training: False
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: *bs
    max_ratio: *max_ratio
    num_workers: 4
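The same edits can also be applied in code to a config that has already been loaded with `yaml.load`, which avoids keeping a second hand-edited YAML file. The following is only a sketch under the assumption that the loaded config is a plain nested `dict` with the keys shown above; the helper name `to_ctc_inference_config` is hypothetical and is not part of OpenOCR.

```python
import copy


def to_ctc_inference_config(config):
    """Return a copy of a GTC-style config switched to CTC-only inference.

    Hypothetical helper: it mirrors the manual YAML edits listed above and
    assumes the config is a plain nested dict.
    """
    cfg = copy.deepcopy(config)

    # Decoder: keep GTCDecoder, but do not run the GTC branch at inference.
    cfg["Architecture"]["Decoder"]["infer_gtc"] = False

    # Loss: GTCLoss -> CTCLoss, dropping the GTC-only keys.
    cfg["Loss"]["name"] = "CTCLoss"
    cfg["Loss"].pop("ctc_weight", None)
    cfg["Loss"].pop("gtc_loss", None)

    # PostProcess: GTCLabelDecode -> CTCLabelDecode.
    cfg["PostProcess"]["name"] = "CTCLabelDecode"
    cfg["PostProcess"].pop("gtc_label_decode", None)

    # Metric: RecGTCMetric -> RecMetric.
    cfg["Metric"]["name"] = "RecMetric"

    # Eval transforms: each entry is a single-key dict such as
    # {"GTCLabelEncode": {...}}.
    transforms = []
    for op in cfg["Eval"]["dataset"]["transforms"]:
        name, params = next(iter(op.items()))
        if name == "GTCLabelEncode":
            params = {k: v for k, v in (params or {}).items()
                      if k != "gtc_label_encode"}
            op = {"CTCLabelEncode": params}
        elif name == "KeepKeys":
            op = {"KeepKeys": {"keep_keys": ["image", "label", "length"]}}
        transforms.append(op)
    cfg["Eval"]["dataset"]["transforms"] = transforms
    return cfg
```

With the script from the first comment, this would be called as `config = to_ctc_inference_config(config)` before constructing `OpenRecognizer(config=config)`. The original dict is left unchanged, so the same file can still be used for training.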

@super-tian
Author

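Thanks for the reply; the inference problem is solved. I have run into a new problem during training, detailed below:

System environment: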


Ubuntu 20.04.6 LTS
python 3.11.7
CUDA Device Count: 2
Device 0: Tesla V100-SXM2-32GB
Device 1: Tesla V100-SXM2-32GB
torch: 2.1.1
cudnn: 8700

Training command:
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 tools/train_rec.py --c runs/train0/svtrv2_smtr_gtc_rctc_ch.yml
Training config file: svtrv2_smtr_gtc_rctc_ch.txt
The pretrained model used is:
[screenshot not preserved]

The following exception occurred:

(base) server@server:~/python/other_work/OCR/OpenOCR$ CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 tools/train_rec.py --c runs/train0/config.yml
/home/server/anaconda3/lib/python3.11/site-packages/torch/distributed/launch.py:181: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects `--local-rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
[2024/12/31 16:45:50] openrec INFO: ----------- Config -----------
[2024/12/31 16:45:50] openrec INFO: Architecture : 
[2024/12/31 16:45:50] openrec INFO:     Decoder : 
[2024/12/31 16:45:50] openrec INFO:         ctc_decoder : 
[2024/12/31 16:45:50] openrec INFO:             name : RCTCDecoder
[2024/12/31 16:45:50] openrec INFO:             only_attn : False
[2024/12/31 16:45:50] openrec INFO:         detach : False
[2024/12/31 16:45:50] openrec INFO:         gtc_decoder : 
[2024/12/31 16:45:50] openrec INFO:             ds : True
[2024/12/31 16:45:50] openrec INFO:             max_len : 25
[2024/12/31 16:45:50] openrec INFO:             name : SMTRDecoder
[2024/12/31 16:45:50] openrec INFO:             next_mode : True
[2024/12/31 16:45:50] openrec INFO:             num_layer : 1
[2024/12/31 16:45:50] openrec INFO:             sub_str_len : 5
[2024/12/31 16:45:50] openrec INFO:         infer_gtc : True
[2024/12/31 16:45:50] openrec INFO:         name : GTCDecoder
[2024/12/31 16:45:50] openrec INFO:     Encoder : 
[2024/12/31 16:45:50] openrec INFO:         depths : [6, 6, 6]
[2024/12/31 16:45:50] openrec INFO:         dims : [128, 256, 384]
[2024/12/31 16:45:50] openrec INFO:         feat2d : True
[2024/12/31 16:45:50] openrec INFO:         last_stage : False
[2024/12/31 16:45:50] openrec INFO:         local_k : [[5, 5], [5, 5], [-1, -1]]
[2024/12/31 16:45:50] openrec INFO:         mixer : [['Conv', 'Conv', 'Conv', 'Conv', 'Conv', 'Conv'], ['Conv', 'Conv', 'FGlobal', 'Global', 'Global', 'Global'], ['Global', 'Global', 'Global', 'Global', 'Global', 'Global']]
[2024/12/31 16:45:50] openrec INFO:         name : SVTRv2LNConvTwo33
[2024/12/31 16:45:50] openrec INFO:         num_convs : [[2, 2, 2, 2, 2, 2], [2, 2, 2, 2, 2, 2], [3, 3, 3, 3, 3, 3]]
[2024/12/31 16:45:50] openrec INFO:         num_heads : [4, 8, 12]
[2024/12/31 16:45:50] openrec INFO:         out_channels : 256
[2024/12/31 16:45:50] openrec INFO:         sub_k : [[1, 1], [2, 1], [-1, -1]]
[2024/12/31 16:45:50] openrec INFO:         use_pos_embed : False
[2024/12/31 16:45:50] openrec INFO:     Transform : None
[2024/12/31 16:45:50] openrec INFO:     algorithm : BGPD
[2024/12/31 16:45:50] openrec INFO:     in_channels : 3
[2024/12/31 16:45:50] openrec INFO:     model_type : rec
[2024/12/31 16:45:50] openrec INFO: Eval : 
[2024/12/31 16:45:50] openrec INFO:     dataset : 
[2024/12/31 16:45:50] openrec INFO:         data_dir_list : ['datas/handwriting_ic13_test_images', 'datas/HW_Chinese_test', 'datas/HWDB2.0Test_label', 'datas/HWDB2.1Test_label']
[2024/12/31 16:45:50] openrec INFO:         ds_width : True
[2024/12/31 16:45:50] openrec INFO:         name : RatioDataSetTVResize
[2024/12/31 16:45:50] openrec INFO:         padding : False
[2024/12/31 16:45:50] openrec INFO:         transforms : 
[2024/12/31 16:45:50] openrec INFO:             DecodeImagePIL : 
[2024/12/31 16:45:50] openrec INFO:                 img_mode : RGB
[2024/12/31 16:45:50] openrec INFO:             GTCLabelEncode : 
[2024/12/31 16:45:50] openrec INFO:                 character_dict_path : ./tools/utils/ppocr_keys_v1.txt
[2024/12/31 16:45:50] openrec INFO:                 gtc_label_encode : 
[2024/12/31 16:45:50] openrec INFO:                     name : ARLabelEncode
[2024/12/31 16:45:50] openrec INFO:                 max_text_length : 25
[2024/12/31 16:45:50] openrec INFO:                 use_space_char : False
[2024/12/31 16:45:50] openrec INFO:             KeepKeys : 
[2024/12/31 16:45:50] openrec INFO:                 keep_keys : ['image', 'label', 'length', 'ctc_label', 'ctc_length']
[2024/12/31 16:45:50] openrec INFO:     loader : 
[2024/12/31 16:45:50] openrec INFO:         batch_size_per_card : 256
[2024/12/31 16:45:50] openrec INFO:         drop_last : False
[2024/12/31 16:45:50] openrec INFO:         max_ratio : 8
[2024/12/31 16:45:50] openrec INFO:         num_workers : 4
[2024/12/31 16:45:50] openrec INFO:         shuffle : False
[2024/12/31 16:45:50] openrec INFO:     sampler : 
[2024/12/31 16:45:50] openrec INFO:         divided_factor : [4, 16]
[2024/12/31 16:45:50] openrec INFO:         first_bs : 256
[2024/12/31 16:45:50] openrec INFO:         fix_bs : False
[2024/12/31 16:45:50] openrec INFO:         is_training : False
[2024/12/31 16:45:50] openrec INFO:         name : RatioSampler
[2024/12/31 16:45:50] openrec INFO:         scales : [[128, 32]]
[2024/12/31 16:45:50] openrec INFO: Global : 
[2024/12/31 16:45:50] openrec INFO:     cal_metric_during_train : False
[2024/12/31 16:45:50] openrec INFO:     character_dict_path : ./tools/utils/ppocr_keys_v1.txt
[2024/12/31 16:45:50] openrec INFO:     checkpoints : None
[2024/12/31 16:45:50] openrec INFO:     device : gpu
[2024/12/31 16:45:50] openrec INFO:     distributed : True
[2024/12/31 16:45:50] openrec INFO:     epoch_num : 100
[2024/12/31 16:45:50] openrec INFO:     eval_batch_step : [0, 2000]
[2024/12/31 16:45:50] openrec INFO:     eval_epoch_step : [0, 1]
[2024/12/31 16:45:50] openrec INFO:     grad_clip_val : 20
[2024/12/31 16:45:50] openrec INFO:     infer_img : None
[2024/12/31 16:45:50] openrec INFO:     log_smooth_window : 20
[2024/12/31 16:45:50] openrec INFO:     max_text_length : 25
[2024/12/31 16:45:50] openrec INFO:     output_dir : ./train0/rec/ch/svtrv2_smtr_gtc_rctc_ch
[2024/12/31 16:45:50] openrec INFO:     pretrained_model : models/svtrv2_smtr_gtc_ch/best.pth
[2024/12/31 16:45:50] openrec INFO:     print_batch_step : 10
[2024/12/31 16:45:50] openrec INFO:     save_epoch_step : 1
[2024/12/31 16:45:50] openrec INFO:     save_res_path : ./output/rec/predicts_smtr.txt
[2024/12/31 16:45:50] openrec INFO:     use_amp : True
[2024/12/31 16:45:50] openrec INFO:     use_space_char : False
[2024/12/31 16:45:50] openrec INFO:     use_tensorboard : False
[2024/12/31 16:45:50] openrec INFO: LRScheduler : 
[2024/12/31 16:45:50] openrec INFO:     cycle_momentum : False
[2024/12/31 16:45:50] openrec INFO:     name : OneCycleLR
[2024/12/31 16:45:50] openrec INFO:     warmup_epoch : 5
[2024/12/31 16:45:50] openrec INFO: Loss : 
[2024/12/31 16:45:50] openrec INFO:     ctc_weight : 0.1
[2024/12/31 16:45:50] openrec INFO:     gtc_loss : 
[2024/12/31 16:45:50] openrec INFO:         name : SMTRLoss
[2024/12/31 16:45:50] openrec INFO:     name : GTCLoss
[2024/12/31 16:45:50] openrec INFO: Metric : 
[2024/12/31 16:45:50] openrec INFO:     main_indicator : acc
[2024/12/31 16:45:50] openrec INFO:     name : RecGTCMetric
[2024/12/31 16:45:50] openrec INFO: Optimizer : 
[2024/12/31 16:45:50] openrec INFO:     filter_bias_and_bn : True
[2024/12/31 16:45:50] openrec INFO:     lr : 0.00065
[2024/12/31 16:45:50] openrec INFO:     name : AdamW
[2024/12/31 16:45:50] openrec INFO:     weight_decay : 0.05
[2024/12/31 16:45:50] openrec INFO: PostProcess : 
[2024/12/31 16:45:50] openrec INFO:     character_dict_path : ./tools/utils/ppocr_keys_v1.txt
[2024/12/31 16:45:50] openrec INFO:     gtc_label_decode : 
[2024/12/31 16:45:50] openrec INFO:         name : SMTRLabelDecode
[2024/12/31 16:45:50] openrec INFO:         next_mode : True
[2024/12/31 16:45:50] openrec INFO:     name : GTCLabelDecode
[2024/12/31 16:45:50] openrec INFO:     use_space_char : False
[2024/12/31 16:45:50] openrec INFO: Train : 
[2024/12/31 16:45:50] openrec INFO:     dataset : 
[2024/12/31 16:45:50] openrec INFO:         data_dir_list : ['datas/12', 'datas/handwriting_hwdb_train_images', 'datas/HW_Chinese_train', 'datas/HWDB2.0Train_label', 'datas/HWDB2.1Train_label', 'datas/HWDB2.2Train_label', 'datas/local_lmdb_split_note', 'datas/tal_ocr_eng']
[2024/12/31 16:45:50] openrec INFO:         ds_width : True
[2024/12/31 16:45:50] openrec INFO:         name : RatioDataSetTVResize
[2024/12/31 16:45:50] openrec INFO:         padding : False
[2024/12/31 16:45:50] openrec INFO:         transforms : 
[2024/12/31 16:45:50] openrec INFO:             DecodeImagePIL : 
[2024/12/31 16:45:50] openrec INFO:                 img_mode : RGB
[2024/12/31 16:45:50] openrec INFO:             PARSeqAugPIL : None
[2024/12/31 16:45:50] openrec INFO:             GTCLabelEncode : 
[2024/12/31 16:45:50] openrec INFO:                 character_dict_path : ./tools/utils/ppocr_keys_v1.txt
[2024/12/31 16:45:50] openrec INFO:                 gtc_label_encode : 
[2024/12/31 16:45:50] openrec INFO:                     name : SMTRLabelEncode
[2024/12/31 16:45:50] openrec INFO:                     sub_str_len : 5
[2024/12/31 16:45:50] openrec INFO:                 max_text_length : 25
[2024/12/31 16:45:50] openrec INFO:                 use_space_char : False
[2024/12/31 16:45:50] openrec INFO:             KeepKeys : 
[2024/12/31 16:45:50] openrec INFO:                 keep_keys : ['image', 'label', 'label_subs', 'label_next', 'length_subs', 'label_subs_pre', 'label_next_pre', 'length_subs_pre', 'length', 'ctc_label', 'ctc_length']
[2024/12/31 16:45:50] openrec INFO:     loader : 
[2024/12/31 16:45:50] openrec INFO:         batch_size_per_card : 256
[2024/12/31 16:45:50] openrec INFO:         drop_last : True
[2024/12/31 16:45:50] openrec INFO:         max_ratio : 8
[2024/12/31 16:45:50] openrec INFO:         num_workers : 4
[2024/12/31 16:45:50] openrec INFO:         shuffle : True
[2024/12/31 16:45:50] openrec INFO:     sampler : 
[2024/12/31 16:45:50] openrec INFO:         divided_factor : [4, 16]
[2024/12/31 16:45:50] openrec INFO:         first_bs : 256
[2024/12/31 16:45:50] openrec INFO:         fix_bs : False
[2024/12/31 16:45:50] openrec INFO:         is_training : True
[2024/12/31 16:45:50] openrec INFO:         name : RatioSampler
[2024/12/31 16:45:50] openrec INFO:         scales : [[128, 32]]
[2024/12/31 16:45:50] openrec INFO: config : runs/train0/config.yml
[2024/12/31 16:45:50] openrec INFO: eval : True
[2024/12/31 16:45:50] openrec INFO: filename : config
[2024/12/31 16:45:50] openrec INFO: local_rank : 0
[2024/12/31 16:45:50] openrec INFO: ---------------------------------------------
[2024-12-31 16:45:52,585] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -7) local_rank: 0 (pid: 467397) of binary: /home/server/anaconda3/bin/python
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/server/anaconda3/lib/python3.11/site-packages/torch/distributed/launch.py", line 196, in <module>
    main()
  File "/home/server/anaconda3/lib/python3.11/site-packages/torch/distributed/launch.py", line 192, in main
    launch(args)
  File "/home/server/anaconda3/lib/python3.11/site-packages/torch/distributed/launch.py", line 177, in launch
    run(args)
  File "/home/server/anaconda3/lib/python3.11/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/home/server/anaconda3/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/server/anaconda3/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
======================================================
tools/train_rec.py FAILED
------------------------------------------------------
Failures:
[1]:
  time      : 2024-12-31_16:45:52
  host      : server
  rank      : 1 (local_rank: 1)
  exitcode  : -7 (pid: 467398)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 467398
------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-12-31_16:45:52
  host      : server
  rank      : 0 (local_rank: 0)
  exitcode  : -7 (pid: 467397)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 467397
======================================================
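For readers unfamiliar with the `exitcode: -7` lines in the report above: a negative exit code from a child process conventionally means the process was killed by the signal with that number. A minimal sketch of the mapping, using only the standard library (the function name `describe_exitcode` is made up for illustration):

```python
import signal


def describe_exitcode(exitcode):
    """Turn a subprocess-style exit code into a short description.

    Negative values follow the convention that -N means
    "terminated by signal N".
    """
    if exitcode < 0:
        try:
            name = signal.Signals(-exitcode).name
        except ValueError:
            # Signal number unknown on this platform.
            name = f"signal {-exitcode}"
        return f"killed by {name}"
    return f"exited with status {exitcode}"
```

Signal numbers are platform-specific. On the Linux host in this log, 7 is SIGBUS, which is consistent with the `Signal 7 (SIGBUS)` lines in the failure report. The mapping only names the signal; it does not say what triggered it.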

@Topdu
Owner

Topdu commented Jan 2, 2025

Are there any more error logs from running the training command? The information provided so far is not enough to determine the cause of the error.
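One possible way to get more information out of a crash like this is Python's standard `faulthandler` module, which installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL and dumps the Python traceback of every thread when one of them arrives. The sketch below assumes it is called near the top of the training entry point (here that would be `tools/train_rec.py`, which is an assumption about where to put it); the wrapper name `enable_crash_tracebacks` is hypothetical.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=sys.stderr):
    """Dump Python tracebacks to `stream` on fatal signals such as SIGBUS.

    `stream` must be a real file object with a file descriptor and must
    stay open for as long as the handler is enabled.
    """
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()
```

The same effect is available without code changes by setting the environment variable `PYTHONFAULTHANDLER=1` before the launch command. Either way, the dump shows only where the Python code was when the signal arrived, so it narrows down the failing call but may not explain the root cause.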

@super-tian
Author

> Are there any more error logs from running the training command? The information provided so far is not enough to determine the cause of the error.

Those are all of the logs from the run. I checked the PyTorch issue tracker and found that it may be a version issue. I solved the problem by switching PyTorch versions; I am now on torch 2.0.1.
