Incorrect Evaluation Script (of LaSOT) #14

yangchris11 · 2024-10-24T22:41:57Z

I just want to confirm that my understanding is correct.

In the SOT evaluation of Elysium in otb.py (lines 172 to 198 in 5e6d14e):
```python
for seq_id, item in enumerate(value):
    w, h = item["image_size"]
    scale_tenosr = torch.tensor([w, h, w, h]) / 100
    pred_bb = torch.tensor(parse_box_from_raw_text(item["predict"])) * scale_tenosr
    anno_bb = torch.tensor(parse_box_from_raw_text(item["gt"])) * scale_tenosr
    # pred_bb = torch.tensor(parse_box_from_raw_text(item["predict"]))
    # anno_bb = torch.tensor(parse_box_from_raw_text(item["gt"]))
    if len(pred_bb) < 1:
        continue
    if len(pred_bb[0]) < 4:
        continue
    err_overlap, err_center, err_center_normalized, valid_frame = calc_seq_err_robust(
        tlbr_to_tlwh(pred_bb), tlbr_to_tlwh(anno_bb), "ours", target_visible=None)
    print(err_overlap, err_center, err_center_normalized, valid_frame)
    avg_overlap_all[seq_id, trk_id] = err_overlap[valid_frame].mean()
    if exclude_invalid_frames:
        seq_length = valid_frame.long().sum()
    else:
        seq_length = anno_bb.shape[0]
    if seq_length <= 0:
        raise Exception('Seq length zero')
    ave_success_rate_plot_overlap[seq_id, trk_id, :] = (err_overlap.view(-1, 1) > threshold_set_overlap.view(1, -1)).sum(0).float() / seq_length
    ave_success_rate_plot_center[seq_id, trk_id, :] = (err_center.view(-1, 1) <= threshold_set_center.view(1, -1)).sum(0).float() / seq_length
    ave_success_rate_plot_center_norm[seq_id, trk_id, :] = (err_center_normalized.view(-1, 1) <= threshold_set_center_norm.view(1, -1)).sum(0).float() / seq_length
    valid_sequence.append(seq_id)
```
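To make the last three assignments concrete: each success-plot row is the fraction of frames whose overlap exceeds a threshold (or whose center error stays within it). The plain-Python sketch below recomputes AUC and precision for one toy sequence. The threshold grids (overlap 0 to 1 in steps of 0.05, center error 0 to 50 pixels, precision read at 20 pixels) are assumptions taken from the usual pytracking-style OTB/LaSOT protocol, since the excerpt does not define them, and the per-frame values are made up.

```python
# Plain-Python sketch of success/precision curves for ONE sequence.
# Threshold grids are assumptions (pytracking-style defaults), not taken from otb.py.

def success_curve(overlaps, thresholds):
    """Fraction of frames whose IoU is strictly greater than each threshold."""
    n = len(overlaps)
    return [sum(o > t for o in overlaps) / n for t in thresholds]


def precision_curve(center_errors, thresholds):
    """Fraction of frames whose center error (pixels) is <= each threshold."""
    n = len(center_errors)
    return [sum(e <= t for e in center_errors) / n for t in thresholds]


overlap_thresholds = [i * 0.05 for i in range(21)]  # 0.00, 0.05, ..., 1.00 (assumed)
center_thresholds = list(range(51))                 # 0, 1, ..., 50 pixels (assumed)

# Made-up per-frame results for a 4-frame sequence.
overlaps = [0.92, 0.68, 0.43, 0.0]
center_errors = [3.0, 10.0, 15.0, 80.0]

succ = success_curve(overlaps, overlap_thresholds)
auc = 100 * sum(succ) / len(succ)                                   # mean of the success curve, in %
prec = 100 * precision_curve(center_errors, center_thresholds)[20]  # precision at 20 px, in %

print(f"auc={auc:.2f} prec={prec:.2f}")  # auc=50.00 prec=75.00
```

Because the curve is normalized by `seq_length`, every evaluated unit counts equally in the final mean, whatever its length.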

The final metrics are averaged across a total of 98036 sequences (8 frames each). The result I got matches (and is even higher than) the number reported in the paper:

auc:  tensor([57.9670])
prec_score:  tensor([62.4371])
norm_prec_score:  tensor([53.4776])

It seems to me that SUC and Precision on LaSOT, when evaluated this way, may be heavily influenced by the varying number of frames per sequence in the LaSOT test set: a longer sequence yields more 8-frame clips and therefore dominates such an evaluation.
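A toy calculation illustrates the weighting concern. The sketch assumes, purely for illustration, two hypothetical sequences of 2000 and 400 frames cut into non-overlapping 8-frame clips, with every clip scoring exactly like its parent sequence (the one-frame overlap between consecutive clips is ignored).

```python
# Toy comparison of clip-level vs. sequence-level averaging.
# All numbers are hypothetical; each clip is assumed to score exactly like its sequence.

CLIP_LEN = 8  # frames per clip (overlap between clips ignored for simplicity)

# name -> (number of frames, sequence-level AUC in percent)
sequences = {
    "long_seq": (2000, 80.0),
    "short_seq": (400, 20.0),
}


def clip_weighted_mean(seqs, clip_len):
    """Average over clips: each sequence is weighted by how many clips it yields."""
    total_clips = 0
    weighted_sum = 0.0
    for n_frames, score in seqs.values():
        n_clips = n_frames // clip_len
        total_clips += n_clips
        weighted_sum += n_clips * score
    return weighted_sum / total_clips


def sequence_mean(seqs):
    """Standard protocol: one score per sequence, then a plain average."""
    return sum(score for _, score in seqs.values()) / len(seqs)


print(clip_weighted_mean(sequences, CLIP_LEN))  # 70.0
print(sequence_mean(sequences))                 # 50.0
```

The long sequence supplies 250 of the 300 clips, so it pulls the clip-level average toward its own score.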

Would this be a fair comparison against other VOT trackers, which evaluate on each entire sequence and then average across the 280 test sequences?


Hon-Wong · 2024-10-25T07:16:31Z

Very good question!

Sorry that we did not provide enough instructions. I have just uploaded the script for merging clips and updated the README with clearer instructions.

Please merge the short clips back into one video per sequence before evaluating. Also, make sure to drop the overlapping frames (the first frame of every clip except the first clip), so that Elysium is evaluated in the same way as other VOT trackers. To do this, just run eval/merge_result.py following the instructions. You can also find the merged result here.
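The merging rule described above amounts to concatenating the clips while skipping the first frame of every clip after the first. A minimal sketch of that rule follows. It is not the actual eval/merge_result.py; it assumes each clip is simply a list of per-frame boxes (frame indices stand in for boxes here) and that consecutive clips share exactly one frame.

```python
def merge_clips(clips):
    """Concatenate per-clip predictions into one sequence.

    The first frame of every clip after the first duplicates the last frame
    of the previous clip, so it is dropped.
    """
    merged = []
    for i, clip in enumerate(clips):
        merged.extend(clip if i == 0 else clip[1:])
    return merged


# Three hypothetical 8-frame clips that share one frame at each boundary.
clips = [list(range(0, 8)), list(range(7, 15)), list(range(14, 22))]
merged = merge_clips(clips)
print(len(merged))                # 22
print(merged == list(range(22)))  # True
```

With N clips of 8 frames, the merged sequence has 8 + 7(N - 1) frames, and each frame is scored exactly once.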

You are expected to get something like:

auc:  tensor([58.7632])
prec_score:  tensor([64.0076])
norm_prec_score:  tensor([54.4493])

If you still have trouble reproducing the result, feel free to ask me for more specific instructions so that I can improve the guidance in the README.

yangchris11 · 2024-10-26T06:55:00Z

Thank you for the merge file.

However, I would like to raise another issue I found in otb.py. I may be missing something, so please correct me if I am wrong.

In Elysium/eval/otb.py, lines 183 to 184 in d591904:

```python
err_overlap, err_center, err_center_normalized, valid_frame = calc_seq_err_robust(
    tlbr_to_tlwh(pred_bb), tlbr_to_tlwh(anno_bb), "ours", target_visible=None)
```

The tlbr_to_tlwh function is defined as follows (Elysium/eval/otb.py, lines 131 to 138 in d591904):

```python
def tlbr_to_tlwh(tlbr):
    # Compute the width and height of the bounding box
    width = tlbr[:, 2] - tlbr[:, 0]
    height = tlbr[:, 3] - tlbr[:, 1]
    # Convert to the TLWH representation
    tlwh = torch.stack([tlbr[:, 0], tlbr[:, 1], width, height], dim=1).clamp(1, 100)
    return tlwh
```

However, I think the input tlbr boxes have already been scaled to pixel coordinates a few lines earlier (Elysium/eval/otb.py, lines 173 to 176 in d591904):

```python
w, h = item["image_size"]
scale_tenosr = torch.tensor([w, h, w, h]) / 100
pred_bb = torch.tensor(parse_box_from_raw_text(item["predict"])) * scale_tenosr
anno_bb = torch.tensor(parse_box_from_raw_text(item["gt"])) * scale_tenosr
```

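This makes the .clamp(1, 100) questionable: after scaling, the boxes are in pixel coordinates rather than on the 0 to 100 scale, so clamping them to (1, 100) truncates every coordinate, width, and height larger than 100 pixels. Because the same clamp is applied to both the prediction and the ground truth, large or distant boxes collapse toward the same values, which can inflate the overlap.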
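The effect of the clamp can be checked on a toy pair of boxes. The sketch below is a plain-Python stand-in for the quoted code (no torch): it scales hypothetical 0 to 100 boxes to a hypothetical 1000x500 frame the same way lines 173 to 176 do, converts them to TLWH with and without a `.clamp(1, 100)`-style clip, and computes a simple IoU. The helper names are illustrative, not functions from otb.py, and the IoU is a plain reimplementation rather than calc_seq_err_robust.

```python
# Plain-Python stand-in for the scaling + tlbr_to_tlwh steps (torch not required).
# Box values and frame size are hypothetical.

def scale_box(box_norm, w, h):
    """Map an (x1, y1, x2, y2) box from the 0-100 scale to pixels."""
    scale = (w / 100, h / 100, w / 100, h / 100)
    return tuple(v * s for v, s in zip(box_norm, scale))


def tlbr_to_tlwh(box, clamp=None):
    """Convert (x1, y1, x2, y2) to (x, y, w, h); optionally clip every field to [lo, hi]."""
    x1, y1, x2, y2 = box
    tlwh = [x1, y1, x2 - x1, y2 - y1]
    if clamp is not None:
        lo, hi = clamp
        tlwh = [min(max(v, lo), hi) for v in tlwh]
    return tlwh


def iou_tlwh(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


W, H = 1000, 500
pred = scale_box((20, 30, 50, 90), W, H)  # -> (200, 150, 500, 450) px
gt = scale_box((70, 60, 100, 100), W, H)  # -> (700, 300, 1000, 500) px

print(iou_tlwh(tlbr_to_tlwh(pred), tlbr_to_tlwh(gt)))                      # 0.0
print(iou_tlwh(tlbr_to_tlwh(pred, (1, 100)), tlbr_to_tlwh(gt, (1, 100))))  # 1.0
```

Two boxes that do not overlap at all in pixel space become identical after the clamp, which is consistent with the large drop in scores once the clamp is removed.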

If I remove the .clamp(1, 100) clipping and run otb.py on the merged JSON, I get:

auc:  tensor([30.8999])
prec_score:  tensor([23.0065])
norm_prec_score:  tensor([27.7496])

yangchris11 closed this as completed Oct 26, 2024

yangchris11 reopened this Oct 26, 2024

yangchris11 changed the title from "Evaluation of LaSOT" to "Incorrect Evaluation Script (of LaSOT)" on Feb 6, 2025
