Incorrect Evaluation Script (of LaSOT) #14

Open
yangchris11 opened this issue Oct 24, 2024 · 2 comments

@yangchris11

Just want to confirm my understanding is right.

In the evaluation of Elysium's SOT performance in otb.py:

Elysium/eval/otb.py

Lines 172 to 198 in 5e6d14e

for seq_id, item in enumerate(value):
    w, h = item["image_size"]
    scale_tenosr = torch.tensor([w, h, w, h]) / 100
    pred_bb = torch.tensor(parse_box_from_raw_text(item["predict"])) * scale_tenosr
    anno_bb = torch.tensor(parse_box_from_raw_text(item["gt"])) * scale_tenosr
    # pred_bb = torch.tensor(parse_box_from_raw_text(item["predict"]))
    # anno_bb = torch.tensor(parse_box_from_raw_text(item["gt"]))
    if len(pred_bb) < 1:
        continue
    if len(pred_bb[0]) < 4:
        continue
    err_overlap, err_center, err_center_normalized, valid_frame = calc_seq_err_robust(
        tlbr_to_tlwh(pred_bb), tlbr_to_tlwh(anno_bb), "ours", target_visible=None)
    print(err_overlap, err_center, err_center_normalized, valid_frame)
    avg_overlap_all[seq_id, trk_id] = err_overlap[valid_frame].mean()
    if exclude_invalid_frames:
        seq_length = valid_frame.long().sum()
    else:
        seq_length = anno_bb.shape[0]
    if seq_length <= 0:
        raise Exception('Seq length zero')
    ave_success_rate_plot_overlap[seq_id, trk_id, :] = (err_overlap.view(-1, 1) > threshold_set_overlap.view(1, -1)).sum(0).float() / seq_length
    ave_success_rate_plot_center[seq_id, trk_id, :] = (err_center.view(-1, 1) <= threshold_set_center.view(1, -1)).sum(0).float() / seq_length
    ave_success_rate_plot_center_norm[seq_id, trk_id, :] = (err_center_normalized.view(-1, 1) <= threshold_set_center_norm.view(1, -1)).sum(0).float() / seq_length
    valid_sequence.append(seq_id)

the final metrics are averaged across a total of 98036 sequences (8 frames each). The result I got matches (and is higher than) the paper's reported numbers.

auc:  tensor([57.9670])
prec_score:  tensor([62.4371])
norm_prec_score:  tensor([53.4776])
(attached screenshot: Screenshot 2024-10-24 at 15 25 06)

To me, the SUC and Precision obtained by evaluating LaSOT this way may be heavily influenced by the varying number of frames across sequences in the LaSOT test set (longer sequences dominate such an evaluation).

Will this be a fair comparison against other VOT trackers, since they evaluate on the entire sequence (and then average across the 280 sequences)?
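
To illustrate the concern with made-up numbers (this is not Elysium output, just a sketch of how per-clip averaging weights long sequences more heavily than the standard per-sequence protocol):

import torch

# Hypothetical per-frame IoUs: a short sequence tracked well and a long
# sequence tracked poorly. Numbers are invented for illustration only.
short_seq_iou = torch.full((8,), 0.9)     # 8 frames   -> 1 clip
long_seq_iou = torch.full((800,), 0.3)    # 800 frames -> 100 clips

# Per-sequence averaging (standard LaSOT protocol): each sequence counts once.
seq_level = torch.stack([short_seq_iou.mean(), long_seq_iou.mean()]).mean()
print(seq_level)   # 0.60

# Per-clip averaging over 8-frame clips: the long sequence contributes
# 100 clips against 1, so it dominates the final number.
clips = torch.cat([short_seq_iou.view(1, 8), long_seq_iou.view(100, 8)])
clip_level = clips.mean(dim=1).mean()
print(clip_level)  # ~0.31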

@Hon-Wong
Owner

Hon-Wong commented Oct 25, 2024

Very good question!

Sorry that we did not give enough instructions. I have just uploaded the script for merging clips and updated the README with better instructions.

Please merge the short clips back into one video per sequence before evaluating. Also, make sure to remove any overlapping frames (the first frame of each clip except the first clip), so that Elysium is evaluated consistently with other VOT trackers. To do this, just run eval/merge_result.py following the instructions. You can also find the merged result here.
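
For readers following along, a rough sketch of what the merging step has to do (a simplified illustration under assumed field names like "video", "clip_id", "predict", and "gt"; the actual logic lives in eval/merge_result.py):

from collections import defaultdict

def merge_clips(clip_results):
    # Group the per-clip results by their source video.
    per_seq = defaultdict(list)
    for item in clip_results:
        per_seq[item["video"]].append(item)

    merged = {}
    for video, clips in per_seq.items():
        clips.sort(key=lambda c: c["clip_id"])
        pred, gt = [], []
        for i, clip in enumerate(clips):
            # Drop the overlapping first frame of every clip except the first one.
            start = 0 if i == 0 else 1
            pred.extend(clip["predict"][start:])
            gt.extend(clip["gt"][start:])
        merged[video] = {"predict": pred, "gt": gt, "image_size": clips[0]["image_size"]}
    return merged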

You are expected to get something like:

auc:  tensor([58.7632])
prec_score:  tensor([64.0076])
norm_prec_score:  tensor([54.4493])

If you're still having trouble reproducing the same result, feel free to ask me for more specific instructions so that I can update the guidance in the README.

@yangchris11
Author

yangchris11 commented Oct 26, 2024

Thank you for the merge file.

However, I would like to raise another issue I found in otb.py. I may be missing something, so please correct me if I am wrong.

In

Elysium/eval/otb.py

Lines 183 to 184 in d591904

err_overlap, err_center, err_center_normalized, valid_frame = calc_seq_err_robust(
    tlbr_to_tlwh(pred_bb), tlbr_to_tlwh(anno_bb), "ours", target_visible=None)

the tlbr_to_tlwh function is

Elysium/eval/otb.py

Lines 131 to 138 in d591904

def tlbr_to_tlwh(tlbr):
    # compute the width and height of the bounding box
    width = tlbr[:, 2] - tlbr[:, 0]
    height = tlbr[:, 3] - tlbr[:, 1]
    # convert to TLWH representation
    tlwh = torch.stack([tlbr[:, 0], tlbr[:, 1], width, height], dim=1).clamp(1, 100)
    return tlwh

However, I think the original tlbr boxes have already gone through a scaling step in the earlier lines

Elysium/eval/otb.py

Lines 173 to 176 in d591904

w, h = item["image_size"]
scale_tenosr = torch.tensor([w, h, w, h]) / 100
pred_bb = torch.tensor(parse_box_from_raw_text(item["predict"])) * scale_tenosr
anno_bb = torch.tensor(parse_box_from_raw_text(item["gt"])) * scale_tenosr

which makes the .clamp(1, 100) look wrong: the boxes are now in pixel coordinates, so they should not be clamped to the range (1, 100).
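
A concrete example of the distortion, using a made-up box in a 1280x720 frame:

import torch

w, h = 1280, 720
scale = torch.tensor([w, h, w, h]) / 100

# A box predicted in the 0-100 coordinate space, scaled to pixels as in the loop above.
tlbr = torch.tensor([[20.0, 20.0, 70.0, 80.0]]) * scale   # -> [[256., 144., 896., 576.]]

width = tlbr[:, 2] - tlbr[:, 0]
height = tlbr[:, 3] - tlbr[:, 1]

# With the clamp, every coordinate saturates at 100 pixels:
tlwh_clamped = torch.stack([tlbr[:, 0], tlbr[:, 1], width, height], dim=1).clamp(1, 100)
print(tlwh_clamped)   # [[100., 100., 100., 100.]]

# Without the clamp, the box keeps its actual pixel geometry:
tlwh = torch.stack([tlbr[:, 0], tlbr[:, 1], width, height], dim=1)
print(tlwh)           # [[256., 144., 640., 432.]]

Since both pred_bb and anno_bb pass through the same clamp, large boxes from both collapse onto the same saturated values, which would inflate overlap and precision and might explain why the scores drop once the clamp is removed.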

If I remove the .clamp(1, 100) clipping, I get

auc:  tensor([30.8999])
prec_score:  tensor([23.0065])
norm_prec_score:  tensor([27.7496])

from running otb.py on the merged JSON.

@yangchris11 yangchris11 reopened this Oct 26, 2024
@yangchris11 yangchris11 changed the title Evaluation of LaSOT Incorrect Evaluation Script (of LaSOT) Feb 6, 2025