Incorrect Evaluation Script (of LaSOT) #14
Very good question! Sorry that we did not give enough instructions. I just uploaded the script for merging clips and updated the README with better instructions. Please merge the short clips into one video before checking the results. Also, make sure to remove the overlapping frames (the first frame of each clip except the first clip), so that Elysium is evaluated in a way consistent with other VOT trackers. To do this, run eval/merge_result.py following the instructions. You can also find the merged result here. You are expected to get something like:
If you're still having trouble reproducing the same result, feel free to ask me for more specific instructions, so that I can update the guidance in the README.
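For readers following along, the merging step described above can be sketched roughly as follows. This is not the actual eval/merge_result.py; the function name `merge_clips` and the box format (lists of four numbers) are assumptions made only for illustration. The one rule taken from the comment is that the first frame of every clip except the first is dropped.

```python
from typing import List, Sequence

Box = Sequence[float]  # hypothetical box format, e.g. [x1, y1, x2, y2]


def merge_clips(clips: List[List[Box]]) -> List[Box]:
    """Concatenate per-clip predictions into one per-video list.

    The first frame of every clip except the first is treated as an
    overlapping frame and dropped, as described in the comment above.
    """
    merged: List[Box] = []
    for index, clip in enumerate(clips):
        # Keep the first clip whole; skip frame 0 of every later clip.
        merged.extend(clip if index == 0 else clip[1:])
    return merged


# Two illustrative 3-frame clips whose boundary frame overlaps.
clips = [
    [[0, 0, 10, 10], [1, 1, 11, 11], [2, 2, 12, 12]],
    [[2, 2, 12, 12], [3, 3, 13, 13], [4, 4, 14, 14]],
]
merged = merge_clips(clips)
```

With these inputs the merged video has five frames instead of six, so each frame is scored once.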
Thank you for the merge file. However, I would like to raise another issue I found in the evaluation script, at lines 183 to 184 in d591904.
The tlbr_to_tlwh function is at lines 131 to 138 in d591904.
However, I think the original tlbr had already gone through a scaling process in the earlier lines (lines 173 to 176 in d591904),
which makes the .clamp(1, 100) questionable: the box should not be clamped to (1, 100), since it is now in pixel representation.
If I remove the
from running otb.py with the merged JSON.
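To illustrate the concern, here is a minimal sketch of what happens if a box is clamped to (1, 100) after it has already been scaled to pixels. It is not the code from otb.py: the 0 to 100 normalized grid, the helper names `scale_to_pixels` and `clamp`, and the 1280x720 frame size are all assumptions for illustration, and this `tlbr_to_tlwh` is a generic conversion rather than the repository's version.

```python
def tlbr_to_tlwh(box):
    """Convert [x1, y1, x2, y2] to [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def clamp(values, low, high):
    """Clamp every coordinate into [low, high]."""
    return [min(max(v, low), high) for v in values]


def scale_to_pixels(box, width, height, grid=100):
    """Map a box from an assumed 0..grid normalized range to pixels."""
    x1, y1, x2, y2 = box
    return [x1 * width / grid, y1 * height / grid,
            x2 * width / grid, y2 * height / grid]


norm_box = [25, 40, 75, 90]                       # assumed normalized tlbr
pixel_box = scale_to_pixels(norm_box, 1280, 720)  # now in pixel units

# Clamping pixel coordinates to (1, 100) collapses the box.
collapsed = tlbr_to_tlwh(clamp(pixel_box, 1, 100))

# Converting without the clamp keeps the real extent.
intact = tlbr_to_tlwh(pixel_box)
```

Under these assumptions `collapsed` has zero width and height, while `intact` keeps the 640x360 box, which is the kind of distortion the comment is pointing at.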
Just want to confirm my understanding is right.
In the evaluation of Elysium's SOT performance in otb.py (Elysium/eval/otb.py, lines 172 to 198 in 5e6d14e), the final metrics are averaged across a total of 98036 sequences (8 frames each). The result I got matches (and is higher than) the number reported in the paper.
To me, the SUC and Precision on LaSOT, evaluated this way, may be heavily influenced by the differing number of frames per sequence in the LaSOT test set (longer sequences will dominate such an evaluation).
Is this a fair comparison against other VOT trackers, which are evaluated on each entire sequence and then averaged across the 280 sequences?
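The weighting effect raised in this question can be shown with a small sketch. The scores and video names below are made up, and `per_clip_average` and `per_video_average` are hypothetical helpers rather than functions from otb.py; the sketch only assumes that each clip yields one score.

```python
def mean(values):
    return sum(values) / len(values)


def per_clip_average(scores_by_video):
    """Average over all clips pooled together (long videos weigh more)."""
    all_scores = [s for scores in scores_by_video.values() for s in scores]
    return mean(all_scores)


def per_video_average(scores_by_video):
    """Average each video first, then average across videos."""
    return mean([mean(scores) for scores in scores_by_video.values()])


# Made-up example: a short, easy video and a long, hard one.
scores_by_video = {
    "short_video": [0.9, 0.9],   # 2 clips
    "long_video": [0.4] * 8,     # 8 clips
}

pooled = per_clip_average(scores_by_video)       # dominated by long_video
balanced = per_video_average(scores_by_video)    # each video counts once
```

Here the pooled average is about 0.5 while the per-video average is about 0.65, so the two protocols can disagree noticeably when sequence lengths differ.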