
Fine-tune on TVQA dataset #2

Open
Curry-AI opened this issue Jun 19, 2021 · 5 comments

Comments

@Curry-AI

Thank you very much for your work. May I ask if you could release the code for fine-tuning on the TVQA dataset?

@Curry-AI
Author

There are some details about the data processing that are not very clear to me. If you could help, I would be very grateful.

1. The paper mentions that for TVQA, 6 frames are sampled evenly from each video. What is the text content paired with each frame? If it is dialogue text, how do you select the corresponding dialogue text for each frame? If not, what is the content of the text?

2. Do the question and the answers form five hypotheses that are passed through an MLP, where you take the CLS token of each hypothesis and concatenate it with the image CLS token? Or is it done some other way?

I really hope to get confirmation of these details. Thank you very much.

@GloriaXimingLu
Collaborator

GloriaXimingLu commented Jul 14, 2021

  1. The text part is the dialogue text (subtitles).

  2. For each [images, context_i, question_i, answer_i] tuple, we feed it into the model and an MLP head, then take the max over the N logits. Basically, we copy the images part N times so it can be concatenated with each of the N candidates separately (see the sketch below).

Let us know if you have further questions!
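
For readers following along, here is a minimal sketch of that N-candidate scoring loop, assuming a generic PyTorch-style encoder that returns one logit per [images, context, question, answer_i] hypothesis. The function name, tensor shapes, and the dummy encoder are illustrative only, not the repository's actual API.

```python
# Illustrative sketch of multiple-choice scoring: the same image features are
# reused for every candidate answer, and prediction takes the max over N logits.
import torch

def score_candidates(model, image_feats, context_ids, question_ids, answer_ids_list):
    """image_feats: [num_frames, dim]; answer_ids_list: N candidate answers."""
    logits = []
    for answer_ids in answer_ids_list:
        # The images part is "copied" for each candidate, so every hypothesis
        # pairs the same frames and context/question with a different answer.
        logits.append(model(image_feats, context_ids, question_ids, answer_ids))
    logits = torch.stack(logits)        # shape [N]
    prediction = torch.argmax(logits)   # max over the N logits picks the answer
    return logits, prediction

# Toy stand-in encoder just to show the call pattern; the real model is the
# repository's multimodal transformer, which is not reproduced here.
dummy_model = lambda img, ctx, q, a: torch.randn(())
logits, pred = score_candidates(
    dummy_model, torch.zeros(6, 512), [0], [1], [[2], [3], [4], [5], [6]]
)
print(logits.shape, pred)  # torch.Size([5]) and the index of the best candidate
```

Copying the image features per candidate keeps the encoder single-stream: only the answer text differs across the N hypotheses, and training can apply cross-entropy over the stacked logits.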

@Lee-Ft

Lee-Ft commented Jul 21, 2021

This is awesome work! Do you have plans to release the pre-trained models on TVQA+ and TVQA?

@simon-ging

I also have some questions about TVQA fine-tuning, as I am trying to reproduce your results.

  1. Do you use the ground-truth timestamps of the question, provided in the TVQA dataset, to select frames from the video?

  2. How exactly do you select the subtitles? Subtitles are quite long (about 260 tokens on average), so I can't fit them all into the input sequence.

It would be very helpful if you could give more detail on what the input to the model looks like for TVQA. Thanks!

@GloriaXimingLu
Collaborator

  1. Yes, we extract the frames corresponding to the ground-truth timestamps.

  2. We use all subtitles and truncate them if they are longer than 732 tokens (see the sketch below).
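
To make these two points concrete, here is a rough sketch of how the preprocessing could look, under my own assumptions about the data format (a [ts_start, ts_end] ground-truth span per question, a fixed frame rate, and subtitles already tokenized into ids). The 6-frame count comes from the earlier question in this thread and the 732-token limit from this reply; everything else, including the function names and the fps value, is illustrative.

```python
# Illustrative preprocessing sketch: pick frames inside the ground-truth
# timestamp span and keep all subtitle tokens up to a 732-token cap.
import numpy as np

def frames_in_gt_span(ts_start: float, ts_end: float, fps: float = 3.0, num_frames: int = 6):
    """Pick `num_frames` evenly spaced frame indices within the ground-truth timestamps."""
    first = int(ts_start * fps)
    last = max(first, int(ts_end * fps) - 1)
    return [int(round(i)) for i in np.linspace(first, last, num_frames)]

def truncate_subtitles(subtitle_token_ids, max_len: int = 732):
    """Use all subtitle tokens, cutting the sequence if it exceeds `max_len`."""
    return subtitle_token_ids[:max_len]

print(frames_in_gt_span(12.4, 21.9))  # e.g. [37, 42, 48, 53, 59, 64]
```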
