Developing a text generating transformer based on the transformer architecture in the paper "Attention is all you need" by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Also taking reference from Andrej Karpathy's repository.
Dataset: "Friends" Serial transcript.
Here, the text is tokenised according to characters. Considering lowercase, uppercase and special characters, we have just under a 100 tokens.
Now, tokenisation is performed word-wise. now the entire transcript has 1154596 words. Since we are making each unique word a token, we have 27768 tokens.
[Cut to later Joey and Phoebe are pissed on Chandler
Ross Im sorry it's youre a friend if I that crying is what I got now in the world where I win [Scene Chandlers Office Building and Phoebe are serving for Rachel while Rachel is preparing and safety to come up by Rachel right in She puts her so he starts on his face and pushes Chandler away her off then tries to check her the tape in the duck Rachel Okay Phoebe where do I do There her from house race it comes for this whole building There she could be too Chandler coming down Oh my God are you gonna be sad Monica Because right Rachel You bet it is because youre in such good or something Monica noticing him Thanks I planned check it up Joey laughs Give me a handyman Joey Hey what is it here Chandler Hey okay To Ross Dr Geoffrey Gunther Ahh here someones more