diff --git a/Wine Reviews Classification/Dataset/README.md b/Wine Reviews Classification/Dataset/README.md
new file mode 100644
index 000000000..27be8d30b
--- /dev/null
+++ b/Wine Reviews Classification/Dataset/README.md
@@ -0,0 +1,101 @@
+# Wine Reviews Classification using DL
+
+## PROJECT TITLE
+
+Wine Reviews Classification using Deep Learning
+
+## GOAL
+
+To classify the quality (points) of a wine based on its review.
+
+## DATASET
+
+The link for the dataset used in this project: https://www.kaggle.com/datasets/zynicide/wine-reviews
+
+## EDA
+
+Shape of Dataset: (129971, 14)
+
+![Dataset](Images/Input_Dataset.png)
+![EDA](Images/EDA3.png)
+
+## DESCRIPTION
+
+This project aims to predict the quality points of a wine from its review.
+
+## WHAT I HAD DONE
+
+1. Data collection: downloaded the dataset from the link given above.
+2. Data preprocessing: combined the text columns (country, description, designation, province, title, variety and winery) into a single text feature, then tokenized and vectorized it before passing it to model training.
+3. Model selection: two self-designed models. The first has an Embedding layer followed by a GlobalAveragePooling layer, two Dense layers and an output layer; the second has an Embedding layer followed by a SimpleRNN layer and a Dense output layer. An illustrative Keras sketch of both architectures is given at the end of this README.
+4. Comparative analysis: compared the accuracy scores of the two models.
+
+## MODELS SUMMARY
+
+Model: "sequential"
+_________________________________________________________________
+ Layer (type)                 Output Shape              Param #
+=================================================================
+ embedding (Embedding)        (None, 89, 200)           12794200
+
+ global_average_pooling1d (   (None, 200)               0
+ GlobalAveragePooling1D)
+
+ dense (Dense)                (None, 100)               20100
+
+ dense_1 (Dense)              (None, 50)                5050
+
+ dense_2 (Dense)              (None, 21)                1071
+
+=================================================================
+Total params: 12820421 (48.91 MB)
+Trainable params: 12820421 (48.91 MB)
+Non-trainable params: 0 (0.00 Byte)
+
+Model-2: "sequential_1"
+_________________________________________________________________
+ Layer (type)                 Output Shape              Param #
+=================================================================
+ embedding_1 (Embedding)      (None, 89, 100)           6397100
+
+ simple_rnn (SimpleRNN)       (None, 30)                3930
+
+ dense_3 (Dense)              (None, 21)                651
+
+=================================================================
+Total params: 6401681 (24.42 MB)
+Trainable params: 6401681 (24.42 MB)
+Non-trainable params: 0 (0.00 Byte)
+
+## LIBRARIES NEEDED
+
+The following libraries are required to run this project:
+
+- nltk
+- pandas
+- matplotlib
+- tensorflow
+- keras
+- sklearn
+
+## EVALUATION METRICS
+
+The evaluation metrics I used to assess the models:
+
+- Loss
+- Accuracy
+
+They are also shown as confusion matrices in the Images folder.
+
+## RESULTS
+
+Results on the validation dataset:
+
+For Model-1:
+Accuracy: 31%
+Loss: 3.1
+
+For Model-2:
+Accuracy: 9%
+Loss: 8.05
+
+## CONCLUSION
+
+Based on the results, we can draw the following conclusions:
+
+1. Model-1 performed better than Model-2.
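+
+## MODEL SKETCH (ILLUSTRATIVE)
+
+The snippet below is a minimal Keras sketch of the preprocessing and the two architectures summarised above. It assumes ReLU activations in the hidden Dense layers, a softmax output, the Adam optimizer, sparse categorical cross-entropy loss, and that the 21 point values (80-100) are remapped to class labels 0-20; none of these choices are stated in this README, so treat them as assumptions rather than the exact training script.
+
+```python
+import numpy as np
+from tensorflow.keras import Sequential
+from tensorflow.keras.layers import Dense, Embedding, GlobalAveragePooling1D, SimpleRNN
+from tensorflow.keras.preprocessing.text import Tokenizer
+from tensorflow.keras.preprocessing.sequence import pad_sequences
+
+# Toy stand-ins for the combined review text and the points label.
+texts = [
+    "italy aroma include tropical fruit broom brimstone dried herb",
+    "portugal ripe fruity wine smooth still structured firm tannin",
+]
+points = np.array([87, 88])
+
+# Tokenize and pad the text; the full dataset pads to length 89 with ~64k words.
+tokenizer = Tokenizer(oov_token="<OOV>")
+tokenizer.fit_on_texts(texts)
+X = pad_sequences(tokenizer.texts_to_sequences(texts),
+                  maxlen=89, padding="pre", truncating="pre")
+y = points - 80                      # assumed mapping of points 80-100 to classes 0-20
+vocab_size = len(tokenizer.word_index) + 1
+
+# Model-1: Embedding -> GlobalAveragePooling1D -> Dense(100) -> Dense(50) -> Dense(21)
+model1 = Sequential([
+    Embedding(vocab_size, 200),      # sequences have length 89 in the notebook
+    GlobalAveragePooling1D(),
+    Dense(100, activation="relu"),
+    Dense(50, activation="relu"),
+    Dense(21, activation="softmax"),
+])
+
+# Model-2: Embedding -> SimpleRNN(30) -> Dense(21)
+model2 = Sequential([
+    Embedding(vocab_size, 100),
+    SimpleRNN(30),
+    Dense(21, activation="softmax"),
+])
+
+for model in (model1, model2):
+    model.compile(optimizer="adam",
+                  loss="sparse_categorical_crossentropy",
+                  metrics=["accuracy"])
+    model.fit(X, y, epochs=1, verbose=0)   # illustrative run on the toy data
+```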
\ No newline at end of file diff --git a/Wine Reviews Classification/Images/Accuracy_Model1.png b/Wine Reviews Classification/Images/Accuracy_Model1.png new file mode 100644 index 000000000..2fa79a13a Binary files /dev/null and b/Wine Reviews Classification/Images/Accuracy_Model1.png differ diff --git a/Wine Reviews Classification/Images/Accuracy_Model2.png b/Wine Reviews Classification/Images/Accuracy_Model2.png new file mode 100644 index 000000000..50496e6e1 Binary files /dev/null and b/Wine Reviews Classification/Images/Accuracy_Model2.png differ diff --git a/Wine Reviews Classification/Images/EDA1.png b/Wine Reviews Classification/Images/EDA1.png new file mode 100644 index 000000000..753254cdc Binary files /dev/null and b/Wine Reviews Classification/Images/EDA1.png differ diff --git a/Wine Reviews Classification/Images/EDA2.png b/Wine Reviews Classification/Images/EDA2.png new file mode 100644 index 000000000..e4888cd01 Binary files /dev/null and b/Wine Reviews Classification/Images/EDA2.png differ diff --git a/Wine Reviews Classification/Images/EDA3.png b/Wine Reviews Classification/Images/EDA3.png new file mode 100644 index 000000000..422ee5db7 Binary files /dev/null and b/Wine Reviews Classification/Images/EDA3.png differ diff --git a/Wine Reviews Classification/Images/Input_Dataset.png b/Wine Reviews Classification/Images/Input_Dataset.png new file mode 100644 index 000000000..ddf45a08b Binary files /dev/null and b/Wine Reviews Classification/Images/Input_Dataset.png differ diff --git a/Wine Reviews Classification/Images/Model1.png b/Wine Reviews Classification/Images/Model1.png new file mode 100644 index 000000000..5cfbd5ad7 Binary files /dev/null and b/Wine Reviews Classification/Images/Model1.png differ diff --git a/Wine Reviews Classification/Images/Model2.png b/Wine Reviews Classification/Images/Model2.png new file mode 100644 index 000000000..b5ec5c4ad Binary files /dev/null and b/Wine Reviews Classification/Images/Model2.png differ diff --git a/Wine Reviews Classification/Model/PridictionModel.ipynb b/Wine Reviews Classification/Model/PridictionModel.ipynb new file mode 100644 index 000000000..1538ecea1 --- /dev/null +++ b/Wine Reviews Classification/Model/PridictionModel.ipynb @@ -0,0 +1,6190 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:29.636394Z", + "iopub.status.busy": "2021-05-25T06:50:29.636041Z", + "iopub.status.idle": "2021-05-25T06:50:29.643277Z", + "shell.execute_reply": "2021-05-25T06:50:29.642127Z", + "shell.execute_reply.started": "2021-05-25T06:50:29.636365Z" + } + }, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import sklearn\n", + "import itertools\n", + "import numpy as np\n", + "import seaborn as sb\n", + "import re\n", + "import nltk\n", + "import pickle\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.feature_extraction.text import TfidfVectorizer\n", + "from sklearn.metrics import accuracy_score\n", + "from sklearn.metrics import confusion_matrix\n", + "from matplotlib import pyplot as plt\n", + "from sklearn.linear_model import PassiveAggressiveClassifier,LogisticRegression\n", + "from nltk.stem import WordNetLemmatizer\n", + "from nltk.corpus import stopwords" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:29.656569Z", + "iopub.status.busy": "2021-05-25T06:50:29.656203Z", + "iopub.status.idle": 
"2021-05-25T06:50:32.048864Z", + "shell.execute_reply": "2021-05-25T06:50:32.047882Z", + "shell.execute_reply.started": "2021-05-25T06:50:29.65654Z" + } + }, + "outputs": [], + "source": [ + "train_df = pd.read_csv('train.csv')" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.05136Z", + "iopub.status.busy": "2021-05-25T06:50:32.051032Z", + "iopub.status.idle": "2021-05-25T06:50:32.089516Z", + "shell.execute_reply": "2021-05-25T06:50:32.088399Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.051329Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Unnamed: 0countrydescriptiondesignationpointspriceprovinceregion_1region_2taster_nametaster_twitter_handletitlevarietywinery
00ItalyAromas include tropical fruit, broom, brimston...Vulkà Bianco87NaNSicily & SardiniaEtnaNaNKerin O’Keefe@kerinokeefeNicosia 2013 Vulkà Bianco (Etna)White BlendNicosia
11PortugalThis is ripe and fruity, a wine that is smooth...Avidagos8715.0DouroNaNNaNRoger Voss@vossrogerQuinta dos Avidagos 2011 Avidagos Red (Douro)Portuguese RedQuinta dos Avidagos
22USTart and snappy, the flavors of lime flesh and...NaN8714.0OregonWillamette ValleyWillamette ValleyPaul Gregutt@paulgwineRainstorm 2013 Pinot Gris (Willamette Valley)Pinot GrisRainstorm
33USPineapple rind, lemon pith and orange blossom ...Reserve Late Harvest8713.0MichiganLake Michigan ShoreNaNAlexander PeartreeNaNSt. Julian 2013 Reserve Late Harvest Riesling ...RieslingSt. Julian
44USMuch like the regular bottling from 2012, this...Vintner's Reserve Wild Child Block8765.0OregonWillamette ValleyWillamette ValleyPaul Gregutt@paulgwineSweet Cheeks 2012 Vintner's Reserve Wild Child...Pinot NoirSweet Cheeks
55SpainBlackberry and raspberry aromas show a typical...Ars In Vitro8715.0Northern SpainNavarraNaNMichael Schachner@wineschachTandem 2011 Ars In Vitro Tempranillo-Merlot (N...Tempranillo-MerlotTandem
66ItalyHere's a bright, informal red that opens with ...Belsito8716.0Sicily & SardiniaVittoriaNaNKerin O’Keefe@kerinokeefeTerre di Giurfo 2013 Belsito Frappato (Vittoria)FrappatoTerre di Giurfo
77FranceThis dry and restrained wine offers spice in p...NaN8724.0AlsaceAlsaceNaNRoger Voss@vossrogerTrimbach 2012 Gewurztraminer (Alsace)GewürztraminerTrimbach
88GermanySavory dried thyme notes accent sunnier flavor...Shine8712.0RheinhessenNaNNaNAnna Lee C. IijimaNaNHeinz Eifel 2013 Shine Gewürztraminer (Rheinhe...GewürztraminerHeinz Eifel
99FranceThis has great depth of flavor with its fresh ...Les Natures8727.0AlsaceAlsaceNaNRoger Voss@vossrogerJean-Baptiste Adam 2012 Les Natures Pinot Gris...Pinot GrisJean-Baptiste Adam
1010USSoft, supple plum envelopes an oaky structure ...Mountain Cuvée8719.0CaliforniaNapa ValleyNapaVirginie Boone@vbooneKirkland Signature 2011 Mountain Cuvée Caberne...Cabernet SauvignonKirkland Signature
1111FranceThis is a dry wine, very spicy, with a tight, ...NaN8730.0AlsaceAlsaceNaNRoger Voss@vossrogerLeon Beyer 2012 Gewurztraminer (Alsace)GewürztraminerLeon Beyer
1212USSlightly reduced, this wine offers a chalky, t...NaN8734.0CaliforniaAlexander ValleySonomaVirginie Boone@vbooneLouis M. Martini 2012 Cabernet Sauvignon (Alex...Cabernet SauvignonLouis M. Martini
1313ItalyThis is dominated by oak and oak-driven aromas...Rosso87NaNSicily & SardiniaEtnaNaNKerin O’Keefe@kerinokeefeMasseria Setteporte 2012 Rosso (Etna)Nerello MascaleseMasseria Setteporte
1414USBuilding on 150 years and six generations of w...NaN8712.0CaliforniaCentral CoastCentral CoastMatt Kettmann@mattkettmannMirassou 2012 Chardonnay (Central Coast)ChardonnayMirassou
\n", + "
" + ], + "text/plain": [ + " Unnamed: 0 country description \\\n", + "0 0 Italy Aromas include tropical fruit, broom, brimston... \n", + "1 1 Portugal This is ripe and fruity, a wine that is smooth... \n", + "2 2 US Tart and snappy, the flavors of lime flesh and... \n", + "3 3 US Pineapple rind, lemon pith and orange blossom ... \n", + "4 4 US Much like the regular bottling from 2012, this... \n", + "5 5 Spain Blackberry and raspberry aromas show a typical... \n", + "6 6 Italy Here's a bright, informal red that opens with ... \n", + "7 7 France This dry and restrained wine offers spice in p... \n", + "8 8 Germany Savory dried thyme notes accent sunnier flavor... \n", + "9 9 France This has great depth of flavor with its fresh ... \n", + "10 10 US Soft, supple plum envelopes an oaky structure ... \n", + "11 11 France This is a dry wine, very spicy, with a tight, ... \n", + "12 12 US Slightly reduced, this wine offers a chalky, t... \n", + "13 13 Italy This is dominated by oak and oak-driven aromas... \n", + "14 14 US Building on 150 years and six generations of w... \n", + "\n", + " designation points price province \\\n", + "0 Vulkà Bianco 87 NaN Sicily & Sardinia \n", + "1 Avidagos 87 15.0 Douro \n", + "2 NaN 87 14.0 Oregon \n", + "3 Reserve Late Harvest 87 13.0 Michigan \n", + "4 Vintner's Reserve Wild Child Block 87 65.0 Oregon \n", + "5 Ars In Vitro 87 15.0 Northern Spain \n", + "6 Belsito 87 16.0 Sicily & Sardinia \n", + "7 NaN 87 24.0 Alsace \n", + "8 Shine 87 12.0 Rheinhessen \n", + "9 Les Natures 87 27.0 Alsace \n", + "10 Mountain Cuvée 87 19.0 California \n", + "11 NaN 87 30.0 Alsace \n", + "12 NaN 87 34.0 California \n", + "13 Rosso 87 NaN Sicily & Sardinia \n", + "14 NaN 87 12.0 California \n", + "\n", + " region_1 region_2 taster_name \\\n", + "0 Etna NaN Kerin O’Keefe \n", + "1 NaN NaN Roger Voss \n", + "2 Willamette Valley Willamette Valley Paul Gregutt \n", + "3 Lake Michigan Shore NaN Alexander Peartree \n", + "4 Willamette Valley Willamette Valley Paul Gregutt \n", + "5 Navarra NaN Michael Schachner \n", + "6 Vittoria NaN Kerin O’Keefe \n", + "7 Alsace NaN Roger Voss \n", + "8 NaN NaN Anna Lee C. Iijima \n", + "9 Alsace NaN Roger Voss \n", + "10 Napa Valley Napa Virginie Boone \n", + "11 Alsace NaN Roger Voss \n", + "12 Alexander Valley Sonoma Virginie Boone \n", + "13 Etna NaN Kerin O’Keefe \n", + "14 Central Coast Central Coast Matt Kettmann \n", + "\n", + " taster_twitter_handle title \\\n", + "0 @kerinokeefe Nicosia 2013 Vulkà Bianco (Etna) \n", + "1 @vossroger Quinta dos Avidagos 2011 Avidagos Red (Douro) \n", + "2 @paulgwine  Rainstorm 2013 Pinot Gris (Willamette Valley) \n", + "3 NaN St. Julian 2013 Reserve Late Harvest Riesling ... \n", + "4 @paulgwine  Sweet Cheeks 2012 Vintner's Reserve Wild Child... \n", + "5 @wineschach Tandem 2011 Ars In Vitro Tempranillo-Merlot (N... \n", + "6 @kerinokeefe Terre di Giurfo 2013 Belsito Frappato (Vittoria) \n", + "7 @vossroger Trimbach 2012 Gewurztraminer (Alsace) \n", + "8 NaN Heinz Eifel 2013 Shine Gewürztraminer (Rheinhe... \n", + "9 @vossroger Jean-Baptiste Adam 2012 Les Natures Pinot Gris... \n", + "10 @vboone Kirkland Signature 2011 Mountain Cuvée Caberne... \n", + "11 @vossroger Leon Beyer 2012 Gewurztraminer (Alsace) \n", + "12 @vboone Louis M. Martini 2012 Cabernet Sauvignon (Alex... 
\n", + "13 @kerinokeefe Masseria Setteporte 2012 Rosso (Etna) \n", + "14 @mattkettmann Mirassou 2012 Chardonnay (Central Coast) \n", + "\n", + " variety winery \n", + "0 White Blend Nicosia \n", + "1 Portuguese Red Quinta dos Avidagos \n", + "2 Pinot Gris Rainstorm \n", + "3 Riesling St. Julian \n", + "4 Pinot Noir Sweet Cheeks \n", + "5 Tempranillo-Merlot Tandem \n", + "6 Frappato Terre di Giurfo \n", + "7 Gewürztraminer Trimbach \n", + "8 Gewürztraminer Heinz Eifel \n", + "9 Pinot Gris Jean-Baptiste Adam \n", + "10 Cabernet Sauvignon Kirkland Signature \n", + "11 Gewürztraminer Leon Beyer \n", + "12 Cabernet Sauvignon Louis M. Martini \n", + "13 Nerello Mascalese Masseria Setteporte \n", + "14 Chardonnay Mirassou " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.head(15)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.10674Z", + "iopub.status.busy": "2021-05-25T06:50:32.106434Z", + "iopub.status.idle": "2021-05-25T06:50:32.120541Z", + "shell.execute_reply": "2021-05-25T06:50:32.119386Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.106712Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(129971, 14)" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.124489Z", + "iopub.status.busy": "2021-05-25T06:50:32.12414Z", + "iopub.status.idle": "2021-05-25T06:50:32.140229Z", + "shell.execute_reply": "2021-05-25T06:50:32.139288Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.124461Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 129971 entries, 0 to 129970\n", + "Data columns (total 14 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 Unnamed: 0 129971 non-null int64 \n", + " 1 country 129908 non-null object \n", + " 2 description 129971 non-null object \n", + " 3 designation 92506 non-null object \n", + " 4 points 129971 non-null int64 \n", + " 5 price 120975 non-null float64\n", + " 6 province 129908 non-null object \n", + " 7 region_1 108724 non-null object \n", + " 8 region_2 50511 non-null object \n", + " 9 taster_name 103727 non-null object \n", + " 10 taster_twitter_handle 98758 non-null object \n", + " 11 title 129971 non-null object \n", + " 12 variety 129970 non-null object \n", + " 13 winery 129971 non-null object \n", + "dtypes: float64(1), int64(2), object(11)\n", + "memory usage: 13.9+ MB\n" + ] + } + ], + "source": [ + "train_df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Quality No of sample\n" + ] + }, + { + "data": { + "text/plain": [ + "88 17207\n", + "87 16933\n", + "90 15410\n", + "86 12600\n", + "89 12226\n", + "91 11359\n", + "92 9613\n", + "85 9530\n", + "93 6489\n", + "84 6480\n", + "94 3758\n", + "83 3025\n", + "82 1836\n", + "95 1535\n", + "81 692\n", + "96 523\n", + "80 397\n", + "97 229\n", + "98 77\n", + "99 33\n", + "100 19\n", + "Name: points, dtype: int64" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# def create_distribution(dataFile):\n", + "# return 
sb.countplot(x='label', data=dataFile, palette='hls')\n", + "\n", + "# #by calling below we can see that training, test and valid data seems to be failry evenly distributed between the classes\n", + "# create_distribution(train_df)\n", + "print(\"Quality\",end=' ')\n", + "print(\"No of sample\")\n", + "train_df['points'].value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "train_df=train_df.drop(['region_2'],axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.306146Z", + "iopub.status.busy": "2021-05-25T06:50:32.305826Z", + "iopub.status.idle": "2021-05-25T06:50:32.335357Z", + "shell.execute_reply": "2021-05-25T06:50:32.33417Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.306118Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[1mCOLUMN\u001b[0m \u001b[1mNULL VALUES COUNT\u001b[0m\n", + "Unnamed: 0 0\n", + "country 63\n", + "description 0\n", + "designation 37465\n", + "points 0\n", + "price 8996\n", + "province 63\n", + "region_1 21247\n", + "taster_name 26244\n", + "taster_twitter_handle 31213\n", + "title 0\n", + "variety 1\n", + "winery 0\n" + ] + } + ], + "source": [ + "def data_qualityCheck():\n", + " print(\"{:{}}\".format(\"\\033[1mCOLUMN\\033[0m\",38),end='')\n", + " print(\"{:{}}\".format(\"\\033[1mNULL VALUES COUNT\\033[0m\",18))\n", + " for x in train_df.columns:\n", + " print(\"{:{}}\".format(x,34),end='')\n", + " print(train_df[x].isnull().sum())\n", + "\n", + " \n", + "data_qualityCheck()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[1mCOLUMN\u001b[0m \u001b[1mUNIQUE VALUES COUNT\u001b[0m\n", + "Unnamed: 0 129971\n", + "country 44\n", + "description 119955\n", + "designation 37980\n", + "points 21\n", + "price 391\n", + "province 426\n", + "region_1 1230\n", + "taster_name 20\n", + "taster_twitter_handle 16\n", + "title 118840\n", + "variety 708\n", + "winery 16757\n" + ] + } + ], + "source": [ + "print(\"{:{}}\".format(\"\\033[1mCOLUMN\\033[0m\",38),end='')\n", + "print(\"{:{}}\".format(\"\\033[1mUNIQUE VALUES COUNT\\033[0m\",18))\n", + "for x in train_df.columns:\n", + " print(\"{:{}}\".format(x,34),end='')\n", + " print(len(train_df[x].unique()))" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.337061Z", + "iopub.status.busy": "2021-05-25T06:50:32.336735Z", + "iopub.status.idle": "2021-05-25T06:50:32.367948Z", + "shell.execute_reply": "2021-05-25T06:50:32.366933Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.33703Z" + } + }, + "outputs": [], + "source": [ + "train_df=train_df.drop([\"region_1\", \"taster_twitter_handle\",\"taster_name\"], axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[1mCOLUMN\u001b[0m \u001b[1mNULL VALUES COUNT\u001b[0m\n", + "Unnamed: 0 0\n", + "country 63\n", + "description 0\n", + "designation 37465\n", + "points 0\n", + "price 8996\n", + "province 63\n", + "title 0\n", + "variety 1\n", + "winery 0\n" + ] + } + ], + "source": [ + "data_qualityCheck()" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "def 
fill_data(data):\n", + " data[\"country\"] = data[\"country\"].fillna(\"No Country\")\n", + " data[\"designation\"] = data[\"designation\"].fillna(\"No Designation\")\n", + " data[\"price\"]=data[\"price\"].fillna(0)\n", + " data[\"province\"]=data[\"province\"].fillna(\"No Province\")\n", + " data[\"variety\"]=data[\"variety\"].fillna(\"No variety\")\n", + " return data" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "train_df=fill_data(train_df)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.401314Z", + "iopub.status.busy": "2021-05-25T06:50:32.400868Z", + "iopub.status.idle": "2021-05-25T06:50:32.407806Z", + "shell.execute_reply": "2021-05-25T06:50:32.406589Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.401272Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(129971, 10)" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "train_df=train_df.drop([\"Unnamed: 0\"],axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.409912Z", + "iopub.status.busy": "2021-05-25T06:50:32.409162Z", + "iopub.status.idle": "2021-05-25T06:50:32.426843Z", + "shell.execute_reply": "2021-05-25T06:50:32.425727Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.409868Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
countrydescriptiondesignationpointspriceprovincetitlevarietywinery
0ItalyAromas include tropical fruit, broom, brimston...Vulkà Bianco870.0Sicily & SardiniaNicosia 2013 Vulkà Bianco (Etna)White BlendNicosia
1PortugalThis is ripe and fruity, a wine that is smooth...Avidagos8715.0DouroQuinta dos Avidagos 2011 Avidagos Red (Douro)Portuguese RedQuinta dos Avidagos
2USTart and snappy, the flavors of lime flesh and...No Designation8714.0OregonRainstorm 2013 Pinot Gris (Willamette Valley)Pinot GrisRainstorm
3USPineapple rind, lemon pith and orange blossom ...Reserve Late Harvest8713.0MichiganSt. Julian 2013 Reserve Late Harvest Riesling ...RieslingSt. Julian
4USMuch like the regular bottling from 2012, this...Vintner's Reserve Wild Child Block8765.0OregonSweet Cheeks 2012 Vintner's Reserve Wild Child...Pinot NoirSweet Cheeks
5SpainBlackberry and raspberry aromas show a typical...Ars In Vitro8715.0Northern SpainTandem 2011 Ars In Vitro Tempranillo-Merlot (N...Tempranillo-MerlotTandem
6ItalyHere's a bright, informal red that opens with ...Belsito8716.0Sicily & SardiniaTerre di Giurfo 2013 Belsito Frappato (Vittoria)FrappatoTerre di Giurfo
7FranceThis dry and restrained wine offers spice in p...No Designation8724.0AlsaceTrimbach 2012 Gewurztraminer (Alsace)GewürztraminerTrimbach
8GermanySavory dried thyme notes accent sunnier flavor...Shine8712.0RheinhessenHeinz Eifel 2013 Shine Gewürztraminer (Rheinhe...GewürztraminerHeinz Eifel
9FranceThis has great depth of flavor with its fresh ...Les Natures8727.0AlsaceJean-Baptiste Adam 2012 Les Natures Pinot Gris...Pinot GrisJean-Baptiste Adam
10USSoft, supple plum envelopes an oaky structure ...Mountain Cuvée8719.0CaliforniaKirkland Signature 2011 Mountain Cuvée Caberne...Cabernet SauvignonKirkland Signature
11FranceThis is a dry wine, very spicy, with a tight, ...No Designation8730.0AlsaceLeon Beyer 2012 Gewurztraminer (Alsace)GewürztraminerLeon Beyer
12USSlightly reduced, this wine offers a chalky, t...No Designation8734.0CaliforniaLouis M. Martini 2012 Cabernet Sauvignon (Alex...Cabernet SauvignonLouis M. Martini
13ItalyThis is dominated by oak and oak-driven aromas...Rosso870.0Sicily & SardiniaMasseria Setteporte 2012 Rosso (Etna)Nerello MascaleseMasseria Setteporte
14USBuilding on 150 years and six generations of w...No Designation8712.0CaliforniaMirassou 2012 Chardonnay (Central Coast)ChardonnayMirassou
15GermanyZesty orange peels and apple notes abound in t...Devon8724.0MoselRichard Böcking 2013 Devon Riesling (Mosel)RieslingRichard Böcking
16ArgentinaBaked plum, molasses, balsamic vinegar and che...Felix8730.0OtherFelix Lavaque 2010 Felix Malbec (Cafayate)MalbecFelix Lavaque
17ArgentinaRaw black-cherry aromas are direct and simple ...Winemaker Selection8713.0Mendoza ProvinceGaucho Andino 2011 Winemaker Selection Malbec ...MalbecGaucho Andino
18SpainDesiccated blackberry, leather, charred wood a...Vendimia Seleccionada Finca Valdelayegua Singl...8728.0Northern SpainPradorey 2010 Vendimia Seleccionada Finca Vald...Tempranillo BlendPradorey
19USRed fruit aromas pervade on the nose, with cig...No Designation8732.0VirginiaQuiévremont 2012 Meritage (Virginia)MeritageQuiévremont
\n", + "
" + ], + "text/plain": [ + " country description \\\n", + "0 Italy Aromas include tropical fruit, broom, brimston... \n", + "1 Portugal This is ripe and fruity, a wine that is smooth... \n", + "2 US Tart and snappy, the flavors of lime flesh and... \n", + "3 US Pineapple rind, lemon pith and orange blossom ... \n", + "4 US Much like the regular bottling from 2012, this... \n", + "5 Spain Blackberry and raspberry aromas show a typical... \n", + "6 Italy Here's a bright, informal red that opens with ... \n", + "7 France This dry and restrained wine offers spice in p... \n", + "8 Germany Savory dried thyme notes accent sunnier flavor... \n", + "9 France This has great depth of flavor with its fresh ... \n", + "10 US Soft, supple plum envelopes an oaky structure ... \n", + "11 France This is a dry wine, very spicy, with a tight, ... \n", + "12 US Slightly reduced, this wine offers a chalky, t... \n", + "13 Italy This is dominated by oak and oak-driven aromas... \n", + "14 US Building on 150 years and six generations of w... \n", + "15 Germany Zesty orange peels and apple notes abound in t... \n", + "16 Argentina Baked plum, molasses, balsamic vinegar and che... \n", + "17 Argentina Raw black-cherry aromas are direct and simple ... \n", + "18 Spain Desiccated blackberry, leather, charred wood a... \n", + "19 US Red fruit aromas pervade on the nose, with cig... \n", + "\n", + " designation points price \\\n", + "0 Vulkà Bianco 87 0.0 \n", + "1 Avidagos 87 15.0 \n", + "2 No Designation 87 14.0 \n", + "3 Reserve Late Harvest 87 13.0 \n", + "4 Vintner's Reserve Wild Child Block 87 65.0 \n", + "5 Ars In Vitro 87 15.0 \n", + "6 Belsito 87 16.0 \n", + "7 No Designation 87 24.0 \n", + "8 Shine 87 12.0 \n", + "9 Les Natures 87 27.0 \n", + "10 Mountain Cuvée 87 19.0 \n", + "11 No Designation 87 30.0 \n", + "12 No Designation 87 34.0 \n", + "13 Rosso 87 0.0 \n", + "14 No Designation 87 12.0 \n", + "15 Devon 87 24.0 \n", + "16 Felix 87 30.0 \n", + "17 Winemaker Selection 87 13.0 \n", + "18 Vendimia Seleccionada Finca Valdelayegua Singl... 87 28.0 \n", + "19 No Designation 87 32.0 \n", + "\n", + " province title \\\n", + "0 Sicily & Sardinia Nicosia 2013 Vulkà Bianco (Etna) \n", + "1 Douro Quinta dos Avidagos 2011 Avidagos Red (Douro) \n", + "2 Oregon Rainstorm 2013 Pinot Gris (Willamette Valley) \n", + "3 Michigan St. Julian 2013 Reserve Late Harvest Riesling ... \n", + "4 Oregon Sweet Cheeks 2012 Vintner's Reserve Wild Child... \n", + "5 Northern Spain Tandem 2011 Ars In Vitro Tempranillo-Merlot (N... \n", + "6 Sicily & Sardinia Terre di Giurfo 2013 Belsito Frappato (Vittoria) \n", + "7 Alsace Trimbach 2012 Gewurztraminer (Alsace) \n", + "8 Rheinhessen Heinz Eifel 2013 Shine Gewürztraminer (Rheinhe... \n", + "9 Alsace Jean-Baptiste Adam 2012 Les Natures Pinot Gris... \n", + "10 California Kirkland Signature 2011 Mountain Cuvée Caberne... \n", + "11 Alsace Leon Beyer 2012 Gewurztraminer (Alsace) \n", + "12 California Louis M. Martini 2012 Cabernet Sauvignon (Alex... \n", + "13 Sicily & Sardinia Masseria Setteporte 2012 Rosso (Etna) \n", + "14 California Mirassou 2012 Chardonnay (Central Coast) \n", + "15 Mosel Richard Böcking 2013 Devon Riesling (Mosel) \n", + "16 Other Felix Lavaque 2010 Felix Malbec (Cafayate) \n", + "17 Mendoza Province Gaucho Andino 2011 Winemaker Selection Malbec ... \n", + "18 Northern Spain Pradorey 2010 Vendimia Seleccionada Finca Vald... 
\n", + "19 Virginia Quiévremont 2012 Meritage (Virginia) \n", + "\n", + " variety winery \n", + "0 White Blend Nicosia \n", + "1 Portuguese Red Quinta dos Avidagos \n", + "2 Pinot Gris Rainstorm \n", + "3 Riesling St. Julian \n", + "4 Pinot Noir Sweet Cheeks \n", + "5 Tempranillo-Merlot Tandem \n", + "6 Frappato Terre di Giurfo \n", + "7 Gewürztraminer Trimbach \n", + "8 Gewürztraminer Heinz Eifel \n", + "9 Pinot Gris Jean-Baptiste Adam \n", + "10 Cabernet Sauvignon Kirkland Signature \n", + "11 Gewürztraminer Leon Beyer \n", + "12 Cabernet Sauvignon Louis M. Martini \n", + "13 Nerello Mascalese Masseria Setteporte \n", + "14 Chardonnay Mirassou \n", + "15 Riesling Richard Böcking \n", + "16 Malbec Felix Lavaque \n", + "17 Malbec Gaucho Andino \n", + "18 Tempranillo Blend Pradorey \n", + "19 Meritage Quiévremont " + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.head(20)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
pointsprice
count129971.000000129971.000000
mean88.44713832.915697
std3.03973040.582167
min80.0000000.000000
25%86.00000015.000000
50%88.00000025.000000
75%91.00000040.000000
max100.0000003300.000000
\n", + "
" + ], + "text/plain": [ + " points price\n", + "count 129971.000000 129971.000000\n", + "mean 88.447138 32.915697\n", + "std 3.039730 40.582167\n", + "min 80.000000 0.000000\n", + "25% 86.000000 15.000000\n", + "50% 88.000000 25.000000\n", + "75% 91.000000 40.000000\n", + "max 100.000000 3300.000000" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.describe()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.457112Z", + "iopub.status.busy": "2021-05-25T06:50:32.45653Z", + "iopub.status.idle": "2021-05-25T06:50:32.46346Z", + "shell.execute_reply": "2021-05-25T06:50:32.461467Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.457067Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0 87\n", + "1 87\n", + "2 87\n", + "3 87\n", + "4 87\n", + " ..\n", + "129966 90\n", + "129967 90\n", + "129968 90\n", + "129969 90\n", + "129970 90\n", + "Name: points, Length: 129971, dtype: int64" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_train = train_df['points']\n", + "label_train" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.46513Z", + "iopub.status.busy": "2021-05-25T06:50:32.46484Z", + "iopub.status.idle": "2021-05-25T06:50:32.479833Z", + "shell.execute_reply": "2021-05-25T06:50:32.478601Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.465102Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0 87\n", + "1 87\n", + "2 87\n", + "3 87\n", + "4 87\n", + "5 87\n", + "6 87\n", + "7 87\n", + "8 87\n", + "9 87\n", + "Name: points, dtype: int64" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_train.head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.481757Z", + "iopub.status.busy": "2021-05-25T06:50:32.481439Z", + "iopub.status.idle": "2021-05-25T06:50:32.493571Z", + "shell.execute_reply": "2021-05-25T06:50:32.492736Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.481728Z" + } + }, + "outputs": [], + "source": [ + "train_df = train_df.drop(\"points\", axis = 1)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.495566Z", + "iopub.status.busy": "2021-05-25T06:50:32.495116Z", + "iopub.status.idle": "2021-05-25T06:50:32.513957Z", + "shell.execute_reply": "2021-05-25T06:50:32.51265Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.495526Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
countrydescriptiondesignationpriceprovincetitlevarietywinery
0ItalyAromas include tropical fruit, broom, brimston...Vulkà Bianco0.0Sicily & SardiniaNicosia 2013 Vulkà Bianco (Etna)White BlendNicosia
1PortugalThis is ripe and fruity, a wine that is smooth...Avidagos15.0DouroQuinta dos Avidagos 2011 Avidagos Red (Douro)Portuguese RedQuinta dos Avidagos
2USTart and snappy, the flavors of lime flesh and...No Designation14.0OregonRainstorm 2013 Pinot Gris (Willamette Valley)Pinot GrisRainstorm
3USPineapple rind, lemon pith and orange blossom ...Reserve Late Harvest13.0MichiganSt. Julian 2013 Reserve Late Harvest Riesling ...RieslingSt. Julian
4USMuch like the regular bottling from 2012, this...Vintner's Reserve Wild Child Block65.0OregonSweet Cheeks 2012 Vintner's Reserve Wild Child...Pinot NoirSweet Cheeks
5SpainBlackberry and raspberry aromas show a typical...Ars In Vitro15.0Northern SpainTandem 2011 Ars In Vitro Tempranillo-Merlot (N...Tempranillo-MerlotTandem
6ItalyHere's a bright, informal red that opens with ...Belsito16.0Sicily & SardiniaTerre di Giurfo 2013 Belsito Frappato (Vittoria)FrappatoTerre di Giurfo
7FranceThis dry and restrained wine offers spice in p...No Designation24.0AlsaceTrimbach 2012 Gewurztraminer (Alsace)GewürztraminerTrimbach
8GermanySavory dried thyme notes accent sunnier flavor...Shine12.0RheinhessenHeinz Eifel 2013 Shine Gewürztraminer (Rheinhe...GewürztraminerHeinz Eifel
9FranceThis has great depth of flavor with its fresh ...Les Natures27.0AlsaceJean-Baptiste Adam 2012 Les Natures Pinot Gris...Pinot GrisJean-Baptiste Adam
\n", + "
" + ], + "text/plain": [ + " country description \\\n", + "0 Italy Aromas include tropical fruit, broom, brimston... \n", + "1 Portugal This is ripe and fruity, a wine that is smooth... \n", + "2 US Tart and snappy, the flavors of lime flesh and... \n", + "3 US Pineapple rind, lemon pith and orange blossom ... \n", + "4 US Much like the regular bottling from 2012, this... \n", + "5 Spain Blackberry and raspberry aromas show a typical... \n", + "6 Italy Here's a bright, informal red that opens with ... \n", + "7 France This dry and restrained wine offers spice in p... \n", + "8 Germany Savory dried thyme notes accent sunnier flavor... \n", + "9 France This has great depth of flavor with its fresh ... \n", + "\n", + " designation price province \\\n", + "0 Vulkà Bianco 0.0 Sicily & Sardinia \n", + "1 Avidagos 15.0 Douro \n", + "2 No Designation 14.0 Oregon \n", + "3 Reserve Late Harvest 13.0 Michigan \n", + "4 Vintner's Reserve Wild Child Block 65.0 Oregon \n", + "5 Ars In Vitro 15.0 Northern Spain \n", + "6 Belsito 16.0 Sicily & Sardinia \n", + "7 No Designation 24.0 Alsace \n", + "8 Shine 12.0 Rheinhessen \n", + "9 Les Natures 27.0 Alsace \n", + "\n", + " title variety \\\n", + "0 Nicosia 2013 Vulkà Bianco (Etna) White Blend \n", + "1 Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red \n", + "2 Rainstorm 2013 Pinot Gris (Willamette Valley) Pinot Gris \n", + "3 St. Julian 2013 Reserve Late Harvest Riesling ... Riesling \n", + "4 Sweet Cheeks 2012 Vintner's Reserve Wild Child... Pinot Noir \n", + "5 Tandem 2011 Ars In Vitro Tempranillo-Merlot (N... Tempranillo-Merlot \n", + "6 Terre di Giurfo 2013 Belsito Frappato (Vittoria) Frappato \n", + "7 Trimbach 2012 Gewurztraminer (Alsace) Gewürztraminer \n", + "8 Heinz Eifel 2013 Shine Gewürztraminer (Rheinhe... Gewürztraminer \n", + "9 Jean-Baptiste Adam 2012 Les Natures Pinot Gris... Pinot Gris \n", + "\n", + " winery \n", + "0 Nicosia \n", + "1 Quinta dos Avidagos \n", + "2 Rainstorm \n", + "3 St. Julian \n", + "4 Sweet Cheeks \n", + "5 Tandem \n", + "6 Terre di Giurfo \n", + "7 Trimbach \n", + "8 Heinz Eifel \n", + "9 Jean-Baptiste Adam " + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[1mCOLUMN\u001b[0m \u001b[1mNULL VALUES COUNT\u001b[0m\n", + "country 0\n", + "description 0\n", + "designation 0\n", + "price 0\n", + "province 0\n", + "title 0\n", + "variety 0\n", + "winery 0\n" + ] + } + ], + "source": [ + "data_qualityCheck()" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "train_df[\"text\"]=train_df[\"country\"]+\" \"+train_df[\"description\"]+\" \"+train_df[\"designation\"]+\" \"+train_df[\"province\"]+\" \"+train_df[\"title\"]+\" \"+train_df[\"variety\"]+\" \"+train_df[\"winery\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "train_df=train_df.drop([\"designation\",\"country\",\"province\",\"description\",\"title\",\"variety\",\"winery\"],axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
pricetext
00.0Italy Aromas include tropical fruit, broom, br...
115.0Portugal This is ripe and fruity, a wine that ...
214.0US Tart and snappy, the flavors of lime flesh ...
313.0US Pineapple rind, lemon pith and orange bloss...
465.0US Much like the regular bottling from 2012, t...
515.0Spain Blackberry and raspberry aromas show a t...
616.0Italy Here's a bright, informal red that opens...
724.0France This dry and restrained wine offers spi...
812.0Germany Savory dried thyme notes accent sunnie...
927.0France This has great depth of flavor with its...
1019.0US Soft, supple plum envelopes an oaky structu...
1130.0France This is a dry wine, very spicy, with a ...
1234.0US Slightly reduced, this wine offers a chalky...
130.0Italy This is dominated by oak and oak-driven ...
1412.0US Building on 150 years and six generations o...
1524.0Germany Zesty orange peels and apple notes abo...
1630.0Argentina Baked plum, molasses, balsamic vineg...
1713.0Argentina Raw black-cherry aromas are direct a...
1828.0Spain Desiccated blackberry, leather, charred ...
1932.0US Red fruit aromas pervade on the nose, with ...
\n", + "
" + ], + "text/plain": [ + " price text\n", + "0 0.0 Italy Aromas include tropical fruit, broom, br...\n", + "1 15.0 Portugal This is ripe and fruity, a wine that ...\n", + "2 14.0 US Tart and snappy, the flavors of lime flesh ...\n", + "3 13.0 US Pineapple rind, lemon pith and orange bloss...\n", + "4 65.0 US Much like the regular bottling from 2012, t...\n", + "5 15.0 Spain Blackberry and raspberry aromas show a t...\n", + "6 16.0 Italy Here's a bright, informal red that opens...\n", + "7 24.0 France This dry and restrained wine offers spi...\n", + "8 12.0 Germany Savory dried thyme notes accent sunnie...\n", + "9 27.0 France This has great depth of flavor with its...\n", + "10 19.0 US Soft, supple plum envelopes an oaky structu...\n", + "11 30.0 France This is a dry wine, very spicy, with a ...\n", + "12 34.0 US Slightly reduced, this wine offers a chalky...\n", + "13 0.0 Italy This is dominated by oak and oak-driven ...\n", + "14 12.0 US Building on 150 years and six generations o...\n", + "15 24.0 Germany Zesty orange peels and apple notes abo...\n", + "16 30.0 Argentina Baked plum, molasses, balsamic vineg...\n", + "17 13.0 Argentina Raw black-cherry aromas are direct a...\n", + "18 28.0 Spain Desiccated blackberry, leather, charred ...\n", + "19 32.0 US Red fruit aromas pervade on the nose, with ..." + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.head(20)" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "price=train_df[\"price\"]\n", + "train_df=train_df.drop(\"price\",axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "custom_download_dir = \"C:\\\\Users\\\\ysach/nltk\"\n", + "nltk.data.path.append(custom_download_dir)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[nltk_data] Downloading package stopwords to C:\\Users\\ysach/nltk...\n", + "[nltk_data] Package stopwords is already up-to-date!\n" + ] + }, + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nltk.download('stopwords',download_dir=custom_download_dir)" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.51602Z", + "iopub.status.busy": "2021-05-25T06:50:32.515411Z", + "iopub.status.idle": "2021-05-25T06:50:32.531829Z", + "shell.execute_reply": "2021-05-25T06:50:32.530895Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.515972Z" + } + }, + "outputs": [], + "source": [ + "lemmatizer = WordNetLemmatizer()\n", + "stpwrds = list(stopwords.words('english'))" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['i',\n", + " 'me',\n", + " 'my',\n", + " 'myself',\n", + " 'we',\n", + " 'our',\n", + " 'ours',\n", + " 'ourselves',\n", + " 'you',\n", + " \"you're\",\n", + " \"you've\",\n", + " \"you'll\",\n", + " \"you'd\",\n", + " 'your',\n", + " 'yours',\n", + " 'yourself',\n", + " 'yourselves',\n", + " 'he',\n", + " 'him',\n", + " 'his',\n", + " 'himself',\n", + " 'she',\n", + " \"she's\",\n", + " 'her',\n", + " 'hers',\n", + " 'herself',\n", + " 'it',\n", + " \"it's\",\n", + " 'its',\n", + " 'itself',\n", + " 'they',\n", + " 'them',\n", + " 
'their',\n", + " 'theirs',\n", + " 'themselves',\n", + " 'what',\n", + " 'which',\n", + " 'who',\n", + " 'whom',\n", + " 'this',\n", + " 'that',\n", + " \"that'll\",\n", + " 'these',\n", + " 'those',\n", + " 'am',\n", + " 'is',\n", + " 'are',\n", + " 'was',\n", + " 'were',\n", + " 'be',\n", + " 'been',\n", + " 'being',\n", + " 'have',\n", + " 'has',\n", + " 'had',\n", + " 'having',\n", + " 'do',\n", + " 'does',\n", + " 'did',\n", + " 'doing',\n", + " 'a',\n", + " 'an',\n", + " 'the',\n", + " 'and',\n", + " 'but',\n", + " 'if',\n", + " 'or',\n", + " 'because',\n", + " 'as',\n", + " 'until',\n", + " 'while',\n", + " 'of',\n", + " 'at',\n", + " 'by',\n", + " 'for',\n", + " 'with',\n", + " 'about',\n", + " 'against',\n", + " 'between',\n", + " 'into',\n", + " 'through',\n", + " 'during',\n", + " 'before',\n", + " 'after',\n", + " 'above',\n", + " 'below',\n", + " 'to',\n", + " 'from',\n", + " 'up',\n", + " 'down',\n", + " 'in',\n", + " 'out',\n", + " 'on',\n", + " 'off',\n", + " 'over',\n", + " 'under',\n", + " 'again',\n", + " 'further',\n", + " 'then',\n", + " 'once',\n", + " 'here',\n", + " 'there',\n", + " 'when',\n", + " 'where',\n", + " 'why',\n", + " 'how',\n", + " 'all',\n", + " 'any',\n", + " 'both',\n", + " 'each',\n", + " 'few',\n", + " 'more',\n", + " 'most',\n", + " 'other',\n", + " 'some',\n", + " 'such',\n", + " 'no',\n", + " 'nor',\n", + " 'not',\n", + " 'only',\n", + " 'own',\n", + " 'same',\n", + " 'so',\n", + " 'than',\n", + " 'too',\n", + " 'very',\n", + " 's',\n", + " 't',\n", + " 'can',\n", + " 'will',\n", + " 'just',\n", + " 'don',\n", + " \"don't\",\n", + " 'should',\n", + " \"should've\",\n", + " 'now',\n", + " 'd',\n", + " 'll',\n", + " 'm',\n", + " 'o',\n", + " 're',\n", + " 've',\n", + " 'y',\n", + " 'ain',\n", + " 'aren',\n", + " \"aren't\",\n", + " 'couldn',\n", + " \"couldn't\",\n", + " 'didn',\n", + " \"didn't\",\n", + " 'doesn',\n", + " \"doesn't\",\n", + " 'hadn',\n", + " \"hadn't\",\n", + " 'hasn',\n", + " \"hasn't\",\n", + " 'haven',\n", + " \"haven't\",\n", + " 'isn',\n", + " \"isn't\",\n", + " 'ma',\n", + " 'mightn',\n", + " \"mightn't\",\n", + " 'mustn',\n", + " \"mustn't\",\n", + " 'needn',\n", + " \"needn't\",\n", + " 'shan',\n", + " \"shan't\",\n", + " 'shouldn',\n", + " \"shouldn't\",\n", + " 'wasn',\n", + " \"wasn't\",\n", + " 'weren',\n", + " \"weren't\",\n", + " 'won',\n", + " \"won't\",\n", + " 'wouldn',\n", + " \"wouldn't\"]" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "stpwrds" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[nltk_data] Downloading package punkt to C:\\Users\\ysach/nltk...\n", + "[nltk_data] Package punkt is already up-to-date!\n", + "[nltk_data] Downloading package wordnet to C:\\Users\\ysach/nltk...\n", + "[nltk_data] Package wordnet is already up-to-date!\n" + ] + }, + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nltk.download('punkt',download_dir=custom_download_dir)\n", + "nltk.download('wordnet',download_dir=custom_download_dir)" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[nltk_data] Downloading package omw-1.4 to C:\\Users\\ysach/nltk...\n", + "[nltk_data] Package omw-1.4 is already up-to-date!\n" + ] + }, + { + "data": { + 
"text/plain": [ + "True" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nltk.download('omw-1.4',download_dir=custom_download_dir)" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.54905Z", + "iopub.status.busy": "2021-05-25T06:50:32.548517Z", + "iopub.status.idle": "2021-05-25T06:53:51.648153Z", + "shell.execute_reply": "2021-05-25T06:53:51.647283Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.549015Z" + } + }, + "outputs": [], + "source": [ + "for x in range(len(train_df)) :\n", + " corpus = []\n", + " review = train_df['text'][x]\n", + " review = re.sub(r'[^a-zA-Z\\s]', '', review)\n", + " review = review.lower()\n", + " review = nltk.word_tokenize(review)\n", + " for y in review :\n", + " if y not in stpwrds :\n", + " corpus.append(lemmatizer.lemmatize(y))\n", + " review = ' '.join(corpus)\n", + " train_df['text'][x] = review" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T07:14:51.798724Z", + "iopub.status.busy": "2021-05-25T07:14:51.798361Z", + "iopub.status.idle": "2021-05-25T07:14:51.805617Z", + "shell.execute_reply": "2021-05-25T07:14:51.804946Z", + "shell.execute_reply.started": "2021-05-25T07:14:51.798694Z" + }, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'u touch riesling accentuates fresh citrusy backbone cabernet sauvignon ro dry style sprightly lightfooted tone offer load concentrated cherry berry flavor finish brisk clean dry new york osprey dominion dry ro north fork long island ro osprey dominion'" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df['text'][2188]" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "89" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_train[2188]" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T07:16:37.152728Z", + "iopub.status.busy": "2021-05-25T07:16:37.152216Z", + "iopub.status.idle": "2021-05-25T07:16:37.163059Z", + "shell.execute_reply": "2021-05-25T07:16:37.161884Z", + "shell.execute_reply.started": "2021-05-25T07:16:37.152696Z" + } + }, + "outputs": [], + "source": [ + "X_train= train_df['text']" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0 italy aroma include tropical fruit broom brims...\n", + "1 portugal ripe fruity wine smooth still structu...\n", + "2 u tart snappy flavor lime flesh rind dominate ...\n", + "3 u pineapple rind lemon pith orange blossom sta...\n", + "4 u much like regular bottling come across rathe...\n", + " ... 
\n", + "129966 germany note honeysuckle cantaloupe sweeten de...\n", + "129967 u citation given much decade bottle age prior ...\n", + "129968 france welldrained gravel soil give wine crisp...\n", + "129969 france dry style pinot gris crisp acidity also...\n", + "129970 france big rich offdry powered intense spicine...\n", + "Name: text, Length: 129971, dtype: object" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "X_train" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T07:17:50.592597Z", + "iopub.status.busy": "2021-05-25T07:17:50.592095Z", + "iopub.status.idle": "2021-05-25T07:17:50.598862Z", + "shell.execute_reply": "2021-05-25T07:17:50.597641Z", + "shell.execute_reply.started": "2021-05-25T07:17:50.592566Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(129971,)" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "X_train.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T07:18:05.89317Z", + "iopub.status.busy": "2021-05-25T07:18:05.892651Z", + "iopub.status.idle": "2021-05-25T07:18:05.902743Z", + "shell.execute_reply": "2021-05-25T07:18:05.901523Z", + "shell.execute_reply.started": "2021-05-25T07:18:05.893127Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(129971,)" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_train.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [], + "source": [ + "from keras.preprocessing.text import Tokenizer\n", + "from keras.preprocessing.sequence import pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The Padding Sequance Shape is --> (129971, 89)\n" + ] + } + ], + "source": [ + "tokenize = Tokenizer(oov_token=\"\")\n", + "tokenize.fit_on_texts(X_train)\n", + "word_idx = tokenize.word_index\n", + "\n", + "text2seq = tokenize.texts_to_sequences(X_train)\n", + "\n", + "# pad_seq = pad_sequences(text2seq, maxlen=150, padding=\"pre\", truncating=\"pre\")\n", + "\n", + "pad_seq = pad_sequences(text2seq, padding=\"pre\", truncating=\"pre\")\n", + "\n", + "\n", + "print(\"The Padding Sequance Shape is --> \", pad_seq.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [], + "source": [ + "input_length = max(len(seq) for seq in text2seq)\n", + "\n", + "vocabulary_size = len(word_idx) + 1" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The maximum Sequance Length is --> 89\n", + "The vocabulary size of dataset is --> 63971\n" + ] + } + ], + "source": [ + "print(\"The maximum Sequance Length is --> \", input_length)\n", + "print(\"The vocabulary size of dataset is --> \", vocabulary_size)" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [], + "source": [ + "df=pd.DataFrame(pad_seq)" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
0123456789...79808182838485868788
00000000000...8244294491138628260824166430811386
10000000000...1377113871138752772155249137711387
20000000000...151049316112253739112259316
30000000000...594888264829326084718485236353
40000000000...7384393112337391123505969
50000000000...36628186587385672936628145842103145847385
60000000000...5717411791235853139385331395717411791
70000000000...5229044015172603110641724466031
80000000000...855113576307922855446113544676307922
90000000000...214177724112251721122531242141
\n", + "

10 rows × 89 columns

\n", + "
" + ], + "text/plain": [ + " 0 1 2 3 4 5 6 7 8 9 ... 79 80 81 82 \\\n", + "0 0 0 0 0 0 0 0 0 0 0 ... 824 429 449 11386 \n", + "1 0 0 0 0 0 0 0 0 0 0 ... 1377 11387 11387 5 \n", + "2 0 0 0 0 0 0 0 0 0 0 ... 15 104 9316 11 \n", + "3 0 0 0 0 0 0 0 0 0 0 ... 59 488 826 48 \n", + "4 0 0 0 0 0 0 0 0 0 0 ... 7384 393 11 23 \n", + "5 0 0 0 0 0 0 0 0 0 0 ... 36628 186 58 7385 \n", + "6 0 0 0 0 0 0 0 0 0 0 ... 571 74 11791 23585 \n", + "7 0 0 0 0 0 0 0 0 0 0 ... 52 290 440 15 \n", + "8 0 0 0 0 0 0 0 0 0 0 ... 855 1135 7630 7922 \n", + "9 0 0 0 0 0 0 0 0 0 0 ... 2141 77 724 11 \n", + "\n", + " 83 84 85 86 87 88 \n", + "0 28260 824 1664 30 8 11386 \n", + "1 277 215 5 249 1377 11387 \n", + "2 225 373 9 11 225 9316 \n", + "3 293 2608 4718 48 523 6353 \n", + "4 373 9 11 23 50 5969 \n", + "5 6729 36628 14584 2103 14584 7385 \n", + "6 3139 3853 3139 571 74 11791 \n", + "7 172 6031 1064 172 446 6031 \n", + "8 855 446 1135 446 7630 7922 \n", + "9 225 172 11 225 3124 2141 \n", + "\n", + "[10 rows x 89 columns]" + ] + }, + "execution_count": 50, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [], + "source": [ + "df['89']=price" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
0123456789...80818283848586878889
00000000000...42944911386282608241664308113860.0
10000000000...11387113875277215524913771138715.0
20000000000...104931611225373911225931614.0
30000000000...488826482932608471848523635313.0
40000000000...39311233739112350596965.0
50000000000...18658738567293662814584210314584738515.0
60000000000...741179123585313938533139571741179116.0
70000000000...2904401517260311064172446603124.0
80000000000...11357630792285544611354467630792212.0
90000000000...7772411225172112253124214127.0
100000000000...158181613291816252385719.0
110000000000...151725006385410641724465006385430.0
120000000000...3434181669291816580343434.0
130000000000...4493784235865411664173419253784235860.0
140000000000...1233151051052620926026510512.0
150000000000...40032441204211388484004832441204224.0
160000000000...561656162041456169237349256162041430.0
170000000000...603220415354360921359260322041513.0
180000000000...423136037901688862128692628.0
190000000000...16516711554023587107054010702358732.0
\n", + "

20 rows × 90 columns

\n", + "
" + ], + "text/plain": [ + " 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 \\\n", + "0 0 0 0 0 0 0 0 0 0 0 ... 429 449 11386 28260 824 \n", + "1 0 0 0 0 0 0 0 0 0 0 ... 11387 11387 5 277 215 \n", + "2 0 0 0 0 0 0 0 0 0 0 ... 104 9316 11 225 373 \n", + "3 0 0 0 0 0 0 0 0 0 0 ... 488 826 48 293 2608 \n", + "4 0 0 0 0 0 0 0 0 0 0 ... 393 11 23 373 9 \n", + "5 0 0 0 0 0 0 0 0 0 0 ... 186 58 7385 6729 36628 \n", + "6 0 0 0 0 0 0 0 0 0 0 ... 74 11791 23585 3139 3853 \n", + "7 0 0 0 0 0 0 0 0 0 0 ... 290 440 15 172 6031 \n", + "8 0 0 0 0 0 0 0 0 0 0 ... 1135 7630 7922 855 446 \n", + "9 0 0 0 0 0 0 0 0 0 0 ... 77 724 11 225 172 \n", + "10 0 0 0 0 0 0 0 0 0 0 ... 158 18 16 132 9 \n", + "11 0 0 0 0 0 0 0 0 0 0 ... 15 172 5006 3854 1064 \n", + "12 0 0 0 0 0 0 0 0 0 0 ... 3434 18 16 692 9 \n", + "13 0 0 0 0 0 0 0 0 0 0 ... 449 3784 23586 541 1664 \n", + "14 0 0 0 0 0 0 0 0 0 0 ... 1233 15 10 5105 26 \n", + "15 0 0 0 0 0 0 0 0 0 0 ... 400 3244 12042 11388 48 \n", + "16 0 0 0 0 0 0 0 0 0 0 ... 5616 5616 20414 5616 92 \n", + "17 0 0 0 0 0 0 0 0 0 0 ... 6032 20415 354 360 92 \n", + "18 0 0 0 0 0 0 0 0 0 0 ... 423 13 603 790 168 \n", + "19 0 0 0 0 0 0 0 0 0 0 ... 165 1671 15 540 23587 \n", + "\n", + " 85 86 87 88 89 \n", + "0 1664 30 8 11386 0.0 \n", + "1 5 249 1377 11387 15.0 \n", + "2 9 11 225 9316 14.0 \n", + "3 4718 48 523 6353 13.0 \n", + "4 11 23 50 5969 65.0 \n", + "5 14584 2103 14584 7385 15.0 \n", + "6 3139 571 74 11791 16.0 \n", + "7 1064 172 446 6031 24.0 \n", + "8 1135 446 7630 7922 12.0 \n", + "9 11 225 3124 2141 27.0 \n", + "10 18 16 2523 857 19.0 \n", + "11 172 446 5006 3854 30.0 \n", + "12 18 16 580 3434 34.0 \n", + "13 1734 1925 3784 23586 0.0 \n", + "14 209 260 26 5105 12.0 \n", + "15 400 48 3244 12042 24.0 \n", + "16 3734 92 5616 20414 30.0 \n", + "17 135 92 6032 20415 13.0 \n", + "18 886 212 8 6926 28.0 \n", + "19 1070 540 1070 23587 32.0 \n", + "\n", + "[20 rows x 90 columns]" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(20)" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Index([], dtype='object')\n" + ] + } + ], + "source": [ + "zero_columns = df.columns[df.eq(0).all()]\n", + "print(zero_columns)" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(129971, 90)" + ] + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.feature_extraction.text import TfidfVectorizer\n", + "from sklearn.feature_extraction.text import CountVectorizer\n", + "vectorizer = CountVectorizer(\n", + " ngram_range=(1,1),\n", + " max_features=25\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [], + "source": [ + "df3=pd.get_dummies(label_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
80818283848586878889...919293949596979899100
00000000100...0000000000
10000000100...0000000000
20000000100...0000000000
30000000100...0000000000
40000000100...0000000000
50000000100...0000000000
60000000100...0000000000
70000000100...0000000000
80000000100...0000000000
90000000100...0000000000
\n", + "

10 rows × 21 columns

\n", + "
" + ], + "text/plain": [ + " 80 81 82 83 84 85 86 87 88 89 ... 91 92 93 94 \\\n", + "0 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n", + "1 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n", + "2 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n", + "3 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n", + "4 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n", + "5 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n", + "6 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n", + "7 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n", + "8 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n", + "9 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n", + "\n", + " 95 96 97 98 99 100 \n", + "0 0 0 0 0 0 0 \n", + "1 0 0 0 0 0 0 \n", + "2 0 0 0 0 0 0 \n", + "3 0 0 0 0 0 0 \n", + "4 0 0 0 0 0 0 \n", + "5 0 0 0 0 0 0 \n", + "6 0 0 0 0 0 0 \n", + "7 0 0 0 0 0 0 \n", + "8 0 0 0 0 0 0 \n", + "9 0 0 0 0 0 0 \n", + "\n", + "[10 rows x 21 columns]" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df3.head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: tensorflow in c:\\users\\ysach\\anaconda3\\lib\\site-packages (2.15.0)\n", + "Requirement already satisfied: tensorflow-intel==2.15.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow) (2.15.0)\n", + "Requirement already satisfied: tensorboard<2.16,>=2.15 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (2.15.1)\n", + "Requirement already satisfied: tensorflow-estimator<2.16,>=2.15.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (2.15.0)\n", + "Requirement already satisfied: numpy<2.0.0,>=1.23.5 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (1.23.5)\n", + "Requirement already satisfied: six>=1.12.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (1.16.0)\n", + "Requirement already satisfied: google-pasta>=0.1.1 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (0.2.0)\n", + "Requirement already satisfied: absl-py>=1.0.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (1.4.0)\n", + "Requirement already satisfied: setuptools in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (65.6.3)\n", + "Requirement already satisfied: h5py>=2.9.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (3.7.0)\n", + "Requirement already satisfied: libclang>=13.0.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (16.0.6)\n", + "Requirement already satisfied: wrapt<1.15,>=1.11.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (1.14.1)\n", + "Requirement already satisfied: typing-extensions>=3.6.6 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (4.4.0)\n", + "Requirement already satisfied: keras<2.16,>=2.15.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (2.15.0)\n", + "Requirement already satisfied: termcolor>=1.1.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (2.4.0)\n", + "Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from 
tensorflow-intel==2.15.0->tensorflow) (0.5.4)\n", + "Requirement already satisfied: flatbuffers>=23.5.26 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (23.5.26)\n", + "Requirement already satisfied: grpcio<2.0,>=1.24.3 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (1.60.0)\n", + "Requirement already satisfied: opt-einsum>=2.3.2 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (3.3.0)\n", + "Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (0.31.0)\n", + "Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (3.20.3)\n", + "Requirement already satisfied: ml-dtypes~=0.2.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (0.2.0)\n", + "Requirement already satisfied: astunparse>=1.6.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (1.6.3)\n", + "Requirement already satisfied: packaging in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (22.0)\n", + "Requirement already satisfied: wheel<1.0,>=0.23.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from astunparse>=1.6.0->tensorflow-intel==2.15.0->tensorflow) (0.38.4)\n", + "Requirement already satisfied: google-auth-oauthlib<2,>=0.5 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (1.2.0)\n", + "Requirement already satisfied: requests<3,>=2.21.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (2.28.1)\n", + "Requirement already satisfied: werkzeug>=1.0.1 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (2.2.2)\n", + "Requirement already satisfied: markdown>=2.6.8 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (3.4.1)\n", + "Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (0.7.2)\n", + "Requirement already satisfied: google-auth<3,>=1.6.3 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (2.25.2)\n", + "Requirement already satisfied: pyasn1-modules>=0.2.1 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (0.2.8)\n", + "Requirement already satisfied: rsa<5,>=3.1.4 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (4.9)\n", + "Requirement already satisfied: cachetools<6.0,>=2.0.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (5.3.2)\n", + "Requirement already satisfied: requests-oauthlib>=0.7.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from 
google-auth-oauthlib<2,>=0.5->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (1.3.1)\n", + "Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (2023.11.17)\n", + "Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (1.26.14)\n", + "Requirement already satisfied: charset-normalizer<3,>=2 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (2.0.4)\n", + "Requirement already satisfied: idna<4,>=2.5 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (3.4)\n", + "Requirement already satisfied: MarkupSafe>=2.1.1 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from werkzeug>=1.0.1->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (2.1.1)\n", + "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (0.4.8)\n", + "Requirement already satisfied: oauthlib>=3.0.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<2,>=0.5->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (3.2.2)\n" + ] + } + ], + "source": [ + "!pip install tensorflow" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "metadata": {}, + "outputs": [], + "source": [ + "df1=df['89']" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [], + "source": [ + "import keras\n", + "from keras.models import Sequential\n", + "from keras.utils import to_categorical\n", + "from keras import metrics as metrics1\n", + "from keras.layers import LeakyReLU\n", + "from keras.layers import Dense, Embedding, GlobalAveragePooling1D, LSTM, Bidirectional,InputLayer" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": {}, + "outputs": [], + "source": [ + "x_train1, x_test, y_train1, y_test = train_test_split(pad_seq, df3, train_size=0.7)" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From c:\\Users\\ysach\\anaconda3\\lib\\site-packages\\keras\\src\\backend.py:873: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.\n", + "\n", + "WARNING:tensorflow:From c:\\Users\\ysach\\anaconda3\\lib\\site-packages\\keras\\src\\optimizers\\__init__.py:309: The name tf.train.Optimizer is deprecated. 
Please use tf.compat.v1.train.Optimizer instead.\n", + "\n" + ] + } + ], + "source": [ + "classifier = Sequential()\n", + "classifier.add(Embedding(vocabulary_size, 200, input_length=89))\n", + "classifier.add(GlobalAveragePooling1D())\n", + "classifier.add(Dense(100, activation='relu'))\n", + "classifier.add(Dense(50, activation='relu'))\n", + "classifier.add(Dense(21, activation='sigmoid'))\n", + "\n", + "# Compile the model\n", + "classifier.compile(optimizer='adam',\n", + " loss='categorical_crossentropy',\n", + " metrics=['accuracy'])" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model: \"sequential\"\n", + "_________________________________________________________________\n", + " Layer (type) Output Shape Param # \n", + "=================================================================\n", + " embedding (Embedding) (None, 89, 200) 12794200 \n", + " \n", + " global_average_pooling1d ( (None, 200) 0 \n", + " GlobalAveragePooling1D) \n", + " \n", + " dense (Dense) (None, 100) 20100 \n", + " \n", + " dense_1 (Dense) (None, 50) 5050 \n", + " \n", + " dense_2 (Dense) (None, 21) 1071 \n", + " \n", + "=================================================================\n", + "Total params: 12820421 (48.91 MB)\n", + "Trainable params: 12820421 (48.91 MB)\n", + "Non-trainable params: 0 (0.00 Byte)\n", + "_________________________________________________________________\n" + ] + } + ], + "source": [ + "classifier.summary()" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1/10\n", + "WARNING:tensorflow:From c:\\Users\\ysach\\anaconda3\\lib\\site-packages\\keras\\src\\utils\\tf_utils.py:492: The name tf.ragged.RaggedTensorValue is deprecated. Please use tf.compat.v1.ragged.RaggedTensorValue instead.\n", + "\n", + "WARNING:tensorflow:From c:\\Users\\ysach\\anaconda3\\lib\\site-packages\\keras\\src\\engine\\base_layer_utils.py:384: The name tf.executing_eagerly_outside_functions is deprecated. 
Please use tf.compat.v1.executing_eagerly_outside_functions instead.\n", + "\n", + "2844/2844 [==============================] - 700s 246ms/step - loss: 1.9759 - accuracy: 0.2405 - val_loss: 1.8103 - val_accuracy: 0.2790\n", + "Epoch 2/10\n", + "2844/2844 [==============================] - 680s 239ms/step - loss: 1.6719 - accuracy: 0.3298 - val_loss: 1.7734 - val_accuracy: 0.2931\n", + "Epoch 3/10\n", + "2844/2844 [==============================] - 716s 252ms/step - loss: 1.5275 - accuracy: 0.3928 - val_loss: 1.7976 - val_accuracy: 0.3054\n", + "Epoch 4/10\n", + "2844/2844 [==============================] - 744s 262ms/step - loss: 1.3634 - accuracy: 0.4765 - val_loss: 1.8590 - val_accuracy: 0.3115\n", + "Epoch 5/10\n", + "2844/2844 [==============================] - 702s 247ms/step - loss: 1.1777 - accuracy: 0.5636 - val_loss: 2.0084 - val_accuracy: 0.3067\n", + "Epoch 6/10\n", + "2844/2844 [==============================] - 651s 229ms/step - loss: 1.0069 - accuracy: 0.6370 - val_loss: 2.1700 - val_accuracy: 0.3127\n", + "Epoch 7/10\n", + "2844/2844 [==============================] - 636s 224ms/step - loss: 0.8697 - accuracy: 0.6905 - val_loss: 2.3802 - val_accuracy: 0.3126\n", + "Epoch 8/10\n", + "2844/2844 [==============================] - 779s 274ms/step - loss: 0.7557 - accuracy: 0.7336 - val_loss: 2.5939 - val_accuracy: 0.3108\n", + "Epoch 9/10\n", + "2844/2844 [==============================] - 635s 223ms/step - loss: 0.6597 - accuracy: 0.7682 - val_loss: 2.8150 - val_accuracy: 0.3061\n", + "Epoch 10/10\n", + "2844/2844 [==============================] - 677s 238ms/step - loss: 0.5769 - accuracy: 0.7990 - val_loss: 3.1260 - val_accuracy: 0.3100\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "classifier.fit(x_train1,y_train1,epochs=10,validation_data=(x_test, y_test))" + ] + }, + { + "cell_type": "code", + "execution_count": 71, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1219/1219 [==============================] - 1s 917us/step\n" + ] + } + ], + "source": [ + "Y_pred = classifier.predict(x_test)\n", + "a=[]\n", + "for x in Y_pred:\n", + " a.append(80 +np.argmax(x))" + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[85,\n", + " 85,\n", + " 90,\n", + " 88,\n", + " 87,\n", + " 86,\n", + " 84,\n", + " 91,\n", + " 91,\n", + " 89,\n", + " 92,\n", + " 92,\n", + " 90,\n", + " 87,\n", + " 86,\n", + " 89,\n", + " 87,\n", + " 99,\n", + " 95,\n", + " 86,\n", + " 90,\n", + " 85,\n", + " 94,\n", + " 93,\n", + " 88,\n", + " 85,\n", + " 88,\n", + " 89,\n", + " 87,\n", + " 90,\n", + " 86,\n", + " 89,\n", + " 85,\n", + " 84,\n", + " 90,\n", + " 85,\n", + " 90,\n", + " 89,\n", + " 85,\n", + " 89,\n", + " 87,\n", + " 86,\n", + " 86,\n", + " 89,\n", + " 84,\n", + " 92,\n", + " 90,\n", + " 92,\n", + " 92,\n", + " 87,\n", + " 85,\n", + " 89,\n", + " 84,\n", + " 90,\n", + " 90,\n", + " 88,\n", + " 86,\n", + " 91,\n", + " 88,\n", + " 93,\n", + " 89,\n", + " 88,\n", + " 85,\n", + " 88,\n", + " 87,\n", + " 90,\n", + " 83,\n", + " 88,\n", + " 91,\n", + " 84,\n", + " 94,\n", + " 87,\n", + " 91,\n", + " 89,\n", + " 88,\n", + " 92,\n", + " 86,\n", + " 92,\n", + " 84,\n", + " 93,\n", + " 89,\n", + " 87,\n", + " 84,\n", + " 89,\n", + " 93,\n", + " 89,\n", + " 88,\n", + " 88,\n", + " 93,\n", + " 94,\n", + " 91,\n", + " 92,\n", + " 93,\n", + " 87,\n", + " 87,\n", + " 
82,\n", + " 92,\n", + " 84,\n", + " 92,\n", + " 90,\n", + " 90,\n", + " 93,\n", + " 91,\n", + " 94,\n", + " 93,\n", + " 93,\n", + " 90,\n", + " 90,\n", + " 88,\n", + " 87,\n", + " 83,\n", + " 85,\n", + " 85,\n", + " 92,\n", + " 90,\n", + " 85,\n", + " 93,\n", + " 87,\n", + " 86,\n", + " 89,\n", + " 88,\n", + " 85,\n", + " 92,\n", + " 90,\n", + " 85,\n", + " 94,\n", + " 90,\n", + " 86,\n", + " 91,\n", + " 89,\n", + " 88,\n", + " 88,\n", + " 84,\n", + " 82,\n", + " 85,\n", + " 91,\n", + " 87,\n", + " 88,\n", + " 89,\n", + " 92,\n", + " 89,\n", + " 85,\n", + " 91,\n", + " 88,\n", + " 87,\n", + " 89,\n", + " 88,\n", + " 88,\n", + " 87,\n", + " 95,\n", + " 86,\n", + " 96,\n", + " 88,\n", + " 86,\n", + " 90,\n", + " 90,\n", + " 84,\n", + " 85,\n", + " 89,\n", + " 88,\n", + " 81,\n", + " 88,\n", + " 89,\n", + " 88,\n", + " 88,\n", + " 96,\n", + " 88,\n", + " 91,\n", + " 86,\n", + " 87,\n", + " 92,\n", + " 91,\n", + " 82,\n", + " 90,\n", + " 94,\n", + " 92,\n", + " 90,\n", + " 84,\n", + " 87,\n", + " 89,\n", + " 90,\n", + " 88,\n", + " 87,\n", + " 86,\n", + " 90,\n", + " 92,\n", + " 88,\n", + " 93,\n", + " 89,\n", + " 84,\n", + " 87,\n", + " 87,\n", + " 91,\n", + " 88,\n", + " 84,\n", + " 89,\n", + " 93,\n", + " 88,\n", + " 92,\n", + " 88,\n", + " 86,\n", + " 92,\n", + " 95,\n", + " 92,\n", + " 86,\n", + " 92,\n", + " 87,\n", + " 85,\n", + " 86,\n", + " 84,\n", + " 92,\n", + " 88,\n", + " 90,\n", + " 89,\n", + " 86,\n", + " 93,\n", + " 86,\n", + " 89,\n", + " 90,\n", + " 88,\n", + " 90,\n", + " 90,\n", + " 90,\n", + " 87,\n", + " 87,\n", + " 87,\n", + " 89,\n", + " 87,\n", + " 87,\n", + " 90,\n", + " 84,\n", + " 92,\n", + " 92,\n", + " 85,\n", + " 85,\n", + " 89,\n", + " 90,\n", + " 87,\n", + " 84,\n", + " 93,\n", + " 90,\n", + " 89,\n", + " 86,\n", + " 87,\n", + " 90,\n", + " 86,\n", + " 88,\n", + " 90,\n", + " 89,\n", + " 94,\n", + " 86,\n", + " 84,\n", + " 90,\n", + " 85,\n", + " 90,\n", + " 88,\n", + " 92,\n", + " 90,\n", + " 87,\n", + " 94,\n", + " 88,\n", + " 89,\n", + " 90,\n", + " 91,\n", + " 88,\n", + " 85,\n", + " 88,\n", + " 89,\n", + " 85,\n", + " 95,\n", + " 90,\n", + " 91,\n", + " 88,\n", + " 92,\n", + " 90,\n", + " 92,\n", + " 91,\n", + " 88,\n", + " 90,\n", + " 86,\n", + " 88,\n", + " 90,\n", + " 87,\n", + " 86,\n", + " 84,\n", + " 85,\n", + " 85,\n", + " 85,\n", + " 87,\n", + " 86,\n", + " 87,\n", + " 85,\n", + " 94,\n", + " 90,\n", + " 84,\n", + " 93,\n", + " 88,\n", + " 85,\n", + " 88,\n", + " 84,\n", + " 85,\n", + " 88,\n", + " 89,\n", + " 88,\n", + " 89,\n", + " 94,\n", + " 90,\n", + " 94,\n", + " 86,\n", + " 90,\n", + " 87,\n", + " 86,\n", + " 91,\n", + " 92,\n", + " 85,\n", + " 90,\n", + " 85,\n", + " 93,\n", + " 91,\n", + " 94,\n", + " 89,\n", + " 85,\n", + " 88,\n", + " 95,\n", + " 88,\n", + " 92,\n", + " 90,\n", + " 90,\n", + " 90,\n", + " 93,\n", + " 90,\n", + " 93,\n", + " 90,\n", + " 91,\n", + " 86,\n", + " 84,\n", + " 86,\n", + " 91,\n", + " 84,\n", + " 86,\n", + " 88,\n", + " 89,\n", + " 88,\n", + " 87,\n", + " 85,\n", + " 94,\n", + " 90,\n", + " 88,\n", + " 85,\n", + " 90,\n", + " 86,\n", + " 88,\n", + " 84,\n", + " 89,\n", + " 87,\n", + " 91,\n", + " 83,\n", + " 90,\n", + " 89,\n", + " 88,\n", + " 85,\n", + " 87,\n", + " 84,\n", + " 89,\n", + " 86,\n", + " 88,\n", + " 91,\n", + " 85,\n", + " 88,\n", + " 90,\n", + " 92,\n", + " 85,\n", + " 89,\n", + " 85,\n", + " 95,\n", + " 90,\n", + " 86,\n", + " 95,\n", + " 90,\n", + " 88,\n", + " 90,\n", + " 95,\n", + " 87,\n", + " 88,\n", + " 84,\n", + " 85,\n", + " 87,\n", + " 84,\n", + " 85,\n", + " 91,\n", + " 90,\n", + " 
85,\n", + " 87,\n", + " 88,\n", + " 89,\n", + " 87,\n", + " 94,\n", + " 90,\n", + " 86,\n", + " 86,\n", + " 91,\n", + " 90,\n", + " 89,\n", + " 90,\n", + " 88,\n", + " 88,\n", + " 85,\n", + " 90,\n", + " 93,\n", + " 91,\n", + " 84,\n", + " 85,\n", + " 92,\n", + " 95,\n", + " 89,\n", + " 86,\n", + " 85,\n", + " 90,\n", + " 87,\n", + " 93,\n", + " 86,\n", + " 84,\n", + " 88,\n", + " 85,\n", + " 92,\n", + " 89,\n", + " 95,\n", + " 88,\n", + " 89,\n", + " 91,\n", + " 89,\n", + " 89,\n", + " 86,\n", + " 85,\n", + " 86,\n", + " 87,\n", + " 91,\n", + " 88,\n", + " 91,\n", + " 96,\n", + " 85,\n", + " 93,\n", + " 87,\n", + " 90,\n", + " 91,\n", + " 85,\n", + " 89,\n", + " 90,\n", + " 90,\n", + " 89,\n", + " 86,\n", + " 86,\n", + " 91,\n", + " 84,\n", + " 82,\n", + " 90,\n", + " 88,\n", + " 88,\n", + " 95,\n", + " 92,\n", + " 88,\n", + " 90,\n", + " 94,\n", + " 88,\n", + " 93,\n", + " 84,\n", + " 90,\n", + " 90,\n", + " 91,\n", + " 88,\n", + " 91,\n", + " 93,\n", + " 90,\n", + " 88,\n", + " 85,\n", + " 84,\n", + " 92,\n", + " 87,\n", + " 87,\n", + " 85,\n", + " 90,\n", + " 86,\n", + " 91,\n", + " 88,\n", + " 83,\n", + " 85,\n", + " 92,\n", + " 88,\n", + " 85,\n", + " 88,\n", + " 92,\n", + " 87,\n", + " 95,\n", + " 90,\n", + " 90,\n", + " 84,\n", + " 89,\n", + " 86,\n", + " 85,\n", + " 97,\n", + " 92,\n", + " 89,\n", + " 92,\n", + " 89,\n", + " 89,\n", + " 90,\n", + " 89,\n", + " 91,\n", + " 92,\n", + " 90,\n", + " 86,\n", + " 88,\n", + " 88,\n", + " 88,\n", + " 87,\n", + " 89,\n", + " 89,\n", + " 88,\n", + " 89,\n", + " 90,\n", + " 86,\n", + " 90,\n", + " 91,\n", + " 92,\n", + " 94,\n", + " 87,\n", + " 90,\n", + " 88,\n", + " 88,\n", + " 90,\n", + " 90,\n", + " 82,\n", + " 90,\n", + " 84,\n", + " 93,\n", + " 92,\n", + " 91,\n", + " 87,\n", + " 88,\n", + " 90,\n", + " 90,\n", + " 86,\n", + " 91,\n", + " 89,\n", + " 84,\n", + " 83,\n", + " 89,\n", + " 85,\n", + " 90,\n", + " 90,\n", + " 90,\n", + " 90,\n", + " 84,\n", + " 90,\n", + " 90,\n", + " 84,\n", + " 87,\n", + " 92,\n", + " 87,\n", + " 83,\n", + " 89,\n", + " 87,\n", + " 85,\n", + " 85,\n", + " 90,\n", + " 91,\n", + " 94,\n", + " 91,\n", + " 87,\n", + " 85,\n", + " 85,\n", + " 93,\n", + " 88,\n", + " 85,\n", + " 85,\n", + " 88,\n", + " 88,\n", + " 87,\n", + " 92,\n", + " 90,\n", + " 90,\n", + " 89,\n", + " 88,\n", + " 90,\n", + " 90,\n", + " 84,\n", + " 89,\n", + " 85,\n", + " 90,\n", + " 89,\n", + " 94,\n", + " 85,\n", + " 88,\n", + " 88,\n", + " 90,\n", + " 93,\n", + " 94,\n", + " 88,\n", + " 88,\n", + " 92,\n", + " 91,\n", + " 92,\n", + " 91,\n", + " 87,\n", + " 91,\n", + " 95,\n", + " 88,\n", + " 83,\n", + " 84,\n", + " 91,\n", + " 84,\n", + " 90,\n", + " 87,\n", + " 91,\n", + " 88,\n", + " 85,\n", + " 86,\n", + " 92,\n", + " 87,\n", + " 88,\n", + " 90,\n", + " 86,\n", + " 87,\n", + " 87,\n", + " 88,\n", + " 91,\n", + " 88,\n", + " 89,\n", + " 88,\n", + " 87,\n", + " 91,\n", + " 84,\n", + " 92,\n", + " 86,\n", + " 91,\n", + " 92,\n", + " 93,\n", + " 87,\n", + " 85,\n", + " 86,\n", + " 87,\n", + " 90,\n", + " 86,\n", + " 94,\n", + " 92,\n", + " 84,\n", + " 85,\n", + " 91,\n", + " 89,\n", + " 89,\n", + " 84,\n", + " 90,\n", + " 90,\n", + " 85,\n", + " 90,\n", + " 88,\n", + " 89,\n", + " 94,\n", + " 87,\n", + " 89,\n", + " 90,\n", + " 87,\n", + " 88,\n", + " 89,\n", + " 92,\n", + " 89,\n", + " 91,\n", + " 91,\n", + " 85,\n", + " 85,\n", + " 94,\n", + " 89,\n", + " 86,\n", + " 87,\n", + " 82,\n", + " 89,\n", + " 85,\n", + " 90,\n", + " 88,\n", + " 90,\n", + " 93,\n", + " 87,\n", + " 90,\n", + " 93,\n", + " 87,\n", + " 86,\n", + " 84,\n", + " 
91,\n", + " 88,\n", + " 89,\n", + " 87,\n", + " 86,\n", + " 86,\n", + " 85,\n", + " 85,\n", + " 87,\n", + " 88,\n", + " 90,\n", + " 92,\n", + " 88,\n", + " 92,\n", + " 92,\n", + " 84,\n", + " 93,\n", + " 90,\n", + " 85,\n", + " 87,\n", + " 85,\n", + " 84,\n", + " 92,\n", + " 87,\n", + " 87,\n", + " 89,\n", + " 84,\n", + " 88,\n", + " 84,\n", + " 87,\n", + " 87,\n", + " 87,\n", + " 90,\n", + " 87,\n", + " 88,\n", + " 85,\n", + " 86,\n", + " 90,\n", + " 92,\n", + " 87,\n", + " 90,\n", + " 88,\n", + " 90,\n", + " 89,\n", + " 92,\n", + " 85,\n", + " 88,\n", + " 87,\n", + " 88,\n", + " 88,\n", + " 89,\n", + " 94,\n", + " 89,\n", + " 92,\n", + " 85,\n", + " 87,\n", + " 94,\n", + " 92,\n", + " 85,\n", + " 90,\n", + " 89,\n", + " 90,\n", + " 90,\n", + " 87,\n", + " 92,\n", + " 89,\n", + " 90,\n", + " 88,\n", + " 88,\n", + " 87,\n", + " 93,\n", + " 90,\n", + " 87,\n", + " 86,\n", + " 90,\n", + " 93,\n", + " 90,\n", + " 94,\n", + " 84,\n", + " 90,\n", + " 92,\n", + " 88,\n", + " 84,\n", + " 90,\n", + " 85,\n", + " 85,\n", + " 92,\n", + " 91,\n", + " 90,\n", + " 91,\n", + " 91,\n", + " 90,\n", + " 93,\n", + " 86,\n", + " 88,\n", + " 94,\n", + " 90,\n", + " 84,\n", + " 86,\n", + " 88,\n", + " 88,\n", + " 92,\n", + " 93,\n", + " 84,\n", + " 86,\n", + " 89,\n", + " 87,\n", + " 92,\n", + " 90,\n", + " 95,\n", + " 92,\n", + " 92,\n", + " 90,\n", + " 87,\n", + " 93,\n", + " 90,\n", + " 84,\n", + " 86,\n", + " 85,\n", + " 85,\n", + " 87,\n", + " 87,\n", + " 89,\n", + " 93,\n", + " 90,\n", + " 92,\n", + " 92,\n", + " 90,\n", + " 85,\n", + " 87,\n", + " 88,\n", + " 85,\n", + " 89,\n", + " 96,\n", + " 91,\n", + " 88,\n", + " 85,\n", + " 87,\n", + " 86,\n", + " 90,\n", + " 89,\n", + " 99,\n", + " 93,\n", + " 93,\n", + " 87,\n", + " 86,\n", + " 94,\n", + " 91,\n", + " 90,\n", + " 85,\n", + " 90,\n", + " 92,\n", + " 88,\n", + " 90,\n", + " 85,\n", + " 88,\n", + " 91,\n", + " 85,\n", + " 91,\n", + " 90,\n", + " 91,\n", + " 90,\n", + " 89,\n", + " 85,\n", + " 83,\n", + " 91,\n", + " 90,\n", + " 90,\n", + " 93,\n", + " 86,\n", + " 84,\n", + " 87,\n", + " 93,\n", + " 90,\n", + " 92,\n", + " 84,\n", + " 90,\n", + " 88,\n", + " 90,\n", + " 94,\n", + " 94,\n", + " 90,\n", + " 88,\n", + " 87,\n", + " 87,\n", + " 85,\n", + " 86,\n", + " 86,\n", + " 92,\n", + " 85,\n", + " 89,\n", + " 86,\n", + " 87,\n", + " 88,\n", + " 85,\n", + " 89,\n", + " 91,\n", + " 90,\n", + " 84,\n", + " 92,\n", + " 88,\n", + " 92,\n", + " 85,\n", + " 91,\n", + " 84,\n", + " 90,\n", + " 93,\n", + " 92,\n", + " 85,\n", + " 85,\n", + " 88,\n", + " 85,\n", + " 90,\n", + " 91,\n", + " 83,\n", + " 95,\n", + " 87,\n", + " 85,\n", + " 94,\n", + " 91,\n", + " 94,\n", + " 86,\n", + " 85,\n", + " 94,\n", + " 90,\n", + " 89,\n", + " 84,\n", + " 88,\n", + " 89,\n", + " 89,\n", + " 88,\n", + " 90,\n", + " 87,\n", + " 88,\n", + " 87,\n", + " 95,\n", + " 92,\n", + " 87,\n", + " 90,\n", + " 90,\n", + " 92,\n", + " 84,\n", + " 84,\n", + " 83,\n", + " 91,\n", + " 87,\n", + " 92,\n", + " 90,\n", + " 89,\n", + " 88,\n", + " 90,\n", + " 90,\n", + " 86,\n", + " 86,\n", + " 88,\n", + " 85,\n", + " 85,\n", + " 87,\n", + " 85,\n", + " 87,\n", + " 98,\n", + " 90,\n", + " 87,\n", + " 88,\n", + " 82,\n", + " 90,\n", + " 85,\n", + " 90,\n", + " 84,\n", + " 88,\n", + " 88,\n", + " 87,\n", + " 88,\n", + " 88,\n", + " 90,\n", + " 90,\n", + " 89,\n", + " 88,\n", + " 87,\n", + " 84,\n", + " 87,\n", + " 85,\n", + " 92,\n", + " 88,\n", + " 95,\n", + " 88,\n", + " 85,\n", + " 89,\n", + " 87,\n", + " 91,\n", + " 90,\n", + " 88,\n", + " 89,\n", + " 87,\n", + " 90,\n", + " 93,\n", + " 
90,\n", + " 89,\n", + " 94,\n", + " 86,\n", + " 87,\n", + " 89,\n", + " 92,\n", + " 90,\n", + " 87,\n", + " 89,\n", + " 84,\n", + " 92,\n", + " 95,\n", + " 93,\n", + " 85,\n", + " 90,\n", + " 83,\n", + " ...]" + ] + }, + "execution_count": 72, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a" + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "metadata": {}, + "outputs": [], + "source": [ + "from keras.layers import SimpleRNN,LSTM" + ] + }, + { + "cell_type": "code", + "execution_count": 75, + "metadata": {}, + "outputs": [], + "source": [ + "model = Sequential()\n", + "model.add(Embedding(vocabulary_size, 100, input_length=input_length))\n", + "model.add(SimpleRNN(units=30, return_sequences=False))\n", + "model.add(Dense(units=21))" + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "metadata": {}, + "outputs": [], + "source": [ + "model.compile(optimizer='adam',\n", + " loss='categorical_crossentropy',\n", + " metrics=['accuracy'])" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model: \"sequential_1\"\n", + "_________________________________________________________________\n", + " Layer (type) Output Shape Param # \n", + "=================================================================\n", + " embedding_1 (Embedding) (None, 89, 100) 6397100 \n", + " \n", + " simple_rnn (SimpleRNN) (None, 30) 3930 \n", + " \n", + " dense_3 (Dense) (None, 21) 651 \n", + " \n", + "=================================================================\n", + "Total params: 6401681 (24.42 MB)\n", + "Trainable params: 6401681 (24.42 MB)\n", + "Non-trainable params: 0 (0.00 Byte)\n", + "_________________________________________________________________\n" + ] + } + ], + "source": [ + "model.summary()" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1/10\n", + "2844/2844 [==============================] - 352s 124ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n", + "Epoch 2/10\n", + "2844/2844 [==============================] - 413s 145ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n", + "Epoch 3/10\n", + "2844/2844 [==============================] - 342s 120ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n", + "Epoch 4/10\n", + "2844/2844 [==============================] - 344s 121ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n", + "Epoch 5/10\n", + "2844/2844 [==============================] - 342s 120ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n", + "Epoch 6/10\n", + "2844/2844 [==============================] - 353s 124ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n", + "Epoch 7/10\n", + "2844/2844 [==============================] - 378s 133ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n", + "Epoch 8/10\n", + "2844/2844 [==============================] - 397s 140ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n", + "Epoch 9/10\n", + "2844/2844 [==============================] - 347s 122ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n", + "Epoch 10/10\n", + "2844/2844 [==============================] - 
357s 126ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 81, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model.fit(x_train1,y_train1,epochs=10,validation_data=(x_test, y_test))" + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1219/1219 [==============================] - 1s 889us/step\n" + ] + } + ], + "source": [ + "Y_pred1 = classifier.predict(x_test)\n", + "a1=[]\n", + "for x in Y_pred1:\n", + " a1.append(80 +np.argmax(x))" + ] + }, + { + "cell_type": "code", + "execution_count": 83, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[85,\n", + " 85,\n", + " 90,\n", + " 88,\n", + " 87,\n", + " 86,\n", + " 84,\n", + " 91,\n", + " 91,\n", + " 89,\n", + " 92,\n", + " 92,\n", + " 90,\n", + " 87,\n", + " 86,\n", + " 89,\n", + " 87,\n", + " 99,\n", + " 95,\n", + " 86,\n", + " 90,\n", + " 85,\n", + " 94,\n", + " 93,\n", + " 88,\n", + " 85,\n", + " 88,\n", + " 89,\n", + " 87,\n", + " 90,\n", + " 86,\n", + " 89,\n", + " 85,\n", + " 84,\n", + " 90,\n", + " 85,\n", + " 90,\n", + " 89,\n", + " 85,\n", + " 89,\n", + " 87,\n", + " 86,\n", + " 86,\n", + " 89,\n", + " 84,\n", + " 92,\n", + " 90,\n", + " 92,\n", + " 92,\n", + " 87,\n", + " 85,\n", + " 89,\n", + " 84,\n", + " 90,\n", + " 90,\n", + " 88,\n", + " 86,\n", + " 91,\n", + " 88,\n", + " 93,\n", + " 89,\n", + " 88,\n", + " 85,\n", + " 88,\n", + " 87,\n", + " 90,\n", + " 83,\n", + " 88,\n", + " 91,\n", + " 84,\n", + " 94,\n", + " 87,\n", + " 91,\n", + " 89,\n", + " 88,\n", + " 92,\n", + " 86,\n", + " 92,\n", + " 84,\n", + " 93,\n", + " 89,\n", + " 87,\n", + " 84,\n", + " 89,\n", + " 93,\n", + " 89,\n", + " 88,\n", + " 88,\n", + " 93,\n", + " 94,\n", + " 91,\n", + " 92,\n", + " 93,\n", + " 87,\n", + " 87,\n", + " 82,\n", + " 92,\n", + " 84,\n", + " 92,\n", + " 90,\n", + " 90,\n", + " 93,\n", + " 91,\n", + " 94,\n", + " 93,\n", + " 93,\n", + " 90,\n", + " 90,\n", + " 88,\n", + " 87,\n", + " 83,\n", + " 85,\n", + " 85,\n", + " 92,\n", + " 90,\n", + " 85,\n", + " 93,\n", + " 87,\n", + " 86,\n", + " 89,\n", + " 88,\n", + " 85,\n", + " 92,\n", + " 90,\n", + " 85,\n", + " 94,\n", + " 90,\n", + " 86,\n", + " 91,\n", + " 89,\n", + " 88,\n", + " 88,\n", + " 84,\n", + " 82,\n", + " 85,\n", + " 91,\n", + " 87,\n", + " 88,\n", + " 89,\n", + " 92,\n", + " 89,\n", + " 85,\n", + " 91,\n", + " 88,\n", + " 87,\n", + " 89,\n", + " 88,\n", + " 88,\n", + " 87,\n", + " 95,\n", + " 86,\n", + " 96,\n", + " 88,\n", + " 86,\n", + " 90,\n", + " 90,\n", + " 84,\n", + " 85,\n", + " 89,\n", + " 88,\n", + " 81,\n", + " 88,\n", + " 89,\n", + " 88,\n", + " 88,\n", + " 96,\n", + " 88,\n", + " 91,\n", + " 86,\n", + " 87,\n", + " 92,\n", + " 91,\n", + " 82,\n", + " 90,\n", + " 94,\n", + " 92,\n", + " 90,\n", + " 84,\n", + " 87,\n", + " 89,\n", + " 90,\n", + " 88,\n", + " 87,\n", + " 86,\n", + " 90,\n", + " 92,\n", + " 88,\n", + " 93,\n", + " 89,\n", + " 84,\n", + " 87,\n", + " 87,\n", + " 91,\n", + " 88,\n", + " 84,\n", + " 89,\n", + " 93,\n", + " 88,\n", + " 92,\n", + " 88,\n", + " 86,\n", + " 92,\n", + " 95,\n", + " 92,\n", + " 86,\n", + " 92,\n", + " 87,\n", + " 85,\n", + " 86,\n", + " 84,\n", + " 92,\n", + " 88,\n", + " 90,\n", + " 89,\n", + " 86,\n", + " 93,\n", + " 86,\n", + " 89,\n", + " 90,\n", + " 88,\n", + " 90,\n", + " 90,\n", + " 90,\n", + " 87,\n", + " 87,\n", + " 87,\n", + " 89,\n", + " 
87,\n", + " 87,\n", + " 90,\n", + " 84,\n", + " 92,\n", + " 92,\n", + " 85,\n", + " 85,\n", + " 89,\n", + " 90,\n", + " 87,\n", + " 84,\n", + " 93,\n", + " 90,\n", + " 89,\n", + " 86,\n", + " 87,\n", + " 90,\n", + " 86,\n", + " 88,\n", + " 90,\n", + " 89,\n", + " 94,\n", + " 86,\n", + " 84,\n", + " 90,\n", + " 85,\n", + " 90,\n", + " 88,\n", + " 92,\n", + " 90,\n", + " 87,\n", + " 94,\n", + " 88,\n", + " 89,\n", + " 90,\n", + " 91,\n", + " 88,\n", + " 85,\n", + " 88,\n", + " 89,\n", + " 85,\n", + " 95,\n", + " 90,\n", + " 91,\n", + " 88,\n", + " 92,\n", + " 90,\n", + " 92,\n", + " 91,\n", + " 88,\n", + " 90,\n", + " 86,\n", + " 88,\n", + " 90,\n", + " 87,\n", + " 86,\n", + " 84,\n", + " 85,\n", + " 85,\n", + " 85,\n", + " 87,\n", + " 86,\n", + " 87,\n", + " 85,\n", + " 94,\n", + " 90,\n", + " 84,\n", + " 93,\n", + " 88,\n", + " 85,\n", + " 88,\n", + " 84,\n", + " 85,\n", + " 88,\n", + " 89,\n", + " 88,\n", + " 89,\n", + " 94,\n", + " 90,\n", + " 94,\n", + " 86,\n", + " 90,\n", + " 87,\n", + " 86,\n", + " 91,\n", + " 92,\n", + " 85,\n", + " 90,\n", + " 85,\n", + " 93,\n", + " 91,\n", + " 94,\n", + " 89,\n", + " 85,\n", + " 88,\n", + " 95,\n", + " 88,\n", + " 92,\n", + " 90,\n", + " 90,\n", + " 90,\n", + " 93,\n", + " 90,\n", + " 93,\n", + " 90,\n", + " 91,\n", + " 86,\n", + " 84,\n", + " 86,\n", + " 91,\n", + " 84,\n", + " 86,\n", + " 88,\n", + " 89,\n", + " 88,\n", + " 87,\n", + " 85,\n", + " 94,\n", + " 90,\n", + " 88,\n", + " 85,\n", + " 90,\n", + " 86,\n", + " 88,\n", + " 84,\n", + " 89,\n", + " 87,\n", + " 91,\n", + " 83,\n", + " 90,\n", + " 89,\n", + " 88,\n", + " 85,\n", + " 87,\n", + " 84,\n", + " 89,\n", + " 86,\n", + " 88,\n", + " 91,\n", + " 85,\n", + " 88,\n", + " 90,\n", + " 92,\n", + " 85,\n", + " 89,\n", + " 85,\n", + " 95,\n", + " 90,\n", + " 86,\n", + " 95,\n", + " 90,\n", + " 88,\n", + " 90,\n", + " 95,\n", + " 87,\n", + " 88,\n", + " 84,\n", + " 85,\n", + " 87,\n", + " 84,\n", + " 85,\n", + " 91,\n", + " 90,\n", + " 85,\n", + " 87,\n", + " 88,\n", + " 89,\n", + " 87,\n", + " 94,\n", + " 90,\n", + " 86,\n", + " 86,\n", + " 91,\n", + " 90,\n", + " 89,\n", + " 90,\n", + " 88,\n", + " 88,\n", + " 85,\n", + " 90,\n", + " 93,\n", + " 91,\n", + " 84,\n", + " 85,\n", + " 92,\n", + " 95,\n", + " 89,\n", + " 86,\n", + " 85,\n", + " 90,\n", + " 87,\n", + " 93,\n", + " 86,\n", + " 84,\n", + " 88,\n", + " 85,\n", + " 92,\n", + " 89,\n", + " 95,\n", + " 88,\n", + " 89,\n", + " 91,\n", + " 89,\n", + " 89,\n", + " 86,\n", + " 85,\n", + " 86,\n", + " 87,\n", + " 91,\n", + " 88,\n", + " 91,\n", + " 96,\n", + " 85,\n", + " 93,\n", + " 87,\n", + " 90,\n", + " 91,\n", + " 85,\n", + " 89,\n", + " 90,\n", + " 90,\n", + " 89,\n", + " 86,\n", + " 86,\n", + " 91,\n", + " 84,\n", + " 82,\n", + " 90,\n", + " 88,\n", + " 88,\n", + " 95,\n", + " 92,\n", + " 88,\n", + " 90,\n", + " 94,\n", + " 88,\n", + " 93,\n", + " 84,\n", + " 90,\n", + " 90,\n", + " 91,\n", + " 88,\n", + " 91,\n", + " 93,\n", + " 90,\n", + " 88,\n", + " 85,\n", + " 84,\n", + " 92,\n", + " 87,\n", + " 87,\n", + " 85,\n", + " 90,\n", + " 86,\n", + " 91,\n", + " 88,\n", + " 83,\n", + " 85,\n", + " 92,\n", + " 88,\n", + " 85,\n", + " 88,\n", + " 92,\n", + " 87,\n", + " 95,\n", + " 90,\n", + " 90,\n", + " 84,\n", + " 89,\n", + " 86,\n", + " 85,\n", + " 97,\n", + " 92,\n", + " 89,\n", + " 92,\n", + " 89,\n", + " 89,\n", + " 90,\n", + " 89,\n", + " 91,\n", + " 92,\n", + " 90,\n", + " 86,\n", + " 88,\n", + " 88,\n", + " 88,\n", + " 87,\n", + " 89,\n", + " 89,\n", + " 88,\n", + " 89,\n", + " 90,\n", + " 86,\n", + " 90,\n", + " 91,\n", + " 
92,\n", + " 94,\n", + " 87,\n", + " 90,\n", + " 88,\n", + " 88,\n", + " 90,\n", + " 90,\n", + " 82,\n", + " 90,\n", + " 84,\n", + " 93,\n", + " 92,\n", + " 91,\n", + " 87,\n", + " 88,\n", + " 90,\n", + " 90,\n", + " 86,\n", + " 91,\n", + " 89,\n", + " 84,\n", + " 83,\n", + " 89,\n", + " 85,\n", + " 90,\n", + " 90,\n", + " 90,\n", + " 90,\n", + " 84,\n", + " 90,\n", + " 90,\n", + " 84,\n", + " 87,\n", + " 92,\n", + " 87,\n", + " 83,\n", + " 89,\n", + " 87,\n", + " 85,\n", + " 85,\n", + " 90,\n", + " 91,\n", + " 94,\n", + " 91,\n", + " 87,\n", + " 85,\n", + " 85,\n", + " 93,\n", + " 88,\n", + " 85,\n", + " 85,\n", + " 88,\n", + " 88,\n", + " 87,\n", + " 92,\n", + " 90,\n", + " 90,\n", + " 89,\n", + " 88,\n", + " 90,\n", + " 90,\n", + " 84,\n", + " 89,\n", + " 85,\n", + " 90,\n", + " 89,\n", + " 94,\n", + " 85,\n", + " 88,\n", + " 88,\n", + " 90,\n", + " 93,\n", + " 94,\n", + " 88,\n", + " 88,\n", + " 92,\n", + " 91,\n", + " 92,\n", + " 91,\n", + " 87,\n", + " 91,\n", + " 95,\n", + " 88,\n", + " 83,\n", + " 84,\n", + " 91,\n", + " 84,\n", + " 90,\n", + " 87,\n", + " 91,\n", + " 88,\n", + " 85,\n", + " 86,\n", + " 92,\n", + " 87,\n", + " 88,\n", + " 90,\n", + " 86,\n", + " 87,\n", + " 87,\n", + " 88,\n", + " 91,\n", + " 88,\n", + " 89,\n", + " 88,\n", + " 87,\n", + " 91,\n", + " 84,\n", + " 92,\n", + " 86,\n", + " 91,\n", + " 92,\n", + " 93,\n", + " 87,\n", + " 85,\n", + " 86,\n", + " 87,\n", + " 90,\n", + " 86,\n", + " 94,\n", + " 92,\n", + " 84,\n", + " 85,\n", + " 91,\n", + " 89,\n", + " 89,\n", + " 84,\n", + " 90,\n", + " 90,\n", + " 85,\n", + " 90,\n", + " 88,\n", + " 89,\n", + " 94,\n", + " 87,\n", + " 89,\n", + " 90,\n", + " 87,\n", + " 88,\n", + " 89,\n", + " 92,\n", + " 89,\n", + " 91,\n", + " 91,\n", + " 85,\n", + " 85,\n", + " 94,\n", + " 89,\n", + " 86,\n", + " 87,\n", + " 82,\n", + " 89,\n", + " 85,\n", + " 90,\n", + " 88,\n", + " 90,\n", + " 93,\n", + " 87,\n", + " 90,\n", + " 93,\n", + " 87,\n", + " 86,\n", + " 84,\n", + " 91,\n", + " 88,\n", + " 89,\n", + " 87,\n", + " 86,\n", + " 86,\n", + " 85,\n", + " 85,\n", + " 87,\n", + " 88,\n", + " 90,\n", + " 92,\n", + " 88,\n", + " 92,\n", + " 92,\n", + " 84,\n", + " 93,\n", + " 90,\n", + " 85,\n", + " 87,\n", + " 85,\n", + " 84,\n", + " 92,\n", + " 87,\n", + " 87,\n", + " 89,\n", + " 84,\n", + " 88,\n", + " 84,\n", + " 87,\n", + " 87,\n", + " 87,\n", + " 90,\n", + " 87,\n", + " 88,\n", + " 85,\n", + " 86,\n", + " 90,\n", + " 92,\n", + " 87,\n", + " 90,\n", + " 88,\n", + " 90,\n", + " 89,\n", + " 92,\n", + " 85,\n", + " 88,\n", + " 87,\n", + " 88,\n", + " 88,\n", + " 89,\n", + " 94,\n", + " 89,\n", + " 92,\n", + " 85,\n", + " 87,\n", + " 94,\n", + " 92,\n", + " 85,\n", + " 90,\n", + " 89,\n", + " 90,\n", + " 90,\n", + " 87,\n", + " 92,\n", + " 89,\n", + " 90,\n", + " 88,\n", + " 88,\n", + " 87,\n", + " 93,\n", + " 90,\n", + " 87,\n", + " 86,\n", + " 90,\n", + " 93,\n", + " 90,\n", + " 94,\n", + " 84,\n", + " 90,\n", + " 92,\n", + " 88,\n", + " 84,\n", + " 90,\n", + " 85,\n", + " 85,\n", + " 92,\n", + " 91,\n", + " 90,\n", + " 91,\n", + " 91,\n", + " 90,\n", + " 93,\n", + " 86,\n", + " 88,\n", + " 94,\n", + " 90,\n", + " 84,\n", + " 86,\n", + " 88,\n", + " 88,\n", + " 92,\n", + " 93,\n", + " 84,\n", + " 86,\n", + " 89,\n", + " 87,\n", + " 92,\n", + " 90,\n", + " 95,\n", + " 92,\n", + " 92,\n", + " 90,\n", + " 87,\n", + " 93,\n", + " 90,\n", + " 84,\n", + " 86,\n", + " 85,\n", + " 85,\n", + " 87,\n", + " 87,\n", + " 89,\n", + " 93,\n", + " 90,\n", + " 92,\n", + " 92,\n", + " 90,\n", + " 85,\n", + " 87,\n", + " 88,\n", + " 85,\n", + " 
89,\n", + " 96,\n", + " 91,\n", + " 88,\n", + " 85,\n", + " 87,\n", + " 86,\n", + " 90,\n", + " 89,\n", + " 99,\n", + " 93,\n", + " 93,\n", + " 87,\n", + " 86,\n", + " 94,\n", + " 91,\n", + " 90,\n", + " 85,\n", + " 90,\n", + " 92,\n", + " 88,\n", + " 90,\n", + " 85,\n", + " 88,\n", + " 91,\n", + " 85,\n", + " 91,\n", + " 90,\n", + " 91,\n", + " 90,\n", + " 89,\n", + " 85,\n", + " 83,\n", + " 91,\n", + " 90,\n", + " 90,\n", + " 93,\n", + " 86,\n", + " 84,\n", + " 87,\n", + " 93,\n", + " 90,\n", + " 92,\n", + " 84,\n", + " 90,\n", + " 88,\n", + " 90,\n", + " 94,\n", + " 94,\n", + " 90,\n", + " 88,\n", + " 87,\n", + " 87,\n", + " 85,\n", + " 86,\n", + " 86,\n", + " 92,\n", + " 85,\n", + " 89,\n", + " 86,\n", + " 87,\n", + " 88,\n", + " 85,\n", + " 89,\n", + " 91,\n", + " 90,\n", + " 84,\n", + " 92,\n", + " 88,\n", + " 92,\n", + " 85,\n", + " 91,\n", + " 84,\n", + " 90,\n", + " 93,\n", + " 92,\n", + " 85,\n", + " 85,\n", + " 88,\n", + " 85,\n", + " 90,\n", + " 91,\n", + " 83,\n", + " 95,\n", + " 87,\n", + " 85,\n", + " 94,\n", + " 91,\n", + " 94,\n", + " 86,\n", + " 85,\n", + " 94,\n", + " 90,\n", + " 89,\n", + " 84,\n", + " 88,\n", + " 89,\n", + " 89,\n", + " 88,\n", + " 90,\n", + " 87,\n", + " 88,\n", + " 87,\n", + " 95,\n", + " 92,\n", + " 87,\n", + " 90,\n", + " 90,\n", + " 92,\n", + " 84,\n", + " 84,\n", + " 83,\n", + " 91,\n", + " 87,\n", + " 92,\n", + " 90,\n", + " 89,\n", + " 88,\n", + " 90,\n", + " 90,\n", + " 86,\n", + " 86,\n", + " 88,\n", + " 85,\n", + " 85,\n", + " 87,\n", + " 85,\n", + " 87,\n", + " 98,\n", + " 90,\n", + " 87,\n", + " 88,\n", + " 82,\n", + " 90,\n", + " 85,\n", + " 90,\n", + " 84,\n", + " 88,\n", + " 88,\n", + " 87,\n", + " 88,\n", + " 88,\n", + " 90,\n", + " 90,\n", + " 89,\n", + " 88,\n", + " 87,\n", + " 84,\n", + " 87,\n", + " 85,\n", + " 92,\n", + " 88,\n", + " 95,\n", + " 88,\n", + " 85,\n", + " 89,\n", + " 87,\n", + " 91,\n", + " 90,\n", + " 88,\n", + " 89,\n", + " 87,\n", + " 90,\n", + " 93,\n", + " 90,\n", + " 89,\n", + " 94,\n", + " 86,\n", + " 87,\n", + " 89,\n", + " 92,\n", + " 90,\n", + " 87,\n", + " 89,\n", + " 84,\n", + " 92,\n", + " 95,\n", + " 93,\n", + " 85,\n", + " 90,\n", + " 83,\n", + " ...]" + ] + }, + "execution_count": 83, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#The first model performed better." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 84, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: tensorflow-hub in c:\\users\\ysach\\anaconda3\\lib\\site-packages (0.15.0)\n", + "Requirement already satisfied: numpy>=1.12.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-hub) (1.23.5)\n", + "Requirement already satisfied: protobuf>=3.19.6 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-hub) (3.20.3)\n" + ] + } + ], + "source": [ + "!pip install tensorflow-hub" + ] + }, + { + "cell_type": "code", + "execution_count": 100, + "metadata": {}, + "outputs": [], + "source": [ + "def wine_quality_det(review_text):\n", + "    # Predicts the quality points for a single review string.\n", + "    # Relies on objects defined earlier in this notebook: stpwrds (stopword list),\n", + "    # lemmatizer, tokenize (the fitted Keras Tokenizer) and classifier (the trained model).\n", + "    review = re.sub(r'[^a-zA-Z\\s]', '', review_text).lower()\n", + "    tokens = nltk.word_tokenize(review)\n", + "    corpus = []  # local list so tokens do not accumulate across repeated calls\n", + "    for y in tokens:\n", + "        if y not in stpwrds:\n", + "            corpus.append(lemmatizer.lemmatize(y))\n", + "    input_data = [' '.join(corpus)]\n", + "    vectorized_input_data = tokenize.texts_to_sequences(input_data)\n", + "    # pad to the sequence length used during training (89, per the model summary)\n", + "    vectorized_input_data = pad_sequences(vectorized_input_data, maxlen=89, padding=\"pre\", truncating=\"pre\")\n", + "    prediction = classifier.predict(vectorized_input_data)\n", + "    print(80 + np.argmax(prediction))  # class index 0 corresponds to 80 points" + ] + }, + { + "cell_type": "code", + "execution_count": 104, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1/1 [==============================] - 0s 23ms/step\n", + "89\n" + ] + } + ], + "source": [ + "wine_quality_det(\"u touch riesling accentuates fresh citrusy backbone cabernet sauvignon ro dry style sprightly lightfooted tone offer load concentrated cherry berry flavor finish brisk clean dry new york osprey dominion dry ro north fork long island ro osprey dominion\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Wine Reviews Classification/README.md b/Wine Reviews Classification/README.md new file mode 100644 index 000000000..27be8d30b --- /dev/null +++ b/Wine Reviews Classification/README.md @@ -0,0 +1,101 @@ +# Wine Reviews Classification using DL + +## PROJECT TITLE + +Wine Reviews Classification using Deep Learning + +## GOAL + +To classify the quality of wine based on reviews. + +## DATASET + +The link for the dataset used in this project: https://www.kaggle.com/datasets/zynicide/wine-reviews + +## EDA +Shape of dataset: (129971, 14) +![Dataset](Images/Input_Dataset.png) +![EDA](Images/EDA3.png) + +## DESCRIPTION + +This project aims to predict the quality points of a wine based on its review. + +## WHAT I HAD DONE + +1. Data collection: From the link of the dataset given above. +2. Data preprocessing: Preprocessed the reviews by combining the title and review text into a single feature, then tokenized and vectorized the text before passing it to the models. +3. Model selection: The first model is a self-designed network with an Embedding layer followed by a Global Average Pooling layer, two Dense layers and an output layer. The second model has an Embedding layer followed by a SimpleRNN layer and a Dense output layer. +4.
Comparative analysis: Compared the accuracy scores of both models. + +## MODELS SUMMARY + +Model: "sequential" +_________________________________________________________________ + Layer (type) Output Shape Param # +================================================================= + embedding (Embedding) (None, 89, 200) 12794200 + + global_average_pooling1d ( (None, 200) 0 + GlobalAveragePooling1D) + + dense (Dense) (None, 100) 20100 + + dense_1 (Dense) (None, 50) 5050 + + dense_2 (Dense) (None, 21) 1071 + +================================================================= +Total params: 12820421 (48.91 MB) +Trainable params: 12820421 (48.91 MB) +Non-trainable params: 0 (0.00 Byte) + +Model-2: "sequential_1" +_________________________________________________________________ + Layer (type) Output Shape Param # +================================================================= + embedding_1 (Embedding) (None, 89, 100) 6397100 + + simple_rnn (SimpleRNN) (None, 30) 3930 + + dense_3 (Dense) (None, 21) 651 + +================================================================= +Total params: 6401681 (24.42 MB) +Trainable params: 6401681 (24.42 MB) +Non-trainable params: 0 (0.00 Byte) + +## LIBRARIES NEEDED + +The following libraries are required to run this project: + +- nltk +- pandas +- matplotlib +- tensorflow +- keras +- sklearn + +## EVALUATION METRICS + +The evaluation metrics I used to assess the models: + +- Loss +- Accuracy + +Confusion matrices for both models are provided in the Images folder. + +## RESULTS +Results on the validation dataset: +For Model-1: +Accuracy: 31% +Loss: 3.1 + +For Model-2: +Accuracy: 9% +Loss: 8.05 + +## CONCLUSION +Based on the results, we can draw the following conclusions: + +1. Model-1 (embedding + global average pooling + dense layers) clearly outperformed Model-2 (embedding + SimpleRNN) on the validation set, with 31% accuracy against 9%. \ No newline at end of file
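For reference, the two architectures in the MODELS SUMMARY section can be reproduced with a short Keras sketch. This is a minimal illustration rather than the original training code: the vocabulary size (63,971), sequence length (89) and number of classes (21, i.e. points 80-100) are inferred from the parameter counts and output shapes above, while the activations, optimizer and loss are assumptions.

```python
# Minimal sketch of the two models summarized above; assumed hyperparameters noted inline.
from tensorflow.keras import Sequential, layers

VOCAB_SIZE = 63971   # inferred: 12,794,200 embedding params / 200 dims
MAX_LEN = 89         # sequence length shown in the model summaries
NUM_CLASSES = 21     # quality points 80-100

# Model-1: embedding -> global average pooling -> dense stack
model_1 = Sequential([
    layers.Embedding(VOCAB_SIZE, 200, input_length=MAX_LEN),
    layers.GlobalAveragePooling1D(),
    layers.Dense(100, activation="relu"),
    layers.Dense(50, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Model-2: embedding -> SimpleRNN -> dense output
model_2 = Sequential([
    layers.Embedding(VOCAB_SIZE, 100, input_length=MAX_LEN),
    layers.SimpleRNN(30),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Loss and optimizer are assumptions: integer point labels would use
# sparse_categorical_crossentropy, one-hot labels categorical_crossentropy.
for m in (model_1, model_2):
    m.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    m.summary()
```

Averaging the embeddings effectively gives Model-1 a learned bag-of-words representation, which may explain why it trains more stably on these ~89-token reviews than the SimpleRNN in Model-2.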