diff --git a/Wine Reviews Classification/Dataset/README.md b/Wine Reviews Classification/Dataset/README.md
new file mode 100644
index 000000000..27be8d30b
--- /dev/null
+++ b/Wine Reviews Classification/Dataset/README.md
@@ -0,0 +1,101 @@
+# Wine Reviews Classification using DL
+
+## PROJECT TITLE
+
+Wine Reviews Classification using Deep Learning
+
+## GOAL
+
+To classify the quality of wine based on reviews.
+
+## DATASET
+
+The link for the dataset used in this project: https://www.kaggle.com/datasets/zynicide/wine-reviews
+
+## EDA
+Shape of dataset: (129971, 14)
+
+
+
+## DESCRIPTION
+
+This project aims to predict the quality points of a wine from its review.
+
+## WHAT I HAVE DONE
+
+1. Data collection: From the link of the dataset given above.
+2. Data preprocessing: Preprocessed the reviews by combining the title and text into a new feature, then tokenized and vectorized the result before passing it to model training.
+3. Model selection: The first model is a self-designed network with an Embedding layer, a Global Average Pooling layer, two Dense layers, and an output layer. The second model has an Embedding layer followed by a SimpleRNN layer and a Dense output layer.
+4. Comparative analysis: Compared the accuracy scores of both models.
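+
+The tokenizing and vectorising step in item 2 can be sketched as follows. This is a minimal, library-free illustration rather than the notebook's actual code: the helper names `build_vocab` and `vectorize`, the whitespace tokenizer, and the right-padding are assumptions made for the example, and the two sample strings are hypothetical. Only the sequence length of 89 comes from the model summaries in this README.
+
+```python
+from collections import Counter
+
+MAX_LEN = 89  # sequence length taken from the model summaries: (None, 89, ...)
+
+
+def build_vocab(texts):
+    """Map each word to an integer id; id 0 is reserved for padding."""
+    counts = Counter(word for text in texts for word in text.lower().split())
+    # Most frequent words get the smallest ids; ties are broken alphabetically.
+    ordered = sorted(counts, key=lambda w: (-counts[w], w))
+    return {word: idx for idx, word in enumerate(ordered, start=1)}
+
+
+def vectorize(text, vocab, max_len=MAX_LEN):
+    """Turn one text into a fixed-length list of ids (truncate, then right-pad with 0)."""
+    ids = [vocab[w] for w in text.lower().split() if w in vocab]
+    ids = ids[:max_len]
+    return ids + [0] * (max_len - len(ids))
+
+
+# Hypothetical sample rows standing in for the combined title + review text.
+samples = [
+    "Rainstorm 2013 Pinot Gris Tart and snappy",
+    "Trimbach 2012 Gewurztraminer This dry and restrained wine",
+]
+vocab = build_vocab(samples)
+encoded = [vectorize(s, vocab) for s in samples]
+```
+
+Each row of `encoded` is a list of 89 integers that an Embedding layer could consume. A real pipeline would more likely use a library tokenizer, which may pad on the left rather than the right.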
+
+## MODELS SUMMARY
+
+Model-1: "sequential"
+
+| Layer (type) | Output Shape | Param # |
+|---|---|---|
+| embedding (Embedding) | (None, 89, 200) | 12794200 |
+| global_average_pooling1d (GlobalAveragePooling1D) | (None, 200) | 0 |
+| dense (Dense) | (None, 100) | 20100 |
+| dense_1 (Dense) | (None, 50) | 5050 |
+| dense_2 (Dense) | (None, 21) | 1071 |
+
+- Total params: 12820421 (48.91 MB)
+- Trainable params: 12820421 (48.91 MB)
+- Non-trainable params: 0 (0.00 Byte)
+
+Model-2: "sequential_1"
+
+| Layer (type) | Output Shape | Param # |
+|---|---|---|
+| embedding_1 (Embedding) | (None, 89, 100) | 6397100 |
+| simple_rnn (SimpleRNN) | (None, 30) | 3930 |
+| dense_3 (Dense) | (None, 21) | 651 |
+
+- Total params: 6401681 (24.42 MB)
+- Trainable params: 6401681 (24.42 MB)
+- Non-trainable params: 0 (0.00 Byte)
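+
+The parameter counts in the summaries can be checked by hand. The sketch below uses the standard per-layer formulas; the function names are made up for this example, and the vocabulary size of 63971 is not stated anywhere in the README but inferred from the embedding rows (12794200 / 200 and 6397100 / 100 both give 63971).
+
+```python
+def embedding_params(vocab_size, dim):
+    """One dim-sized vector per vocabulary entry."""
+    return vocab_size * dim
+
+
+def dense_params(inputs, units):
+    """Weight matrix plus one bias per unit."""
+    return inputs * units + units
+
+
+def simple_rnn_params(inputs, units):
+    """Input weights, recurrent weights, and one bias per unit."""
+    return units * (inputs + units + 1)
+
+
+VOCAB_SIZE = 63971  # inferred from the embedding rows, not stated in the README
+
+model_1 = (
+    embedding_params(VOCAB_SIZE, 200)
+    + dense_params(200, 100)
+    + dense_params(100, 50)
+    + dense_params(50, 21)
+)
+model_2 = (
+    embedding_params(VOCAB_SIZE, 100)
+    + simple_rnn_params(100, 30)
+    + dense_params(30, 21)
+)
+print(model_1, model_2)  # 12820421 6401681
+```
+
+Both totals match the summaries, and in both models nearly all parameters sit in the embedding layer.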
+
+## LIBRARIES NEEDED
+
+The following libraries are required to run this project:
+
+- nltk
+- pandas
+- matplotlib
+- tensorflow
+- keras
+- scikit-learn (imported as `sklearn`)
+- numpy
+- seaborn
+
+## EVALUATION METRICS
+
+The evaluation metrics I used to assess the models:
+
+- Loss
+- Accuracy
+
+The accuracy plots for each model are in the Images folder (`Accuracy_Model1.png` and `Accuracy_Model2.png`).
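+
+As a rough illustration of the two metrics, the sketch below computes accuracy and a mean cross-entropy loss in plain Python. It assumes the reported loss is a categorical cross-entropy over the 21 score classes, which this README does not state; the function names and the sample labels are hypothetical.
+
+```python
+import math
+
+NUM_CLASSES = 21  # size of the output layer in both model summaries
+
+
+def accuracy(y_true, y_pred):
+    """Fraction of predictions that match the true class index."""
+    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
+    return correct / len(y_true)
+
+
+def cross_entropy(y_true, probs):
+    """Mean negative log-probability assigned to the true class."""
+    return -sum(math.log(p[t]) for t, p in zip(y_true, probs)) / len(y_true)
+
+
+# Hypothetical class indices for three reviews.
+y_true = [7, 8, 10]
+
+# A model that spreads probability evenly over all classes.
+uniform = [1.0 / NUM_CLASSES] * NUM_CLASSES
+uniform_loss = cross_entropy(y_true, [uniform] * len(y_true))
+```
+
+Under this definition, `uniform_loss` equals ln 21, roughly 3.04, which gives a reference point for reading loss values on a 21-class problem.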
+
+## RESULTS
+Results on the validation dataset:
+
+| Model | Accuracy | Loss |
+|---|---|---|
+| Model-1 | 31% | 3.1 |
+| Model-2 | 9% | 8.05 |
+
+## CONCLUSION
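+Based on the results, we can draw the following conclusion:
+
+1. Model-1 (embedding with global average pooling) performed better than Model-2 (embedding with SimpleRNN), with higher validation accuracy (31% vs. 9%) and lower loss (3.1 vs. 8.05).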
\ No newline at end of file
diff --git a/Wine Reviews Classification/Images/Accuracy_Model1.png b/Wine Reviews Classification/Images/Accuracy_Model1.png
new file mode 100644
index 000000000..2fa79a13a
Binary files /dev/null and b/Wine Reviews Classification/Images/Accuracy_Model1.png differ
diff --git a/Wine Reviews Classification/Images/Accuracy_Model2.png b/Wine Reviews Classification/Images/Accuracy_Model2.png
new file mode 100644
index 000000000..50496e6e1
Binary files /dev/null and b/Wine Reviews Classification/Images/Accuracy_Model2.png differ
diff --git a/Wine Reviews Classification/Images/EDA1.png b/Wine Reviews Classification/Images/EDA1.png
new file mode 100644
index 000000000..753254cdc
Binary files /dev/null and b/Wine Reviews Classification/Images/EDA1.png differ
diff --git a/Wine Reviews Classification/Images/EDA2.png b/Wine Reviews Classification/Images/EDA2.png
new file mode 100644
index 000000000..e4888cd01
Binary files /dev/null and b/Wine Reviews Classification/Images/EDA2.png differ
diff --git a/Wine Reviews Classification/Images/EDA3.png b/Wine Reviews Classification/Images/EDA3.png
new file mode 100644
index 000000000..422ee5db7
Binary files /dev/null and b/Wine Reviews Classification/Images/EDA3.png differ
diff --git a/Wine Reviews Classification/Images/Input_Dataset.png b/Wine Reviews Classification/Images/Input_Dataset.png
new file mode 100644
index 000000000..ddf45a08b
Binary files /dev/null and b/Wine Reviews Classification/Images/Input_Dataset.png differ
diff --git a/Wine Reviews Classification/Images/Model1.png b/Wine Reviews Classification/Images/Model1.png
new file mode 100644
index 000000000..5cfbd5ad7
Binary files /dev/null and b/Wine Reviews Classification/Images/Model1.png differ
diff --git a/Wine Reviews Classification/Images/Model2.png b/Wine Reviews Classification/Images/Model2.png
new file mode 100644
index 000000000..b5ec5c4ad
Binary files /dev/null and b/Wine Reviews Classification/Images/Model2.png differ
diff --git a/Wine Reviews Classification/Model/PridictionModel.ipynb b/Wine Reviews Classification/Model/PridictionModel.ipynb
new file mode 100644
index 000000000..1538ecea1
--- /dev/null
+++ b/Wine Reviews Classification/Model/PridictionModel.ipynb
@@ -0,0 +1,6190 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T06:50:29.636394Z",
+ "iopub.status.busy": "2021-05-25T06:50:29.636041Z",
+ "iopub.status.idle": "2021-05-25T06:50:29.643277Z",
+ "shell.execute_reply": "2021-05-25T06:50:29.642127Z",
+ "shell.execute_reply.started": "2021-05-25T06:50:29.636365Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "import sklearn\n",
+ "import itertools\n",
+ "import numpy as np\n",
+ "import seaborn as sb\n",
+ "import re\n",
+ "import nltk\n",
+ "import pickle\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "from sklearn.feature_extraction.text import TfidfVectorizer\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "from sklearn.metrics import confusion_matrix\n",
+ "from matplotlib import pyplot as plt\n",
+ "from sklearn.linear_model import PassiveAggressiveClassifier,LogisticRegression\n",
+ "from nltk.stem import WordNetLemmatizer\n",
+ "from nltk.corpus import stopwords"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T06:50:29.656569Z",
+ "iopub.status.busy": "2021-05-25T06:50:29.656203Z",
+ "iopub.status.idle": "2021-05-25T06:50:32.048864Z",
+ "shell.execute_reply": "2021-05-25T06:50:32.047882Z",
+ "shell.execute_reply.started": "2021-05-25T06:50:29.65654Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "train_df = pd.read_csv('train.csv')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T06:50:32.05136Z",
+ "iopub.status.busy": "2021-05-25T06:50:32.051032Z",
+ "iopub.status.idle": "2021-05-25T06:50:32.089516Z",
+ "shell.execute_reply": "2021-05-25T06:50:32.088399Z",
+ "shell.execute_reply.started": "2021-05-25T06:50:32.051329Z"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Unnamed: 0 country description \\\n",
+ "0 0 Italy Aromas include tropical fruit, broom, brimston... \n",
+ "1 1 Portugal This is ripe and fruity, a wine that is smooth... \n",
+ "2 2 US Tart and snappy, the flavors of lime flesh and... \n",
+ "3 3 US Pineapple rind, lemon pith and orange blossom ... \n",
+ "4 4 US Much like the regular bottling from 2012, this... \n",
+ "5 5 Spain Blackberry and raspberry aromas show a typical... \n",
+ "6 6 Italy Here's a bright, informal red that opens with ... \n",
+ "7 7 France This dry and restrained wine offers spice in p... \n",
+ "8 8 Germany Savory dried thyme notes accent sunnier flavor... \n",
+ "9 9 France This has great depth of flavor with its fresh ... \n",
+ "10 10 US Soft, supple plum envelopes an oaky structure ... \n",
+ "11 11 France This is a dry wine, very spicy, with a tight, ... \n",
+ "12 12 US Slightly reduced, this wine offers a chalky, t... \n",
+ "13 13 Italy This is dominated by oak and oak-driven aromas... \n",
+ "14 14 US Building on 150 years and six generations of w... \n",
+ "\n",
+ " designation points price province \\\n",
+ "0 Vulkà Bianco 87 NaN Sicily & Sardinia \n",
+ "1 Avidagos 87 15.0 Douro \n",
+ "2 NaN 87 14.0 Oregon \n",
+ "3 Reserve Late Harvest 87 13.0 Michigan \n",
+ "4 Vintner's Reserve Wild Child Block 87 65.0 Oregon \n",
+ "5 Ars In Vitro 87 15.0 Northern Spain \n",
+ "6 Belsito 87 16.0 Sicily & Sardinia \n",
+ "7 NaN 87 24.0 Alsace \n",
+ "8 Shine 87 12.0 Rheinhessen \n",
+ "9 Les Natures 87 27.0 Alsace \n",
+ "10 Mountain Cuvée 87 19.0 California \n",
+ "11 NaN 87 30.0 Alsace \n",
+ "12 NaN 87 34.0 California \n",
+ "13 Rosso 87 NaN Sicily & Sardinia \n",
+ "14 NaN 87 12.0 California \n",
+ "\n",
+ " region_1 region_2 taster_name \\\n",
+ "0 Etna NaN Kerin O’Keefe \n",
+ "1 NaN NaN Roger Voss \n",
+ "2 Willamette Valley Willamette Valley Paul Gregutt \n",
+ "3 Lake Michigan Shore NaN Alexander Peartree \n",
+ "4 Willamette Valley Willamette Valley Paul Gregutt \n",
+ "5 Navarra NaN Michael Schachner \n",
+ "6 Vittoria NaN Kerin O’Keefe \n",
+ "7 Alsace NaN Roger Voss \n",
+ "8 NaN NaN Anna Lee C. Iijima \n",
+ "9 Alsace NaN Roger Voss \n",
+ "10 Napa Valley Napa Virginie Boone \n",
+ "11 Alsace NaN Roger Voss \n",
+ "12 Alexander Valley Sonoma Virginie Boone \n",
+ "13 Etna NaN Kerin O’Keefe \n",
+ "14 Central Coast Central Coast Matt Kettmann \n",
+ "\n",
+ " taster_twitter_handle title \\\n",
+ "0 @kerinokeefe Nicosia 2013 Vulkà Bianco (Etna) \n",
+ "1 @vossroger Quinta dos Avidagos 2011 Avidagos Red (Douro) \n",
+ "2 @paulgwine Rainstorm 2013 Pinot Gris (Willamette Valley) \n",
+ "3 NaN St. Julian 2013 Reserve Late Harvest Riesling ... \n",
+ "4 @paulgwine Sweet Cheeks 2012 Vintner's Reserve Wild Child... \n",
+ "5 @wineschach Tandem 2011 Ars In Vitro Tempranillo-Merlot (N... \n",
+ "6 @kerinokeefe Terre di Giurfo 2013 Belsito Frappato (Vittoria) \n",
+ "7 @vossroger Trimbach 2012 Gewurztraminer (Alsace) \n",
+ "8 NaN Heinz Eifel 2013 Shine Gewürztraminer (Rheinhe... \n",
+ "9 @vossroger Jean-Baptiste Adam 2012 Les Natures Pinot Gris... \n",
+ "10 @vboone Kirkland Signature 2011 Mountain Cuvée Caberne... \n",
+ "11 @vossroger Leon Beyer 2012 Gewurztraminer (Alsace) \n",
+ "12 @vboone Louis M. Martini 2012 Cabernet Sauvignon (Alex... \n",
+ "13 @kerinokeefe Masseria Setteporte 2012 Rosso (Etna) \n",
+ "14 @mattkettmann Mirassou 2012 Chardonnay (Central Coast) \n",
+ "\n",
+ " variety winery \n",
+ "0 White Blend Nicosia \n",
+ "1 Portuguese Red Quinta dos Avidagos \n",
+ "2 Pinot Gris Rainstorm \n",
+ "3 Riesling St. Julian \n",
+ "4 Pinot Noir Sweet Cheeks \n",
+ "5 Tempranillo-Merlot Tandem \n",
+ "6 Frappato Terre di Giurfo \n",
+ "7 Gewürztraminer Trimbach \n",
+ "8 Gewürztraminer Heinz Eifel \n",
+ "9 Pinot Gris Jean-Baptiste Adam \n",
+ "10 Cabernet Sauvignon Kirkland Signature \n",
+ "11 Gewürztraminer Leon Beyer \n",
+ "12 Cabernet Sauvignon Louis M. Martini \n",
+ "13 Nerello Mascalese Masseria Setteporte \n",
+ "14 Chardonnay Mirassou "
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "train_df.head(15)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T06:50:32.10674Z",
+ "iopub.status.busy": "2021-05-25T06:50:32.106434Z",
+ "iopub.status.idle": "2021-05-25T06:50:32.120541Z",
+ "shell.execute_reply": "2021-05-25T06:50:32.119386Z",
+ "shell.execute_reply.started": "2021-05-25T06:50:32.106712Z"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(129971, 14)"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "train_df.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T06:50:32.124489Z",
+ "iopub.status.busy": "2021-05-25T06:50:32.12414Z",
+ "iopub.status.idle": "2021-05-25T06:50:32.140229Z",
+ "shell.execute_reply": "2021-05-25T06:50:32.139288Z",
+ "shell.execute_reply.started": "2021-05-25T06:50:32.124461Z"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "RangeIndex: 129971 entries, 0 to 129970\n",
+ "Data columns (total 14 columns):\n",
+ " # Column Non-Null Count Dtype \n",
+ "--- ------ -------------- ----- \n",
+ " 0 Unnamed: 0 129971 non-null int64 \n",
+ " 1 country 129908 non-null object \n",
+ " 2 description 129971 non-null object \n",
+ " 3 designation 92506 non-null object \n",
+ " 4 points 129971 non-null int64 \n",
+ " 5 price 120975 non-null float64\n",
+ " 6 province 129908 non-null object \n",
+ " 7 region_1 108724 non-null object \n",
+ " 8 region_2 50511 non-null object \n",
+ " 9 taster_name 103727 non-null object \n",
+ " 10 taster_twitter_handle 98758 non-null object \n",
+ " 11 title 129971 non-null object \n",
+ " 12 variety 129970 non-null object \n",
+ " 13 winery 129971 non-null object \n",
+ "dtypes: float64(1), int64(2), object(11)\n",
+ "memory usage: 13.9+ MB\n"
+ ]
+ }
+ ],
+ "source": [
+ "train_df.info()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Quality No of sample\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "88 17207\n",
+ "87 16933\n",
+ "90 15410\n",
+ "86 12600\n",
+ "89 12226\n",
+ "91 11359\n",
+ "92 9613\n",
+ "85 9530\n",
+ "93 6489\n",
+ "84 6480\n",
+ "94 3758\n",
+ "83 3025\n",
+ "82 1836\n",
+ "95 1535\n",
+ "81 692\n",
+ "96 523\n",
+ "80 397\n",
+ "97 229\n",
+ "98 77\n",
+ "99 33\n",
+ "100 19\n",
+ "Name: points, dtype: int64"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Count how many reviews fall into each quality score (points); the classes are heavily imbalanced.\n",
+ "print(\"Quality\",end=' ')\n",
+ "print(\"No of sample\")\n",
+ "train_df['points'].value_counts()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_df=train_df.drop(['region_2'],axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T06:50:32.306146Z",
+ "iopub.status.busy": "2021-05-25T06:50:32.305826Z",
+ "iopub.status.idle": "2021-05-25T06:50:32.335357Z",
+ "shell.execute_reply": "2021-05-25T06:50:32.33417Z",
+ "shell.execute_reply.started": "2021-05-25T06:50:32.306118Z"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[1mCOLUMN\u001b[0m \u001b[1mNULL VALUES COUNT\u001b[0m\n",
+ "Unnamed: 0 0\n",
+ "country 63\n",
+ "description 0\n",
+ "designation 37465\n",
+ "points 0\n",
+ "price 8996\n",
+ "province 63\n",
+ "region_1 21247\n",
+ "taster_name 26244\n",
+ "taster_twitter_handle 31213\n",
+ "title 0\n",
+ "variety 1\n",
+ "winery 0\n"
+ ]
+ }
+ ],
+ "source": [
+ "def data_qualityCheck():\n",
+ " print(\"{:{}}\".format(\"\\033[1mCOLUMN\\033[0m\",38),end='')\n",
+ " print(\"{:{}}\".format(\"\\033[1mNULL VALUES COUNT\\033[0m\",18))\n",
+ " for x in train_df.columns:\n",
+ " print(\"{:{}}\".format(x,34),end='')\n",
+ " print(train_df[x].isnull().sum())\n",
+ "\n",
+ " \n",
+ "data_qualityCheck()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[1mCOLUMN\u001b[0m \u001b[1mUNIQUE VALUES COUNT\u001b[0m\n",
+ "Unnamed: 0 129971\n",
+ "country 44\n",
+ "description 119955\n",
+ "designation 37980\n",
+ "points 21\n",
+ "price 391\n",
+ "province 426\n",
+ "region_1 1230\n",
+ "taster_name 20\n",
+ "taster_twitter_handle 16\n",
+ "title 118840\n",
+ "variety 708\n",
+ "winery 16757\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(\"{:{}}\".format(\"\\033[1mCOLUMN\\033[0m\",38),end='')\n",
+ "print(\"{:{}}\".format(\"\\033[1mUNIQUE VALUES COUNT\\033[0m\",18))\n",
+ "for x in train_df.columns:\n",
+ " print(\"{:{}}\".format(x,34),end='')\n",
+ " print(len(train_df[x].unique()))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T06:50:32.337061Z",
+ "iopub.status.busy": "2021-05-25T06:50:32.336735Z",
+ "iopub.status.idle": "2021-05-25T06:50:32.367948Z",
+ "shell.execute_reply": "2021-05-25T06:50:32.366933Z",
+ "shell.execute_reply.started": "2021-05-25T06:50:32.33703Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "train_df=train_df.drop([\"region_1\", \"taster_twitter_handle\",\"taster_name\"], axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[1mCOLUMN\u001b[0m \u001b[1mNULL VALUES COUNT\u001b[0m\n",
+ "Unnamed: 0 0\n",
+ "country 63\n",
+ "description 0\n",
+ "designation 37465\n",
+ "points 0\n",
+ "price 8996\n",
+ "province 63\n",
+ "title 0\n",
+ "variety 1\n",
+ "winery 0\n"
+ ]
+ }
+ ],
+ "source": [
+ "data_qualityCheck()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def fill_data(data):\n",
+ " data[\"country\"] = data[\"country\"].fillna(\"No Country\")\n",
+ " data[\"designation\"] = data[\"designation\"].fillna(\"No Designation\")\n",
+ " data[\"price\"]=data[\"price\"].fillna(0)\n",
+ " data[\"province\"]=data[\"province\"].fillna(\"No Province\")\n",
+ " data[\"variety\"]=data[\"variety\"].fillna(\"No variety\")\n",
+ " return data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_df=fill_data(train_df)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T06:50:32.401314Z",
+ "iopub.status.busy": "2021-05-25T06:50:32.400868Z",
+ "iopub.status.idle": "2021-05-25T06:50:32.407806Z",
+ "shell.execute_reply": "2021-05-25T06:50:32.406589Z",
+ "shell.execute_reply.started": "2021-05-25T06:50:32.401272Z"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(129971, 10)"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "train_df.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_df=train_df.drop([\"Unnamed: 0\"],axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T06:50:32.409912Z",
+ "iopub.status.busy": "2021-05-25T06:50:32.409162Z",
+ "iopub.status.idle": "2021-05-25T06:50:32.426843Z",
+ "shell.execute_reply": "2021-05-25T06:50:32.425727Z",
+ "shell.execute_reply.started": "2021-05-25T06:50:32.409868Z"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " country description \\\n",
+ "0 Italy Aromas include tropical fruit, broom, brimston... \n",
+ "1 Portugal This is ripe and fruity, a wine that is smooth... \n",
+ "2 US Tart and snappy, the flavors of lime flesh and... \n",
+ "3 US Pineapple rind, lemon pith and orange blossom ... \n",
+ "4 US Much like the regular bottling from 2012, this... \n",
+ "5 Spain Blackberry and raspberry aromas show a typical... \n",
+ "6 Italy Here's a bright, informal red that opens with ... \n",
+ "7 France This dry and restrained wine offers spice in p... \n",
+ "8 Germany Savory dried thyme notes accent sunnier flavor... \n",
+ "9 France This has great depth of flavor with its fresh ... \n",
+ "10 US Soft, supple plum envelopes an oaky structure ... \n",
+ "11 France This is a dry wine, very spicy, with a tight, ... \n",
+ "12 US Slightly reduced, this wine offers a chalky, t... \n",
+ "13 Italy This is dominated by oak and oak-driven aromas... \n",
+ "14 US Building on 150 years and six generations of w... \n",
+ "15 Germany Zesty orange peels and apple notes abound in t... \n",
+ "16 Argentina Baked plum, molasses, balsamic vinegar and che... \n",
+ "17 Argentina Raw black-cherry aromas are direct and simple ... \n",
+ "18 Spain Desiccated blackberry, leather, charred wood a... \n",
+ "19 US Red fruit aromas pervade on the nose, with cig... \n",
+ "\n",
+ " designation points price \\\n",
+ "0 Vulkà Bianco 87 0.0 \n",
+ "1 Avidagos 87 15.0 \n",
+ "2 No Designation 87 14.0 \n",
+ "3 Reserve Late Harvest 87 13.0 \n",
+ "4 Vintner's Reserve Wild Child Block 87 65.0 \n",
+ "5 Ars In Vitro 87 15.0 \n",
+ "6 Belsito 87 16.0 \n",
+ "7 No Designation 87 24.0 \n",
+ "8 Shine 87 12.0 \n",
+ "9 Les Natures 87 27.0 \n",
+ "10 Mountain Cuvée 87 19.0 \n",
+ "11 No Designation 87 30.0 \n",
+ "12 No Designation 87 34.0 \n",
+ "13 Rosso 87 0.0 \n",
+ "14 No Designation 87 12.0 \n",
+ "15 Devon 87 24.0 \n",
+ "16 Felix 87 30.0 \n",
+ "17 Winemaker Selection 87 13.0 \n",
+ "18 Vendimia Seleccionada Finca Valdelayegua Singl... 87 28.0 \n",
+ "19 No Designation 87 32.0 \n",
+ "\n",
+ " province title \\\n",
+ "0 Sicily & Sardinia Nicosia 2013 Vulkà Bianco (Etna) \n",
+ "1 Douro Quinta dos Avidagos 2011 Avidagos Red (Douro) \n",
+ "2 Oregon Rainstorm 2013 Pinot Gris (Willamette Valley) \n",
+ "3 Michigan St. Julian 2013 Reserve Late Harvest Riesling ... \n",
+ "4 Oregon Sweet Cheeks 2012 Vintner's Reserve Wild Child... \n",
+ "5 Northern Spain Tandem 2011 Ars In Vitro Tempranillo-Merlot (N... \n",
+ "6 Sicily & Sardinia Terre di Giurfo 2013 Belsito Frappato (Vittoria) \n",
+ "7 Alsace Trimbach 2012 Gewurztraminer (Alsace) \n",
+ "8 Rheinhessen Heinz Eifel 2013 Shine Gewürztraminer (Rheinhe... \n",
+ "9 Alsace Jean-Baptiste Adam 2012 Les Natures Pinot Gris... \n",
+ "10 California Kirkland Signature 2011 Mountain Cuvée Caberne... \n",
+ "11 Alsace Leon Beyer 2012 Gewurztraminer (Alsace) \n",
+ "12 California Louis M. Martini 2012 Cabernet Sauvignon (Alex... \n",
+ "13 Sicily & Sardinia Masseria Setteporte 2012 Rosso (Etna) \n",
+ "14 California Mirassou 2012 Chardonnay (Central Coast) \n",
+ "15 Mosel Richard Böcking 2013 Devon Riesling (Mosel) \n",
+ "16 Other Felix Lavaque 2010 Felix Malbec (Cafayate) \n",
+ "17 Mendoza Province Gaucho Andino 2011 Winemaker Selection Malbec ... \n",
+ "18 Northern Spain Pradorey 2010 Vendimia Seleccionada Finca Vald... \n",
+ "19 Virginia Quiévremont 2012 Meritage (Virginia) \n",
+ "\n",
+ " variety winery \n",
+ "0 White Blend Nicosia \n",
+ "1 Portuguese Red Quinta dos Avidagos \n",
+ "2 Pinot Gris Rainstorm \n",
+ "3 Riesling St. Julian \n",
+ "4 Pinot Noir Sweet Cheeks \n",
+ "5 Tempranillo-Merlot Tandem \n",
+ "6 Frappato Terre di Giurfo \n",
+ "7 Gewürztraminer Trimbach \n",
+ "8 Gewürztraminer Heinz Eifel \n",
+ "9 Pinot Gris Jean-Baptiste Adam \n",
+ "10 Cabernet Sauvignon Kirkland Signature \n",
+ "11 Gewürztraminer Leon Beyer \n",
+ "12 Cabernet Sauvignon Louis M. Martini \n",
+ "13 Nerello Mascalese Masseria Setteporte \n",
+ "14 Chardonnay Mirassou \n",
+ "15 Riesling Richard Böcking \n",
+ "16 Malbec Felix Lavaque \n",
+ "17 Malbec Gaucho Andino \n",
+ "18 Tempranillo Blend Pradorey \n",
+ "19 Meritage Quiévremont "
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "train_df.head(20)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " points price\n",
+ "count 129971.000000 129971.000000\n",
+ "mean 88.447138 32.915697\n",
+ "std 3.039730 40.582167\n",
+ "min 80.000000 0.000000\n",
+ "25% 86.000000 15.000000\n",
+ "50% 88.000000 25.000000\n",
+ "75% 91.000000 40.000000\n",
+ "max 100.000000 3300.000000"
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "train_df.describe()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T06:50:32.457112Z",
+ "iopub.status.busy": "2021-05-25T06:50:32.45653Z",
+ "iopub.status.idle": "2021-05-25T06:50:32.46346Z",
+ "shell.execute_reply": "2021-05-25T06:50:32.461467Z",
+ "shell.execute_reply.started": "2021-05-25T06:50:32.457067Z"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 87\n",
+ "1 87\n",
+ "2 87\n",
+ "3 87\n",
+ "4 87\n",
+ " ..\n",
+ "129966 90\n",
+ "129967 90\n",
+ "129968 90\n",
+ "129969 90\n",
+ "129970 90\n",
+ "Name: points, Length: 129971, dtype: int64"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "label_train = train_df['points']\n",
+ "label_train"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T06:50:32.46513Z",
+ "iopub.status.busy": "2021-05-25T06:50:32.46484Z",
+ "iopub.status.idle": "2021-05-25T06:50:32.479833Z",
+ "shell.execute_reply": "2021-05-25T06:50:32.478601Z",
+ "shell.execute_reply.started": "2021-05-25T06:50:32.465102Z"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 87\n",
+ "1 87\n",
+ "2 87\n",
+ "3 87\n",
+ "4 87\n",
+ "5 87\n",
+ "6 87\n",
+ "7 87\n",
+ "8 87\n",
+ "9 87\n",
+ "Name: points, dtype: int64"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "label_train.head(10)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T06:50:32.481757Z",
+ "iopub.status.busy": "2021-05-25T06:50:32.481439Z",
+ "iopub.status.idle": "2021-05-25T06:50:32.493571Z",
+ "shell.execute_reply": "2021-05-25T06:50:32.492736Z",
+ "shell.execute_reply.started": "2021-05-25T06:50:32.481728Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "train_df = train_df.drop(\"points\", axis = 1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T06:50:32.495566Z",
+ "iopub.status.busy": "2021-05-25T06:50:32.495116Z",
+ "iopub.status.idle": "2021-05-25T06:50:32.513957Z",
+ "shell.execute_reply": "2021-05-25T06:50:32.51265Z",
+ "shell.execute_reply.started": "2021-05-25T06:50:32.495526Z"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " country description \\\n",
+ "0 Italy Aromas include tropical fruit, broom, brimston... \n",
+ "1 Portugal This is ripe and fruity, a wine that is smooth... \n",
+ "2 US Tart and snappy, the flavors of lime flesh and... \n",
+ "3 US Pineapple rind, lemon pith and orange blossom ... \n",
+ "4 US Much like the regular bottling from 2012, this... \n",
+ "5 Spain Blackberry and raspberry aromas show a typical... \n",
+ "6 Italy Here's a bright, informal red that opens with ... \n",
+ "7 France This dry and restrained wine offers spice in p... \n",
+ "8 Germany Savory dried thyme notes accent sunnier flavor... \n",
+ "9 France This has great depth of flavor with its fresh ... \n",
+ "\n",
+ " designation price province \\\n",
+ "0 Vulkà Bianco 0.0 Sicily & Sardinia \n",
+ "1 Avidagos 15.0 Douro \n",
+ "2 No Designation 14.0 Oregon \n",
+ "3 Reserve Late Harvest 13.0 Michigan \n",
+ "4 Vintner's Reserve Wild Child Block 65.0 Oregon \n",
+ "5 Ars In Vitro 15.0 Northern Spain \n",
+ "6 Belsito 16.0 Sicily & Sardinia \n",
+ "7 No Designation 24.0 Alsace \n",
+ "8 Shine 12.0 Rheinhessen \n",
+ "9 Les Natures 27.0 Alsace \n",
+ "\n",
+ " title variety \\\n",
+ "0 Nicosia 2013 Vulkà Bianco (Etna) White Blend \n",
+ "1 Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red \n",
+ "2 Rainstorm 2013 Pinot Gris (Willamette Valley) Pinot Gris \n",
+ "3 St. Julian 2013 Reserve Late Harvest Riesling ... Riesling \n",
+ "4 Sweet Cheeks 2012 Vintner's Reserve Wild Child... Pinot Noir \n",
+ "5 Tandem 2011 Ars In Vitro Tempranillo-Merlot (N... Tempranillo-Merlot \n",
+ "6 Terre di Giurfo 2013 Belsito Frappato (Vittoria) Frappato \n",
+ "7 Trimbach 2012 Gewurztraminer (Alsace) Gewürztraminer \n",
+ "8 Heinz Eifel 2013 Shine Gewürztraminer (Rheinhe... Gewürztraminer \n",
+ "9 Jean-Baptiste Adam 2012 Les Natures Pinot Gris... Pinot Gris \n",
+ "\n",
+ " winery \n",
+ "0 Nicosia \n",
+ "1 Quinta dos Avidagos \n",
+ "2 Rainstorm \n",
+ "3 St. Julian \n",
+ "4 Sweet Cheeks \n",
+ "5 Tandem \n",
+ "6 Terre di Giurfo \n",
+ "7 Trimbach \n",
+ "8 Heinz Eifel \n",
+ "9 Jean-Baptiste Adam "
+ ]
+ },
+ "execution_count": 21,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "train_df.head(10)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[1mCOLUMN\u001b[0m \u001b[1mNULL VALUES COUNT\u001b[0m\n",
+ "country 0\n",
+ "description 0\n",
+ "designation 0\n",
+ "price 0\n",
+ "province 0\n",
+ "title 0\n",
+ "variety 0\n",
+ "winery 0\n"
+ ]
+ }
+ ],
+ "source": [
+ "data_qualityCheck()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_df[\"text\"]=train_df[\"country\"]+\" \"+train_df[\"description\"]+\" \"+train_df[\"designation\"]+\" \"+train_df[\"province\"]+\" \"+train_df[\"title\"]+\" \"+train_df[\"variety\"]+\" \"+train_df[\"winery\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_df=train_df.drop([\"designation\",\"country\",\"province\",\"description\",\"title\",\"variety\",\"winery\"],axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " price text\n",
+ "0 0.0 Italy Aromas include tropical fruit, broom, br...\n",
+ "1 15.0 Portugal This is ripe and fruity, a wine that ...\n",
+ "2 14.0 US Tart and snappy, the flavors of lime flesh ...\n",
+ "3 13.0 US Pineapple rind, lemon pith and orange bloss...\n",
+ "4 65.0 US Much like the regular bottling from 2012, t...\n",
+ "5 15.0 Spain Blackberry and raspberry aromas show a t...\n",
+ "6 16.0 Italy Here's a bright, informal red that opens...\n",
+ "7 24.0 France This dry and restrained wine offers spi...\n",
+ "8 12.0 Germany Savory dried thyme notes accent sunnie...\n",
+ "9 27.0 France This has great depth of flavor with its...\n",
+ "10 19.0 US Soft, supple plum envelopes an oaky structu...\n",
+ "11 30.0 France This is a dry wine, very spicy, with a ...\n",
+ "12 34.0 US Slightly reduced, this wine offers a chalky...\n",
+ "13 0.0 Italy This is dominated by oak and oak-driven ...\n",
+ "14 12.0 US Building on 150 years and six generations o...\n",
+ "15 24.0 Germany Zesty orange peels and apple notes abo...\n",
+ "16 30.0 Argentina Baked plum, molasses, balsamic vineg...\n",
+ "17 13.0 Argentina Raw black-cherry aromas are direct a...\n",
+ "18 28.0 Spain Desiccated blackberry, leather, charred ...\n",
+ "19 32.0 US Red fruit aromas pervade on the nose, with ..."
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "train_df.head(20)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "price=train_df[\"price\"]\n",
+ "train_df=train_df.drop(\"price\",axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "custom_download_dir = \"C:\\\\Users\\\\ysach/nltk\"\n",
+ "nltk.data.path.append(custom_download_dir)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "[nltk_data] Downloading package stopwords to C:\\Users\\ysach/nltk...\n",
+ "[nltk_data] Package stopwords is already up-to-date!\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "True"
+ ]
+ },
+ "execution_count": 28,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "nltk.download('stopwords',download_dir=custom_download_dir)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T06:50:32.51602Z",
+ "iopub.status.busy": "2021-05-25T06:50:32.515411Z",
+ "iopub.status.idle": "2021-05-25T06:50:32.531829Z",
+ "shell.execute_reply": "2021-05-25T06:50:32.530895Z",
+ "shell.execute_reply.started": "2021-05-25T06:50:32.515972Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "lemmatizer = WordNetLemmatizer()\n",
+ "stpwrds = list(stopwords.words('english'))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['i',\n",
+ " 'me',\n",
+ " 'my',\n",
+ " 'myself',\n",
+ " 'we',\n",
+ " 'our',\n",
+ " 'ours',\n",
+ " 'ourselves',\n",
+ " 'you',\n",
+ " \"you're\",\n",
+ " \"you've\",\n",
+ " \"you'll\",\n",
+ " \"you'd\",\n",
+ " 'your',\n",
+ " 'yours',\n",
+ " 'yourself',\n",
+ " 'yourselves',\n",
+ " 'he',\n",
+ " 'him',\n",
+ " 'his',\n",
+ " 'himself',\n",
+ " 'she',\n",
+ " \"she's\",\n",
+ " 'her',\n",
+ " 'hers',\n",
+ " 'herself',\n",
+ " 'it',\n",
+ " \"it's\",\n",
+ " 'its',\n",
+ " 'itself',\n",
+ " 'they',\n",
+ " 'them',\n",
+ " 'their',\n",
+ " 'theirs',\n",
+ " 'themselves',\n",
+ " 'what',\n",
+ " 'which',\n",
+ " 'who',\n",
+ " 'whom',\n",
+ " 'this',\n",
+ " 'that',\n",
+ " \"that'll\",\n",
+ " 'these',\n",
+ " 'those',\n",
+ " 'am',\n",
+ " 'is',\n",
+ " 'are',\n",
+ " 'was',\n",
+ " 'were',\n",
+ " 'be',\n",
+ " 'been',\n",
+ " 'being',\n",
+ " 'have',\n",
+ " 'has',\n",
+ " 'had',\n",
+ " 'having',\n",
+ " 'do',\n",
+ " 'does',\n",
+ " 'did',\n",
+ " 'doing',\n",
+ " 'a',\n",
+ " 'an',\n",
+ " 'the',\n",
+ " 'and',\n",
+ " 'but',\n",
+ " 'if',\n",
+ " 'or',\n",
+ " 'because',\n",
+ " 'as',\n",
+ " 'until',\n",
+ " 'while',\n",
+ " 'of',\n",
+ " 'at',\n",
+ " 'by',\n",
+ " 'for',\n",
+ " 'with',\n",
+ " 'about',\n",
+ " 'against',\n",
+ " 'between',\n",
+ " 'into',\n",
+ " 'through',\n",
+ " 'during',\n",
+ " 'before',\n",
+ " 'after',\n",
+ " 'above',\n",
+ " 'below',\n",
+ " 'to',\n",
+ " 'from',\n",
+ " 'up',\n",
+ " 'down',\n",
+ " 'in',\n",
+ " 'out',\n",
+ " 'on',\n",
+ " 'off',\n",
+ " 'over',\n",
+ " 'under',\n",
+ " 'again',\n",
+ " 'further',\n",
+ " 'then',\n",
+ " 'once',\n",
+ " 'here',\n",
+ " 'there',\n",
+ " 'when',\n",
+ " 'where',\n",
+ " 'why',\n",
+ " 'how',\n",
+ " 'all',\n",
+ " 'any',\n",
+ " 'both',\n",
+ " 'each',\n",
+ " 'few',\n",
+ " 'more',\n",
+ " 'most',\n",
+ " 'other',\n",
+ " 'some',\n",
+ " 'such',\n",
+ " 'no',\n",
+ " 'nor',\n",
+ " 'not',\n",
+ " 'only',\n",
+ " 'own',\n",
+ " 'same',\n",
+ " 'so',\n",
+ " 'than',\n",
+ " 'too',\n",
+ " 'very',\n",
+ " 's',\n",
+ " 't',\n",
+ " 'can',\n",
+ " 'will',\n",
+ " 'just',\n",
+ " 'don',\n",
+ " \"don't\",\n",
+ " 'should',\n",
+ " \"should've\",\n",
+ " 'now',\n",
+ " 'd',\n",
+ " 'll',\n",
+ " 'm',\n",
+ " 'o',\n",
+ " 're',\n",
+ " 've',\n",
+ " 'y',\n",
+ " 'ain',\n",
+ " 'aren',\n",
+ " \"aren't\",\n",
+ " 'couldn',\n",
+ " \"couldn't\",\n",
+ " 'didn',\n",
+ " \"didn't\",\n",
+ " 'doesn',\n",
+ " \"doesn't\",\n",
+ " 'hadn',\n",
+ " \"hadn't\",\n",
+ " 'hasn',\n",
+ " \"hasn't\",\n",
+ " 'haven',\n",
+ " \"haven't\",\n",
+ " 'isn',\n",
+ " \"isn't\",\n",
+ " 'ma',\n",
+ " 'mightn',\n",
+ " \"mightn't\",\n",
+ " 'mustn',\n",
+ " \"mustn't\",\n",
+ " 'needn',\n",
+ " \"needn't\",\n",
+ " 'shan',\n",
+ " \"shan't\",\n",
+ " 'shouldn',\n",
+ " \"shouldn't\",\n",
+ " 'wasn',\n",
+ " \"wasn't\",\n",
+ " 'weren',\n",
+ " \"weren't\",\n",
+ " 'won',\n",
+ " \"won't\",\n",
+ " 'wouldn',\n",
+ " \"wouldn't\"]"
+ ]
+ },
+ "execution_count": 31,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "stpwrds"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "[nltk_data] Downloading package punkt to C:\\Users\\ysach/nltk...\n",
+ "[nltk_data] Package punkt is already up-to-date!\n",
+ "[nltk_data] Downloading package wordnet to C:\\Users\\ysach/nltk...\n",
+ "[nltk_data] Package wordnet is already up-to-date!\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "True"
+ ]
+ },
+ "execution_count": 32,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "nltk.download('punkt',download_dir=custom_download_dir)\n",
+ "nltk.download('wordnet',download_dir=custom_download_dir)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "[nltk_data] Downloading package omw-1.4 to C:\\Users\\ysach/nltk...\n",
+ "[nltk_data] Package omw-1.4 is already up-to-date!\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "True"
+ ]
+ },
+ "execution_count": 33,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "nltk.download('omw-1.4',download_dir=custom_download_dir)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T06:50:32.54905Z",
+ "iopub.status.busy": "2021-05-25T06:50:32.548517Z",
+ "iopub.status.idle": "2021-05-25T06:53:51.648153Z",
+ "shell.execute_reply": "2021-05-25T06:53:51.647283Z",
+ "shell.execute_reply.started": "2021-05-25T06:50:32.549015Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Clean each review: strip non-letters, lowercase, tokenize,\n",
+ "# drop stopwords, and lemmatize the remaining tokens.\n",
+ "for x in range(len(train_df)):\n",
+ "    review = train_df.loc[x, 'text']\n",
+ "    review = re.sub(r'[^a-zA-Z\\s]', '', review)\n",
+ "    review = review.lower()\n",
+ "    review = nltk.word_tokenize(review)\n",
+ "    corpus = []\n",
+ "    for y in review:\n",
+ "        if y not in stpwrds:\n",
+ "            corpus.append(lemmatizer.lemmatize(y))\n",
+ "    # Write back with .loc to avoid pandas chained assignment.\n",
+ "    train_df.loc[x, 'text'] = ' '.join(corpus)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T07:14:51.798724Z",
+ "iopub.status.busy": "2021-05-25T07:14:51.798361Z",
+ "iopub.status.idle": "2021-05-25T07:14:51.805617Z",
+ "shell.execute_reply": "2021-05-25T07:14:51.804946Z",
+ "shell.execute_reply.started": "2021-05-25T07:14:51.798694Z"
+ },
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'u touch riesling accentuates fresh citrusy backbone cabernet sauvignon ro dry style sprightly lightfooted tone offer load concentrated cherry berry flavor finish brisk clean dry new york osprey dominion dry ro north fork long island ro osprey dominion'"
+ ]
+ },
+ "execution_count": 35,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "train_df['text'][2188]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "89"
+ ]
+ },
+ "execution_count": 36,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "label_train[2188]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T07:16:37.152728Z",
+ "iopub.status.busy": "2021-05-25T07:16:37.152216Z",
+ "iopub.status.idle": "2021-05-25T07:16:37.163059Z",
+ "shell.execute_reply": "2021-05-25T07:16:37.161884Z",
+ "shell.execute_reply.started": "2021-05-25T07:16:37.152696Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "X_train= train_df['text']"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 italy aroma include tropical fruit broom brims...\n",
+ "1 portugal ripe fruity wine smooth still structu...\n",
+ "2 u tart snappy flavor lime flesh rind dominate ...\n",
+ "3 u pineapple rind lemon pith orange blossom sta...\n",
+ "4 u much like regular bottling come across rathe...\n",
+ " ... \n",
+ "129966 germany note honeysuckle cantaloupe sweeten de...\n",
+ "129967 u citation given much decade bottle age prior ...\n",
+ "129968 france welldrained gravel soil give wine crisp...\n",
+ "129969 france dry style pinot gris crisp acidity also...\n",
+ "129970 france big rich offdry powered intense spicine...\n",
+ "Name: text, Length: 129971, dtype: object"
+ ]
+ },
+ "execution_count": 38,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "X_train"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T07:17:50.592597Z",
+ "iopub.status.busy": "2021-05-25T07:17:50.592095Z",
+ "iopub.status.idle": "2021-05-25T07:17:50.598862Z",
+ "shell.execute_reply": "2021-05-25T07:17:50.597641Z",
+ "shell.execute_reply.started": "2021-05-25T07:17:50.592566Z"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(129971,)"
+ ]
+ },
+ "execution_count": 39,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "X_train.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-05-25T07:18:05.89317Z",
+ "iopub.status.busy": "2021-05-25T07:18:05.892651Z",
+ "iopub.status.idle": "2021-05-25T07:18:05.902743Z",
+ "shell.execute_reply": "2021-05-25T07:18:05.901523Z",
+ "shell.execute_reply.started": "2021-05-25T07:18:05.893127Z"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(129971,)"
+ ]
+ },
+ "execution_count": 40,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "label_train.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 45,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from keras.preprocessing.text import Tokenizer\n",
+ "from keras.preprocessing.sequence import pad_sequences"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The Padding Sequence Shape is --> (129971, 89)\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Build the vocabulary and map each review to a sequence of word indices.\n",
+ "tokenize = Tokenizer(oov_token=\"\")\n",
+ "tokenize.fit_on_texts(X_train)\n",
+ "word_idx = tokenize.word_index\n",
+ "\n",
+ "text2seq = tokenize.texts_to_sequences(X_train)\n",
+ "\n",
+ "# pad_seq = pad_sequences(text2seq, maxlen=150, padding=\"pre\", truncating=\"pre\")\n",
+ "\n",
+ "# Without maxlen, every sequence is left-padded to the longest review.\n",
+ "pad_seq = pad_sequences(text2seq, padding=\"pre\", truncating=\"pre\")\n",
+ "\n",
+ "\n",
+ "print(\"The Padding Sequence Shape is --> \", pad_seq.shape)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 47,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "input_length = max(len(seq) for seq in text2seq)\n",
+ "\n",
+ "vocabulary_size = len(word_idx) + 1"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 48,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The maximum Sequence Length is --> 89\n",
+ "The vocabulary size of dataset is --> 63971\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(\"The maximum Sequence Length is --> \", input_length)\n",
+ "print(\"The vocabulary size of dataset is --> \", vocabulary_size)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 49,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df=pd.DataFrame(pad_seq)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 50,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " 0 1 2 3 4 5 6 7 8 9 ... 79 80 81 82 \\\n",
+ "0 0 0 0 0 0 0 0 0 0 0 ... 824 429 449 11386 \n",
+ "1 0 0 0 0 0 0 0 0 0 0 ... 1377 11387 11387 5 \n",
+ "2 0 0 0 0 0 0 0 0 0 0 ... 15 104 9316 11 \n",
+ "3 0 0 0 0 0 0 0 0 0 0 ... 59 488 826 48 \n",
+ "4 0 0 0 0 0 0 0 0 0 0 ... 7384 393 11 23 \n",
+ "5 0 0 0 0 0 0 0 0 0 0 ... 36628 186 58 7385 \n",
+ "6 0 0 0 0 0 0 0 0 0 0 ... 571 74 11791 23585 \n",
+ "7 0 0 0 0 0 0 0 0 0 0 ... 52 290 440 15 \n",
+ "8 0 0 0 0 0 0 0 0 0 0 ... 855 1135 7630 7922 \n",
+ "9 0 0 0 0 0 0 0 0 0 0 ... 2141 77 724 11 \n",
+ "\n",
+ " 83 84 85 86 87 88 \n",
+ "0 28260 824 1664 30 8 11386 \n",
+ "1 277 215 5 249 1377 11387 \n",
+ "2 225 373 9 11 225 9316 \n",
+ "3 293 2608 4718 48 523 6353 \n",
+ "4 373 9 11 23 50 5969 \n",
+ "5 6729 36628 14584 2103 14584 7385 \n",
+ "6 3139 3853 3139 571 74 11791 \n",
+ "7 172 6031 1064 172 446 6031 \n",
+ "8 855 446 1135 446 7630 7922 \n",
+ "9 225 172 11 225 3124 2141 \n",
+ "\n",
+ "[10 rows x 89 columns]"
+ ]
+ },
+ "execution_count": 50,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.head(10)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 51,
+ "metadata": {},
+ "outputs": [],
+ "source": [
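+ "# Append the price as an extra column named '89', alongside the 89 padded token-ID columns\n",
+ "df['89'] = price"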
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 52,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 \\\n",
+ "0 0 0 0 0 0 0 0 0 0 0 ... 429 449 11386 28260 824 \n",
+ "1 0 0 0 0 0 0 0 0 0 0 ... 11387 11387 5 277 215 \n",
+ "2 0 0 0 0 0 0 0 0 0 0 ... 104 9316 11 225 373 \n",
+ "3 0 0 0 0 0 0 0 0 0 0 ... 488 826 48 293 2608 \n",
+ "4 0 0 0 0 0 0 0 0 0 0 ... 393 11 23 373 9 \n",
+ "5 0 0 0 0 0 0 0 0 0 0 ... 186 58 7385 6729 36628 \n",
+ "6 0 0 0 0 0 0 0 0 0 0 ... 74 11791 23585 3139 3853 \n",
+ "7 0 0 0 0 0 0 0 0 0 0 ... 290 440 15 172 6031 \n",
+ "8 0 0 0 0 0 0 0 0 0 0 ... 1135 7630 7922 855 446 \n",
+ "9 0 0 0 0 0 0 0 0 0 0 ... 77 724 11 225 172 \n",
+ "10 0 0 0 0 0 0 0 0 0 0 ... 158 18 16 132 9 \n",
+ "11 0 0 0 0 0 0 0 0 0 0 ... 15 172 5006 3854 1064 \n",
+ "12 0 0 0 0 0 0 0 0 0 0 ... 3434 18 16 692 9 \n",
+ "13 0 0 0 0 0 0 0 0 0 0 ... 449 3784 23586 541 1664 \n",
+ "14 0 0 0 0 0 0 0 0 0 0 ... 1233 15 10 5105 26 \n",
+ "15 0 0 0 0 0 0 0 0 0 0 ... 400 3244 12042 11388 48 \n",
+ "16 0 0 0 0 0 0 0 0 0 0 ... 5616 5616 20414 5616 92 \n",
+ "17 0 0 0 0 0 0 0 0 0 0 ... 6032 20415 354 360 92 \n",
+ "18 0 0 0 0 0 0 0 0 0 0 ... 423 13 603 790 168 \n",
+ "19 0 0 0 0 0 0 0 0 0 0 ... 165 1671 15 540 23587 \n",
+ "\n",
+ " 85 86 87 88 89 \n",
+ "0 1664 30 8 11386 0.0 \n",
+ "1 5 249 1377 11387 15.0 \n",
+ "2 9 11 225 9316 14.0 \n",
+ "3 4718 48 523 6353 13.0 \n",
+ "4 11 23 50 5969 65.0 \n",
+ "5 14584 2103 14584 7385 15.0 \n",
+ "6 3139 571 74 11791 16.0 \n",
+ "7 1064 172 446 6031 24.0 \n",
+ "8 1135 446 7630 7922 12.0 \n",
+ "9 11 225 3124 2141 27.0 \n",
+ "10 18 16 2523 857 19.0 \n",
+ "11 172 446 5006 3854 30.0 \n",
+ "12 18 16 580 3434 34.0 \n",
+ "13 1734 1925 3784 23586 0.0 \n",
+ "14 209 260 26 5105 12.0 \n",
+ "15 400 48 3244 12042 24.0 \n",
+ "16 3734 92 5616 20414 30.0 \n",
+ "17 135 92 6032 20415 13.0 \n",
+ "18 886 212 8 6926 28.0 \n",
+ "19 1070 540 1070 23587 32.0 \n",
+ "\n",
+ "[20 rows x 90 columns]"
+ ]
+ },
+ "execution_count": 52,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.head(20)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 53,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Index([], dtype='object')\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Check for columns that are zero in every row (an empty Index means there are none)\n",
+ "zero_columns = df.columns[df.eq(0).all()]\n",
+ "print(zero_columns)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 54,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(129971, 90)"
+ ]
+ },
+ "execution_count": 54,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 55,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.feature_extraction.text import TfidfVectorizer\n",
+ "from sklearn.feature_extraction.text import CountVectorizer\n",
+ "vectorizer = CountVectorizer(\n",
+ " ngram_range=(1,1),\n",
+ " max_features=25\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 58,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# One-hot encode the review points (80-100) into 21 indicator columns\n",
+ "df3 = pd.get_dummies(label_train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 60,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " 80 81 82 83 84 85 86 87 88 89 ... 91 92 93 94 \\\n",
+ "0 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n",
+ "1 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n",
+ "4 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n",
+ "5 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n",
+ "6 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n",
+ "7 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n",
+ "8 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n",
+ "9 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 \n",
+ "\n",
+ " 95 96 97 98 99 100 \n",
+ "0 0 0 0 0 0 0 \n",
+ "1 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 0 \n",
+ "5 0 0 0 0 0 0 \n",
+ "6 0 0 0 0 0 0 \n",
+ "7 0 0 0 0 0 0 \n",
+ "8 0 0 0 0 0 0 \n",
+ "9 0 0 0 0 0 0 \n",
+ "\n",
+ "[10 rows x 21 columns]"
+ ]
+ },
+ "execution_count": 60,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df3.head(10)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 42,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%pip install tensorflow"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 43,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import tensorflow as tf"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 62,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df1=df['89']"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 63,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import keras\n",
+ "from keras.models import Sequential\n",
+ "from keras.utils import to_categorical\n",
+ "from keras import metrics as metrics1\n",
+ "from keras.layers import LeakyReLU\n",
+ "from keras.layers import Dense, Embedding, GlobalAveragePooling1D, LSTM, Bidirectional,InputLayer"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 64,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Hold out 30% of the padded sequences and their one-hot labels for evaluation\n",
+ "x_train1, x_test, y_train1, y_test = train_test_split(pad_seq, df3, train_size=0.7)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 65,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "WARNING:tensorflow:From c:\\Users\\ysach\\anaconda3\\lib\\site-packages\\keras\\src\\backend.py:873: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.\n",
+ "\n",
+ "WARNING:tensorflow:From c:\\Users\\ysach\\anaconda3\\lib\\site-packages\\keras\\src\\optimizers\\__init__.py:309: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
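+ "# Embedding -> average pooling -> two hidden Dense layers -> 21-way output (points 80-100)\n",
+ "classifier = Sequential()\n",
+ "classifier.add(Embedding(vocabulary_size, 200, input_length=89))\n",
+ "classifier.add(GlobalAveragePooling1D())\n",
+ "classifier.add(Dense(100, activation='relu'))\n",
+ "classifier.add(Dense(50, activation='relu'))\n",
+ "# softmax (not sigmoid) so the 21 outputs form a single probability distribution,\n",
+ "# which is what categorical_crossentropy expects for one-hot, single-label targets\n",
+ "classifier.add(Dense(21, activation='softmax'))\n",
+ "\n",
+ "# Compile the model\n",
+ "classifier.compile(optimizer='adam',\n",
+ "                   loss='categorical_crossentropy',\n",
+ "                   metrics=['accuracy'])"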
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 66,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Model: \"sequential\"\n",
+ "_________________________________________________________________\n",
+ " Layer (type) Output Shape Param # \n",
+ "=================================================================\n",
+ " embedding (Embedding) (None, 89, 200) 12794200 \n",
+ " \n",
+ " global_average_pooling1d ( (None, 200) 0 \n",
+ " GlobalAveragePooling1D) \n",
+ " \n",
+ " dense (Dense) (None, 100) 20100 \n",
+ " \n",
+ " dense_1 (Dense) (None, 50) 5050 \n",
+ " \n",
+ " dense_2 (Dense) (None, 21) 1071 \n",
+ " \n",
+ "=================================================================\n",
+ "Total params: 12820421 (48.91 MB)\n",
+ "Trainable params: 12820421 (48.91 MB)\n",
+ "Non-trainable params: 0 (0.00 Byte)\n",
+ "_________________________________________________________________\n"
+ ]
+ }
+ ],
+ "source": [
+ "classifier.summary()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 67,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Epoch 1/10\n",
+ "WARNING:tensorflow:From c:\\Users\\ysach\\anaconda3\\lib\\site-packages\\keras\\src\\utils\\tf_utils.py:492: The name tf.ragged.RaggedTensorValue is deprecated. Please use tf.compat.v1.ragged.RaggedTensorValue instead.\n",
+ "\n",
+ "WARNING:tensorflow:From c:\\Users\\ysach\\anaconda3\\lib\\site-packages\\keras\\src\\engine\\base_layer_utils.py:384: The name tf.executing_eagerly_outside_functions is deprecated. Please use tf.compat.v1.executing_eagerly_outside_functions instead.\n",
+ "\n",
+ "2844/2844 [==============================] - 700s 246ms/step - loss: 1.9759 - accuracy: 0.2405 - val_loss: 1.8103 - val_accuracy: 0.2790\n",
+ "Epoch 2/10\n",
+ "2844/2844 [==============================] - 680s 239ms/step - loss: 1.6719 - accuracy: 0.3298 - val_loss: 1.7734 - val_accuracy: 0.2931\n",
+ "Epoch 3/10\n",
+ "2844/2844 [==============================] - 716s 252ms/step - loss: 1.5275 - accuracy: 0.3928 - val_loss: 1.7976 - val_accuracy: 0.3054\n",
+ "Epoch 4/10\n",
+ "2844/2844 [==============================] - 744s 262ms/step - loss: 1.3634 - accuracy: 0.4765 - val_loss: 1.8590 - val_accuracy: 0.3115\n",
+ "Epoch 5/10\n",
+ "2844/2844 [==============================] - 702s 247ms/step - loss: 1.1777 - accuracy: 0.5636 - val_loss: 2.0084 - val_accuracy: 0.3067\n",
+ "Epoch 6/10\n",
+ "2844/2844 [==============================] - 651s 229ms/step - loss: 1.0069 - accuracy: 0.6370 - val_loss: 2.1700 - val_accuracy: 0.3127\n",
+ "Epoch 7/10\n",
+ "2844/2844 [==============================] - 636s 224ms/step - loss: 0.8697 - accuracy: 0.6905 - val_loss: 2.3802 - val_accuracy: 0.3126\n",
+ "Epoch 8/10\n",
+ "2844/2844 [==============================] - 779s 274ms/step - loss: 0.7557 - accuracy: 0.7336 - val_loss: 2.5939 - val_accuracy: 0.3108\n",
+ "Epoch 9/10\n",
+ "2844/2844 [==============================] - 635s 223ms/step - loss: 0.6597 - accuracy: 0.7682 - val_loss: 2.8150 - val_accuracy: 0.3061\n",
+ "Epoch 10/10\n",
+ "2844/2844 [==============================] - 677s 238ms/step - loss: 0.5769 - accuracy: 0.7990 - val_loss: 3.1260 - val_accuracy: 0.3100\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 67,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "classifier.fit(x_train1,y_train1,epochs=10,validation_data=(x_test, y_test))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 71,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1219/1219 [==============================] - 1s 917us/step\n"
+ ]
+ }
+ ],
+ "source": [
+ "Y_pred = classifier.predict(x_test)\n",
+ "# Map each predicted class index (0-20) back to its points value (80-100)\n",
+ "a = (80 + np.argmax(Y_pred, axis=1)).tolist()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 72,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[85,\n",
+ " 85,\n",
+ " 90,\n",
+ " 88,\n",
+ " 87,\n",
+ " 86,\n",
+ " 84,\n",
+ " 91,\n",
+ " 91,\n",
+ " 89,\n",
+ " 92,\n",
+ " 92,\n",
+ " 90,\n",
+ " 87,\n",
+ " 86,\n",
+ " 89,\n",
+ " 87,\n",
+ " 99,\n",
+ " 95,\n",
+ " 86,\n",
+ " 90,\n",
+ " 85,\n",
+ " 94,\n",
+ " 93,\n",
+ " 88,\n",
+ " 85,\n",
+ " 88,\n",
+ " 89,\n",
+ " 87,\n",
+ " 90,\n",
+ " 86,\n",
+ " 89,\n",
+ " 85,\n",
+ " 84,\n",
+ " 90,\n",
+ " 85,\n",
+ " 90,\n",
+ " 89,\n",
+ " 85,\n",
+ " 89,\n",
+ " 87,\n",
+ " 86,\n",
+ " 86,\n",
+ " 89,\n",
+ " 84,\n",
+ " 92,\n",
+ " 90,\n",
+ " 92,\n",
+ " 92,\n",
+ " 87,\n",
+ " 85,\n",
+ " 89,\n",
+ " 84,\n",
+ " 90,\n",
+ " 90,\n",
+ " 88,\n",
+ " 86,\n",
+ " 91,\n",
+ " 88,\n",
+ " 93,\n",
+ " 89,\n",
+ " 88,\n",
+ " 85,\n",
+ " 88,\n",
+ " 87,\n",
+ " 90,\n",
+ " 83,\n",
+ " 88,\n",
+ " 91,\n",
+ " 84,\n",
+ " 94,\n",
+ " 87,\n",
+ " 91,\n",
+ " 89,\n",
+ " 88,\n",
+ " 92,\n",
+ " 86,\n",
+ " 92,\n",
+ " 84,\n",
+ " 93,\n",
+ " 89,\n",
+ " 87,\n",
+ " 84,\n",
+ " 89,\n",
+ " 93,\n",
+ " 89,\n",
+ " 88,\n",
+ " 88,\n",
+ " 93,\n",
+ " 94,\n",
+ " 91,\n",
+ " 92,\n",
+ " 93,\n",
+ " 87,\n",
+ " 87,\n",
+ " 82,\n",
+ " 92,\n",
+ " 84,\n",
+ " 92,\n",
+ " 90,\n",
+ " 90,\n",
+ " 93,\n",
+ " 91,\n",
+ " 94,\n",
+ " 93,\n",
+ " 93,\n",
+ " 90,\n",
+ " 90,\n",
+ " 88,\n",
+ " 87,\n",
+ " 83,\n",
+ " 85,\n",
+ " 85,\n",
+ " 92,\n",
+ " 90,\n",
+ " 85,\n",
+ " 93,\n",
+ " 87,\n",
+ " 86,\n",
+ " 89,\n",
+ " 88,\n",
+ " 85,\n",
+ " 92,\n",
+ " 90,\n",
+ " 85,\n",
+ " 94,\n",
+ " 90,\n",
+ " 86,\n",
+ " 91,\n",
+ " 89,\n",
+ " 88,\n",
+ " 88,\n",
+ " 84,\n",
+ " 82,\n",
+ " 85,\n",
+ " 91,\n",
+ " 87,\n",
+ " 88,\n",
+ " 89,\n",
+ " 92,\n",
+ " 89,\n",
+ " 85,\n",
+ " 91,\n",
+ " 88,\n",
+ " 87,\n",
+ " 89,\n",
+ " 88,\n",
+ " 88,\n",
+ " 87,\n",
+ " 95,\n",
+ " 86,\n",
+ " 96,\n",
+ " 88,\n",
+ " 86,\n",
+ " 90,\n",
+ " 90,\n",
+ " 84,\n",
+ " 85,\n",
+ " 89,\n",
+ " 88,\n",
+ " 81,\n",
+ " 88,\n",
+ " 89,\n",
+ " 88,\n",
+ " 88,\n",
+ " 96,\n",
+ " 88,\n",
+ " 91,\n",
+ " 86,\n",
+ " 87,\n",
+ " 92,\n",
+ " 91,\n",
+ " 82,\n",
+ " 90,\n",
+ " 94,\n",
+ " 92,\n",
+ " 90,\n",
+ " 84,\n",
+ " 87,\n",
+ " 89,\n",
+ " 90,\n",
+ " 88,\n",
+ " 87,\n",
+ " 86,\n",
+ " 90,\n",
+ " 92,\n",
+ " 88,\n",
+ " 93,\n",
+ " 89,\n",
+ " 84,\n",
+ " 87,\n",
+ " 87,\n",
+ " 91,\n",
+ " 88,\n",
+ " 84,\n",
+ " 89,\n",
+ " 93,\n",
+ " 88,\n",
+ " 92,\n",
+ " 88,\n",
+ " 86,\n",
+ " 92,\n",
+ " 95,\n",
+ " 92,\n",
+ " 86,\n",
+ " 92,\n",
+ " 87,\n",
+ " 85,\n",
+ " 86,\n",
+ " 84,\n",
+ " 92,\n",
+ " 88,\n",
+ " 90,\n",
+ " 89,\n",
+ " 86,\n",
+ " 93,\n",
+ " 86,\n",
+ " 89,\n",
+ " 90,\n",
+ " 88,\n",
+ " 90,\n",
+ " 90,\n",
+ " 90,\n",
+ " 87,\n",
+ " 87,\n",
+ " 87,\n",
+ " 89,\n",
+ " 87,\n",
+ " 87,\n",
+ " 90,\n",
+ " 84,\n",
+ " 92,\n",
+ " 92,\n",
+ " 85,\n",
+ " 85,\n",
+ " 89,\n",
+ " 90,\n",
+ " 87,\n",
+ " 84,\n",
+ " 93,\n",
+ " 90,\n",
+ " 89,\n",
+ " 86,\n",
+ " 87,\n",
+ " 90,\n",
+ " 86,\n",
+ " 88,\n",
+ " 90,\n",
+ " 89,\n",
+ " 94,\n",
+ " 86,\n",
+ " 84,\n",
+ " 90,\n",
+ " 85,\n",
+ " 90,\n",
+ " 88,\n",
+ " 92,\n",
+ " 90,\n",
+ " 87,\n",
+ " 94,\n",
+ " 88,\n",
+ " 89,\n",
+ " 90,\n",
+ " 91,\n",
+ " 88,\n",
+ " 85,\n",
+ " 88,\n",
+ " 89,\n",
+ " 85,\n",
+ " 95,\n",
+ " 90,\n",
+ " 91,\n",
+ " 88,\n",
+ " 92,\n",
+ " 90,\n",
+ " 92,\n",
+ " 91,\n",
+ " 88,\n",
+ " 90,\n",
+ " 86,\n",
+ " 88,\n",
+ " 90,\n",
+ " 87,\n",
+ " 86,\n",
+ " 84,\n",
+ " 85,\n",
+ " 85,\n",
+ " 85,\n",
+ " 87,\n",
+ " 86,\n",
+ " 87,\n",
+ " 85,\n",
+ " 94,\n",
+ " 90,\n",
+ " 84,\n",
+ " 93,\n",
+ " 88,\n",
+ " 85,\n",
+ " 88,\n",
+ " 84,\n",
+ " 85,\n",
+ " 88,\n",
+ " 89,\n",
+ " 88,\n",
+ " 89,\n",
+ " 94,\n",
+ " 90,\n",
+ " 94,\n",
+ " 86,\n",
+ " 90,\n",
+ " 87,\n",
+ " 86,\n",
+ " 91,\n",
+ " 92,\n",
+ " 85,\n",
+ " 90,\n",
+ " 85,\n",
+ " 93,\n",
+ " 91,\n",
+ " 94,\n",
+ " 89,\n",
+ " 85,\n",
+ " 88,\n",
+ " 95,\n",
+ " 88,\n",
+ " 92,\n",
+ " 90,\n",
+ " 90,\n",
+ " 90,\n",
+ " 93,\n",
+ " 90,\n",
+ " 93,\n",
+ " 90,\n",
+ " 91,\n",
+ " 86,\n",
+ " 84,\n",
+ " 86,\n",
+ " 91,\n",
+ " 84,\n",
+ " 86,\n",
+ " 88,\n",
+ " 89,\n",
+ " 88,\n",
+ " 87,\n",
+ " 85,\n",
+ " 94,\n",
+ " 90,\n",
+ " 88,\n",
+ " 85,\n",
+ " 90,\n",
+ " 86,\n",
+ " 88,\n",
+ " 84,\n",
+ " 89,\n",
+ " 87,\n",
+ " 91,\n",
+ " 83,\n",
+ " 90,\n",
+ " 89,\n",
+ " 88,\n",
+ " 85,\n",
+ " 87,\n",
+ " 84,\n",
+ " 89,\n",
+ " 86,\n",
+ " 88,\n",
+ " 91,\n",
+ " 85,\n",
+ " 88,\n",
+ " 90,\n",
+ " 92,\n",
+ " 85,\n",
+ " 89,\n",
+ " 85,\n",
+ " 95,\n",
+ " 90,\n",
+ " 86,\n",
+ " 95,\n",
+ " 90,\n",
+ " 88,\n",
+ " 90,\n",
+ " 95,\n",
+ " 87,\n",
+ " 88,\n",
+ " 84,\n",
+ " 85,\n",
+ " 87,\n",
+ " 84,\n",
+ " 85,\n",
+ " 91,\n",
+ " 90,\n",
+ " 85,\n",
+ " 87,\n",
+ " 88,\n",
+ " 89,\n",
+ " 87,\n",
+ " 94,\n",
+ " 90,\n",
+ " 86,\n",
+ " 86,\n",
+ " 91,\n",
+ " 90,\n",
+ " 89,\n",
+ " 90,\n",
+ " 88,\n",
+ " 88,\n",
+ " 85,\n",
+ " 90,\n",
+ " 93,\n",
+ " 91,\n",
+ " 84,\n",
+ " 85,\n",
+ " 92,\n",
+ " 95,\n",
+ " 89,\n",
+ " 86,\n",
+ " 85,\n",
+ " 90,\n",
+ " 87,\n",
+ " 93,\n",
+ " 86,\n",
+ " 84,\n",
+ " 88,\n",
+ " 85,\n",
+ " 92,\n",
+ " 89,\n",
+ " 95,\n",
+ " 88,\n",
+ " 89,\n",
+ " 91,\n",
+ " 89,\n",
+ " 89,\n",
+ " 86,\n",
+ " 85,\n",
+ " 86,\n",
+ " 87,\n",
+ " 91,\n",
+ " 88,\n",
+ " 91,\n",
+ " 96,\n",
+ " 85,\n",
+ " 93,\n",
+ " 87,\n",
+ " 90,\n",
+ " 91,\n",
+ " 85,\n",
+ " 89,\n",
+ " 90,\n",
+ " 90,\n",
+ " 89,\n",
+ " 86,\n",
+ " 86,\n",
+ " 91,\n",
+ " 84,\n",
+ " 82,\n",
+ " 90,\n",
+ " 88,\n",
+ " 88,\n",
+ " 95,\n",
+ " 92,\n",
+ " 88,\n",
+ " 90,\n",
+ " 94,\n",
+ " 88,\n",
+ " 93,\n",
+ " 84,\n",
+ " 90,\n",
+ " 90,\n",
+ " 91,\n",
+ " 88,\n",
+ " 91,\n",
+ " 93,\n",
+ " 90,\n",
+ " 88,\n",
+ " 85,\n",
+ " 84,\n",
+ " 92,\n",
+ " 87,\n",
+ " 87,\n",
+ " 85,\n",
+ " 90,\n",
+ " 86,\n",
+ " 91,\n",
+ " 88,\n",
+ " 83,\n",
+ " 85,\n",
+ " 92,\n",
+ " 88,\n",
+ " 85,\n",
+ " 88,\n",
+ " 92,\n",
+ " 87,\n",
+ " 95,\n",
+ " 90,\n",
+ " 90,\n",
+ " 84,\n",
+ " 89,\n",
+ " 86,\n",
+ " 85,\n",
+ " 97,\n",
+ " 92,\n",
+ " 89,\n",
+ " 92,\n",
+ " 89,\n",
+ " 89,\n",
+ " 90,\n",
+ " 89,\n",
+ " 91,\n",
+ " 92,\n",
+ " 90,\n",
+ " 86,\n",
+ " 88,\n",
+ " 88,\n",
+ " 88,\n",
+ " 87,\n",
+ " 89,\n",
+ " 89,\n",
+ " 88,\n",
+ " 89,\n",
+ " 90,\n",
+ " 86,\n",
+ " 90,\n",
+ " 91,\n",
+ " 92,\n",
+ " 94,\n",
+ " 87,\n",
+ " 90,\n",
+ " 88,\n",
+ " 88,\n",
+ " 90,\n",
+ " 90,\n",
+ " 82,\n",
+ " 90,\n",
+ " 84,\n",
+ " 93,\n",
+ " 92,\n",
+ " 91,\n",
+ " 87,\n",
+ " 88,\n",
+ " 90,\n",
+ " 90,\n",
+ " 86,\n",
+ " 91,\n",
+ " 89,\n",
+ " 84,\n",
+ " 83,\n",
+ " 89,\n",
+ " 85,\n",
+ " 90,\n",
+ " 90,\n",
+ " 90,\n",
+ " 90,\n",
+ " 84,\n",
+ " 90,\n",
+ " 90,\n",
+ " 84,\n",
+ " 87,\n",
+ " 92,\n",
+ " 87,\n",
+ " 83,\n",
+ " 89,\n",
+ " 87,\n",
+ " 85,\n",
+ " 85,\n",
+ " 90,\n",
+ " 91,\n",
+ " 94,\n",
+ " 91,\n",
+ " 87,\n",
+ " 85,\n",
+ " 85,\n",
+ " 93,\n",
+ " 88,\n",
+ " 85,\n",
+ " 85,\n",
+ " 88,\n",
+ " 88,\n",
+ " 87,\n",
+ " 92,\n",
+ " 90,\n",
+ " 90,\n",
+ " 89,\n",
+ " 88,\n",
+ " 90,\n",
+ " 90,\n",
+ " 84,\n",
+ " 89,\n",
+ " 85,\n",
+ " 90,\n",
+ " 89,\n",
+ " 94,\n",
+ " 85,\n",
+ " 88,\n",
+ " 88,\n",
+ " 90,\n",
+ " 93,\n",
+ " 94,\n",
+ " 88,\n",
+ " 88,\n",
+ " 92,\n",
+ " 91,\n",
+ " 92,\n",
+ " 91,\n",
+ " 87,\n",
+ " 91,\n",
+ " 95,\n",
+ " 88,\n",
+ " 83,\n",
+ " 84,\n",
+ " 91,\n",
+ " 84,\n",
+ " 90,\n",
+ " 87,\n",
+ " 91,\n",
+ " 88,\n",
+ " 85,\n",
+ " 86,\n",
+ " 92,\n",
+ " 87,\n",
+ " 88,\n",
+ " 90,\n",
+ " 86,\n",
+ " 87,\n",
+ " 87,\n",
+ " 88,\n",
+ " 91,\n",
+ " 88,\n",
+ " 89,\n",
+ " 88,\n",
+ " 87,\n",
+ " 91,\n",
+ " 84,\n",
+ " 92,\n",
+ " 86,\n",
+ " 91,\n",
+ " 92,\n",
+ " 93,\n",
+ " 87,\n",
+ " 85,\n",
+ " 86,\n",
+ " 87,\n",
+ " 90,\n",
+ " 86,\n",
+ " 94,\n",
+ " 92,\n",
+ " 84,\n",
+ " 85,\n",
+ " 91,\n",
+ " 89,\n",
+ " 89,\n",
+ " 84,\n",
+ " 90,\n",
+ " 90,\n",
+ " 85,\n",
+ " 90,\n",
+ " 88,\n",
+ " 89,\n",
+ " 94,\n",
+ " 87,\n",
+ " 89,\n",
+ " 90,\n",
+ " 87,\n",
+ " 88,\n",
+ " 89,\n",
+ " 92,\n",
+ " 89,\n",
+ " 91,\n",
+ " 91,\n",
+ " 85,\n",
+ " 85,\n",
+ " 94,\n",
+ " 89,\n",
+ " 86,\n",
+ " 87,\n",
+ " 82,\n",
+ " 89,\n",
+ " 85,\n",
+ " 90,\n",
+ " 88,\n",
+ " 90,\n",
+ " 93,\n",
+ " 87,\n",
+ " 90,\n",
+ " 93,\n",
+ " 87,\n",
+ " 86,\n",
+ " 84,\n",
+ " 91,\n",
+ " 88,\n",
+ " 89,\n",
+ " 87,\n",
+ " 86,\n",
+ " 86,\n",
+ " 85,\n",
+ " 85,\n",
+ " 87,\n",
+ " 88,\n",
+ " 90,\n",
+ " 92,\n",
+ " 88,\n",
+ " 92,\n",
+ " 92,\n",
+ " 84,\n",
+ " 93,\n",
+ " 90,\n",
+ " 85,\n",
+ " 87,\n",
+ " 85,\n",
+ " 84,\n",
+ " 92,\n",
+ " 87,\n",
+ " 87,\n",
+ " 89,\n",
+ " 84,\n",
+ " 88,\n",
+ " 84,\n",
+ " 87,\n",
+ " 87,\n",
+ " 87,\n",
+ " 90,\n",
+ " 87,\n",
+ " 88,\n",
+ " 85,\n",
+ " 86,\n",
+ " 90,\n",
+ " 92,\n",
+ " 87,\n",
+ " 90,\n",
+ " 88,\n",
+ " 90,\n",
+ " 89,\n",
+ " 92,\n",
+ " 85,\n",
+ " 88,\n",
+ " 87,\n",
+ " 88,\n",
+ " 88,\n",
+ " 89,\n",
+ " 94,\n",
+ " 89,\n",
+ " 92,\n",
+ " 85,\n",
+ " 87,\n",
+ " 94,\n",
+ " 92,\n",
+ " 85,\n",
+ " 90,\n",
+ " 89,\n",
+ " 90,\n",
+ " 90,\n",
+ " 87,\n",
+ " 92,\n",
+ " 89,\n",
+ " 90,\n",
+ " 88,\n",
+ " 88,\n",
+ " 87,\n",
+ " 93,\n",
+ " 90,\n",
+ " 87,\n",
+ " 86,\n",
+ " 90,\n",
+ " 93,\n",
+ " 90,\n",
+ " 94,\n",
+ " 84,\n",
+ " 90,\n",
+ " 92,\n",
+ " 88,\n",
+ " 84,\n",
+ " 90,\n",
+ " 85,\n",
+ " 85,\n",
+ " 92,\n",
+ " 91,\n",
+ " 90,\n",
+ " 91,\n",
+ " 91,\n",
+ " 90,\n",
+ " 93,\n",
+ " 86,\n",
+ " 88,\n",
+ " 94,\n",
+ " 90,\n",
+ " 84,\n",
+ " 86,\n",
+ " 88,\n",
+ " 88,\n",
+ " 92,\n",
+ " 93,\n",
+ " 84,\n",
+ " 86,\n",
+ " 89,\n",
+ " 87,\n",
+ " 92,\n",
+ " 90,\n",
+ " 95,\n",
+ " 92,\n",
+ " 92,\n",
+ " 90,\n",
+ " 87,\n",
+ " 93,\n",
+ " 90,\n",
+ " 84,\n",
+ " 86,\n",
+ " 85,\n",
+ " 85,\n",
+ " 87,\n",
+ " 87,\n",
+ " 89,\n",
+ " 93,\n",
+ " 90,\n",
+ " 92,\n",
+ " 92,\n",
+ " 90,\n",
+ " 85,\n",
+ " 87,\n",
+ " 88,\n",
+ " 85,\n",
+ " 89,\n",
+ " 96,\n",
+ " 91,\n",
+ " 88,\n",
+ " 85,\n",
+ " 87,\n",
+ " 86,\n",
+ " 90,\n",
+ " 89,\n",
+ " 99,\n",
+ " 93,\n",
+ " 93,\n",
+ " 87,\n",
+ " 86,\n",
+ " 94,\n",
+ " 91,\n",
+ " 90,\n",
+ " 85,\n",
+ " 90,\n",
+ " 92,\n",
+ " 88,\n",
+ " 90,\n",
+ " 85,\n",
+ " 88,\n",
+ " 91,\n",
+ " 85,\n",
+ " 91,\n",
+ " 90,\n",
+ " 91,\n",
+ " 90,\n",
+ " 89,\n",
+ " 85,\n",
+ " 83,\n",
+ " 91,\n",
+ " 90,\n",
+ " 90,\n",
+ " 93,\n",
+ " 86,\n",
+ " 84,\n",
+ " 87,\n",
+ " 93,\n",
+ " 90,\n",
+ " 92,\n",
+ " 84,\n",
+ " 90,\n",
+ " 88,\n",
+ " 90,\n",
+ " 94,\n",
+ " 94,\n",
+ " 90,\n",
+ " 88,\n",
+ " 87,\n",
+ " 87,\n",
+ " 85,\n",
+ " 86,\n",
+ " 86,\n",
+ " 92,\n",
+ " 85,\n",
+ " 89,\n",
+ " 86,\n",
+ " 87,\n",
+ " 88,\n",
+ " 85,\n",
+ " 89,\n",
+ " 91,\n",
+ " 90,\n",
+ " 84,\n",
+ " 92,\n",
+ " 88,\n",
+ " 92,\n",
+ " 85,\n",
+ " 91,\n",
+ " 84,\n",
+ " 90,\n",
+ " 93,\n",
+ " 92,\n",
+ " 85,\n",
+ " 85,\n",
+ " 88,\n",
+ " 85,\n",
+ " 90,\n",
+ " 91,\n",
+ " 83,\n",
+ " 95,\n",
+ " 87,\n",
+ " 85,\n",
+ " 94,\n",
+ " 91,\n",
+ " 94,\n",
+ " 86,\n",
+ " 85,\n",
+ " 94,\n",
+ " 90,\n",
+ " 89,\n",
+ " 84,\n",
+ " 88,\n",
+ " 89,\n",
+ " 89,\n",
+ " 88,\n",
+ " 90,\n",
+ " 87,\n",
+ " 88,\n",
+ " 87,\n",
+ " 95,\n",
+ " 92,\n",
+ " 87,\n",
+ " 90,\n",
+ " 90,\n",
+ " 92,\n",
+ " 84,\n",
+ " 84,\n",
+ " 83,\n",
+ " 91,\n",
+ " 87,\n",
+ " 92,\n",
+ " 90,\n",
+ " 89,\n",
+ " 88,\n",
+ " 90,\n",
+ " 90,\n",
+ " 86,\n",
+ " 86,\n",
+ " 88,\n",
+ " 85,\n",
+ " 85,\n",
+ " 87,\n",
+ " 85,\n",
+ " 87,\n",
+ " 98,\n",
+ " 90,\n",
+ " 87,\n",
+ " 88,\n",
+ " 82,\n",
+ " 90,\n",
+ " 85,\n",
+ " 90,\n",
+ " 84,\n",
+ " 88,\n",
+ " 88,\n",
+ " 87,\n",
+ " 88,\n",
+ " 88,\n",
+ " 90,\n",
+ " 90,\n",
+ " 89,\n",
+ " 88,\n",
+ " 87,\n",
+ " 84,\n",
+ " 87,\n",
+ " 85,\n",
+ " 92,\n",
+ " 88,\n",
+ " 95,\n",
+ " 88,\n",
+ " 85,\n",
+ " 89,\n",
+ " 87,\n",
+ " 91,\n",
+ " 90,\n",
+ " 88,\n",
+ " 89,\n",
+ " 87,\n",
+ " 90,\n",
+ " 93,\n",
+ " 90,\n",
+ " 89,\n",
+ " 94,\n",
+ " 86,\n",
+ " 87,\n",
+ " 89,\n",
+ " 92,\n",
+ " 90,\n",
+ " 87,\n",
+ " 89,\n",
+ " 84,\n",
+ " 92,\n",
+ " 95,\n",
+ " 93,\n",
+ " 85,\n",
+ " 90,\n",
+ " 83,\n",
+ " ...]"
+ ]
+ },
+ "execution_count": 72,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "a"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 74,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from keras.layers import SimpleRNN,LSTM"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 75,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model = Sequential()\n",
+ "model.add(Embedding(vocabulary_size, 100, input_length=input_length))\n",
+ "model.add(SimpleRNN(units=30, return_sequences=False))\n",
+ "model.add(Dense(units=21, activation='softmax'))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 76,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model.compile(optimizer='adam',\n",
+ " loss='categorical_crossentropy',\n",
+ " metrics=['accuracy'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 77,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Model: \"sequential_1\"\n",
+ "_________________________________________________________________\n",
+ " Layer (type) Output Shape Param # \n",
+ "=================================================================\n",
+ " embedding_1 (Embedding) (None, 89, 100) 6397100 \n",
+ " \n",
+ " simple_rnn (SimpleRNN) (None, 30) 3930 \n",
+ " \n",
+ " dense_3 (Dense) (None, 21) 651 \n",
+ " \n",
+ "=================================================================\n",
+ "Total params: 6401681 (24.42 MB)\n",
+ "Trainable params: 6401681 (24.42 MB)\n",
+ "Non-trainable params: 0 (0.00 Byte)\n",
+ "_________________________________________________________________\n"
+ ]
+ }
+ ],
+ "source": [
+ "model.summary()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 81,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Epoch 1/10\n",
+ "2844/2844 [==============================] - 352s 124ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n",
+ "Epoch 2/10\n",
+ "2844/2844 [==============================] - 413s 145ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n",
+ "Epoch 3/10\n",
+ "2844/2844 [==============================] - 342s 120ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n",
+ "Epoch 4/10\n",
+ "2844/2844 [==============================] - 344s 121ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n",
+ "Epoch 5/10\n",
+ "2844/2844 [==============================] - 342s 120ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n",
+ "Epoch 6/10\n",
+ "2844/2844 [==============================] - 353s 124ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n",
+ "Epoch 7/10\n",
+ "2844/2844 [==============================] - 378s 133ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n",
+ "Epoch 8/10\n",
+ "2844/2844 [==============================] - 397s 140ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n",
+ "Epoch 9/10\n",
+ "2844/2844 [==============================] - 347s 122ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n",
+ "Epoch 10/10\n",
+ "2844/2844 [==============================] - 357s 126ms/step - loss: 8.0596 - accuracy: 0.0731 - val_loss: 8.0594 - val_accuracy: 0.0759\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 81,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "model.fit(x_train1,y_train1,epochs=10,validation_data=(x_test, y_test))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 82,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1219/1219 [==============================] - 1s 889us/step\n"
+ ]
+ }
+ ],
+ "source": [
+ "Y_pred1 = model.predict(x_test)\n",
+ "a1=[]\n",
+ "for x in Y_pred1:\n",
+ " a1.append(80 +np.argmax(x))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 83,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[85,\n",
+ " 85,\n",
+ " 90,\n",
+ " 88,\n",
+ " 87,\n",
+ " 86,\n",
+ " 84,\n",
+ " 91,\n",
+ " 91,\n",
+ " 89,\n",
+ " 92,\n",
+ " 92,\n",
+ " 90,\n",
+ " 87,\n",
+ " 86,\n",
+ " 89,\n",
+ " 87,\n",
+ " 99,\n",
+ " 95,\n",
+ " 86,\n",
+ " 90,\n",
+ " 85,\n",
+ " 94,\n",
+ " 93,\n",
+ " 88,\n",
+ " 85,\n",
+ " 88,\n",
+ " 89,\n",
+ " 87,\n",
+ " 90,\n",
+ " 86,\n",
+ " 89,\n",
+ " 85,\n",
+ " 84,\n",
+ " 90,\n",
+ " 85,\n",
+ " 90,\n",
+ " 89,\n",
+ " 85,\n",
+ " 89,\n",
+ " 87,\n",
+ " 86,\n",
+ " 86,\n",
+ " 89,\n",
+ " 84,\n",
+ " 92,\n",
+ " 90,\n",
+ " 92,\n",
+ " 92,\n",
+ " 87,\n",
+ " 85,\n",
+ " 89,\n",
+ " 84,\n",
+ " 90,\n",
+ " 90,\n",
+ " 88,\n",
+ " 86,\n",
+ " 91,\n",
+ " 88,\n",
+ " 93,\n",
+ " 89,\n",
+ " 88,\n",
+ " 85,\n",
+ " 88,\n",
+ " 87,\n",
+ " 90,\n",
+ " 83,\n",
+ " 88,\n",
+ " 91,\n",
+ " 84,\n",
+ " 94,\n",
+ " 87,\n",
+ " 91,\n",
+ " 89,\n",
+ " 88,\n",
+ " 92,\n",
+ " 86,\n",
+ " 92,\n",
+ " 84,\n",
+ " 93,\n",
+ " 89,\n",
+ " 87,\n",
+ " 84,\n",
+ " 89,\n",
+ " 93,\n",
+ " 89,\n",
+ " 88,\n",
+ " 88,\n",
+ " 93,\n",
+ " 94,\n",
+ " 91,\n",
+ " 92,\n",
+ " 93,\n",
+ " 87,\n",
+ " 87,\n",
+ " 82,\n",
+ " 92,\n",
+ " 84,\n",
+ " 92,\n",
+ " 90,\n",
+ " 90,\n",
+ " 93,\n",
+ " 91,\n",
+ " 94,\n",
+ " 93,\n",
+ " 93,\n",
+ " 90,\n",
+ " 90,\n",
+ " 88,\n",
+ " 87,\n",
+ " 83,\n",
+ " 85,\n",
+ " 85,\n",
+ " 92,\n",
+ " 90,\n",
+ " 85,\n",
+ " 93,\n",
+ " 87,\n",
+ " 86,\n",
+ " 89,\n",
+ " 88,\n",
+ " 85,\n",
+ " 92,\n",
+ " 90,\n",
+ " 85,\n",
+ " 94,\n",
+ " 90,\n",
+ " 86,\n",
+ " 91,\n",
+ " 89,\n",
+ " 88,\n",
+ " 88,\n",
+ " 84,\n",
+ " 82,\n",
+ " 85,\n",
+ " 91,\n",
+ " 87,\n",
+ " 88,\n",
+ " 89,\n",
+ " 92,\n",
+ " 89,\n",
+ " 85,\n",
+ " 91,\n",
+ " 88,\n",
+ " 87,\n",
+ " 89,\n",
+ " 88,\n",
+ " 88,\n",
+ " 87,\n",
+ " 95,\n",
+ " 86,\n",
+ " 96,\n",
+ " 88,\n",
+ " 86,\n",
+ " 90,\n",
+ " 90,\n",
+ " 84,\n",
+ " 85,\n",
+ " 89,\n",
+ " 88,\n",
+ " 81,\n",
+ " 88,\n",
+ " 89,\n",
+ " 88,\n",
+ " 88,\n",
+ " 96,\n",
+ " 88,\n",
+ " 91,\n",
+ " 86,\n",
+ " 87,\n",
+ " 92,\n",
+ " 91,\n",
+ " 82,\n",
+ " 90,\n",
+ " 94,\n",
+ " 92,\n",
+ " 90,\n",
+ " 84,\n",
+ " 87,\n",
+ " 89,\n",
+ " 90,\n",
+ " 88,\n",
+ " 87,\n",
+ " 86,\n",
+ " 90,\n",
+ " 92,\n",
+ " 88,\n",
+ " 93,\n",
+ " 89,\n",
+ " 84,\n",
+ " 87,\n",
+ " 87,\n",
+ " 91,\n",
+ " 88,\n",
+ " 84,\n",
+ " 89,\n",
+ " 93,\n",
+ " 88,\n",
+ " 92,\n",
+ " 88,\n",
+ " 86,\n",
+ " 92,\n",
+ " 95,\n",
+ " 92,\n",
+ " 86,\n",
+ " 92,\n",
+ " 87,\n",
+ " 85,\n",
+ " 86,\n",
+ " 84,\n",
+ " 92,\n",
+ " 88,\n",
+ " 90,\n",
+ " 89,\n",
+ " 86,\n",
+ " 93,\n",
+ " 86,\n",
+ " 89,\n",
+ " 90,\n",
+ " 88,\n",
+ " 90,\n",
+ " 90,\n",
+ " 90,\n",
+ " 87,\n",
+ " 87,\n",
+ " 87,\n",
+ " 89,\n",
+ " 87,\n",
+ " 87,\n",
+ " 90,\n",
+ " 84,\n",
+ " 92,\n",
+ " 92,\n",
+ " 85,\n",
+ " 85,\n",
+ " 89,\n",
+ " 90,\n",
+ " 87,\n",
+ " 84,\n",
+ " 93,\n",
+ " 90,\n",
+ " 89,\n",
+ " 86,\n",
+ " 87,\n",
+ " 90,\n",
+ " 86,\n",
+ " 88,\n",
+ " 90,\n",
+ " 89,\n",
+ " 94,\n",
+ " 86,\n",
+ " 84,\n",
+ " 90,\n",
+ " 85,\n",
+ " 90,\n",
+ " 88,\n",
+ " 92,\n",
+ " 90,\n",
+ " 87,\n",
+ " 94,\n",
+ " 88,\n",
+ " 89,\n",
+ " 90,\n",
+ " 91,\n",
+ " 88,\n",
+ " 85,\n",
+ " 88,\n",
+ " 89,\n",
+ " 85,\n",
+ " 95,\n",
+ " 90,\n",
+ " 91,\n",
+ " 88,\n",
+ " 92,\n",
+ " 90,\n",
+ " 92,\n",
+ " 91,\n",
+ " 88,\n",
+ " 90,\n",
+ " 86,\n",
+ " 88,\n",
+ " 90,\n",
+ " 87,\n",
+ " 86,\n",
+ " 84,\n",
+ " 85,\n",
+ " 85,\n",
+ " 85,\n",
+ " 87,\n",
+ " 86,\n",
+ " 87,\n",
+ " 85,\n",
+ " 94,\n",
+ " 90,\n",
+ " 84,\n",
+ " 93,\n",
+ " 88,\n",
+ " 85,\n",
+ " 88,\n",
+ " 84,\n",
+ " 85,\n",
+ " 88,\n",
+ " 89,\n",
+ " 88,\n",
+ " 89,\n",
+ " 94,\n",
+ " 90,\n",
+ " 94,\n",
+ " 86,\n",
+ " 90,\n",
+ " 87,\n",
+ " 86,\n",
+ " 91,\n",
+ " 92,\n",
+ " 85,\n",
+ " 90,\n",
+ " 85,\n",
+ " 93,\n",
+ " 91,\n",
+ " 94,\n",
+ " 89,\n",
+ " 85,\n",
+ " 88,\n",
+ " 95,\n",
+ " 88,\n",
+ " 92,\n",
+ " 90,\n",
+ " 90,\n",
+ " 90,\n",
+ " 93,\n",
+ " 90,\n",
+ " 93,\n",
+ " 90,\n",
+ " 91,\n",
+ " 86,\n",
+ " 84,\n",
+ " 86,\n",
+ " 91,\n",
+ " 84,\n",
+ " 86,\n",
+ " 88,\n",
+ " 89,\n",
+ " 88,\n",
+ " 87,\n",
+ " 85,\n",
+ " 94,\n",
+ " 90,\n",
+ " 88,\n",
+ " 85,\n",
+ " 90,\n",
+ " 86,\n",
+ " 88,\n",
+ " 84,\n",
+ " 89,\n",
+ " 87,\n",
+ " 91,\n",
+ " 83,\n",
+ " 90,\n",
+ " 89,\n",
+ " 88,\n",
+ " 85,\n",
+ " 87,\n",
+ " 84,\n",
+ " 89,\n",
+ " 86,\n",
+ " 88,\n",
+ " 91,\n",
+ " 85,\n",
+ " 88,\n",
+ " 90,\n",
+ " 92,\n",
+ " 85,\n",
+ " 89,\n",
+ " 85,\n",
+ " 95,\n",
+ " 90,\n",
+ " 86,\n",
+ " 95,\n",
+ " 90,\n",
+ " 88,\n",
+ " 90,\n",
+ " 95,\n",
+ " 87,\n",
+ " 88,\n",
+ " 84,\n",
+ " 85,\n",
+ " 87,\n",
+ " 84,\n",
+ " 85,\n",
+ " 91,\n",
+ " 90,\n",
+ " 85,\n",
+ " 87,\n",
+ " 88,\n",
+ " 89,\n",
+ " 87,\n",
+ " 94,\n",
+ " 90,\n",
+ " 86,\n",
+ " 86,\n",
+ " 91,\n",
+ " 90,\n",
+ " 89,\n",
+ " 90,\n",
+ " 88,\n",
+ " 88,\n",
+ " 85,\n",
+ " 90,\n",
+ " 93,\n",
+ " 91,\n",
+ " 84,\n",
+ " 85,\n",
+ " 92,\n",
+ " 95,\n",
+ " 89,\n",
+ " 86,\n",
+ " 85,\n",
+ " 90,\n",
+ " 87,\n",
+ " 93,\n",
+ " 86,\n",
+ " 84,\n",
+ " 88,\n",
+ " 85,\n",
+ " 92,\n",
+ " 89,\n",
+ " 95,\n",
+ " 88,\n",
+ " 89,\n",
+ " 91,\n",
+ " 89,\n",
+ " 89,\n",
+ " 86,\n",
+ " 85,\n",
+ " 86,\n",
+ " 87,\n",
+ " 91,\n",
+ " 88,\n",
+ " 91,\n",
+ " 96,\n",
+ " 85,\n",
+ " 93,\n",
+ " 87,\n",
+ " 90,\n",
+ " 91,\n",
+ " 85,\n",
+ " 89,\n",
+ " 90,\n",
+ " 90,\n",
+ " 89,\n",
+ " 86,\n",
+ " 86,\n",
+ " 91,\n",
+ " 84,\n",
+ " 82,\n",
+ " 90,\n",
+ " 88,\n",
+ " 88,\n",
+ " 95,\n",
+ " 92,\n",
+ " 88,\n",
+ " 90,\n",
+ " 94,\n",
+ " 88,\n",
+ " 93,\n",
+ " 84,\n",
+ " 90,\n",
+ " 90,\n",
+ " 91,\n",
+ " 88,\n",
+ " 91,\n",
+ " 93,\n",
+ " 90,\n",
+ " 88,\n",
+ " 85,\n",
+ " 84,\n",
+ " 92,\n",
+ " 87,\n",
+ " 87,\n",
+ " 85,\n",
+ " 90,\n",
+ " 86,\n",
+ " 91,\n",
+ " 88,\n",
+ " 83,\n",
+ " 85,\n",
+ " 92,\n",
+ " 88,\n",
+ " 85,\n",
+ " 88,\n",
+ " 92,\n",
+ " 87,\n",
+ " 95,\n",
+ " 90,\n",
+ " 90,\n",
+ " 84,\n",
+ " 89,\n",
+ " 86,\n",
+ " 85,\n",
+ " 97,\n",
+ " 92,\n",
+ " 89,\n",
+ " 92,\n",
+ " 89,\n",
+ " 89,\n",
+ " 90,\n",
+ " 89,\n",
+ " 91,\n",
+ " 92,\n",
+ " 90,\n",
+ " 86,\n",
+ " 88,\n",
+ " 88,\n",
+ " 88,\n",
+ " 87,\n",
+ " 89,\n",
+ " 89,\n",
+ " 88,\n",
+ " 89,\n",
+ " 90,\n",
+ " 86,\n",
+ " 90,\n",
+ " 91,\n",
+ " 92,\n",
+ " 94,\n",
+ " 87,\n",
+ " 90,\n",
+ " 88,\n",
+ " 88,\n",
+ " 90,\n",
+ " 90,\n",
+ " 82,\n",
+ " 90,\n",
+ " 84,\n",
+ " 93,\n",
+ " 92,\n",
+ " 91,\n",
+ " 87,\n",
+ " 88,\n",
+ " 90,\n",
+ " 90,\n",
+ " 86,\n",
+ " 91,\n",
+ " 89,\n",
+ " 84,\n",
+ " 83,\n",
+ " 89,\n",
+ " 85,\n",
+ " 90,\n",
+ " 90,\n",
+ " 90,\n",
+ " 90,\n",
+ " 84,\n",
+ " 90,\n",
+ " 90,\n",
+ " 84,\n",
+ " 87,\n",
+ " 92,\n",
+ " 87,\n",
+ " 83,\n",
+ " 89,\n",
+ " 87,\n",
+ " 85,\n",
+ " 85,\n",
+ " 90,\n",
+ " 91,\n",
+ " 94,\n",
+ " 91,\n",
+ " 87,\n",
+ " 85,\n",
+ " 85,\n",
+ " 93,\n",
+ " 88,\n",
+ " 85,\n",
+ " 85,\n",
+ " 88,\n",
+ " 88,\n",
+ " 87,\n",
+ " 92,\n",
+ " 90,\n",
+ " 90,\n",
+ " 89,\n",
+ " 88,\n",
+ " 90,\n",
+ " 90,\n",
+ " 84,\n",
+ " 89,\n",
+ " 85,\n",
+ " 90,\n",
+ " 89,\n",
+ " 94,\n",
+ " 85,\n",
+ " 88,\n",
+ " 88,\n",
+ " 90,\n",
+ " 93,\n",
+ " 94,\n",
+ " 88,\n",
+ " 88,\n",
+ " 92,\n",
+ " 91,\n",
+ " 92,\n",
+ " 91,\n",
+ " 87,\n",
+ " 91,\n",
+ " 95,\n",
+ " 88,\n",
+ " 83,\n",
+ " 84,\n",
+ " 91,\n",
+ " 84,\n",
+ " 90,\n",
+ " 87,\n",
+ " 91,\n",
+ " 88,\n",
+ " 85,\n",
+ " 86,\n",
+ " 92,\n",
+ " 87,\n",
+ " 88,\n",
+ " 90,\n",
+ " 86,\n",
+ " 87,\n",
+ " 87,\n",
+ " 88,\n",
+ " 91,\n",
+ " 88,\n",
+ " 89,\n",
+ " 88,\n",
+ " 87,\n",
+ " 91,\n",
+ " 84,\n",
+ " 92,\n",
+ " 86,\n",
+ " 91,\n",
+ " 92,\n",
+ " 93,\n",
+ " 87,\n",
+ " 85,\n",
+ " 86,\n",
+ " 87,\n",
+ " 90,\n",
+ " 86,\n",
+ " 94,\n",
+ " 92,\n",
+ " 84,\n",
+ " 85,\n",
+ " 91,\n",
+ " 89,\n",
+ " 89,\n",
+ " 84,\n",
+ " 90,\n",
+ " 90,\n",
+ " 85,\n",
+ " 90,\n",
+ " 88,\n",
+ " 89,\n",
+ " 94,\n",
+ " 87,\n",
+ " 89,\n",
+ " 90,\n",
+ " 87,\n",
+ " 88,\n",
+ " 89,\n",
+ " 92,\n",
+ " 89,\n",
+ " 91,\n",
+ " 91,\n",
+ " 85,\n",
+ " 85,\n",
+ " 94,\n",
+ " 89,\n",
+ " 86,\n",
+ " 87,\n",
+ " 82,\n",
+ " 89,\n",
+ " 85,\n",
+ " 90,\n",
+ " 88,\n",
+ " 90,\n",
+ " 93,\n",
+ " 87,\n",
+ " 90,\n",
+ " 93,\n",
+ " 87,\n",
+ " 86,\n",
+ " 84,\n",
+ " 91,\n",
+ " 88,\n",
+ " 89,\n",
+ " 87,\n",
+ " 86,\n",
+ " 86,\n",
+ " 85,\n",
+ " 85,\n",
+ " 87,\n",
+ " 88,\n",
+ " 90,\n",
+ " 92,\n",
+ " 88,\n",
+ " 92,\n",
+ " 92,\n",
+ " 84,\n",
+ " 93,\n",
+ " 90,\n",
+ " 85,\n",
+ " 87,\n",
+ " 85,\n",
+ " 84,\n",
+ " 92,\n",
+ " 87,\n",
+ " 87,\n",
+ " 89,\n",
+ " 84,\n",
+ " 88,\n",
+ " 84,\n",
+ " 87,\n",
+ " 87,\n",
+ " 87,\n",
+ " 90,\n",
+ " 87,\n",
+ " 88,\n",
+ " 85,\n",
+ " 86,\n",
+ " 90,\n",
+ " 92,\n",
+ " 87,\n",
+ " 90,\n",
+ " 88,\n",
+ " 90,\n",
+ " 89,\n",
+ " 92,\n",
+ " 85,\n",
+ " 88,\n",
+ " 87,\n",
+ " 88,\n",
+ " 88,\n",
+ " 89,\n",
+ " 94,\n",
+ " 89,\n",
+ " 92,\n",
+ " 85,\n",
+ " 87,\n",
+ " 94,\n",
+ " 92,\n",
+ " 85,\n",
+ " 90,\n",
+ " 89,\n",
+ " 90,\n",
+ " 90,\n",
+ " 87,\n",
+ " 92,\n",
+ " 89,\n",
+ " 90,\n",
+ " 88,\n",
+ " 88,\n",
+ " 87,\n",
+ " 93,\n",
+ " 90,\n",
+ " 87,\n",
+ " 86,\n",
+ " 90,\n",
+ " 93,\n",
+ " 90,\n",
+ " 94,\n",
+ " 84,\n",
+ " 90,\n",
+ " 92,\n",
+ " 88,\n",
+ " 84,\n",
+ " 90,\n",
+ " 85,\n",
+ " 85,\n",
+ " 92,\n",
+ " 91,\n",
+ " 90,\n",
+ " 91,\n",
+ " 91,\n",
+ " 90,\n",
+ " 93,\n",
+ " 86,\n",
+ " 88,\n",
+ " 94,\n",
+ " 90,\n",
+ " 84,\n",
+ " 86,\n",
+ " 88,\n",
+ " 88,\n",
+ " 92,\n",
+ " 93,\n",
+ " 84,\n",
+ " 86,\n",
+ " 89,\n",
+ " 87,\n",
+ " 92,\n",
+ " 90,\n",
+ " 95,\n",
+ " 92,\n",
+ " 92,\n",
+ " 90,\n",
+ " 87,\n",
+ " 93,\n",
+ " 90,\n",
+ " 84,\n",
+ " 86,\n",
+ " 85,\n",
+ " 85,\n",
+ " 87,\n",
+ " 87,\n",
+ " 89,\n",
+ " 93,\n",
+ " 90,\n",
+ " 92,\n",
+ " 92,\n",
+ " 90,\n",
+ " 85,\n",
+ " 87,\n",
+ " 88,\n",
+ " 85,\n",
+ " 89,\n",
+ " 96,\n",
+ " 91,\n",
+ " 88,\n",
+ " 85,\n",
+ " 87,\n",
+ " 86,\n",
+ " 90,\n",
+ " 89,\n",
+ " 99,\n",
+ " 93,\n",
+ " 93,\n",
+ " 87,\n",
+ " 86,\n",
+ " 94,\n",
+ " 91,\n",
+ " 90,\n",
+ " 85,\n",
+ " 90,\n",
+ " 92,\n",
+ " 88,\n",
+ " 90,\n",
+ " 85,\n",
+ " 88,\n",
+ " 91,\n",
+ " 85,\n",
+ " 91,\n",
+ " 90,\n",
+ " 91,\n",
+ " 90,\n",
+ " 89,\n",
+ " 85,\n",
+ " 83,\n",
+ " 91,\n",
+ " 90,\n",
+ " 90,\n",
+ " 93,\n",
+ " 86,\n",
+ " 84,\n",
+ " 87,\n",
+ " 93,\n",
+ " 90,\n",
+ " 92,\n",
+ " 84,\n",
+ " 90,\n",
+ " 88,\n",
+ " 90,\n",
+ " 94,\n",
+ " 94,\n",
+ " 90,\n",
+ " 88,\n",
+ " 87,\n",
+ " 87,\n",
+ " 85,\n",
+ " 86,\n",
+ " 86,\n",
+ " 92,\n",
+ " 85,\n",
+ " 89,\n",
+ " 86,\n",
+ " 87,\n",
+ " 88,\n",
+ " 85,\n",
+ " 89,\n",
+ " 91,\n",
+ " 90,\n",
+ " 84,\n",
+ " 92,\n",
+ " 88,\n",
+ " 92,\n",
+ " 85,\n",
+ " 91,\n",
+ " 84,\n",
+ " 90,\n",
+ " 93,\n",
+ " 92,\n",
+ " 85,\n",
+ " 85,\n",
+ " 88,\n",
+ " 85,\n",
+ " 90,\n",
+ " 91,\n",
+ " 83,\n",
+ " 95,\n",
+ " 87,\n",
+ " 85,\n",
+ " 94,\n",
+ " 91,\n",
+ " 94,\n",
+ " 86,\n",
+ " 85,\n",
+ " 94,\n",
+ " 90,\n",
+ " 89,\n",
+ " 84,\n",
+ " 88,\n",
+ " 89,\n",
+ " 89,\n",
+ " 88,\n",
+ " 90,\n",
+ " 87,\n",
+ " 88,\n",
+ " 87,\n",
+ " 95,\n",
+ " 92,\n",
+ " 87,\n",
+ " 90,\n",
+ " 90,\n",
+ " 92,\n",
+ " 84,\n",
+ " 84,\n",
+ " 83,\n",
+ " 91,\n",
+ " 87,\n",
+ " 92,\n",
+ " 90,\n",
+ " 89,\n",
+ " 88,\n",
+ " 90,\n",
+ " 90,\n",
+ " 86,\n",
+ " 86,\n",
+ " 88,\n",
+ " 85,\n",
+ " 85,\n",
+ " 87,\n",
+ " 85,\n",
+ " 87,\n",
+ " 98,\n",
+ " 90,\n",
+ " 87,\n",
+ " 88,\n",
+ " 82,\n",
+ " 90,\n",
+ " 85,\n",
+ " 90,\n",
+ " 84,\n",
+ " 88,\n",
+ " 88,\n",
+ " 87,\n",
+ " 88,\n",
+ " 88,\n",
+ " 90,\n",
+ " 90,\n",
+ " 89,\n",
+ " 88,\n",
+ " 87,\n",
+ " 84,\n",
+ " 87,\n",
+ " 85,\n",
+ " 92,\n",
+ " 88,\n",
+ " 95,\n",
+ " 88,\n",
+ " 85,\n",
+ " 89,\n",
+ " 87,\n",
+ " 91,\n",
+ " 90,\n",
+ " 88,\n",
+ " 89,\n",
+ " 87,\n",
+ " 90,\n",
+ " 93,\n",
+ " 90,\n",
+ " 89,\n",
+ " 94,\n",
+ " 86,\n",
+ " 87,\n",
+ " 89,\n",
+ " 92,\n",
+ " 90,\n",
+ " 87,\n",
+ " 89,\n",
+ " 84,\n",
+ " 92,\n",
+ " 95,\n",
+ " 93,\n",
+ " 85,\n",
+ " 90,\n",
+ " 83,\n",
+ " ...]"
+ ]
+ },
+ "execution_count": 83,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "a1"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#The first model performed better."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 84,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Requirement already satisfied: tensorflow-hub in c:\\users\\ysach\\anaconda3\\lib\\site-packages (0.15.0)\n",
+ "Requirement already satisfied: numpy>=1.12.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-hub) (1.23.5)\n",
+ "Requirement already satisfied: protobuf>=3.19.6 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-hub) (3.20.3)\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install tensorflow-hub"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 100,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def wine_quality_det(review):\n",
+ "    # Keep letters and whitespace only, then lowercase\n",
+ "    review = re.sub(r'[^a-zA-Z\\s]', '', review)\n",
+ "    review = review.lower()\n",
+ "    tokens = nltk.word_tokenize(review)\n",
+ "    # Build a local list so the global corpus and earlier calls do not leak in\n",
+ "    corpus = [lemmatizer.lemmatize(y) for y in tokens if y not in stpwrds]\n",
+ "    input_data = [' '.join(corpus)]\n",
+ "    vectorized_input_data_pre = tokenize.texts_to_sequences(input_data)\n",
+ "    vectorized_input_data = pad_sequences(vectorized_input_data_pre, padding=\"pre\", truncating=\"pre\")\n",
+ "    prediction = classifier.predict(vectorized_input_data)\n",
+ "    # Class index 0 corresponds to 80 points\n",
+ "    print(80 + np.argmax(prediction))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 104,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1/1 [==============================] - 0s 23ms/step\n",
+ "89\n"
+ ]
+ }
+ ],
+ "source": [
+ "wine_quality_det(\"u touch riesling accentuates fresh citrusy backbone cabernet sauvignon ro dry style sprightly lightfooted tone offer load concentrated cherry berry flavor finish brisk clean dry new york osprey dominion dry ro north fork long island ro osprey dominion\")"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.9"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/Wine Reviews Classification/README.md b/Wine Reviews Classification/README.md
new file mode 100644
index 000000000..27be8d30b
--- /dev/null
+++ b/Wine Reviews Classification/README.md
@@ -0,0 +1,101 @@
+# Wine Reviews Classification using DL
+
+## PROJECT TITLE
+
+Wine Reviews Classification using Deep Learning
+
+## GOAL
+
+To classify the quality of wine based on reviews.
+
+## DATASET
+
+The link for the dataset used in this project: https://www.kaggle.com/datasets/zynicide/wine-reviews
+
+## EDA
+Shape of dataset: (129971, 14)
+
+
+
+## DESCRIPTION
+
+This project aims to predict the quality points of a wine from the text of its review. Points range from 80 to 100, so the task is treated as a 21-class classification problem.
+
+## WHAT I HAVE DONE
+
+1. Data collection: Downloaded the dataset from the link given above.
+2. Data preprocessing: Combined the title and text of each review into a new feature, then tokenized and vectorized it before passing it to model training.
+3. Model selection: The first model is a self-designed network with an Embedding layer, followed by a Global Average Pooling layer, two Dense layers, and an output layer. The second model has an Embedding layer followed by a SimpleRNN layer and a Dense output layer.
+4. Comparative analysis: Compared the accuracy scores of the two models.
+
+## MODELS SUMMARY
+
+Model-1: "sequential"
+
+| Layer (type) | Output Shape | Param # |
+| --- | --- | --- |
+| embedding (Embedding) | (None, 89, 200) | 12794200 |
+| global_average_pooling1d (GlobalAveragePooling1D) | (None, 200) | 0 |
+| dense (Dense) | (None, 100) | 20100 |
+| dense_1 (Dense) | (None, 50) | 5050 |
+| dense_2 (Dense) | (None, 21) | 1071 |
+
+- Total params: 12820421 (48.91 MB)
+- Trainable params: 12820421 (48.91 MB)
+- Non-trainable params: 0 (0.00 Byte)
+
+Model-2: "sequential_1"
+
+| Layer (type) | Output Shape | Param # |
+| --- | --- | --- |
+| embedding_1 (Embedding) | (None, 89, 100) | 6397100 |
+| simple_rnn (SimpleRNN) | (None, 30) | 3930 |
+| dense_3 (Dense) | (None, 21) | 651 |
+
+- Total params: 6401681 (24.42 MB)
+- Trainable params: 6401681 (24.42 MB)
+- Non-trainable params: 0 (0.00 Byte)
+
+## LIBRARIES NEEDED
+
+The following libraries are required to run this project:
+
+- nltk
+- pandas
+- matplotlib
+- tensorflow
+- keras
+- scikit-learn (imported as sklearn)
+- numpy
+
+## EVALUATION METRICS
+
+The evaluation metrics I used to assess the models:
+
+- Loss
+- Accuracy
+
+The results are also shown as a confusion matrix in the Images folder.
+
+## RESULTS
+Results on the validation dataset:
+
+For Model-1:
+- Accuracy: 31%
+- Loss: 3.1
+
+For Model-2:
+- Accuracy: about 8% (val_accuracy 0.0759 in the notebook log)
+- Loss: 8.06
+
+## CONCLUSION
+Based on the results, we can draw the following conclusion:
+
+1. Model-1 performed better than Model-2 on the validation dataset.
\ No newline at end of file
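
The parameter counts in the two README model summaries can be checked by hand. The sketch below is not part of the notebook. It assumes the usual Keras formulas for each layer type (Embedding: vocabulary size times dimension; Dense: inputs times units plus units; SimpleRNN: units times (inputs + units + 1)) and a vocabulary size of 63971, which is inferred from 12794200 / 200 rather than stated anywhere in the README. The helper names are hypothetical.

```python
# Sanity check of the "Param #" columns in the README model summaries.
# VOCAB_SIZE is inferred (12794200 / 200), not taken from the notebook.
VOCAB_SIZE = 63971


def embedding_params(vocab_size: int, dim: int) -> int:
    """One dim-sized vector per vocabulary entry."""
    return vocab_size * dim


def dense_params(inputs: int, units: int) -> int:
    """Weight matrix plus one bias per unit."""
    return inputs * units + units


def simple_rnn_params(inputs: int, units: int) -> int:
    """Input kernel, recurrent kernel, and one bias per unit."""
    return units * (inputs + units + 1)


# Model-1: Embedding(200) -> GlobalAveragePooling1D -> Dense(100) -> Dense(50) -> Dense(21)
model_1_layers = [
    embedding_params(VOCAB_SIZE, 200),
    0,  # pooling has no trainable weights
    dense_params(200, 100),
    dense_params(100, 50),
    dense_params(50, 21),
]

# Model-2: Embedding(100) -> SimpleRNN(30) -> Dense(21)
model_2_layers = [
    embedding_params(VOCAB_SIZE, 100),
    simple_rnn_params(100, 30),
    dense_params(30, 21),
]

model_1_total = sum(model_1_layers)
model_2_total = sum(model_2_layers)

print(model_1_layers, model_1_total)
print(model_2_layers, model_2_total)
```

Under these assumptions both totals agree with the summaries (12820421 and 6401681). In both models the embedding layer accounts for more than 99% of the parameters.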