From 913a53c621ed68acd0184b2fbe55fd195b1d0681 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=D0=9C=D0=B0=D0=B9=D1=8F=20=D0=A1=D0=BF=D0=B8=D1=80=D0=B8?=
 =?UTF-8?q?=D0=BD=D0=B0?=
Date: Mon, 20 May 2024 16:17:45 +0300
Subject: [PATCH] tutorial markdown fix

---
 .../baselines_extended_tutorial.ipynb         | 158 +++++++++++-------
 1 file changed, 97 insertions(+), 61 deletions(-)

diff --git a/examples/tutorials/baselines_extended_tutorial.ipynb b/examples/tutorials/baselines_extended_tutorial.ipynb
index 0dd504b1..57b2ee56 100644
--- a/examples/tutorials/baselines_extended_tutorial.ipynb
+++ b/examples/tutorials/baselines_extended_tutorial.ipynb
@@ -88,7 +88,7 @@
     "    inflating: __MACOSX/data_en/._interactions.csv  \n",
     "    inflating: data_en/users_en.csv  \n",
     "    inflating: __MACOSX/data_en/._users_en.csv  \n",
-    "CPU times: user 234 ms, sys: 96 ms, total: 330 ms\n",
+    "CPU times: user 286 ms, sys: 122 ms, total: 408 ms\n",
    "Wall time: 16.1 s\n"
   ]
  }
@@ -983,24 +983,29 @@
    "### Model description \n",
    "The goal of the model is to represent the interactions matrix as a product of user (X) and item (Y) embeddings. The implicit ALS model treats all non-zero entries in the matrix as value `1`. The actual weight of an interaction is treated as `confidence` in the observation. Zero entries receive low confidence since they are treated as missing values and might actually hide items highly relevant to users. Non-zero entries with high confidence have a greater impact on the loss when not predicted correctly. The overall loss function is the following:\n",
    "$$ \min_{x_*, y_*} \sum_{u, i} c_{ui}(p_{ui}-x_{u}^Ty_{i})^2 + \lambda (\sum_{u}\lVert x_{u} \rVert ^2 + \sum_{i}\lVert y_{i} \rVert ^2)$$\n",
-    "$c_{ui}$ - confidence in observing the item ($c_{ui} = 1 + \alpha r_{ui}$)<br>\n",
\n", + "$c_{ui}$ - confidence in observing the item (${c_{ui} = 1 + \\alpha r_{ui}}$)\n", + "\n", "* $r_{ui}$ - weight assigned to interaction of user u with item i\n", "* $\\alpha$ - rate of increase in confidence. Determines to what extent change in weight modifies confidence\n", " \n", - "$ p_{ui}$ - binary representation, whether user and item had interaction
\n", - "$ x_{u}$, $y_{i} $ - vectors we need to find for users and items
\n", - "$ \\lambda $ - regularization term to avoid overfitting
\n", + "$p_{ui}$ - binary representation, whether user and item had interaction \\\n", + "$x_{u}$, $y_{i}$ - vectors we need to find for users and items \\\n", + "$\\lambda$ - regularization term to avoid overfitting \n", + "\n", + "Since both X and Y matrices have to be calculated, an alternating least squares algorithm is used. The procedure is done by repeatadly performing following 2 steps:\n", "\n", - "Since both X and Y matrices have to be calculated, an alternating least squares algorithm is used. The procedure is done by repeatadly performing following 2 steps:
\n", "1. Fix X (user matrix) and find optimal Y (item matrix)\n", + "\n", "2. Fix Y (item matrix) and find optimal X (usermatrix)\n", "\n", "This algorithm simplifies calculations, as when one of the matrices is fixed, the cost function becomes quadratic and has an easily achievable minimum. After the algorithm has converged, values of X and Y are taken as embeddings.\n", "\n", "### Recommendations \n", - "Recommendations for all users are received from multiplication of $X^T$ and $Y$, after that top-k can be extracted from $ \\hat{p} = X^T Y$. As an example consider recommendation procedure for one user, when item and user embeddings are received:\n", - "1. Calculate predicted preferences $ \\hat{p}_{u} = x_{u}^Ty$ (first row of $\\hat{p}$ in the picture). It contains information about how likely first user is to be interested in all items\n", - "2. Take top K items with the greatest value of $ \\hat{p}_{ui} $. In the picture if K=1, first item should be recommended, as 8 > 3 and 8 > 7.4" + "Recommendations for all users are received from multiplication of $X^T$ and $Y$, after that top-k can be extracted from $\\hat{p} = X^T Y$. As an example consider recommendation procedure for one user, when item and user embeddings are received:\n", + "\n", + "1. Calculate predicted preferences $\\hat{p}_{u} = x_{u}^Ty$ (first row of $\\hat{p}$ in the picture). It contains information about how likely first user is to be interested in all items\n", + " \n", + "2. Take top K items with the greatest value of $\\hat{p}_{ui}$. In the picture if K=1, first item should be recommended, as 8 > 3 and 8 > 7.4" ] }, { @@ -1024,9 +1029,11 @@ "### RecTools implementation \n", "RecTools provides a wrapper for implicit library iALS implementation which is the most efficient and widely used: `implicit.als.AlternatingLeastSquares` model. Additionally, RecTools offers a modification, which allows to add explicit user and item features to the model. \n", "\n", - "For items it is done in 2 steps (for user explicit features procedure is similar):
\n", + "For items it is done in 2 steps (for user explicit features procedure is similar): \n", + "\n", "1. Explicit item features are added as additional columns to item embeddings\n", - "2. Same number of columns is added to user embeddings (paired to explicit item features)
\n", + "\n", + "2. Same number of columns is added to user embeddings (paired to explicit item features)\n", "\n", "If both user and item explicit features are used, each embedding matrix contains three logical parts: latent factors, explicit features and paired features. Explicit features will remain their original values after training. Paired features can either be fit together with latent features or separately, this is a hyper-parameter for the wrapper." ] @@ -1068,7 +1075,7 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "9fc5bddca6ab44a8bb80167e309f4e03", + "model_id": "b57f124d613f4989ac275979821ece1a", "version_major": 2, "version_minor": 0 }, @@ -1082,7 +1089,7 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "a0c14542f8104b6a8695a3ed1635b46f", + "model_id": "783fd99252154abaaaf2a17a0535a3cd", "version_major": 2, "version_minor": 0 }, @@ -1097,8 +1104,8 @@ "name": "stdout", "output_type": "stream", "text": [ - "CPU times: user 28.3 s, sys: 371 ms, total: 28.7 s\n", - "Wall time: 28.7 s\n" + "CPU times: user 27.3 s, sys: 326 ms, total: 27.6 s\n", + "Wall time: 27.6 s\n" ] } ], @@ -1276,9 +1283,12 @@ "\n", "\n", "To incorporate user metadata, user latent representation is derived:\n", + "\n", "1. Concat user identity matrix with user features (matrix A)\n", + " \n", "2. Take matrix of user factors with prespecified n_factors (matrix B)\n", - "3. Compute $ A \\cdot B $. Each row in this matrix is a latent representation for one user\n", + " \n", + "3. Compute $A\\cdot B$. Each row in this matrix is a latent representation for one user\n", "\n", "To find item latent representations similar procedure should be performed.\n", "\n", @@ -1305,20 +1315,25 @@ "source": [ "LightFM with **logistic** loss maximizes likelihood of receiving interactions matrix. Zero entries are considered as negatives, non-zero entries are considered positives.\n", "$$ L(e^U, e^I, b^U, b^I) = \\prod_{(u,i)\\in S^{+}} \\hat p_{ui} \\times \\prod_{(u,i)\\in S^{-}}(1 - \\hat p_{ui}) $$\n", - "$ \\hat p_{ui} = f(x_{u}\\cdot y_{i} + b_{u} + b_{i})$, where f is sigmoid function, as binary prediction is made
\n", - "$ x_{u} = \\sum_{j\\in f_{u}} e_{j}^{U}$, $ y_{i} = \\sum_{j\\in f_{i}} e_{j}^{I}$ - latent representations of user and item with $e_{j}^{U}$ and $e_{j}^{I}$ as feature embeddings
\n", - "$ b_{u} = \\sum_{j\\in f_{u}} b_{j}^{U}$, $ b_{i} = \\sum_{j\\in f_{i}} b_{j}^{I}$ - bias term for users and items. $b_{j}^{U}$ and $b_{j}^{I}$ are scalar biases
\n", - "$S^{+}$ and $S^{-}$ are observed and not observed interactions, respectively
\n", + "\n", + "$\\hat p_{ui} = f(x_{u}\\cdot y_{i} + b_{u} + b_{i})$, where f is sigmoid function, as binary prediction is made \\\n", + "$x_{u} = \\sum_{j\\in f_{u}} e_{j}^{U}$, $y_{i} = \\sum_{j\\in f_{i}} e_{j}^{I}$ - latent representations of user and item with $e_{j}^{U}$ and $e_{j}^{I}$ as feature embeddings \\\n", + "$b_{u} = \\sum_{j\\in f_{u}} b_{j}^{U}$, $b_{i} = \\sum_{j\\in f_{i}} b_{j}^{I}$ - bias term for users and items. $b_{j}^{U}$ and $b_{j}^{I}$ are scalar biases \\\n", + "$S^{+}$ and $S^{-}$ are observed and not observed interactions, respectively \n", "\n", "**BPR** is a pairwise loss, which maximizes the difference between positive and random negative examples. In LightFM it is useful for cases with only positive interactions present and when the goal is to maximize ROC AUC. It is derived from a Bayesian formulation of the problem by finding maximum posterior from likelihood and normal prior. Resulting formula in general case is the following:\n", "$$ \\sum_{(u,i,j) \\in D_s} ln \\sigma (\\hat{p}_{uij}) - \\lambda_{\\theta} \\lVert \\theta \\rVert ^2$$\n", - "$ D_s := \\{(u,i,g)| i \\in I_{u}^+ \\wedge j \\in I \\backslash I_{u}^+\\}$, set containing triplets of user, positive and negative example
\n", - "$\\hat{p}_{uij}$ - a function describing relationship between user and 2 items
\n", - "$ \\theta $ - parameters to find\n", "\n", - "For LightFM framework formula can be rewritten taking into account that:
\n", - "1. $ \\hat{p}_{uij} = \\hat{p}_{ui} - \\hat{p}_{uj} = x_uy_i + b_u + b_i - x_uy_j - b_u - b_j = x_u(y_i - y_j) + (b_i - b_j)$\n", - "2. $ \\Theta = (e^U, e^I, b^U, b^I) $\n", + "$D_s := \\{(u,i,g)| i \\in I_{u}^+ \\wedge j \\in I \\backslash I_{u}^+\\}$, set containing triplets of user, positive and negative example \\\n", + "$\\hat{p}_{uij}$ - a function describing relationship between user and 2 items \\\n", + "$\\theta$ - parameters to find\n", + "\n", + "For LightFM framework formula can be rewritten taking into account that: \n", + "\n", + "1. $\\hat{p}_{uij} = \\hat{p}_{ui} - \\hat{p}_{uj} = x_uy_i + b_u + b_i - x_uy_j - b_u - b_j = x_u(y_i - y_j) + (b_i - b_j)$\n", + " \n", + "2. $\\Theta = (e^U, e^I, b^U, b^I)$\n", + " \n", "3. LightFM has 2 regularization parameters, which is applied both to embeddings and biases: $\\lambda_{item}$ - for items and $\\lambda_{user}$ - for users. \n", " \n", "Thus, formula is:\n", @@ -1329,18 +1344,18 @@ "Model is trained using asynchronous stochastic gradient descent. Weights of the interactions define how much each observation affects the loss.\n", "\n", "### Recommendations \n", - "1. For hot user that already had interactions during training scores are computed from user and item latent representations and biases: $ score = x_{u}\\cdot y_{i} + b_u + b_i$\n", + "1. For hot user that already had interactions during training scores are computed from user and item latent representations and biases: $score = x_{u}\\cdot y_{i} + b_u + b_i$\n", "\n", - "2. In the warm start scenario the user has no interactions, but user features are known. In this case user representation is taken from user features embeddings and then score is computed: $ score = x_{u}\\cdot y_{i} + b_i$ \n", + "2. In the warm start scenario the user has no interactions, but user features are known. In this case user representation is taken from user features embeddings and then score is computed: $score = x_{u}\\cdot y_{i} + b_i$ \n", "\n", - "3. In the cold start scenario the user has no interactions and no user features are known. In this case model recommends popular items, basing on biases only $ score = b_i$\n", + "3. In the cold start scenario the user has no interactions and no user features are known. In this case model recommends popular items, basing on biases only $score = b_i$\n", "\n", "Top k items with highest scores for each user are taken as recommendations. Sigmoid function can be skipped during recommendations for faster inference\n", "\n", "### RecTools implementation \n", "RecTools provides a wrapper for the LightFM model and additionally:\n", - "* offers **10-25 times faster** inference\n", - "* provides recommendations for hot, cold and warm users in the same interface out of the box" + "- offers **10-25 times faster** inference\n", + "- provides recommendations for hot, cold and warm users in the same interface out of the box" ] }, { @@ -1357,6 +1372,7 @@ "metadata": {}, "source": [ "`lightfm` extension for rectools is required to run this code. You can install it with `pip install rectools[lightfm]` \n", + "\n", "* Select `loss` from \"logistic\", \"warp\", \"bpr\", \"warp-kos\". 
\"logistic\" is default but it usually has the worst performance\n", "* Specify embeddings size with LightFM `no_components`\n", "* Specify l2 regularization penalty on features with LightFM `item_alpha` and `user_alpha`\n", @@ -1376,8 +1392,8 @@ "name": "stdout", "output_type": "stream", "text": [ - "CPU times: user 6.92 s, sys: 70.4 ms, total: 6.99 s\n", - "Wall time: 7 s\n" + "CPU times: user 6.28 s, sys: 53.8 ms, total: 6.33 s\n", + "Wall time: 6.33 s\n" ] } ], @@ -1665,14 +1681,17 @@ "metadata": {}, "source": [ "Latent representations of users and items are the following:\n", + "\n", "1. user factors $X$ = $U$\n", + " \n", "2. item factors $Y$ = $\\Sigma \\cdot V^{T}$\n", "\n", "\n", "RecTools uses scipy implementation of scipy.sparse.linalg.svds function to compute decomposition.\n", "\n", "### Recommendations \n", - "1. Recommendation for user/item pair is the following: $ \\hat{r_{ui}} = x_u y_i$\n", + "1. Recommendation for user/item pair is the following: $\\hat{r_{ui}} = x_u y_i$\n", + "\n", "2. Take k items with greatest $\\hat{r_{ui}}$ for fixed user\n", "\n", "### Model application \n", @@ -1690,8 +1709,8 @@ "name": "stdout", "output_type": "stream", "text": [ - "CPU times: user 702 ms, sys: 30.7 ms, total: 732 ms\n", - "Wall time: 741 ms\n" + "CPU times: user 681 ms, sys: 19.7 ms, total: 700 ms\n", + "Wall time: 699 ms\n" ] } ], @@ -1740,7 +1759,7 @@ " 0\n", " 176549\n", " 7571\n", - " 1.893730\n", + " 1.893236\n", " 1\n", " 100% Wolf\n", " \n", @@ -1748,7 +1767,7 @@ " 1\n", " 176549\n", " 12173\n", - " 1.525142\n", + " 1.524853\n", " 2\n", " Avengers: Endgame\n", " \n", @@ -1756,7 +1775,7 @@ " 2\n", " 176549\n", " 10942\n", - " 0.996924\n", + " 0.996734\n", " 3\n", " MARVEL'S THE AVENGERS\n", " \n", @@ -1766,9 +1785,9 @@ ], "text/plain": [ " user_id item_id score rank title_orig\n", - "0 176549 7571 1.893730 1 100% Wolf\n", - "1 176549 12173 1.525142 2 Avengers: Endgame \n", - "2 176549 10942 0.996924 3 MARVEL'S THE AVENGERS" + "0 176549 7571 1.893236 1 100% Wolf\n", + "1 176549 12173 1.524853 2 Avengers: Endgame \n", + "2 176549 10942 0.996734 3 MARVEL'S THE AVENGERS" ] }, "execution_count": 25, @@ -1825,12 +1844,17 @@ "# Nearest Neighbours \n", "## ItemKNN \n", "### Model description \n", - "Model bases on the idea that users may like items similar to what they have interacted with previously. To achieve this goal model starts with computing item-to-item similarities from the interactions matrix.
\n", + "Model bases on the idea that users may like items similar to what they have interacted with previously. To achieve this goal model starts with computing item-to-item similarities from the interactions matrix. \\\n", "Algorithm used is the following:\n", + "\n", "1. Get items vectors as columns in the Interactions matrix\n", + " \n", "2. Compute distances between item vectors (e.g. Cosine, TF-IDF, BM25)\n", + " \n", "3. Keep only K closest vectors for each item\n", + " \n", "4. Form item-item similarity matrix with every item having K filled scores for top similar items\n", + " \n", "5. Build recommendations by multiplying user interactions on item-item similarity matrix so that users receive similar items to the ones that they have already interacted with" ] }, @@ -1853,10 +1877,14 @@ "metadata": {}, "source": [ "### Recommendations \n", - "Consider an example of how to make a top-1 recommendation for one user.
\n", + "Consider an example of how to make a top-1 recommendation for one user. \n", + "\n", "1. Suppose user interacted with items 2 and 4, which have weights 2 and 4, respectively\n", + " \n", "2. Calculate similarity matrix for each item, which includes 2 neighbors (for other items set similarity equal to 0). Resulting matrix stores information on item closeness\n", - "3. For recommendation use interaction values as weights and calculate how likely a user is to like an item. For instance, as the user has weight 2 for item 2, we can go through the closest items and add results to the respective item. Closest to item 2 is item 1, thus, add to item 1 the following: $2 \\cdot 0.8 = 1.6$, add to item 4: $2 \\cdot 0.5 = 1$ \n", + " \n", + "3. For recommendation use interaction values as weights and calculate how likely a user is to like an item. For instance, as the user has weight 2 for item 2, we can go through the closest items and add results to the respective item. Closest to item 2 is item 1, thus, add to item 1 the following: $2 \\cdot 0.8 = 1.6$, add to item 4: $2 \\cdot 0.5 = 1$\n", + " \n", "4. After all calculations are performed, sum the values for each item and recommend items with the greatest sum. Also, it is possible to filter out already seen recommendations by setting their value to 0." ] }, @@ -1904,8 +1932,8 @@ "name": "stdout", "output_type": "stream", "text": [ - "CPU times: user 808 ms, sys: 31.5 ms, total: 839 ms\n", - "Wall time: 838 ms\n" + "CPU times: user 764 ms, sys: 32.2 ms, total: 797 ms\n", + "Wall time: 796 ms\n" ] } ], @@ -2017,6 +2045,7 @@ "source": [ "# Linear Autoencoders \n", "Autoencoders in general attempt to make output of the model as close to input as possible. In recommendation setting there is a class of models that aim for the same task. Interactions matrix should be approximated as closely as possible based on the data that is present only in this matrix itself. Linear autoencoders like EASE and SLIM were found to be strong recommender baselines by many academy and industry practitioners.\n", + "\n", "## EASE " ] }, @@ -2026,7 +2055,7 @@ "metadata": {}, "source": [ "### Model description \n", - "The model's goal is to find a dense item-item similarity matrix which will reconstruct the interactions matrix when being multiplied on it. Unlike neighborhood models, its computation is done by loss minimization.
\n", + "The model's goal is to find a dense item-item similarity matrix which will reconstruct the interactions matrix when being multiplied on it. Unlike neighborhood models, its computation is done by loss minimization. \\\n", "Matrix we would like to receive is B, it can be received from the following minimization problem:\n", "$$ \n", "\\begin{equation}\n", @@ -2035,24 +2064,31 @@ " diag(B) = 0\n", " \\end{cases}\\,.\n", "\\end{equation}$$ \n", - "X - interactions matrix
\n", - "B - weight-matrix, which should be found
\n", - "$\\lVert X - XB \\rvert\\rvert _{F}$ - Frobenius norm
\n", - "$ \\lambda $ - regularization term to avoid overfitting
\n", + "X - interactions matrix \\\n", + "B - weight-matrix, which should be found \\\n", + "$\\lVert X - XB \\rvert\\rvert _{F}$ - Frobenius norm \\\n", + "$\\lambda$ - regularization term to avoid overfitting \\\n", "Constraint is needed, as otherwise model may return identity B and because of the fact that replicating input model should generalize (not recommend the same item as input)\n", "\n", - "By analyzing closed-form solution it was found out that B can be calculated from the inverse of Gram matrix = $ X \\cdot X^{T}$. Thus, simple algorithm is used to find B.
\n", + "By analyzing closed-form solution it was found out that B can be calculated from the inverse of Gram matrix = $X \\cdot X^{T}$. Thus, simple algorithm is used to find B.\n", "\n", "Algorithm:\n", - "1. Define $ G = X \\cdot X^{T}$\n", - "1. Add $\\lambda$ to diagonal indices of G\n", - "2. Find inverse of G ($G^{-1}$)\n", - "3. B = $ \\frac{G^{-1}}{diag(G^{-1})}$, where $ diag(G^{-1})$ is array of diagonal elements\n", - "4. Set all elements on the diagonal of B equal to 0\n", + "\n", + "1. Define $G = X \\cdot X^{T}$\n", + " \n", + "2. Add $\\lambda$ to diagonal indices of G\n", + " \n", + "3. Find inverse of G ($G^{-1}$)\n", + " \n", + "4. B = $\\frac{G^{-1}}{diag(G^{-1})}$, where $diag(G^{-1})$ is array of diagonal elements\n", + " \n", + "5. Set all elements on the diagonal of B equal to 0\n", "\n", "### Recommendations \n", "Recommendations for one user are defined as following: \n", - "1. Compute scores $ s_{u} = x_{u} \\cdot B $ where $ x_{u}$ refers to user row in interactions matrix\n", + "\n", + "1. Compute scores $s_{u} = x_{u} \\cdot B$ where $x_{u}$ refers to user row in interactions matrix\n", + "\n", "2. Take k items with greatest scores" ] }, @@ -2090,8 +2126,8 @@ "name": "stdout", "output_type": "stream", "text": [ - "CPU times: user 3min 24s, sys: 3.68 s, total: 3min 28s\n", - "Wall time: 3min 29s\n" + "CPU times: user 3min 22s, sys: 2.13 s, total: 3min 25s\n", + "Wall time: 3min 25s\n" ] } ],