In this direction, a web app was developed to help users understand how biases are introduced in recommender systems and to estimate the extent of bias in these systems. The app comprises four main pages. The first page visualizes datasets to help users spot possible biases. On the second page, the user can build a recommender system in a user-friendly way, choosing from a large collection of algorithms and hyperparameter tuning options. Additionally, we developed a page for evaluating a recommender system with respect to popularity bias, fairness, diversity, novelty and coverage. The evaluation consists of a) bias monitoring through different types of plots for a single dataset or for dataset comparison, b) cut-off analysis and c) hyperparameter analysis. Finally, we developed a page for popularity bias mitigation using one of the four available algorithms: FAR, PFAR, FA*IR and Calibrated Recommendations.
With reference to the broader field of ethical issues, this thesis pays special attention to popularity bias, diversity, novelty and item coverage. An extensive experimental study was conducted to gain a better understanding of the sources of bias and to analyze the effect of different bias mitigation algorithms. The study was carried out using the aforementioned web app. Four datasets were used: a real dataset provided by a major electronics retailer and three publicly available datasets collected from the internet. The first part of the study examines, for every algorithm used, the role of hyperparameter tuning and of dataset characteristics in bias and accuracy, and compares the above-mentioned datasets. The second part consists of bias mitigation using three re-ranking algorithms, FAR, PFAR and Calibrated Recommendations, and one in-processing algorithm.
This study found that dataset characteristics, and especially the sparsity of the user-item matrix, can strongly affect the bias that is introduced. Another significant finding is that the post-processing mitigation algorithms examined can improve the bias-accuracy tradeoff, but they also have several limitations. In conclusion, developers of recommender systems need to be aware of the sources of bias and of the accuracy-bias tradeoff. This work contributes in this direction and lays the groundwork for future research into bias in recommender systems.
Useful information about the Movielens1M dataset
On this page, the user can get a qualitative understanding of the data by viewing useful information and statistical details about the dataset, and through the four main types of plots that are offered:
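One statistic these plots typically surface is the long-tail shape of item popularity: a small "short head" of items usually accounts for most interactions. A minimal sketch of that computation is shown below; the interaction log and the 80% threshold are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical interaction log: (user_id, item_id) pairs.
interactions = [
    (1, "a"), (1, "b"), (2, "a"), (2, "c"),
    (3, "a"), (3, "b"), (4, "a"), (5, "d"),
]

# Count how often each item was interacted with.
popularity = Counter(item for _, item in interactions)

# Sort items from most to least popular and find the "short head":
# the smallest set of top items covering 80% of all interactions.
ranked = popularity.most_common()
total = sum(popularity.values())
cum, head = 0, []
for item, count in ranked:
    cum += count
    head.append(item)
    if cum / total >= 0.8:
        break

print(f"{len(head)} of {len(popularity)} items cover 80% of interactions")
```

A strongly skewed head-to-catalog ratio in a plot like this is an early warning that a recommender trained on the data may inherit popularity bias.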
The user can also build a recommender system, choosing from a variety of algorithms and evaluation metrics provided by the Elliot framework.
They can use the default values of each algorithm's hyperparameters or, for more experienced users, set their own preferred values:
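Under the hood, Elliot experiments are driven by a YAML configuration. The fragment below is a sketch following the structure of Elliot's documented basic configuration; the dataset path, model choice, and hyperparameter values are placeholders, and exact keys may vary between Elliot versions.

```yaml
experiment:
  dataset: movielens_1m
  data_config:
    strategy: dataset
    dataset_path: ../data/movielens_1m/dataset.tsv   # placeholder path
  splitting:
    test_splitting:
      strategy: random_subsampling
      test_ratio: 0.2
  models:
    ItemKNN:                 # example algorithm; any Elliot model fits here
      meta:
        save_recs: True
      neighbors: [50, 100]   # grid of hyperparameter values to try
      similarity: cosine
  evaluation:
    simple_metrics: [nDCG]
  top_k: 10
```

Running the experiment then reduces to passing this file to Elliot's `run_experiment` entry point, which is what the app does behind its form-based interface.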
After building a recommender system, the user can analyze the generated recommendations using the evaluation metrics of their choice. There are 24 metrics available, provided by the Elliot framework, including accuracy, popularity bias, coverage, diversity and novelty metrics. The user can either analyze the results of a single dataset or compare the results of different datasets.
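To make two of these metric families concrete, the sketch below computes item coverage and an average-recommendation-popularity score from top-k lists. The recommendation lists, popularity counts, and catalog size are hypothetical; Elliot's own implementations are the ones used in the app.

```python
from collections import Counter

# Hypothetical top-k recommendation lists and training popularity counts.
recs = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b", "d"],
    "u3": ["a", "c", "e"],
}
train_popularity = Counter({"a": 40, "b": 25, "c": 10, "d": 3, "e": 2})
catalog_size = 100  # total items in the catalog

# Item coverage: fraction of the catalog that appears in any user's list.
recommended = {item for items in recs.values() for item in items}
coverage = len(recommended) / catalog_size

# Average Recommendation Popularity (ARP): mean training popularity of
# the recommended items, averaged over users. Higher ARP suggests the
# recommender concentrates on already-popular items.
arp = sum(
    sum(train_popularity[i] for i in items) / len(items)
    for items in recs.values()
) / len(recs)

print(f"coverage = {coverage:.2f}, ARP = {arp:.1f}")
```

Reading the two numbers together is what matters: low coverage combined with high ARP is the typical signature of popularity bias.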
The bias mitigation technique used belongs to the category of post-processing techniques, and more specifically to re-ranking algorithms. These techniques take as input a recommendation list for every user in the dataset, produced by an algorithm of the user's choice. Three bias mitigation algorithms are available in our app: FAR, PFAR and FA*IR. These algorithms are provided by Librec-auto.
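To illustrate the general idea behind such re-rankers, here is a simplified greedy sketch in the spirit of FAR's accuracy-fairness tradeoff: each position picks the candidate maximising a weighted sum of relevance and a fairness gain that rewards long-tail items until one is included. This is an illustrative toy, not the Librec-auto implementation; the scores, items, tail set, and `lam` weight are all hypothetical.

```python
def rerank(candidates, relevance, long_tail, k=3, lam=0.7):
    """Greedily build a top-k list trading relevance against tail exposure.

    candidates: item ids; relevance: item -> score in [0, 1];
    long_tail: set of unpopular items; lam: weight on relevance.
    """
    selected = []
    while len(selected) < k and len(selected) < len(candidates):
        # The fairness bonus applies only while no tail item is selected yet.
        tail_covered = any(i in long_tail for i in selected)
        best = max(
            (i for i in candidates if i not in selected),
            key=lambda i: lam * relevance[i]
            + (1 - lam) * (1.0 if i in long_tail and not tail_covered else 0.0),
        )
        selected.append(best)
    return selected

relevance = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.4}
# Tail item "d" is promoted above its pure-relevance rank.
print(rerank(["a", "b", "c", "d"], relevance, long_tail={"d"}))
```

With `lam=1.0` the fairness term vanishes and the original relevance ordering is returned, which is how the weight lets the user tune the bias-accuracy tradeoff.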
When the bias mitigation process has been completed, plots comparing the results produced by the technique with the initial results are shown.
The app also provides detailed yet simple explanations, in non-technical terms, of all the evaluation metrics it contains: