Author: Ing. Aldo Escobar
Advisor: Mg. Pablo Roccatagliata
This work proposes two experiments for automating bank credit allocation on an imbalanced data set. The first experiment uses an implementation of NGBoost, a scalable and modular natural gradient boosting algorithm, to classify observations as either good or bad payers by optimizing an objective function defined by a cost matrix (a cost-sensitive problem). Once the model has been trained, a feature importance analysis is carried out to decide which features to hide in the second experiment. Using this modified data set, the second experiment studies performance, measured in terms of the benefit to the company, relative to a standard credit score. Finally, a new credit allocation rule is derived that takes into account both the credit score and the need to grant credit in order to learn from users.
Keywords: credit scoring, machine learning, natural gradient boosting, cost-sensitive problem, thresholding, feature importance.
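As an illustration of the cost-sensitive thresholding idea named in the abstract and keywords, the following is a minimal sketch, not the implementation used in this work. It assumes a binary problem where class 1 is a bad payer, that correct decisions cost nothing, and that predicted default probabilities are already available from some classifier (NGBoost or otherwise). The cost values and the function names `expected_cost`, `best_threshold`, and `bayes_threshold` are hypothetical and chosen only for the example.

```python
import numpy as np

# Hypothetical costs (illustrative values only, not taken from this work).
COST_FN = 5.0  # credit granted to a bad payer
COST_FP = 1.0  # credit denied to a good payer


def expected_cost(y_true, p_bad, threshold, cost_fp=COST_FP, cost_fn=COST_FN):
    """Average cost per applicant when rejecting everyone with p_bad >= threshold."""
    y_true = np.asarray(y_true)
    p_bad = np.asarray(p_bad)
    reject = p_bad >= threshold
    false_pos = np.sum(reject & (y_true == 0))   # good payers rejected
    false_neg = np.sum(~reject & (y_true == 1))  # bad payers accepted
    return (cost_fp * false_pos + cost_fn * false_neg) / len(y_true)


def best_threshold(y_true, p_bad, grid=None):
    """Grid-search the threshold with the lowest average cost on the given data."""
    if grid is None:
        grid = np.linspace(0.01, 0.99, 99)
    costs = np.array([expected_cost(y_true, p_bad, t) for t in grid])
    i = int(np.argmin(costs))  # first minimizer if several thresholds tie
    return float(grid[i]), float(costs[i])


def bayes_threshold(cost_fp=COST_FP, cost_fn=COST_FN):
    """Theoretical threshold for calibrated probabilities and zero-cost correct decisions."""
    return cost_fp / (cost_fp + cost_fn)
```

With these assumed costs, the theoretical threshold is 1/6, well below the default 0.5: because accepting a bad payer is taken to be five times as costly as rejecting a good one, the rule rejects applicants at much lower predicted default probabilities. On imbalanced data the empirically tuned threshold from `best_threshold` may differ from this value when the probabilities are not well calibrated.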