Update README.md
arnobotha authored Nov 27, 2024
1 parent 166f315 commit fbeb8a7
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -3,8 +3,8 @@

A novel procedure is presented for finding the true but latent endpoints within the repayment histories of individual loans. The monthly observations beyond these true endpoints are false, largely due to operational failures that delay account closure, thereby corrupting some loans in the dataset with 'false' observations. Detecting these false observations is difficult at scale since each affected loan history might have a different sequence of zero (or very small) month-end balances that persist towards the end. Identifying these trails of diminutive balances would require an exact definition of a "small balance", which can be found using our so-called _TruEnd_-procedure. We demonstrate this procedure and isolate the ideal small-balance definition using residential mortgages from a large South African bank. Evidently, corrupted loans are remarkably prevalent and have excess histories that are surprisingly long, which ruin the timing of certain risk events and compromise any subsequent time-to-event model such as survival analysis. Excess histories can be discarded using the ideal small-balance definition, which demonstrably improves the accuracy of both the predicted timing and severity of risk events, without materially impacting the monetary value of the portfolio. The resulting estimates of credit losses are lower and less biased, which augurs well for raising accurate credit impairments under the IFRS 9 accounting standard. Our work therefore addresses a pernicious data error, which highlights the pivotal role of data preparation in producing credible forecasts of credit risk.
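
The notion of a trailing run of small balances can be illustrated with a minimal R sketch. This is not the TruEnd-procedure itself, which searches for the ideal small-balance definition; here the threshold is simply assumed, the balances are made-up values, and `find_true_end` is a hypothetical helper name that does not come from **TruEnd.R**. The sketch also assumes that the true endpoint is the last month whose balance exceeds the threshold.

```r
# Hypothetical helper (not part of TruEnd.R): locate the last month before a
# trailing run of small balances, given an ASSUMED small-balance threshold.
find_true_end <- function(balances, threshold) {
  # Indices of months whose balance exceeds the small-balance threshold
  large <- which(balances > threshold)
  # If no balance ever exceeds the threshold, no endpoint can be identified
  if (length(large) == 0) return(NA_integer_)
  # Assumption: the true endpoint is the last month with a non-small balance
  max(large)
}

# Illustrative month-end balances for a single loan (made-up values)
balances <- c(1000, 750, 400, 120, 0, 35, 0, 0)
threshold <- 50  # assumed small-balance definition, in currency units

true_end <- find_true_end(balances, threshold)
# Number of trailing months that would be treated as excess history
excess_months <- length(balances) - true_end
# Loan history with the excess months discarded
trimmed <- balances[seq_len(true_end)]
```

Under these assumptions, the months after `true_end` form the excess history that would be discarded. Choosing the threshold itself is the hard part, and that is what the TruEnd-procedure addresses.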

-## Structure
-This R-codebase can be run sequentially using the file numbering itself as a structure. Delinquency measures are algorithmically defined in **DelinqM.R** as data-driven functions, which may be valuable to the practitioner outside of the study's current scope. The TruEnd-procedure and its set of functions are defined in **TruEnd.R**, which may also be valuable to the practitioner beyond the current scope.
+## Structure
+This R-codebase can be run sequentially using the file numbering itself as a structure. **Delinquency measures** are algorithmically defined in **DelinqM.R** as data-driven functions, which may be valuable to the practitioner outside of the study's current scope. These delinquency measures were formulated and empirically tested in [Botha22](https://www.researchgate.net/publication/358329458_The_loss_optimization_of_loan_recovery_decision_times_using_forecast_cashflows), as part of a loss optimisation exercise of recovery decision times, as implemented in the corresponding [R-codebase](https://github.com/arnobotha/The-loss-optimisation-of-loan-recovery-decision-times-using-forecast-cash-flows). A simulation study from [Botha2021](https://www.researchgate.net/publication/350169758_Simulation-based_optimisation_of_the_timing_of_loan_recovery_across_different_portfolios) also demonstrated these delinquency measures at length, with its corresponding [R-codebase](https://github.com/arnobotha/Simulation-based-optimisation-of-the-timing-of-loan-recovery-across-different-portfolios). Similarly, the **TruEnd-procedure** from [Botha24](https://www.researchgate.net/publication/380214432_The_TruEnd-procedure_Treating_trailing_zero-valued_balances_in_credit_data), with its corresponding [R-codebase](https://github.com/arnobotha/TruEnd-Procedure), is implemented in the **TruEnd.R** script, which includes a small set of functions for running the TruEnd-procedure in practice.

## Data
This R-codebase assumes that monthly loan performance data is available. Naturally, the data itself cannot be made publicly available given its sensitive nature, as well as various data privacy laws, particularly the _Protection of Personal Information (POPI)_ Act of 2013 in South Africa. However, the structure and type of data required for reproducing this study are sufficiently described in the commentary within the scripts. This should enable the practitioner to extract and prepare data accordingly. Moreover, this codebase assumes that South African macroeconomic data is available, as sourced and collated by internal staff of the bank in question.