Use Case: Loan repayment prediction

Overview

This use case demonstrates an application of machine learning in financial services: predicting which LendingClub loans are likely to be charged off.

Problem scenario

LendingClub, a peer-to-peer lending service, connects investors with borrowers. Lending widens access to finance, but it carries the risk that a loan ends up in "charged off" status, meaning the lender no longer expects it to be repaid and records a financial loss. This use case builds a machine learning model that predicts which loans are likely to be charged off, helping to minimize losses and allocate funds more effectively.

Dataset

The dataset consists of two files: approved loans and rejected loans. This use case focuses only on approved loans, including those fully paid and charged off. For training, data from 2007 to 2014 is used, with 2015 data reserved for testing. Loan amounts, interest rates, and payment statuses are key data points in the dataset.
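The year-based split described above can be sketched in a few lines of Python. This is only an illustration outside SnapLogic: the field names issue_year and loan_status and the status strings are assumptions about the record layout, not names confirmed by this use case.

```python
# Assumed record layout: each loan is a dict with "issue_year" and "loan_status".
# The status strings below are assumptions, not confirmed by this use case.
KEPT_STATUSES = {"Fully Paid", "Charged Off"}

def split_by_year(loans, train_years=range(2007, 2015), test_year=2015):
    """Return (train, test) lists, keeping only fully paid or charged-off loans."""
    train, test = [], []
    for loan in loans:
        if loan["loan_status"] not in KEPT_STATUSES:
            continue  # skip loans whose outcome is not yet known
        if loan["issue_year"] in train_years:
            train.append(loan)
        elif loan["issue_year"] == test_year:
            test.append(loan)
    return train, test

sample = [
    {"issue_year": 2012, "loan_status": "Fully Paid"},
    {"issue_year": 2014, "loan_status": "Charged Off"},
    {"issue_year": 2015, "loan_status": "Fully Paid"},
    {"issue_year": 2015, "loan_status": "Current"},      # dropped: outcome unknown
    {"issue_year": 2016, "loan_status": "Charged Off"},  # dropped: outside both windows
]
train, test = split_by_year(sample)
```

Splitting by year rather than at random keeps the test set strictly later than the training data, which mirrors how the model would be used on new loans.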

Objectives

Cross Validation: Perform k-fold cross-validation with multiple machine learning algorithms to identify the best-performing model for predicting charged-off loans.
Model Building: Train a Random Forest model and save it in a format suitable for hosting as a REST API.
Model Hosting: Deploy the model as an Ultra Task so that predictions are accessible through an API.
Profit Analysis: Measure the financial impact of the model on overall loan profitability by analyzing the difference in profit with and without machine learning predictions.

Cross validation

The pipeline runs k-fold cross-validation on multiple algorithms to determine which one predicts charged-off loans most accurately. The process uses the Cross Validator (Classification) Snap in SnapLogic.
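For readers unfamiliar with the technique, the following is a minimal sketch of k-fold cross-validation used to compare algorithms. It does not reproduce the Cross Validator (Classification) Snap: the two toy "algorithms", the int_rate field, and the synthetic data are all assumptions chosen to keep the example self-contained.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(fit, rows, labels, k=5):
    """Mean held-out accuracy over k folds.

    fit(train_rows, train_labels) must return a predict(row) function.
    """
    scores = []
    for held_out in k_fold_indices(len(rows), k):
        held = set(held_out)
        train = [i for i in range(len(rows)) if i not in held]
        predict = fit([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(rows[i]) == labels[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / len(scores)

# Toy stand-ins for real classifiers (assumptions for illustration only).
def fit_majority(rows, labels):
    """Always predict the most common training label."""
    majority = max(set(labels), key=labels.count)
    return lambda row: majority

def fit_rate_threshold(rows, labels):
    """Predict a charge-off when the interest rate reaches a learned cutoff."""
    best_cut, best_hits = None, -1
    for cut in sorted({row["int_rate"] for row in rows}):
        hits = sum((row["int_rate"] >= cut) == label
                   for row, label in zip(rows, labels))
        if hits > best_hits:
            best_cut, best_hits = cut, hits
    return lambda row: row["int_rate"] >= best_cut

# Synthetic data: a loan is labeled charged off (True) when its rate is 15 or more.
rows = [{"int_rate": rate} for rate in range(5, 25)]
labels = [row["int_rate"] >= 15 for row in rows]

candidates = {"majority": fit_majority, "rate_threshold": fit_rate_threshold}
scores = {name: cross_validate(fit, rows, labels) for name, fit in candidates.items()}
best = max(scores, key=scores.get)  # algorithm with the highest mean accuracy
```

Every candidate is scored on the same folds (the shuffle seed is fixed), so the comparison reflects the algorithms rather than a lucky split.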

Model building

The Random Forest algorithm is selected to build the predictive model. The Trainer (Classification) Snap is used to train the model and save it in JSON format.

Model hosting

To enable API access for predictions, the model is deployed using the Predictor (Classification) Snap as an Ultra Task, allowing external applications to query the model.

Profit analysis

This pipeline evaluates the model’s impact on profit by calculating potential savings from reduced charge-offs. Metrics include the total fund, total profit, and average profit per loan before and after implementing machine learning predictions.
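The before-and-after comparison can be illustrated with a small calculation. This sketch assumes that profit on a loan is the amount received minus the amount funded, and that loans the model flags as likely charge-offs are simply not funded; the field names and figures are made up for the example and are not taken from the dataset.

```python
def profit_summary(loans):
    """Total fund, total profit, and average profit per loan for a set of loans."""
    total_fund = sum(loan["funded"] for loan in loans)
    # Assumption: profit = amount received back - amount funded.
    total_profit = sum(loan["received"] - loan["funded"] for loan in loans)
    average = total_profit / len(loans) if loans else 0.0
    return {
        "total_fund": total_fund,
        "total_profit": total_profit,
        "avg_profit_per_loan": average,
    }

# Made-up loans; "predicted_charged_off" stands in for the model's output.
loans = [
    {"funded": 10000, "received": 11500, "predicted_charged_off": False},
    {"funded": 5000, "received": 1200, "predicted_charged_off": True},   # real loss avoided
    {"funded": 8000, "received": 9000, "predicted_charged_off": False},
    {"funded": 6000, "received": 6600, "predicted_charged_off": True},   # false positive
]

before = profit_summary(loans)  # fund every loan
after = profit_summary([l for l in loans if not l["predicted_charged_off"]])
```

The fourth loan shows the trade-off: a false positive removes a profitable loan from the portfolio, so the comparison should look at total profit as well as the average per loan.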

API testing

A sample API request is generated with the JSON Generator Snap and sent to the deployed model API to verify that the API handles the request correctly and returns a prediction.
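An external client could call the deployed API along the following lines. This is a sketch under stated assumptions: the URL, the bearer-token authentication, and the request fields (loan_amnt, int_rate, term) are placeholders, and the real values depend on how the Ultra Task and the trained model are configured.

```python
import json
import urllib.request

# Placeholders (assumptions): replace with the task's actual URL and token.
API_URL = "https://example.com/api/loan-prediction"
API_TOKEN = "REPLACE_ME"

def build_request(loan, url=API_URL, token=API_TOKEN):
    """Build a JSON POST request carrying one loan record."""
    body = json.dumps(loan).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )

# Hypothetical input fields; use the fields the model was trained on.
sample_loan = {"loan_amnt": 10000, "int_rate": 13.5, "term": "36 months"}
request = build_request(sample_loan)

# Sending is left commented out because the URL above is a placeholder:
# with urllib.request.urlopen(request, timeout=10) as response:
#     prediction = json.load(response)
```

Building the request separately from sending it makes it easy to inspect the payload before pointing the client at a live endpoint.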