All the code referenced in this article can be found on GitHub. The business logic and modeling functions live inside the src/selection directory, in this particular file:
src/modeling/score_computation.py
The related analysis and findings are recorded in:
09_score_computation.qmd
The visuals, tables, and charts were created with assistance from the Codex coding tool.
Your credit score trails you at every turn. It determines whether you qualify for a loan, a credit card, or even a rental apartment. The engine powering the majority of these decisions is FICO. Its underlying logic is straightforward once you unpack it.
FICO factors in five areas:
- Payment history (35%): make sure your bills get paid on schedule.
- Amounts owed (30%): keep credit utilization under 20%.
- Length of credit history (15%): a longer track record works in your favor.
- Credit mix (10%): maintain a variety of credit types.
- New credit (10%): don’t apply for too many new accounts.
When you pay your credit card statements on time, your score climbs. Payment history is the heaviest factor.
These weighted components produce a score that falls into specific bands:
- 300–579: Poor.
- 580–669: Fair.
- 670–739: Good.
- 740–799: Very Good.
- 800–850: Excellent.
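The bands above amount to a simple lookup. As a minimal sketch (the function and constant names are illustrative, not part of any FICO tooling), a score can be mapped to its band like this:

```python
from bisect import bisect_right

# Lower bounds of each FICO band, as listed above.
FICO_LOWER_BOUNDS = [300, 580, 670, 740, 800]
FICO_LABELS = ["Poor", "Fair", "Good", "Very Good", "Excellent"]


def fico_band(score: int) -> str:
    """Return the band label for a score between 300 and 850."""
    if not 300 <= score <= 850:
        raise ValueError("FICO scores range from 300 to 850")
    # bisect_right counts how many lower bounds are <= score.
    return FICO_LABELS[bisect_right(FICO_LOWER_BOUNDS, score) - 1]
```

For example, `fico_band(705)` falls in the "Good" band.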
This article mirrors that same logic but applies it to our own custom model.
We draw on the dataset from this series about building a scoring model. The objective is clear: assign a weight to each retained variable, calculate the score for every client in our dataset, and demonstrate how a new client’s score is derived.
As before, Codex helped write the code and produce the tables and charts. I keep mentioning this because it’s worth emphasizing: AI agents can genuinely accelerate your workflow. But always review what they produce. Confidence only builds through verification. Leverage these tools, but keep your guard up.
Let’s recap what we identified previously. We retained four variables:
- loan_int_rate: the interest rate on the loan.
- loan_percent_income: the proportion of income going toward loan payments.
- cb_person_default_on_file: whether the borrower has previously defaulted.
- home_ownership_3: the borrower’s housing situation.
Similar to FICO, we assign each variable a weight and construct a score ranging from 0 to 1000. A high score signals low risk. A low score signals a high likelihood of default.
Turning Model Coefficients into a Score
We convert each individual coefficient into a score value.
Score for each category within a variable
Consider loan_int_rate as an illustration. The score for category i of variable j is computed from that category's coefficient, measured against the largest coefficient of the same variable.
Here, β_ij represents the coefficient for category i of variable j, and max_i β_ij is the largest coefficient for variable j. For loan_int_rate, for instance, it is the highest coefficient among that variable's categories.
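A minimal sketch of one common way to turn coefficients into points is given here. It assumes that a larger coefficient means higher default risk, and that points are scaled so the best possible profile totals 1000 and the worst totals 0. The function name, the normalisation by the summed coefficient ranges, and the toy coefficients are all illustrative assumptions, not the fitted model's values.

```python
def coefficients_to_scores(coefs, scale=1000.0):
    """Convert per-category coefficients into per-category points.

    coefs: {variable: {category: coefficient}}, where a larger
    coefficient is assumed to mean higher default risk.
    """
    # Sum of (max - min) over variables. Dividing by it is an assumption
    # that makes the best possible client score exactly `scale`.
    total_spread = sum(max(c.values()) - min(c.values()) for c in coefs.values())
    if total_spread == 0:
        raise ValueError("coefficients do not vary within any variable")
    return {
        var: {
            # The category with the largest coefficient gets 0 points.
            cat: scale * (max(cats.values()) - beta) / total_spread
            for cat, beta in cats.items()
        }
        for var, cats in coefs.items()
    }


# Made-up coefficients for illustration only.
toy_coefs = {
    "var_a": {"low": 0.0, "mid": 0.5, "high": 1.5},
    "var_b": {"N": 0.0, "Y": 0.5},
}
toy_scores = coefficients_to_scores(toy_coefs)
```

With these toy inputs, the riskiest category of each variable receives 0 points, which matches the pattern in the client example where one category scores 0.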
Applying this formula yields the score table shown below.

Calculating a client’s score, one step at a time
Consider a new client. We identify which category they belong to for each variable:
- loan_int_rate sits at 10%. Score: 181.72.
- loan_percent_income sits at 25%. Score: 0.
- No prior default (cb_person_default_on_file = N). Score: 59.52.
- They own their home (home_ownership_3 = OWN). Score: 373.94.
We sum these individual scores to arrive at the client’s overall score: 181.72 + 0 + 59.52 + 373.94 = 615.18.
We perform this same calculation for every single client in our dataset.
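The per-client calculation is a lookup followed by a sum. The sketch below uses only the four point values quoted in the example; the category labels used as dictionary keys (such as "10%") are hypothetical stand-ins for the model's actual bin names.

```python
# Points for the categories of the example client, as quoted above.
# Category labels are placeholders, not the model's real bin names.
points_table = {
    "loan_int_rate": {"10%": 181.72},
    "loan_percent_income": {"25%": 0.0},
    "cb_person_default_on_file": {"N": 59.52},
    "home_ownership_3": {"OWN": 373.94},
}


def score_client(points_table, client_categories):
    """Sum the points of the category the client falls into for each variable."""
    return sum(points_table[var][cat] for var, cat in client_categories.items())


new_client = {
    "loan_int_rate": "10%",
    "loan_percent_income": "25%",
    "cb_person_default_on_file": "N",
    "home_ownership_3": "OWN",
}
total_score = round(score_client(points_table, new_client), 2)
print(total_score)  # 615.18
```

Applying `score_client` to every row of the dataset gives the full score distribution.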
Determining the Influence of Each Variable
After computing the score, a natural question arises: which factor has the greatest impact on the result?
To answer this, we compute a weight for each variable j on the training set, based on how far its category scores spread around their mean.
In this context, the mean score for variable j (written with an overbar) is weighted by the population in each category.
Put simply, the weight captures how strongly variable j influences the overall score. The wider the spread across its categories, the more important that variable becomes.
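One plausible reading of this weight, sketched under assumptions: for each variable, take the population-weighted standard deviation of its category points, then divide by the sum of those deviations across all variables. The function name, the exact normalisation, and the toy inputs are illustrative and may differ from the computation in the repository.

```python
from math import sqrt


def variable_weights(points, shares):
    """Relative weight of each variable from the spread of its category points.

    points: {variable: {category: points}}
    shares: {variable: {category: population share}}, summing to 1 per variable.
    """
    spreads = {}
    for var, cat_points in points.items():
        p = shares[var]
        # Population-weighted mean of the category points.
        mean = sum(p[c] * s for c, s in cat_points.items())
        # Population-weighted standard deviation around that mean.
        spreads[var] = sqrt(sum(p[c] * (s - mean) ** 2 for c, s in cat_points.items()))
    total = sum(spreads.values())
    if total == 0:
        raise ValueError("no variable shows any spread in points")
    return {var: spread / total for var, spread in spreads.items()}


# Toy inputs for illustration only.
toy_points = {"var_a": {"x": 0.0, "y": 100.0}, "var_b": {"u": 0.0, "v": 300.0}}
toy_shares = {"var_a": {"x": 0.5, "y": 0.5}, "var_b": {"u": 0.5, "v": 0.5}}
toy_weights = variable_weights(toy_points, toy_shares)
```

In this toy case `var_b` has three times the spread of `var_a`, so it receives 75% of the weight against 25%.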
The following table displays the weight assigned to each variable.

loan_percent_income carries the largest weight at 35%, followed by home_ownership_3 at 31%, then loan_int_rate at 28%, with cb_person_default_on_file contributing the least.
This outcome is logical. A borrower who allocates more than 20% of their income to loan repayments carries significant risk, so it’s reassuring that the model identifies this variable as the dominant driver of the score.
How Well Does the Score Distinguish Risk Levels?
Before constructing the risk grid, we verify whether the score fulfills its intended purpose: cleanly separating defaulters from non-defaulters.
We visualize the score density for each group, partitioned by default status, across the training, testing, and out-of-time datasets.

The wider the gap between the two curves, the more effective the score.
What emerges: defaulters tend to concentrate at lower scores, while non-defaulters cluster toward higher scores. This is precisely the pattern we aim for: a higher score should correspond to lower risk.
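The density plots give a visual check. A numeric complement, which is not part of the original analysis, is the Kolmogorov-Smirnov statistic: the largest gap between the cumulative score distributions of the two groups. The sketch below is a plain-Python version with made-up scores.

```python
from bisect import bisect_right


def ks_statistic(scores_a, scores_b):
    """Largest gap between two empirical CDFs (0 = identical, 1 = fully separated)."""
    a, b = sorted(scores_a), sorted(scores_b)
    max_gap = 0.0
    # The gap can only change at observed score values.
    for t in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, t) / len(a)  # share of group A with score <= t
        cdf_b = bisect_right(b, t) / len(b)  # share of group B with score <= t
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Made-up scores for illustration only.
defaulter_scores = [100, 200, 300]
non_defaulter_scores = [400, 500, 600]
separation = ks_statistic(defaulter_scores, non_defaulter_scores)
```

Computing this separately on the training, testing, and out-of-time sets would show whether the separation holds up outside the training data.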
Constructing the Risk Grid
With validation complete, we proceed to build the risk grid.
Step 1: Default rate per score band
We split the clients into 20 equally populated score bands, called vingtiles, and compute the default rate within each. We then chart the default rate against these vingtiles of the final score.

This chart serves as the backbone of the grid: it provides a natural basis for consolidating the 20 segments into six distinct risk tiers.
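A minimal sketch of the vingtile computation in plain Python follows. The function name and the toy data are illustrative. Tied scores may end up split across neighbouring groups here, which a library routine such as pandas `qcut` handles differently.

```python
def default_rate_by_quantile(scores, defaults, n_groups=20):
    """Split clients into equally populated score groups and
    return the default rate observed in each group."""
    pairs = sorted(zip(scores, defaults))  # lowest scores first
    n = len(pairs)
    if n < n_groups:
        raise ValueError("need at least one client per group")
    result = []
    for g in range(n_groups):
        # Integer arithmetic keeps group sizes within one client of each other.
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        result.append({
            "group": g + 1,
            "min_score": chunk[0][0],
            "max_score": chunk[-1][0],
            "default_rate": sum(d for _, d in chunk) / len(chunk),
        })
    return result


# Toy data: 40 clients, the 10 lowest scores all default.
toy_scores = list(range(40))
toy_defaults = [1 if s < 10 else 0 for s in toy_scores]
toy_bands = default_rate_by_quantile(toy_scores, toy_defaults, n_groups=4)
```

Plotting `default_rate` against `group` reproduces the kind of chart described above.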
Step 2: Defining six risk tiers
Guided by the chart, we merge the 20 segments as follows:
- Groups 1, 2, and 3, with scores ranging from 0 to 241, represent the lowest scores and the highest risk.
- Groups 4, 5, and 6, with scores ranging from 241 to 331.
- Groups 7 and 8, with scores ranging from 332 to 498.
- Groups 9, 10, 11, and 12, with scores ranging from 498 to 589.
- Groups 13, 14, 15, 16, and 17, with scores ranging from 589 to 780.
- Groups 18, 19, and 20, with scores ranging from 781 to 1000, represent the highest scores and the lowest risk.
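Once the cut-offs are fixed, assigning a client to a tier is a lookup. The sketch below uses the upper bounds from the list above and numbers the tiers 1 (highest risk) to 6 (lowest risk). How a score that falls exactly on a boundary is treated is an assumption: here it stays in the riskier tier.

```python
from bisect import bisect_left

# Upper bounds of the first five tiers, read from the list above.
TIER_UPPER_BOUNDS = [241, 331, 498, 589, 780]


def risk_tier(score: float) -> int:
    """Return the tier number, from 1 (highest risk) to 6 (lowest risk)."""
    if not 0 <= score <= 1000:
        raise ValueError("score must lie between 0 and 1000")
    # bisect_left counts the bounds strictly below the score, so a score
    # equal to a bound stays in the riskier tier (an assumption).
    return bisect_left(TIER_UPPER_BOUNDS, score) + 1
```

The example client scored earlier, at 615.18, lands in tier 5.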

These categories must adhere to three key principles:
✓ Every category should represent a uniform level of risk;
✓ Default rates must differ by at least 30% between one category and the next;
✓ Each category must contain at least 1% of the total client base.
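Two of these principles can be checked mechanically. The sketch below assumes the 30% gap is measured relative to the riskier category's default rate, which is one possible reading. It does not test homogeneity within a category. The function name and the toy figures are illustrative.

```python
def check_grid(tiers, min_share=0.01, min_relative_gap=0.30):
    """Return a list of rule violations for a risk grid.

    tiers: list of dicts ordered from riskiest to safest, each with
    'share' (fraction of clients) and 'default_rate'.
    """
    problems = []
    for i, tier in enumerate(tiers, start=1):
        if tier["share"] < min_share:
            problems.append(f"tier {i}: holds less than {min_share:.0%} of clients")
    for i in range(len(tiers) - 1):
        riskier = tiers[i]["default_rate"]
        safer = tiers[i + 1]["default_rate"]
        if riskier <= safer:
            problems.append(f"tiers {i + 1}-{i + 2}: default rate does not decrease")
        elif (riskier - safer) / riskier < min_relative_gap:
            problems.append(f"tiers {i + 1}-{i + 2}: gap below {min_relative_gap:.0%}")
    return problems


# Made-up grid for illustration only.
toy_grid = [
    {"share": 0.15, "default_rate": 0.60},
    {"share": 0.35, "default_rate": 0.30},
    {"share": 0.50, "default_rate": 0.10},
]
toy_problems = check_grid(toy_grid)
```

An empty list means the grid passes both checks. Any message returned points to the tier or pair of tiers to revisit.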

The table above confirms that these principles are being met.
Step 3: Assessing stability
A risk grid is only reliable if it remains consistent over time. We verify two factors:
- Riskier categories must consistently show higher default rates throughout the entire history.
- The number of clients in each category must remain stable over time.

Both conditions are satisfied: risk maintains its proper order, and category sizes remain consistent.
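Both checks can be expressed as a small routine run over each observation period. In the sketch below, the period labels, the 5-point tolerance on category shares, and the toy figures are hypothetical choices made for illustration.

```python
def check_stability(history, max_share_shift=0.05):
    """Return a list of stability issues found across periods.

    history: {period: [tier, ...]} with tiers ordered from riskiest to
    safest, each a dict holding 'share' and 'default_rate'.
    """
    issues = []
    periods = sorted(history)
    # Check 1: within every period, default rates must fall from tier to tier.
    for period in periods:
        rates = [tier["default_rate"] for tier in history[period]]
        if any(a <= b for a, b in zip(rates, rates[1:])):
            issues.append(f"{period}: default rates are not ordered by risk")
    # Check 2: each tier's share of clients must stay within the tolerance.
    for k in range(len(history[periods[0]])):
        shares = [history[p][k]["share"] for p in periods]
        if max(shares) - min(shares) > max_share_shift:
            issues.append(f"tier {k + 1}: share moves by more than {max_share_shift:.0%}")
    return issues


# Made-up two-tier history for illustration only.
toy_history = {
    "2023Q1": [{"share": 0.20, "default_rate": 0.50}, {"share": 0.80, "default_rate": 0.10}],
    "2023Q2": [{"share": 0.22, "default_rate": 0.45}, {"share": 0.78, "default_rate": 0.12}],
}
toy_issues = check_stability(toy_history)
```

A stricter variant could replace the fixed tolerance with a distribution-shift measure, but the simple range check already captures the two conditions listed above.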
Conclusion
This article concludes our series on developing a scoring model. We began with the data and now finish with a risk grid.
We created a score ranging from 0 to 1000 by assigning points to each category within every variable. A client’s total score is the sum of these category points. The score effectively separates risk: defaulters and non-defaulters fall into distinctly different ranges.
The variable weights rank as follows: loan_percent_income leads at 35%, followed by home_ownership_3 at 31%, loan_int_rate at 28%, and cb_person_default_on_file last.
👉 Useful insight: the greater your income relative to your loan, the higher your score.
The final risk grid:
- 0–241: Very High Risk.
- 241–331: High Risk.
- 332–498: Medium-High Risk.
- 499–589: Medium Risk.
- 590–789: Low Risk.
- 790–1000: Very Low Risk.
I intentionally kept this article concise. We constructed the grid here using vingtiles and visual grouping, but other statistical techniques exist to divide scores into homogeneous categories. K-means, hierarchical clustering, and Weight of Evidence (WoE) all provide a more rigorous approach to achieving the same objective. That will be the focus of my next article.
Data & Licensing
The dataset used in this article is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This license permits anyone to share and adapt the dataset for any purpose, including commercial use, as long as proper attribution is provided to the source.
For further details, refer to the official license text: CC0: Public Domain.
Disclaimer
Any remaining errors or inaccuracies are the sole responsibility of the author. Feedback and corrections are greatly welcomed.



