Case Study Overview
Objective
Determine the income distribution of 100 clients from a bank to aid in the segmentation of their financial services, helping the bank better understand its customer base and tailor its offerings accordingly.
Challenge
Ensuring accurate income categorization across R$2k intervals while maintaining statistical relevance for financial segmentation. The analysis required precise midpoint assumptions for mean calculations, reconciliation of median class boundaries, and mitigation of modal group distortions. This demanded rigorous frequency distribution validation, careful handling of cumulative percentages for quartile identification, and translation of statistical outputs into actionable banking strategies that balance risk management (15% base tier) with revenue optimization (20% premium tier), all while maintaining GDPR-compliant data handling for sensitive financial information.
Context
You are a manager or supervisor working at a financial consultancy and have been tasked with analysing the monthly income of a sample of 100 clients from a bank. The goal is to understand the distribution of the clients' incomes in order to help the bank better segment its financial services.
After collecting the income data from each client, you created a frequency distribution table to summarise and visualise the data:
Income Range R$ | Frequency |
---|---|
0 --\| 2000 | 15 |
2000 --\| 4000 | 30 |
4000 --\| 6000 | 25 |
6000 --\| 8000 | 20 |
8000 --\| 10000 | 10 |
Frequency Distribution Table: Monthly Income
Source: Author (fictional data).
From the data, you need to perform the necessary calculations and answer the following questions:
Complete the frequency distribution table as per the model below:
Income Range R$ | Frequency | Relative Frequency | Relative Frequency % | Cumulative Absolute Frequency |
---|---|---|---|---|
0 --\| 2000 | 15 | | | |
2000 --\| 4000 | 30 | | | |
4000 --\| 6000 | 25 | | | |
6000 --\| 8000 | 20 | | | |
8000 --\| 10000 | 10 | | | |
Source: Author (fictional data).
Income Range R$ | Frequency | Relative Frequency | Relative Frequency % | Cumulative Absolute Frequency |
---|---|---|---|---|
0 --\| 2000 | 15 | 0.15 | 15% | 15 |
2000 --\| 4000 | 30 | 0.30 | 30% | 45 |
4000 --\| 6000 | 25 | 0.25 | 25% | 70 |
6000 --\| 8000 | 20 | 0.20 | 20% | 90 |
8000 --\| 10000 | 10 | 0.10 | 10% | 100 |
The calculations are performed as follows:
- Relative Frequency: The absolute frequency is divided by the total number of observations.
- Percentage Relative Frequency: The relative frequency is multiplied by 100.
- Cumulative Absolute Frequency: The current absolute frequency is added to the previous cumulative absolute frequency.
Income Range: 0 - 2000
Relative Frequency:
\( f(rel\ 0-2000) = \frac{f_i}{n} \)
\( f(rel\ 0-2000) = \frac{15}{100} \)
\( f(rel\ 0-2000) = 0.15 \)
Percentage Relative Frequency:
\( f(rel\%\ 0-2000) = f(rel\ 0-2000) \times 100 \)
\( f(rel\%\ 0-2000) = 0.15 \times 100 \)
\( f(rel\%\ 0-2000) = 15\% \)
Cumulative Absolute Frequency:
\( f(Ac\ 0-2000) = f_i + f_{Ac-1} \)
\( f(Ac\ 0-2000) = 15 + 0 \)
\( f(Ac\ 0-2000) = 15 \)
Income Range: 2000 - 4000
Relative Frequency:
\( f(rel\ 2000-4000) = \frac{30}{100} \)
\( f(rel\ 2000-4000) = 0.30 \)
Percentage Relative Frequency:
\( f(rel\%\ 2000-4000) = 0.30 \times 100 \)
\( f(rel\%\ 2000-4000) = 30\% \)
Cumulative Absolute Frequency:
\( f(Ac\ 2000-4000) = 30 + 15 \)
\( f(Ac\ 2000-4000) = 45 \)
Income Range: 4000 - 6000
Relative Frequency:
\( f(rel\ 4000-6000) = \frac{25}{100} \)
\( f(rel\ 4000-6000) = 0.25 \)
Percentage Relative Frequency:
\( f(rel\%\ 4000-6000) = 0.25 \times 100 \)
\( f(rel\%\ 4000-6000) = 25\% \)
Cumulative Absolute Frequency:
\( f(Ac\ 4000-6000) = 25 + 45 \)
\( f(Ac\ 4000-6000) = 70 \)
Income Range: 6000 - 8000
Relative Frequency:
\( f(rel\ 6000-8000) = \frac{20}{100} \)
\( f(rel\ 6000-8000) = 0.20 \)
Percentage Relative Frequency:
\( f(rel\%\ 6000-8000) = 0.20 \times 100 \)
\( f(rel\%\ 6000-8000) = 20\% \)
Cumulative Absolute Frequency:
\( f(Ac\ 6000-8000) = 20 + 70 \)
\( f(Ac\ 6000-8000) = 90 \)
Income Range: 8000 - 10000
Relative Frequency:
\( f(rel\ 8000-10000) = \frac{10}{100} \)
\( f(rel\ 8000-10000) = 0.10 \)
Percentage Relative Frequency:
\( f(rel\%\ 8000-10000) = 0.10 \times 100 \)
\( f(rel\%\ 8000-10000) = 10\% \)
Cumulative Absolute Frequency:
\( f(Ac\ 8000-10000) = 10 + 90 \)
\( f(Ac\ 8000-10000) = 100 \)
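The three column calculations can also be checked programmatically. The following is a minimal Python sketch, not part of the original exercise; the variable names (`classes`, `freqs`, `rel`, `rel_pct`, `cum`) are illustrative assumptions.

```python
from itertools import accumulate

# Class intervals (R$) and absolute frequencies taken from the table.
classes = [(0, 2000), (2000, 4000), (4000, 6000), (6000, 8000), (8000, 10000)]
freqs = [15, 30, 25, 20, 10]

n = sum(freqs)                           # total number of observations
rel = [f / n for f in freqs]             # relative frequency
rel_pct = [100 * f / n for f in freqs]   # percentage relative frequency
cum = list(accumulate(freqs))            # cumulative absolute frequency

# Print one row per class, mirroring the completed table.
for (lo, hi), f, r, p, c in zip(classes, freqs, rel, rel_pct, cum):
    print(f"{lo:>5} - {hi:<5} | {f:>3} | {r:.2f} | {p:.0f}% | {c:>3}")
```

The last cumulative value should equal the sample size, which is a quick sanity check on the whole table.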
Graphical Representation
For the graphical visualisation and application of other concepts studied in the discipline, below is the histogram of the income distribution based on absolute frequencies.
Frequency Distribution Chart: Monthly Income
Source: Prepared by the student
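Because the chart is built directly from the absolute frequencies, a rough text rendering can be sketched in a few lines of Python. This is only an illustrative stand-in for the chart, assuming one `#` per client; the names `classes`, `freqs`, and `rows` are hypothetical.

```python
# Class intervals (R$) and absolute frequencies taken from the table.
classes = [(0, 2000), (2000, 4000), (4000, 6000), (6000, 8000), (8000, 10000)]
freqs = [15, 30, 25, 20, 10]

# One text bar per class; bar length equals the absolute frequency.
rows = [f"{lo:>5}-{hi:<5} | {'#' * f} {f}" for (lo, hi), f in zip(classes, freqs)]

for row in rows:
    print(row)
```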
Questions
a. What is the average income of the sample clients?
The average income of the sample customers is calculated as follows:
\( \bar{x} = \frac{\sum (x_j \cdot f_j)}{\sum f_j} \)
\( \bar{x} = \frac{(15 \cdot 1000) + (30 \cdot 3000) + (25 \cdot 5000) + (20 \cdot 7000) + (10 \cdot 9000)}{100} \)
\( \bar{x} = \frac{15000 + 90000 + 125000 + 140000 + 90000}{100} \)
\( \bar{x} = 4600 \)
The average income of the sample customers is R$ 4600.00.
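The same weighted mean can be reproduced with a short Python sketch. It assumes, as the calculation above does, that every client in a class earns the class midpoint; the variable names are illustrative.

```python
# Class intervals (R$) and absolute frequencies taken from the table.
classes = [(0, 2000), (2000, 4000), (4000, 6000), (6000, 8000), (8000, 10000)]
freqs = [15, 30, 25, 20, 10]

# Midpoint of each class stands in for the incomes inside it.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Weighted mean: sum of (midpoint * frequency) divided by total frequency.
mean_income = sum(x * f for x, f in zip(midpoints, freqs)) / sum(freqs)

print(mean_income)  # 4600.0
```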
b. What is the median income of the sample clients?
The median income is calculated as follows:
\( \tilde{x} = l_{Md} + \frac{\left(\frac{n}{2} - f_{AC_{Md-1}}\right) \cdot h}{f_{Md}} \)
\( \tilde{x} = 4000 + \frac{(50 - 45) \cdot 2000}{25} \)
\( \tilde{x} = 4400 \)
The median income of the sample customers is R$ 4400.00.
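As a cross-check, the grouped-data median formula can be sketched in Python. The function name `grouped_median` is hypothetical, and the sketch assumes incomes are spread evenly within each class, which is what the interpolation formula implies.

```python
def grouped_median(classes, freqs):
    """Median of grouped data by linear interpolation inside the median class."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cum_before = 0  # cumulative frequency of the classes before the current one
    for (lo, hi), f in zip(classes, freqs):
        # The median class is the first whose cumulative frequency reaches n/2.
        if cum_before + f >= half:
            return lo + (half - cum_before) * (hi - lo) / f
        cum_before += f


classes = [(0, 2000), (2000, 4000), (4000, 6000), (6000, 8000), (8000, 10000)]
freqs = [15, 30, 25, 20, 10]

median_income = grouped_median(classes, freqs)
print(median_income)  # 4400.0
```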
c. What is the most popular income range of the sample clients?
The modal income is calculated as follows:
\( \hat{x} = l_{Mo} + \frac{(f_{Mo} - f_{Mo-1}) \cdot h}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})} \)
\( \hat{x} = 2000 + \frac{(30 - 15) \cdot 2000}{(30 - 15) + (30 - 25)} \)
\( \hat{x} = 3500 \)
The modal class is R$ 2000 to R$ 4000 (30 clients), and the estimated most common income within it is R$ 3500.00.
d. How can this information be useful to the bank in segmenting its financial services?
Knowledge of the mean, median, and mode of the bank's customers can help the bank offer products tailored to its customers' income ranges, such as insurance policies at an acceptable price, loans with affordable instalments, and investment plans that are attractive to these customers.
Additionally, this knowledge can aid in risk analysis of customers and help the bank decide on the limits it can set for customers, reducing the risk of default. Credit card limits, overdrafts, personal loan amounts, mortgages, and financing can all be adjusted based on this analysis. While this analysis should be done on a customer-by-customer basis and consider other factors, it can help define an acceptable risk based on the overall data.
Furthermore, by understanding the existing customer base, the bank can direct its marketing efforts towards people within this income range or attempt to attract customers from other income ranges if desired.
Analytical Outcomes
Data Insights Achievements
- 45% of clients earn up to R$4k (cumulative frequency through the R$2k-4k bracket)
- R$4,600 mean vs R$4,400 median income, with the mean slightly above the median
- 30% of clients in the modal R$2k-4k bracket
- 20% of clients in the R$6k-8k bracket
- 15% of clients in the R$0-2k base tier, indicating financial inclusion potential
- 90th percentile at R$8k income threshold
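The percentile figure can be verified with the same interpolation used for the median, generalised to any percentile. This is a minimal sketch; the function name `grouped_percentile` is hypothetical, and it assumes incomes are spread evenly within each class.

```python
def grouped_percentile(classes, freqs, p):
    """p-th percentile (0 < p <= 100) of grouped data by linear interpolation."""
    n = sum(freqs)
    target = p * n / 100  # position of the percentile in the ordered sample
    cum_before = 0        # cumulative frequency of the classes before the current one
    for (lo, hi), f in zip(classes, freqs):
        # Skip empty classes; stop at the first class that reaches the target.
        if f > 0 and cum_before + f >= target:
            return lo + (target - cum_before) * (hi - lo) / f
        cum_before += f
    raise ValueError("percentile outside the distribution")


classes = [(0, 2000), (2000, 4000), (4000, 6000), (6000, 8000), (8000, 10000)]
freqs = [15, 30, 25, 20, 10]

p90 = grouped_percentile(classes, freqs, 90)
q3 = grouped_percentile(classes, freqs, 75)
print(p90)  # 8000.0
print(q3)   # 6500.0
```

Calling the function with `p=50` reproduces the median of R$ 4400.00.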
Method Validation
The statistical analysis demonstrated:
- Effective use of Czuber's formula for modal determination
- Accurate median calculation through cumulative frequencies
- Precise weighted mean computation per income brackets