Francis Burnet – AI Engineering Portfolio

Capstone portfolio spanning AI engineering, applied data science, machine learning, and deep learning.

Capstone 7 Evidence Map

Capstone Summary

This documentation details Capstone 7 of the Microsoft AI Engineering Program 2026, a workflow for unsupervised machine learning. The project uses a credit card dataset, applying Principal Component Analysis (PCA) for dimensionality reduction and K-means clustering to identify distinct user patterns. Key technical steps include data preprocessing, such as scaling and handling missing values, followed by variance plots and elbow curves to determine optimal model parameters. The results are preserved through visual artifacts and data exports that summarize cluster assignments and feature covariances. This page serves as a comprehensive record of technical evidence, mapping specific project requirements to executed Python code and analytical findings.
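
At a glance, the workflow reduces to a handful of scikit-learn calls. A minimal sketch of the pipeline follows; the column names, fill strategy, and parameter choices mirror the notebook embedded later on this page:

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Median-fill, scale, project to 2 PCA components, then cluster the projection.
features = pd.read_csv('CC GENERAL.csv').drop(columns=['CUST_ID'])
features = features.fillna(features.median())
scaled = StandardScaler().fit_transform(features)
pca_features = PCA(n_components=2, random_state=42).fit_transform(scaled)
labels = KMeans(n_clusters=5, random_state=42, n_init=20).fit_predict(pca_features)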

Capstone 7 Scope

Capstone 7 turns the copied clustering assignment into an executed PCA and KMeans workflow with saved variance, elbow, and cluster-scatter outputs.

Primary staged dataset: CC GENERAL.csv.

Notebook evidence plus covariance, cluster assignment, and summary exports are staged under outputs/.

Original Project PDF

The copied project directions are embedded here for direct comparison against the notebook and output artifacts.

Requirement Checklist

1a

Cluster the credit card users into different groups to find meaningful patterns.

Source mapping: Requirements file

1b

Use Principal Component Analysis (PCA) to reduce the dimension of the feature space.

Source mapping: Requirements file

1c

Use the K-means algorithm to find clusters.

Source mapping: Requirements file

1d

Import relevant Python libraries.

Source mapping: Requirements file

1e

Load dataset `CC GENERAL.csv`.

Source mapping: Requirements file

1f

Check for null values.

Source mapping: Requirements file

1g

Handle the null values.

Source mapping: Requirements file

1h

Perform feature scaling using `StandardScaler`.

Source mapping: Requirements file

1i

Perform PCA with all columns.

Source mapping: Requirements file

1j

Plot number of components versus PCA cumulative explained variance.

Source mapping: Requirements file

1k

Identify the number of components required to cover 85 percent of the variance.

Source mapping: Requirements file

1l

Perform PCA with 2 principal components for clustering visualization.

Source mapping: Requirements file

1m

Find the 2 columns with the highest covariance.

Source mapping: Requirements file

1n

Interpret the PCA results by looking at the covariance matrix using `get_covariance()`.

Source mapping: Requirements file

1o

Perform K Means clustering on the 2-component PCA transformed data with clusters ranging from 2 to 11.

Source mapping: Requirements file

1p

Plot K Means inertia against the number of clusters using the Elbow Method.

Source mapping: Requirements file

1q

Identify the ideal number of clusters from the elbow plot.

Source mapping: Requirements file

1r

Perform K Means clustering on the 2-component PCA transformed data using the ideal number of clusters.

Source mapping: Requirements file

1s

Visualize the clusters on a scatter plot between the 1st and 2nd PCA components using different colors for each cluster.

Source mapping: Requirements file

Requirement Walkthrough

Each walkthrough block maps the copied PDF requirements to the executed notebook cells, exported outputs, and reviewable evidence staged with this capstone.

7a

Clean The Credit Card Dataset And Scale The Features

Notebook section: Load, null-handling, and scaling cells

Requirement: Load the dataset, check nulls, handle missing values, and apply StandardScaler before PCA.

The notebook removes the identifier column, fills the missing credit-limit and minimum-payment fields with medians, and scales the remaining numeric feature set for PCA.

Results Capture
  • Missing values filled with the column median: MINIMUM_PAYMENTS (313), CREDIT_LIMIT (1).
  • Scaling is applied to the full feature matrix before PCA fitting.
from sklearn.preprocessing import StandardScaler

# Drop the ID column, median-fill missing values, then standardize for PCA.
working_df = df.drop(columns=['CUST_ID']).copy()
for column in working_df.columns:
    if working_df[column].isna().any():
        working_df[column] = working_df[column].fillna(working_df[column].median())
scaler = StandardScaler()
scaled = scaler.fit_transform(working_df)
7b

Use PCA To Measure Variance Coverage And Covariance

Notebook section: Full PCA and covariance cells

Requirement: Plot cumulative explained variance, identify the components covering 85% of variance, and inspect covariance structure.

The notebook exports the cumulative explained-variance curve, records the component count needed to cover 85% variance, and saves the PCA covariance matrix as a CSV artifact.

Results Capture
  • Components needed to cover 85% variance: 8.
  • Top covariance pair: PURCHASES and ONEOFF_PURCHASES.
import numpy as np
from sklearn.decomposition import PCA

# Fit PCA on all scaled features and count components reaching 85% variance.
pca_full = PCA()
pca_full.fit(scaled)
explained = np.cumsum(pca_full.explained_variance_ratio_)
components_needed = int(np.argmax(explained >= 0.85) + 1)
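
The covariance inspection itself is exported rather than plotted; a condensed sketch of that step, mirroring notebook Cell 7 and assuming the pca_full and working_df objects defined above:

import numpy as np
import pandas as pd

# Label the PCA covariance matrix, then rank the upper-triangle entries to
# find the column pair with the strongest covariance.
covariance = pd.DataFrame(pca_full.get_covariance(), index=working_df.columns, columns=working_df.columns)
upper = covariance.where(np.triu(np.ones(covariance.shape), k=1).astype(bool))
top_pair = upper.stack().abs().sort_values(ascending=False).index[0]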
Associated Artifact

PCA Explained Variance

Saved cumulative explained-variance chart.

Associated Artifact

KMeans Elbow

Saved elbow-method chart for cluster count selection.

7c

Run KMeans And Visualize The Final Clusters

Notebook section: KMeans clustering and scatter-plot cells

Requirement: Use the elbow method to select the cluster count, run KMeans on the 2-component PCA space, and plot the clusters.

The notebook measures inertia across 2 to 11 clusters, selects the elbow point, and exports the final 2D PCA scatter plot plus the cluster assignments CSV.

Results Capture
  • Selected cluster count: 5.
  • Cluster sizes for clusters 0 through 4: 3502, 3247, 1038, 81, 1082.
  • Cluster assignments are exported as CSV for site review.
import pandas as pd
from sklearn.cluster import KMeans

# Final KMeans fit at the elbow-selected count (see the sweep sketch below).
final_model = KMeans(n_clusters=ideal_clusters, random_state=42, n_init=20)
clusters = final_model.fit_predict(pca_features)
cluster_df = pd.DataFrame({'pca_1': pca_features[:, 0], 'pca_2': pca_features[:, 1], 'cluster': clusters})
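
For context, the inertia sweep feeding the elbow selection, condensed from notebook Cell 8 and assuming pca_features from the 2-component PCA fit:

from sklearn.cluster import KMeans

# Record inertia for each candidate cluster count from 2 through 11.
inertias = []
for n_clusters in range(2, 12):
    model = KMeans(n_clusters=n_clusters, random_state=42, n_init=20)
    model.fit(pca_features)
    inertias.append(float(model.inertia_))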
Associated Artifact

Cluster Scatter Plot

Saved 2D PCA cluster scatter plot.

Colab Notebook

This section provides the notebook preview, launch link, and project file links.

The notebook opens in Google Colab when a launch URL is configured, and the project files and outputs remain available here on the site.
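
For reference, Colab opens GitHub-hosted notebooks at a predictable URL, so a launch link can be derived from the same repo constants the notebook's setup cell defines. A sketch, with the notebook path taken from the artifact name listed under Key Outputs (an assumption, not the site's actual launch configuration):

from urllib.parse import quote

# Hypothetical launch-link builder using Colab's GitHub URL scheme.
def build_colab_launch_url(owner: str, repo: str, branch: str, notebook_path: str) -> str:
    encoded = quote(notebook_path, safe='/')
    return f"https://colab.research.google.com/github/{owner}/{repo}/blob/{branch}/{encoded}"

# Example with this capstone's repo constants and the assumed notebook path.
print(build_colab_launch_url('FrancisBurnet', 'francisburnet', 'main',
                             'Incremental Capstones/Machine Learning Using Python/Capstone Session 7/capstone_session_7.ipynb'))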

Capstone 7 Notebook Workspace
Launch Colab
Embedded Notebook Preview
Cell 1 Markdown

Capstone Session 7

This notebook is generated from the copied Capstone_Session_7.pdf directions and the staged CC GENERAL.csv dataset.

Cell 2 Markdown

Objective

Cluster credit card users using PCA and K-means while preserving the requirement order from the copied PDF.

Cell 3 Code · python
from pathlib import Path
import json
import sys
from urllib.parse import quote

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from IPython.display import display
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

IS_COLAB = 'google.colab' in sys.modules
GITHUB_REPO_OWNER = 'FrancisBurnet'
GITHUB_REPO_NAME = 'francisburnet'
GITHUB_REPO_BRANCH = 'main'
CAPSTONE_ROOT = Path('Incremental Capstones/Machine Learning Using Python/Capstone Session 7')
DATASET_FILENAME = 'CC GENERAL.csv'


def build_raw_github_url(relative_path: Path) -> str:
    encoded_path = quote(relative_path.as_posix(), safe='/')
    return (
        f"https://raw.githubusercontent.com/{GITHUB_REPO_OWNER}/{GITHUB_REPO_NAME}/"
        f"{GITHUB_REPO_BRANCH}/{encoded_path}"
    )


def resolve_capstone_dir() -> Path | None:
    # Walk from the working directory up through its parents, returning the
    # first path that either ends with the capstone folder or contains it.
    current = Path.cwd().resolve()
    capstone_parts = CAPSTONE_ROOT.parts
    for candidate in [current, *current.parents]:
        if len(candidate.parts) >= len(capstone_parts) and candidate.parts[-len(capstone_parts):] == capstone_parts:
            return candidate
        nested_candidate = candidate / CAPSTONE_ROOT
        if nested_candidate.exists():
            return nested_candidate
    return None


CAPSTONE_DIR = resolve_capstone_dir()
DATASET_URL = build_raw_github_url(CAPSTONE_ROOT / DATASET_FILENAME)

if CAPSTONE_DIR is not None:
    OUTPUT_ROOT = CAPSTONE_DIR
    OUTPUT_MODE = 'permanent capstone outputs'
    OUTPUT_DISPLAY = (CAPSTONE_ROOT / 'outputs').as_posix()
else:
    runtime_root = Path('/content/capstone-session-7-runtime') if IS_COLAB else Path.cwd().resolve() / 'capstone-session-7-runtime'
    OUTPUT_ROOT = runtime_root
    OUTPUT_MODE = 'runtime scratch outputs; export final artifacts back into the capstone outputs folder'
    OUTPUT_DISPLAY = 'capstone-session-7-runtime/outputs'

OUTPUTS_DIR = (OUTPUT_ROOT / 'outputs').resolve()
PLOTS_DIR = OUTPUTS_DIR / 'plots'
OUTPUTS_DIR.mkdir(parents=True, exist_ok=True)
PLOTS_DIR.mkdir(parents=True, exist_ok=True)
sns.set_theme(style='whitegrid')
pd.set_option('display.max_columns', 100)

print('Runtime:', 'Google Colab' if IS_COLAB else 'Notebook runtime')
print('Capstone artifact path:', CAPSTONE_ROOT.as_posix())
print('Dataset source:', DATASET_URL)
print('Output mode:', OUTPUT_MODE)
print('Output target:', OUTPUT_DISPLAY)
Cell 4 Code · python
df = pd.read_csv(DATASET_URL)
display(df.head())
print('Dataset source used:', DATASET_URL)
print('Shape:', df.shape)
print(df.isna().sum().sort_values(ascending=False).head())
Output
  CUST_ID      BALANCE  BALANCE_FREQUENCY  PURCHASES  ONEOFF_PURCHASES  \
0  C10001    40.900749           0.818182      95.40              0.00   
1  C10002  3202.467416           0.909091       0.00              0.00   
2  C10003  2495.148862           1.000000     773.17            773.17   
3  C10004  1666.670542           0.636364    1499.00           1499.00   
4  C10005   817.714335           1.000000      16.00             16.00   

   INSTALLMENTS_PURCHASES  CASH_ADVANCE  PURCHASES_FREQUENCY  \
0                    95.4      0.000000             0.166667   
1                     0.0   6442.945483             0.000000   
2                     0.0      0.000000             1.000000   
3                     0.0    205.788017             0.083333   
4                     0.0      0.000000             0.083333   

   ONEOFF_PURCHASES_FREQUENCY  PURCHASES_INSTALLMENTS_FREQUENCY  \
0                    0.000000                          0.083333   
1                    0.000000                          0.000000   
2                    1.000000                          0.000000   
3                    0.083333                          0.000000   
4                    0.083333                          0.000000   

   CASH_ADVANCE_FREQUENCY  CASH_ADVANCE_TRX  PURCHASES_TRX  CREDIT_LIMIT  \
0                0.000000                 0              2        1000.0   
1                0.250000                 4              0        7000.0   
2                0.000000                 0             12        7500.0   
3                0.083333                 1              1        7500.0   
4                0.000000                 0              1        1200.0   

      PAYMENTS  MINIMUM_PAYMENTS  PRC_FULL_PAYMENT  TENURE  
0   201.802084        139.509787          0.000000      12  
1  4103.032597       1072.340217          0.222222      12  
2   622.066742        627.284787          0.000000      12  
3     0.000000               NaN          0.000000      12  
4   678.334763        244.791237          0.000000      12  
Shape: (8950, 18)
MINIMUM_PAYMENTS     313
CREDIT_LIMIT           1
BALANCE                0
CUST_ID                0
BALANCE_FREQUENCY      0
dtype: int64
Cell 5 Code · python
working_df = df.drop(columns=['CUST_ID']).copy()
for column in working_df.columns:
    if working_df[column].isna().any():
        working_df[column] = working_df[column].fillna(working_df[column].median())
print('Remaining null values:', int(working_df.isna().sum().sum()))
Output
Remaining null values: 0
Cell 6 Code · python
scaler = StandardScaler()
scaled = scaler.fit_transform(working_df)
pca_full = PCA()
pca_full.fit(scaled)
explained = np.cumsum(pca_full.explained_variance_ratio_)
components_needed = int(np.argmax(explained >= 0.85) + 1)
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(range(1, len(explained) + 1), explained, marker='o')
ax.axhline(0.85, color='red', linestyle='--')
ax.axvline(components_needed, color='green', linestyle='--')
ax.set_title('PCA Cumulative Explained Variance')
ax.set_xlabel('Components')
ax.set_ylabel('Cumulative Explained Variance')
fig.tight_layout()
fig.savefig(PLOTS_DIR / 'pca_explained_variance.png', dpi=150)
plt.show()
plt.close(fig)
components_needed
Output
<Figure size 1000x500 with 1 Axes>
8
Cell 7 Code · python
pca_two = PCA(n_components=2, random_state=42)
pca_features = pca_two.fit_transform(scaled)
covariance = pd.DataFrame(pca_full.get_covariance(), index=working_df.columns, columns=working_df.columns)
covariance.to_csv(OUTPUTS_DIR / 'session_7_covariance_matrix.csv')
upper_triangle = covariance.where(np.triu(np.ones(covariance.shape), k=1).astype(bool))
top_pair = upper_triangle.stack().abs().sort_values(ascending=False).index[0]
{'top_covariance_pair': top_pair, 'value': float(covariance.loc[top_pair[0], top_pair[1]])}
Output
{'top_covariance_pair': ('PURCHASES', 'ONEOFF_PURCHASES'),
 'value': 0.9169470108951325}
Cell 8 Code · python
cluster_range = list(range(2, 12))
inertias = []
for n_clusters in cluster_range:
    model = KMeans(n_clusters=n_clusters, random_state=42, n_init=20)
    model.fit(pca_features)
    inertias.append(float(model.inertia_))

# Elbow heuristic: pick the point on the inertia curve farthest, by
# perpendicular distance, from the straight line joining its two endpoints.
x = np.array(cluster_range)
y = np.array(inertias)
line_start = np.array([x[0], y[0]])
line_end = np.array([x[-1], y[-1]])
line_vector = line_end - line_start
line_vector = line_vector / np.linalg.norm(line_vector)
points = np.column_stack([x, y])
distances = []
for point in points:
    # Distance from each curve point to its projection onto the endpoint line.
    vector = point - line_start
    projection = line_start + np.dot(vector, line_vector) * line_vector
    distances.append(np.linalg.norm(point - projection))
ideal_clusters = int(x[int(np.argmax(distances))])

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(cluster_range, inertias, marker='o')
ax.axvline(ideal_clusters, color='red', linestyle='--')
ax.set_title('KMeans Elbow Curve')
ax.set_xlabel('Number of clusters')
ax.set_ylabel('Inertia')
fig.tight_layout()
fig.savefig(PLOTS_DIR / 'kmeans_elbow.png', dpi=150)
plt.show()
plt.close(fig)
ideal_clusters
Output
<Figure size 1000x500 with 1 Axes>
5
Cell 9 Code · python
final_model = KMeans(n_clusters=ideal_clusters, random_state=42, n_init=20)
clusters = final_model.fit_predict(pca_features)
cluster_df = pd.DataFrame({'pca_1': pca_features[:, 0], 'pca_2': pca_features[:, 1], 'cluster': clusters})
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(data=cluster_df, x='pca_1', y='pca_2', hue='cluster', palette='tab10', ax=ax)
ax.set_title('KMeans Clusters on 2-Component PCA Space')
fig.tight_layout()
fig.savefig(PLOTS_DIR / 'cluster_scatter.png', dpi=150)
plt.show()
plt.close(fig)
cluster_sizes = cluster_df['cluster'].value_counts().sort_index().to_dict()
cluster_df.head()
Output
<Figure size 1000x600 with 1 Axes>
      pca_1     pca_2  cluster
0 -1.683649 -1.072241        0
1 -1.134085  2.509150        4
2  0.969395 -0.383577        1
3 -0.888220  0.004648        0
4 -1.600021 -0.683795        0
Cell 10 Code · python
cluster_df.to_csv(OUTPUTS_DIR / 'session_7_cluster_assignments.csv', index=False)
summary = {
    'dataset_shape': list(df.shape),
    'filled_missing_values': {'MINIMUM_PAYMENTS': int(df['MINIMUM_PAYMENTS'].isna().sum()), 'CREDIT_LIMIT': int(df['CREDIT_LIMIT'].isna().sum())},
    'components_for_85_percent_variance': components_needed,
    'top_covariance_pair': {'columns': list(top_pair), 'value': float(covariance.loc[top_pair[0], top_pair[1]])},
    'cluster_range': cluster_range,
    'inertias': inertias,
    'ideal_clusters': ideal_clusters,
    'cluster_sizes': cluster_sizes,
}
with open(OUTPUTS_DIR / 'session_7_summary.json', 'w', encoding='utf-8') as handle:
    json.dump(summary, handle, indent=2)
summary
Output
{'dataset_shape': [8950, 18],
 'filled_missing_values': {'MINIMUM_PAYMENTS': 313, 'CREDIT_LIMIT': 1},
 'components_for_85_percent_variance': 8,
 'top_covariance_pair': {'columns': ['PURCHASES', 'ONEOFF_PURCHASES'],
  'value': 0.9169470108951325},
 'cluster_range': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
 'inertias': [49682.0510019688,
  33031.5387265398,
  24544.39226676279,
  19474.98317130457,
  16227.989956199026,
  13819.288934398468,
  12324.670804661473,
  10919.28112800173,
  9894.139885607852,
  8959.441443049795],
 'ideal_clusters': 5,
 'cluster_sizes': {0: 3502, 1: 3247, 2: 1038, 3: 81, 4: 1082}}
Project Notes
  • Null handling and feature scaling.
  • PCA explained-variance review and covariance export.
  • KMeans elbow analysis and final cluster scatter plot.
  • Cluster assignment and summary exports.
Launch Controls

Notebook Launch

Open the matching notebook in Google Colab or review the tracked notebook source in GitHub.

Project File Links
  • Notebook File: Open Notebook File
    Executed Session 7 notebook for the copied PCA and KMeans workflow.
  • Source Dataset: Open Source Dataset
    Original credit-card dataset staged with the copied capstone files.
  • Cluster Assignments CSV: Open Cluster Assignments CSV
    Exported cluster labels in the 2-component PCA space.
  • Summary JSON: Open Summary JSON
    Structured summary of variance coverage, covariance, and final cluster counts.

Outputs And Results

Key Outputs
  • Executed notebook artifact saved as capstone_session_7.ipynb.
  • CSV exports include the covariance matrix and the final cluster assignments.
  • Plot artifacts cover cumulative explained variance, elbow selection, and the final cluster scatter view.
Key Findings
  • The 85% variance threshold is reached with 8 principal components.
  • The elbow method selects 5 clusters.
  • The page surfaces both the PCA reasoning and the final cluster evidence from the copied Session 7 workflow.
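
These figures can be re-checked against the exported artifacts directly; a small sketch that reloads the summary JSON, assuming the outputs layout created by the notebook:

import json
from pathlib import Path

# Reload the exported summary and print the two headline findings.
summary = json.loads(Path('outputs/session_7_summary.json').read_text(encoding='utf-8'))
print('Components for 85% variance:', summary['components_for_85_percent_variance'])
print('Ideal clusters:', summary['ideal_clusters'])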