{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1824e88f",
   "metadata": {},
   "source": [
    "# Capstone 3 — Session 3: Data Analysis with Pandas\n",
    "\n",
    "**Run timestamp:** `2026-02-19 01:49:22`\n",
    "\n",
    "## Goal\n",
    "- Perform structured exploratory analysis on the Capstone 2 processed dataset using categorical profiling, pivot analysis, and distribution tables.\n",
    "- Produce interpretable tables for health, region, demographic, and income patterns to support downstream business insights.\n",
    "\n",
    "## Inputs\n",
    "- `NSMES1988updated.csv` (primary source from `Capstone 2/outputs/`)\n",
    "\n",
    "## Outputs\n",
    "- All exports go to `./outputs/` (and plots to `./outputs/plots/` when applicable)\n",
    "\n",
    "## Libraries (documented)\n",
    "- `pandas`: needed for pivots, crosstabs, grouping, and distribution tables; enabled all required session analyses.\n",
    "\n",
    "## Key dataset note\n",
    "- `age` is encoded as **Age in years (divided by 10)** (e.g., `6.9` = 69 years)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16523c1d",
   "metadata": {},
   "source": [
    "## C?-T0 — Runtime setup (paths + output folders)\n",
    "I used this setup block first to configure reproducible paths and output folders."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "f231dff1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Capstone: 3 | Session: Session 3: Data Analysis with Pandas\n",
      "Run timestamp: 2026-05-12 03:57:44\n",
      "Runtime: Local / notebook runtime\n",
      "Base directory: x:\\SIMPLILEARN\\FrancisBurnetCom\\Incremental Capstones\\Applied Data Science with Python\\Capstone 3\n",
      "Input directory: x:\\SIMPLILEARN\\FrancisBurnetCom\\Incremental Capstones\\Applied Data Science with Python\\Capstone 3\\inputs\n",
      "Output directory: x:\\SIMPLILEARN\\FrancisBurnetCom\\Incremental Capstones\\Applied Data Science with Python\\Capstone 3\\outputs\n",
      "Plots directory: x:\\SIMPLILEARN\\FrancisBurnetCom\\Incremental Capstones\\Applied Data Science with Python\\Capstone 3\\outputs\\plots\n"
     ]
    }
   ],
   "source": [
    "from pathlib import Path\n",
    "from datetime import datetime\n",
    "from urllib.parse import quote\n",
    "from urllib.request import urlretrieve\n",
    "import os\n",
    "import sys\n",
    "\n",
    "try:\n",
    "    from IPython.display import display\n",
    "except Exception:\n",
    "    def display(value):\n",
    "        print(value)\n",
    "\n",
    "# --- Project metadata ---\n",
    "CAPSTONE = 3\n",
    "SESSION_TITLE = 'Session 3: Data Analysis with Pandas'\n",
    "IS_COLAB = 'google.colab' in sys.modules\n",
    "RAW_BASE = os.environ.get('FRANCISBURNET_RAW_BASE', 'https://raw.githubusercontent.com/FrancisBurnet/francisburnet/main')\n",
    "\n",
    "print(f\"Capstone: {CAPSTONE} | Session: {SESSION_TITLE}\")\n",
    "print(\"Run timestamp:\", datetime.now().strftime(\"%Y-%m-%d %H:%M:%S\"))\n",
    "print(\"Runtime:\", \"Google Colab\" if IS_COLAB else \"Local / notebook runtime\")\n",
    "\n",
    "CWD = Path.cwd()\n",
    "if IS_COLAB:\n",
    "    BASE_DIR = Path('/content/francisburnet_capstone_3')\n",
    "elif (CWD / f\"Capstone {CAPSTONE}\").exists():\n",
    "    BASE_DIR = CWD / f\"Capstone {CAPSTONE}\"\n",
    "elif CWD.name == f\"Capstone {CAPSTONE}\":\n",
    "    BASE_DIR = CWD\n",
    "elif (CWD / 'Incremental_Capstone' / f\"Capstone {CAPSTONE}\").exists():\n",
    "    BASE_DIR = CWD / 'Incremental_Capstone' / f\"Capstone {CAPSTONE}\"\n",
    "elif (CWD / 'Incremental Capstones' / 'Applied Data Science with Python' / f\"Capstone {CAPSTONE}\").exists():\n",
    "    BASE_DIR = CWD / 'Incremental Capstones' / 'Applied Data Science with Python' / f\"Capstone {CAPSTONE}\"\n",
    "else:\n",
    "    BASE_DIR = CWD\n",
    "\n",
    "INPUT_DIR = BASE_DIR / 'inputs'\n",
    "OUTPUT_DIR = BASE_DIR / 'outputs'\n",
    "PLOTS_DIR = OUTPUT_DIR / 'plots'\n",
    "INPUT_DIR.mkdir(parents=True, exist_ok=True)\n",
    "OUTPUT_DIR.mkdir(parents=True, exist_ok=True)\n",
    "PLOTS_DIR.mkdir(parents=True, exist_ok=True)\n",
    "\n",
    "def first_existing_path(candidates):\n",
    "    for candidate in candidates:\n",
    "        candidate = Path(candidate).expanduser()\n",
    "        if candidate.exists():\n",
    "            return candidate\n",
    "    return None\n",
    "\n",
    "def github_raw_url(relative_path: str) -> str:\n",
    "    normalized = relative_path.replace('\\\\', '/')\n",
    "    return RAW_BASE.rstrip('/') + '/' + quote(normalized, safe='/')\n",
    "\n",
    "def stage_colab_asset(relative_path: str, destination_name=None) -> Path:\n",
    "    destination = INPUT_DIR / (destination_name or Path(relative_path).name)\n",
    "    if not destination.exists():\n",
    "        asset_url = github_raw_url(relative_path)\n",
    "        urlretrieve(asset_url, destination)\n",
    "        print('Downloaded:', destination.name, 'from', asset_url)\n",
    "    return destination\n",
    "\n",
    "LOCAL_SEARCH_ROOTS = [\n",
    "    INPUT_DIR,\n",
    "    BASE_DIR,\n",
    "    CWD,\n",
    "    CWD / f\"Capstone {CAPSTONE}\",\n",
    "    CWD / 'Incremental_Capstone' / f\"Capstone {CAPSTONE}\",\n",
    "    CWD / 'Incremental Capstones' / 'Applied Data Science with Python' / f\"Capstone {CAPSTONE}\",\n",
    "    CWD.parent / 'Incremental_Capstone' / f\"Capstone {CAPSTONE}\",\n",
    "    CWD.parent / 'Incremental Capstones' / 'Applied Data Science with Python' / f\"Capstone {CAPSTONE}\",\n",
    "]\n",
    "\n",
    "def resolve_dataset_path(default_filename: str) -> Path:\n",
    "    candidates = [Path(root) / default_filename for root in LOCAL_SEARCH_ROOTS]\n",
    "    path = first_existing_path(candidates)\n",
    "    if path is None:\n",
    "        searched_paths = [str(candidate) for candidate in candidates]\n",
    "        raise FileNotFoundError(f\"Dataset not found: {default_filename}. Searched: {searched_paths}\")\n",
    "    return path\n",
    "\n",
    "if IS_COLAB:\n",
    "    staged_assets = [\n",
    "        'Incremental Capstones/Applied Data Science with Python/Capstone 3/Capstone_Session_3.pdf',\n",
    "        'Incremental Capstones/Applied Data Science with Python/Capstone 3/NSMES1988.csv',\n",
    "        'Incremental Capstones/Applied Data Science with Python/Capstone 3/capstone_3.ipynb',\n",
    "    ]\n",
    "    for relative_path in staged_assets:\n",
    "        stage_colab_asset(relative_path)\n",
    "\n",
    "print('Base directory:', BASE_DIR)\n",
    "print('Input directory:', INPUT_DIR)\n",
    "print('Output directory:', OUTPUT_DIR)\n",
    "print('Plots directory:', PLOTS_DIR)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09935ca9",
   "metadata": {},
   "source": [
    "## C?-T1 — Imports I used\n",
    "I documented each import with why I used it and what it enabled in this analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "9b093044",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd  # DataFrames + pivot/crosstab/groupby analysis for categorical and distribution tables"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44d0e642",
   "metadata": {},
   "source": [
    "## C?-T2 — Load dataset\n",
    "I loaded the required Capstone 3 dataset with the configured default and fallback path."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "544a6c66",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Local default not found; using fallback: c:\\DEV_Projects\\SIMPLILEARN\\CAPSTONE_Applied_Data_Science_with _Python\\Incremental_Capstone\\Capstone 2\\outputs\\NSMES1988updated.csv\n",
      "Loaded: c:\\DEV_Projects\\SIMPLILEARN\\CAPSTONE_Applied_Data_Science_with _Python\\Incremental_Capstone\\Capstone 2\\outputs\\NSMES1988updated.csv\n",
      "Shape: (4406, 20)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>visits</th>\n",
       "      <th>nvisits</th>\n",
       "      <th>ovisits</th>\n",
       "      <th>novisits</th>\n",
       "      <th>emergency</th>\n",
       "      <th>hospital</th>\n",
       "      <th>health</th>\n",
       "      <th>chronic</th>\n",
       "      <th>adl</th>\n",
       "      <th>region</th>\n",
       "      <th>age</th>\n",
       "      <th>gender</th>\n",
       "      <th>married</th>\n",
       "      <th>school</th>\n",
       "      <th>income</th>\n",
       "      <th>employed</th>\n",
       "      <th>insurance</th>\n",
       "      <th>medicaid</th>\n",
       "      <th>age_years</th>\n",
       "      <th>income_dollars</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>average</td>\n",
       "      <td>2</td>\n",
       "      <td>normal</td>\n",
       "      <td>other</td>\n",
       "      <td>6.9</td>\n",
       "      <td>male</td>\n",
       "      <td>yes</td>\n",
       "      <td>6</td>\n",
       "      <td>2.8810</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>no</td>\n",
       "      <td>69</td>\n",
       "      <td>28810</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>average</td>\n",
       "      <td>2</td>\n",
       "      <td>normal</td>\n",
       "      <td>other</td>\n",
       "      <td>7.4</td>\n",
       "      <td>female</td>\n",
       "      <td>yes</td>\n",
       "      <td>10</td>\n",
       "      <td>2.7478</td>\n",
       "      <td>no</td>\n",
       "      <td>yes</td>\n",
       "      <td>no</td>\n",
       "      <td>74</td>\n",
       "      <td>27478</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>13</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>poor</td>\n",
       "      <td>4</td>\n",
       "      <td>limited</td>\n",
       "      <td>other</td>\n",
       "      <td>6.6</td>\n",
       "      <td>female</td>\n",
       "      <td>no</td>\n",
       "      <td>10</td>\n",
       "      <td>0.6532</td>\n",
       "      <td>no</td>\n",
       "      <td>no</td>\n",
       "      <td>yes</td>\n",
       "      <td>66</td>\n",
       "      <td>6532</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>16</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>poor</td>\n",
       "      <td>2</td>\n",
       "      <td>limited</td>\n",
       "      <td>other</td>\n",
       "      <td>7.6</td>\n",
       "      <td>male</td>\n",
       "      <td>yes</td>\n",
       "      <td>3</td>\n",
       "      <td>0.6588</td>\n",
       "      <td>no</td>\n",
       "      <td>yes</td>\n",
       "      <td>no</td>\n",
       "      <td>76</td>\n",
       "      <td>6588</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>average</td>\n",
       "      <td>2</td>\n",
       "      <td>limited</td>\n",
       "      <td>other</td>\n",
       "      <td>7.9</td>\n",
       "      <td>female</td>\n",
       "      <td>yes</td>\n",
       "      <td>6</td>\n",
       "      <td>0.6588</td>\n",
       "      <td>no</td>\n",
       "      <td>yes</td>\n",
       "      <td>no</td>\n",
       "      <td>79</td>\n",
       "      <td>6588</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   visits  nvisits  ovisits  novisits  emergency  hospital   health  chronic  \\\n",
       "0       5        0        0         0          0         1  average        2   \n",
       "1       1        0        2         0          2         0  average        2   \n",
       "2      13        0        0         0          3         3     poor        4   \n",
       "3      16        0        5         0          1         1     poor        2   \n",
       "4       3        0        0         0          0         0  average        2   \n",
       "\n",
       "       adl region  age  gender married  school  income employed insurance  \\\n",
       "0   normal  other  6.9    male     yes       6  2.8810      yes       yes   \n",
       "1   normal  other  7.4  female     yes      10  2.7478       no       yes   \n",
       "2  limited  other  6.6  female      no      10  0.6532       no        no   \n",
       "3  limited  other  7.6    male     yes       3  0.6588       no       yes   \n",
       "4  limited  other  7.9  female     yes       6  0.6588       no       yes   \n",
       "\n",
       "  medicaid  age_years  income_dollars  \n",
       "0       no         69           28810  \n",
       "1       no         74           27478  \n",
       "2      yes         66            6532  \n",
       "3       no         76            6588  \n",
       "4       no         79            6588  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "DEFAULT_DATASET = 'NSMES1988updated.csv'\n",
    "\n",
    "fallback_candidates = [\n",
    "    BASE_DIR.parent / 'Capstone 2' / 'outputs' / DEFAULT_DATASET,\n",
    "    CWD / 'Capstone 2' / 'outputs' / DEFAULT_DATASET,\n",
    "    CWD / 'Incremental_Capstone' / 'Capstone 2' / 'outputs' / DEFAULT_DATASET,\n",
    "    CWD / 'Incremental Capstones' / 'Applied Data Science with Python' / 'Capstone 2' / 'outputs' / DEFAULT_DATASET,\n",
    "    CWD.parent / 'Incremental_Capstone' / 'Capstone 2' / 'outputs' / DEFAULT_DATASET,\n",
    "    CWD.parent / 'Incremental Capstones' / 'Applied Data Science with Python' / 'Capstone 2' / 'outputs' / DEFAULT_DATASET,\n",
    "]\n",
    "\n",
    "try:\n",
    "    dataset_path = resolve_dataset_path(DEFAULT_DATASET)\n",
    "except FileNotFoundError:\n",
    "    fallback = first_existing_path(fallback_candidates)\n",
    "    if fallback is None and IS_COLAB:\n",
    "        fallback = stage_colab_asset(\n",
    "            'Incremental Capstones/Applied Data Science with Python/Capstone 2/outputs/NSMES1988updated.csv'\n",
    "        )\n",
    "    if fallback is None:\n",
    "        searched_paths = [str(candidate) for candidate in fallback_candidates]\n",
    "        raise FileNotFoundError(f\"Dataset not found: {DEFAULT_DATASET}. Searched: {searched_paths}\")\n",
    "    dataset_path = fallback\n",
    "    print('Default dataset not found; using fallback:', dataset_path)\n",
    "\n",
    "df = pd.read_csv(dataset_path)\n",
    "\n",
    "print('Loaded:', dataset_path)\n",
    "print('Shape:', df.shape)\n",
    "display(df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "058b855f",
   "metadata": {},
   "source": [
    "## C?-T3 — Validation checks\n",
    "- I confirmed expected columns exist\n",
    "- I confirmed key dtypes\n",
    "- I checked missing values\n",
    "\n",
    "**Results Capture:**\n",
    "- What I did: I validated expected Capstone 3 schema from Capstone 2 output, checked dtypes, and computed missing values.\n",
    "- What I found: dataframe loaded as `(4406, 20)` with all expected columns including `age_years` and `income_dollars`; total missing values = 0.\n",
    "- Caveats: label columns (`health`, `region`, `gender`, `married`, `employed`, `insurance`, `medicaid`) are categorical and should not be interpreted as continuous numeric values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "2e4d302c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Missing expected columns: []\n",
      "\n",
      "Dtypes:\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "visits              int64\n",
       "nvisits             int64\n",
       "ovisits             int64\n",
       "novisits            int64\n",
       "emergency           int64\n",
       "hospital            int64\n",
       "health                str\n",
       "chronic             int64\n",
       "adl                   str\n",
       "region                str\n",
       "age               float64\n",
       "gender                str\n",
       "married               str\n",
       "school              int64\n",
       "income            float64\n",
       "employed              str\n",
       "insurance             str\n",
       "medicaid              str\n",
       "age_years           int64\n",
       "income_dollars      int64\n",
       "dtype: object"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Missing values (count):\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "visits       0\n",
       "nvisits      0\n",
       "ovisits      0\n",
       "novisits     0\n",
       "emergency    0\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "expected_cols = [\n",
    "    \"visits\", \"nvisits\", \"ovisits\", \"novisits\", \"emergency\", \"hospital\",\n",
    "    \"health\", \"chronic\", \"adl\", \"region\", \"age\", \"gender\",\n",
    "    \"married\", \"school\", \"income\", \"employed\", \"insurance\", \"medicaid\",\n",
    "    \"age_years\", \"income_dollars\"\n",
    "]\n",
    "\n",
    "missing_cols = [c for c in expected_cols if c not in df.columns]\n",
    "print(\"Missing expected columns:\", missing_cols)\n",
    "\n",
    "print(\"\\nDtypes:\")\n",
    "display(df.dtypes)\n",
    "\n",
    "print(\"\\nMissing values (count):\")\n",
    "na_counts = df.isna().sum().sort_values(ascending=False)\n",
    "display(na_counts[na_counts > 0] if (na_counts > 0).any() else na_counts.head())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30960024",
   "metadata": {},
   "source": [
    "## C3-T4 — Identify categorical types + key categories\n",
    "**PDF requirement:** Identify different types of data and identify categorical types.\n",
    "\n",
    "### What I completed\n",
    "- I audited dtypes, grouped variables by type, and listed unique category values for health and region.\n",
    "\n",
    "### Results Capture\n",
    "- Built a type-audit table (`column`, `dtype`, `data_type_group`) for all 20 columns.\n",
    "- `health` categories: `['average', 'excellent', 'poor']`.\n",
    "- `region` categories: `['midwest', 'northeast', 'other', 'west']`.\n",
    "\n",
    "### Code evidence\n",
    "- The next cell shows the dtype audit and category-value extraction I executed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "229d9ae6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column</th>\n",
       "      <th>dtype</th>\n",
       "      <th>data_type_group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>visits</td>\n",
       "      <td>int64</td>\n",
       "      <td>numeric</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>nvisits</td>\n",
       "      <td>int64</td>\n",
       "      <td>numeric</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>ovisits</td>\n",
       "      <td>int64</td>\n",
       "      <td>numeric</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>novisits</td>\n",
       "      <td>int64</td>\n",
       "      <td>numeric</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>emergency</td>\n",
       "      <td>int64</td>\n",
       "      <td>numeric</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>hospital</td>\n",
       "      <td>int64</td>\n",
       "      <td>numeric</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>health</td>\n",
       "      <td>str</td>\n",
       "      <td>categorical/flag</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>chronic</td>\n",
       "      <td>int64</td>\n",
       "      <td>numeric</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>adl</td>\n",
       "      <td>str</td>\n",
       "      <td>categorical/flag</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>region</td>\n",
       "      <td>str</td>\n",
       "      <td>categorical/flag</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>age</td>\n",
       "      <td>float64</td>\n",
       "      <td>numeric</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>gender</td>\n",
       "      <td>str</td>\n",
       "      <td>categorical/flag</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>married</td>\n",
       "      <td>str</td>\n",
       "      <td>categorical/flag</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>school</td>\n",
       "      <td>int64</td>\n",
       "      <td>numeric</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>income</td>\n",
       "      <td>float64</td>\n",
       "      <td>numeric</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>employed</td>\n",
       "      <td>str</td>\n",
       "      <td>categorical/flag</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>insurance</td>\n",
       "      <td>str</td>\n",
       "      <td>categorical/flag</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>medicaid</td>\n",
       "      <td>str</td>\n",
       "      <td>categorical/flag</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>age_years</td>\n",
       "      <td>int64</td>\n",
       "      <td>numeric</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>income_dollars</td>\n",
       "      <td>int64</td>\n",
       "      <td>numeric</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            column    dtype   data_type_group\n",
       "0           visits    int64           numeric\n",
       "1          nvisits    int64           numeric\n",
       "2          ovisits    int64           numeric\n",
       "3         novisits    int64           numeric\n",
       "4        emergency    int64           numeric\n",
       "5         hospital    int64           numeric\n",
       "6           health      str  categorical/flag\n",
       "7          chronic    int64           numeric\n",
       "8              adl      str  categorical/flag\n",
       "9           region      str  categorical/flag\n",
       "10             age  float64           numeric\n",
       "11          gender      str  categorical/flag\n",
       "12         married      str  categorical/flag\n",
       "13          school    int64           numeric\n",
       "14          income  float64           numeric\n",
       "15        employed      str  categorical/flag\n",
       "16       insurance      str  categorical/flag\n",
       "17        medicaid      str  categorical/flag\n",
       "18       age_years    int64           numeric\n",
       "19  income_dollars    int64           numeric"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Unique values for health:\n",
      "['average', 'excellent', 'poor']\n",
      "\n",
      "Unique values for region:\n",
      "['midwest', 'northeast', 'other', 'west']\n"
     ]
    }
   ],
   "source": [
    "categorical_cols = {\"health\", \"adl\", \"region\", \"gender\", \"married\", \"employed\", \"insurance\", \"medicaid\"}\n",
    "dtype_audit = pd.DataFrame({\n",
    "    \"column\": df.columns,\n",
    "    \"dtype\": [str(df[c].dtype) for c in df.columns],\n",
    "    \"data_type_group\": [\"categorical/flag\" if c in categorical_cols else \"numeric\" for c in df.columns]\n",
    "})\n",
    "display(dtype_audit)\n",
    "\n",
    "for col in [\"health\", \"region\"]:\n",
    "    if col in df.columns:\n",
    "        print(f\"\\nUnique values for {col}:\")\n",
    "        print(sorted(df[col].dropna().unique().tolist()))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2aec8938",
   "metadata": {},
   "source": [
    "## C3-T5 — Pivoting including Health and Region\n",
    "**PDF requirement:** Perform detailed pivoting and include categorical data analysis for Health and Region.\n",
    "\n",
    "### What I completed\n",
    "- I generated pivot tables for counts, mean visits, and mean income by region and health.\n",
    "\n",
    "### Results Capture\n",
    "- Generated 3 detailed pivots by `region × health`: count matrix, mean visits, and mean income (dollars).\n",
    "- Pivot count size = `(4, 3)` and total records represented = `4406`.\n",
    "- Interpretation highlights:\n",
    "  - All four regions contain observations across all three health categories.\n",
    "  - Mean utilization varies by region-health combinations, supporting segmented analysis.\n",
    "  - Mean income differences by region-health indicate socio-economic heterogeneity across segments.\n",
    "\n",
    "### Code evidence\n",
    "- The next cell contains the exact pivot computations and displayed outputs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "21c21e78",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Pivot 1: Count by region x health\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>health</th>\n",
       "      <th>average</th>\n",
       "      <th>excellent</th>\n",
       "      <th>poor</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>region</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>midwest</th>\n",
       "      <td>957</td>\n",
       "      <td>90</td>\n",
       "      <td>110</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>northeast</th>\n",
       "      <td>694</td>\n",
       "      <td>57</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>other</th>\n",
       "      <td>1237</td>\n",
       "      <td>105</td>\n",
       "      <td>272</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>west</th>\n",
       "      <td>621</td>\n",
       "      <td>91</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "health     average  excellent  poor\n",
       "region                             \n",
       "midwest        957         90   110\n",
       "northeast      694         57    86\n",
       "other         1237        105   272\n",
       "west           621         91    86"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Pivot 2: Mean visits by region x health\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>health</th>\n",
       "      <th>average</th>\n",
       "      <th>excellent</th>\n",
       "      <th>poor</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>region</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>midwest</th>\n",
       "      <td>5.282</td>\n",
       "      <td>3.444</td>\n",
       "      <td>8.118</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>northeast</th>\n",
       "      <td>5.695</td>\n",
       "      <td>4.018</td>\n",
       "      <td>10.674</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>other</th>\n",
       "      <td>5.088</td>\n",
       "      <td>3.143</td>\n",
       "      <td>8.746</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>west</th>\n",
       "      <td>6.499</td>\n",
       "      <td>3.374</td>\n",
       "      <td>8.593</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "health     average  excellent    poor\n",
       "region                               \n",
       "midwest      5.282      3.444   8.118\n",
       "northeast    5.695      4.018  10.674\n",
       "other        5.088      3.143   8.746\n",
       "west         6.499      3.374   8.593"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Pivot 3: Mean income_dollars by region x health\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>health</th>\n",
       "      <th>average</th>\n",
       "      <th>excellent</th>\n",
       "      <th>poor</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>region</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>midwest</th>\n",
       "      <td>24984.23</td>\n",
       "      <td>31053.89</td>\n",
       "      <td>21618.12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>northeast</th>\n",
       "      <td>27015.35</td>\n",
       "      <td>33400.12</td>\n",
       "      <td>20659.30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>other</th>\n",
       "      <td>22010.96</td>\n",
       "      <td>30024.58</td>\n",
       "      <td>16851.79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>west</th>\n",
       "      <td>31663.77</td>\n",
       "      <td>37255.87</td>\n",
       "      <td>21118.85</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "health      average  excellent      poor\n",
       "region                                  \n",
       "midwest    24984.23   31053.89  21618.12\n",
       "northeast  27015.35   33400.12  20659.30\n",
       "other      22010.96   30024.58  16851.79\n",
       "west       31663.77   37255.87  21118.85"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Region-health pivots (counts + average visits + average income)\n",
    "if set([\"region\", \"health\"]).issubset(df.columns):\n",
    "    pivot_count = pd.pivot_table(df, index=\"region\", columns=\"health\", values=\"visits\", aggfunc=\"count\", fill_value=0)\n",
    "    pivot_mean_visits = pd.pivot_table(df, index=\"region\", columns=\"health\", values=\"visits\", aggfunc=\"mean\", fill_value=0)\n",
    "    pivot_mean_income = pd.pivot_table(df, index=\"region\", columns=\"health\", values=\"income_dollars\", aggfunc=\"mean\", fill_value=0)\n",
    "    print(\"Pivot 1: Count by region x health\")\n",
    "    display(pivot_count)\n",
    "    print(\"Pivot 2: Mean visits by region x health\")\n",
    "    display(pivot_mean_visits.round(3))\n",
    "    print(\"Pivot 3: Mean income_dollars by region x health\")\n",
    "    display(pivot_mean_income.round(2))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dabf163f",
   "metadata": {},
   "source": [
    "## C3-T6 — Criteria-based analysis (visits, gender, marital, school, income, employed, insurance, medicaid)\n",
    "**PDF requirement:** Analyze the data based on multiple criteria including visits type, gender, marital status, school, income, employed, insurance, medical aid.\n",
    "\n",
    "### What I completed\n",
    "- I produced grouped mean tables for each requested criterion across utilization and income metrics.\n",
    "\n",
    "### Results Capture\n",
    "- Produced one compact mean table per criterion (`gender`, `married`, `school`, `employed`, `insurance`, `medicaid`) for visit metrics and `income_dollars`.\n",
    "- Key findings:\n",
    "  - `gender`: mean visits `female=6.014` vs `male=5.420`; mean income `female=22,493.476`, `male=29,377.156`.\n",
    "  - `insurance`: insured group has higher mean visits (`6.023`) than uninsured (`4.913`).\n",
    "  - `medicaid`: enrollment corresponds to higher mean visits (`6.714`) but lower mean income (`11,365.388`) vs non-enrolled.\n",
    "\n",
    "### Code evidence\n",
    "- The next cell shows all criterion-based grouped tables I generated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "f3960280",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Metric columns guess: ['visits', 'nvisits', 'ovisits', 'novisits', 'emergency', 'hospital']\n",
      "\n",
      "Criteria table: gender\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>visits</th>\n",
       "      <th>nvisits</th>\n",
       "      <th>ovisits</th>\n",
       "      <th>novisits</th>\n",
       "      <th>emergency</th>\n",
       "      <th>hospital</th>\n",
       "      <th>income_dollars</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gender</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>6.014</td>\n",
       "      <td>1.732</td>\n",
       "      <td>0.683</td>\n",
       "      <td>0.513</td>\n",
       "      <td>0.269</td>\n",
       "      <td>0.286</td>\n",
       "      <td>22493.476</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>5.420</td>\n",
       "      <td>1.449</td>\n",
       "      <td>0.851</td>\n",
       "      <td>0.570</td>\n",
       "      <td>0.255</td>\n",
       "      <td>0.311</td>\n",
       "      <td>29377.156</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        visits  nvisits  ovisits  novisits  emergency  hospital  \\\n",
       "gender                                                            \n",
       "female   6.014    1.732    0.683     0.513      0.269     0.286   \n",
       "male     5.420    1.449    0.851     0.570      0.255     0.311   \n",
       "\n",
       "        income_dollars  \n",
       "gender                  \n",
       "female       22493.476  \n",
       "male         29377.156  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Criteria table: married\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>visits</th>\n",
       "      <th>nvisits</th>\n",
       "      <th>ovisits</th>\n",
       "      <th>novisits</th>\n",
       "      <th>emergency</th>\n",
       "      <th>hospital</th>\n",
       "      <th>income_dollars</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>married</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>no</th>\n",
       "      <td>5.932</td>\n",
       "      <td>1.549</td>\n",
       "      <td>0.650</td>\n",
       "      <td>0.507</td>\n",
       "      <td>0.302</td>\n",
       "      <td>0.311</td>\n",
       "      <td>17194.294</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>yes</th>\n",
       "      <td>5.644</td>\n",
       "      <td>1.675</td>\n",
       "      <td>0.835</td>\n",
       "      <td>0.560</td>\n",
       "      <td>0.232</td>\n",
       "      <td>0.283</td>\n",
       "      <td>31985.391</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         visits  nvisits  ovisits  novisits  emergency  hospital  \\\n",
       "married                                                            \n",
       "no        5.932    1.549    0.650     0.507      0.302     0.311   \n",
       "yes       5.644    1.675    0.835     0.560      0.232     0.283   \n",
       "\n",
       "         income_dollars  \n",
       "married                  \n",
       "no            17194.294  \n",
       "yes           31985.391  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Criteria table: school\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>visits</th>\n",
       "      <th>nvisits</th>\n",
       "      <th>ovisits</th>\n",
       "      <th>novisits</th>\n",
       "      <th>emergency</th>\n",
       "      <th>hospital</th>\n",
       "      <th>income_dollars</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>school</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>4.214</td>\n",
       "      <td>0.806</td>\n",
       "      <td>1.748</td>\n",
       "      <td>1.709</td>\n",
       "      <td>0.282</td>\n",
       "      <td>0.340</td>\n",
       "      <td>21664.155</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5.538</td>\n",
       "      <td>0.154</td>\n",
       "      <td>0.385</td>\n",
       "      <td>0.077</td>\n",
       "      <td>0.231</td>\n",
       "      <td>0.462</td>\n",
       "      <td>12929.154</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5.079</td>\n",
       "      <td>0.184</td>\n",
       "      <td>0.421</td>\n",
       "      <td>0.263</td>\n",
       "      <td>0.184</td>\n",
       "      <td>0.342</td>\n",
       "      <td>13085.395</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>6.239</td>\n",
       "      <td>0.535</td>\n",
       "      <td>1.014</td>\n",
       "      <td>0.296</td>\n",
       "      <td>0.662</td>\n",
       "      <td>0.423</td>\n",
       "      <td>14183.507</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4.750</td>\n",
       "      <td>0.910</td>\n",
       "      <td>0.520</td>\n",
       "      <td>0.180</td>\n",
       "      <td>0.320</td>\n",
       "      <td>0.260</td>\n",
       "      <td>13851.000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5.854</td>\n",
       "      <td>0.417</td>\n",
       "      <td>0.718</td>\n",
       "      <td>0.058</td>\n",
       "      <td>0.350</td>\n",
       "      <td>0.359</td>\n",
       "      <td>14735.214</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>5.358</td>\n",
       "      <td>1.116</td>\n",
       "      <td>0.428</td>\n",
       "      <td>1.006</td>\n",
       "      <td>0.353</td>\n",
       "      <td>0.341</td>\n",
       "      <td>17893.104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>5.448</td>\n",
       "      <td>0.982</td>\n",
       "      <td>0.529</td>\n",
       "      <td>0.457</td>\n",
       "      <td>0.285</td>\n",
       "      <td>0.285</td>\n",
       "      <td>16157.104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>5.582</td>\n",
       "      <td>1.431</td>\n",
       "      <td>0.785</td>\n",
       "      <td>0.507</td>\n",
       "      <td>0.281</td>\n",
       "      <td>0.307</td>\n",
       "      <td>20111.042</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>4.795</td>\n",
       "      <td>1.043</td>\n",
       "      <td>0.765</td>\n",
       "      <td>0.436</td>\n",
       "      <td>0.333</td>\n",
       "      <td>0.380</td>\n",
       "      <td>18457.111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>5.696</td>\n",
       "      <td>1.913</td>\n",
       "      <td>0.817</td>\n",
       "      <td>0.554</td>\n",
       "      <td>0.315</td>\n",
       "      <td>0.284</td>\n",
       "      <td>19863.163</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>5.789</td>\n",
       "      <td>1.599</td>\n",
       "      <td>0.938</td>\n",
       "      <td>0.339</td>\n",
       "      <td>0.264</td>\n",
       "      <td>0.233</td>\n",
       "      <td>21885.476</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>5.887</td>\n",
       "      <td>1.932</td>\n",
       "      <td>0.776</td>\n",
       "      <td>0.574</td>\n",
       "      <td>0.222</td>\n",
       "      <td>0.304</td>\n",
       "      <td>27290.700</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>6.378</td>\n",
       "      <td>1.866</td>\n",
       "      <td>0.488</td>\n",
       "      <td>0.372</td>\n",
       "      <td>0.331</td>\n",
       "      <td>0.291</td>\n",
       "      <td>29762.744</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>6.126</td>\n",
       "      <td>1.828</td>\n",
       "      <td>0.504</td>\n",
       "      <td>0.483</td>\n",
       "      <td>0.206</td>\n",
       "      <td>0.227</td>\n",
       "      <td>32537.803</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>7.081</td>\n",
       "      <td>1.663</td>\n",
       "      <td>0.535</td>\n",
       "      <td>0.605</td>\n",
       "      <td>0.105</td>\n",
       "      <td>0.186</td>\n",
       "      <td>34549.686</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>6.092</td>\n",
       "      <td>2.616</td>\n",
       "      <td>0.900</td>\n",
       "      <td>0.592</td>\n",
       "      <td>0.184</td>\n",
       "      <td>0.248</td>\n",
       "      <td>43962.132</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>6.478</td>\n",
       "      <td>3.065</td>\n",
       "      <td>0.761</td>\n",
       "      <td>0.391</td>\n",
       "      <td>0.152</td>\n",
       "      <td>0.196</td>\n",
       "      <td>51312.761</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>7.887</td>\n",
       "      <td>1.915</td>\n",
       "      <td>0.679</td>\n",
       "      <td>0.500</td>\n",
       "      <td>0.151</td>\n",
       "      <td>0.274</td>\n",
       "      <td>54112.377</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        visits  nvisits  ovisits  novisits  emergency  hospital  \\\n",
       "school                                                            \n",
       "0        4.214    0.806    1.748     1.709      0.282     0.340   \n",
       "1        5.538    0.154    0.385     0.077      0.231     0.462   \n",
       "2        5.079    0.184    0.421     0.263      0.184     0.342   \n",
       "3        6.239    0.535    1.014     0.296      0.662     0.423   \n",
       "4        4.750    0.910    0.520     0.180      0.320     0.260   \n",
       "5        5.854    0.417    0.718     0.058      0.350     0.359   \n",
       "6        5.358    1.116    0.428     1.006      0.353     0.341   \n",
       "7        5.448    0.982    0.529     0.457      0.285     0.285   \n",
       "8        5.582    1.431    0.785     0.507      0.281     0.307   \n",
       "9        4.795    1.043    0.765     0.436      0.333     0.380   \n",
       "10       5.696    1.913    0.817     0.554      0.315     0.284   \n",
       "11       5.789    1.599    0.938     0.339      0.264     0.233   \n",
       "12       5.887    1.932    0.776     0.574      0.222     0.304   \n",
       "13       6.378    1.866    0.488     0.372      0.331     0.291   \n",
       "14       6.126    1.828    0.504     0.483      0.206     0.227   \n",
       "15       7.081    1.663    0.535     0.605      0.105     0.186   \n",
       "16       6.092    2.616    0.900     0.592      0.184     0.248   \n",
       "17       6.478    3.065    0.761     0.391      0.152     0.196   \n",
       "18       7.887    1.915    0.679     0.500      0.151     0.274   \n",
       "\n",
       "        income_dollars  \n",
       "school                  \n",
       "0            21664.155  \n",
       "1            12929.154  \n",
       "2            13085.395  \n",
       "3            14183.507  \n",
       "4            13851.000  \n",
       "5            14735.214  \n",
       "6            17893.104  \n",
       "7            16157.104  \n",
       "8            20111.042  \n",
       "9            18457.111  \n",
       "10           19863.163  \n",
       "11           21885.476  \n",
       "12           27290.700  \n",
       "13           29762.744  \n",
       "14           32537.803  \n",
       "15           34549.686  \n",
       "16           43962.132  \n",
       "17           51312.761  \n",
       "18           54112.377  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Criteria table: employed\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>visits</th>\n",
       "      <th>nvisits</th>\n",
       "      <th>ovisits</th>\n",
       "      <th>novisits</th>\n",
       "      <th>emergency</th>\n",
       "      <th>hospital</th>\n",
       "      <th>income_dollars</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>employed</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>no</th>\n",
       "      <td>5.807</td>\n",
       "      <td>1.640</td>\n",
       "      <td>0.786</td>\n",
       "      <td>0.549</td>\n",
       "      <td>0.269</td>\n",
       "      <td>0.304</td>\n",
       "      <td>23600.560</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>yes</th>\n",
       "      <td>5.495</td>\n",
       "      <td>1.426</td>\n",
       "      <td>0.444</td>\n",
       "      <td>0.424</td>\n",
       "      <td>0.220</td>\n",
       "      <td>0.226</td>\n",
       "      <td>39779.402</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          visits  nvisits  ovisits  novisits  emergency  hospital  \\\n",
       "employed                                                            \n",
       "no         5.807    1.640    0.786     0.549      0.269     0.304   \n",
       "yes        5.495    1.426    0.444     0.424      0.220     0.226   \n",
       "\n",
       "          income_dollars  \n",
       "employed                  \n",
       "no             23600.560  \n",
       "yes            39779.402  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Criteria table: insurance\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>visits</th>\n",
       "      <th>nvisits</th>\n",
       "      <th>ovisits</th>\n",
       "      <th>novisits</th>\n",
       "      <th>emergency</th>\n",
       "      <th>hospital</th>\n",
       "      <th>income_dollars</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>insurance</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>no</th>\n",
       "      <td>4.913</td>\n",
       "      <td>0.934</td>\n",
       "      <td>1.056</td>\n",
       "      <td>0.436</td>\n",
       "      <td>0.333</td>\n",
       "      <td>0.311</td>\n",
       "      <td>16629.839</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>yes</th>\n",
       "      <td>6.023</td>\n",
       "      <td>1.815</td>\n",
       "      <td>0.663</td>\n",
       "      <td>0.565</td>\n",
       "      <td>0.243</td>\n",
       "      <td>0.292</td>\n",
       "      <td>27759.441</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           visits  nvisits  ovisits  novisits  emergency  hospital  \\\n",
       "insurance                                                            \n",
       "no          4.913    0.934    1.056     0.436      0.333     0.311   \n",
       "yes         6.023    1.815    0.663     0.565      0.243     0.292   \n",
       "\n",
       "           income_dollars  \n",
       "insurance                  \n",
       "no              16629.839  \n",
       "yes             27759.441  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Criteria table: medicaid\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>visits</th>\n",
       "      <th>nvisits</th>\n",
       "      <th>ovisits</th>\n",
       "      <th>novisits</th>\n",
       "      <th>emergency</th>\n",
       "      <th>hospital</th>\n",
       "      <th>income_dollars</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>medicaid</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>no</th>\n",
       "      <td>5.680</td>\n",
       "      <td>1.654</td>\n",
       "      <td>0.725</td>\n",
       "      <td>0.534</td>\n",
       "      <td>0.246</td>\n",
       "      <td>0.284</td>\n",
       "      <td>26667.471</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>yes</th>\n",
       "      <td>6.714</td>\n",
       "      <td>1.259</td>\n",
       "      <td>1.007</td>\n",
       "      <td>0.555</td>\n",
       "      <td>0.443</td>\n",
       "      <td>0.415</td>\n",
       "      <td>11365.388</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          visits  nvisits  ovisits  novisits  emergency  hospital  \\\n",
       "medicaid                                                            \n",
       "no         5.680    1.654    0.725     0.534      0.246     0.284   \n",
       "yes        6.714    1.259    1.007     0.555      0.443     0.415   \n",
       "\n",
       "          income_dollars  \n",
       "medicaid                  \n",
       "no             26667.471  \n",
       "yes            11365.388  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "criteria_cols = [\"gender\", \"married\", \"school\", \"employed\", \"insurance\", \"medicaid\"]\n",
    "metric_cols = [c for c in [\"visits\", \"nvisits\", \"ovisits\", \"novisits\", \"emergency\", \"hospital\"] if c in df.columns]\n",
    "\n",
    "print(\"Metric columns guess:\", metric_cols)\n",
    "\n",
    "for c in criteria_cols:\n",
    "    if c in df.columns and metric_cols:\n",
    "        print(f\"\\nCriteria table: {c}\")\n",
    "        table = df.groupby(c)[metric_cols + [\"income_dollars\"]].mean(numeric_only=True).round(3)\n",
    "        display(table)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2459604d",
   "metadata": {},
   "source": [
    "## C3-T7 — Create required distribution tables\n",
    "**PDF requirement:** Create distribution tables: Age+Gender; Health by Gender; Income by Gender; Regional Income; Age-wise Income.\n",
    "\n",
    "### What I completed\n",
    "- I built all required distribution tables using age groups, crosstabs, and grouped income summaries.\n",
    "\n",
    "### Results Capture\n",
    "- Built all required distribution tables:\n",
    "  - Age + Gender (`5x2`), Health by Gender (`3x2`), Income by Gender (`2x3`), Regional Income (`4x3`), Age-wise Income (`5x3`).\n",
    "- Notable patterns:\n",
    "  - Regional mean income is highest in `west` and lowest in `other`.\n",
    "  - Male mean income exceeds female mean income, while female mean visits are higher.\n",
    "  - Older age groups remain strongly represented across both genders.\n",
    "\n",
    "### Code evidence\n",
    "- The next cell contains the exact table-generation logic for all required distributions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "34abf88a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Distribution 1: Age + Gender\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>gender</th>\n",
       "      <th>female</th>\n",
       "      <th>male</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>age_group</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>(64.999, 70.0]</th>\n",
       "      <td>897</td>\n",
       "      <td>671</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(70.0, 75.0]</th>\n",
       "      <td>736</td>\n",
       "      <td>530</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(75.0, 80.0]</th>\n",
       "      <td>525</td>\n",
       "      <td>321</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(80.0, 85.0]</th>\n",
       "      <td>303</td>\n",
       "      <td>176</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(85.0, 110.0]</th>\n",
       "      <td>167</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "gender          female  male\n",
       "age_group                   \n",
       "(64.999, 70.0]     897   671\n",
       "(70.0, 75.0]       736   530\n",
       "(75.0, 80.0]       525   321\n",
       "(80.0, 85.0]       303   176\n",
       "(85.0, 110.0]      167    80"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Distribution 2: Health by Gender\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>gender</th>\n",
       "      <th>female</th>\n",
       "      <th>male</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>health</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>average</th>\n",
       "      <td>2093</td>\n",
       "      <td>1416</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>excellent</th>\n",
       "      <td>193</td>\n",
       "      <td>150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>poor</th>\n",
       "      <td>342</td>\n",
       "      <td>212</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "gender     female  male\n",
       "health                 \n",
       "average      2093  1416\n",
       "excellent     193   150\n",
       "poor          342   212"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Distribution 3: Income by Gender\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>median</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>income_dollars</th>\n",
       "      <th>income_dollars</th>\n",
       "      <th>income_dollars</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gender</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>2628</td>\n",
       "      <td>22493.48</td>\n",
       "      <td>14160.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>1778</td>\n",
       "      <td>29377.16</td>\n",
       "      <td>20574.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                count           mean         median\n",
       "       income_dollars income_dollars income_dollars\n",
       "gender                                             \n",
       "female           2628       22493.48        14160.0\n",
       "male             1778       29377.16        20574.0"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Distribution 4: Regional Income\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>median</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>income_dollars</th>\n",
       "      <th>income_dollars</th>\n",
       "      <th>income_dollars</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>region</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>midwest</th>\n",
       "      <td>1157</td>\n",
       "      <td>25136.34</td>\n",
       "      <td>17875.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>northeast</th>\n",
       "      <td>837</td>\n",
       "      <td>26797.09</td>\n",
       "      <td>17413.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>other</th>\n",
       "      <td>1614</td>\n",
       "      <td>21662.84</td>\n",
       "      <td>14220.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>west</th>\n",
       "      <td>798</td>\n",
       "      <td>31165.05</td>\n",
       "      <td>20656.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   count           mean         median\n",
       "          income_dollars income_dollars income_dollars\n",
       "region                                                \n",
       "midwest             1157       25136.34        17875.0\n",
       "northeast            837       26797.09        17413.0\n",
       "other               1614       21662.84        14220.0\n",
       "west                 798       31165.05        20656.0"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Distribution 5: Age-wise Income\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>median</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>income_dollars</th>\n",
       "      <th>income_dollars</th>\n",
       "      <th>income_dollars</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>age_group</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>(64.999, 70.0]</th>\n",
       "      <td>1568</td>\n",
       "      <td>27488.76</td>\n",
       "      <td>19872.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(70.0, 75.0]</th>\n",
       "      <td>1266</td>\n",
       "      <td>26194.35</td>\n",
       "      <td>16986.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(75.0, 80.0]</th>\n",
       "      <td>846</td>\n",
       "      <td>22997.60</td>\n",
       "      <td>15135.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(80.0, 85.0]</th>\n",
       "      <td>479</td>\n",
       "      <td>20269.47</td>\n",
       "      <td>12886.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>(85.0, 110.0]</th>\n",
       "      <td>247</td>\n",
       "      <td>23951.36</td>\n",
       "      <td>13940.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        count           mean         median\n",
       "               income_dollars income_dollars income_dollars\n",
       "age_group                                                  \n",
       "(64.999, 70.0]           1568       27488.76        19872.0\n",
       "(70.0, 75.0]             1266       26194.35        16986.0\n",
       "(75.0, 80.0]              846       22997.60        15135.0\n",
       "(80.0, 85.0]              479       20269.47        12886.0\n",
       "(85.0, 110.0]             247       23951.36        13940.0"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df3 = df.copy()\n",
    "\n",
    "age_col = \"age_years\" if \"age_years\" in df3.columns else \"age\"\n",
    "income_col = \"income_dollars\" if \"income_dollars\" in df3.columns else \"income\"\n",
    "\n",
    "if age_col in df3.columns:\n",
    "    df3[\"age_group\"] = pd.cut(df3[age_col].astype(float), bins=[65, 70, 75, 80, 85, 110], include_lowest=True)\n",
    "\n",
    "if income_col in df3.columns:\n",
    "    quantiles = df3[income_col].quantile([0, .2, .4, .6, .8, 1.0]).values\n",
    "    edges = [quantiles[0]]\n",
    "    for value in quantiles[1:]:\n",
    "        edges.append(value if value > edges[-1] else edges[-1] + 1)\n",
    "    df3[\"income_band\"] = pd.cut(df3[income_col].astype(float), bins=edges, include_lowest=True)\n",
    "\n",
    "if set([\"age_group\", \"gender\"]).issubset(df3.columns):\n",
    "    print(\"Distribution 1: Age + Gender\")\n",
    "    display(pd.crosstab(df3[\"age_group\"], df3[\"gender\"]))\n",
    "\n",
    "if set([\"health\", \"gender\"]).issubset(df3.columns):\n",
    "    print(\"Distribution 2: Health by Gender\")\n",
    "    display(pd.crosstab(df3[\"health\"], df3[\"gender\"]))\n",
    "\n",
    "if set([\"income_dollars\", \"gender\"]).issubset(df3.columns):\n",
    "    print(\"Distribution 3: Income by Gender\")\n",
    "    display(pd.pivot_table(df3, index=\"gender\", values=\"income_dollars\", aggfunc=[\"count\", \"mean\", \"median\"]).round(2))\n",
    "\n",
    "if set([\"income_dollars\", \"region\"]).issubset(df3.columns):\n",
    "    print(\"Distribution 4: Regional Income\")\n",
    "    display(pd.pivot_table(df3, index=\"region\", values=\"income_dollars\", aggfunc=[\"count\", \"mean\", \"median\"]).round(2))\n",
    "\n",
    "if set([\"age_group\", \"income_dollars\"]).issubset(df3.columns):\n",
    "    print(\"Distribution 5: Age-wise Income\")\n",
    "    display(pd.pivot_table(df3, index=\"age_group\", values=\"income_dollars\", aggfunc=[\"count\", \"mean\", \"median\"]).round(2))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa09bb46",
   "metadata": {},
   "source": [
    "## Final section — Conclusions (required)\n",
    "- I used `Capstone 2/outputs/NSMES1988updated.csv` as the input for this analysis.\n",
    "- I completed categorical profiling and pivoting for health and region with clear segmentation outputs.\n",
    "- I documented meaningful criteria-based differences across gender, insurance, medicaid, marital status, employment, and schooling groups.\n",
    "- I generated and interpreted all required distribution tables.\n",
    "- I updated `WORK_SUMMARY.md` with evidence and marked all Capstone 3 tasks complete.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "1db0d1f3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Capstone 3 completed: C3-T4 to C3-T7\n",
      "Input used: Capstone 2/outputs/NSMES1988updated.csv\n"
     ]
    }
   ],
   "source": [
    "print(\"Capstone 3 completed: C3-T4 to C3-T7\")\n",
    "print(\"Input used: Capstone 2/outputs/NSMES1988updated.csv\")\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv (3.12.10)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
