Open

Main #40

52 commits
8dd5898
Create main.yml
liyun0016 Jun 21, 2025
e14c814
Initialize DVC
liyun0016 Jun 21, 2025
d8b253e
Stop tracking census.csv with Git, will use DVC instead
liyun0016 Jun 21, 2025
4f3d6e3
Track census.csv with DVC
liyun0016 Jun 21, 2025
a5c8927
Track .gitignore with DVC
liyun0016 Jun 21, 2025
ba76f9d
Update cleaned census.csv in DVC
liyun0016 Jun 21, 2025
93e705e
update the model and unit tests
liyun0016 Jun 21, 2025
abcea3f
update model card
liyun0016 Jun 21, 2025
9f98084
Merge branch 'master' of https://github.com/liyun0016/nd0821-c3-start…
liyun0016 Jun 21, 2025
6f6f03c
create FastAPI
liyun0016 Jun 21, 2025
46cf945
update FastAPI - wrong column name
liyun0016 Jun 23, 2025
d2854f4
upload main.test
liyun0016 Jun 23, 2025
db87c07
update requirements
liyun0016 Jun 23, 2025
9a371a4
move requriements
liyun0016 Jun 23, 2025
a4e47db
Update main.yml
liyun0016 Jun 23, 2025
ce6af13
update requirements_v2
liyun0016 Jun 23, 2025
f3e34b4
Update main.yml
liyun0016 Jun 23, 2025
19e899a
Update main.yml
liyun0016 Jun 23, 2025
1b014fd
Update main.yml
liyun0016 Jun 23, 2025
83e4c52
move requirments
liyun0016 Jun 23, 2025
10f0599
Add files via upload
liyun0016 Jun 23, 2025
3904e90
Update requirements.txt
liyun0016 Jun 23, 2025
007ed43
Delete starter/requirements.txt
liyun0016 Jun 23, 2025
f04c1dd
Update requirements.txt
liyun0016 Jun 23, 2025
d877e69
Update requirements.txt
liyun0016 Jun 23, 2025
c7ed3c9
update import
liyun0016 Jun 23, 2025
af2fb51
Update main.py
liyun0016 Jun 23, 2025
0b009cb
Create post_inference.py
liyun0016 Jun 23, 2025
cca6c51
Merge branch 'main' of https://github.com/liyun0016/nd0821-c3-starter…
liyun0016 Jun 23, 2025
9b7725f
update import
liyun0016 Jun 23, 2025
6b3bd05
update main.yml
liyun0016 Jun 23, 2025
e5713fb
main update
liyun0016 Jun 23, 2025
d79d005
main.yml update
liyun0016 Jun 23, 2025
ba760ee
update test_main
liyun0016 Jun 23, 2025
5019f27
update
liyun0016 Jun 23, 2025
fc70dc2
update
liyun0016 Jun 23, 2025
2efdeb2
update
liyun0016 Jun 23, 2025
43466fe
reuirements
liyun0016 Jun 23, 2025
0c76308
requirements
liyun0016 Jun 23, 2025
1a4e4e2
update
liyun0016 Jun 23, 2025
4d795c6
Add pyproject.toml to fix Render build error
liyun0016 Jun 23, 2025
008116d
Fix Render build by including setuptools and wheel
liyun0016 Jun 23, 2025
46f1e4d
[build-system]
liyun0016 Jun 23, 2025
6769e95
Add runtime.txt to set Python version
liyun0016 Jun 23, 2025
bcd4eec
Remove pyproject.toml
liyun0016 Jun 23, 2025
f3c0f8d
Add render.yaml for auto deploy
liyun0016 Jun 23, 2025
fa69506
Force correct Python version with render.yaml
liyun0016 Jun 23, 2025
71e3e8a
update
liyun0016 Jun 23, 2025
7a96c95
update
liyun0016 Jun 23, 2025
7a343ac
update
liyun0016 Jun 24, 2025
e030b49
update
liyun0016 Jun 24, 2025
6f861a9
upload live post
liyun0016 Jun 24, 2025
3 changes: 3 additions & 0 deletions .dvc/.gitignore
@@ -0,0 +1,3 @@
/config.local
/tmp
/cache
Empty file added .dvc/config
3 changes: 3 additions & 0 deletions .dvcignore
@@ -0,0 +1,3 @@
# Add patterns of files dvc should ignore, which could improve
# the performance. Learn more at
# https://dvc.org/doc/user-guide/dvcignore
42 changes: 42 additions & 0 deletions .github/workflows/main.yml
@@ -0,0 +1,42 @@
name: Python CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  lint-and-test:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        python-version: [3.8]

    steps:
    - name: Checkout code
      uses: actions/checkout@v4

    - name: Set PYTHONPATH
      run: echo "PYTHONPATH=$GITHUB_WORKSPACE/starter" >> $GITHUB_ENV

    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v5
      with:
        python-version: ${{ matrix.python-version }}

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install flake8 pytest
        pip install -r requirements.txt

    - name: Run flake8 (fail if linting fails)
      run: |
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
      continue-on-error: false

    - name: Run pytest (fail if tests fail)
      run: PYTHONPATH=$GITHUB_WORKSPACE pytest starter/
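To reproduce this workflow's lint and test steps before pushing, the same commands can be driven from Python. This is a minimal sketch, not part of the PR: it assumes flake8 and pytest are installed in the active environment and that it is run from the repository root; `ci_commands` and `run_all` are hypothetical helper names.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple


def ci_commands(workspace: str) -> List[Tuple[List[str], Dict[str, str]]]:
    """Return (argv, extra_env) pairs mirroring the workflow's lint and test steps."""
    py = sys.executable
    return [
        # Hard-failing lint pass: syntax errors and undefined names only.
        ([py, "-m", "flake8", ".", "--count", "--select=E9,F63,F7,F82",
          "--show-source", "--statistics"], {}),
        # Advisory lint pass: --exit-zero makes it report without ever failing.
        ([py, "-m", "flake8", ".", "--count", "--exit-zero",
          "--max-complexity=10", "--max-line-length=127", "--statistics"], {}),
        # Tests: the repository root must be importable so `starter.main` resolves.
        ([py, "-m", "pytest", "starter/"], {"PYTHONPATH": workspace}),
    ]


def run_all(workspace: str) -> int:
    """Run each command in order and stop at the first non-zero exit code."""
    for argv, extra_env in ci_commands(workspace):
        env = {**os.environ, **extra_env}
        result = subprocess.run(argv, env=env)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_all(os.getcwd())` from the repository root returns 0 only if every step passes. Invoking the tools as `python -m flake8` and `python -m pytest` is a choice of this sketch: it guarantees they run under the same interpreter as the script.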
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.11.9
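The workflow above tests on Python 3.8, while this file pins 3.11.9, so a local run and CI can use different interpreters. A small check can flag that. This is a sketch: `parse_version` and `pin_warning` are hypothetical helpers, and comparing only major.minor is a simplifying choice made here.

```python
import sys
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a pin such as '3.11.9' (with or without a trailing newline) into (3, 11, 9)."""
    return tuple(int(part) for part in text.strip().split("."))


def pin_warning(pin_text: str,
                running: Tuple[int, ...] = tuple(sys.version_info[:3])) -> Optional[str]:
    """Return a message if the pin and the interpreter differ in major.minor, else None."""
    pinned = parse_version(pin_text)
    if pinned[:2] == tuple(running[:2]):
        return None
    return (f".python-version pins {'.'.join(map(str, pinned))}, "
            f"but this interpreter is {'.'.join(map(str, running))}")
```

For example, `pin_warning(pathlib.Path(".python-version").read_text())` returns `None` under any 3.11.x interpreter and a message under 3.8.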
5 changes: 5 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,5 @@
{
    "python.analysis.extraPaths": [
        "./starter/starter"
    ]
}
Binary file added continuous_deployment.png
Binary file added continuous_integration.png
Binary file added example.png
Binary file added live_get.png
Binary file added live_post.png
27 changes: 27 additions & 0 deletions post_inference.py
@@ -0,0 +1,27 @@
import requests

# Replace with your actual deployed URL
url = "https://nd0821-c3-starter-code-w71g.onrender.com/inference"

# Example payload (keys must match the hyphenated aliases declared on InferenceInput)
payload = {
    "age": 37,
    "workclass": "Private",
    "fnlgt": 284582,
    "education": "Bachelors",
    "education-num": 13,
    "marital-status": "Never-married",
    "occupation": "Exec-managerial",
    "relationship": "Not-in-family",
    "race": "White",
    "sex": "Male",
    "capital-gain": 0,
    "capital-loss": 0,
    "hours-per-week": 40,
    "native-country": "United-States"
}

# Without a timeout, requests can wait indefinitely; 60 s leaves room for a cold start.
response = requests.post(url, json=payload, timeout=60)

print("Status code:", response.status_code)
print("Response JSON:", response.json())
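`InferenceInput` in `starter/main.py` declares hyphenated aliases (for example `education-num`) for six fields, so the JSON sent to `/inference` must use those names. If a client holds records with snake_case keys, a small renaming step keeps the payload valid. This is a sketch: `to_api_payload` is a hypothetical helper, and its field set is copied from the aliases in `InferenceInput`.

```python
from typing import Any, Dict

# snake_case field names whose JSON aliases use hyphens (taken from InferenceInput).
HYPHENATED = {
    "education_num", "marital_status", "capital_gain",
    "capital_loss", "hours_per_week", "native_country",
}


def to_api_payload(record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename snake_case keys to the hyphenated names the /inference endpoint expects."""
    return {
        (key.replace("_", "-") if key in HYPHENATED else key): value
        for key, value in record.items()
    }
```

The result can be passed straight to `requests.post(url, json=to_api_payload(record), timeout=60)`. Keys outside the alias set, such as `age` or `fnlgt`, pass through unchanged.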
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -0,0 +1,3 @@
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"
6 changes: 6 additions & 0 deletions render.yaml
@@ -0,0 +1,6 @@
services:
  - type: web
    name: starter-service
    env: python
    buildCommand: pip install -r requirements.txt
    startCommand: uvicorn starter.main:app --host=0.0.0.0 --port=10000
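The `startCommand` hard-codes port 10000. Assuming, as Render's documentation describes, that web services receive their port in a `PORT` environment variable whose default is 10000, reading that variable keeps the app reachable if the port is ever changed. This is a sketch under that assumption: `resolve_port` and `serve` are hypothetical helpers, and `serve` needs `uvicorn` (listed in `requirements.txt`).

```python
import os
from typing import Mapping


def resolve_port(env: Mapping[str, str], default: int = 10000) -> int:
    """Use the platform-provided PORT if present, else the port from render.yaml."""
    return int(env.get("PORT", default))


def serve() -> None:
    """Start the API like render.yaml's startCommand does, but honouring PORT."""
    import uvicorn  # imported here so resolve_port works without uvicorn installed

    uvicorn.run("starter.main:app", host="0.0.0.0", port=resolve_port(os.environ))
```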
14 changes: 14 additions & 0 deletions requirements.txt
@@ -0,0 +1,14 @@
# Build dependencies first
setuptools>=65.0
wheel>=0.37.0
pip>=22.0

# Your application dependencies
numpy==1.24.3
pandas
scikit-learn==1.3.2
pytest
requests
fastapi==0.63.0
uvicorn
gunicorn
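Only `numpy`, `scikit-learn` and `fastapi` are pinned exactly here; everything else floats. Because the model artifacts are pickled with whatever library versions were used at training time, it can help to confirm the exact pins before loading them. This is a stdlib sketch using `importlib.metadata` (Python 3.8+). `exact_pins` and `mismatches` are hypothetical helpers that only understand `name==version` lines.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict


def exact_pins(requirements_text: str) -> Dict[str, str]:
    """Collect `name==version` lines, skipping comments, blanks and range specifiers."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline and full-line comments
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def mismatches(pins: Dict[str, str]) -> Dict[str, str]:
    """Return {package: installed_version} for pins that differ ('missing' if absent)."""
    found = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = "missing"
        if installed != pinned:
            found[name] = installed
    return found
```

`mismatches(exact_pins(pathlib.Path("requirements.txt").read_text()))` returns an empty dict when every exact pin is satisfied.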
35 changes: 35 additions & 0 deletions slice_output.txt
@@ -0,0 +1,35 @@
marital-status = Never-married
Precision: 0.8627
Recall: 0.4074
F1 Score: 0.5535

marital-status = Divorced
Precision: 0.7381
Recall: 0.3100
F1 Score: 0.4366

marital-status = Married-civ-spouse
Precision: 0.7033
Recall: 0.6841
F1 Score: 0.6935

marital-status = Widowed
Precision: 1.0000
Recall: 0.2500
F1 Score: 0.4000

marital-status = Separated
Precision: 1.0000
Recall: 0.2143
F1 Score: 0.3529

marital-status = Married-spouse-absent
Precision: 1.0000
Recall: 0.5000
F1 Score: 0.6667

marital-status = Married-AF-spouse
Precision: 1.0000
Recall: 0.0000
F1 Score: 0.0000
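Per-slice numbers like these come from grouping the test rows by one feature's value and scoring each group separately. The project's own evaluation code is not part of this diff, so the following is only a dependency-free sketch of the technique. It assumes 0/1 labels and reports an undefined precision or recall as 1.0, a convention consistent with the Married-AF-spouse block above: a recall of 0.0000 means no positive was caught, and any positive prediction would then have given a precision of 0, so a precision of 1.0000 implies the model made no positive predictions in that slice. `prf` and `slice_report` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def prf(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for 0/1 labels; a zero denominator yields 1.0 (assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def slice_report(feature: str, values: Sequence[str],
                 y_true: Sequence[int], y_pred: Sequence[int]) -> str:
    """Per-value metrics for one feature, laid out like slice_output.txt."""
    groups: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for value, t, p in zip(values, y_true, y_pred):
        groups[value].append((t, p))
    blocks = []
    for value, pairs in groups.items():  # dicts keep first-seen order
        precision, recall, f1 = prf([t for t, _ in pairs], [p for _, p in pairs])
        blocks.append(
            f"{feature} = {value}\n"
            f"Precision: {precision:.4f}\n"
            f"Recall: {recall:.4f}\n"
            f"F1 Score: {f1:.4f}\n"
        )
    return "\n".join(blocks)
```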

Empty file added starter/__init__.py
Binary file added starter/__pycache__/__init__.cpython-38.pyc
Binary file added starter/__pycache__/main.cpython-38.pyc
Binary file added starter/__pycache__/test_main.cpython-38.pyc
1 change: 1 addition & 0 deletions starter/data/.gitignore
@@ -1 +1,2 @@

/census.csv
32,563 changes: 0 additions & 32,563 deletions starter/data/census.csv

This file was deleted.

4 changes: 4 additions & 0 deletions starter/data/census.csv.dvc
@@ -0,0 +1,4 @@
outs:
- md5: 12c208530a5680c15ae19b34152286dd
  size: 3518606
  path: census.csv
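The `.dvc` file records the MD5 hash and byte size of `census.csv` instead of the data itself, so a local copy can be checked against it by recomputing both. This is a stdlib sketch; `file_md5` and `matches_dvc_entry` are hypothetical helpers. It hashes the raw bytes, as DVC 3.x does; older DVC releases normalized line endings of text files before hashing, so a mismatch on a file with CRLF line endings does not by itself prove the contents changed.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union


def file_md5(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """MD5 of the raw file bytes, read in chunks so large files use little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_dvc_entry(path: Union[str, Path], entry: Dict[str, object]) -> bool:
    """Compare a local file with the md5/size recorded under `outs` in a .dvc file."""
    p = Path(path)
    # Size check first: it is cheap and rules out most mismatches.
    return p.stat().st_size == entry["size"] and file_md5(p) == entry["md5"]
```

For this repository: `matches_dvc_entry("starter/data/census.csv", {"md5": "12c208530a5680c15ae19b34152286dd", "size": 3518606})`.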
84 changes: 84 additions & 0 deletions starter/main.py
@@ -1 +1,85 @@
# Put the code for your API here.
import joblib
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel, Field

from .starter.ml.data import process_data
from .starter.ml.model import inference

app = FastAPI()


@app.get("/")
def read_root():
    return {"message": "Welcome to the Income Prediction API!"}


# Define Pydantic model for request body.
# JSON keys use the hyphenated census column names (the aliases below).
class InferenceInput(BaseModel):
    age: int
    workclass: str
    fnlgt: int
    education: str
    education_num: int = Field(..., alias="education-num")
    marital_status: str = Field(..., alias="marital-status")
    occupation: str
    relationship: str
    race: str
    sex: str
    capital_gain: int = Field(..., alias="capital-gain")
    capital_loss: int = Field(..., alias="capital-loss")
    hours_per_week: int = Field(..., alias="hours-per-week")
    native_country: str = Field(..., alias="native-country")

    class Config:
        # fastapi==0.63.0 installs Pydantic 1.x, which reads these option names
        # (Pydantic 2 renamed them to populate_by_name / json_schema_extra).
        allow_population_by_field_name = True
        schema_extra = {
            "example": {
                "age": 37,
                "workclass": "Self-emp-not-inc",
                "fnlgt": 284582,
                "education": "Bachelors",
                "education-num": 13,
                "marital-status": "Never-married",
                "occupation": "Exec-managerial",
                "relationship": "Not-in-family",
                "race": "White",
                "sex": "Male",
                "capital-gain": 0,
                "capital-loss": 0,
                "hours-per-week": 40,
                "native-country": "United-States"
            }
        }


# Load model and encoders (paths are relative to the working directory,
# i.e. the repository root in CI and on Render).
model = joblib.load("starter/model/model.pkl")
encoder = joblib.load("starter/model/encoder.pkl")
lb = joblib.load("starter/model/label_binarizer.pkl")


@app.post("/inference")
def predict(input_data: InferenceInput):
    input_dict = input_data.dict(by_alias=True)
    print("RECEIVED INPUT:", input_dict)
    data_df = pd.DataFrame([input_dict])

    X, _, _, _ = process_data(
        data_df,
        categorical_features=[
            "workclass", "education", "marital-status", "occupation",
            "relationship", "race", "sex", "native-country"
        ],
        label=None,
        training=False,
        encoder=encoder,
        lb=lb
    )

    pred = inference(model, X)
    prediction_label = lb.inverse_transform(pred)[0]

    return {"prediction": prediction_label}
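`process_data` builds its continuous block by dropping the categorical columns from the input (`X.drop(...)` in the `data.py` hunk of this PR), so the continuous features keep whatever column order the incoming DataFrame has. `.dict(by_alias=True)` follows the field order of `InferenceInput`, so the endpoint as written is consistent; a guard like the one below matters only if the DataFrame is ever built from another source. This is a sketch: `TRAINING_COLUMNS` and `ordered_record` are hypothetical, and the column order is assumed to match `census.csv` (it mirrors the field order of `InferenceInput`).

```python
from typing import Any, Dict, List

# Assumption: census.csv lists its feature columns in this order.
TRAINING_COLUMNS: List[str] = [
    "age", "workclass", "fnlgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
]


def ordered_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return the record with keys in training order; reject missing or extra keys."""
    missing = [c for c in TRAINING_COLUMNS if c not in record]
    extra = sorted(set(record) - set(TRAINING_COLUMNS))
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    return {c: record[c] for c in TRAINING_COLUMNS}
```

Inside `predict`, this would be used as `pd.DataFrame([ordered_record(input_dict)])`.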
Binary file added starter/model/encoder.pkl
Binary file added starter/model/label_binarizer.pkl
Binary file added starter/model/model.pkl
59 changes: 58 additions & 1 deletion starter/model_card_template.md
@@ -4,15 +4,72 @@ For additional information see the Model Card paper: https://arxiv.org/pdf/1810.

## Model Details

- **Developer**: Created as part of the ND0821 ML DevOps Engineering Nanodegree program.
- **Model Date**: June 2025
- **Model Version**: v1.0
- **Model Type**: RandomForestClassifier
- **Algorithms/Parameters**:
  - Ensemble method using decision trees
  - Default hyperparameters except `random_state=42`
- **License**: For educational use


## Intended Use

- **Primary Use Case**: Predicting whether an individual's income exceeds \$50K based on demographic attributes (from census data)
- **Primary Users**: Students, educators, or ML practitioners learning deployment and testing practices
- **Out-of-Scope Use Cases**:
  - Real-world income prediction in sensitive applications (e.g., hiring, credit scoring)
  - Use in production environments without fairness audits


## Training Data

- **Source**: UCI Adult Census Income dataset
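- **Features**: 14 attributes: age, workclass, fnlgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country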
- **Target**: Income bracket (<=50K or >50K)


## Evaluation Data

- **Split**: 20% holdout from original dataset
- **Preprocessing**:
  - One-hot encoding for categorical features
  - Label binarization for the target
  - Sliced evaluation by the `marital-status` feature


## Metrics

#### Global Performance:
- **Precision**: 0.7152
- **Recall**: 0.6110
- **F1 Score**: 0.6590

#### Sliced Performance by `marital-status`:

| Marital Status | Precision | Recall | F1 Score |
|--------------------------|-----------|--------|----------|
| Married-civ-spouse | 0.7149 | 0.6475 | 0.6795 |
| Never-married | 0.7692 | 0.4255 | 0.5479 |
| Married-spouse-absent | 1.0000 | 0.1429 | 0.2500 |
| Divorced | 0.6531 | 0.3902 | 0.4885 |
| Separated | 1.0000 | 0.3636 | 0.5333 |
| Widowed | 0.5000 | 0.1429 | 0.2222 |
| Married-AF-spouse | 1.0000 | 0.0000 | 0.0000 |
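Each F1 score is the harmonic mean of the precision and recall reported beside it. For the global figures:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.7152 \times 0.6110}{0.7152 + 0.6110} = \frac{0.8740}{1.3262} \approx 0.6590
$$

The same relation explains the Married-AF-spouse row: with $P = 1$ and $R = 0$, $F_1 = 0$.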


## Ethical Considerations

- Disparities in F1 scores across demographic slices may reflect model bias.
- The model was not audited for fairness, bias, or societal impacts.
- Use in decision-making without fairness evaluation could lead to discriminatory outcomes.



## Caveats and Recommendations

- Performance varies significantly by subgroup (for example, recall ranges from 0.0000 for Married-AF-spouse to 0.6475 for Married-civ-spouse); further fairness testing is advised.
- Consider collecting more balanced training data for underrepresented categories.
- **Do not deploy** this model without bias analysis, stakeholder review, and fairness auditing.

9 changes: 0 additions & 9 deletions starter/requirements.txt

This file was deleted.

8 changes: 0 additions & 8 deletions starter/setup.py

This file was deleted.

2 changes: 1 addition & 1 deletion starter/starter/ml/data.py
@@ -54,7 +54,7 @@ def process_data(
     X_continuous = X.drop(*[categorical_features], axis=1)
 
     if training is True:
-        encoder = OneHotEncoder(sparse=False, handle_unknown="ignore")
+        encoder = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
         lb = LabelBinarizer()
         X_categorical = encoder.fit_transform(X_categorical)
         y = lb.fit_transform(y.values).ravel()