To run the project, you can either use the provided nix-shell, which sets up an environment for Jupyter notebooks and testing, or create a custom Python environment from the provided requirements file in the test path.
Run the data quality tests with `python -m pytest tests/test_dataset_quality.py`.

To make sure the flow of data is consistent and the data meets the minimum required quality, I have decided on the following test cases:

| Test Case | Description | Range |
|---|---|---|
| `test_load_csv` | Ensures the dataset is successfully loaded and not empty. | N/A (checks file presence and data existence). |
| `test_validate_no_missing_values_in_title` | Ensures every job posting has a title. | N/A (ensures completeness). |
| `test_validate_no_missing_values_in_description` | Ensures every job posting has a description. | N/A (ensures completeness). |
| `test_validate_column_in_range` (`max_salary`) | Ensures salaries are within a reasonable range. | 0.0 (no negative salaries) to 1,000,000.0. |
| `test_validate_column_values_in_list` (`pay_period`) | Ensures `pay_period` contains only valid values. | Must be one of: `["BIWEEKLY", "HOURLY", "MONTHLY", "WEEKLY", "YEARLY"]`. |
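
To illustrate the kind of checks listed above, here is a minimal sketch using only the Python standard library. It is not the code in `tests/test_dataset_quality.py`: the helper names (`load_rows`, `missing_values`, `out_of_range`, `not_in_list`) and the inline sample data are hypothetical, and it assumes the dataset has `title`, `description`, `max_salary`, and `pay_period` columns and that blank salary or pay period cells are skipped rather than treated as failures.

```python
import csv
import io

VALID_PAY_PERIODS = ["BIWEEKLY", "HOURLY", "MONTHLY", "WEEKLY", "YEARLY"]

# Hypothetical two-row sample standing in for the real dataset.
SAMPLE_CSV = """title,description,max_salary,pay_period
Data Engineer,Builds pipelines,120000.0,YEARLY
Barista,Makes coffee,22.5,HOURLY
"""


def load_rows(handle):
    """Read CSV rows from an open text handle into a list of dicts."""
    return list(csv.DictReader(handle))


def missing_values(rows, column):
    """Return indices of rows whose value in `column` is absent or blank."""
    return [i for i, row in enumerate(rows) if not (row.get(column) or "").strip()]


def out_of_range(rows, column, low, high):
    """Return indices of rows whose numeric value lies outside [low, high].

    Assumption: blank cells are skipped instead of being reported.
    """
    bad = []
    for i, row in enumerate(rows):
        raw = (row.get(column) or "").strip()
        if raw and not (low <= float(raw) <= high):
            bad.append(i)
    return bad


def not_in_list(rows, column, allowed):
    """Return indices of rows whose non-blank value is not in `allowed`."""
    bad = []
    for i, row in enumerate(rows):
        raw = (row.get(column) or "").strip()
        if raw and raw not in allowed:
            bad.append(i)
    return bad


def test_sample_passes_all_checks():
    rows = load_rows(io.StringIO(SAMPLE_CSV))
    assert rows, "dataset should not be empty"
    assert missing_values(rows, "title") == []
    assert missing_values(rows, "description") == []
    assert out_of_range(rows, "max_salary", 0.0, 1_000_000.0) == []
    assert not_in_list(rows, "pay_period", VALID_PAY_PERIODS) == []
```

Each helper returns the offending row indices instead of a bare boolean, so a failing assertion shows which postings broke the rule.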