
Commit 0202d37

Merge pull request #55 from codelion/feat-more-tests

add tests

2 parents aceb811 + 27ab3de

File tree

4 files changed: +647 −9 lines changed


.github/workflows/test.yml

Lines changed: 37 additions & 9 deletions

```diff
@@ -7,31 +7,59 @@ on:
     branches: [ main ]
 
 jobs:
-  test:
+  unit-tests:
     runs-on: ubuntu-latest
-
+
     steps:
     - uses: actions/checkout@v3
-
+
     - name: Set up Python 3.12
       uses: actions/setup-python@v4
       with:
         python-version: '3.12'
-
+
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
         pip install -e .
         pip install pytest pytest-cov psutil
-
-    - name: Run tests
+
+    - name: Run unit tests
       run: |
-        pytest tests/ -v --cov=adaptive_classifier --cov-report=xml --cov-report=term
-
+        pytest tests/ -v --cov=adaptive_classifier --cov-report=xml --cov-report=term -m "not integration"
+
     - name: Upload coverage to Codecov
       uses: codecov/codecov-action@v3
       with:
         file: ./coverage.xml
         flags: unittests
         name: codecov-umbrella
-      fail_ci_if_error: false
+      fail_ci_if_error: false
+
+  integration-tests:
+    runs-on: ubuntu-latest
+    needs: unit-tests  # Only run if unit tests pass
+    timeout-minutes: 30  # Generous timeout for model downloads
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Set up Python 3.12
+      uses: actions/setup-python@v4
+      with:
+        python-version: '3.12'
+
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install -e .
+        pip install pytest psutil
+
+    - name: Run integration tests
+      run: |
+        pytest tests/test_enterprise_classifiers_integration.py -v -m "integration" --tb=short
+
+    - name: Integration test summary
+      if: always()
+      run: |
+        echo "Integration tests completed. Check logs above for detailed results."
```

docs/integration_tests.md

Lines changed: 153 additions & 0 deletions

# Enterprise Classifier Integration Tests

This document describes the integration test suite for enterprise classifiers hosted on Hugging Face Hub.

## Overview

The integration tests (`tests/test_enterprise_classifiers_integration.py`) verify that all 17 enterprise classifiers maintain their expected performance and behavior. These tests serve as regression tests to ensure code changes don't break the published models.
## Test Coverage

The integration test suite covers:

- **Model Loading**: Can each model be loaded from HuggingFace Hub?
- **Prediction Functionality**: Do models make valid predictions?
- **k-Parameter Consistency**: Do k=1 and k=2 produce consistent results? (regression test for the k parameter bug)
- **Prediction Stability**: Are repeated predictions consistent?
- **Performance**: Does inference complete within reasonable time?
- **Class Coverage**: Do models know about all expected classes?
- **Health Check**: Overall ecosystem health assessment
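As a rough illustration of the prediction-stability style of check, the sketch below uses a deterministic stand-in classifier rather than a real Hub model; `FakeClassifier`, its labels, and `check_stability` are all hypothetical names, not part of the actual suite.

```python
# Minimal sketch of a "prediction stability" check, assuming a
# predict(text, k) -> [(label, score), ...] interface. FakeClassifier
# stands in for a model the real suite loads from HuggingFace Hub.

class FakeClassifier:
    def predict(self, text: str, k: int = 1):
        # Deterministic toy scoring so repeated calls agree.
        scores = {"legitimate": 0.8, "fraudulent": 0.2}
        if len(text) % 2:
            scores = {"fraudulent": 0.7, "legitimate": 0.3}
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]

def check_stability(clf, text: str, runs: int = 5) -> bool:
    """Repeated predictions on the same input must be identical."""
    results = [clf.predict(text, k=1) for _ in range(runs)]
    return all(r == results[0] for r in results)

clf = FakeClassifier()
assert check_stability(clf, "Wire transfer request from unknown account")
```

A real implementation would make the same repeated-call comparison against a model loaded from the Hub instead of a stub.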
## Running Integration Tests

### Run All Integration Tests

```bash
pytest tests/test_enterprise_classifiers_integration.py -v
```

### Run Only Unit Tests (Skip Integration)

```bash
pytest tests/ -m "not integration" -v
```

### Run Specific Test for One Classifier

```bash
pytest tests/test_enterprise_classifiers_integration.py -k "fraud-detection" -v
```

### Run Specific Test Type

```bash
# Test k-parameter consistency for all classifiers
pytest tests/test_enterprise_classifiers_integration.py::TestEnterpriseClassifiers::test_k_parameter_consistency -v

# Test model loading for all classifiers
pytest tests/test_enterprise_classifiers_integration.py::TestEnterpriseClassifiers::test_model_loading -v
```
## CI/CD Integration

The CI/CD pipeline runs integration tests automatically:

1. **Unit Tests Job**: Runs all unit tests first
2. **Integration Tests Job**: Runs only if unit tests pass
   - 30-minute timeout for model downloads
   - Tests all 17 enterprise classifiers
   - Reports detailed results
## Tested Classifiers

The following 17 enterprise classifiers are tested:

| Classifier | Expected Accuracy | Classes | Use Case |
|------------|-------------------|---------|----------|
| business-sentiment | 98.8% | 4 | Business text sentiment analysis |
| compliance-classification | 65.3% | 5 | Regulatory compliance categorization |
| content-moderation | 100.0% | 3 | Content filtering and moderation |
| customer-intent | 85.2% | 4 | Customer service intent detection |
| document-quality | 100.0% | 2 | Document quality assessment |
| document-type | 98.0% | 5 | Document type classification |
| email-priority | 83.9% | 3 | Email priority triage |
| email-security | 93.8% | 4 | Email security threat detection |
| escalation-detection | 97.6% | 2 | Support ticket escalation detection |
| expense-category | 84.2% | 5 | Business expense categorization |
| fraud-detection | 92.7% | 2 | Financial fraud detection |
| language-detection | 100.0% | 4 | Text language identification |
| pii-detection | 100.0% | 2 | Personal information detection |
| product-category | 85.2% | 4 | E-commerce product categorization |
| risk-assessment | 75.6% | 2 | Security risk assessment |
| support-ticket | 82.9% | 4 | Support ticket categorization |
| vendor-classification | 92.7% | 2 | Vendor relationship classification |
## Test Assertions

Each classifier is tested against:

- **Minimum Accuracy Thresholds**: Must meet or exceed defined minimums
- **k-Parameter Consistency**: k=1 and k=2 must produce identical top predictions
- **Response Time**: Inference must complete within 2 seconds
- **Class Completeness**: Must know about all expected classes
- **Prediction Validity**: All predictions must be properly formatted
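The k-parameter consistency assertion can be sketched as follows. This is a hedged illustration with a stand-in `predict` function of the shape the docs imply, not the suite's actual code.

```python
# Sketch of the k-parameter consistency assertion: requesting more
# results (k=2) must not change the top prediction. `predict` here is
# a hypothetical stand-in, assuming a (text, k) -> [(label, score), ...]
# interface like the one the integration tests exercise.

def predict(text: str, k: int = 1):
    scores = {"high_risk": 0.75, "low_risk": 0.25}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]

top_k1 = predict("suspicious login attempt", k=1)[0]
top_k2 = predict("suspicious login attempt", k=2)[0]

# Regression assertion: identical label AND score for the top result.
assert top_k1 == top_k2
```

The original k-parameter bug made this assertion fail, which is why it is singled out as a regression test.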
## Failure Modes

Tests will fail if:

- The model cannot be loaded from HuggingFace Hub
- Accuracy drops below the minimum threshold
- k=1 and k=2 produce different top predictions (regression)
- Inference takes longer than 2 seconds
- Predicted classes don't match the expected class set
- The prediction format is invalid
## Adding New Enterprise Classifiers

To add a new enterprise classifier to the test suite:

1. Update the `CLASSIFIER_METRICS` dictionary with expected metrics
2. Add domain-specific test sentences to `TEST_SENTENCES`
3. Ensure the model is available on HuggingFace Hub as `adaptive-classifier/{name}`
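The steps above might look roughly like this. The actual `CLASSIFIER_METRICS` and `TEST_SENTENCES` structures live in `tests/test_enterprise_classifiers_integration.py` and may be shaped differently; the field names, the `invoice-routing` classifier, and its classes below are all hypothetical.

```python
# Illustrative shape only -- field names and the "invoice-routing"
# classifier are assumptions, not the repository's actual entries.

CLASSIFIER_METRICS = {
    "invoice-routing": {                 # hypothetical new classifier
        "expected_accuracy": 0.90,       # from its evaluation run
        "min_accuracy": 0.80,            # failure threshold for CI
        "classes": ["approve", "review", "reject"],
    },
}

TEST_SENTENCES = {
    "invoice-routing": [
        ("Invoice matches the purchase order exactly.", "approve"),
        ("Line items differ from the quoted amounts.", "review"),
    ],
}

# Every classifier with metrics should also have test sentences.
assert set(CLASSIFIER_METRICS) == set(TEST_SENTENCES)
```

With entries of this shape in place, the parametrized tests would pick up the new classifier automatically, provided the model is published as `adaptive-classifier/{name}`.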
## Debugging Test Failures

### Model Loading Failures

- Check that the model exists on HuggingFace Hub
- Verify network connectivity
- Check for authentication issues (if the model is private)

### Accuracy Failures

- Compare actual vs. expected accuracy in the test output
- Check whether the model was retrained recently
- Verify that the test sentences are appropriate for the domain

### k-Parameter Inconsistencies

- This indicates a regression in the k parameter fix
- Check the prediction logic in the `_predict_regular` method
- Verify how prototype and neural-head predictions are combined

### Performance Issues

- Check system resources during testing
- Consider network latency for model downloads
- Verify that the model size hasn't increased significantly
## Local Development

For faster local testing, you can:

```bash
# Skip integration tests during development
pytest tests/ -m "not integration"

# Test specific classifier during debugging
pytest tests/test_enterprise_classifiers_integration.py -k "fraud-detection" -v -s
```
## Maintenance

The integration test suite should be updated when:

- New enterprise classifiers are published
- Expected accuracy thresholds change
- New test dimensions are needed
- Test sentences need updating for better coverage

This ensures the test suite remains comprehensive and valuable for regression detection.

pyproject.toml

Lines changed: 4 additions & 0 deletions

```diff
@@ -12,6 +12,10 @@ minversion = "7.0"
 addopts = "-ra -q"
 testpaths = ["tests"]
 python_files = ["test_*.py"]
+markers = [
+    "integration: marks tests as integration tests (deselect with '-m \"not integration\"')",
+    "slow: marks tests as slow running",
+]
 filterwarnings = [
     "ignore::DeprecationWarning",
     "ignore::UserWarning",
```
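Registering the markers lets tests opt in via the standard pytest decorator, which is what the `-m "integration"` / `-m "not integration"` selections in the workflow key off. A minimal sketch (the test name is illustrative):

```python
# How a test opts into the newly registered marker, using the standard
# pytest.mark API. The test body and name here are placeholders.
import pytest

@pytest.mark.integration
def test_fraud_detection_model_loading():
    # A real test would load adaptive-classifier/fraud-detection
    # from HuggingFace Hub and assert on its predictions.
    assert True

# The decorator records the mark on the function itself:
assert test_fraud_detection_model_loading.pytestmark[0].name == "integration"
```

Without the `markers` entry in pyproject.toml, pytest's strict-markers mode would reject `@pytest.mark.integration` as an unknown marker.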
