Commit cd4f5fe

Merge pull request #57 from codelion/feat-add-multi-label
Add multi label classifier
2 parents 0202d37 + f1baf83 commit cd4f5fe

6 files changed: +1118 -2 lines changed

README.md

Lines changed: 134 additions & 0 deletions
@@ -136,6 +136,37 @@ print(predictions)
# Output: [('positive', 0.85), ('neutral', 0.12), ('negative', 0.03)]
```

### 🏷️ Multi-Label Classification

Classify texts into multiple categories simultaneously with automatic threshold adaptation:

```python
from adaptive_classifier import MultiLabelAdaptiveClassifier

# Initialize multi-label classifier
classifier = MultiLabelAdaptiveClassifier(
    "bert-base-uncased",
    min_predictions=1,  # Ensure at least 1 prediction
    max_predictions=5   # Limit to top 5 predictions
)

# Multi-label training data (each text can have multiple labels)
texts = [
    "AI researchers study climate change using machine learning",
    "Tech startup develops healthcare solutions"
]
labels = [
    ["technology", "science", "climate", "ai"],
    ["technology", "business", "healthcare"]
]

classifier.add_examples(texts, labels)

# Make multi-label predictions
predictions = classifier.predict_multilabel("Medical AI breakthrough announced")
# Output: [('healthcare', 0.72), ('technology', 0.68), ('ai', 0.45)]
```
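
For intuition, multi-label targets like the `labels` lists above are commonly encoded as multi-hot vectors, with one slot per known label. The sketch below is a plain-Python illustration of that idea and makes no claim about how the library stores labels internally; `multi_hot` is a hypothetical helper name.

```python
def multi_hot(label_lists):
    """Encode lists of labels as multi-hot vectors over a sorted vocabulary."""
    # Collect every distinct label and fix an ordering for the vector slots.
    vocab = sorted({label for labels in label_lists for label in labels})
    index = {label: i for i, label in enumerate(vocab)}

    vectors = []
    for labels in label_lists:
        row = [0] * len(vocab)
        for label in labels:
            row[index[label]] = 1  # Several slots can be 1 for the same text
        vectors.append(row)
    return vocab, vectors


labels = [
    ["technology", "science", "climate", "ai"],
    ["technology", "business", "healthcare"],
]
vocab, vectors = multi_hot(labels)
print(vocab)
print(vectors)
```

Unlike a single-label target, each row can contain more than one `1`, which is why each label needs its own independent score at prediction time.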

### 💾 Save & Load Models

```python
@@ -188,6 +219,46 @@ more_labels = ["positive"] * 2
classifier.add_examples(more_examples, more_labels)
```

### Multi-Label Classification with Advanced Configuration

```python
from adaptive_classifier import MultiLabelAdaptiveClassifier

# Configure advanced multi-label settings
classifier = MultiLabelAdaptiveClassifier(
    "bert-base-uncased",
    default_threshold=0.5,  # Base threshold for predictions
    min_predictions=1,      # Minimum labels to return
    max_predictions=10      # Maximum labels to return
)

# Training with diverse multi-label examples
texts = [
    "Scientists develop AI for medical diagnosis and climate research",
    "Tech company launches sustainable energy and healthcare products",
    "Olympic athletes use sports science and nutrition technology"
]
labels = [
    ["science", "ai", "healthcare", "research"],
    ["technology", "business", "environment", "healthcare"],
    ["sports", "science", "health", "technology"]
]

classifier.add_examples(texts, labels)

# Advanced prediction options
predictions = classifier.predict_multilabel(
    "New research on AI applications in environmental science",
    threshold=0.3,  # Custom threshold
    max_labels=5    # Limit results
)

# Get detailed statistics
stats = classifier.get_label_statistics()
print(f"Adaptive threshold: {stats['adaptive_threshold']}")
print(f"Label-specific thresholds: {stats['label_thresholds']}")
```
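
The label-specific thresholds reported above are described in this README as depending on label frequency. The exact adjustment rule is not shown here, so the sketch below only illustrates the input to such a rule: counting how often each label occurs in the example training data, using the standard library. The variable names are illustrative, not part of the library.

```python
from collections import Counter

labels = [
    ["science", "ai", "healthcare", "research"],
    ["technology", "business", "environment", "healthcare"],
    ["sports", "science", "health", "technology"],
]

# Count how many training examples carry each label.
label_counts = Counter(label for example in labels for label in example)

# Express each count as a fraction of all training examples.
total_examples = len(labels)
label_frequency = {
    label: count / total_examples for label, count in label_counts.items()
}

# Print labels from most to least frequent (ties broken alphabetically).
for label, freq in sorted(label_frequency.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{label}: {freq:.2f}")
```

With only three examples, most labels appear once, which is the kind of sparsity a frequency-aware threshold is meant to account for.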

### Strategic Classification (Anti-Gaming)

```python
@@ -224,6 +295,69 @@ print(f"Strategic: {strategic_preds}")
print(f"Robust: {robust_preds}")
```

## 🏷️ Multi-Label Classification

The `MultiLabelAdaptiveClassifier` extends adaptive classification to texts that belong to several categories at once. As the number of labels grows, it lowers the prediction threshold automatically so that predictions do not come back empty.

### Key Features

- **🎯 Automatic Threshold Adaptation**: Lowers the threshold as the number of labels grows, to prevent empty predictions
- **📊 Sigmoid Activation**: Scores each label independently with a sigmoid output and BCE loss, instead of a softmax over all labels
- **⚙️ Configurable Limits**: Set the minimum and maximum number of predictions per input
- **📈 Label-Specific Thresholds**: Automatically adjusts thresholds based on label frequency
- **🔄 Incremental Learning**: Add new labels and examples without retraining from scratch
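
To make the sigmoid scoring and the minimum/maximum limits concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the library's implementation: `select_labels` and the example logits are hypothetical, and the fallback to the top-scoring labels when nothing clears the threshold is one plausible reading of `min_predictions`.

```python
import math


def sigmoid(x):
    """Map a raw score (logit) to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def select_labels(logits, threshold=0.5, min_predictions=1, max_predictions=5):
    """Pick labels whose sigmoid score clears the threshold, within min/max limits."""
    # Score every label independently and rank from highest to lowest.
    scored = sorted(
        ((label, sigmoid(logit)) for label, logit in logits.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    selected = [pair for pair in scored if pair[1] >= threshold]
    # Assumed fallback: if too few labels pass, take the top-ranked ones instead.
    if len(selected) < min_predictions:
        selected = scored[:min_predictions]
    return selected[:max_predictions]


# Hypothetical raw scores for one input text.
logits = {"technology": 1.2, "healthcare": 0.4, "sports": -2.0, "business": -0.3}

print(select_labels(logits))                 # Labels scoring at least 0.5
print(select_labels(logits, threshold=0.9))  # Nothing passes, so the top label is kept
```

Because each score comes from its own sigmoid, the scores do not need to sum to 1, so several labels can be confident at the same time. A softmax would force them to compete for a single unit of probability.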

### Usage

```python
from adaptive_classifier import MultiLabelAdaptiveClassifier

# Initialize with configuration
classifier = MultiLabelAdaptiveClassifier(
    "distilbert/distilbert-base-cased",
    default_threshold=0.5,
    min_predictions=1,
    max_predictions=5
)

# Multi-label training data
texts = [
    "Breaking: Scientists discover AI can help predict climate change patterns",
    "Tech giant announces breakthrough in quantum computing for healthcare",
    "Olympic committee adopts new sports technology for athlete performance"
]

labels = [
    ["science", "technology", "climate", "news"],
    ["technology", "healthcare", "quantum", "business"],
    ["sports", "technology", "performance", "news"]
]

# Train the classifier
classifier.add_examples(texts, labels)

# Make predictions
predictions = classifier.predict_multilabel(
    "Revolutionary medical AI system launched by tech startup"
)

# Results: [('technology', 0.85), ('healthcare', 0.72), ('business', 0.45)]
```

### Adaptive Thresholds

The classifier automatically adjusts prediction thresholds based on the number of labels:

| Number of Labels | Threshold | Benefit |
|-----------------|-----------|---------|
| 2-4 labels | 0.5 (default) | Standard precision |
| 5-9 labels | 0.4 (20% lower) | Balanced recall |
| 10-19 labels | 0.3 (40% lower) | Better coverage |
| 20-29 labels | 0.2 (60% lower) | Prevents empty results |
| 30+ labels | 0.1 (80% lower) | Ensures predictions |

This avoids the common "No labels met the threshold criteria" issue that arises when there are many labels.
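
The table can be read as a simple lookup from label count to threshold. The sketch below is one way to express it, assuming the reductions scale relative to `default_threshold` as the percentages suggest; `adaptive_threshold` is a hypothetical function name, and the behavior for fewer than 2 labels (returning the default) is an assumption, since the table does not cover that case.

```python
def adaptive_threshold(num_labels, default_threshold=0.5):
    """Scale the default threshold down as the label count grows, per the table."""
    if num_labels >= 30:
        scale = 0.2  # 80% lower
    elif num_labels >= 20:
        scale = 0.4  # 60% lower
    elif num_labels >= 10:
        scale = 0.6  # 40% lower
    elif num_labels >= 5:
        scale = 0.8  # 20% lower
    else:
        scale = 1.0  # Default threshold unchanged
    return default_threshold * scale


for n in (3, 7, 15, 25, 40):
    print(n, round(adaptive_threshold(n), 2))
```

With the default of 0.5, this reproduces the thresholds in the table: 0.5, 0.4, 0.3, 0.2, and 0.1.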

---

## 🏢 Enterprise Use Cases
