Fix feature alignment when applying XGBoost/HistGradientBoosting weights without specifying ss_main_score #167
Problem

When training an XGBoost or HistGradientBoosting model with a specific --ss_main_score parameter and then applying the trained weights without specifying the same parameter, features become misaligned, causing incorrect scoring results.

Example of the issue:

    # Step 1: Train with specific main score
    pyprophet score --in data.osw --level=ms1ms2 --classifier=XGBoost --ss_main_score=var_dotprod_score

The model trains successfully with var_dotprod_score as the main score, showing correct feature importances.

    # Step 2: Apply weights WITHOUT specifying the main score
    pyprophet score --in data.osw --level=ms1ms2 --classifier=XGBoost --apply_weights=weights.bin

This applies weights to incorrect features because --ss_main_score defaults to auto, potentially selecting a different main score and changing the feature order.

The root cause is that during training, features are prepared based on the specified ss_main_score, but when applying weights, if this parameter is not specified, it defaults to auto, which may select a different main score. This changes the feature order, causing the model to apply weights to the wrong features.

Solution
This PR stores metadata (ss_main_score, classifier, level) alongside the trained model and automatically restores the correct ss_main_score when applying weights.

Implementation
1. Enhanced Model Serialization (pyprophet/io/_base.py, pyprophet/io/scoring/osw.py)

Models are now saved with metadata:
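A minimal sketch of the idea, assuming the model is pickled together with a small metadata dictionary. The helper name `dump_weights_with_metadata` and the `"model"`/`"metadata"` keys are hypothetical; the actual layout used in pyprophet may differ.

```python
import pickle


# Hypothetical helper: pyprophet's real function names and file layout may differ.
def dump_weights_with_metadata(model, ss_main_score, classifier, level):
    """Serialize a trained model together with the settings it was trained with."""
    payload = {
        "model": model,
        "metadata": {
            "ss_main_score": ss_main_score,
            "classifier": classifier,
            "level": level,
        },
    }
    return pickle.dumps(payload)


# Stand-in for a trained XGBoost / HistGradientBoosting model.
trained_model = {"kind": "placeholder-model"}

blob = dump_weights_with_metadata(
    trained_model,
    ss_main_score="var_dotprod_score",
    classifier="XGBoost",
    level="ms1ms2",
)

# Round trip: the metadata travels with the model.
restored = pickle.loads(blob)
print(restored["metadata"])
```

Keeping the metadata in the same file as the model means the two cannot drift apart, which is the property the fix relies on.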
2. Automatic Metadata Restoration (pyprophet/scoring/runner.py)

When loading weights:
- If --ss_main_score=auto (default), automatically uses the stored value
- Sets the ss_use_dynamic_main_score flag for correct semi-supervised learning behavior

3. Backward Compatibility
Old weight files (without metadata) are automatically detected and still work with an appropriate warning:
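The loading side, including the fallback for old files, could be sketched as follows. This is an illustration under assumptions, not the code in runner.py: `load_weights` and `resolve_main_score` are hypothetical names, the payload layout is the assumed `"model"`/`"metadata"` dictionary, and keeping an explicitly specified `--ss_main_score` over the stored one is a choice made here for the sketch.

```python
import pickle
import warnings


def load_weights(blob):
    """Return (model, metadata); metadata is None for legacy files."""
    obj = pickle.loads(blob)
    if isinstance(obj, dict) and "model" in obj and "metadata" in obj:
        return obj["model"], obj["metadata"]
    # Legacy format (assumed): the pickled object is the bare model.
    warnings.warn(
        "Weights file has no metadata; make sure --ss_main_score matches training.",
        UserWarning,
    )
    return obj, None


def resolve_main_score(cli_value, metadata):
    """Pick the main score to use when applying weights."""
    if metadata is None:
        return cli_value  # nothing stored, keep whatever the user passed
    stored = metadata.get("ss_main_score")
    if cli_value == "auto" and stored:
        return stored  # restore the value used during training
    return cli_value  # assumption: an explicit user choice is kept


new_blob = pickle.dumps(
    {
        "model": "placeholder-model",
        "metadata": {
            "ss_main_score": "var_dotprod_score",
            "classifier": "XGBoost",
            "level": "ms1ms2",
        },
    }
)
legacy_blob = pickle.dumps("placeholder-legacy-model")

model, meta = load_weights(new_blob)
resolved_new = resolve_main_score("auto", meta)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_model, legacy_meta = load_weights(legacy_blob)
resolved_legacy = resolve_main_score("auto", legacy_meta)
```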
Usage
After this fix, applying weights no longer requires manually specifying --ss_main_score.

Benefits
ss_main_score was used during training

Testing
Comprehensive testing demonstrates:
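One kind of check that exercises this behavior can be sketched as below. It is a hypothetical illustration, not the PR's test suite: a positional weight vector stands in for the trained model, the rule "main score first, remaining features sorted" is an assumed ordering, and the feature values are made up.

```python
def feature_order(main_score, features):
    """Assumed ordering rule: main score first, remaining features sorted."""
    return [main_score] + sorted(f for f in features if f != main_score)


def score(rows, order, weights):
    """Weights are matched to features purely by position, as in a saved model."""
    return [sum(w * row[name] for w, name in zip(weights, order)) for row in rows]


features = ["var_dotprod_score", "var_xcorr_shape", "var_log_sn_score"]
rows = [
    {"var_dotprod_score": 0.9, "var_xcorr_shape": 0.2, "var_log_sn_score": 1.5},
    {"var_dotprod_score": 0.1, "var_xcorr_shape": 0.8, "var_log_sn_score": 0.3},
]
weights = [2.0, 0.5, -1.0]  # learned against the training-time feature order

# Training: main score given explicitly.
train_order = feature_order("var_dotprod_score", features)
expected = score(rows, train_order, weights)

# Applying with "auto" picking a different main score: columns shift.
auto_order = feature_order("var_xcorr_shape", features)
misaligned = score(rows, auto_order, weights)

# Applying with the stored main score restored: same order as training.
stored_metadata = {"ss_main_score": "var_dotprod_score"}
restored_order = feature_order(stored_metadata["ss_main_score"], features)
restored = score(rows, restored_order, weights)

print("orders differ:", train_order != auto_order)
print("restored matches training:", restored == expected)
```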
Related
Fixes issue: "--apply_weights requires --ss_main_score to be specified as in the original command"

Related to draft PR #117, which explored feature name tracking approaches.
Fixes #151