Fix feature alignment when applying XGBoost/HistGradientBoosting weights without specifying ss_main_score #167

Copilot · 2025-10-21T01:36:54Z

Problem

When training an XGBoost or HistGradientBoosting model with a specific --ss_main_score parameter and then applying the trained weights without specifying the same parameter, features become misaligned, causing incorrect scoring results.

Example of the issue:

# Step 1: Train with specific main score
pyprophet score --in data.osw --level=ms1ms2 --classifier=XGBoost --ss_main_score=var_dotprod_score

The model trains successfully with var_dotprod_score as the main score and reports the expected feature importances.

# Step 2: Apply weights WITHOUT specifying the main score
pyprophet score --in data.osw --level=ms1ms2 --classifier=XGBoost --apply_weights=weights.bin

This applies the weights to the wrong features.

The root cause is that feature preparation depends on ss_main_score. Training uses the specified value, but when weights are applied without it, the parameter defaults to auto, which may select a different main score. The feature order then changes, and the model applies its weights to the wrong features.
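The misalignment can be illustrated with a minimal sketch. It assumes that the main score is moved to the front of the feature list, which is one plausible way for a changed main score to shift column order; order_features and the column names other than var_dotprod_score are hypothetical and are not taken from the pyprophet code base.

```python
# Hypothetical helper: assumes the main score is placed first in the
# feature list. This is an illustration, not pyprophet's actual logic.
def order_features(columns, main_score):
    rest = [c for c in columns if c != main_score]
    return [main_score] + rest


# Illustrative column names; only var_dotprod_score appears in the PR text.
columns = ["var_xcorr_shape", "var_dotprod_score", "var_library_corr"]

# Order seen by the model during training (main score given explicitly).
trained_order = order_features(columns, "var_dotprod_score")

# Order seen when applying weights if "auto" happened to pick another score.
applied_order = order_features(columns, "var_xcorr_shape")

# A model that addresses features by position now reads different columns.
mismatched = [
    (i, t, a)
    for i, (t, a) in enumerate(zip(trained_order, applied_order))
    if t != a
]

for index, trained, applied in mismatched:
    print(f"position {index}: trained on {trained}, applied to {applied}")
```

Any model that refers to features by position rather than by name is affected in the same way, which is why restoring the original main score is enough to realign the columns.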

Solution

This PR stores metadata (ss_main_score, classifier, level) alongside the trained model and automatically restores the correct ss_main_score when applying weights.

Implementation

1. Enhanced Model Serialization (pyprophet/io/_base.py, pyprophet/io/scoring/osw.py)

Models are now saved with metadata:

model_data = {
    "model": weights,
    "ss_main_score": self.config.runner.ss_main_score,
    "classifier": self.classifier,
    "level": self.level,
}
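As a rough sketch of how such a dictionary could be written to and read from a weights file, the example below uses pickle. The serialization format, the save_model_with_metadata helper, and the placeholder model object are assumptions made for illustration; the PR text does not state how weights.bin is encoded.

```python
import os
import pickle
import tempfile


# Hypothetical helper: bundles the model with the metadata keys named in
# the PR description. The use of pickle is an assumption.
def save_model_with_metadata(path, weights, ss_main_score, classifier, level):
    model_data = {
        "model": weights,
        "ss_main_score": ss_main_score,
        "classifier": classifier,
        "level": level,
    }
    with open(path, "wb") as fh:
        pickle.dump(model_data, fh)
    return model_data


# Round trip with a stand-in object in place of a trained model.
with tempfile.TemporaryDirectory() as tmp_dir:
    weights_path = os.path.join(tmp_dir, "weights.bin")
    saved = save_model_with_metadata(
        weights_path,
        weights={"placeholder": "trained model goes here"},
        ss_main_score="var_dotprod_score",
        classifier="XGBoost",
        level="ms1ms2",
    )
    with open(weights_path, "rb") as fh:
        restored = pickle.load(fh)

print(restored["ss_main_score"], restored["classifier"], restored["level"])
```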

2. Automatic Metadata Restoration (pyprophet/scoring/runner.py)

When loading weights, the runner:

Detects new format (dict with metadata) vs old format (model only)
If --ss_main_score=auto (default), automatically uses the stored value
Validates level and classifier compatibility
Updates the ss_use_dynamic_main_score flag for correct semi-supervised learning behavior
Provides informative logging about what's happening
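The loading steps above might look roughly like the following sketch. The function names unpack_weights and resolve_main_score are hypothetical, as is the choice to raise ValueError on a classifier or level mismatch and to warn when an explicit --ss_main_score differs from the stored one; the PR text does not specify those behaviours. The ss_use_dynamic_main_score update is left out because its logic is not described.

```python
import logging

logger = logging.getLogger("apply_weights_sketch")


def unpack_weights(loaded):
    """Return (model, metadata); metadata is None for old-format files."""
    if isinstance(loaded, dict) and "model" in loaded and "ss_main_score" in loaded:
        metadata = {k: v for k, v in loaded.items() if k != "model"}
        return loaded["model"], metadata
    return loaded, None


def resolve_main_score(requested, metadata, classifier, level):
    """Pick the ss_main_score to use when applying weights."""
    if metadata is None:
        # Old format: nothing to restore, so keep the user's setting.
        logger.warning("Loading weights from old format file (no metadata).")
        return requested

    # Assumed behaviour: refuse weights trained for another classifier/level.
    for key, current in (("classifier", classifier), ("level", level)):
        stored = metadata.get(key)
        if stored is not None and stored != current:
            raise ValueError(f"weights were trained with {key}={stored!r}, got {current!r}")

    stored_score = metadata["ss_main_score"]
    if requested == "auto":
        logger.info("Using stored ss_main_score=%r from weights file", stored_score)
        return stored_score
    if requested != stored_score:
        # Assumed behaviour: honour the explicit value but flag the mismatch.
        logger.warning("Requested ss_main_score=%r differs from stored %r", requested, stored_score)
    return requested


new_format = {
    "model": object(),
    "ss_main_score": "var_dotprod_score",
    "classifier": "XGBoost",
    "level": "ms1ms2",
}
model, metadata = unpack_weights(new_format)
resolved = resolve_main_score("auto", metadata, "XGBoost", "ms1ms2")

legacy_model, legacy_metadata = unpack_weights(object())
legacy_resolved = resolve_main_score("auto", legacy_metadata, "XGBoost", "ms1ms2")
```

With the default of auto, the stored value wins, while an old-format file leaves the requested value untouched.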

3. Backward Compatibility

Old weight files (without metadata) are automatically detected and still work with an appropriate warning:

[WARNING] Loading weights from old format file (no metadata).
[WARNING] Feature alignment cannot be automatically verified.
[WARNING] Make sure to specify the same --ss_main_score as used during training.

Usage

After this fix, applying weights no longer requires manually specifying --ss_main_score:

# Train with specific main score
pyprophet score --in data.osw --classifier=XGBoost --ss_main_score=var_dotprod_score

# Apply weights - ss_main_score automatically restored
pyprophet score --in data.osw --classifier=XGBoost --apply_weights=weights.bin
# Output: Using stored ss_main_score='var_dotprod_score' from weights file
# Result: Correct feature alignment and accurate scoring ✓

Benefits

User-friendly: No need to remember which ss_main_score was used during training
Error prevention: Automatic validation prevents feature misalignment
Clear feedback: Informative logging shows what's being applied
Backward compatible: Old weight files continue to work with warnings
No breaking changes: Existing workflows work as before

Testing

Tests confirm that:

Metadata is correctly saved and loaded
Automatic restoration works as expected
Old format files remain compatible
Feature alignment is maintained across training and application phases

This section details the original issue to be resolved.

<issue_title>--apply_weights requires --ss_main_score to be specified as in the original command</issue_title>
<issue_description>I get unexpected results when using --apply_weights with XGBoost if a --ss_main_score was set manually.

E.g.

pyprophet score --in in.oswpq --level=ms1ms2 --classifier=XGBoost --ss_main_score=var_dotprod_score

Trains the model as expected

However if this command is used
pyprophet score --in in.oswpq --level=ms1ms2 --classifier=XGBoost --apply_weights=weights.bin

The weights are applied to the incorrect scores
(Same weights on different columns)
To get the expected results the --ss_main_score has to be specified as in the original run
pyprophet score --in in.oswpq --level=ms1ms2 --classifier=XGBoost --ss_main_score=var_dotprod_score --apply_weights=weights.bin

It would be nice if this was detected automatically somehow</issue_description>

Comments on the Issue (you are @copilot in this section)
@singjc I think keeping track of the feature names, something similar to the draft PR #117, would probably be able to make this automatic. Need to check if the issue is the same with the other models, and file types.

Fixes #151


Co-authored-by: singjc <[email protected]>

Initial plan

b72b341

Copilot AI assigned Copilot and singjc Oct 21, 2025

Copilot started work on behalf of singjc October 21, 2025 01:37

Copilot AI and others added 2 commits October 21, 2025 01:44

Store ss_main_score metadata with XGBoost/HistGradientBoosting models

d5e3637

Co-authored-by: singjc <[email protected]>

Update ss_use_dynamic_main_score flag when restoring ss_main_score

6268549

Co-authored-by: singjc <[email protected]>

Copilot AI changed the title ~~[WIP] Fix apply_weights to require ss_main_score input~~ Fix feature alignment when applying XGBoost/HistGradientBoosting weights without specifying ss_main_score Oct 21, 2025

Copilot AI requested a review from singjc October 21, 2025 01:55

Copilot finished work on behalf of singjc October 21, 2025 01:55
