@awccopp awccopp commented Dec 17, 2024

Purpose

The purpose of this PR is to parallelize the SNSTOP user callback function.
See #417 for commit history.
A --timeout option has been added to the testflo arguments to prevent tests from hanging.

Expected time until merged

1 month

Type of change

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (non-backwards-compatible fix or feature)
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no API changes)
  • Documentation update
  • Maintenance update
  • Other (please describe)

Testing

Tests were added to ensure other processors are calling the snstop function.

Checklist

  • I have run flake8 and black to make sure the Python code adheres to PEP-8 and is consistently formatted
  • I have formatted the Fortran code with fprettify or C/C++ code with clang-format as applicable
  • I have run unit and regression tests which pass locally with my changes
  • I have added new tests that prove my fix is effective or that my feature works
  • I have added necessary documentation

* parallel snstop

* formatting fixes

* added comments

* added test - does it make sense?

* fixed MPI check

* actually fixing MPI check

* iSort fix

* maybe this time the test will be skipped?

* maybe like this?

* what about this

* cleanup

* add timeout option to testflo

* rerun tests

* updated test with send/receive

---------

Co-authored-by: Marco Mangano <[email protected]>
@awccopp awccopp requested a review from a team as a code owner December 17, 2024 20:03
@awccopp awccopp requested review from ArshSaja and lamkina December 17, 2024 20:03

codecov bot commented Dec 17, 2024

Codecov Report

❌ Patch coverage is 73.91304% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.41%. Comparing base (b687be3) to head (a8132d8).

Files with missing lines Patch % Lines
pyoptsparse/pyOpt_optimizer.py 0.00% 5 Missing ⚠️
pyoptsparse/pySNOPT/pySNOPT.py 94.44% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #420      +/-   ##
==========================================
+ Coverage   86.22%   86.41%   +0.19%     
==========================================
  Files          24       24              
  Lines        3418     3438      +20     
==========================================
+ Hits         2947     2971      +24     
+ Misses        471      467       -4     

@marcomangano marcomangano changed the title parallel snstop (#417) parallel snstop Dec 17, 2024
marcomangano previously approved these changes Feb 5, 2025
@marcomangano
Collaborator

@ewu63 you talked with @awccopp about this PR, but might still want to take a look?
@eirikurj does this look good to you?

@lamkina lamkina left a comment

This looks good to me. I would still wait for @ewu63 or @eirikurj to have a look before merging if possible.

@marcomangano
Collaborator

@ewu63 bumping this up

Collaborator

@marcomangano marcomangano left a comment

It looks like it is finally working. Is 120s a reasonable timeout for tests?

@ewu63
Collaborator

ewu63 commented Aug 11, 2025

My main concern with this PR is that I think whether snSTOP is called only on the root proc or on all procs should be a user-configurable option, and I would prefer the default to be the prior behaviour (i.e. only on the root proc).

@marcomangano
Collaborator

I see your point, @ewu63. Would that be as simple as adding an additional bool option (say parallelSnStop) and then only running these lines if 'parallelSnStop is True'?

The rest of the edits in the _waitLoop() function are just to make the mode variable more explicit (before it was just checking for the -1 flag to break the wait), and add the parallel option just within pySNOPT.
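
A minimal sketch of how such an opt-in flag and an explicit mode variable could fit together. This is not the PR's code: `FakeComm`, `notify_workers`, `wait_loop`, and `MODE_SNSTOP` are hypothetical names for illustration, and the only value taken from the discussion is the -1 flag that ends the wait. The fake communicator stands in for MPI so the example runs serially.

```python
from collections import deque
from typing import Callable

# Hypothetical mode flags. The thread only confirms that -1 ends the wait.
MODE_EXIT = -1
MODE_SNSTOP = 2  # assumed value meaning "run the snSTOP callback"


class FakeComm:
    """Serial stand-in for an MPI communicator, for illustration only."""

    def __init__(self) -> None:
        self._queue: deque = deque()

    def bcast(self, obj=None, root: int = 0):
        # Root side passes a value, which is queued; worker side passes None
        # and receives the oldest queued value.
        if obj is not None:
            self._queue.append(obj)
            return obj
        return self._queue.popleft()


def notify_workers(comm, parallel_snstop: bool = False) -> bool:
    """Root side: broadcast the snSTOP mode only when the option is enabled."""
    if not parallel_snstop:
        return False  # default keeps the prior root-only behaviour
    comm.bcast(MODE_SNSTOP, root=0)
    return True


def wait_loop(comm, snstop_callback: Callable[[], None]) -> int:
    """Worker side: act on each broadcast mode until MODE_EXIT arrives."""
    calls = 0
    while True:
        mode = comm.bcast(None, root=0)
        if mode == MODE_EXIT:
            break
        if mode == MODE_SNSTOP:
            snstop_callback()
            calls += 1
    return calls


comm = FakeComm()
notify_workers(comm)                        # option off: nothing is sent
notify_workers(comm, parallel_snstop=True)  # option on: workers are told
comm.bcast(MODE_EXIT, root=0)               # root releases the workers
hits = []
n_calls = wait_loop(comm, lambda: hits.append("snstop"))
print(n_calls)
```

With the option left at its default, the worker loop never sees the snSTOP mode, so existing serial and root-only workflows would be unaffected.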

@ewu63
Collaborator

ewu63 commented Aug 12, 2025

Yeah, something like that. The rest of the changes look OK.

@ewu63
Collaborator

ewu63 commented Aug 17, 2025

I thought about this PR some more and I have some more reservations:

  • we currently don't have any MPI-based tests, and those probably need to be added first to make sure we are not causing any regressions. "Add more test coverage" (#256) lists some of the tests I had in mind
  • the MPI imports need to be adjusted. There is an implicit contract with OpenMDAO and the wider community that mpi4py is not imported unless instructed -- this is why we have the special pyOpt_MPI module. So the internal code needs to be changed here.

@marcomangano
Collaborator

I see. About the second point though, that would only involve the testing script. The rest of the changes are just ineffective if the code is not run in parallel.
We should probably add some kind of flag though: if someone toggles the proposed parallel snStop flag without having PYOPTSPARSE_REQUIRE_MPI set, the code will just import the mock class.
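
A rough sketch of the kind of guard discussed here, under stated assumptions. It is not the contents of the pyOpt_MPI module: `MockComm`, `MockMPI`, `load_mpi`, and `check_parallel_snstop` are hypothetical names, and the set of values accepted for PYOPTSPARSE_REQUIRE_MPI is assumed for illustration. The idea is that mpi4py is imported only when explicitly requested, and the parallel snSTOP option is switched off with a warning when only the mock is available.

```python
import os
import warnings


class MockComm:
    """Serial stand-in exposing the few communicator attributes used here."""

    rank = 0
    size = 1

    def bcast(self, obj=None, root: int = 0):
        return obj


class MockMPI:
    """Hypothetical mock returned when mpi4py is not requested."""

    COMM_WORLD = MockComm()


def load_mpi(environ=os.environ):
    """Import mpi4py only when the environment variable asks for it."""
    # Accepted values are an assumption made for this sketch.
    value = environ.get("PYOPTSPARSE_REQUIRE_MPI", "").lower()
    if value in ("1", "true", "yes", "always"):
        from mpi4py import MPI  # raises ImportError if mpi4py is missing

        return MPI
    return MockMPI()


def check_parallel_snstop(mpi, parallel_snstop: bool) -> bool:
    """Disable the parallel option, with a warning, when MPI is mocked."""
    if parallel_snstop and isinstance(mpi, MockMPI):
        warnings.warn(
            "parallelSnStop was requested but MPI is mocked; "
            "snSTOP will run on the root proc only.",
            stacklevel=2,
        )
        return False
    return parallel_snstop


mpi = load_mpi({})  # empty environment: mpi4py is never imported
effective = check_parallel_snstop(mpi, parallel_snstop=False)
```

Warning rather than raising keeps a serial run usable when the option is set by mistake; raising an error would be the stricter alternative.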

@ewu63
Collaborator

ewu63 commented Aug 20, 2025

Yes, I misread that part of the code; I think that's all OK. I can try to address some of this soon but I would prefer holding off on this PR for a bit longer. Trying to be diligent to not break people's stuff downstream.

@marcomangano
Collaborator

> I can try to address some of this soon but I would prefer holding off on this PR for a bit longer. Trying to be diligent to not break people's stuff downstream.

Agreed, there is no rush. I can add that flag we mentioned soon, so that the default behavior is unchanged. Down to brainstorm additional tests.
