Changes since v0.6.0
- Features:
- Support custom cluster domain in MPI hostfile generation. (#704, #707, #738, @tenzen-y)
- Enable Service `publishNotReadyAddresses` when `runLauncherAsWorker` is enabled to improve DNS discovery for workers. (#703, @tenzen-y)
- Expose job controller workqueue rate-limiting configuration via operator flags to improve scalability tuning. (#674, @rotemelad)
- Bug fixes:
- Fix crash in PodGroup when `runLauncherAsWorker=true`. (#669, @GonzaloSaez)
- Fix missing ReplicaIndexLabel when `runLauncherAsWorker=true` so the launcher pod gets the expected pod index label (helps Kueue/TAS rank discovery). (#690, @GonzaloSaez)
- Clean ups:
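
To illustrate the `publishNotReadyAddresses` feature listed under Features above: this is a standard Kubernetes Service field that causes DNS records to be published for pods before they report ready. The sketch below builds a headless Service manifest with that field set. It is only an illustration, not the manifest the operator actually generates; the helper name `headless_service`, the job name `pi`, and the selector label are hypothetical.

```python
import json


def headless_service(job_name: str, namespace: str) -> dict:
    """Build an illustrative headless Service manifest (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": job_name, "namespace": namespace},
        "spec": {
            # Headless Service: DNS resolves directly to pod IPs.
            "clusterIP": "None",
            # Publish DNS records even for pods that are not yet ready.
            "publishNotReadyAddresses": True,
            # Hypothetical selector label, chosen only for this example.
            "selector": {"example.com/job-name": job_name},
        },
    }


manifest = headless_service("pi", "default")
print(json.dumps(manifest, indent=2))
```

With the field set, peers can resolve each other's hostnames during startup, before readiness probes have passed.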
Acknowledgments
Thank you to all the contributors (in no particular order): @rotemelad @mimowo @terrytangyuan @GonzaloSaez @vikas-saxena02 @tenzen-y