Update dependency pytorch-lightning to v1.6.0 [SECURITY] #340
This PR contains the following updates:
| Package | Change |
| --- | --- |
| pytorch-lightning | `==1.4.9` -> `==1.6.0` |

Warning: Some dependencies could not be looked up. Check the Dependency Dashboard for more information.
GitHub Vulnerability Alerts
CVE-2021-4118
pytorch-lightning is vulnerable to Deserialization of Untrusted Data.
CVE-2022-0845
PyTorch Lightning version 1.5.10 and prior is vulnerable to code injection. An attacker could execute commands on the target operating system by setting the `PL_TRAINER_GPUS` environment variable when using the `Trainer` module. A patch is included in the `1.6.0` release.

Release Notes
Lightning-AI/lightning (pytorch-lightning)
v1.6.0: PyTorch Lightning 1.6: Support Intel's Habana Accelerator, New efficient DDP strategy (Bagua), Manual Fault-tolerance, Stability and Reliability
Compare Source
The core team is excited to announce the PyTorch Lightning 1.6 release ⚡
Highlights
PyTorch Lightning 1.6 is the work of 99 contributors who have worked on features, bug-fixes, and documentation for a total of over 750 commits since 1.5. This is our most active release yet. Here are some highlights:
Introducing Intel's Habana Accelerator
Lightning 1.6 now supports the Habana® framework, which includes Gaudi® AI training processors. Their heterogeneous architecture includes a cluster of fully programmable Tensor Processing Cores (TPC) along with its associated development tools and libraries and a configurable Matrix Math engine.
You can leverage the Habana hardware to accelerate your Deep Learning training workloads simply by passing:
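A minimal sketch of what this looks like, assuming the Habana software stack is installed (the device count is illustrative):

```python
from pytorch_lightning import Trainer

# Run training on Habana Gaudi (HPU) hardware.
trainer = Trainer(accelerator="hpu", devices=1)
```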
The Bagua Strategy
The Bagua Strategy is a deep learning acceleration framework that supports multiple, advanced distributed training algorithms with state-of-the-art system relaxation techniques. Enabling Bagua, which can be considerably faster than vanilla PyTorch DDP, is as simple as:
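A minimal sketch (assumes the `bagua` package is installed; device counts are illustrative):

```python
from pytorch_lightning import Trainer

# Switch the distributed strategy from vanilla DDP to Bagua.
trainer = Trainer(strategy="bagua", accelerator="gpu", devices=4)
```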
Towards stable Accelerator, Strategy, and Plugin APIs
The `Accelerator`, `Strategy`, and `Plugin` APIs are a core part of PyTorch Lightning. They're where all the distributed boilerplate lives, and we're constantly working to improve both them and the overall PyTorch Lightning platform experience. In this release, we've made some large changes to achieve that goal. Not to worry, though! The only users affected by these changes are those who use custom implementations of `Accelerator` and `Strategy` (`TrainingTypePlugin`) as well as certain `Plugin`s. In particular, we want to highlight the following changes:

All `TrainingTypePlugin`s have been renamed to `Strategy` (#11120). Strategy is a more appropriate name because it encompasses more than simply training communication. This change is now aligned with the changes we implemented in 1.5, which introduced the new `strategy` and `devices` flags to the Trainer.
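For code that referenced these classes directly, the rename looks roughly like this (a sketch; the exact class depends on which plugin you were customizing):

```python
from pytorch_lightning import Trainer

# Before (1.5): training-type plugins lived under `pytorch_lightning.plugins`
from pytorch_lightning.plugins import DDPPlugin

trainer = Trainer(strategy=DDPPlugin(find_unused_parameters=False))

# New (1.6): the same classes are now strategies
from pytorch_lightning.strategies import DDPStrategy

trainer = Trainer(strategy=DDPStrategy(find_unused_parameters=False))
```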
The `Accelerator` and `PrecisionPlugin` have moved into the `Strategy`. All strategies now take an optional `accelerator` and `precision_plugin` parameter (#11022, #10570).

Custom `Accelerator` implementations must now implement two new abstract methods: `is_available()` (#11797) and `auto_device_count()` (#10222). The latter determines how many devices get used by default when specifying `Trainer(accelerator=..., devices="auto")`.

We redesigned the process creation for spawn-based strategies such as `DDPSpawnStrategy` and `TPUSpawnStrategy` (#10896). All spawn-based strategies now spawn processes immediately upon calling `Trainer.{fit,validate,test,predict}`, which means the hooks/callbacks `prepare_data`, `setup`, `configure_sharded_model` and `teardown` all run under an initialized process group. These changes align the spawn-based strategies with their non-spawn counterparts (such as `DDPStrategy`).
We've also exposed the process group backend for use. For example, you can now easily enable `fairring` like this:
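A sketch (assumes the `fairring` package is installed and registers its backend with `torch.distributed` on import):

```python
import fairring  # noqa: F401  # registers the "fairring" backend

from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DDPStrategy

trainer = Trainer(
    strategy=DDPStrategy(process_group_backend="fairring"),
    accelerator="gpu",
    devices=4,
)
```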
In a similar fashion, if installing `torch>=1.11`, you can enable DDP static graph to apply special runtime optimizations:
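A sketch (the `static_graph` argument is forwarded to PyTorch's `DistributedDataParallel`, hence the `torch>=1.11` requirement):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DDPStrategy

trainer = Trainer(
    strategy=DDPStrategy(static_graph=True),
    accelerator="gpu",
    devices=2,
)
```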
LightningCLI improvements
In the previous release, we added shorthand notation support for registered components. In this release, we added a flag to automatically register all available components:
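A sketch using the `auto_registry` flag (the model and datamodule classes are placeholders):

```python
from pytorch_lightning import LightningDataModule, LightningModule
from pytorch_lightning.utilities.cli import LightningCLI


class MyModel(LightningModule):
    ...


class MyDataModule(LightningDataModule):
    ...


# Register all available subclasses (callbacks, optimizers, LR schedulers, loggers, ...)
# so they can be referenced by name on the command line.
cli = LightningCLI(MyModel, MyDataModule, auto_registry=True)
```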
We have also added support for the `ReduceLROnPlateau` scheduler with shorthand notation. If you need to customize the learning rate scheduler configuration, you can do so by overriding `LightningCLI.configure_optimizers`.
Finally, loggers are also now configurable with shorthand notation.
Control SLURM's re-queueing
We've added the ability to turn the automatic resubmission on or off when a job gets interrupted by the SLURM controller (via signal handling). Users who prefer to let their code handle the resubmission (for example, when submitit is used) can now pass:
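A minimal sketch using the `SLURMEnvironment` plugin:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins.environments import SLURMEnvironment

# Let your own code (e.g. submitit) handle resubmission instead of Lightning.
trainer = Trainer(plugins=[SLURMEnvironment(auto_requeue=False)])
```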
Fault-tolerance improvements
Fault-tolerant training under manual optimization now tracks optimization progress. We also changed the graceful exit signal from `SIGUSR1` to `SIGTERM` for better support inside cloud instances.

An additional feature we're excited to announce is support for consecutive `trainer.fit()` calls.

Loop customization improvements
The `Loop`'s state is now included as part of the checkpoints saved by the library. This enables finer restoration of custom loops.

We've also made it easier to replace Lightning's loops with your own. For example:
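A sketch of the `Loop.replace` helper (#10324); it assumes the custom loop subclasses the loop it replaces so the Trainer's arguments can be carried over:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.loops import TrainingEpochLoop


class MyCustomEpochLoop(TrainingEpochLoop):
    """Customize the epoch loop by overriding the hooks you need."""


trainer = Trainer()
# Swap the default epoch loop for the custom one and connect it to the Trainer.
trainer.fit_loop.replace(epoch_loop=MyCustomEpochLoop)
```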
Data-Loading improvements
In previous versions, Lightning required that the `DataLoader` instance set its input arguments as instance attributes. This meant that custom `DataLoader`s also had this hidden requirement. In this release, we do this automatically for the user, easing the passing of custom loaders:

```diff
 class MyDataLoader(torch.utils.data.DataLoader):
     def __init__(self, a=123, *args, **kwargs):
-        # this was required before
-        self.a = a
         super().__init__(*args, **kwargs)

 trainer.fit(model, train_dataloader=MyDataLoader())
```

As of this release, Lightning no longer pre-fetches 1 extra batch if it doesn't need to. Previously, doing so would conflict with the internal pre-fetching done by optimized data loaders such as FFCV's. You can now also define your own pre-fetching value.
New Hooks
LightningModule.lr_scheduler_step

Lightning now allows the use of custom learning rate schedulers that aren't natively available in PyTorch. A great example of this is Timm Schedulers. When using custom learning rate schedulers relying on an API other than PyTorch's, you can now define `LightningModule.lr_scheduler_step` with your desired logic.
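A sketch using a timm scheduler (assumes `timm` is installed; the hook receives the scheduler, the optimizer index, and the monitored metric, if any):

```python
import torch
from pytorch_lightning import LightningModule
from timm.scheduler import TanhLRScheduler


class MyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = TanhLRScheduler(optimizer, t_initial=50)
        return [optimizer], [{"scheduler": scheduler, "interval": "epoch"}]

    def lr_scheduler_step(self, scheduler, optimizer_idx, metric):
        # timm schedulers expect the epoch value to be passed explicitly.
        scheduler.step(epoch=self.current_epoch)
```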
A new stateful API

This release introduces new hooks to standardize all stateful components to use `state_dict` and `load_state_dict`, mimicking the PyTorch API. The new hooks receive their own component's state and replace most usages of the previous `on_save_checkpoint` and `on_load_checkpoint` hooks.
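For example, a callback can now persist and restore its own state with the new hooks (a sketch):

```python
from pytorch_lightning.callbacks import Callback


class CounterCallback(Callback):
    """A callback whose state is saved into and restored from checkpoints."""

    def __init__(self):
        self.batches_seen = 0

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        self.batches_seen += 1

    def state_dict(self):
        return {"batches_seen": self.batches_seen}

    def load_state_dict(self, state_dict):
        self.batches_seen = state_dict["batches_seen"]
```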
New properties

Trainer.estimated_stepping_batches

You can use the built-in `Trainer.estimated_stepping_batches` property to compute the total number of stepping batches needed for the complete training. The property takes the gradient accumulation factor and the distributed setting into consideration when performing this computation, so that you don't have to derive it manually:
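A common use is sizing a OneCycle learning rate schedule (a sketch):

```python
import torch
from pytorch_lightning import LightningModule


class MyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        # `estimated_stepping_batches` already accounts for gradient accumulation
        # and the distributed configuration.
        scheduler = torch.optim.lr_scheduler.OneCycleLR(
            optimizer, max_lr=1e-2, total_steps=self.trainer.estimated_stepping_batches
        )
        return [optimizer], [{"scheduler": scheduler, "interval": "step"}]
```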
Trainer.num_devices and Trainer.device_ids

In the past, retrieving the number of devices used, or their IDs, posed a considerable challenge. Additionally, doing so required knowing which property to access based on the current `Trainer` configuration. To simplify this process, we've deprecated the per-accelerator properties in favor of accelerator-agnostic properties. For example:
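A sketch of the before/after (the deprecated per-accelerator properties are listed in the CHANGELOG below):

```python
from pytorch_lightning import Trainer

trainer = Trainer(accelerator="gpu", devices=2)

# Before: the property to read depended on the accelerator in use
# (trainer.num_gpus, trainer.num_processes, trainer.ipus, trainer.data_parallel_device_ids)

# New: accelerator-agnostic properties
print(trainer.num_devices)  # 2
print(trainer.device_ids)   # [0, 1]
```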
Experimental Features
Manual Fault-tolerance
Fault Tolerance has limitations that require specific information about your data-loading structure.
It is now possible to resolve those limitations by enabling manual fault tolerance where you can write your own logic and specify how exactly to checkpoint your own datasets and samplers. You can do so using this environment flag:
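A sketch, assuming the existing `PL_FAULT_TOLERANT_TRAINING` environment variable accepts a `manual` mode (consistent with the `_FaultTolerantMode` enum mentioned in the CHANGELOG):

```python
import os

# Assumption: "manual" enables the manual fault-tolerance mode described above.
# Set this before the Trainer is created.
os.environ["PL_FAULT_TOLERANT_TRAINING"] = "manual"
```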
Check out this video for a dive into the internals of this flag.
Customizing the layer synchronization
We introduced a new plugin class for wrapping layers of a model with synchronization logic for multiprocessing.
Registering Custom Accelerators
There has been much progress in the field of ML Accelerators, and the list of accelerators is constantly expanding.
We've made it easier for users to try out new accelerators by enabling support for registering custom `Accelerator` classes in Lightning.

Backward Incompatible Changes
Here is a selection of notable changes that are not backward compatible with previous versions. The full list of changes and removals can be found in the CHANGELOG below.
Drop PyTorch 1.7 support
In line with our policy of supporting the four most recent PyTorch releases, this release supports PyTorch 1.8 through 1.11. Support for PyTorch 1.7 has been removed.
Drop Python 3.6 support
Following Python 3.6's end-of-life, support for Python 3.6 has been removed.
AcceleratorConnector rewrite

To support new accelerator and strategy features, we completely rewrote our internal `AcceleratorConnector` class. No backwards compatibility was maintained, so the rewrite is likely to have broken your code if it was using this class.
Re-define the current_epoch boundary

To resolve fault-tolerance issues, we changed where the current epoch value gets increased. `trainer.current_epoch` is now increased by 1 in `on_train_end`. This means that if a model is run for 3 epochs (0, 1, 2), `trainer.current_epoch` will now return 3 instead of 2 after `trainer.fit()`. This can also impact custom callbacks that access this property inside this hook.

This also impacts checkpoints saved during an epoch (e.g. in `on_train_epoch_end`). For example, a `Trainer(max_epochs=1, limit_train_batches=1)` instance that saves a checkpoint will have the `current_epoch=0` value saved instead of `current_epoch=1`.
Re-define the global_step boundary

To resolve fault-tolerance issues, we changed where the global step value gets increased.
Access to `trainer.global_step` during an intra-training validation hook will now correctly return the number of optimizer steps taken already. In pseudocode:
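(A sketch of the new behavior; the function names are illustrative, not real API.)

```python
for batch_idx, batch in enumerate(train_dataloader):
    loss = training_step(batch, batch_idx)
    loss.backward()
    optimizer.step()
    # trainer.global_step is now incremented right after `optimizer.step()` ...
    if should_run_validation(batch_idx):
        # ... so any validation hook running here sees the number of optimizer steps taken so far.
        run_validation()
```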
Saved checkpoints that use the global step value as part of the filename are now increased by 1 for the same reason. A checkpoint saved after 1 step will now be named `step=1.ckpt` instead of `step=0.ckpt`.

The `trainer.global_step` value will now account for TBPTT or multiple optimizers. Users setting `Trainer({min,max}_steps=...)` under these circumstances will need to adjust their values.
Removed automatic reduction of outputs in training_step when using DataParallel

When using `Trainer(strategy="dp")`, all the tensors returned by `training_step` were previously reduced to a scalar (https://github.com/PyTorchLightning/pytorch-lightning/pull/11594). This behavior was especially confusing when outputs needed to be collected into the `training_epoch_end` hook. From now on, outputs are no longer reduced except for the `loss` tensor, unless you implement `training_step_end`, in which case the loss won't get reduced either.

No longer fall back to CPU with no devices
Previous versions were lenient in that the lack of GPU devices defaulted to running on CPU. This meant that users' code could be running much slower without them ever noticing that it was running on CPU.
We suggest passing `Trainer(accelerator="auto")` when this leniency is desired.

CHANGELOG
Added
MLFlowLogger(#12290)backward_passes_per_step(#11911)DETAILlog level to provide useful logs for improving monitoring and debugging of batch jobs (#11008)SLURMEnvironment(auto_requeue=True|False)to control whether Lightning handles the requeuing (#10601)_Statefulprotocol to detect if classes are stateful (#10646)_FaultTolerantModeenum used to track different supported fault tolerant modes (#10645)_rotate_worker_indicesutility to reload the state according the latest worker (#10647)_terminate_gracefullyto all processes and add support for DDP (#10638)DataLoadersreturned in the*_dataloader()methods, i.e., automatic replacement of samplers now works with custom types ofDataLoader(#10680)DataLoaderimplementation is not well implemented and we need to reconstruct it (#10719)Loop's state by default in the checkpoint (#10784)Loop.replaceto easily switch one loop for another (#10324)--lr_scheduler=ReduceLROnPlateauto theLightningCLI(#10860)LightningCLI.configure_optimizersto override theconfigure_optimizersreturn value (#10860)LightningCLI(auto_registry)flag to register all subclasses of the registerable components automatically (#12108)max_epochsin theTraineris not set (#10700)LightningModule.configure_callbackswithout wrapping it into a list (#11060)console_kwargsforRichProgressBarto initialize inner Console (#10875)LightningCLI(#11533)LOGGER_REGISTRYinstance to register custom loggers to theLightningCLI(#11533)Trainerargumentslimit_*_batches,overfit_batches, orval_check_intervalare set to1or1.0(#11950)PrecisionPlugin.teardownmethod (#10990)LightningModule.lr_scheduler_step(#10249)DataFetcher(#11606)optimizer.step. This can be useful forLightningLiteusers, manual optimization users, or users overridingLightningModule.optimizer_step(#11711)MisconfigurationExceptionif user providedopt_idxin scheduler config doesn't match with actual optimizer index of its respective optimizer (#11247)loggersproperty toTrainerwhich returns a list of loggers provided by the user (#11683)loggersproperty toLightningModulewhich retrieves theloggersproperty fromTrainer(#11683)CombinedLoaderfor the training data (#11648)DistributedSamplerduring validation/testing (#11479)Baguatraining strategy (#11146)poptorch.DataLoaderin a*_dataloaderhook (#12116)rank_zeromodule to centralize utilities (#11747)_Statefulsupport forLightningDataModule(#11637)_Statefulsupport forPrecisionPlugin(#11638)Accelerator.is_availableto check device availability (#11797)Trainer(#11888)nn.Modulewithsave_hyperparameters()(#12068)estimated_stepping_batchesproperty toTrainer(#11599)on_load_checkpoint/on_save_checkpointcallback and LightningModule hooks (#12149)LayerSyncandNativeSyncBatchNormplugins (#11754)storage_optionsargument toTrainer.save_checkpoint()to pass to customCheckpointIOimplementations (#11891)device_idsandnum_devicesproperty toTrainer(#12151)Callback.state_dict()andCallback.load_state_dict()methods (#12232)AcceleratorRegistry(#12180)apply_to_collections(#11889)Changed
benchmarkflag optional and set its value based on the deterministic flag (#11944)_print_resultsmethod of theEvaluationLoop(#11332)EvaluationLoop(#12427)prog_barflag to False inLightningModule.log_grad_norm(#11472)init_dist_connection()when torch distributed is not available (#10418)monitorargument in theEarlyStoppingcallback is no longer optional (#10328)MisconfigurationExceptionwhenenable_progress_bar=Falseand a progress bar instance has been passed in the callback list (#10520)trainer.connectors.env_vars_connector._defaults_from_env_varstoutilities.argsparse._defaults_from_env_vars(#10501)LightningCLIrequired for the new major release of jsonargparse v4.0.0 (#10426)refresh_rate_per_secondparameter torefresh_rateforRichProgressBarsignature (#10497)PrecisionPluginintoTrainingTypePluginand updated all references (#10570)signal.SIGTERMto gracefully exit instead ofsignal.SIGUSR1(#10605)Loop.restarting=...now sets the value recursively for all subloops (#11442)batch_sizecannot be inferred from the current batch if it contained a string or was a custom batch object (#10541)overfit_batches > 0is set in the Trainer (#9709)AcceleratortoTrainingTypePlugin(#10596)Trainerto theStrategy(#11444)batch_to_devicemethod fromAcceleratortoTrainingTypePlugin(#10649)DDPSpawnPluginno longer overrides thepost_dispatchplugin hook (#10034)LightningModule.{add_to_queue,get_from_queue}hooks no longer get atorch.multiprocessing.SimpleQueueand instead receive a list based queue (#10034)training_step,validation_step,test_stepandpredict_stepmethod signatures inAcceleratorand updated input from caller side (#10908)DDPSpawnPluginand related plugins save (#10934)LoggerCollectionreturns only unique logger names and versions (#10976)DDPSpawnPlugin,TPUSpawnPlugin, etc.) (#10896)Trainer.{fit,validate,test,predict}prepare_data,setup,configure_sharded_modelandteardownnow run under initialized process group for spawn-based plugins just like their non-spawn counterpartsMisconfigurationExceptions will now be raised asProcessRaisedException(torch>=1.8) or asException(torch<1.8)TrainingTypePlugin.pre_dispatch()method and merged it withTrainingTypePlugin.setup()(#11137)batch_to_deviceentry in profiling from stage-specific to generic, to match profiling of other hooks (#11031)NeptuneLogger(#11015)__getstate__and__setstate__ofRichProgressBar(#11100)DDPPluginandDDPSpawnPluginand their subclasses now remove theSyncBatchNormwrappers inteardown()to enable proper support at inference after fitting (#11078)Acceleratorinstance to theTrainingTypePlugin; all training-type plugins now take an optional parameteraccelerator(#11022)TrainingTypePlugintoStrategy(#11120)ParallelPlugintoParallelStrategy(#11123)DataParallelPlugintoDataParallelStrategy(#11183)DDPPlugintoDDPStrategy(#11142)DDP2PlugintoDDP2Strategy(#11185)DDPShardedPlugintoDDPShardedStrategy(#11186)DDPFullyShardedPlugintoDDPFullyShardedStrategy(#11143)DDPSpawnPlugintoDDPSpawnStrategy(#11145)DDPSpawnShardedPlugintoDDPSpawnShardedStrategy(#11210)DeepSpeedPlugintoDeepSpeedStrategy(#11194)HorovodPlugintoHorovodStrategy(#11195)TPUSpawnPlugintoTPUSpawnStrategy(#11190)IPUPlugintoIPUStrategy(#11193)SingleDevicePlugintoSingleDeviceStrategy(#11182)SingleTPUPlugintoSingleTPUStrategy(#11182)TrainingTypePluginsRegistrytoStrategyRegistry(#11233)ResultCollection,ResultMetric, andResultMetricCollectionclasses as protected (#11130)trainer.checkpoint_connectoras protected (#11550)FitLoopinstead of theTrainingEpochLoop(#11201)Strategyclasses to thestrategiesdirectory (#11226)training_type_pluginfile 
tostrategy(#11239)DeviceStatsMonitorto group metrics based on the logger'sgroup_separator(#11254)UserWarningif evaluation is triggered withbestckpt and trainer is configured with multiple checkpoint callbacks (#11274)Trainer.logged_metricsnow always contains scalar tensors, even when a Python scalar was logged (#11270)MisconfigurationExceptiontoModuleNotFoundErrorwhenrichisn't available (#11360)trainer.current_epochvalue is now increased by 1 during and afteron_train_end(#8578)trainer.global_stepvalue now accounts for multiple optimizers and TBPTT splits (#11805)trainer.global_stepvalue is now increased right after theoptimizer.step()call which will impact users who access it during an intra-training validation hook (#11805)ModelCheckpoint(filename='{step}')is different compared to previous versions. A checkpoint saved after 1 step will be namedstep=1.ckptinstead ofstep=0.ckpt(#11805)ABCforAccelerator: Users need to implementauto_device_count(#11521)parallel_devicesproperty inParallelStrategyto be lazy initialized (#11572)TQDMProgressBarto run a separate progress bar for each eval dataloader (#11657)SimpleProfiler(extended=False)summary based on mean duration for each hook (#11671)shuffle=Falsefor eval dataloaders (#11575)training_step_endis overridden (#11594)training_epoch_endhook will no longer receive reduced outputs fromtraining_stepand instead get the full tensor of results from all GPUs (#11594)lightning_logsfor consistency (#11762)accelerator_connector(#11448)find_unused_parameters=True(#12425)limit_batches=0(#11576)is_global_zerocheck intraining_epoch_loopbeforelogger.save. If you have a custom logger that implementssavethe Trainer will now callsaveon all ranks by default. To change this behavior add@rank_zero_onlyto yoursaveimplementation (#12134)trainer.logger_connectoras protected (#12195)Strategy.process_dataloaderfunction call fromfit/evaluation/predict_loop.pytodata_connector.py(#12251)ModelCheckpoint(save_last=True, every_n_epochs=N)now saves a "last" checkpoint every epoch (disregardingevery_n_epochs) instead of only once at the end of training (#12418)sync_batchnormnow only apply it when fitting (#11919)supporters.pyso that in the accumulator element (for loss) is created directly on the device (#12430)EarlyStopping.on_save_checkpointandEarlyStopping.on_load_checkpointin favor ofEarlyStopping.state_dictandEarlyStopping.load_state_dict(#11887)BaseFinetuning.on_save_checkpointandBaseFinetuning.on_load_checkpointin favor ofBaseFinetuning.state_dictandBaseFinetuning.load_state_dict(#11887)BackboneFinetuning.on_save_checkpointandBackboneFinetuning.on_load_checkpointin favor ofBackboneFinetuning.state_dictandBackboneFinetuning.load_state_dict(#11887)ModelCheckpoint.on_save_checkpointandModelCheckpoint.on_load_checkpointin favor ofModelCheckpoint.state_dictandModelCheckpoint.load_state_dict(#11887)Timer.on_save_checkpointandTimer.on_load_checkpointin favor ofTimer.state_dictandTimer.load_state_dict(#11887)Deprecated
training_type_pluginproperty in favor ofstrategyinTrainerand updated the references (#11141)Trainer.{validated,tested,predicted}_ckpt_pathand replaced with read-only propertyTrainer.ckpt_pathset when checkpoints loaded viaTrainer.{fit,validate,test,predict}(#11696)ClusterEnvironment.master_{address,port}in favor ofClusterEnvironment.main_{address,port}(#10103)DistributedTypein favor of_StrategyType(#10505)precision_pluginconstructor argument fromAccelerator(#10570)DeviceTypein favor of_AcceleratorType(#10503)Trainer.slurm_job_idin favor of the newSLURMEnvironment.job_id()method (#10622)IndexBatchSamplerWrapper.batch_indicesin favor ofIndexBatchSamplerWrapper.seen_batch_indices(#10870)on_init_startandon_init_endcallback hooks (#10940)Trainer.call_hookin favor ofTrainer._call_callback_hooks,Trainer._call_lightning_module_hook,Trainer._call_ttp_hook, andTrainer._call_accelerator_hook(#10979)TrainingTypePlugin.post_dispatchin favor ofTrainingTypePlugin.teardown(#10939)ModelIO.on_hpc_{save/load}in favor ofCheckpointHooks.on_{save/load}_checkpoint(#10911)Trainer.run_stagein favor ofTrainer.{fit,validate,test,predict}(#11000)Trainer.lr_schedulersin favor ofTrainer.lr_scheduler_configswhich returns a list of dataclasses instead of dictionaries (#11443)Trainer.verbose_evaluatein favor ofEvaluationLoop(verbose=...)(#10931)Trainer.should_rank_save_checkpointTrainer property (#11068)Trainer.lightning_optimizers(#11444)TrainerOptimizersMixinand moved functionality tocore/optimizer.py(#11155)on_train_batch_end(outputs)format when multiple optimizers are used and TBPTT is enabled (#12182)training_epoch_end(outputs)format when multiple optimizers are used and TBPTT is enabled (#12182)TrainerCallbackHookMixin(#11148)TrainerDataLoadingMixinand moved functionality toTrainerandDataConnector(#11282)pytorch_lightning.callbacks.device_stats_monitor.prefix_metric_keys(#11254)Callback.on_epoch_starthook in favour ofCallback.on_{train/val/test}_epoch_start(#11578)Callback.on_epoch_endhook in favour ofCallback.on_{train/val/test}_epoch_end(#11578)LightningModule.on_epoch_starthook in favor ofLightningModule.on_{train/val/test}_epoch_start(#11578)LightningModule.on_epoch_endhook in favor ofLightningModule.on_{train/val/test}_epoch_end(#11578)on_before_accelerator_backend_setupcallback hook in favour ofsetup(#11568)on_batch_startandon_batch_endcallback hooks in favor ofon_train_batch_startandon_train_batch_end(#11577)on_configure_sharded_modelcallback hook in favor ofsetup(#11627)pytorch_lightning.utilities.distributed.rank_zero_onlyin favor ofpytorch_lightning.utilities.rank_zero.rank_zero_only(#11747)pytorch_lightning.utilities.distributed.rank_zero_debugin favor ofpytorch_lightning.utilities.rank_zero.rank_zero_debug(#11747)pytorch_lightning.utilities.distributed.rank_zero_infoin favor ofpytorch_lightning.utilities.rank_zero.rank_zero_info(#11747)pytorch_lightning.utilities.warnings.rank_zero_warnin favor ofpytorch_lightning.utilities.rank_zero.rank_zero_warn(#11747)pytorch_lightning.utilities.warnings.rank_zero_deprecationin favor ofpytorch_lightning.utilities.rank_zero.rank_zero_deprecation(#11747)pytorch_lightning.utilities.warnings.LightningDeprecationWarningin favor ofpytorch_lightning.utilities.rank_zero.LightningDeprecationWarningon_pretrain_routine_startandon_pretrain_routine_endcallback hooks in favor ofon_fit_start(#11794)LightningModule.on_pretrain_routine_startandLightningModule.on_pretrain_routine_endhooks in favor ofon_fit_start(#12122)agg_key_funcsandagg_default_funcparameters 
fromLightningLoggerBase(#11871)LightningLoggerBase.update_agg_funcs(#11871)LightningLoggerBase.agg_and_log_metricsin favor ofLightningLoggerBase.log_metrics(#11832)weights_save_pathto theTrainerconstructor in favor of adding theModelCheckpointcallback withdirpathdirectly to the list of callbacks (#12084)pytorch_lightning.profiler.AbstractProfilerin favor ofpytorch_lightning.profiler.Profiler(#12106)pytorch_lightning.profiler.BaseProfilerin favor ofpytorch_lightning.profiler.Profiler(#12150)BaseProfiler.profile_iterable(#12102)LoggerCollectionin favor oftrainer.loggers(#12147)PrecisionPlugin.on_{save,load}_checkpointin favor ofPrecisionPlugin.{state_dict,load_state_dict}(#11978)LightningDataModule.on_save/load_checkpointin favor ofstate_dict/load_state_dict(#11893)Trainer.use_ampin favor ofTrainer.amp_backend(#12312)LightingModule.use_ampin favor ofTrainer.amp_backend(#12315)PL_TORCH_DISTRIBUTED_BACKEND(#11745)ParallelPlugin.torch_distributed_backendin favor ofDDPStrategy.process_group_backendproperty (#11745)ModelCheckpoint.save_checkpointin favor ofTrainer.save_checkpoint(#12456)Trainer.devicesin favor ofTrainer.num_devicesandTrainer.device_ids(#12151)Trainer.root_gpuin favor ofTrainer.strategy.root_device.indexwhen GPU is used (#12262)Trainer.num_gpusin favor ofTrainer.num_deviceswhen GPU is used (#12384)Trainer.ipusin favor ofTrainer.num_deviceswhen IPU is used (#12386)Trainer.num_processesin favor ofTrainer.num_devices(#12388)Trainer.data_parallel_device_idsin favor ofTrainer.device_ids([#12072](https://redirect.githubConfiguration
📅 Schedule: Branch creation - "" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.