@@ -85,7 +85,7 @@ This project includes the semi-supervised and semi-weakly supervised ImageNet mo

"Semi-supervised" (SSL) ImageNet models are pre-trained on a subset of unlabeled YFCC100M public image dataset and fine-tuned with the ImageNet1K training dataset, as described by the semi-supervised training framework in the paper mentioned above. In this case, the high capacity teacher model was trained only with labeled examples.

"Semi-weakly" supervised (SWSL) ImageNet models are pre-trained on **940 million** public images with 1.5K hashtags matching with 1000 ImageNet1K synsets, followed by fine-tuning on ImageNet1K dataset. In this case, the associated hashtags are only used for building a better teacher model. During training the student model, those hashtags are ingored and the student model is pretrained with a subset of 64M images selected by the teacher model from the same 940 million public image dataset.
"Semi-weakly" supervised (SWSL) ImageNet models are pre-trained on **940 million** public images with 1.5K hashtags matching with 1000 ImageNet1K synsets, followed by fine-tuning on ImageNet1K dataset. In this case, the associated hashtags are only used for building a better teacher model. During training the student model, those hashtags are ignored and the student model is pretrained with a subset of 64M images selected by the teacher model from the same 940 million public image dataset.

Semi-weakly supervised ResNet and ResNext models provided in the table below significantly improve the top-1 accuracy on the ImageNet validation set compared to training from scratch or other training mechanisms introduced in the literature as of September 2019. For example, **We achieve state-of-the-art accuracy of 81.2% on ImageNet for the widely used/adopted ResNet-50 model architecture**.
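These checkpoints are distributed through torch.hub; a minimal loading sketch (assuming the `facebookresearch/semi-supervised-ImageNet1K-models` hub repo and its `resnet50_swsl` entrypoint — swap in `resnet50_ssl` for the semi-supervised variant):

```python
import torch

# Semi-weakly supervised ResNet-50 (the 81.2% top-1 model mentioned above).
# Entrypoint name assumed from the repo's hubconf.
model = torch.hub.load('facebookresearch/semi-supervised-ImageNet1K-models', 'resnet50_swsl')
model.eval()
```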

4 changes: 2 additions & 2 deletions nvidia_deeplearningexamples_efficientnet.md
@@ -108,7 +108,7 @@ for uri, result in zip(uris, results):
```

### Details
-For detailed information on model input and output, training recipies, inference and performance visit:
+For detailed information on model input and output, training recipes, inference and performance visit:
[github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/efficientnet)
and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:efficientnet_for_pytorch)

@@ -123,4 +123,4 @@ and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:efficientnet_for_py
- [pretrained model on NGC (efficientnet-widese-b4)](https://ngc.nvidia.com/catalog/models/nvidia:efficientnet_widese_b4_pyt_amp)
- [pretrained, quantized model on NGC (efficientnet-widese-b0)](https://ngc.nvidia.com/catalog/models/nvidia:efficientnet_widese_b0_pyt_amp)
- [pretrained, quantized model on NGC (efficientnet-widese-b4)](https://ngc.nvidia.com/catalog/models/nvidia:efficientnet_widese_b4_pyt_amp)


8 changes: 4 additions & 4 deletions nvidia_deeplearningexamples_fastpitch.md
@@ -41,7 +41,7 @@ In the example below:
- HiFiGAN generates sound given the mel spectrogram
- the output sound is saved in an 'audio.wav' file

-To run the example you need some extra python packages installed. These are needed for preprocessing of text and audio, as well as for display and input/output handling. Finally, for better performance of FastPitch model, we download the CMU pronounciation dictionary.
+To run the example you need some extra python packages installed. These are needed for preprocessing of text and audio, as well as for display and input/output handling. Finally, for better performance of the FastPitch model, we download the CMU pronunciation dictionary.
```bash
apt-get update
apt-get install -y libsndfile1 wget
@@ -99,7 +99,7 @@ Load text processor.
tp = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_textprocessing_utils', cmudict_path="cmudict-0.7b", heteronyms_path="heteronyms")
```

-Set the text to be synthetized, prepare input and set additional generation parameters.
+Set the text to be synthesized, prepare input and set additional generation parameters.
```python
text = "Say this smoothly, to prove you are not a robot."
```
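For context, the synthesis step these snippets build toward looks roughly like the sketch below, where `fastpitch`, `hifigan`, `denoiser`, and `tp` are the objects loaded earlier on the page; the keyword arguments are illustrative assumptions, not verified here.

```python
import torch

# Tokenize the text with the text processor loaded above.
batches = tp.prepare_input_sequence([text], batch_size=1)

# FastPitch: text -> mel spectrogram; HiFiGAN: mel -> waveform.
gen_kw = {'pace': 1.0, 'speaker': 0, 'pitch_tgt': None}  # assumed defaults
with torch.no_grad():
    mel, mel_lens, *_ = fastpitch(batches[0]['text'], **gen_kw)
    audios = hifigan(mel).float()
    audios = denoiser(audios.squeeze(1), denoising_strength=0.005).squeeze(1)
```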
@@ -136,7 +136,7 @@ plt.ylabel('frequency')
_=plt.title('Spectrogram')
```

-Syntesize audio.
+Synthesize audio.
```python
audio_numpy = audios[0].cpu().numpy()
Audio(audio_numpy, rate=22050)
@@ -149,7 +149,7 @@ write("audio.wav", vocoder_train_setup['sampling_rate'], audio_numpy)
```

### Details
-For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/HiFiGAN) and/or [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/resources/fastpitch_pyt)
+For detailed information on model input and output, training recipes, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/HiFiGAN) and/or [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/resources/fastpitch_pyt)

### References

2 changes: 1 addition & 1 deletion nvidia_deeplearningexamples_gpunet.md
@@ -122,7 +122,7 @@ for uri, result in zip(uris, results):
```

### Details
-For detailed information on model input and output, training recipies, inference and performance visit:
+For detailed information on model input and output, training recipes, inference and performance visit:
[github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/GPUNet)

### References
10 changes: 5 additions & 5 deletions nvidia_deeplearningexamples_hifigan.md
@@ -34,7 +34,7 @@ In the example below:
- HiFiGAN generates sound given the mel spectrogram
- the output sound is saved in an 'audio.wav' file

-To run the example you need some extra python packages installed. These are needed for preprocessing of text and audio, as well as for display and input/output handling. Finally, for better performance of FastPitch model, we download the CMU pronounciation dictionary.
+To run the example you need some extra python packages installed. These are needed for preprocessing of text and audio, as well as for display and input/output handling. Finally, for better performance of the FastPitch model, we download the CMU pronunciation dictionary.
```bash
pip install numpy scipy librosa unidecode inflect librosa matplotlib==3.6.3
apt-get update
@@ -92,7 +92,7 @@ Load text processor.
tp = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_textprocessing_utils', cmudict_path="cmudict-0.7b", heteronyms_path="heteronyms")
```

-Set the text to be synthetized, prepare input and set additional generation parameters.
+Set the text to be synthesized, prepare input and set additional generation parameters.
```python
text = "Say this smoothly, to prove you are not a robot."
```
@@ -129,7 +129,7 @@ plt.ylabel('frequency')
_=plt.title('Spectrogram')
```

-Syntesize audio.
+Synthesize audio.
```python
audio_numpy = audios[0].cpu().numpy()
Audio(audio_numpy, rate=22050)
@@ -142,12 +142,12 @@ write("audio.wav", vocoder_train_setup['sampling_rate'], audio_numpy)
```

### Details
-For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/HiFiGAN) and/or [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/resources/hifigan_pyt)
+For detailed information on model input and output, training recipes, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/HiFiGAN) and/or [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/resources/hifigan_pyt)

### References

- [HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis](https://arxiv.org/abs/2010.05646)
- [Original implementation](https://github.com/jik876/hifi-gan)
- [FastPitch on NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/resources/fastpitch_pyt)
- [HiFi-GAN on NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/resources/hifigan_pyt)
-- [FastPitch and HiFi-GAN on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/HiFi-GAN)
+- [FastPitch and HiFi-GAN on github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/HiFi-GAN)
2 changes: 1 addition & 1 deletion nvidia_deeplearningexamples_resnet50.md
@@ -105,7 +105,7 @@ for uri, result in zip(uris, results):

### Details

-For detailed information on model input and output, training recipies, inference and performance visit:
+For detailed information on model input and output, training recipes, inference and performance visit:
[github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/resnet50v1.5)
and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:resnet_50_v1_5_for_pytorch)
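The `uris`/`results` loop shown in the hunk above is the tail of the page's inference example; end to end, the flow is roughly the following sketch (entrypoint and helper names assumed from the page):

```python
import torch

# Model plus NVIDIA's pre/post-processing helpers, both via torch.hub.
resnet50 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_resnet50', pretrained=True)
utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_convnets_processing_utils')
resnet50.eval()

uris = ['http://images.cocodataset.org/test-stuff2017/000000024309.jpg']
batch = torch.cat([utils.prepare_input_from_uri(uri) for uri in uris])
with torch.no_grad():
    output = torch.nn.functional.softmax(resnet50(batch), dim=1)
results = utils.pick_n_best(predictions=output, n=5)
```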

2 changes: 1 addition & 1 deletion nvidia_deeplearningexamples_resnext.md
@@ -107,7 +107,7 @@ for uri, result in zip(uris, results):
```

### Details
-For detailed information on model input and output, training recipies, inference and performance visit:
+For detailed information on model input and output, training recipes, inference and performance visit:
[github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/resnext101-32x4d)
and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:resnext_for_pytorch)

2 changes: 1 addition & 1 deletion nvidia_deeplearningexamples_se-resnext.md
@@ -107,7 +107,7 @@ for uri, result in zip(uris, results):
```

### Details
-For detailed information on model input and output, training recipies, inference and performance visit:
+For detailed information on model input and output, training recipes, inference and performance visit:
[github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/se-resnext101-32x4d)
and/or [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/resources/se_resnext_for_pytorch).

2 changes: 1 addition & 1 deletion nvidia_deeplearningexamples_ssd.md
@@ -123,7 +123,7 @@ plt.show()

### Details
For detailed information on model input and output,
-training recipies, inference and performance visit:
+training recipes, inference and performance visit:
[github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)
and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:ssd_for_pytorch)

2 changes: 1 addition & 1 deletion nvidia_deeplearningexamples_tacotron2.md
@@ -89,7 +89,7 @@ Audio(audio_numpy, rate=rate)
```

### Details
-For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
+For detailed information on model input and output, training recipes, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)

### References

2 changes: 1 addition & 1 deletion nvidia_deeplearningexamples_waveglow.md
@@ -91,7 +91,7 @@ Audio(audio_numpy, rate=rate)
```

### Details
-For detailed information on model input and output, training recipies, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)
+For detailed information on model input and output, training recipes, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)

### References

2 changes: 1 addition & 1 deletion pytorch_vision_deeplabv3_resnet101.md
@@ -74,7 +74,7 @@ To get the maximum prediction of each class, and then use it for a downstream ta
Here's a small snippet that plots the predictions, with each color being assigned to each class (see the visualized image on the left).

```python
-# create a color pallette, selecting a color for each class
+# create a color palette, selecting a color for each class
palette = torch.tensor([2 ** 25 - 1, 2 ** 15 - 1, 2 ** 21 - 1])
colors = torch.as_tensor([i for i in range(21)])[:, None] * palette
colors = (colors % 255).numpy().astype("uint8")
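# Hedged continuation (the diff truncates the snippet here): apply the palette
# to the per-pixel class predictions and build a displayable image.
# `output_predictions` and `input_image` are assumed to be defined earlier on the page.
r = Image.fromarray(output_predictions.byte().cpu().numpy()).resize(input_image.size)
r.putpalette(colors)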
4 changes: 2 additions & 2 deletions pytorch_vision_fcn_resnet101.md
@@ -31,7 +31,7 @@ The images have to be loaded in to a range of `[0, 1]` and then normalized using
and `std = [0.229, 0.224, 0.225]`.

The model returns an `OrderedDict` with two Tensors that are of the same height and width as the input Tensor, but with 21 classes.
-`output['out']` contains the semantic masks, and `output['aux']` contains the auxillary loss values per-pixel. In inference mode, `output['aux']` is not useful.
+`output['out']` contains the semantic masks, and `output['aux']` contains the auxiliary loss values per-pixel. In inference mode, `output['aux']` is not useful.
So, `output['out']` is of shape `(N, 21, H, W)`. More documentation can be found [here](https://pytorch.org/vision/stable/models.html#object-detection-instance-segmentation-and-person-keypoint-detection).
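In code, this amounts to something like the sketch below (assuming `model` and a normalized `input_batch` as set up earlier on the page):

```python
import torch

with torch.no_grad():
    out = model(input_batch)['out']  # (N, 21, H, W); 'aux' is ignored at inference
pred = out.argmax(dim=1)             # (N, H, W) per-pixel class indices
```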


@@ -73,7 +73,7 @@ To get the maximum prediction of each class, and then use it for a downstream ta
Here's a small snippet that plots the predictions, with each color being assigned to each class (see the visualized image on the left).

```python
-# create a color pallette, selecting a color for each class
+# create a color palette, selecting a color for each class
palette = torch.tensor([2 ** 25 - 1, 2 ** 15 - 1, 2 ** 21 - 1])
colors = torch.as_tensor([i for i in range(21)])[:, None] * palette
colors = (colors % 255).numpy().astype("uint8")
2 changes: 1 addition & 1 deletion pytorch_vision_googlenet.md
@@ -84,7 +84,7 @@ for i in range(top5_prob.size(0)):

### Model Description

-GoogLeNet was based on a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The 1-crop error rates on the ImageNet dataset with a pretrained model are list below.
+GoogLeNet was based on a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The 1-crop error rates on the ImageNet dataset with a pretrained model are listed below.

| Model structure | Top-1 error | Top-5 error |
| --------------- | ----------- | ----------- |
2 changes: 1 addition & 1 deletion pytorch_vision_once_for_all.md
@@ -74,7 +74,7 @@ model, image_size = ofa_specialized_get("flops@[email protected]_finetune@75", pret
model.eval()
```

-The model's prediction can be evalutaed by
+The model's prediction can be evaluated by
```python
# Download an example image from pytorch website
import urllib
4 changes: 2 additions & 2 deletions pytorch_vision_proxylessnas.md
@@ -20,7 +20,7 @@ demo-model-link: https://huggingface.co/spaces/pytorch/ProxylessNAS
```python
import torch
target_platform = "proxyless_cpu"
-# proxyless_gpu, proxyless_mobile, proxyless_mobile14 are also avaliable.
+# proxyless_gpu, proxyless_mobile, proxyless_mobile14 are also available.
model = torch.hub.load('mit-han-lab/ProxylessNAS', target_platform, pretrained=True)
model.eval()
```
@@ -87,7 +87,7 @@ for i in range(top5_prob.size(0)):

ProxylessNAS models are from the [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/abs/1812.00332) paper.

-Conventionally, people tend to design *one efficient model* for *all hardware platforms*. But different hardware has different properties, for example, CPU has higher frequency and GPU is better at parallization. Therefore, instead of generalizing, we need to **specialize** CNN architectures for different hardware platforms. As shown in below, with similar accuracy, specialization offers free yet significant performance boost on all three platforms.
+Conventionally, people tend to design *one efficient model* for *all hardware platforms*. But different hardware has different properties, for example, CPU has higher frequency and GPU is better at parallelization. Therefore, instead of generalizing, we need to **specialize** CNN architectures for different hardware platforms. As shown in below, with similar accuracy, specialization offers free yet significant performance boost on all three platforms.

| Model structure | GPU Latency | CPU Latency | Mobile Latency
| --------------- | ----------- | ----------- | ----------- |
8 changes: 4 additions & 4 deletions pytorch_vision_resnext.md
@@ -2,7 +2,7 @@
layout: hub_detail
background-class: hub-background
body-class: hub
-title: ResNext
+title: ResNeXt
summary: Next generation ResNets, more efficient and accurate
category: researchers
image: resnext.png
@@ -87,9 +87,9 @@ for i in range(top5_prob.size(0)):

### Model Description

-Resnext models were proposed in [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431).
-Here we have the 2 versions of resnet models, which contains 50, 101 layers repspectively.
-A comparison in model archetechure between resnet50 and resnext50 can be found in Table 1.
+ResNeXt models were proposed in [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431).
+Here are the two versions of ResNeXt models, which contain 50 and 101 layers, respectively.
+A comparison of model architecture between ResNet-50 and ResNeXt-50 can be found in Table 1.
Their 1-crop error rates on ImageNet dataset with pretrained models are listed below.

| Model structure | Top-1 error | Top-5 error |
4 changes: 2 additions & 2 deletions sigsep_open-unmix-pytorch_umx.md
@@ -61,7 +61,7 @@ Furthermore, we provide a model for speech enhancement trained by [Sony Corporat

* __`umxse`__ speech enhancement model is trained on the 28-speaker version of the [Voicebank+DEMAND corpus](https://datashare.is.ed.ac.uk/handle/10283/1942?show=full).

-All three models are also available as spectrogram (core) models, which take magnitude spectrogram inputs and ouput separated spectrograms.
+All three models are also available as spectrogram (core) models, which take magnitude spectrogram inputs and output separated spectrograms.
These models can be loaded using `umxhq_spec`, `umx_spec` and `umxse_spec`.
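As a quick illustration, a sketch of the waveform-level usage (entrypoint name from the list above; the dummy input shape is an assumption — check the repo for exact signatures):

```python
import torch

# Load the full separator (core spectrogram model wrapped with the STFT).
separator = torch.hub.load('sigsep/open-unmix-pytorch', 'umxhq')

audio = torch.rand(1, 2, 44100 * 5)   # (batch, channels, samples), dummy stereo clip
estimates = separator(audio)          # (batch, targets, channels, samples)
```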

### Details
@@ -77,4 +77,4 @@ pip install openunmix
### References

- [Open-Unmix - A Reference Implementation for Music Source Separation](https://doi.org/10.21105/joss.01667)
-- [SigSep - Open Ressources for Music Separation](https://sigsep.github.io/)
+- [SigSep - Open Resources for Music Separation](https://sigsep.github.io/)
6 changes: 3 additions & 3 deletions test_run_python_code.py
@@ -11,7 +11,7 @@
@pytest.mark.parametrize('file_path', ALL_FILES)
def test_run_file(file_path):
if 'nvidia' in file_path:
-# FIXME: NVIDIA models checkoints are on cuda
+# FIXME: NVIDIA models checkpoints are on CUDA
pytest.skip("temporarily disabled")
if 'pytorch_fairseq_translation' in file_path:
pytest.skip("temporarily disabled")
@@ -26,11 +26,11 @@ def test_run_file(file_path):

# We just run the python files in a separate sub-process. We really want a
# subprocess here because otherwise we might run into package versions
-# issues: imagine script A that needs torchvivion 0.9 and script B that
+# issues: imagine script A that needs torchvision 0.9 and script B that
# needs torchvision 0.10. If script A is run prior to script B in the same
# process, script B will still be run with torchvision 0.9 because the only
# "import torchvision" statement that counts is the first one, and even
-# torchub sys.path shenanigans can do nothing about this. By creating
+# torchhub sys.path shenanigans can do nothing about this. By creating
# subprocesses we're sure that all file executions are fully independent.
try:
# This is inspired (and heavily simplified) from
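The body truncated above shells each file out to a fresh interpreter; the pattern the comment describes is, in simplified and hypothetical form:

```python
import subprocess
import sys

# A separate process per example file, so one script's imports can never
# pin or pollute the package versions seen by another.
subprocess.run([sys.executable, file_path], check=True)
```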