
3080 & 3090 compute capability 8.6 degraded performance after some updates #44116

Closed
ibmua opened this issue Oct 17, 2020 · 8 comments
Labels: comp:gpu (GPU related issues), stale, stat:awaiting response, type:performance (Performance Issue)

Comments

@ibmua

ibmua commented Oct 17, 2020

This issue is apparent from the performance difference between NGC containers (https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow). On the 20.08 container, the se-resnext101 example training benchmark

python nvidia-examples/resnet50v1.5/main.py --arch=se-resnext101-32x4d --batch_size=64  --warmup_steps 200 --data_dir=/hdd/datasets/imagenet/tf/train/ --gpu_memory_fraction 0.95  --precision fp32  --results_dir=/toy/tmp/results_dir/   --mode=training_benchmark   --use_tf_amp --use_xla

(directories have to be adapted)
runs at around 370-400 img/sec on a 3080, while on the 20.09 container it drops to roughly 115 img/sec. The same applies to resnet-50 and most likely to all other CNN benchmarks. This is not an issue with my setup; other people see the same thing - see the discussion at https://www.pugetsystems.com/labs/hpc/RTX3090-TensorFlow-NAMD-and-HPCG-Performance-on-Linux-Preliminary-1902/
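
For reference, a minimal sketch to confirm that TensorFlow actually sees the Ampere card and reports compute capability 8.6 before benchmarking, assuming a TF 2.3+ build where tf.config.experimental.get_device_details is available:

import tensorflow as tf

# List the visible GPUs and print the compute capability reported for each one.
for gpu in tf.config.list_physical_devices("GPU"):
    details = tf.config.experimental.get_device_details(gpu)
    print(gpu.name, details.get("device_name"),
          "compute capability:", details.get("compute_capability"))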

@ibmua ibmua added the type:performance Performance Issue label Oct 17, 2020
@amahendrakar amahendrakar added the comp:gpu GPU related issues label Oct 19, 2020
@gowthamkpr gowthamkpr assigned sanjoy and unassigned gowthamkpr Oct 20, 2020
@gowthamkpr gowthamkpr added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Oct 20, 2020
@sanjoy
Contributor

sanjoy commented Oct 22, 2020

Can you please report this on the NVIDIA developer forum?

CC @nluehr

We can circle back here if/when this is triaged down to an issue with the TF nightly and/or TF release builds.

@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Oct 24, 2020
@ibmua
Author

ibmua commented Oct 24, 2020

3090 performance on the 20.10 tf1 NGC container is even 15-20% better than on 20.08, so I guess we can just agree never to use the 20.08 container, since it performs so poorly, and let go of this issue. I'll also try to run some tests on the 20.10 tf2 container later and report back.

Edit: results seem to vary quite a bit between cases, though. I need to test more and will report later.
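
For a rough container-to-container comparison independent of the nvidia-examples script, a synthetic-data throughput sketch like the one below can help. Assumptions: a TF 2.x container with tf.keras; the batch size and model are placeholders, no AMP/XLA is used, and the numbers are only meaningful relative to each other, not as absolute benchmarks.

import time
import tensorflow as tf

BATCH = 32   # adjust to fit GPU memory
STEPS = 100

# Plain fp32 Keras ResNet50 training; a coarse relative check only.
model = tf.keras.applications.ResNet50(weights=None)
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

# One synthetic ImageNet-shaped batch repeated forever, so disk I/O is not a factor.
images = tf.random.uniform((BATCH, 224, 224, 3))
labels = tf.random.uniform((BATCH,), maxval=1000, dtype=tf.int32)
ds = tf.data.Dataset.from_tensors((images, labels)).repeat()

model.fit(ds, steps_per_epoch=10, epochs=1, verbose=0)   # warm-up

start = time.time()
model.fit(ds, steps_per_epoch=STEPS, epochs=1, verbose=0)
print(f"~{BATCH * STEPS / (time.time() - start):.1f} img/sec")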

@ibmua
Author

ibmua commented Nov 10, 2020

I ran extensive benchmarks: https://fsymbols.com/3080-3090-benchmarks/

@ibmua
Author

ibmua commented Nov 12, 2020

(And the performance was pretty inconsistent, so you'd better take a look.)

@ibmua
Author

ibmua commented Nov 25, 2020

Retested on the 20.11 container. 3080 performance is still degraded: https://fsymbols.com/3080-3090-benchmarks/. It still ships cuDNN 8.0.4 and the same CUDA version as the 20.10 container, though.
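
To confirm which toolchain versions a given container actually ships, the build info can be printed from inside it; a minimal sketch, assuming a TF 2.x build where tf.sysconfig.get_build_info() is available:

import tensorflow as tf

# Build-time toolchain versions recorded in the TF wheel inside the container.
info = tf.sysconfig.get_build_info()
print("TF:", tf.__version__)
print("CUDA:", info.get("cuda_version"))
print("cuDNN:", info.get("cudnn_version"))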

@sachinprasadhs
Contributor

Could you please test with the latest 22.01-tf1 container (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow) and let us know if you are getting good performance? Thanks.

@sachinprasadhs sachinprasadhs self-assigned this Feb 19, 2022
@sachinprasadhs sachinprasadhs added the stat:awaiting response Status - Awaiting response from author label Feb 19, 2022
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Feb 26, 2022
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.
