Cloud Mercato tested CPU performance using a range of encryption speed tests.
Cloud Mercato also tested the I/O performance of this instance using a 100GB General Purpose SSD. Below are the results:
I/O rate testing is conducted with local and block storage attached to the instance. Cloud Mercato uses the well-known open-source tool FIO. To express IOPS, the following parameters are used: 4K block, random access, no filesystem (except for write access with the root volume), and avoidance of cache and buffer.
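
A representative FIO invocation matching those parameters might look like the following sketch. The device path `/dev/nvme1n1` is a placeholder for the attached block volume, and running FIO against a raw device is destructive to its data:

```sh
# Random-read IOPS: 4K blocks, direct I/O (bypasses cache and buffers),
# raw block device (no filesystem).
# WARNING: pointing FIO at a raw device destroys any data on it.
fio --name=rand-read-iops \
    --filename=/dev/nvme1n1 \
    --rw=randread \
    --bs=4k \
    --direct=1 \
    --ioengine=libaio \
    --iodepth=32 \
    --numjobs=4 \
    --runtime=60 \
    --time_based \
    --group_reporting
```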


He repeated the test against the largest of the P3 instances, the p3.16xlarge.

Thanks for your detailed suggestions, txbob. I certainly don't know what's involved in eliminating the zero-copy fallback just because non-NVLinked GPUs exist in the system; it just seems like it should be possible without hardware changes when those links have not even been requested. I will investigate NCCL and other solutions.
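
For reference, NCCL routes traffic over NVLink where it exists and falls back to PCIe where it doesn't, so a single-process all-reduce across all 8 GPUs is possible without hand-managing peer mappings. A minimal sketch, assuming NCCL 2.x is installed (buffer sizes are illustrative):

```cpp
// Minimal single-process NCCL all-reduce across all visible GPUs
// (up to 8, as on a p3.16xlarge). Assumes NCCL 2.x.
#include <cstdio>
#include <cuda_runtime.h>
#include <nccl.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);

    // One communicator, stream, and buffer pair per GPU.
    ncclComm_t comms[8];
    cudaStream_t streams[8];
    float *send[8], *recv[8];
    int devs[8];
    const size_t count = 1 << 20;

    for (int i = 0; i < n; ++i) {
        devs[i] = i;
        cudaSetDevice(i);
        cudaStreamCreate(&streams[i]);
        cudaMalloc(&send[i], count * sizeof(float));
        cudaMalloc(&recv[i], count * sizeof(float));
    }
    ncclCommInitAll(comms, n, devs);  // Single-process, multi-GPU init.

    // NCCL picks NVLink where available and falls back to PCIe otherwise.
    ncclGroupStart();
    for (int i = 0; i < n; ++i)
        ncclAllReduce(send[i], recv[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < n; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        ncclCommDestroy(comms[i]);
    }
    printf("all-reduce complete on %d GPUs\n", n);
    return 0;
}
```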

I have run my code on the p3.16xlarge instance configured to use only 4 GPUs, the same configuration that runs very fast on the p3.8xlarge instance. The result is the same glacial performance as before.

According to this: [CUDA Pro Tip: Control GPU Visibility with CUDA_VISIBLE_DEVICES | NVIDIA Technical Blog](https://devblogs.nvidia.com/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/) you are correct that CUDA_VISIBLE_DEVICES will let me run at full speed on 4 of the 8 GPUs. However, I have already verified that my code runs fast on 4 GPUs; thanks for that suggestion. What I need is for NVIDIA/AWS to provide a solution that allows me to use UVM and peer-to-peer at full speed on an 8-GPU system. Any suggestion on how to get this fixed?
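
The variable has to be set before the process initializes the CUDA runtime, either on the command line (e.g. `CUDA_VISIBLE_DEVICES=0,1,2,3 ./app`) or from within the process itself. A minimal sketch, assuming the 4 fast peers are devices 0-3:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    // Must happen before the first CUDA call initializes the runtime.
    // Restricts this process to devices 0-3 (an assumption about which
    // 4 GPUs form the fast NVLink group on this instance).
    setenv("CUDA_VISIBLE_DEVICES", "0,1,2,3", 1);

    int n = 0;
    cudaGetDeviceCount(&n);  // Now reports 4; they are renumbered 0-3.
    printf("Visible GPUs: %d\n", n);
    return 0;
}
```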

I removed all the failed peer mapping requests and all the cudaMallocManaged calls, but kernel execution time is still as slow as before. Also, when requesting peer mappings for GPU 0, they succeed for GPUs 1-4; does that mean there are 5 GPUs on that CPU node? I was able to map GPUs 1-3 (or 4) to GPU 0 and GPUs 5-7 to GPU 4.
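
One way to see the actual link layout, rather than inferring it from which enables fail, is to query the P2P attributes pair by pair. A minimal sketch (lower performance rank means a faster path):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    // Print access support and relative link performance for every pair.
    for (int src = 0; src < n; ++src) {
        for (int dst = 0; dst < n; ++dst) {
            if (src == dst) continue;
            int access = 0, rank = 0;
            cudaDeviceGetP2PAttribute(&access, cudaDevP2PAttrAccessSupported,
                                      src, dst);
            cudaDeviceGetP2PAttribute(&rank, cudaDevP2PAttrPerformanceRank,
                                      src, dst);
            printf("GPU %d -> GPU %d : access=%d perfRank=%d\n",
                   src, dst, access, rank);
        }
    }
    return 0;
}
```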

OK, thanks txbob. The kernel is running 900x slower. Apparently I just need to stop requesting unavailable peer mappings, but I will also stop using managed memory for inputs, since that's not even necessary. It seems odd that a failed request would cause a GPU-wide fallback, and that zero-copy memory is so drastically slower.
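
A guarded enable loop along those lines, checking cudaDeviceCanAccessPeer before ever calling cudaDeviceEnablePeerAccess so that no request can fail, might look like this sketch:

```cpp
// Only request mappings the hardware actually supports, so no request
// fails and nothing silently falls back to zero-copy.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int dev = 0; dev < n; ++dev) {
        cudaSetDevice(dev);
        for (int peer = 0; peer < n; ++peer) {
            if (peer == dev) continue;
            int can = 0;
            cudaDeviceCanAccessPeer(&can, dev, peer);
            if (can) {
                cudaError_t err = cudaDeviceEnablePeerAccess(peer, 0);
                printf("GPU %d -> GPU %d: %s\n", dev, peer,
                       err == cudaSuccess ? "enabled"
                                          : cudaGetErrorString(err));
            }
        }
    }
    return 0;
}
```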

I've allocated all GPU global memory with cudaMallocManaged, except for inter-kernel global storage.
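
For context, that split might look roughly like the following sketch (names and sizes are illustrative, not the actual code):

```cpp
#include <cuda_runtime.h>

int main() {
    const size_t N = 1 << 20;

    // Inputs/outputs: managed memory, migrated on demand between host and GPUs.
    float *inputs = nullptr;
    cudaMallocManaged(&inputs, N * sizeof(float));

    // Inter-kernel scratch that never leaves the device: plain device memory.
    float *scratch = nullptr;
    cudaMalloc(&scratch, N * sizeof(float));

    // Optional: prefetch managed data to the current device up front,
    // avoiding demand page faults during the first kernel.
    int dev = 0;
    cudaGetDevice(&dev);
    cudaMemPrefetchAsync(inputs, N * sizeof(float), dev, 0);

    cudaFree(scratch);
    cudaFree(inputs);
    return 0;
}
```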

Are you saying that it is not possible on any current system to enable peer-to-peer over NVLink between more than 4 GPUs? I find it odd that merely trying to enable and use UVM between GPUs would drastically slow down the execution (according to nvprof) of a kernel that is writing to GPU global memory on a single GPU.

My code attempts to enable peer access by GPU 0 to the other 7 GPUs in the system. The first 4 pass cudaDeviceCanAccessPeer, but the last 3 fail, and this causes the code to run much slower than it does on a 4-GPU instance.
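
To make the topology concrete, a quick probe can print the full peer-access matrix, showing exactly which pairs can map each other's memory. A minimal sketch:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    printf("P2P capability matrix (%d GPUs):\n", n);
    for (int src = 0; src < n; ++src) {
        for (int dst = 0; dst < n; ++dst) {
            int can = 0;
            if (src != dst)
                cudaDeviceCanAccessPeer(&can, src, dst);  // 1 if src can map dst
            printf(" %d", can);
        }
        printf("\n");
    }
    return 0;
}
```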
