Cloud Mercato tested CPU performance using a range of encryption speed tests.
Cloud Mercato tested the I/O performance of this instance using a 100 GB General Purpose SSD.
I/O rate testing is conducted with local and block storage attached to the instance. Cloud Mercato uses the well-known open-source tool FIO. To express IOPS, the following parameters are used: 4K blocks, random access, no filesystem (except for write access with the root volume), and avoidance of cache and buffers.
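As a rough illustration of those parameters, the sketch below builds a fio command line for a 4K random-read test against a raw block device, using direct I/O to bypass the page cache. Cloud Mercato's exact job file isn't given here, so the device path, queue depth, runtime and ioengine are assumptions, not their settings.

```python
import shlex


def fio_randread_cmd(device: str, iodepth: int = 32, runtime_s: int = 60) -> list[str]:
    """Build a fio invocation for 4K random reads on a raw device.

    Assumed settings (not Cloud Mercato's published job): libaio engine,
    the given queue depth, and a time-based run with JSON output.
    """
    return [
        "fio",
        "--name=randread-4k",
        f"--filename={device}",  # raw block device: no filesystem involved
        "--rw=randread",         # random access
        "--bs=4k",               # 4K blocks
        "--direct=1",            # O_DIRECT: avoid the page cache and buffers
        "--ioengine=libaio",
        f"--iodepth={iodepth}",
        f"--runtime={runtime_s}",
        "--time_based",
        "--output-format=json",
    ]


if __name__ == "__main__":
    # /dev/nvme1n1 is a hypothetical device name. Reads are non-destructive,
    # but never aim a write test at a device holding data you need.
    print(shlex.join(fio_randread_cmd("/dev/nvme1n1")))
```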


Cost Explorer takes [up to] 24 hours to set up, so it's not a good answer to support questions about billing.

AWS is incredibly complex. Are you complaining that their billing can get complex?

The local NVMe storage for i3.metal is the same as for i3.16xlarge. There are 8 NVMe PCI devices. For i3.16xlarge, those PCI devices are assigned to the instance running under the Xen hypervisor. When running i3.metal, there simply isn't a hypervisor, and the PCI devices are accessed directly.

- There is no hot swap for the NVMe storage.
- The 8 NVMe devices are discrete; there is no hardware RAID controller.
- Anyone can get I/O performance stats on i3.16xlarge as a baseline. Intel VT-d can introduce some overhead from the handling (and caching) of DMA remapping requests in the IOMMU and from interrupt delivery, so I/O performance may be a bit higher on i3.metal, with a few microseconds lower latency.
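Because the eight devices are discrete and there's no hardware RAID controller, any striping or mirroring has to be done in software. A minimal sketch, assuming Linux md managed with mdadm: the helper below only builds the command. The device names are hypothetical (EBS volumes may also appear as NVMe devices, so check `lsblk` before trusting any numbering), and creating an array destroys existing data on the member drives.

```python
def mdadm_create_cmd(devices: list[str], level: int = 10, md_device: str = "/dev/md0") -> list[str]:
    """Build an mdadm command that assembles the given devices into one md array.

    Only RAID0 (striping) and RAID10 (striped mirrors) are handled here.
    """
    if level not in (0, 10):
        raise ValueError(f"unsupported RAID level for this sketch: {level}")
    if len(devices) < 2:
        raise ValueError("need at least two member devices")
    return [
        "mdadm", "--create", md_device,
        f"--level={level}",
        f"--raid-devices={len(devices)}",
        *devices,
    ]


if __name__ == "__main__":
    # Hypothetical names for the eight instance-store NVMe drives.
    drives = [f"/dev/nvme{i}n1" for i in range(1, 9)]
    print(" ".join(mdadm_create_cmd(drives, level=10)))
```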

For all this progress, the billing on AWS is so damn confusing when it comes to figuring out whether some machine has been left on unused that I won’t use AWS again. GCE and Azure are miles ahead here.

Is this what khuey is referring to?

Thanks. I guess some other open questions:

- If one of those drives fails, will Amazon hot-swap it out, or do you need to migrate to a new instance? (Moving TBs of data to a new box without causing outages can be painful.)
- Is there a hardware RAID controller for those drives, or is it software only?
- Can anyone with access to one of these boxes produce some I/O performance stats on them? Bonus points for stats on a single drive vs. concurrent access across all drives (i.e., is there any throttling?). More points for RAID10 performance across all 8.
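One way to turn the throttling question into a number, assuming you have already measured 4K random IOPS on one drive and on all eight drives at once: compare the aggregate against the ideal of N independent drives. The RAID10 function is the usual idealized rule of thumb (reads can be served by either mirror, every write lands on two drives), not a measured i3 result.

```python
def scaling_efficiency(single_drive_iops: float, aggregate_iops: float, n_drives: int) -> float:
    """Aggregate IOPS as a fraction of n_drives * single-drive IOPS.

    Close to 1.0 means the drives scale independently; well below 1.0
    hints at a shared bottleneck (PCIe, interrupts, CPU, or throttling).
    """
    if single_drive_iops <= 0 or n_drives <= 0:
        raise ValueError("single_drive_iops and n_drives must be positive")
    return aggregate_iops / (single_drive_iops * n_drives)


def raid10_ideal_iops(single_drive_iops: float, n_drives: int) -> tuple[float, float]:
    """Idealized (read, write) IOPS for RAID10 with two-way mirroring.

    Reads can be spread over every member; each write goes to two members,
    so random-write capacity is roughly halved.
    """
    if n_drives < 2:
        raise ValueError("RAID10 needs at least two drives")
    return single_drive_iops * n_drives, single_drive_iops * n_drives / 2
```

Plug in your own fio results; no i3.metal numbers are implied here.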

It's exactly the same as with the i3.16xlarge instance type. There are eight 1900 GB drives. In an i3.16xlarge, those eight drives are passed through to the instance with PCIe passthrough but for the i3.metal instance, you avoid going through a hypervisor and IOMMU and have direct access.
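The capacity arithmetic behind the 15.2 TB figure, and what the common software RAID layouts would leave usable, as a quick sketch (decimal gigabytes, ignoring filesystem and md metadata overhead):

```python
DRIVES = 8
DRIVE_GB = 1900  # per-drive size quoted above, decimal GB


def usable_gb(level: int, drives: int = DRIVES, drive_gb: int = DRIVE_GB) -> int:
    """Usable capacity for common md RAID levels with equal-sized members."""
    if level == 0:
        return drives * drive_gb          # striping only, no redundancy
    if level == 10:
        return drives * drive_gb // 2     # two-way mirrors, then striped
    if level == 5:
        return (drives - 1) * drive_gb    # one drive's worth of parity
    if level == 6:
        return (drives - 2) * drive_gb    # two drives' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for lvl in (0, 10, 5, 6):
        print(f"RAID{lvl}: {usable_gb(lvl) / 1000:.1f} TB")
```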

Storage – 15.2 terabytes of local, SSD-based NVMe storage. That's probably the most interesting aspect for me. Does anyone know how that's provisioned? I.e., eight volumes of just under 2 TB each, or something else?

I'm assuming rr is only unavailable for multithreaded apps? How frequently is rr available for your use?

rr works fine on multithreaded (and multiprocess) applications. It does emulate a single core machine though, so depending on your workload and how much parallelism your application actually has it might be painful.
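A back-of-the-envelope way to see why the amount of parallelism matters, assuming rr's own recording overhead is ignored: if every thread is serialized onto one core, wall-clock time can't drop below the total CPU time the threads consume. The helper and its inputs are illustrative, not something rr reports.

```python
def serialized_slowdown(native_wall_s: float, total_cpu_s: float) -> float:
    """Lower-bound slowdown from running all threads on a single core.

    native_wall_s: wall-clock time of a normal, parallel run.
    total_cpu_s:   CPU time summed across all threads of that run.
    rr's own recording overhead comes on top of this and is not modelled.
    """
    if native_wall_s <= 0:
        raise ValueError("native_wall_s must be positive")
    serialized_wall_s = max(native_wall_s, total_cpu_s)
    return serialized_wall_s / native_wall_s
```

A mostly single-threaded program (CPU time roughly equal to wall time) gets a factor of about 1; one that keeps four cores busy (40 s of CPU in 10 s of wall time) gets at least 4×.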

I have two use cases:

- General performance analysis. For this, more counters are generally incrementally better.
- Running rr. This requires the retired-branch counter to be available (and accurate; sometimes virtualization messes that up).

The second one I actually care more about, because I've pretty much stopped trying to debug software when rr is not available; it's too painful ;). Feel free to email me (email is in my profile) for gory details.
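A crude first check on Linux, as a sketch: when the kernel exposes a hardware PMU, the core PMU's generic events usually appear under sysfs. The paths below are assumptions (`cpu` on most x86 systems, `cpu_core` on hybrid Intel parts), and `branch-instructions` counts all retired branches rather than the conditional-branch event rr programs. So a `True` only says some branch counter is visible, not that it is accurate under your hypervisor.

```python
from pathlib import Path

# Candidate sysfs directories for the core PMU's named events (assumed paths).
DEFAULT_PMU_DIRS = (
    "/sys/bus/event_source/devices/cpu/events",
    "/sys/bus/event_source/devices/cpu_core/events",
)


def branch_counter_visible(pmu_dirs=DEFAULT_PMU_DIRS) -> bool:
    """True if any listed PMU directory advertises a branch-instructions event."""
    return any((Path(d) / "branch-instructions").exists() for d in pmu_dirs)


if __name__ == "__main__":
    print("branch counter visible:", branch_counter_visible())
```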

Seconding paulie_a: we're running a Xen stack right now and I haven't heard of this. We've already worked around a few nasty bugs with Xen and Linux doms, but I'm wondering if we have the problem you're referring to and don't even know it.

For the benefit of anyone reading this, KVM and VMware virtualization generally work. Xen has problems because of a stupid Xen workaround for a stupid Intel hardware bug from a decade ago. I can provide more details about that via email (in my profile) if desired.

Can you please just post the info? Intel deserves to be shamed.

One of the things the performance monitoring unit (PMU) is capable of doing is triggering an interrupt (the PMI) when a counter overflows. Combined with the ability to write to the counters, this lets you program the PMU to interrupt after a certain number of counted events. Nehalem supposedly had a bug where the PMI fires not on overflow but whenever the counter is zero. Xen added a workaround that sets the value to 1 whenever it would otherwise be 0. Later this was observed on microarchitectures other than Nehalem, and Xen broadened the workaround to run on every x86 CPU. Intel never provided any help in narrowing it down, and there don't seem to be official errata for this behavior either.

This behavior is OK for statistically profiling frequent events, but if you depend on _exact_ counts (as rr does) or are profiling infrequent events, it can mess up your day. [https://lists.xen.org/archives/html/xen-devel/2017-07/msg022...](https://lists.xen.org/archives/html/xen-devel/2017-07/msg02242.html) goes a little deeper and has citations.
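A toy model of why that workaround breaks exact counts. It assumes a 48-bit counter (a common width on recent Intel cores; the real width is reported by CPUID) and models the workaround only as "a written 0 becomes 1"; the actual Xen code path is more involved. To get an interrupt after N events, you preload the counter with -N modulo 2^48 and let it wrap to zero.

```python
COUNTER_BITS = 48                  # assumed counter width
MASK = (1 << COUNTER_BITS) - 1


def preload_for(n_events: int) -> int:
    """Counter value that overflows (wraps to 0) after exactly n_events."""
    return (-n_events) & MASK


def events_until_overflow(start: int) -> int:
    """Increments needed before a counter starting at `start` wraps to 0."""
    return (1 << COUNTER_BITS) - start


def xen_style_write(value: int) -> int:
    """Simplified workaround: a value of 0 is replaced with 1 before reaching the counter."""
    return 1 if value == 0 else value


def read_after(written: int, events: int, workaround: bool) -> int:
    """Counter reading after `events` increments, with or without the workaround."""
    start = xen_style_write(written) if workaround else written
    return (start + events) & MASK
```

Starting from zero, the guest reads 1000 after 1000 branches natively but 1001 with the workaround: harmless for sampling, but a real problem for a tool like rr that relies on exact branch counts.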


Hi Tibor, I forgot to mention: another useful thing to test would be to stop and start the EC2 instance. This should move the instance to a less busy host; it's just a trick that usually works.
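If you want to script that stop/start cycle, here is a minimal sketch using the AWS SDK for Python (boto3); the client is passed in so the function can be exercised without credentials. Caveats worth knowing first: a stop/start usually, but not always, lands on different hardware; data on instance-store volumes (such as the i3 NVMe drives) does not survive a stop; and the public IPv4 address changes unless you use an Elastic IP. The instance ID is a placeholder.

```python
def stop_then_start(ec2, instance_id: str) -> None:
    """Stop an EC2 instance, wait until stopped, then start it and wait until running.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


# Usage (requires boto3 and credentials; the ID is a placeholder):
#   import boto3
#   stop_then_start(boto3.client("ec2"), "i-0123456789abcdef0")
```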

I think many people, but not you obviously, have a naive view of AWS hardware provisioning. They think that if they suddenly deploy 50 EC2 instances, Amazon will rush out, buy more servers, and have them installed to meet instant demand. Of course, when you think about it, Amazon relies on fairly high levels of usage to maximise profits, and having idle hardware lying around just in case someone wants it is a way to lose money. Hence the spot pricing scheme, which makes at least some money from idle hardware; they can get their hardware back for exclusive use at any time by simply bidding the price up!

The g instance type uses Graphics Processing Units (GPUs) to accelerate graphics-intensive workloads and is also designed to accelerate machine learning inference.

This could include adding metadata to an image, automated speech recognition, and language translation, as well as graphics workstations, video transcoding, and game streaming in the cloud.

g4ad.xlarge, with 8 GiB of GPU RAM, has an on-demand cost of $3315.9228 annually, which from my review appears to be the cheapest GPU VM option AWS provides. Is this correct, or is there a cheaper option?
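That annual figure works out to an hourly on-demand rate of $0.37853 multiplied by 8,760 hours (24 × 365). A tiny helper makes it easy to compare candidates the same way; plug in current prices for your own region, since on-demand rates vary by region and change over time, and the rate here is not a quote.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; ignores leap years


def annual_on_demand_cost(hourly_usd: float, hours: int = HOURS_PER_YEAR) -> float:
    """Cost of running one instance continuously for a year at an hourly rate."""
    return hourly_usd * hours


def cheapest(prices: dict[str, float]) -> str:
    """Name of the instance type with the lowest hourly price."""
    return min(prices, key=prices.get)


if __name__ == "__main__":
    print(f"${annual_on_demand_cost(0.37853):,.4f}")
```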