Breakbeat Technology

AWS EC2 T2 Instances Demystified: Don’t Learn The Hard Way

Summary

In my time as a freelancer I’ve come across a number of clients using T2 instances for their infrastructure requirements.

In my experience, these instances often seem to be chosen based largely on their low price compared to other instance types and are often poorly understood.

While T2 instances can offer great value, they come with a number of advantages and disadvantages that must be considered (and understood) before choosing them for your infrastructure.

Let’s examine what T2 instances are.

What Are T2 Instances?

EC2 T2 instances are CPU “burstable” virtual machine instances offered by AWS. This is in contrast to fixed-performance instance types, which provide a constant level of CPU performance.

These instances offer baseline CPU performance along with CPU Credits that can be used to “burst” above this baseline performance when required.

The baseline CPU performance, maximum amount of CPU credits that can be earned, and the rate at which these CPU credits are earned are all based on the size of the T2 instance in question.

In the following sections, we’ll use a t2.micro as an example of these important concepts.

What is a CPU Credit?

A (single) CPU credit allows 1 vCPU to operate at 100% usage for 1 minute.

It’s important to remember this for some of the math in the following sections.

When are CPU Credits Used?

Anytime your instance uses any amount of CPU, for any reason, CPU credits will be used.

Yes, this means even if the CPU usage is below the baseline performance rate, you will use CPU credits.

To understand this (already confusing) subject, let’s look at how AWS calculates the use and earning of CPU credits.

How is CPU Credit usage calculated?

Remember from above that a single CPU Credit allows 1 vCPU to run at 100% usage for one minute.

This means that 10% CPU usage (the baseline performance of a t2.micro instance) for 1 minute would use 1/10th (0.1) of a CPU credit. For an hour that would be 6 credits (the number of credits a t2.micro earns each hour).

20% CPU usage for a minute would be 1/5th (0.2) of a CPU credit and so forth.
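To make the arithmetic above concrete, here is a minimal Python sketch of the usage rule. The function name `credits_used` and its parameters are my own for illustration (this is not an AWS API); it simply applies “1 credit = 1 vCPU at 100% for 1 minute”.

```python
def credits_used(cpu_percent, minutes, vcpus=1):
    """Credits consumed by `vcpus` vCPUs running at `cpu_percent` for `minutes`.

    Illustrative helper only: one credit is one vCPU at 100% for one minute.
    """
    return vcpus * (cpu_percent / 100) * minutes


# The three cases from the text above.
print(credits_used(10, 1))    # one minute at the t2.micro baseline
print(credits_used(10, 60))   # one hour at the baseline
print(credits_used(20, 1))    # one minute at 20%
```

The `vcpus` argument is there because the definition is per vCPU; for the single-vCPU t2.micro used in this article it stays at 1.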

How are CPU credits accrued?

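CPU Credits are ALWAYS earned if your instance is on but they are only accrued whenever your T2 instance is utilizing less than the baseline performance of your instance. In other words, credits arrive at a fixed hourly rate no matter what, but your balance only grows when you spend fewer credits than you earn.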

Continuing our t2.micro example, assuming your server uses no CPU for an hour (An unlikely scenario but this is an example), you would earn 6 CPU credits.

You could continue to earn these credits until you held 144, the maximum amount that can be held for the micro instance type.

The 6 CPU credits earned every hour allow the t2.micro instance to indefinitely maintain its baseline CPU performance.

0.1 (10% baseline CPU performance) * 60 (minutes in an hour) = 6 (The number of CPU credits earned in an hour)
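The same formula can be written as a small Python sketch. The helper names below are hypothetical and the numbers (10% baseline, 144 credit maximum) are the t2.micro figures used in this article. It also shows a consequence of those figures: an idle t2.micro starting from zero needs 24 hours to reach its maximum balance.

```python
def credits_earned_per_hour(baseline_percent, vcpus=1):
    """Hourly earn rate: the credits needed to run at baseline for 60 minutes."""
    return vcpus * (baseline_percent / 100) * 60


def idle_hours_to_max(max_balance, baseline_percent, start_balance=0.0):
    """Hours a completely idle instance needs to reach its maximum balance."""
    return (max_balance - start_balance) / credits_earned_per_hour(baseline_percent)


print(credits_earned_per_hour(10))    # t2.micro earn rate per hour
print(idle_hours_to_max(144, 10))     # idle hours from an empty balance to the cap
```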

Let’s finally look at a basic calculation of the usage and earning over a multi-hour period.

An Example Calculation

Let’s do a basic illustration that can show both the disadvantages and advantages of this instance type.

Note: This is not looking at other metrics of server performance. We are simply looking at how CPU credits are used and calculated.

Assumptions:

  • The t2.micro instance has 138 CPU credits accrued.
  • The CPU usage will be consistent. This does not happen in reality but does help us illustrate the concepts easily.
  • You have a t2 instance hosting a really awesome website about cats.

Hour 1

In the first hour your micro instance has no traffic at all and uses 0% of the CPU for the entire hour (maybe the server was taking a nap?). Since no CPU time was used, you’ll keep all 6 CPU credits you earned.

138 (CPU credits you already accrued) + 6 (The credits you earned over the hour) = 144 CPU Credits Accrued

You now have the maximum amount of CPU credits you can accrue for a t2.micro so even if you continue to have CPU usage below the baseline performance, you’ll no longer earn credits above this amount.

Hour 2

In the second hour your micro instance maintains 10% CPU usage (the baseline performance). Since it earns enough CPU credits to maintain this rate, no credits are burned from the 144 you already have and no credits are accrued (because you used the same amount of credits that you earned and because you already have the maximum allowable amount).
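Hours 1 and 2 both come down to one update rule: add what you earned, subtract what you used, and never go above the maximum (or below zero). A minimal sketch of that rule, assuming steady usage for the whole hour as in this example; `next_balance` is a made-up name and the defaults are the t2.micro numbers from this article:

```python
def next_balance(balance, cpu_percent, earn_per_hour=6.0, max_balance=144.0):
    """Credit balance after one hour at a steady cpu_percent on one vCPU."""
    used = cpu_percent / 100 * 60              # 1 credit = 1 vCPU-minute at 100%
    balance = balance + earn_per_hour - used
    return max(0.0, min(max_balance, balance))  # clamp to the range [0, max]


after_hour_1 = next_balance(138, 0)             # idle hour: reaches the cap
after_hour_2 = next_balance(after_hour_1, 10)   # baseline hour: no change
print(after_hour_1, after_hour_2)
```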

Hour 3

In the third hour you get featured on a popular site and you’re getting hammered with traffic, causing 100% CPU usage for the entire hour (Yay! Traffic!).

1.0 (100% CPU usage) * 60 (number of minutes in the hour) = 60 credits used
144 (Accrued CPU credits) – 60 = 84 credits remaining

But wait, there’s more!

Since you always earn credits, you would actually have more than 84.

84 (Number of remaining CPU credits) + 6 (The amount you earn in an hour) = 90 credits remaining

Assuming there were no other issues with the server and that everything was fine, your website likely continued to work without issue.
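Hour 3 cost a net 54 credits (60 used, 6 earned). The same reasoning tells you how long a given balance can sustain a burst. Here is a rough sketch, assuming steady usage on a single vCPU and the t2.micro’s 10% baseline; `burst_minutes` is an illustrative name, not something AWS provides. With these numbers, a full balance of 144 credits lasts about 160 minutes at 100% CPU.

```python
def burst_minutes(balance, cpu_percent, baseline_percent=10):
    """Minutes a credit balance can sustain cpu_percent before reaching zero."""
    # Credits are spent at cpu_percent/100 per minute and earned at
    # baseline_percent/100 per minute, so only the difference drains the balance.
    net_drain_per_minute = (cpu_percent - baseline_percent) / 100
    if net_drain_per_minute <= 0:
        return float("inf")   # at or below baseline the balance never shrinks
    return balance / net_drain_per_minute


print(burst_minutes(144, 100))   # full balance at 100% CPU
print(burst_minutes(144, 10))    # at baseline the balance is never drained
```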

Hour 4

After Hour 3 everyone has had their fill of your really awesome cat website and your server sits largely unused again (Who can get enough cat pictures?!). The instance uses 5% of the CPU for the entire hour.

0.05 (5% CPU usage) * 60 = 3 (Credits used that hour)
90 (Accrued credits) + 6 (The number of credits you accrue an hour) – 3 (The number of credits you used that hour) = 93 credits remaining

Since 5% CPU usage is less than the 10% baseline CPU performance, the server accrues some CPU credits (a net gain of 3 for the hour).
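Recovering credits is much slower than spending them. As a rough sketch (the function name and defaults are assumptions based on the t2.micro numbers in this article), here is how long it would take to climb back to the maximum at a steady usage level. At 5% CPU, getting from 93 credits back to 144 takes about 17 hours.

```python
def hours_to_refill(balance, cpu_percent, baseline_percent=10, max_balance=144):
    """Hours of steady cpu_percent needed to climb back to max_balance."""
    net_gain_per_hour = (baseline_percent - cpu_percent) / 100 * 60
    if net_gain_per_hour <= 0:
        return float("inf")   # at or above baseline the balance never grows
    return (max_balance - balance) / net_gain_per_hour


print(hours_to_refill(93, 5))    # recovery time at Hour 4's usage level
print(hours_to_refill(93, 10))   # at baseline you never recover
```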

Hour 5

Apparently I was right and people really can’t get enough cat pictures. Your website experiences an increase in traffic causing 100% CPU usage for the entire hour.

1.0 (100% CPU usage) * 60 (Minutes in an hour) = 60 CPU credits used
93 (Accrued credits) – 60 (CPU credits used) + 6 (The CPU credits you earn in an hour) = 39 CPU credits remaining

As we can see above, our credits are really starting to dry up but thankfully the server is still holding up.

Better hope all that traffic dies down.

Hour 6 (uh oh!)

Unfortunately, people REALLY love those cat pictures and the traffic remains at the same level. Your instance continues to utilize 100% of its CPU for hour 6.

1.0 (100% CPU usage) * 60 (Minutes in an hour) = 60 CPU credits used
39 (Accrued credits) – 60 (CPU credits used) = OH NO! We don’t have any credits left!

About 40 minutes into Hour 6 your instance no longer has any credits remaining. From then on, you’re limited to the baseline performance of the instance, in this case 10% of the vCPU.

At this point your application grinds to a halt. Because it’s under such heavy load, the instance can’t earn credits as fast as it uses them, resulting in almost complete downtime until the traffic drops (since your website is no longer working properly, that’s going to happen pretty quickly).

You will likely find it is difficult or impossible to access the server to take any measures to solve the issue until the CPU credits have accrued.
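For anyone who wants to replay the whole six hours, here is a small minute-by-minute simulation in Python. It is a simplified model and not how AWS actually meters credits: the constants are the t2.micro numbers from this article, the function name is made up, and the balance is tracked in hundredths of a credit so the arithmetic stays exact. Counting the credits still being earned during Hour 6, the balance runs out during minute 44, which lines up with the “about 40 minutes” above.

```python
EARN_PER_MINUTE = 10      # hundredths of a credit: 6 credits/hour = 0.10/minute
MAX_BALANCE = 144 * 100   # t2.micro maximum, in hundredths of a credit


def simulate(start_credits, hourly_cpu_percent):
    """Replay steady hourly CPU levels (whole percentages) minute by minute.

    Returns the balance at the end of each hour and the (hour, minute)
    at which the balance first hit zero, or None if it never did.
    """
    balance = start_credits * 100
    end_of_hour = []
    exhausted_at = None
    for hour, cpu in enumerate(hourly_cpu_percent, start=1):
        for minute in range(1, 61):
            # One minute at cpu% costs cpu hundredths of a credit.
            balance += EARN_PER_MINUTE - cpu
            # Clamp: no credits above the cap, and once empty the instance
            # is throttled to baseline, so the balance stays at zero.
            balance = max(0, min(MAX_BALANCE, balance))
            if balance == 0 and exhausted_at is None:
                exhausted_at = (hour, minute)
        end_of_hour.append(balance / 100)
    return end_of_hour, exhausted_at


balances, exhausted_at = simulate(138, [0, 10, 100, 5, 100, 100])
print(balances)
print(exhausted_at)
```

The end-of-hour balances match the hand calculations above (144, 144, 90, 93, 39) before dropping to zero in Hour 6.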

So why is all this important?

The example above is contrived but it does illustrate the danger of not properly architecting your AWS system and monitoring it once it’s in place (especially if you’re using T2 instances).

I often see this issue with websites whose baseline traffic increases over long periods of time, with occasional spikes on top. If the instance type is not changed and sized appropriately, this eventually leads to all the accrued CPU credits being used up, followed by downtime and a significant amount of head scratching.
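On the monitoring side, one simple check is to watch the trend of the credit balance (CloudWatch exposes it as the CPUCreditBalance metric) and estimate when it would reach zero. The sketch below assumes you already have a list of hourly balance readings; the function name and the straight-line projection are my own simplification, not an AWS feature.

```python
from typing import Optional, Sequence


def hours_until_empty(hourly_balances: Sequence[float]) -> Optional[float]:
    """Project when the credit balance reaches zero from hourly readings.

    Uses the average change per hour between the first and last reading.
    Returns None if there are too few readings or the balance is not falling.
    """
    if len(hourly_balances) < 2:
        return None
    slope = (hourly_balances[-1] - hourly_balances[0]) / (len(hourly_balances) - 1)
    if slope >= 0:
        return None
    return hourly_balances[-1] / -slope


print(hours_until_empty([144, 90, 36]))   # steadily draining balance
print(hours_until_empty([90, 93]))        # recovering balance, nothing to project
```

A projection of only a few hours is a good prompt to look at resizing the instance before the credits run out.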

A Note On T2 Unlimited

Amazon (somewhat) recently added a new feature to T2 instances called T2 Unlimited.

T2 Unlimited instances can burst over baseline performance as long as required at an additional cost.

I’ll likely be writing an article explaining T2 Unlimited concepts in the near future so stay tuned!

Thanks go to rosege over at Hacker News for the suggestion!

Closing

Hopefully you walk away with a slightly better understanding of T2 instances and how using them can affect your infrastructure if they are not properly managed.

If you have any questions or comments, please don’t hesitate to mention them below. Thanks for reading!

Till next time.


6 comments

  1. Dan Farrell
    April 26, 2018 at 9:23 am

    Good coverage of T2 CPU characteristics. I encourage readers to carefully analyze their disk and especially network performance requirements as well, before choosing t2 instances. Network performance in particular is an important factor to consider!

    1. Robert Tisdale
      April 26, 2018 at 1:10 pm

      I’m glad you enjoyed the article Dan. Excellent advice regarding checking other performance characteristics. Disk IO in particular can be a massive bottleneck for certain applications.

  2. Supun Budhajeewa
    April 27, 2018 at 5:19 am

    Any idea whether this is the same behavior in comparable compute instances in GCP as well?

    1. Robert Tisdale
      April 27, 2018 at 12:03 pm

      Howdy Supun! I can’t say with absolute certainty but while they may be similar, it’s likely they are handled in a different manner.

      I would suggest consulting the relevant GCP documentation to find a better answer.

  3. Al Chou
    April 27, 2018 at 9:17 am

    I’m confused by the Hour 3 assertion that 6 CPU credits would be added to the instance’s balance, given “CPU Credits are ALWAYS earned if your instance is on but they are only accrued whenever your T2 instance is utilizing less than the baseline performance of your instance.”

    1. Robert Tisdale
      April 27, 2018 at 12:11 pm

      Howdy Al! I’ve had a few questions on this on Reddit as well.

      The math for hour 3 works out in this way:

      You already have 144 credits accrued.

      In the math we used some of our existing 144 credits to pay for the above-baseline usage.

      During this hour of time, we would still earn our hourly credits but would end up with less than we originally started with since we used far more than the baseline performance of the instance.

      Hopefully this helps explain the concept in more detail :).
