Four Scalability Obstacles on AWS - First Cloud Consulting

With Amazon Web Services’ announcement yesterday that CloudFront now supports POST, PUT, DELETE, OPTIONS, and PATCH requests, and a GIGAOM post, “Five features Amazon Web Services must fix”, I was prompted to complete an idea I had noted a couple of months back. During the transition from traditional local or hosted data center infrastructures to the cloud, I find that companies, especially larger ones, are slow to re-architect and rebuild their applications for the cloud. As a result, they often cannot take advantage of many of the benefits the cloud provides without introducing single points of failure into their cloud architecture. Often these same weaknesses exist in traditional infrastructure architectures, where we accept them because we perceive the risk to be low. In the cloud that perception is a misconception: instances are volatile, and it is best to architect your infrastructure on the premise that any instance will fail at one time or another, so the risk is higher in the cloud.

This post is my attempt to identify four specific issues that I believe AWS should prioritize in order to address some of these scalability obstacles. One could argue whether they relate directly to platform scalability, but I believe these issues become more relevant as you scale your infrastructure, so I have categorized them as such:

1. Global Load Balancing – The current Elastic Load Balancer (ELB) implementation allows one to assign multiple instances within a single region to an ELB, even when they are distributed across multiple Availability Zones. The risk here is that, although each zone is an isolated facility, all of a region's zones sit in the same geographic area. So, if a hurricane takes out the US-East region in Virginia, your applications are down, and restoration into another region will be much more complex without great foresight in your Disaster Recovery planning.

Realistically, you should plan for some sort of failover, perhaps a largely manual process in this case, and you could use asynchronous data transfer to keep data roughly in sync across regions. Here, however, we are talking about off-site redundancy for an extreme (yet very plausible) scenario rather than actual global load balancing or scalability.

The fact is, your ELB is likely an EC2 instance (my supposition) and will physically reside in a single region anyway, but I would like to see a service that implements a “global ELB”: two or more ELBs in multiple regions that present themselves to the end user as a single device with a single endpoint. The handshaking and failover would happen entirely behind the scenes, opaque to the end user.
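To make the “global ELB” idea concrete, here is a minimal sketch of the selection logic such a service might run behind its single endpoint. It is only an illustration under stated assumptions: the hostnames are hypothetical, the health data is simulated rather than probed, and nothing here reflects how AWS actually implements load balancing.

```python
from typing import Callable, Optional, Sequence


def pick_endpoint(
    regional_endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint that passes its health check, in priority order."""
    for endpoint in regional_endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None  # every region is down


# Hypothetical regional ELB hostnames, listed in order of preference.
ENDPOINTS = [
    "elb-us-east.example.com",
    "elb-us-west.example.com",
]

# Simulated health state standing in for real health probes.
simulated_health = {
    "elb-us-east.example.com": False,  # pretend US-East is unreachable
    "elb-us-west.example.com": True,
}

chosen = pick_endpoint(ENDPOINTS, lambda endpoint: simulated_health[endpoint])
print(chosen)
```

With US-East marked unhealthy, the sketch falls through to the US-West endpoint; the end user would only ever see the single front-door name.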

2. Multiple SSL Certificates – I have a number of complaints regarding the SSL implementation and capabilities (or lack thereof) in EC2, but one stands out: multiple SSL certificates. I manage websites, for example, that require multiple SSL certificates on a single Elastic IP (EIP). Sometimes the domains are not known up front (we add front-ends that use the same back-end application, as with a web-based Content Management System), so we have to add SSL certificates to the server as we go. Because the domains vary, the virtual hosts cannot all listen on port 443 of the same address (a common problem that is not limited to the cloud). We work around this by configuring each additional virtual host to listen on a separate port, creating a new ELB that listens on port 443, and attaching the EC2 instance to that ELB, which then forwards requests to the alternate port (8443, for example). As a side note, you can indeed attach an EC2 instance to multiple ELBs via the Command Line Interface (CLI), something that is not exposed within the console. At any rate, this works very well but is tedious; it could certainly be automated with a little scripting and exposed via the AWS Console.
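The “little bit of scripting” could start with something like the sketch below, which only plans the port assignments for the workaround described above: one ELB per domain listening on 443, each forwarding to its own alternate port on the shared instance. The function name, the domains, and the dictionary layout are all hypothetical, and the sketch makes no AWS calls; creating the ELBs and attaching the instance would still have to be done with the CLI or an SDK.

```python
from typing import Dict, List


def plan_listeners(domains: List[str], first_port: int = 8443) -> Dict[str, dict]:
    """Assign each domain its own back-end port and describe the ELB in front of it."""
    plan = {}
    for offset, domain in enumerate(domains):
        plan[domain] = {
            "elb_port": 443,                       # every ELB listens on standard HTTPS
            "instance_port": first_port + offset,  # the virtual host's alternate port
        }
    return plan


# Hypothetical domains sharing one EC2 instance.
plan = plan_listeners(["shop.example.com", "blog.example.com"])
for domain, listener in plan.items():
    print(domain, listener["elb_port"], "->", listener["instance_port"])
```

Keeping the plan as plain data makes it easy to review before anything is created, and to feed into whatever tooling actually provisions the load balancers.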

3. Shared EBS Volumes – This was also noted as the number one feature that “Amazon Web Services must fix” in the previously mentioned post, so I won’t elaborate too much. Currently, the only workaround that I am familiar with is to run a single EC2 instance as an NFS server, which, besides being a single point of failure, creates a major performance bottleneck because its IOPS are shared among the NFS clients as you add them. And what about when your data exceeds 1 TB (the maximum size of an EBS volume), forcing you to combine multiple EBS volumes using LVM with a filesystem such as XFS on top? S3 would be a good alternative, but as object storage it lacks the performance of EBS, and we currently cannot create a filesystem on top of S3 (as you might with a SAN), so it doesn’t work as a mount point for most configurations. (There is an Open Source project, s3fs, that does this to a limited extent; while useful, it has its own limitations, which I will not delve into in this post.)
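As a rough illustration of how quickly the 1 TB ceiling forces you into multi-volume setups, the sketch below computes how many volumes a given data set would need. It assumes the 1 TB limit quoted above (treated here as 1024 GiB) and ignores filesystem and LVM overhead; the function name and sample size are made up for the example.

```python
import math

EBS_MAX_GIB = 1024  # the 1 TB per-volume limit quoted in this post (assumed 1024 GiB)


def volumes_needed(data_gib: int, max_volume_gib: int = EBS_MAX_GIB) -> int:
    """Smallest number of volumes whose combined capacity holds data_gib."""
    return max(1, math.ceil(data_gib / max_volume_gib))


print(volumes_needed(2560))  # 2.5 TB of data
```

For 2.5 TB of data this comes out to three volumes, each of which has to be provisioned, attached, and added to the volume group.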

4. Multiple sub-domain support for statically hosted websites on S3 buckets – This item comes specifically from my experience with Django web applications. Current storage back-ends (s3boto or something similar) allow configuration of a single S3 bucket but not multiple buckets. I prefer a decentralized approach in which various static assets (images, video, CSS, JavaScript) are served from separate sub-domains so that they can be isolated and relocated to other services if the need arises. Furthermore, you might have a mix of sub-domains within those resources depending on their use. To implement this properly you would need multiple sub-domains pointing to a single S3 bucket, but the current implementation requires that the bucket name exactly match the full host name of the DNS record pointing to it when the bucket is configured as a static web host.

Once again, we work around this by using a single sub-domain for the S3 bucket (e.g. static.mydomain.com) and then creating additional “dummy” S3 buckets such as videos.mydomain.com and images.mydomain.com that contain 0-byte objects with redirects (set in the object metadata) to the primary bucket. This lets our application work with the existing tools against a single S3 bucket while still referencing our files via multiple sub-domains, which could be re-pointed in the future.
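The dummy-bucket workaround can be sketched as follows. The code only builds descriptions of the 0-byte redirect objects and uploads nothing; the object keys are hypothetical, and the dictionary keys are chosen to mirror the parameter names of boto3's `put_object` (an assumption about how you might upload them, not a requirement of the approach). S3 stores the redirect target as the `x-amz-website-redirect-location` metadata on each object.

```python
from typing import Dict, List


def redirect_objects(
    primary_host: str,
    alias_buckets: Dict[str, List[str]],
) -> List[dict]:
    """Describe one 0-byte redirect object per key in each alias bucket."""
    objects = []
    for bucket, keys in alias_buckets.items():
        for key in keys:
            objects.append({
                "Bucket": bucket,
                "Key": key,
                "Body": b"",  # 0-byte placeholder object
                # Target served by the primary bucket's website endpoint.
                "WebsiteRedirectLocation": f"http://{primary_host}/{key}",
            })
    return objects


# Hypothetical object keys; the bucket names follow the examples in this post.
specs = redirect_objects(
    "static.mydomain.com",
    {
        "videos.mydomain.com": ["videos/intro.mp4"],
        "images.mydomain.com": ["images/logo.png"],
    },
)
for spec in specs:
    print(spec["Bucket"], spec["Key"], "->", spec["WebsiteRedirectLocation"])
```

Because every redirect points at the primary host, re-pointing a sub-domain later only means replacing its dummy bucket, not touching the application's storage configuration.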

Feedback

Feel free to leave comments or create a discussion around this one. I normally spend more time writing my posts but I wanted to get this one out and I would love to hear some creative solutions that others have come up with surrounding these issues, or suggestions as to issues which I have overlooked or perhaps just not encountered personally!