Woodridge Tech Talk

Our ongoing series of bi-weekly technical presentations

Implementing High Scalability

High Availability Mobile Applications with Amazon Web Services

Over the past few years, as AWS has become the de facto standard for Woodridge Software’s back-end systems, we have deployed dozens of mobile applications. Most of these applications are mission-critical to our customers; however, the total number of users varies widely. Some of our apps have only 10 or so users, while others have 100,000+ per day. In this blog, we’ll discuss some of the similarities in the deployments of small- and large-scale applications and the strategies we use to scale to our larger implementations.

Goals for Implementing Large Scale Mobile and Web Apps

Database Scalability

Database scalability is the ability to consistently handle the growth of data that an application will house during its lifetime. You can host any relational database on Amazon, such as MySQL or PostgreSQL, or use NoSQL databases such as MongoDB. Amazon also has its own versions of both: Aurora is its relational database, and DynamoDB is its NoSQL database. AWS offers managed hosting with scaling features for most popular relational databases through a service called RDS (Relational Database Service). This helps you scale your database; however, we have found that Aurora is often the easiest to implement. The same is true for DynamoDB, which AWS designs for predictable throughput and single-digit millisecond latency. The choice between a NoSQL and a SQL database depends, of course, on the data being stored for the mobile app.

Minimize System Bottlenecks

System bottlenecks occur when several resources all have to rely on one resource. If that one resource becomes overwhelmed with requests then it can cause a severe decrease in system performance or even a loss of availability.
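
To make the idea concrete, here is a minimal sketch. It assumes every request has to pass through each tier in turn, and the tier names and capacities are invented for illustration, not measurements from one of our deployments.

```python
# Hypothetical per-tier capacities in requests per second (illustrative only).
tier_capacity_rps = {
    "load_balancer": 5000,
    "app_servers": 1200,
    "database": 300,
}


def find_bottleneck(capacities):
    """Return (tier_name, capacity) for the slowest tier.

    When every request must pass through every tier, the slowest tier
    caps the throughput of the whole system.
    """
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


bottleneck, max_rps = find_bottleneck(tier_capacity_rps)
print(f"Bottleneck: {bottleneck}; system throughput is capped at {max_rps} req/s")
```

In this made-up example, adding more application servers would not help until the database tier is scaled as well.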

Fault Tolerance and Availability

Fault tolerance allows a system to keep operating normally (or close to normally) in the case of a failure. Hardware failure, power loss, and physical damage caused by natural disasters are some of the most common non-software reasons for system failure. For a system to be fault-tolerant, it must be able to continue functioning after the complete loss of a resource.
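
A rough way to see why redundancy matters is to do the availability arithmetic. The sketch below assumes that failures are independent, which real systems only approximate, and the 99% figure is a made-up example rather than a number from any provider.

```python
def series_availability(availabilities):
    """Availability when every component is required (a chain of dependencies)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def redundant_availability(single, copies):
    """Availability when at least one of `copies` independent replicas must be up."""
    return 1.0 - (1.0 - single) ** copies


single = 0.99  # hypothetical availability of one machine

# An app server that depends on one database server: both must be up.
print(f"app + db in series:  {series_availability([single, single]):.4%}")

# The same machine replicated: only one replica has to survive.
for copies in (1, 2, 3):
    print(f"{copies} replica(s):        {redundant_availability(single, copies):.6%}")
```

Chaining dependencies lowers availability, while replicating a resource raises it quickly, provided the replicas do not share a common point of failure such as a single data center.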

Ease of Maintenance

Ease of maintenance is often lost when providing a highly scalable and reliable system, due to an increase in the number of machines, resources that require specific attention, a more complicated codebase, and possibly having to build and maintain dedicated server farms.

Cost

Cost is often hard to balance when trying to provide a reliable web system because it is difficult to predict how many resources are needed, and you often end up paying for far more capacity than you actually use.
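
The gap between provisioning for peak traffic and paying only for what is used can be estimated with simple arithmetic. The sketch below is illustrative only: the hourly price, the per-instance capacity, and the traffic profile are all hypothetical values, not real AWS pricing.

```python
import math

HOURLY_PRICE = 0.10          # hypothetical cost per instance-hour, in dollars
REQUESTS_PER_INSTANCE = 500  # hypothetical requests/sec one instance can serve

# Hypothetical traffic for one day: one requests/sec value per hour (24 values).
hourly_demand_rps = [200] * 8 + [1500] * 4 + [4000] * 2 + [1500] * 6 + [200] * 4


def instances_needed(rps):
    """Smallest number of instances that can serve `rps` (never fewer than one)."""
    return max(1, math.ceil(rps / REQUESTS_PER_INSTANCE))


def fixed_cost(demand):
    """Cost when the fleet is sized for the peak hour and never changes."""
    peak = max(instances_needed(rps) for rps in demand)
    return peak * len(demand) * HOURLY_PRICE


def scaled_cost(demand):
    """Cost when the fleet is resized every hour to match demand."""
    return sum(instances_needed(rps) for rps in demand) * HOURLY_PRICE


print(f"Sized for peak:  ${fixed_cost(hourly_demand_rps):.2f} per day")
print(f"Scaled hourly:   ${scaled_cost(hourly_demand_rps):.2f} per day")
```

The more the traffic varies over the day, the larger the difference between the two figures becomes.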

Approaches

  1. Single machine dedicated to the entire system

    Benefits

    • Cost

    Drawbacks

    • The entire system is a system bottleneck and single point of failure
    • System maintenance can cause a temporary loss of availability
    • Backups kept on this machine are useless if there is a hardware failure that cannot be recovered from
    • Backups stored on separate machines must be kept safe in order to guarantee no loss of data (and should preferably be kept in a separate location to mitigate the risk of physical damage and complete loss of data)

    Recommended Usage

    • Small-scale applications that can afford to experience system downtime

  2. Machines dedicated to single tasks (application server, database server, etc.)

    Benefits

    • Database-only operations won’t impact application-only operations (and vice versa)

    Drawbacks

    • Each machine now becomes a bottleneck, and loss of one machine could mean a complete loss of availability
    • If all machines are in one location, a power outage or physical damage can lead to a complete loss of availability or even a complete loss of the system
    • Maintenance to a single machine could lead to a loss of availability

    Recommended Usage

    • Small- to medium-scale applications that utilize database-only or application-only operations and can afford occasional system downtime

  3. Groups of machines dedicated to single tasks

    Benefits

    • Database-only operations won’t impact application-only operations (and vice versa)

    Drawbacks

    • Harder to maintain and higher cost
    • If the machines are in the same location, there is still a possibility of a loss of availability during a natural disaster or power loss. A natural disaster could also cause a loss of the entire system. This can be mitigated by having servers located in multiple data centers in the US and/or globally
    • If database sharding is performed (splitting the database into independent chunks that live on different machines), backups become harder to maintain, and deciding how to divide the database properly is more challenging. There also need to be backup machines in place in case one machine fails, and adding new machines as the data grows is more cumbersome to implement
    • Possible increase in the codebase complexity
    • While these drawbacks exist, this approach is necessary for high-availability, high-scalability deployments. Fortunately, there are systems and processes in place to assist with managing it

    Recommended Usage

    • Large-scale systems that require high-availability and maximum fault-tolerance
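
The sharding drawback listed above, that adding machines as the data grows is cumbersome, can be seen in a small experiment. This is a minimal sketch of one simple scheme (hash the key, then take the remainder by the number of shards); the `shard_for` function and the `user-N` keys are hypothetical, and real deployments may use a different partitioning strategy.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Made-up keys standing in for rows in a user table.
user_ids = [f"user-{i}" for i in range(10_000)]

# Count how many keys land on a different shard after adding a fifth machine.
moved = sum(1 for u in user_ids if shard_for(u, 4) != shard_for(u, 5))
print(f"{moved / len(user_ids):.0%} of keys change shards when going from 4 to 5 shards")
```

With this naive scheme most rows have to be moved whenever a shard is added, which is one reason the way the database is divided deserves careful thought up front.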

Best Practices

Best practices for large-scale cloud applications draw on many AWS (Amazon Web Services), Google, and Apple services to build a reliable, scalable web system.

  • Amazon EC2 – Resizable virtual machines, most commonly used for web systems and applications, that can be placed into multiple availability zones to provide high availability and fault tolerance even in the extreme case of an entire Amazon availability zone going down.
  • Amazon Elastic Load Balancing – Automatically distributes network traffic to healthy, available EC2 instances. The load balancer itself scales automatically to accommodate the amount of network traffic and prevent system bottlenecks.
  • Amazon Auto Scaling – Can be used to automatically increase or decrease the number of EC2 instances in the system to prevent system bottlenecks and maintain high availability and fault tolerance. This also eases maintenance because single machines can be added, removed, or upgraded without any loss of availability.
  • Amazon S3 – File (object) storage that scales automatically and stores data redundantly across multiple availability zones to provide fault tolerance and high availability.
  • Amazon Aurora – A high-performance database engine that provides automatic scaling and backups, with data replicated across multiple availability zones for fault tolerance and high availability. This is a great choice if your mobile app is best suited to a relational database.
  • DynamoDB – If a NoSQL back end is a better option, Amazon’s own NoSQL database, DynamoDB, is a great choice. With single-digit millisecond latency, it’s a good option for mobile, gaming, and IoT applications.
  • Google and Apple – Mobile applications are distributed directly through Google’s and Apple’s app stores, which provide high availability and make it easy to distribute the application to millions of users.
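
To illustrate what "distributing traffic to healthy instances" means, here is a minimal sketch of a round-robin balancer that skips instances marked unhealthy. It shows the general idea only and is not how Elastic Load Balancing is implemented; the `RoundRobinBalancer` class and the instance names are hypothetical.

```python
class RoundRobinBalancer:
    """Toy load balancer: rotate through instances, skipping unhealthy ones."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._next = 0  # index of the next instance to try

    def mark_unhealthy(self, instance):
        """Stop sending traffic to an instance (e.g. after a failed health check)."""
        self.healthy.discard(instance)

    def mark_healthy(self, instance):
        """Resume sending traffic to a known instance."""
        if instance in self.instances:
            self.healthy.add(instance)

    def pick(self):
        """Return the next healthy instance, or raise if none are available."""
        for _ in range(len(self.instances)):
            candidate = self.instances[self._next]
            self._next = (self._next + 1) % len(self.instances)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


# Hypothetical instance names for illustration.
lb = RoundRobinBalancer(["i-a", "i-b", "i-c"])
lb.mark_unhealthy("i-b")
picks = [lb.pick() for _ in range(4)]
print(picks)
```

Requests keep flowing to the remaining instances while the failed one is out of rotation, which is what lets a single machine be replaced without a loss of availability.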

Utilizing these services and utilities also has other benefits, including additional security mechanisms and no need to maintain physical machines. Amazon’s, Apple’s, and Google’s facilities are all safe and secure, with top-of-the-line disaster-recovery mechanisms in place, including physical security, power backups, etc. The automatic scaling provided by Amazon also helps ensure that the system only includes the resources it needs, which helps keep costs down.
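
The scaling decision itself can be sketched as a small calculation. The function below assumes a simple proportional rule: size the fleet so that average CPU returns to a target value, within minimum and maximum bounds. It is an illustration of the concept, not Amazon's actual algorithm, and the function name, parameters, and numbers are all hypothetical.

```python
import math


def desired_capacity(current_instances, avg_cpu_percent, target_cpu_percent,
                     min_instances=1, max_instances=10):
    """Estimate how many instances bring average CPU back to the target.

    Assumes load spreads evenly, so total work is roughly
    current_instances * avg_cpu_percent (an assumption, not a guarantee).
    """
    raw = math.ceil(current_instances * avg_cpu_percent / target_cpu_percent)
    # Clamp to the configured fleet limits.
    return max(min_instances, min(max_instances, raw))


print(desired_capacity(4, 90, 60))   # busy fleet: scale out
print(desired_capacity(4, 15, 60))   # idle fleet: scale in
print(desired_capacity(4, 300, 60))  # extreme load: capped at max_instances
```

The upper bound is what keeps a traffic spike, or a bug that looks like one, from running up an unbounded bill.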

Closing Remarks

This blog is just a high-level overview of the systems we put in place, but hopefully, it takes some of the mystery out of what Amazon is delivering. Those of us in the software development industry know well how Amazon is helping make our lives easier. Many of these services we used to have to set up manually, with countless cron jobs and monitoring. In the custom software world we work in, this used to mean more cost for our customers and lower reliability. By leveraging these services we can reduce our costs and keep our mobile apps humming along. Now if only Amazon had a way to update mobile code for each new iOS & Android release, that would be something… Then again we don’t want our business to be too easy.