Scale your application globally - Introduction (Part 1 of Series)

25 May 2019 ·  10 min read

This is part one of a three-part series that shows you how to use Azure to distribute your application globally and ensure that your users get the best experience.

In part 1 of this series, we talk about how to scale an application and look at a demonstration application.

In part 2 of this series, we look at using Azure CDN to distribute static content.

In part 3 of this series, we look at using Azure Front Door to globally distribute an application back end hosted in dual Azure Regions.

If your application has users located around the globe, you need to take extra architectural steps to maintain great performance, whilst presenting a consistent and seamless presence for all.

In this article, we’re going to discuss some of the issues that you need to consider.

We’ll also introduce you to two key Azure services that may help you to address those global distribution challenges:-

  • Azure CDN - for distributing static content
  • Azure Front Door - for distributing and accelerating all aspects of your application/service.

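To demonstrate these services in action, we’re going to create a small demo application that comprises:-

  • a basic web client, built from static content
  • an HTTP-triggered endpoint, built with Azure Functions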

We’ll then:-

  • configure services that scale the application globally
  • show one way to create a test that demonstrates how the system reacts under a simulated load.

Local Scaling

Before we dive into global-scale considerations, let’s just review how our computing resources can scale at a single location - a single datacentre or “Azure Region”.

  • One requirement is that we need to be able to minimize our hosting commitments, in order to keep costs down for those times when we only have a small number of users.
  • We also need to be able to quickly and dynamically increase our computing resources in order to serve significantly more users, should demand change.

Smartphone users, in particular, don’t adhere to predictable hours of access. When combined with unpredictable and spiky consumption patterns that often stem from social media trends and other viral behaviours, resource demands could potentially shift dramatically in a very short period of time.

  • Vertical scaling - Back in the early days of the internet, provisioning more powerful servers used to be our first port of call to address the issue of increased computing demand. It is still something we can do, but in a majority of cases, this shouldn’t be seen as your primary solution. This is because:-
    • It cannot be instantly and seamlessly provisioned. It potentially requires a period of downtime as we move from one server to a more powerful one. Solutions such as Azure App Service have improved matters by making this relatively quick and easy, but there is still a change-over.
    • Vertical scaling has diminishing returns - an increasingly more powerful server does not yield proportional returns in performance, relative to its cost.
    • The scope of improvement is ultimately finite if you are using a single instance. It doesn’t matter how powerful a single machine can be, the internet can and will overwhelm it.
    • Vertical scaling should only really be considered if your service is of a specialist type that can’t be distributed across multiple systems (e.g. an API that processes an extremely complicated or large set of data, such as a simulation, which absolutely requires access to a large pool of CPU and memory resources). Even then, it’s likely that you will require a pool of machines to service a large audience.

  • Horizontal Scaling - For a vast majority of scenarios the best solution is to provision multiple separate servers, so as to distribute processing in a parallel, load-balanced, configuration.
    • This solution offers both the greatest outright performance potential and the best “bang for buck”.
    • By using a greater number of lower-specification, and therefore relatively less expensive, servers, costs can usually be kept even further down (when compared to using fewer, but higher-specification servers).
    • When combined with elastic/automatic scaling, additional servers can be brought online automatically, as demand dictates. When server load is reduced, we can automatically de-provision the extra servers, allowing money to be saved.
    • PaaS (Platform as a Service) offerings, such as Azure App Service, make these capabilities straightforward to configure.

  • Serverless architectures are an evolution of horizontal scaling.
    • When you choose the “consumption” pricing model, you are no longer invoiced per “instance of a server”. Instead, computing resources are provisioned completely dynamically and billed according to the actual time and resources used.
    • This architecture can be incredibly appealing as it offers the potential to make significant cost-savings, as we are no longer paying for dormant or underutilised servers.
    • Just like elastic-scaling, when demand rises, the cloud provider can keep allocating additional computing resource as needed.

N.B. Azure Functions, Azure’s serverless compute service, can also be hosted on a “traditional” service plan if your circumstances require this.
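To make the cost argument concrete, here is a minimal sketch that compares the instance-hours consumed by fixed provisioning (sized for the peak) with those consumed by elastic scaling. The hourly demand figures, the per-instance capacity and the function names are all hypothetical, chosen purely for illustration; real autoscale rules also react to metrics with some delay.

```javascript
// Hypothetical figures for illustration only: requests per second in each hour.
const hourlyDemand = [20, 15, 10, 10, 40, 120, 300, 90];
const capacityPerInstance = 50; // assumed requests/second that one instance can serve

// Instances needed to serve a given demand (always keep at least one running).
function instancesNeeded(demand, capacity) {
    return Math.max(1, Math.ceil(demand / capacity));
}

// Fixed provisioning: size for the peak hour and pay for that in every hour.
function fixedInstanceHours(demandSeries, capacity) {
    const peak = Math.max(...demandSeries);
    return instancesNeeded(peak, capacity) * demandSeries.length;
}

// Elastic scaling: pay only for the instances each hour actually needs.
function elasticInstanceHours(demandSeries, capacity) {
    return demandSeries
        .map(demand => instancesNeeded(demand, capacity))
        .reduce((sum, instances) => sum + instances, 0);
}

console.log(fixedInstanceHours(hourlyDemand, capacityPerInstance));   // 48
console.log(elasticInstanceHours(hourlyDemand, capacityPerInstance)); // 16
```

With these made-up numbers, sizing for the peak costs 48 instance-hours, whereas scaling with demand costs 16.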

Global distribution

We’ve just talked about how, with modern cloud services, we can easily scale a system, in one location, to meet demand. But how does this work with respect to serving a global audience?

As the physical distance between the data centre and your user(s) increases, there is an unavoidable rise in “latency” (not to be confused with “bandwidth”).

It doesn’t matter if you have unlimited computing resources in one location: basic physics (the speed of light limits signalling over very long distances) and the limitations of our technology (traffic needs to route across multiple networks to reach its destination) combine to slow down performance.

To give context to this discussion, I conducted a crude experiment to explore the impact upon latency as distance increased. I ran dozens of pings to a selection of different DNS servers around the world. DNS servers were chosen, as they should be inherently fast with low latency.

  • I’m based in the UK, so for my first test, I found a list of servers in London. My average ping across the sample was ~10ms.

  • Pinging a selection of servers in New York City, I averaged ~75ms.

  • To demonstrate communicating with somewhere on the opposite side of the globe, I then repeated the process down under, in Sydney, averaging a ping of ~260ms.

This shouldn’t be considered an authoritative result. Network performance is a transient thing and this was a brief one-off test. For example, I may have been crossing a part of the internet that had a largely sleeping population at the time!

Even so, this quick experiment does indicate that a global round-trip could incur an increase in latency in the region of 26x!
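As a rough sanity check on those figures, the physics-only lower bound can be estimated. The sketch below assumes that light travels through optical fibre at about 200,000 km/s (roughly two-thirds of its speed in a vacuum) and uses approximate great-circle distances from London. Both are approximations introduced here for illustration, and real cable routes are longer than the great-circle path.

```javascript
const FIBRE_SPEED_KM_PER_S = 200000; // approximate speed of light in optical fibre

// Minimum round-trip time in milliseconds, ignoring routing and processing delays.
function minRoundTripMs(distanceKm) {
    return (2 * distanceKm / FIBRE_SPEED_KM_PER_S) * 1000;
}

// Approximate great-circle distances from London (assumed values).
const distancesKm = { "New York": 5570, "Sydney": 17000 };

for (const [city, km] of Object.entries(distancesKm)) {
    console.log(`${city}: at least ~${minRoundTripMs(km).toFixed(0)} ms round trip`);
}
```

Under these assumptions the floor is roughly 56 ms for New York and 170 ms for Sydney, so the measured ~75ms and ~260ms are within a modest factor of the physical limit. No amount of extra computing power at a single location removes that floor.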

By this point, there’s a good chance that you’ve already guessed that part of the solution is to bring your application servers physically closer to your users.

You can address this by hosting multiple instances of your service in different, geographically distributed, data centres. Cloud providers have amazing global coverage and make it easy for you to do this.

So the proposition becomes:-

  • Q:   “how do we set up multiple geographic regions, with users automatically connecting to the fastest host, whilst still presenting a common set of endpoints?”

  • A:   we require a service that offers “Geographic Load Balancing”

… and this is where Azure Front Door comes in.
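Conceptually, geographic load balancing boils down to sending each user to the healthy host that responds fastest for them. The following is a minimal sketch of that idea only, not a description of how Azure Front Door works internally; the backend names, URLs and latency figures are hypothetical.

```javascript
// Hypothetical backends, with latencies as measured from one user's location.
const backends = [
    { name: "London", url: "https://uk.example.com", latencyMs: 12, healthy: true },
    { name: "California", url: "https://us.example.com", latencyMs: 140, healthy: true },
];

// Pick the healthy backend with the lowest latency; returns null if none are healthy.
function pickBackend(candidates) {
    const healthy = candidates.filter(backend => backend.healthy);
    if (healthy.length === 0) return null;
    return healthy.reduce((best, backend) =>
        (backend.latencyMs < best.latencyMs ? backend : best));
}

console.log(pickBackend(backends).name); // London

// If the nearest host fails its health check, traffic moves to the next best host.
const londonDown = backends.map(backend =>
    (backend.name === "London" ? { ...backend, healthy: false } : backend));
console.log(pickBackend(londonDown).name); // California
```

The user always calls the same public endpoint; the choice of host happens behind it, which is what lets a service present a common set of endpoints across regions.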


What’s our demonstration going to be?

For our demonstration application, we’re going to:

  • create a basic web client, using static content.
  • use Azure Functions to create an HTTP-triggered endpoint.

From my exhaustive research, I have discovered that there is an international shortage of APIs that calculate prime numbers, so we’re going to create an Azure Function that fills this shortfall.

Ok, I may have completely made up the requirement in that last sentence.

The real reason we’re calculating prime numbers is that it creates a computational load and typically takes a few moments to complete.

Later in this article, we’ll be looking at one way to visually demonstrate how Azure Functions scale with demand, so we want to do something that will make the platform work a little bit harder!

Create an Azure Function to calculate prime numbers

This article is not intended to be an introduction or tutorial on how to use Azure Functions. If you are not already familiar with this subject (either how to write the code, or how to create and publish Functions to Azure), please refer to my previous series, Chat using Twilio WhatsApp API, SignalR Service & Azure Functions. You should also read the Microsoft documentation: An introduction to Azure Functions.

In the demonstration source code, we have a Visual Studio 2019 project called GlobalScalingDemo.Function and specifically we are interested in the code for the Function itself which is ....\GlobalScalingDemo\src\GlobalScalingDemo.Function\CalculatePrimeNumberFunction.cs:

The complete source code for this file looks like this:-

using System;
using System.Linq;
using System.Threading.Tasks;

using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

namespace GlobalScalingDemo.Function
{
    public class CalculatePrimeNumberFunction
    {
        // The function name becomes the URL segment that the web client calls.
        [FunctionName("CalculatePrimeNumber")]
        public async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = null)] HttpRequest req)
        {
            // Upper bound of the range to search; raise it to make the Function work harder.
            int primeNumberRangeSize = 500000;
            long result = CountPrimeNumbers(primeNumberRangeSize);

            // Label for the hosting location, supplied through configuration.
            string currentHostingLocation = Environment.GetEnvironmentVariable("CurrentHostingLocation");

            return new OkObjectResult($"Hello, there are {result} prime numbers in the range 0 - {primeNumberRangeSize}.   This result was calculated in {currentHostingLocation}");
        }

        public long CountPrimeNumbers(int rangeSize)
        {
            // Start at 2, because 1 is not prime but would pass the divisor test below.
            return ParallelEnumerable.Range(2, rangeSize - 1)
                .Count(n => Enumerable
                    .Range(2, (int)Math.Sqrt(n) - 1)
                    .All(i => n % i > 0));
        }
    }
}

In the above code, I would bring your attention to a couple of things:-

  • The variable primeNumberRangeSize is used to define a range of numbers to check for prime numbers. This effectively controls the amount of processing work that the Function needs to undertake.
    • If you want to increase the amount of time the Function needs to run for, put a larger number in here.
  • The variable currentHostingLocation is used to obtain a label declaring the location that the code is running from (e.g. “Local”, “London”, or “California”).
    • This is added to the textual output of the Function.
    • Later in the article, when we come to test the system, this will help us to easily recognise which data centre the Function is being executed from.
    • This configuration information will need to be provided either locally in the local.settings.json or defined in Azure for each instance of the Function App.
  • Don’t worry too much about the sample code itself. The point of this exercise is to create an arbitrary and artificial CPU load, so slow code is fine!
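For local development, the location label might be supplied along the following lines. This is a sketch of a local.settings.json file, not the demo project's actual file: the CurrentHostingLocation value is whatever label you choose, and the storage, runtime and CORS entries are assumptions about a typical local setup (the CORS entry lets a page served from a different local port call the Function).

```json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet",
    "CurrentHostingLocation": "Local"
  },
  "Host": {
    "CORS": "*"
  }
}
```

In Azure, the same key would instead be added as an application setting on each Function App, with a value such as “London” or “California”.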

Create a basic web client

Our web client is intended to be as simple as possible, so it comprises:-

  • a single html page : ...\GlobalScalingDemo\src\GlobalScalingDemo.Web\wwwroot\index.html
  • a simple JavaScript file : ...\GlobalScalingDemo\src\GlobalScalingDemo.Web\wwwroot\js\client.js
  • minor cosmetic components, such as a .css file and use of Bootstrap

The purpose of the client app is simply to trigger an HTTP request to our backend, when the page is first loaded.

When a response is received, a default message that reads Waiting for server... is replaced with the message returned from the back end.

The JavaScript client code looks like this:-

const serviceEndpoint = 'http://localhost:7071/api/';   // include trailing slash
//const serviceEndpoint = 'https://{your function app hostname}/api/';
const serverApiMethod = "CalculatePrimeNumber"; // do not include any slashes

window.onload = function () {

    var messageElement = document.getElementById("messageElement");

    // Call the back end once the page has loaded, then show its reply.
    fetch(serviceEndpoint + serverApiMethod)
         .then(response => response.text())
         .then(text => {
            // The reply is plain text, so textContent avoids interpreting it as HTML.
            messageElement.textContent = text;
         }).catch(err => {
             messageElement.textContent = "Failure connecting to server";
         });
};
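To get a feel for which location answers under load, a small script along the following lines could fire a batch of concurrent requests and tally the replies by hosting location. It is a sketch, not the test used later in the series: the function names are hypothetical, it relies on the response text ending with “calculated in {location}” as in the Function above, and the fetch function is passed in as a parameter so that the logic can be tried without a network.

```javascript
// Extract the location label from the Function's response text (null if absent).
function parseLocation(text) {
    const match = /calculated in (.+)$/.exec(text.trim());
    return match ? match[1] : null;
}

// Fire `count` concurrent requests and count the replies per hosting location.
async function tallyLocations(url, count, fetchFn) {
    const requests = Array.from({ length: count }, () =>
        fetchFn(url)
            .then(response => response.text())
            .then(parseLocation)
            .catch(() => "failed"));
    const tally = {};
    for (const location of await Promise.all(requests)) {
        const key = location ?? "unknown";
        tally[key] = (tally[key] || 0) + 1;
    }
    return tally;
}

// Stand-in for fetch, so the example runs offline; pass the real `fetch` to hit a server.
const fakeFetch = async () => ({
    text: async () => "This result was calculated in Local",
});

tallyLocations("http://localhost:7071/api/CalculatePrimeNumber", 5, fakeFetch)
    .then(tally => console.log(tally)); // { Local: 5 }
```

Once more than one region is deployed, the tally should show more than one label when the requests originate from different parts of the world.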

Next, in part two, we’ll be looking at how to use Azure CDN to globally distribute static content.


