The “How” of Cloud Native: Architecture and Design Perspective

Kyle Gene Brown
Published in The Startup
10 min read · Dec 14, 2020

By Kyle Brown and Kim Clark

Note: This is part 3 of a multipart series. For the first article in the series, start here, or jump to Part 2, Part 4, Part 5.

In our previous article in this series we discussed how a move to a cloud native approach might affect how you organize your people and streamline your processes. In this post we will drill down on how it relates to architecture and design principles.

Ingredients of cloud native — Architecture and Design

It is the architectural approach that brings the technology to life. It is possible to deploy traditional, siloed, stateful, coarse-grained application components onto a modern container-based cloud infrastructure. For some, that’s a way to start getting their feet wet with cloud, but it should only be a start: if you stop there, you will experience hardly any of the advantages of cloud native. In this section we will consider how to design an application so that it can fully leverage the underlying cloud infrastructure. It should quickly become apparent that well-decoupled components, rolled out using immutable deployments, are just as essential as embracing the agile methods and processes discussed already. We show the pieces of cloud native architecture below:

Components of Cloud Native Architecture

Fine-grained components

Until relatively recently, it was necessary to build and run software in large blocks of code in order to use hardware and software resources efficiently. More recent developments in technology, such as containers, have made it realistic to break up applications into smaller pieces and run them individually. There are a few different aspects to what we mean by fine-grained:

  • Function-driven granularity — each component performs a single, well-defined task
  • Self-contained components — each component includes all of its dependencies where possible
  • Independent lifecycle, scaling, and resilience — each component is built from a single code repository, through a dedicated pipeline, and managed independently at runtime

Building applications in this way is typically known as microservices, although it should be noted that a true “microservices approach” is much broader than just fine-grained components, and indeed overlaps significantly with the concepts of cloud native described here.

The core benefits of more fine-grained components are:

  • Greater agility — they are small enough to be completely understood in isolation and changed independently
  • Elastic scalability — each component can be scaled individually, maximizing the efficiency of cloud native infrastructure
  • Discrete resilience — with suitable decoupling, instabilities in one microservice do not affect others at run time

While the above can provide dramatic benefits in the right circumstances, designing highly distributed systems is non-trivial, and managing them even more so. Sizing your microservice components is a deep topic in itself, and there are further design decisions around just how decoupled they should be, and how you manage versioning of the coupling that remains. Spotting necessary cohesion is just as important as introducing appropriate decoupling, and it is common to encounter projects that have gone too fine-grained and have had to pull back. In short, your microservices application is only as agile and scalable as your design is good and your methods and processes are mature.

Note: Microservices architecture is often inappropriately compared to service-oriented architecture (SOA) because they share words in common and seem to be in the same conceptual space. However, they relate to different scopes. Microservices is about application architecture, and SOA is about enterprise architecture. This distinction is critical, and is explored further in “Microservices versus SOA: How to start an argument”.

Appropriate decoupling

Many of the benefits of fine-grained components (agility, scalability, resilience) are lost if the components are not decoupled from one another. They need to have:

  • Clear ownership boundaries
  • Formalized interfaces (API and event/message)
  • Independent persistence

Writing modular software is hardly new. Design methodologies from functional decomposition through object-oriented programming to service-oriented architecture have all aimed to break large problems into smaller, more manageable pieces. The opportunity in the cloud native space is that technologies such as containers allow us to run each piece as a truly independent component. Each component has its own CPU, memory, file storage, and network connections as if it were a full operating system. It is therefore only accessible over the network, and this alone creates a very clear and enforceable separation between components. However, the decoupling provided by the underlying platform is only part of the story.

From an organizational point of view, ownership needs to be clear. Each component needs to be completely owned by a single team that has control over its implementation. That’s not to say that teams shouldn’t accept requests for change, and indeed pull requests from other teams, but they have control over what and when to merge. This is key to agility, since it ensures the team can feel confident in making and deploying changes within their component so long as they respect their interfaces with others. Of course, even then, teams should work within architectural boundaries set by the organization as a whole, but they should have considerable freedom within those boundaries.

Components should explicitly declare how you can interface with them and all other access should be locked down. They should only use mature, standard protocols. Synchronous communication is the simplest and HTTP’s ubiquity makes it an obvious choice. More specifically, we typically see RESTful APIs using JSON payloads, although other protocols such as gRPC can be used for specific requirements.
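
As a minimal sketch of such a formalized interface, the following exposes one hypothetical resource (`/inventory/<sku>`) over HTTP with a JSON payload, using only the Python standard library; every other path is locked down with a 404. The resource name and fields are illustrative assumptions, not part of any specific product.

```python
# A component that declares exactly one way to interact with it:
# GET /inventory/<sku> returning JSON. All other access is refused.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

ITEMS = {"widget": {"sku": "widget", "stock": 42}}  # component-private data

class InventoryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        sku = self.path.rstrip("/").rsplit("/", 1)[-1]
        if self.path.startswith("/inventory/") and sku in ITEMS:
            body = json.dumps(ITEMS[sku]).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            # Undeclared paths are locked down.
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the example quiet
        pass

def serve(port=8080):
    HTTPServer(("localhost", port), InventoryHandler).serve_forever()
```

A real component would sit behind the platform’s service discovery and network policy, but the principle is the same: the interface is explicit, and everything else is closed.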

It is important to differentiate between calls across components within the same ownership boundary (e.g., an application boundary) and calls to components in another ownership boundary. A full discussion is beyond the scope of this article; see this post for more information.

However, the synchronous nature of HTTP APIs binds the caller to the availability and performance of the downstream component. Asynchronous communication through events and messages, using a “store and forward” or “publish/subscribe” pattern, can decouple components from one another more completely.

A common asynchronous pattern is that owners of data publish events about changes to their data (creates, updates, and deletes). Other components that need that data listen to the event stream and build their own local datastore so that when they need the data, they have a copy. This process is related to other event driven architecture patterns such as event sourcing and CQRS.
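
The pattern above can be sketched in a few lines. The in-process event bus and the event shape (`action`/`key`/`data`) below are illustrative assumptions standing in for a real messaging system; the point is that the consumer builds its own local datastore purely from the owner’s change events.

```python
# Toy publish/subscribe broker plus a consumer that maintains a local
# replica of another component's data from its change events.
from collections import defaultdict

class EventBus:
    """A minimal in-process publish/subscribe broker."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

class LocalReplica:
    """Builds its own datastore from the data owner's event stream."""
    def __init__(self, bus, topic):
        self.store = {}
        bus.subscribe(topic, self.on_event)

    def on_event(self, event):
        if event["action"] in ("create", "update"):
            self.store[event["key"]] = event["data"]
        elif event["action"] == "delete":
            self.store.pop(event["key"], None)

bus = EventBus()
replica = LocalReplica(bus, "customers")
bus.publish("customers", {"action": "create", "key": "c1", "data": {"name": "Ada"}})
bus.publish("customers", {"action": "update", "key": "c1", "data": {"name": "Ada L."}})
bus.publish("customers", {"action": "delete", "key": "c1", "data": None})
```

With a real broker the events would arrive with some delay, which is exactly where the eventual consistency discussed next comes from.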

Although asynchronous patterns can improve availability and performance, they do have an inevitable downside: They result in various forms of eventual consistency, which can make design, implementation, and even problem diagnosis more complex. Use of event-based and message-based communication should therefore be suitably qualified.

Minimal state

Clear separation of state within the components of a cloud native solution is critical. Three key topics commonly come up:

  • Uncomplicated horizontal scaling
  • No caller or session affinity
  • No two phase commits across components

Statelessness enables the orchestrating platform to manage the components in an optimal way, adding and removing replicas as required. It means that, after a component starts, there should be no changes to its configuration or the data it holds that make it different from any other replica.

Affinity is one of the most common issues: expecting a specific user or consumer’s requests to come back to the same component on their next invocation, perhaps due to specific data caching. Suddenly, the orchestration platform can’t do simple load-balanced routing, relocation, or scaling of the replicas.
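
One sketch of the fix is to hold per-user state outside the replicas, so any replica can serve any request. The dict-backed store below is a stand-in assumption for something like a database or distributed cache; the shopping-cart session is illustrative.

```python
# Replicas that keep no session state of their own: a request for the same
# session can land on any replica because state lives in a shared store.
class ExternalStore:
    """Stand-in for a database or distributed cache."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def put(self, key, value):
        self._data[key] = value

class StatelessReplica:
    def __init__(self, store):
        self.store = store
    def handle(self, session_id, item):
        cart = self.store.get(session_id) or []
        cart.append(item)
        self.store.put(session_id, cart)
        return cart

store = ExternalStore()
replica_a = StatelessReplica(store)
replica_b = StatelessReplica(store)
replica_a.handle("s1", "book")          # first request lands on replica A
result = replica_b.handle("s1", "pen")  # next request can land on replica B
```

Because neither replica holds anything the other lacks, the platform is free to route, relocate, and scale them at will.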

Two-phase commit (2PC) transactions across components should also be ruled out, as the semantics of REST and event-based protocols do not allow the communication of standardized transaction coordination. The independence of each fine-grained component, with its minimal state, makes the coupling required for 2PC coordination problematic in any case. As a result, alternative ways of handling distributed updates, such as the Saga pattern, must be used, taking into account the issues of eventual consistency already alluded to.
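
The Saga pattern can be sketched as a sequence of local steps, each paired with a compensating action that undoes it; if a later step fails, the completed steps are compensated in reverse order rather than rolled back atomically. The step names below (stock, card, shipping) are hypothetical.

```python
# Minimal Saga sketch: local steps with compensations instead of 2PC.
def run_saga(steps):
    """steps: list of (action, compensation) callables.
    Returns True on success; on failure, compensates completed steps
    in reverse order and returns False."""
    completed = []
    for action, compensation in steps:
        try:
            action()
            completed.append(compensation)
        except Exception:
            for comp in reversed(completed):
                comp()
            return False
    return True

log = []

def ship():
    raise RuntimeError("shipping failed")  # simulate a failing step

ok = run_saga([
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("charge card"),  lambda: log.append("refund card")),
    (ship,                                lambda: log.append("cancel shipment")),
])
```

Note that between a step and its eventual compensation the system is visibly inconsistent; that window is the eventual-consistency cost referred to above.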

Note that this concept of minimal state should not be confused with a component interacting with a downstream system that holds state. For example, a component might interact with a database or a remote message queue that persists state. That does not make our component stateful; it is just passing stateful requests on to a downstream system.

There will always be some components that require state. Platforms such as Kubernetes (K8s) have mechanisms for handling stateful components, with extra features and associated restrictions. The point is to minimize state, and to clearly declare and manage it where it does occur.

Immutable deployment

If we are to hand over control to a cloud platform to deploy and manage our components, we need to make that as straightforward as possible. In short, we want to ensure there is only one way to deploy a component, and that once deployed, it cannot be changed. This is known as immutable deployment and is characterized by the following three principles:

  • Image-based deployment — we deploy a fixed “image” that contains all dependencies
  • No runtime administration — no changes are made directly to the runtime once deployed
  • Updates and rollbacks by replacement — changes are performed by deploying a new version of the component’s image

From “appropriate decoupling” we already know that our components should be self-contained. What we need is a way to package up our code and all its dependencies that enables extremely consistent deployment. Languages have always had mechanisms to build their code into a fixed “executable”, so that’s not new. Containers bring us the opportunity to go a step further: to package up that code/executable along with the specific version of the language runtime, and even the relevant aspects of the operating system, into a fixed “image”. We can also include security configuration, such as which ports to make available, and key metadata, such as which process to run on startup.

This allows us to deploy into any environment consistently. Development, test, and production will all have the same full stack configuration. Each replica in a cluster will be provably identical. The container image is a fixed black box, and can be deployed to any environment, in any location, and (in an ideal world) on any container platform, and will still behave the same.

Once started, the image must not be further configured at runtime, to ensure its ongoing consistency. This means no patches to the operating system, no new versions of the language runtime, and no new code. If you want to change any of those things, you must build a new image, deploy it, and phase out the original image. This ensures we are absolutely confident of what is deployed to an environment at any given time. It also provides us with a very simple way of rolling back any of these types of change. Since we still have the preceding image, we can simply re-deploy it, assuming of course that we adhered to “minimal state”.
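
The replacement-based update and rollback described above can be modelled in a few lines. This is a toy model, not a real orchestrator: the `Deployment` class and `myapp:*` image tags are illustrative assumptions, standing in for what a platform like Kubernetes does with image tags and deployment history.

```python
# Toy model of "updates and rollbacks by replacement": running replicas
# are never patched in place; every change replaces them wholesale.
class Deployment:
    def __init__(self, image, replica_count=3):
        self.history = [image]
        self.replicas = [image] * replica_count  # all replicas identical

    def deploy(self, new_image):
        """Update by replacement: swap every replica for the new image."""
        self.history.append(new_image)
        self.replicas = [new_image] * len(self.replicas)

    def rollback(self):
        """Rollback is just re-deploying the preceding image."""
        if len(self.history) > 1:
            self.history.pop()
            self.replicas = [self.history[-1]] * len(self.replicas)

d = Deployment("myapp:1.0")
d.deploy("myapp:1.1")   # update by replacement
d.rollback()            # back to the previous, still-available image
```

Because no replica was ever mutated in place, the rollback is trivially safe, provided the component honoured “minimal state”.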

Traditional environments were built in advance, prior to deployment of any code, and then maintained over time by running commands against them at runtime. It’s easy to see how this approach could often result in configuration divergence between environments. The immutable deployment approach ensures that code is always deployed hand-in-hand with its own copy of all the dependencies and configuration it was tested against. This improves testing confidence, enables simpler re-creation of environments for functional, performance and diagnostics testing, and contributes to the simplicity of elastic scaling.

Note that in theory we could have done image-based deployment with virtual machine images, but they would have been unmanageably large. It was thus more efficient to run multiple components on a virtual machine instance, which meant there was no longer a one-to-one relationship between the code and its dependencies.

Zero trust

Put simply, zero trust assumes that any threat could potentially occur. Threat modelling has a long history, but the nature of cloud native solutions forces us to reconsider those threats and how to protect against them. Some (but not all) of the key tenets of a zero trust approach are:

  • Minimal privileges — Components and people should have no privileges by default. All privileges are explicitly bestowed based on identity.
  • Implicit data security — Data should always be safe, whether at rest or in transit.
  • Shift Left security (DevSecOps) — Security should be included at the earliest possible point in the lifecycle.

It has long been known that traditional firewall-based access control results in inappropriate trust of the internal network. Indeed, the assumption that you can create trusted zones in a network is, at best, only a first line of defense. Identity needs to become the new perimeter. We should aim for fine-grained access control based on the identity of what we are trying to protect: users, devices, application components, data. Based on these identities, we should then grant only the least privileges required. Connectivity between components should be explicitly declared and secured (encrypted) by default. Administrators should only be given the precise privileges they need to perform their role. We must also regularly perform vulnerability testing to ensure that there are no paths to privilege escalation.
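
The minimal-privileges tenet reduces to a deny-by-default check keyed on identity: nothing is allowed unless a grant has been explicitly bestowed. The identities, actions, and resources below are hypothetical; a real system would back this with a policy engine and authenticated identities rather than a literal set.

```python
# Deny-by-default, identity-based access control sketch.
# Privileges exist only where explicitly granted; there is no implicit
# trust based on network location.
GRANTS = {
    ("orders-service", "read",  "orders-db"),
    ("billing-admin",  "read",  "billing-db"),
    ("billing-admin",  "write", "billing-db"),
}

def is_allowed(identity: str, action: str, resource: str) -> bool:
    """True only if this exact (identity, action, resource) was granted."""
    return (identity, action, resource) in GRANTS
```

Note what is absent: there is no "inside the firewall, therefore trusted" branch, and the default answer for any unlisted combination is a refusal.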

Applications must accept that they have a responsibility to keep their users’ data safe at all times. There are ever more sophisticated ways to correlate data from multiple sources, deriving new information for malicious purposes. Applications should consider the privacy of all data they store, encrypt any sensitive data both at rest and in transit, and ensure it is only accessible by identities with explicit permission.

Application components must be built secure from the start. We must assume that all environments, not just production, are vulnerable to attack. More than that, through a “shift left” of security concerns, we should ensure that application designers collaborate early on with the security team and embed secure practices seamlessly, for example in build and deploy pipelines. This further improves the speed, consistency, and confidence with which we can continuously deliver code to production.

In the next article, we’ll look at the unique aspects of cloud technology and infrastructure to see how they enable the cloud native approach.

Kyle Gene Brown
IBM Fellow, CTO for Cloud Architecture for the IBM Garage, author and blogger