Clevyr Blog

That New Infrastructure Smell

Written by Clevyr TechOps Team | Mar 28, 2025 5:00:00 AM

When clients choose to host their applications with Clevyr, they have several choices. The most cost-effective is our multi-tenant Kubernetes hosting. It is a highly-available and scalable solution, and instead of paying for dedicated compute resources and overhead, clients pay only for what their apps need.

Our primary Kubernetes cluster (and the DevOps processes we created around it) had served us well for several years, but we discovered new processes, tools, and paradigms that we wanted to take advantage of.

Here's the list of what we wanted:

  • GitOps

  • Infrastructure Agnostic

  • Quicker Application Configuration Changes

  • Reusable CI/CD Workflows

  • Better Secret Management

  • Private Cluster Access

  • Highly-Available Databases

After more than a year of adding items to the wishlist, evaluating them against different solutions, and running small-scale trials, it was time to take the plunge and migrate every application to the new infrastructure. In the middle of 2024, we performed the final checks and began moving workloads over. Let’s explore how we accomplished each item from our list of requirements for our new platform.

GitOps & Infrastructure Agnostic

Infrastructure as Code (IaC) is a core part of our DevOps practice. Our previous deployment strategy was a custom mix of Terraform, Terragrunt, and Helm that ran via GitHub Actions. It enabled us to store the desired application state in Git, and it gave us blue/green deployments with no downtime for our applications. It worked well enough, but there were several drawbacks.

This workflow was completely tailored to our infrastructure stack, required GitHub Actions to reach into the Kubernetes cluster to deploy updated application configurations, and didn't address config drift effectively. When we came across the practice of GitOps, it resonated with us. It seemed like the next logical step from what we had built ourselves: application state in Git, constantly reconciled inside Kubernetes. We evaluated both ArgoCD and FluxCD, and found that Flux was a better fit for us. Both of these tools run inside Kubernetes and synchronize state from a Git repository. They run on any flavor of Kubernetes, and since they are pull-based, the CI/CD platform doesn't have to reach into the cluster.
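To make the idea of "constantly reconciled" concrete, here is a minimal Python sketch of a reconciliation step. It is only an illustration of the concept, not how Flux or ArgoCD is implemented: the `Change` type, the function names, and the dictionaries standing in for Git and cluster state are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One action needed to bring the cluster back in line with Git."""
    action: str  # "create", "update", or "delete"
    name: str


def plan_reconcile(desired: dict, actual: dict) -> list:
    """Compare desired state (from Git) with actual state (from the cluster)."""
    changes = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            changes.append(Change("create", name))
        elif actual[name] != spec:
            # Config drift: the live object no longer matches what Git declares.
            changes.append(Change("update", name))
    for name in sorted(actual):
        if name not in desired:
            # The object was removed from Git, so it should be pruned.
            changes.append(Change("delete", name))
    return changes


def reconcile(desired: dict, actual: dict) -> dict:
    """Apply the plan to a copy of the actual state and return the result."""
    result = dict(actual)
    for change in plan_reconcile(desired, actual):
        if change.action == "delete":
            del result[change.name]
        else:
            result[change.name] = desired[change.name]
    return result
```

Because a loop like this runs inside the cluster on a schedule, a manual change to a live object shows up as an "update" on the next pass and gets reverted, which is how the pull-based model handles drift without any outside system needing cluster credentials.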

Quicker Application Configuration Changes

In our previous setup, we stored application IaC alongside application source code. Our deployment process built the container and then deployed it via Helm. The biggest downside to this method was that simple things like updating an app's environment variables, which didn't technically need a container build, were stuck waiting on one anyway. This method, paired with our branch-based environment promotion strategy, resulted in some messy merges.

We decided to pull the application configuration (in the form of Flux HelmReleases) into a separate repository from the app source code. This does require a cross-repo commit whenever new application code needs to be deployed, so we wrote a tool to do just that. There are other potential solutions to this, such as Flux's image automation controllers, which scan a container registry for new tags and commit the update to Git.
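As a rough idea of what the core of such a cross-repo update might look like, the Python sketch below rewrites the image tag in a HelmRelease manifest. It is not our actual tool: the manifest is a trimmed, hypothetical fragment, the `image.tag` values layout is an assumption about the chart, and `set_image_tag` is an illustrative name. A real tool would also clone the configuration repository, commit the change, and push it.

```python
import re

# Hypothetical, trimmed HelmRelease fragment. The values layout (image.tag)
# is an assumption; real charts may name these keys differently.
EXAMPLE_MANIFEST = """\
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: example-app
spec:
  values:
    image:
      repository: registry.example.com/example-app
      tag: "1.4.0"
"""

# Matches a whole "tag: ..." line and captures its indentation.
_TAG_LINE = re.compile(r"^(?P<indent>[ \t]*)tag:[ \t]*.*$", re.MULTILINE)


def set_image_tag(manifest: str, new_tag: str) -> str:
    """Return the manifest with its first 'tag:' line set to new_tag."""
    updated, count = _TAG_LINE.subn(
        lambda m: f'{m.group("indent")}tag: "{new_tag}"',
        manifest,
        count=1,
    )
    if count == 0:
        raise ValueError("no 'tag:' line found in manifest")
    return updated
```

Editing the text line by line, rather than parsing and re-serializing the YAML, keeps comments and formatting intact, so the resulting Git diff is a single changed line.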

Reusable CI/CD Workflows

Our previous GitHub Actions workflows grew and developed organically over time, with new apps having improvements that rarely trickled down to older applications. As part of our modernization efforts, we wanted to reduce the duplication of these workflows.

Fortunately, GitHub Actions has greatly improved its reusable workflows, and our team rewrote all of our build and deployment jobs to take advantage of them. Our deployment job in each app's repo went from around 100 lines to 17, and since every application points to our centralized workflows, even our oldest applications get the latest improvements to the CI/CD process.

Better Secret Management

Secrets used to exist as Kubernetes secrets, which we'd also store in our organization's password manager. This process wasn't automated, leaving room for human error and extending downtime if we needed to restore the environment. We decided to store our secrets, encrypted, in Git and have them automatically deployed via Flux like everything else. Flux natively supports SOPS, which has been flawless for us. SOPS supports PGP, Age, and several cloud KMS providers; in most cases we use a cloud KMS.

The best secret is no secret, so where we can, we've eliminated IAM credentials by taking advantage of Kubernetes Workload Identity. We also moved to authenticating GitHub to the image repositories via OIDC.

Private Cluster Access

It really isn't a good idea to have the Kubernetes cluster API exposed to the public internet. Having our cluster API private but accessible to our team was a firm requirement. In the post-COVID work-from-home world, we needed to account for remote team members without requiring them to route all of their traffic over a VPN. Tailscale has been perfect for this.

We install Tailscale into our Kubernetes clusters, and through the magic of 4via6 routing, we can connect to every cluster's Kubernetes API directly from our laptops without needing to manage cloud security group rules or route traffic through a central subnet router.
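The "magic" of 4via6 is mostly address arithmetic: each site gets a numeric ID, and an IPv4 range at that site is embedded in a unique IPv6 range, so clusters with overlapping IPv4 ranges can still be told apart. The Python sketch below shows the mapping as we understand it from Tailscale's documentation. The prefix constant and the 16-bit site ID limit are assumptions drawn from that reading, `map_4via6` is an illustrative name, and in practice the Tailscale CLI should be used to generate these routes.

```python
import ipaddress

# 4via6 prefix as described in Tailscale's documentation (an assumption here,
# not something stated in this post).
VIA_PREFIX = int(ipaddress.IPv6Address("fd7a:115c:a1e0:b1a::"))


def map_4via6(site_id: int, ipv4_cidr: str) -> ipaddress.IPv6Network:
    """Embed an IPv4 network and a site ID into a 4via6-style IPv6 network."""
    if not 0 <= site_id <= 0xFFFF:
        raise ValueError("site_id must be between 0 and 65535")
    v4 = ipaddress.IPv4Network(ipv4_cidr)
    # Layout: 64-bit prefix | 32 bits holding the site ID | 32-bit IPv4 address
    addr = VIA_PREFIX | (site_id << 32) | int(v4.network_address)
    # The IPv4 prefix length is shifted by the 96 bits that precede the address.
    return ipaddress.IPv6Network((addr, 96 + v4.prefixlen))
```

For example, `map_4via6(7, "10.1.1.0/24")` returns the network `fd7a:115c:a1e0:b1a:0:7:a01:100/120`. Two clusters that both use `10.1.1.0/24` internally end up with different IPv6 ranges as long as their site IDs differ.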

Highly-Available Databases

Many of our applications use databases inside Kubernetes. A simple StatefulSet can run a database just fine, but there were times when it wasn't ideal. Cluster upgrades require rolling restarts of nodes, and while our application containers could gracefully RollingUpdate to another node, our databases could not, so brief outages would result. Depending on the cloud provider, a single database pod's storage volume may be locked into one Availability Zone, which adds a single point of failure. Database version upgrades were also an involved process.

CloudNativePG is a Kubernetes operator specifically built for running Postgres. With it, we can run multiple Postgres instances with no management overhead on our part. Version upgrades automatically roll through the instances, and when the primary sits on a node that is going down for a reboot during cluster maintenance, the operator can promote an instance on another node.

Mission Accomplished, For Now...

Over the past few months, we've migrated our workloads to the new cluster incorporating all of these features, and it's been fantastic. This is DevOps at its best: reducing tedious tasks, decreasing deployment time, and eliminating repetition.

At Clevyr, we're committed to continuous improvement. As technologies evolve, our TechOps team will keep evaluating and implementing the innovations that improve our platform for our developers and for our clients. We're excited about what we'll learn and do next, and we're proud to offer hosting solutions that combine cost-effectiveness with enterprise-grade resilience and security. If you're looking to raise your TechOps game, or looking for a hosting partner to focus on continuous improvement on your behalf, reach out to learn what Clevyr can do for you.

Glossary

Age: A simple, modern, and secure file encryption tool that uses public key cryptography.

CD: Continuous Delivery/Deployment - The practice of automatically deploying code changes to production environments after passing automated tests.

CI: Continuous Integration - The practice of frequently merging code changes into a shared repository where automated builds and tests run.

Git: A distributed version control system for tracking changes in source code during software development.

Helm: A package manager for Kubernetes that helps define, install, and upgrade applications.

IAM: Identity and Access Management - A framework for managing digital identities and their access to resources.

IaC: Infrastructure as Code - Managing and provisioning infrastructure through code instead of manual processes.

KMS: Key Management Service - A service that helps create and control the encryption keys used to encrypt data.

Kubernetes: An open-source platform for automating deployment, scaling, and management of containerized applications.

OIDC: OpenID Connect - An identity layer built on top of the OAuth 2.0 protocol that allows clients to verify the identity of end-users based on the authentication performed by an authorization server, as well as to obtain basic profile information about the end-user.

PGP: Pretty Good Privacy - An encryption program that provides cryptographic privacy and authentication.

Terraform: An open-source infrastructure as code software tool that enables defining and provisioning infrastructure using a declarative configuration language.

Terragrunt: A thin wrapper for Terraform that provides extra tools for working with multiple Terraform modules.

VPN: Virtual Private Network - A service that creates a secure, encrypted connection over a less secure network.