The problem with environments that are always on
Many development and test environments run as if they are production: 24 hours a day, 7 days a week. That feels safe because the environment is always available. In practice, a DTA (development, test, acceptance) cluster is mostly used during office hours, demos, tests and deployments.
Outside those moments, compute often waits for work that never arrives. For a large enterprise estate that may look like a small detail. For a startup, it is exactly the kind of structural waste you want to solve early, before the environment grows and the bill grows with it.
Why EKS Auto Mode mattered here
EKS Auto Mode changes the compute layer compared with classic managed node groups. AWS takes over more of the node operations while keeping the model familiar from Karpenter: NodePools and NodeClasses define which compute may be provisioned when workloads need it.
Autoscaling only helps so much if all workloads remain active. You scale more efficiently, but you still pay for an environment that has no value outside working hours. The move at Bettr was therefore not only to scale smarter, but to give the whole DTA layer a day and night rhythm.
The cheapest node is still the node you do not need to run.
The order matters
You cannot simply switch off node pools and hope the rest follows cleanly. The order determines whether this feels reliable or like a hard power-off. At Bettr, the shutdown deliberately moves from application to database to compute.
First, the applications are switched off. Then the CNPG (CloudNativePG) clusters are moved to the desired off state through ArgoCD. Only when workloads no longer need compute are the node pools in the cluster disabled. Because those node pools also live in ArgoCD, the whole operation remains declarative and repeatable.
- Applications off first, so no new workload demand is created
- CNPG next, moved to the desired off state through GitOps
- Node pools off last, allowing compute to drop practically to zero
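The ordering above can be sketched in a few lines of Python. This is a minimal illustration, not Bettr's actual workflow: the step names and the `handlers` callables are hypothetical stand-ins for whatever really scales down the applications, updates CNPG through ArgoCD and disables the node pools. Reversing the order when enabling is also an assumption.

```python
from typing import Callable, Dict, List

# Shutdown order taken from the text; the step names are illustrative labels.
SHUTDOWN_ORDER = ["applications", "cnpg", "nodepools"]


def run_in_order(action: str, handlers: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Run each step in a safe order and stop at the first failure.

    Each handler receives the action and returns True on success.
    The return value lists the steps that completed successfully.
    """
    if action == "disable":
        order = SHUTDOWN_ORDER
    elif action == "enable":
        # Assumption: start-up mirrors shutdown, so compute comes back first.
        order = list(reversed(SHUTDOWN_ORDER))
    else:
        raise ValueError(f"unknown action: {action}")

    completed: List[str] = []
    for step in order:
        if not handlers[step](action):
            break  # do not touch a later layer if an earlier one failed
        completed.append(step)
    return completed
```

Stopping at the first failed step is the point of the sketch: if the applications cannot be switched off, the database and the node pools are left alone, so a failed run never turns into a hard power-off.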
Visible in the OPS channel
Every scheduled action reports back. The team can see not only that the environment was switched off or on, but also why the workflow ran and which environments were affected.
Why notifications are not a side detail
Scheduled automation is only mature when you can see what happened. Otherwise it becomes a hidden process everyone trusts until one morning something is not running. That is why the workflow posts to the OPS channel both when the schedule switches the DTA environment off and when the enable window switches it back on.
Those notifications are small, but important. They include context, such as the reason for the run, and they make failures visible immediately. If an action fails, that also lands in the same channel. The flow stays operationally controlled without anyone manually checking it every evening or morning.
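As a rough sketch of what such a message could carry, the Python below renders one notification from the outcome of a run. The `RunReport` fields and the message layout are assumptions made for illustration; the source only states that the reason, the affected environments and any failure are reported to the same channel.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunReport:
    """Hypothetical summary of one scheduled or manual run."""
    action: str              # "disable" or "enable"
    reason: str              # for example "schedule" or "manual"
    environments: List[str]  # environments touched by the run
    succeeded: bool
    error: str = ""


def format_notification(report: RunReport) -> str:
    """Render one chat message; failures use the same format as successes."""
    status = "OK" if report.succeeded else "FAILED"
    envs = ", ".join(report.environments) or "none"
    lines = [
        f"[{status}] DTA {report.action}",
        f"Reason: {report.reason}",
        f"Environments: {envs}",
    ]
    if not report.succeeded and report.error:
        lines.append(f"Error: {report.error}")
    return "\n".join(lines)
```

Keeping success and failure in one format and one channel means the absence of a message is itself a signal that the schedule did not run.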
Developers stay in control of exceptions
A scheduled shutdown should not become a blocker for developers. If someone works later in the evening or wants to finish a test, they can manually enable the DTA environment through the GitHub Actions workflow. The developer selects the desired action and which parts should be included. The default stays cost-aware, while the exception remains self-service, visible and controlled.
What the 50 percent actually means
For the DTA cluster, this approach reduces compute cost by around 50 percent. That makes sense: evenings, nights and weekends are a large part of the week. If compute does not need to run during those windows, a large share of structural cost disappears.
That does not mean the entire AWS bill is cut in half. The EKS control plane, storage, backups, load balancers, networking and other fixed components may still keep running. The claim is deliberately about compute. That distinction matters: this is not a magic cost button, but it is a concrete reduction in the part that often grows fastest.
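The arithmetic behind that distinction can be made explicit. The sketch below is illustrative only: the on-hours and the share of compute in the bill are hypothetical inputs, not Bettr's figures.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def compute_saving(on_hours_per_week: float) -> float:
    """Fraction of always-on compute hours avoided by a weekly schedule."""
    if not 0 <= on_hours_per_week <= HOURS_PER_WEEK:
        raise ValueError("on_hours_per_week must be between 0 and 168")
    return 1 - on_hours_per_week / HOURS_PER_WEEK


def bill_saving(compute_share: float, saving_on_compute: float) -> float:
    """Fraction of the total bill saved when only the compute share shrinks."""
    if not 0 <= compute_share <= 1:
        raise ValueError("compute_share must be between 0 and 1")
    return compute_share * saving_on_compute
```

With 84 on-hours per week, `compute_saving(84)` is 0.5. If compute were, say, 60 percent of the bill, `bill_saving(0.6, 0.5)` gives 0.3, which is why a 50 percent compute reduction does not halve the total. A strict weekday-only window of 12 hours (60 on-hours) would avoid roughly 64 percent of compute hours on paper; manual enables and compute that stays on pull the realised figure below that.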
FinOps becomes mature when cost reduction becomes part of the normal platform flow.
Why this still matters next to reservations
Right now, the impact in euros is still modest, because Bettr also lowers cost through reservations. That does not make this optimisation less relevant. It means multiple layers of cost control are already working side by side.
Reservations reduce the price of compute you use. Scheduled shutdown prevents you from using compute when it adds no value. Those are different mechanisms. In a startup environment, where clusters, workloads and extra environments can grow quickly, that second category becomes more valuable over time.
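A simplified model shows how the two mechanisms interact. It assumes a commitment-style reservation that is billed every hour whether or not it is used, with anything above the commitment billed at the on-demand rate. Real reservation products differ in detail, and all numbers and parameter names here are hypothetical.

```python
def hourly_cost(usage: float, committed: float,
                on_demand_rate: float, discount: float) -> float:
    """Cost of one hour of compute under a simplified commitment model.

    usage and committed are in the same unit (for example instance count).
    The committed part is always paid, at a discounted rate; usage above
    the commitment is paid at the full on-demand rate.
    """
    committed_cost = committed * on_demand_rate * (1 - discount)
    overflow = max(0.0, usage - committed)
    return committed_cost + overflow * on_demand_rate
```

With a commitment of 4 units at a 30 percent discount and an on-demand rate of 1.0, an hour with 10 units running costs about 8.8 and an hour with nothing running still costs about 2.8. In this model the shutdown removes only the part above the commitment, which is consistent with the modest impact today, and every workload added on top of the commitment makes the schedule worth more.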
The lesson for platform FinOps
FinOps often gets stuck in reporting: where money goes, which service is expensive, which trend is increasing. That is useful, but it does not solve the problem by itself. The mature step is turning cost decisions into platform behaviour.
At Bettr, that means the DTA layer has a rhythm, ArgoCD guards the desired state, EKS Auto Mode provides compute when demand exists and the workflow visibly reports what happened. Cost reduction stops being a separate monthly exercise and becomes part of how the platform works every day.
What it creates
This approach makes the DTA environment cheaper without making it unpredictable. Teams can work normally during the day, while the environment scales back in a controlled way outside working hours. And if developers work late once in a while, they can temporarily switch the environment back on themselves. That is where platform engineering and FinOps meet: not only visibility into cost, but technical defaults that automatically reduce waste.
