Autoscale Kubernetes workloads on any cloud using any event (kedify.io)
55 points by innovate 21 days ago | 44 comments



Who needs autoscaling? I mean this as a serious question. Does anyone have a real story where autoscaling helped out the company or product?

If you have the hardware resources, why not just scale up from the beginning? If you do not have the resources, you need a lot of money anyway to pay the scaled-up rent afterwards.


For people on the cloud: you can buy new computers and sell them back within a period measured in hours, so that's how people are using these systems. It doesn't make a ton of economic sense to me in general, because your periods of high demand are going to be the same periods as everyone else's in that datacenter (you want to serve close to your users to minimize latency, and most of the users in a geographical region go to work and sleep at the same times). That said, it's not priced like that: having burstable-core instances "always on" can end up being more expensive than buying guaranteed-capacity instances for a short period of time.

I never auto-scale interactive workloads, but it's good for batch work.

Other people have different feelings. Consider the case where you release software multiple times a day, but it has a memory leak. You don't notice this memory leak because you're restarting the application so often. But the Winter Code Freeze shows up, and your app starts running out of memory and dying, paging you every day during your time off. If you had horizontal autoscaling, you would just increase the amount of memory that your application has until you come back and fix it. Sloppy? Sure. But maybe easier to buy some RAM for a couple weeks and not disrupt people's vacation. (The purist would argue their vacation was ruined the day they checked in the memory leak.) This gets all the more fun when the team writing the code and the team responsible for the error rate in production are different teams in different time zones. I don't think that's a healthy way to structure your teams, but literally everyone else on earth disagrees with me, so... that's why there's a product that you can sell to the infrastructure team instead of telling the dev team "wake up and call free() on memory you're not using anymore".


I had a simple one on a Kubernetes cluster in AWS.

What happened is we had a queue processor that normally needed a couple of pods to handle events, except that once a day another process would drop 5 million requests into the queue.

So I just had a simple KEDA autoscaler based on the length of the queue: one pod for every 10,000 items in the queue, with a minimum of 2 pods and a maximum of 50.

It would scale up after the big queue dumps, chew through the backlog and then scale back down again.
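
For anyone curious what that looks like in practice, here's a rough sketch of a ScaledObject along those lines (assuming an SQS queue; the deployment name, queue URL and region are placeholders rather than what we actually ran):

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: queue-processor
    spec:
      scaleTargetRef:
        name: queue-processor    # Deployment to scale (placeholder name)
      minReplicaCount: 2         # always keep a couple of pods around
      maxReplicaCount: 50        # cap the scale-out
      triggers:
        - type: aws-sqs-queue
          metadata:
            queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/work-queue   # placeholder
            queueLength: "10000" # target items per pod, i.e. one pod per 10,000 messages
            awsRegion: us-east-1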


KEDA was really good. I just wish they had actually developed an HTTP scaler; having to use Knative scaling as an alternative is a pain.


I gave a talk on this[0], but I've had some moderate success doing autoscaling based on region for AAA always-online games.

That said, you could conceivably live at a higher abstraction.

Take dev environments for example. Ideally the team working on infra problems does not need to care how many versions of a backend are operating on the dev environment.

The only thing infra needs to take into account is the requested resources.

Perverse incentives on wasting resources aside, it's nice when you can have fewer variables in your mind while focusing on your own areas of responsibility; it allows deeper intuition and creativity, at the sacrifice of some cross-cutting creativity across teams.

[0]: https://sh.drk.sc/~dijit/devfest2019-msv.pdf


> Ideally the team working on infra problems does not need to care how many versions of a backend are operating on the dev environment.

> The only thing infra needs to take into account is the requested resources.

> Perverse incentives on wasting resources aside

(I do infra.) That's like, 95% of the problem. AFAICT, most devs have absolutely no idea how powerful a computer is.

My last change was to resize a 400 core, 800 GiB set of compute into a 100 core, 150 GiB set. It was just ludicrously over-provisioned, because the dev teams aren't incentivized to care at all. (…sadly, I'm not allowed to go out and hire a dev now, even though I literally just saved that amount of money in cloud costs…)

(It's still over-provisioned, but that was the easy "we can lop this compute off and I promise you won't notice" part.)

The economics/incentives at play are the hard part. Getting management to not look at infra for "ehrmagerd the cloud bill" but instead devote dev time into getting them to dig into "why is this app, which ostensibly just shuffles JSON about the landscape, using 8 cores and all the RAM?" is … tough. And not what I signed up for in SWE, damn it.

The other way is equally bad: devs find they've run out of resources? Knee-jerk is "resize compute upwards" not some introspection of "wait, what is a reasonable amount of CPU use for JSON shuffling?"

Usage graphing is the other tool that really puts some devs' work in a rather bad light: resource requests of like 20 CPU, but the usage graph says "0.02 CPU". So … at least the code's not inefficient, but the requested resources are wasted.


(To be transparent, CTO of Kedify and a maintainer of KEDA here)

Folks in other comments have answered this pretty well. Over the past couple of years, I've talked to many companies and individuals who have greatly benefited from autoscaling on k8s. Generally, it has helped in these areas:

1. Obvious case: if you run your environment on cloud providers, it can significantly save costs and improve throughput.

2. It's not just about autoscaling workloads, but also about managing batch jobs (K8s Jobs) that are triggered by events or custom metrics on demand (you can think of this as a CronJob on steroids).

3. On-prem solutions: You're right; you can use the resources you've already paid for. However, by enabling autoscaling, you can also improve the distribution and utilization of those resources. In large organizations, it is common practice for individual teams to be treated as "internal customers" with assigned quotas they can use. Autoscaling can be helpful in these scenarios as well.

If you are interested in the area, I've given several talks on K8s autoscaling, for example, our latest talk from KubeCon: https://sched.co/1YhgO


We’re using KEDA and ScaledJob to scale tomographic reconstructions in the cloud. When a CT scanner has finished uploading a scan, we let a ScaledJob create a Job to process the data. A scan is maybe 8 hours and during that time we don’t need any compute resources. But when it’s done we need both lots of CPU and GPU power to process GBs and TBs of data rapidly to show previews to the user.
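
Very roughly, the shape of it is a ScaledJob that watches for a "scan finished" event and spawns a GPU Job per event. This is only an illustrative sketch (the image, queue name, trigger type and connection variable are made up, not our actual config):

    apiVersion: keda.sh/v1alpha1
    kind: ScaledJob
    metadata:
      name: reconstruction
    spec:
      jobTargetRef:
        template:
          spec:
            restartPolicy: Never
            containers:
              - name: recon
                image: registry.example.com/recon:latest   # placeholder image
                resources:
                  limits:
                    nvidia.com/gpu: 1                       # runs on a GPU node
      pollingInterval: 30        # check the trigger every 30s
      maxReplicaCount: 10        # at most 10 concurrent Jobs
      triggers:
        - type: azure-queue      # e.g. one message per finished scan
          metadata:
            queueName: finished-scans                       # placeholder queue
            queueLength: "1"     # one Job per message
            connectionFromEnv: STORAGE_CONNECTION_STRING    # placeholder env var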

Also, when a user triggers new previews we scale up nodes to process that data. The problem there, though, is the scale-up time of the node pool, which is a few minutes for a GPU node on Azure.

We paid to have a GPU running all the time before, but that got too expensive.

As a side note, if I were to do it again I probably wouldn't build a data pipeline on top of KEDA ScaledJobs, and possibly wouldn't use Kubernetes at all.


What would you use if you were to start fresh?


> If you have the hardware resources, why not just scale up from the beginning on?

For most workloads it's wasteful to have max capacity provisioned at all times if you can instead provision on-demand.

This is true in general. For example, electricity supply is a mix of baseload power (cheap but only if left running constantly) and peaking (expensive but easy to turn on and off). It wouldn't be economical to have baseload capacity equal to maximum demand. Instead it is aimed at minimum demand and other sources make up the difference depending on demand.


I'm not sure if this answers your question, but the last 2 companies I worked at (~7 years) both had very clear traffic spikes 9am-5pm US East Coast hours on weekdays. My current place actually sees more than a 20-30% drop on Sunday nights compared to Monday mornings, and it's constantly going up because we have a lot of American enterprise customers.

Maybe I misunderstood your question but is there a case where you can keep your entire capacity running for free? I'd assume you pay AWS/other cloud or your electricity provider.


> is there a case where you can keep your entire capacity running for free? I'd assume you pay AWS/other cloud or your electricity provider.

Colo providers charge by the [rack with given network port size and power delivery], so unless you literally host on premises, which almost nobody does even when they talk about on prem, once you get outside of a cloud environment it is rare for it to pay to shut down servers unless they'll be down for a long time. Maybe there'd be a business there for colo providers to offer pricing that incentivises powering down machines (almost all modern servers have IPMI, and so as long as you provide the trickle - relatively speaking - of power for the IPMI board you can power the servers down/up over the network on demand), but it's not the norm.

The problem with these traffic spikes you mention is that "everybody" has them, and the overlap is significant, so they're priced in, because the cloud providers need capacity to handle the worst overlap in spikes, plus margin. A 20%-30% drop is way too low to cover the cost gap between even managed servers with a huge capacity margin and most cloud providers. I've worked for a lot of different companies where we've forecast our capacity requirements, and the graphs look almost identical. Sometimes shifted n hours to account for differences in timezone, but for a lot of companies the graph is near identical globally because of similar distributions of userbase.

(If you do think you can do scaling up/down for daily spikes cost effectively, you can typically do it even more cost effectively by putting your base load in a colo'ed environment and scaling into a cloud environment if you hit large enough spikes; the irony is that in environments where I've done that, we've ended up cutting the cost of the colo'ed environment by cutting closer to the margin and almost never end up scaling into the cloud, but it gives peace of mind, and so being prepared to use cloud has made actually using cloud services even less cost attractive).

In practice, most places - there are exceptions that make good use of it - just set up autoscaling so they don't need to pay attention to creeping resource use. Which is rarely a good use of it.

There are good uses for autoscaling, but it's very rare for day/night or weekend/weekday cycles to be significant enough that it isn't still cheaper to buy enough capacity to take all or most of the spikes (though having the ability to scale into a cloud service might mean you only buy just enough for the "usual" weekday cycles, or even shave a little bit off the top, instead of buying enough for unexpected surges on top).


I have the opposite question: who is on the cloud and has such consistent workloads they never need to scale up or down? I'm sure those users exist but they must be the minority, right?


Well … I don't have a consistent workload (our load is highly diurnal, enough so that you can spot lunch) but we just mostly don't deal with the complexity of it. For many things, we just alloc 3 VMs, across 3 zones — we really don't need more?¹ — and so the next scaling step is to 0.

¹I think the thing here is that for most of the jobs I've worked … we're really not doing "big" things. The complexity is all business logic or how the product does what it does, not scale.

(We do provision some CI compute on-demand, so that scales with load, so it's not all fixed.)

I think the last "ooh, fun compute!" thing I did was like almost 10 years ago now where we had a huge job that needed to run. But it was sort of the opposite of the stuff in this thread: since it wasn't on-demand, we could run it whenever. That ended up being at night on spot-priced VMs, when they were cheap.


Our customer workloads are bursty, so being able to scale down to 0 (or close to it) saves us a lot of CPU and memory that would otherwise do nothing for most of the day.


In the cloud you pay by the minute (or less) so scaling down saves money. Every single day my services scale up during peak times and down in the evenings.


This is the cloud so we are talking about paying for servers running/provisioned at any given time. If you’re cronjob heavy, or if you have consistent cycles of traffic patterns (say it follows the working day), you can save a LOT of money by not running idle servers all night or on the weekends or whatnot. This was like a 40k per month delta at my last gig.

We could also talk about optimizing the costs of development and staging and sales demo workloads that don’t need to run 24/7 or even 8/5 as well.


I think this is part of the problem. We're talking about the cloud because "everyone" a) thinks they need autoscaling (some do, many don't, and even fewer would need it if they weren't paying inflated cloud costs), and b) never considers that they can keep the base load on servers provisioned one way and auto-scale only the additional capacity; actually considering that tends to change the economics dramatically.

I've set up multiple hybrid setups over the years, and what I've consistently found is that we can provision 2x-3x (more if egress is high) the amount of server capacity for the same price with managed hosting providers or in colo'ed environments than with cloud providers. That's fully loaded cost, including rates for devops contracts etc.

Very few people need to auto-scale up more than that. But since most people still want orchestration, it tends to cost little to set up their system so that if they have a spike, they can scale up extra capacity in a cloud. And in doing so, they can cut the amount of hardware they provide for the base load to whatever is cheapest.

The first times I did this, I was fully convinced going in that this would mean we'd set the base load around the lowest utilization over a typical 24/7 cycle, and spin up some cloud instances during daily peaks etc.

In practice, after actually testing what pays for a given scenario, I've yet to see that (scaling up/down for a typical 24/7 cycle or 8/5) pay off, though I'm sure it can for some people.

Managed servers proved, in actual real-life usage scenarios, to be sufficiently cheaper that unless your spikes were very brief and sharp[1], it was cheaper to provision enough base capacity to handle most or all of the normal daily spikes, and what a hybrid setup bought us was the freedom not to overprovision for "what-if" scenarios.

That effect was significant enough that even e.g. SaaS services used almost entirely by office staff within a single time zone often do not save by auto-scaling vs. non-cloud servers scaled for peak use, because the 8-10 hour window of use that creates is far too long. It varies with your specific cloud cost and your cheapest alternative, but I've rarely found it pays to spin up cloud resources for spikes that last, on average, more than 4-6 hours in a day, and that tends to rule out most "normal" cyclical use, especially as you can often move the "cronjob heavy" parts of your workload outside the window to even out the load.

Auto-scaling absolutely pays at far smaller variations in load if your only option is to have all your load in a cloud environment, but even then I see a lot of people resort to auto-scaling before they've even thought of cutting the cost of their base load by e.g. ensuring they use reserved instances where it makes sense, or negotiating. Often ticking those boxes will have a much larger impact.

By all means ensure your system is built so that it can handle auto-scaling gracefully, though - it will benefit you whether or not you end up making much use of it.

[1] As an example where we got "close": one company I worked at had several clients that did large e-mail sends for restaurant chains that often included massive discounts. On the e-mails with the highest open rates, you'd then very predictably get traffic spikes from 8:00-8:15, 9:00-9:15, and 10:00-10:15 that were massive, as people checked their e-mail when they got into the office, with the 9:00-9:15 peak being several times higher than their normal daily use. If they had been hosting this themselves, it'd have paid to auto-scale into a cloud env. to handle those spikes, especially as they didn't send such campaigns every day. In our case, most of our other customers had reasonably quiet mornings, so we could overprovision VMs from our base capacity for them at no extra cost to us. But this was also a rare exception.


Linode has a series of practical articles on how to use autoscaling with KEDA in LKE [1]. The ability to scale up/down has obvious cost saving benefits, while retaining the ability to serve peak traffic or optimize background work.

--

1: https://www.linode.com/blog/?sq=KEDA


We work with customers with various workloads and some ways we help them scale:

1. Have a baseline amount of resources and use KEDA to scale up using spot instances.

2. For video call bots, scale up during the calls and then scale down once the call is completed. Some of our customers scale up to hundreds of machines during the day, then scale down to a couple at the end of the business day.

3. Scale up for cronjobs that require a lot of resources, but where the web traffic can use significantly fewer resources (see the sketch below).
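
As a rough illustration of the scheduled case in 3 (names, times and replica counts here are made up, not a specific customer's config), KEDA's cron trigger can pre-scale a workload for a known window and drop it back afterwards:

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: business-hours-worker
    spec:
      scaleTargetRef:
        name: worker              # placeholder Deployment
      minReplicaCount: 2          # baseline outside the window
      triggers:
        - type: cron
          metadata:
            timezone: America/New_York
            start: 0 8 * * 1-5    # scale up at 08:00 Mon-Fri
            end: 0 18 * * 1-5     # scale back down at 18:00
            desiredReplicas: "20"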


I worked on a project with an e-commerce company a few months back. They use autoscaling in their Kubernetes cluster to account for high load during peak hours (generally noon until 7PM or so). It would go to 10-ish instances of 2 apps during this peak, and then during non-peak times it'd drop back down to 2 instances.

This is pretty significant, since the 2 different apps are relatively large JVM apps, each requiring ~16 GiB of memory.


Anyone who doesn't have the same load 24 hours a day?

Ed-tech is a big one where you may have extremely low traffic on weekends/summer/holidays/breaks.


1. Scaling up beyond the level of resources you anticipated may help you maintain better uptime. This could be useful if uptime is very valuable for you.

2. Hopefully, if you scale up, you can also scale down, which will save you money when you don't need to rent the resources.


The New York Times crossword comes out every night at 10pm. There is a traffic spike. It’s huge.


I save a bunch of money every month by running my nightly tasks on ephemeral nodes.


Autoscaling could also mean scaling to 0. If you are running GPU workloads in k8s, you would typically set up a node pool with GPUs that can scale to 0 after the job runs.


We scale our CI/CD.

~08:00-~24:00 Mon-Fri, 10-350 VMs.


Kedify has recently launched a SaaS-based Kubernetes event-driven autoscaling service that is powered by the popular OSS project KEDA.


so basically this is a gui abstraction over "kubectl apply -f scaledObject.yaml" ...


It is a single place for installation, updates, security fixes and the whole administration of KEDA across a whole fleet of k8s clusters. Very soon there will be configuration optimization and a recommender, dynamic scaling policies and much more. And yeah, the gui abstraction over `kubectl...` is there too :)


Not really.


What part of the stack takes the majority of the time when spawning a new replica? Is it the time to boot a VM/environment, or is it the application doing a bunch of init work, setting up connections, etc.?


The march away from companies directly managing Kubernetes to Kubernetes being the layer that every future abstraction will be built on continues.


kubernetes is dying, isn't it?


Kubernetes is a framework; it'll take a long time to die, and it will likely contort itself into fitting whatever paradigm is needed.

However, I wonder what you mean? Kubernetes from where I sit has almost complete ubiquity across most companies. Even in places where it's a poor fit.


People starting to put abstractions over it means a) there's a chance people will start asking for a given abstraction rather than Kubernetes, and won't care if that abstraction eventually subsumes or replaces Kubernetes, and b) at least some people think Kubernetes is enough of a nuisance to deal with that they're looking for alternatives.

Whether that means Kubernetes is dying, I'm not so sure. But Kubernetes is extremely complex for a lot of workloads that it's total overkill for, so I'm not surprised people are looking for options.


Could you explain how you arrive at that conclusion from seeing some project offering an alternative autoscaling engine?

The only issue I have with the default HorizontalPodAutoscaler is that I cannot scale down to 0 when some processing queues are empty. Other than that, we have shrines erected to k8s.
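
To be fair, that gap is exactly what something like a KEDA ScaledObject covers, since minReplicaCount can be 0 and the scaler watches the queue directly. A minimal sketch, assuming queue depth is exposed as a Prometheus metric (the server address, query and deployment name are placeholders):

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: queue-consumer
    spec:
      scaleTargetRef:
        name: queue-consumer       # placeholder Deployment
      minReplicaCount: 0           # scale all the way down when the queue is empty
      maxReplicaCount: 20
      triggers:
        - type: prometheus
          metadata:
            serverAddress: http://prometheus.monitoring:9090    # placeholder
            query: sum(queue_depth{queue="jobs"})               # placeholder metric
            threshold: "100"       # target value per replica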


Yes, just like Linux.


Linux is dying? I thought 2024 was the year of Linux on the desktop!


I just realised I've used Linux on the desktop for 30 years this year.


No it’s 2025, but definitely going to happen.


Sadly yes. Netcraft confirms it.


I'm not sure what in the article makes you say that? Anyway, the answer is no.


Commenting so I can see the replies later...



