Saturday, March 27, 2021

Greetings from AWS

For the past few months I have been working inside AWS, using the CDK to deploy a Kubernetes application on EKS. I have learned a few things along the way. First, I have finally found a way to provision nodes and resources via code. Second, I have found ways to provision Kubernetes applications on these nodes with the same code. Third, creating infrastructure is a one-time deployment operation in the CDK.

Whereas Operators are about adjusting at runtime, the CDK is about provisioning and is mainly static in this sense. What is needed is the ability to change the fabric at runtime: to start with a base and then change it in response to the current compute needs. EKS supports auto-scaling groups, and it may be possible to update the number of requested nodes via Boto3 from outside of the cluster. This might make the system more dynamic. In fact, additional auto-scaling groups might be provisioned and removed according to an execution blueprint working from outside the cluster.
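As a rough sketch of what updating the node count from outside the cluster could look like, the function below asks an Auto Scaling group for a new desired capacity. It assumes the nodes sit in a plain EC2 Auto Scaling group and that the `autoscaling` argument behaves like `boto3.client("autoscaling")`. The function name and the group name in the comment are made up for illustration.

```python
def scale_node_group(autoscaling, group_name, desired_nodes):
    """Request desired_nodes for an Auto Scaling group, clamped to its limits.

    `autoscaling` is assumed to behave like boto3.client("autoscaling").
    Returns the capacity that was actually requested.
    """
    response = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )
    groups = response["AutoScalingGroups"]
    if not groups:
        raise ValueError(f"no Auto Scaling group named {group_name!r}")
    group = groups[0]

    # Requests outside the group's MinSize..MaxSize range are rejected,
    # so clamp the request to that range first.
    target = max(group["MinSize"], min(group["MaxSize"], desired_nodes))

    # Only call the API when the capacity would actually change.
    if target != group["DesiredCapacity"]:
        autoscaling.set_desired_capacity(
            AutoScalingGroupName=group_name,
            DesiredCapacity=target,
            HonorCooldown=False,
        )
    return target


# Possible usage against a real account (hypothetical group name):
#   import boto3
#   scale_node_group(boto3.client("autoscaling"), "eks-workers", 6)
```

Passing the client in, rather than creating it inside the function, keeps the AWS credentials and region under the caller's control and makes the function easy to exercise with a stand-in client.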

I am exploring creating a Django application that controls the deployment of EKS clusters to AWS via the CDK. It might be possible to expand this application to send the scaling requests through Boto3 itself, rather than directly from an Operator. This might be cleaner than previous designs in that Operators would not need to connect outside of K8s to the cloud provider. The management application would call down to the K8s layer and Operators would take over from there.
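One way a management application might drive the CDK is by shelling out to the `cdk` CLI. The sketch below is only an assumption about how that could be wired up: the helper names, the stack name, and the context key are hypothetical, and it relies on the `cdk deploy` options `--require-approval` and `--context`.

```python
import subprocess


def cdk_deploy_command(stack_name, context=None):
    """Build the argument list for a non-interactive `cdk deploy`."""
    cmd = ["cdk", "deploy", stack_name, "--require-approval", "never"]
    # Context values let the caller parameterize the stack, e.g. node counts.
    for key, value in sorted((context or {}).items()):
        cmd += ["--context", f"{key}={value}"]
    return cmd


def deploy_stack(stack_name, context=None, cdk_app_dir="."):
    """Run the deployment from the directory that holds the CDK app."""
    return subprocess.run(
        cdk_deploy_command(stack_name, context),
        cwd=cdk_app_dir,
        check=True,  # raise CalledProcessError if the deployment fails
    )


# Possible usage (hypothetical stack name and context key):
#   deploy_stack("EksClusterStack", {"node_count": 3}, cdk_app_dir="infra/")
```

A deployment can take many minutes, so inside a Django application a call like this would probably belong in a background task rather than in a request handler.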

This would give the system the ability to be more responsive than out-of-the-box auto-scaling could provide. As mentioned several times in previous posts, the key to scaling is providing the extra resources with minimal delay to the flow. As with runners transferring a baton in a relay race, the next runner starts before the hand-off, not at the hand-off. This separation of concerns would allow that. The Django CDK layer could create blueprints for execution and start up K8s nodes ahead of time, so they would be ready to accept input at the correct point in the chain, with no lag waiting for the system to scale up.
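The relay-race idea can be sketched as a small scheduling calculation: given when each stage of a blueprint is expected to start and how long nodes take to become ready, work out when each scale-up request has to go out. The `Stage` structure, the stage names, and the warm-up figure below are all illustrative assumptions, not part of any existing design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    start_s: float  # expected start, in seconds from the blueprint's start
    nodes: int      # nodes the stage needs to be ready when it begins


def prewarm_schedule(stages, warmup_s):
    """Return (request_time_s, stage_name, nodes) tuples, earliest first.

    Each request is issued warmup_s seconds before its stage starts, so the
    nodes are up at the hand-off. Requests never go out before time zero.
    """
    schedule = []
    for stage in stages:
        request_at = max(0.0, stage.start_s - warmup_s)
        schedule.append((request_at, stage.name, stage.nodes))
    return sorted(schedule)


# Hypothetical blueprint: nodes assumed to take about 3 minutes to join.
blueprint = [
    Stage("transform", start_s=600.0, nodes=6),
    Stage("ingest", start_s=0.0, nodes=2),
]
for request_at, name, nodes in prewarm_schedule(blueprint, warmup_s=180.0):
    print(f"t={request_at:>5.0f}s  request {nodes} nodes for {name}")
```

In this example the request for the transform stage goes out at 420 seconds, three minutes before that stage is due to begin at 600 seconds.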

