Thursday, July 16, 2020

The divide and conquer approach and the effect on resource consumption

If you take divide and conquer to the point where a single container solves one specific business use case, you could end up with a container that goes out of scope while it is still deployed. This could lead to wasted resources.

There is one solution to this problem that I previously blogged about: the research done by Alibaba on switching containers without reloading the pod.

 http://jmenke.blogspot.com/2019/12/revisit-alibaba-container-framework.html

Reading about the Vertical Pod Autoscaler recently, I wondered whether there is another path to accomplish the same thing. If so, it might be possible to pre-deploy containers in a pod and "activate" them during a phase change. This could lessen the "churn" required in deploying a new pod.

Update:  7/17/2020:

It appears the VerticalPodAutoscaler will replace the entire pod:

"That is, the VerticalPodAutoscaler can delete a Pod, adjust the CPU and memory requests, and then start a new Pod."

https://cloud.google.com/kubernetes-engine/docs/how-to/vertical-pod-autoscaling
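
For reference, whether the autoscaler is allowed to evict and recreate pods is set by the update mode on the VerticalPodAutoscaler object. Below is a minimal sketch that builds such an object as a plain Python dict. The helper name make_vpa and the names "phase-vpa" and "phase-worker" are made up for illustration; the field names and the four update modes come from the autoscaling.k8s.io/v1 API.

```python
import json

# Update modes defined by the VerticalPodAutoscaler API.
# "Off" only produces recommendations; "Recreate" and "Auto" may evict pods.
VALID_UPDATE_MODES = {"Off", "Initial", "Recreate", "Auto"}


def make_vpa(name, deployment, update_mode="Off"):
    """Build a VerticalPodAutoscaler manifest targeting a Deployment.

    `name` and `deployment` are hypothetical values supplied by the caller.
    """
    if update_mode not in VALID_UPDATE_MODES:
        raise ValueError(f"unknown update mode: {update_mode!r}")
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "updatePolicy": {"updateMode": update_mode},
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON, which kubectl also accepts.
    print(json.dumps(make_vpa("phase-vpa", "phase-worker"), indent=2))
```

With the default "Off" mode the autoscaler only publishes recommendations, so nothing is recreated; the pod replacement described in the quote applies to the modes that are allowed to evict.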

So the VerticalPodAutoscaler will actually recreate the pod. The Alibaba solution did not do this:

"We enhanced the upstream services from two aspects; we call one the new controller, we call the other one statefulsets. The first thing we do is we introduce a so-called "in-place upgrade," which means if the container images’ only component is getting upgraded in a spec, the controller will not recreate a new pod. Instead, the Kubelet will just restart the pod with the new image, and the pod is kept the same."

The takeaway here is that the mesh appears able to change during the course of processing, which can mitigate potential resource waste. Another option would be to code this logic into an Operator that dynamically adjusts the components in the grid.
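
To make the in-place upgrade rule from the Alibaba quote concrete: a controller or Operator applying it would first have to decide whether a change touches only container images. The sketch below is my own illustration of that check, not Alibaba's code. It assumes pod specs are plain dicts with a "containers" list, and the function name is_image_only_change is hypothetical.

```python
import copy


def is_image_only_change(old_spec, new_spec):
    """Return True if the two specs differ only in container images.

    Assumes each spec is a dict with a "containers" list whose entries
    have at least "name" and "image" keys (a simplified pod spec).
    """
    old_containers = old_spec.get("containers", [])
    new_containers = new_spec.get("containers", [])

    # Adding, removing, or reordering containers is not an image-only change.
    if [c["name"] for c in old_containers] != [c["name"] for c in new_containers]:
        return False

    def without_images(spec):
        # Copy the spec and drop every image field so the rest can be compared.
        stripped = copy.deepcopy(spec)
        for container in stripped.get("containers", []):
            container.pop("image", None)
        return stripped

    # Anything other than an image differing means the pod must be recreated.
    if without_images(old_spec) != without_images(new_spec):
        return False

    # At least one image must actually differ, otherwise nothing changed.
    return any(
        old.get("image") != new.get("image")
        for old, new in zip(old_containers, new_containers)
    )
```

When the check returns True, the pod could be kept and only the container restarted with the new image; any other change would fall back to recreating the pod, as the VerticalPodAutoscaler does.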

In summary: 3 options to control resource usage during a compute sequence.

1. Alibaba approach: would be the fastest way to change resources
2. VerticalPodAutoscaler: might be too slow but still has some capabilities
3. Control via Operator: again might be slow - I had blogged about runners starting early during a baton transfer - something like this could be used to avoid waiting for resources to appear.

Note that in options 1 and 2, the Execution DSL would control this.

