Friday, May 08, 2020

The concept of phase and positioning of containers in Pods

Co-location of containers can optimize resource usage


In the past few blog posts I have spoken about how divide and conquer, shared nothing, partitioning, and parallel execution can increase throughput. The open question is how to ensure that compute resources are utilized efficiently throughout the entire flow of operations in a less than trivial compute fabric.

In order to analyze this, we think about breaking the flow of computation into phases. Different phases may require different containers to be active. Containers in a pod can share data via the local file system, and not all of the containers involved in a calculation need to be active at the same time. This is where phase comes in: as the calculation moves from phase to phase, different containers in the pod become active. Because the containers in a pod share the resources granted to it, it should be possible to keep those resources well utilized throughout the flow of operations.
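As a rough sketch of what such a pod could look like, here is a hypothetical definition written against the Kubernetes Go API types, in which a container for an early phase and a container for a later phase hand data to each other through a shared emptyDir volume. The container names, images, and mount path are illustrative only, not part of any real workload.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// phasedPod sketches a pod whose containers correspond to phases of a
// calculation and hand data to each other through a shared emptyDir volume.
// Names, images, and the mount path are placeholders for illustration.
func phasedPod() *corev1.Pod {
	shared := corev1.Volume{
		Name: "phase-data",
		VolumeSource: corev1.VolumeSource{
			EmptyDir: &corev1.EmptyDirVolumeSource{},
		},
	}
	mount := corev1.VolumeMount{Name: "phase-data", MountPath: "/data"}

	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "phased-calculation"},
		Spec: corev1.PodSpec{
			Volumes: []corev1.Volume{shared},
			Containers: []corev1.Container{
				// Active in the first phase: writes intermediate results to /data.
				{Name: "prepare", Image: "example/prepare:latest", VolumeMounts: []corev1.VolumeMount{mount}},
				// Active in a later phase: reads /data and produces the final output.
				{Name: "compute", Image: "example/compute:latest", VolumeMounts: []corev1.VolumeMount{mount}},
			},
		},
	}
}

func main() {
	fmt.Println(phasedPod().Name)
}
```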

This may be where the Execution DSL needs to relay information back to the Resource DSL. The correct positioning of containers may be determined from metrics gathered during execution, and resource limits may be adjusted in response to that information. As more data is run through the system, it may become possible to fine tune the resource configuration.
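A minimal sketch of what that feedback might look like: a function that takes usage observations collected during a run and proposes tightened memory limits with some headroom. The observation type and the 25% headroom factor are assumptions for illustration, not part of either DSL.

```go
package main

import "fmt"

// Observation is a hypothetical sample of a container's resource usage
// gathered during one phase of execution.
type Observation struct {
	Container  string
	PeakMemMiB int64
}

// suggestMemoryLimit proposes a memory limit per container: the observed
// peak plus a fixed headroom. The 25% headroom is an arbitrary assumption.
func suggestMemoryLimit(obs []Observation) map[string]int64 {
	limits := make(map[string]int64)
	for _, o := range obs {
		proposed := o.PeakMemMiB + o.PeakMemMiB/4
		if proposed > limits[o.Container] {
			limits[o.Container] = proposed
		}
	}
	return limits
}

func main() {
	obs := []Observation{
		{Container: "prepare", PeakMemMiB: 300},
		{Container: "compute", PeakMemMiB: 1200},
	}
	fmt.Println(suggestMemoryLimit(obs)) // map[compute:1500 prepare:375]
}
```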

The ability to share data between containers should be considered an advantage of co-locating containers in a single pod. Pods on the same node can probably share data quickly as well, but that requires more careful attention to scheduling. Containers in the same pod are guaranteed to be able to transfer data quickly.
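For the same-node case, that extra scheduling attention would typically take the form of a pod affinity rule. The snippet below is only an illustration using the Kubernetes Go API types; the label selector and topology key are placeholders.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// sameNodeAffinity sketches the scheduling constraint a pod would need in
// order to land on the same node as its peers; the label and topology key
// are illustrative placeholders.
func sameNodeAffinity() *corev1.Affinity {
	return &corev1.Affinity{
		PodAffinity: &corev1.PodAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
				LabelSelector: &metav1.LabelSelector{
					MatchLabels: map[string]string{"calculation": "example"},
				},
				TopologyKey: "kubernetes.io/hostname",
			}},
		},
	}
}

func main() {
	fmt.Printf("%+v\n", sameNodeAffinity())
}
```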

Although textual DSLs are easier to edit, this may be where graphical DSLs could prove valuable. Visualizing allocation might be easier than reading it in text form, and optimization opportunities might be easier to spot.

In previous posts, I identified two types of data flow: one between pods in the system and one inside the pods themselves. The Execution DSL should be able to orchestrate both types of flow: intra-pod and inter-pod. The choice of one, the other, or both is expected to depend on the needs of the particular calculation. The main point is that the Execution DSL should be able to handle both types of orchestration in the same system.
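As a rough illustration of the distinction the Execution DSL would have to capture, here is a hypothetical set of Go types that tag a data flow as intra-pod or inter-pod. This is not the DSL itself, only a sketch of the information it would need to carry.

```go
package main

import "fmt"

// FlowKind distinguishes the two kinds of data movement the Execution DSL
// would need to orchestrate. These types are a hypothetical illustration.
type FlowKind int

const (
	IntraPod FlowKind = iota // between containers in one pod, via a shared volume
	InterPod                 // between pods, via the network or shared storage
)

// Flow describes one edge in the flow of a calculation.
type Flow struct {
	Kind FlowKind
	From string // producing container or pod
	To   string // consuming container or pod
}

func main() {
	flows := []Flow{
		{Kind: IntraPod, From: "prepare", To: "compute"},
		{Kind: InterPod, From: "partition-a", To: "aggregator"},
	}
	for _, f := range flows {
		fmt.Printf("%s -> %s (kind %d)\n", f.From, f.To, f.Kind)
	}
}
```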

