Saturday, November 23, 2019

Where's the batch?





So it just dawned on me that I have been talking about a "batch" system, but I have really been describing the conditions under which calculations should be executed.

The three DSLs (Resource, Execution, and Data) describe how Kubernetes resources should be provisioned, how the flow of execution should be split up across containers, and how data should be passed to the calculations.  However, this does not a batch framework make.
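
To make that split concrete, here is a purely illustrative sketch in Python of the kind of thing each DSL is responsible for. None of the DSL syntax is nailed down yet, so every field name below is hypothetical:

    # Hypothetical shapes for the three DSLs; all field names are invented for illustration.

    resource_dsl = {
        # Resource DSL: how the Kubernetes resources should be provisioned.
        "workers": {"replicas": 4, "cpu": "2", "memory": "4Gi", "nodeSelector": {"pool": "compute"}},
        "controller": {"replicas": 1, "cpu": "500m", "memory": "1Gi"},
    }

    flow_dsl = {
        # Execution (Flow) DSL: how the calculation is split up across containers.
        "steps": [
            {"name": "prepare", "runsOn": "controller"},
            {"name": "calculate", "runsOn": "workers", "parallel": True},
            {"name": "aggregate", "runsOn": "controller"},
        ],
    }

    data_dsl = {
        # Data DSL: how data should be passed to the calculations.
        "inputs": [{"name": "positions", "source": "data-pond://positions/eod"}],
        "outputs": [{"name": "results", "sink": "data-pond://results/"}],
    }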

Enter kube-batch and Volcano, developed under the Kubernetes SIGs and already used by many other projects.  Could the ideas of data locality, isolation of the execution environment with snapshots, and execution flow between containers be combined with kube-batch?


Rather than re-invent the wheel to implement the ideas we have been discussing, I think I should be looking into how much can be re-used from kube-batch and Volcano.  Could kube-batch/Volcano be used to execute the Flow DSL in resources that we provision with our Resource DSL, using data provided by the Data DSL?

As is often the case, when you climb one mountain, you find another lies behind it.  This may actually end up being a simplification: it reduces the overall coding effort and the amount of code that needs to be maintained internally.

It may be possible to create a new type of task for our execution requirements (executing the Flow DSL against resources defined by the Resource DSL and data defined by the Data DSL).

If we substituted our CRDs into the pod template, our operators would handle the creation of all the related structures (Data DSL).

It appears Volcano is set up for this.  Could our method of execution be just another type of task?  It seems like we can map our DSLs on top of the Volcano architecture.  
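
To see why that might work, here is a minimal sketch of a Volcano Job built as a plain Python dict.  The Volcano fields (apiVersion, schedulerName, minAvailable, tasks) are the stock ones from the Volcano Job CRD; the job name, image, and the annotations pointing at our CRDs are hypothetical:

    # Minimal sketch: a Volcano Job whose single task is "our" kind of execution.
    # The Volcano fields are the stock CRD fields; the crd-ref annotations are invented.
    calc_job = {
        "apiVersion": "batch.volcano.sh/v1alpha1",
        "kind": "Job",
        "metadata": {"name": "risk-calc"},
        "spec": {
            "schedulerName": "volcano",
            "minAvailable": 4,
            "tasks": [
                {
                    "name": "calc",            # our execution as just another type of task
                    "replicas": 4,
                    "template": {              # an ordinary pod template
                        "metadata": {
                            # Hypothetical: point the pod at our CRDs so our operators
                            # create the related structures (Data DSL) for it.
                            "annotations": {
                                "example.com/flow-dsl": "flows/daily-risk",
                                "example.com/data-dsl": "datasets/positions-eod",
                            },
                        },
                        "spec": {
                            "containers": [{"name": "worker", "image": "calc-worker:latest"}],
                            "restartPolicy": "Never",
                        },
                    },
                },
            ],
        },
    }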



So could it be possible to have our Controller pod and a set of worker pods as the different pods in a multi-pod Volcano Job spec?  Our SyncSet definition would become a template for generating Volcano JobSpecs.  Would that handle mapping the Resource DSL into Volcano?
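
Sketching that SyncSet-to-JobSpec idea a little further: a generator might look something like the following.  The SyncSet fields are assumptions (the SyncSet schema isn't spelled out here); the Volcano fields are the standard ones.

    # Sketch: turn a (hypothetical) SyncSet definition plus its Resource DSL entries
    # into a Volcano JobSpec with a controller task and a worker task.
    def syncset_to_volcano_job(syncset):
        workers = syncset["resources"]["workers"]        # hypothetical SyncSet fields
        controller = syncset["resources"]["controller"]
        return {
            "apiVersion": "batch.volcano.sh/v1alpha1",
            "kind": "Job",
            "metadata": {"name": syncset["name"]},
            "spec": {
                "schedulerName": "volcano",
                # Don't start until the controller and every worker can be placed.
                "minAvailable": 1 + workers["replicas"],
                "tasks": [
                    {"name": "controller", "replicas": 1,
                     "template": pod_template(controller, role="controller")},
                    {"name": "workers", "replicas": workers["replicas"],
                     "template": pod_template(workers, role="worker")},
                ],
            },
        }

    def pod_template(res, role):
        # Maps one Resource DSL entry onto an ordinary pod template.
        return {
            "metadata": {"labels": {"role": role}},
            "spec": {
                "nodeSelector": res.get("nodeSelector", {}),
                "containers": [{
                    "name": role,
                    "image": res.get("image", "calc-worker:latest"),
                    "resources": {"requests": {"cpu": res["cpu"], "memory": res["memory"]}},
                }],
                "restartPolicy": "Never",
            },
        }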

Note: the original design for SyncSet theorized that it could represent a complex graph of related Kubernetes services.  It might be possible to adjust the JobSpec to take a higher-level abstraction that would allow the creation of more complex structures for a job (not only pods).

We had some tight restrictions on the placement of the Data Pond in our design: it was going to be driven by node selectors on the worker pods.  This could probably still work.  The Data Pond would have to watch for pod creation via Volcano and bootstrap itself on the same node as the resources.  A watch on the Data Pond could then signal that the set of pods was ready for execution.  The Flow DSL could still be executed, since we have defined the "Controller" pod and the worker pods in the multi-pod template.
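
A rough sketch of the Data Pond side of that, using the standard Kubernetes Python client.  The label selector (Volcano labelling its pods with the owning job's name) and the readiness signalling are assumptions; the watch calls themselves are the stock client API:

    # Sketch only: the Data Pond watches for the worker pods Volcano creates so it can
    # bootstrap on the same node(s), then signals that the set of pods is ready.
    from kubernetes import client, config, watch

    def wait_for_workers(namespace, job_name, expected_replicas):
        config.load_incluster_config()     # assumes the Data Pond runs inside the cluster
        v1 = client.CoreV1Api()
        # Assumption: Volcano labels the pods it creates with the owning job's name.
        selector = f"volcano.sh/job-name={job_name}"

        nodes = {}
        w = watch.Watch()
        for event in w.stream(v1.list_namespaced_pod, namespace=namespace,
                              label_selector=selector):
            pod = event["object"]
            if pod.status.phase == "Running" and pod.spec.node_name:
                nodes[pod.metadata.name] = pod.spec.node_name
            if len(nodes) >= expected_replicas:
                w.stop()

        # The Data Pond would bootstrap itself on these nodes, then signal (e.g. via a
        # status field on its own CR) that the pods are ready and the Flow DSL can run.
        return nodes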

On a related note: the people at solo.io have also been working on workflows.  From their description of Autopilot:


"At Solo.io, we believe the true power in service mesh comes from their respective programmable interfaces. Autopilot allows you to easily drive the service mesh interface to do interesting things that in the past would have been hand-crafted and bespoke."





