jmenke blog: A DSL for execution of batch operations

Snapshotting and CPU Manager optimizations within the context of Channel-based graph processing.

In previous posts, I described a system that processes time-sliced data sets. In this article, I explore how the Kubernetes CPU Manager and Go channels could give a distributed batch processing system a compute fabric for each problem set, one that spans machine boundaries.

Data would come into a controller written in Go, which would use a network of channels to execute an orchestration DSL describing the problem's data flow. Hardware-backed compute structures would be created in Kubernetes, and the channel-based orchestration language would handle delegating work between the stages of a problem definition.
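To make the idea of a channel network concrete, here is a minimal sketch of stages chained together with Go channels. The names `Stage`, `runStage` and `RunPipeline` are my own placeholders for illustration, and the stages here are plain in-process functions rather than work delegated to Kubernetes.

```go
package main

import "fmt"

// Stage is a hypothetical unit of work in a problem definition.
type Stage func(int) int

// runStage starts one goroutine that applies fn to every value read
// from in and returns the channel that carries the results.
func runStage(in <-chan int, fn Stage) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			out <- fn(v)
		}
	}()
	return out
}

// RunPipeline wires the stages into a chain of channels, feeds the
// input through the chain, and collects the output in arrival order.
func RunPipeline(input []int, stages ...Stage) []int {
	src := make(chan int)
	go func() {
		defer close(src)
		for _, v := range input {
			src <- v
		}
	}()

	var ch <-chan int = src
	for _, s := range stages {
		ch = runStage(ch, s)
	}

	var results []int
	for v := range ch {
		results = append(results, v)
	}
	return results
}

func main() {
	double := func(v int) int { return v * 2 }
	addOne := func(v int) int { return v + 1 }
	fmt.Println(RunPipeline([]int{1, 2, 3}, double, addOne)) // [3 5 7]
}
```

Because each stage runs in a single goroutine, values leave the chain in the order they entered it. A real controller would replace the stage functions with calls out to the compute structures it created.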

What is the consequence of time-slicing data in such a system? When there is only one immutable set of data to process, any data structure derived from it by deterministic code will have a deterministic value. This holds even across machine boundaries, so partial results from code executed on multiple Pods can be combined. Data could be kept on the stack, and results could be assembled via a network of channels constructed specifically to represent an execution flow.
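Combining partial results can be sketched as a fan-out followed by a fan-in. In this sketch goroutines stand in for Pods, and `partialSum` and `CombineShards` are hypothetical names; the point is only that, with an immutable snapshot and an order-independent merge such as a sum, the combined result is the same no matter which worker finishes first.

```go
package main

import (
	"fmt"
	"sync"
)

// partialSum stands in for the work one Pod does on its shard of the
// immutable snapshot.
func partialSum(shard []int) int {
	total := 0
	for _, v := range shard {
		total += v
	}
	return total
}

// CombineShards fans the shards out to goroutines and merges their
// partial results through a single channel.
func CombineShards(shards [][]int) int {
	// Buffered so every worker can send without waiting for the reader.
	partials := make(chan int, len(shards))

	var wg sync.WaitGroup
	for _, shard := range shards {
		wg.Add(1)
		go func(s []int) {
			defer wg.Done()
			partials <- partialSum(s)
		}(shard)
	}
	wg.Wait()
	close(partials)

	// Addition is commutative, so arrival order does not affect the total.
	total := 0
	for p := range partials {
		total += p
	}
	return total
}

func main() {
	snapshot := [][]int{{1, 2, 3}, {4, 5}, {6}}
	fmt.Println(CombineShards(snapshot)) // 21
}
```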

A DSL would provide the language for generating the channel networks in Go code. In this way, orchestration blueprints could be saved and reused across different problem sets. A blueprint written in this DSL could be stored in Kubernetes as a custom resource, defined by a CRD, whose status would track the state of the execution flow. Piggybacking on the Kubernetes API for storage and status, and on Go channels for orchestration, reduces the amount of code we need to write.
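As a rough sketch of what such a blueprint might look like, the types below mirror the spec/status split of a Kubernetes custom resource without importing any Kubernetes packages. Everything here is an assumption for illustration: the `Blueprint` type, the phase strings, and the `registry` of stage names are all hypothetical, and a real implementation would register a CRD and update status through the Kubernetes API.

```go
package main

import (
	"errors"
	"fmt"
)

// Blueprint mirrors the spec/status shape of a custom resource.
type Blueprint struct {
	Name   string
	Spec   BlueprintSpec
	Status BlueprintStatus
}

// BlueprintSpec lists stage names in execution order.
type BlueprintSpec struct {
	Stages []string
}

// BlueprintStatus records the state of the execution flow.
type BlueprintStatus struct {
	Phase       string // "Pending", "Succeeded" or "Failed"
	StagesWired int
}

// registry maps DSL stage names to implementations (hypothetical operations).
var registry = map[string]func(int) int{
	"double": func(v int) int { return v * 2 },
	"inc":    func(v int) int { return v + 1 },
}

// Execute resolves the blueprint's stage names, builds one channel per
// stage, runs the input through the network, and records the outcome.
func Execute(bp *Blueprint, input []int) ([]int, error) {
	bp.Status = BlueprintStatus{Phase: "Pending"}

	src := make(chan int, len(input))
	for _, v := range input {
		src <- v
	}
	close(src)

	var ch <-chan int = src
	for _, name := range bp.Spec.Stages {
		fn, ok := registry[name]
		if !ok {
			bp.Status.Phase = "Failed"
			return nil, errors.New("unknown stage: " + name)
		}
		in, out := ch, make(chan int)
		go func() {
			defer close(out)
			for v := range in {
				out <- fn(v)
			}
		}()
		ch = out
		bp.Status.StagesWired++
	}

	var results []int
	for v := range ch {
		results = append(results, v)
	}
	bp.Status.Phase = "Succeeded"
	return results, nil
}

func main() {
	bp := &Blueprint{Name: "demo", Spec: BlueprintSpec{Stages: []string{"double", "inc"}}}
	out, err := Execute(bp, []int{1, 2, 3})
	fmt.Println(out, err, bp.Status.Phase)
}
```

The same blueprint value can be executed against a different input slice, which is the reuse across problem sets described above.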

Continuing this train of thought: if the Kubernetes CPU Manager is used to give Pods exclusive use of a CPU, performance gains might be achievable because each problem data set is bound to a single immutable set of values.
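For context, the CPU Manager only pins CPUs when the kubelet runs with the static policy, and only for containers in a Pod of the Guaranteed QoS class that request a whole number of CPUs. The sketch below encodes that rule in a simplified form. The `Resources` type and both functions are hypothetical; real Pods use resource quantities from the Kubernetes API, and details such as requests defaulting to limits are ignored here.

```go
package main

import "fmt"

// Resources is a simplified stand-in for a container's requests and
// limits, with CPU in millicores and memory in bytes.
type Resources struct {
	CPURequestMilli, CPULimitMilli int64
	MemRequestBytes, MemLimitBytes int64
}

// IsGuaranteed reports whether every container sets CPU and memory
// requests equal to their limits, the condition for Guaranteed QoS.
func IsGuaranteed(containers []Resources) bool {
	if len(containers) == 0 {
		return false
	}
	for _, c := range containers {
		if c.CPURequestMilli <= 0 || c.MemRequestBytes <= 0 {
			return false
		}
		if c.CPURequestMilli != c.CPULimitMilli || c.MemRequestBytes != c.MemLimitBytes {
			return false
		}
	}
	return true
}

// ExclusiveCPUs returns how many whole CPUs the static policy would
// dedicate to container i, or 0 if it would stay in the shared pool.
func ExclusiveCPUs(pod []Resources, i int) int64 {
	if !IsGuaranteed(pod) {
		return 0
	}
	c := pod[i]
	if c.CPURequestMilli%1000 != 0 {
		return 0 // fractional requests are not pinned
	}
	return c.CPURequestMilli / 1000
}

func main() {
	const gib = int64(1) << 30
	pinned := []Resources{{2000, 2000, gib, gib}}
	fractional := []Resources{{1500, 1500, gib, gib}}
	burstable := []Resources{{1000, 2000, gib, gib}}

	fmt.Println(ExclusiveCPUs(pinned, 0))     // 2
	fmt.Println(ExclusiveCPUs(fractional, 0)) // 0
	fmt.Println(ExclusiveCPUs(burstable, 0))  // 0
}
```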

If a Pod is only ever given one set of data, the cache on its CPU will only contain values for that exact problem set; the cache could not hold any value that was inconsistent with the rest of the machines processing the graph.


Saturday, September 14, 2019
