Sunday, October 06, 2019

DSL for execution of batch operations (Part 3)

The DSL should be a Kubernetes CRD (CustomResourceDefinition): a data flow CRD.

Advantages of this approach:
  1. built-in parsing of YAML, instead of using a library such as https://github.com/go-yaml/yaml
  2. the ability to see changes to the objects and react to them inside the cluster (a controller manages the state of the data flow)



So the CRD would need to express a data flow.

The components of this CRD (or set of CRDs) would be:

  1. a node in the graph
  2. the children of the node
  3. links to other nodes
  4. the container to be used for the execution of the node

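It's basically the Composite pattern: each node is either a leaf or a composite that contains child nodes, and both are handled through the same interface.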
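A minimal Go sketch of the Composite pattern applied to the node components listed above. The type names Node, Step and Group, the image names and the Describe method are hypothetical, chosen only for illustration; they are not a CRD schema.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is the component interface: leaves and composites are both Nodes.
// (Hypothetical names, for illustration only.)
type Node interface {
	Name() string
	// Describe renders the node and its subtree, indented by depth.
	Describe(depth int) string
}

// Step is a leaf: a single unit of work executed in a container.
type Step struct {
	StepName string
	Image    string   // container image that would execute this step (hypothetical)
	Links    []string // names of other nodes this step sends output to
}

func (s *Step) Name() string { return s.StepName }

func (s *Step) Describe(depth int) string {
	return fmt.Sprintf("%s%s (image=%s, links=%v)\n",
		strings.Repeat("  ", depth), s.StepName, s.Image, s.Links)
}

// Group is a composite: it holds child Nodes and treats them uniformly.
type Group struct {
	GroupName string
	Children  []Node
}

func (g *Group) Name() string { return g.GroupName }

func (g *Group) Describe(depth int) string {
	var b strings.Builder
	b.WriteString(strings.Repeat("  ", depth) + g.GroupName + "\n")
	for _, c := range g.Children {
		b.WriteString(c.Describe(depth + 1)) // recurse without caring about leaf vs composite
	}
	return b.String()
}

func main() {
	flow := &Group{GroupName: "flow", Children: []Node{
		&Step{StepName: "extract", Image: "example/extract:latest", Links: []string{"transform"}},
		&Group{GroupName: "transform", Children: []Node{
			&Step{StepName: "clean", Image: "example/clean:latest"},
		}},
	}}
	fmt.Print(flow.Describe(0))
}
```

The caller only sees Node, so a generator or controller can walk the whole flow without knowing which nodes are leaves.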







Based on this tree structure, a generator would take the information from the CRD and:

  1. generate static code that implements the flow using Go channels.

How could this be made dynamic, so that code generation is not an offline step but part of the process of spinning up containers?

A Quarterback (QB) container could be given rights to query the Kubernetes API for the bootstrap information of this flow DSL.

It could:

  1. get the info
  2. generate and compile the code
  3. run the compiled code

Maybe an init container could be used for this, to decouple this code from each QB controller.

According to the CRD documentation, a custom object can contain fields holding any valid JSON:

After the CustomResourceDefinition object has been created, you can create custom objects. Custom objects can contain custom fields. These fields can contain arbitrary JSON.

This arbitrary JSON could describe the flow of data.
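A sketch of how a controller might read such a free-form field, assuming a hypothetical DataFlow object whose spec.flow holds the tree. The group example.com/v1alpha1 and the field names are made up for illustration. The outer fields are decoded normally, while spec.flow is kept as json.RawMessage until the controller decodes it into a recursive type. (With apiextensions.k8s.io/v1 CRDs, a free-form field like this needs x-kubernetes-preserve-unknown-fields: true in the schema so the API server does not prune it.)

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FlowNode is a hypothetical recursive node type for the flow JSON.
type FlowNode struct {
	Name      string     `json:"name"`
	Container string     `json:"container,omitempty"`
	Children  []FlowNode `json:"children,omitempty"`
}

// DataFlow is the "shell" of the custom object; spec.flow stays raw JSON.
type DataFlow struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Spec       struct {
		Flow json.RawMessage `json:"flow"`
	} `json:"spec"`
}

// decodeFlow decodes the envelope first, then the arbitrary JSON inside it.
func decodeFlow(obj []byte) (FlowNode, error) {
	var df DataFlow
	if err := json.Unmarshal(obj, &df); err != nil {
		return FlowNode{}, err
	}
	var root FlowNode
	err := json.Unmarshal(df.Spec.Flow, &root)
	return root, err
}

// countNodes walks the decoded tree recursively.
func countNodes(n FlowNode) int {
	total := 1
	for _, c := range n.Children {
		total += countNodes(c)
	}
	return total
}

func main() {
	obj := []byte(`{
	  "apiVersion": "example.com/v1alpha1",
	  "kind": "DataFlow",
	  "spec": {"flow": {"name": "root", "children": [
	    {"name": "extract", "container": "example/extract"},
	    {"name": "load", "container": "example/load"}]}}}`)
	root, err := decodeFlow(obj)
	if err != nil {
		panic(err)
	}
	fmt.Println(root.Name, countNodes(root)) // root 3
}
```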



Note: I have previously blogged on using JSON for expressing the structure of Generations and Synchronization sets.  

This looks like a common theme when using the Kubernetes API as a development platform:

1. A CRD is created that defines an object to be managed by the Kubernetes API
2. Complex data represented in the CRD could be contained in arbitrary JSON

So it's not the whole object structure that is defined in the CRD specification, but the "shell" for storing such data. A CRD does not have to embed complex JSON, but the route to expressing complex information seems to be via arbitrary JSON.

Note: this arbitrary JSON, rather than YAML, could be the language of the DSL. It would be nice if there were a similar structure for parsing YAML, but currently CRDs seem to be built around JSON.

Possibly something like this could be used:

https://stackoverflow.com/questions/37150668/json-schema-for-tree-structure


{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "$ref": "#/definitions/node",
  "definitions": {
    "node": {
      "properties": {
        "Id": {
          "type": "integer"
        },
        "Label": {
          "type": "string"
        },
        "Children": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/node"
          }
        }
      },
      "required": [
        "Id"
      ]
    }
  }
}
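A sketch of a Go type matching the node schema above, assuming the capitalized property names Id, Label and Children are used as JSON keys. encoding/json does not validate against a JSON Schema, so the required Id is checked by hand; making Id a pointer lets a missing Id be told apart from an Id of 0. The helper names parseTree and checkRequired are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TreeNode mirrors the "node" definition in the schema.
type TreeNode struct {
	Id       *int       `json:"Id"` // pointer: nil means the key was absent
	Label    string     `json:"Label,omitempty"`
	Children []TreeNode `json:"Children,omitempty"`
}

// checkRequired enforces "required": ["Id"] on every node in the tree.
func checkRequired(n TreeNode) error {
	if n.Id == nil {
		return fmt.Errorf("node missing required Id (label %q)", n.Label)
	}
	for _, c := range n.Children {
		if err := checkRequired(c); err != nil {
			return err
		}
	}
	return nil
}

// parseTree decodes the JSON and then applies the schema's required rule.
func parseTree(data []byte) (TreeNode, error) {
	var root TreeNode
	if err := json.Unmarshal(data, &root); err != nil {
		return root, err
	}
	return root, checkRequired(root)
}

func main() {
	good := []byte(`{"Id": 1, "Label": "flow", "Children": [{"Id": 2, "Label": "extract"}]}`)
	bad := []byte(`{"Id": 1, "Children": [{"Label": "orphan"}]}`)

	if root, err := parseTree(good); err == nil {
		fmt.Println(*root.Id, len(root.Children)) // 1 1
	}
	_, err := parseTree(bad)
	fmt.Println(err) // node missing required Id (label "orphan")
}
```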

These data structures could be imported into Go with a JSON Schema to Go code generator. For example, this schema
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "Example",
  "id": "http://example.com/exampleschema.json",
  "type": "object",
  "description": "An example JSON Schema",
  "properties": {
    "name": {
      "type": "string"
    },
    "address": {
      "$ref": "#/definitions/address"
    },
    "status": {
      "$ref": "#/definitions/status"
    }
  },
  "definitions": {
    "address": {
      "id": "address",
      "type": "object",
      "description": "Address",
      "properties": {
        "street": {
          "type": "string",
          "description": "Address 1",
          "maxLength": 40
        },
        "houseNumber": {
          "type": "integer",
          "description": "House Number"
        }
      }
    },
    "status": {
      "type": "object",
      "properties": {
        "favouritecat": {
          "enum": [
            "A",
            "B",
            "C"
          ],
          "type": "string",
          "description": "The favourite cat.",
          "maxLength": 1
        }
      }
    }
  }
}
generates the following Go types:
package main

type Address struct {
  HouseNumber int `json:"houseNumber,omitempty"`
  Street string `json:"street,omitempty"`
}

type Example struct {
  Address *Address `json:"address,omitempty"`
  Name string `json:"name,omitempty"`
  Status *Status `json:"status,omitempty"`
}

type Status struct {
  Favouritecat string `json:"favouritecat,omitempty"`
}
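The generated types work directly with encoding/json. They do not carry the schema's enum and maxLength constraints, though, so this sketch adds a hand-written, hypothetical Validate method for them (an empty favouritecat is treated as "not set", since the struct cannot distinguish the two).

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// Types as produced by the generator above, copied so this file compiles on its own.
type Address struct {
	HouseNumber int    `json:"houseNumber,omitempty"`
	Street      string `json:"street,omitempty"`
}

type Example struct {
	Address *Address `json:"address,omitempty"`
	Name    string   `json:"name,omitempty"`
	Status  *Status  `json:"status,omitempty"`
}

type Status struct {
	Favouritecat string `json:"favouritecat,omitempty"`
}

// Validate (hypothetical) checks the constraints the structs do not encode.
func (e *Example) Validate() error {
	if e.Address != nil && utf8.RuneCountInString(e.Address.Street) > 40 {
		return fmt.Errorf("street longer than 40 characters")
	}
	if e.Status != nil {
		switch e.Status.Favouritecat {
		case "", "A", "B", "C": // "" means the field was not set
		default:
			return fmt.Errorf("favouritecat %q not in enum [A B C]", e.Status.Favouritecat)
		}
	}
	return nil
}

func main() {
	var ex Example
	data := []byte(`{"name":"demo","address":{"street":"Main St","houseNumber":7},"status":{"favouritecat":"D"}}`)
	if err := json.Unmarshal(data, &ex); err != nil {
		panic(err)
	}
	fmt.Println(ex.Name, ex.Address.HouseNumber) // demo 7
	fmt.Println(ex.Validate())                   // favouritecat "D" not in enum [A B C]
}
```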

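And then Go itself could be used to generate Go code with a program like this:


package main

import (
  "flag"
  "log"
  "os"
  "text/template"
)

// data holds the template parameters. This is where the tree structure
// read from the CRD could go, instead of two flat fields.
type data struct {
  Type string
  Name string
}

func main() {
  var d data
  flag.StringVar(&d.Type, "type", "", "The element type used for the queue being generated")
  flag.StringVar(&d.Name, "name", "", "The name used for the queue being generated. This should start with a capital letter so that it is exported.")
  flag.Parse()
  if d.Type == "" || d.Name == "" {
    log.Fatal("both -type and -name are required")
  }

  // Use Go's text/template package to do the code generation.
  t := template.Must(template.New("queue").Parse(queueTemplate))
  if err := t.Execute(os.Stdout, d); err != nil {
    log.Fatal(err)
  }
}

var queueTemplate = `
package queue

import (
  "container/list"
  "errors"
)

// Dequeue panics with these values, so the generated file declares them.
// (Generating two queues into one package would declare them twice; move
// them to a shared file in that case.)
var (
  ErrEmptyQueue  = errors.New("queue is empty")
  ErrInvalidType = errors.New("queue element has unexpected type")
)

func New{{.Name}}() *{{.Name}} {
  return &{{.Name}}{list.New()}
}

type {{.Name}} struct {
  list *list.List
}

func (q *{{.Name}}) Len() int {
  return q.list.Len()
}

func (q *{{.Name}}) Enqueue(i {{.Type}}) {
  q.list.PushBack(i)
}

func (q *{{.Name}}) Dequeue() {{.Type}} {
  if q.list.Len() == 0 {
    panic(ErrEmptyQueue)
  }
  raw := q.list.Remove(q.list.Front())
  if typed, ok := raw.({{.Type}}); ok {
    return typed
  }
  panic(ErrInvalidType)
}
`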
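For reference, running the generator with -name IntQueue -type int should expand the template into roughly the following. This sketch uses package main instead of package queue so that it runs on its own, and declares ErrEmptyQueue and ErrInvalidType here so the file compiles by itself.

```go
package main

import (
	"container/list"
	"errors"
	"fmt"
)

// Declared here so the file compiles on its own.
var (
	ErrEmptyQueue  = errors.New("queue is empty")
	ErrInvalidType = errors.New("queue element has unexpected type")
)

func NewIntQueue() *IntQueue {
	return &IntQueue{list.New()}
}

// IntQueue is a FIFO queue of ints backed by container/list.
type IntQueue struct {
	list *list.List
}

func (q *IntQueue) Len() int {
	return q.list.Len()
}

func (q *IntQueue) Enqueue(i int) {
	q.list.PushBack(i)
}

func (q *IntQueue) Dequeue() int {
	if q.list.Len() == 0 {
		panic(ErrEmptyQueue)
	}
	raw := q.list.Remove(q.list.Front())
	if typed, ok := raw.(int); ok {
		return typed
	}
	panic(ErrInvalidType)
}

func main() {
	q := NewIntQueue()
	q.Enqueue(1)
	q.Enqueue(2)
	fmt.Println(q.Dequeue(), q.Len()) // 1 1
}
```

The type assertion in Dequeue is what the generated code buys over a plain list.List: callers get a typed int back without writing the assertion themselves.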

It should be noted that creating the channels via code generation is not the only option here. It may be possible to write code that builds the channels dynamically at runtime.

The article discusses using Go channels to create a pipeline:


Building and Executing the Pipeline

Here’s how to build and execute the pipeline.
func runSimplePipeline(base int, lines []string) error {
  ctx, cancelFunc := context.WithCancel(context.Background())
  defer cancelFunc()
  // ... (the rest of the article's pipeline construction is not reproduced here)
}
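The excerpt above is truncated. As a separate, hypothetical sketch (not the article's code), here is one common way context.WithCancel is used in a channel pipeline: every stage selects on ctx.Done(), so when the caller returns early (here, on the first parse error) the deferred cancel unblocks any stage still trying to send. The names source, parse and sumLines are made up for illustration.

```go
package main

import (
	"context"
	"fmt"
	"strconv"
)

// source emits lines until they run out or ctx is cancelled.
func source(ctx context.Context, lines []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, l := range lines {
			select {
			case out <- l:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}

// parse converts lines to ints; the first error is reported on errc.
func parse(ctx context.Context, in <-chan string) (<-chan int, <-chan error) {
	out := make(chan int)
	errc := make(chan error, 1) // buffered so sending the error never blocks
	go func() {
		defer close(out)
		defer close(errc)
		for l := range in {
			n, err := strconv.Atoi(l)
			if err != nil {
				errc <- err
				return
			}
			select {
			case out <- n:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out, errc
}

// sumLines wires the stages; cancel() on return stops any stage still running.
func sumLines(lines []string) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	nums, errc := parse(ctx, source(ctx, lines))
	sum := 0
	for n := range nums {
		sum += n
	}
	// parse has finished once nums is closed, so errc holds the error or is closed (nil).
	if err := <-errc; err != nil {
		return 0, err
	}
	return sum, nil
}

func main() {
	fmt.Println(sumLines([]string{"1", "2", "3"})) // 6 <nil>
	_, err := sumLines([]string{"1", "x"})
	fmt.Println(err) // strconv.Atoi: parsing "x": invalid syntax
}
```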


Note: the article above shows what the base patterns could look like for either a code generator or a code-based (non-generated) infrastructure. It has been suggested that code generation is a way to get around the lack of generics in Go.

Note: O'Reilly has an online course that discusses channel patterns.



Note:  Frameworks exist for generating generic data structures:

https://github.com/cheekybits/genny

I'm still not sure whether generic code will be needed, but these are the options for taking a specification written in JSON and transforming it into executable code:

  1. Generate the flow implementation with a template, as above (text/template).
  2. Create the channels via "template" methods in Go that build arbitrary channel configurations based on the parameters passed in. For example, a method could create four or five child nodes (channels) and orchestrate communication between them, or it could create a chain of operations between two or three containers. The template method would generalize the type of flow being created and make this orchestration possible without any code generation.
  3. Use a generics code-generation tool such as cheekybits/genny.
In this use case, there will probably be standardization on one of these options as the final solution. Nevertheless, all of them seem valuable for building programs dynamically and should not be entirely discounted.
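A minimal sketch of the second option, assuming the "template method" is a single hypothetical function, fanOutFanIn, whose fixed skeleton (fan-out to n workers, fan-in of results) takes the number of child nodes and the per-item work as parameters. Four versus five children is then just a different argument rather than different generated code.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fanOutFanIn starts n worker goroutines (the "child nodes"), feeds them
// inputs over a shared channel, and merges their results.
func fanOutFanIn(n int, inputs []int, work func(int) int) []int {
	jobs := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				results <- work(j)
			}
		}()
	}

	// Feed the workers, then signal that no more jobs are coming.
	go func() {
		for _, v := range inputs {
			jobs <- v
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []int
	for r := range results {
		out = append(out, r)
	}
	sort.Ints(out) // completion order varies between runs
	return out
}

func main() {
	squares := fanOutFanIn(4, []int{1, 2, 3, 4, 5}, func(x int) int { return x * x })
	fmt.Println(squares) // [1 4 9 16 25]
}
```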



