Sunday, November 17, 2019

3 DSLs - Resource - Execution - Data

In previous posts, I have talked about the 3 DSLs that comprise the system.

Recently, I have been focusing on the first two:

1. Resource DSL - SynchronizationSet, DataPond, etc.
2. Execution DSL - 4 Go patterns

Some planning still needs to be done on the third, the Data DSL (the data feed for the DataPond).

Some suggested objects in the Data DSL are:

1.  SyncJobDefinition - job level parameters

2.  SyncSetDefinition - identifies "what" is being synced
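As a rough sketch of how these two objects might look in Go: the post only names the objects, so every field below (JobID, Interval, Table, Columns, Filter) and the Validate method are assumptions made for illustration, not a settled design.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// SyncSetDefinition identifies "what" is being synced.
// All fields are hypothetical placeholders.
type SyncSetDefinition struct {
	Name    string   // logical name of the set
	Table   string   // source table in the master database
	Columns []string // columns to copy; empty means all columns
	Filter  string   // optional row filter
}

// SyncJobDefinition carries job-level parameters.
// All fields are hypothetical placeholders.
type SyncJobDefinition struct {
	JobID    string
	Interval time.Duration
	Sets     []SyncSetDefinition
}

// Validate checks that the job is complete enough to run.
func (j SyncJobDefinition) Validate() error {
	if j.JobID == "" {
		return errors.New("job id is required")
	}
	if len(j.Sets) == 0 {
		return errors.New("job must reference at least one sync set")
	}
	for _, s := range j.Sets {
		if s.Table == "" {
			return fmt.Errorf("sync set %q has no table", s.Name)
		}
	}
	return nil
}

func main() {
	job := SyncJobDefinition{
		JobID:    "nightly-orders",
		Interval: 5 * time.Minute,
		Sets:     []SyncSetDefinition{{Name: "orders", Table: "orders"}},
	}
	if err := job.Validate(); err != nil {
		fmt.Println("invalid job:", err)
		return
	}
	fmt.Println("job ok:", job.JobID)
}
```

Keeping the job and the set as separate types lets one job reference several sets without repeating the job-level parameters.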





Note:

This DSL could be implemented on top of several databases. 

I have chosen to implement it on top of MySQL/MemSQL, because MemSQL has built-in support for Kafka and Debezium can sync a MySQL database with MemSQL through a MemSQL Kafka pipeline.



Even though the first implementation of the Data DSL will be built with MemSQL components, an effort should be made to keep the interfaces generic and, where possible, not bound to MemSQL.

Two Phases: (11/19/2019 switching to MySQL master)

Data Extraction Phase -

From a master MySQL database: this must be initialized once, at DataPond creation time.

Do this via the MySQL and SQL libraries in Go.
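A minimal sketch of the one-time extraction using Go's standard database/sql package. The function names SnapshotQuery and ExtractSnapshot are hypothetical, and the table and column names are placeholders. Opening a real connection would also need a MySQL driver (for example github.com/go-sql-driver/mysql), which is assumed here and not imported, so main only prints the generated query.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"strings"
)

// SnapshotQuery builds the SELECT used for the initial extraction.
// Identifiers are backtick-quoted but not escaped, so they must come
// from trusted definitions and never from user input.
func SnapshotQuery(table string, columns []string) string {
	cols := "*"
	if len(columns) > 0 {
		quoted := make([]string, len(columns))
		for i, c := range columns {
			quoted[i] = "`" + c + "`"
		}
		cols = strings.Join(quoted, ", ")
	}
	return "SELECT " + cols + " FROM `" + table + "`"
}

// ExtractSnapshot reads every row of the table into generic maps.
// db is expected to be opened by the caller with a MySQL driver.
func ExtractSnapshot(ctx context.Context, db *sql.DB, table string, columns []string) ([]map[string]interface{}, error) {
	rows, err := db.QueryContext(ctx, SnapshotQuery(table, columns))
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	names, err := rows.Columns()
	if err != nil {
		return nil, err
	}
	var out []map[string]interface{}
	for rows.Next() {
		vals := make([]interface{}, len(names))
		ptrs := make([]interface{}, len(names))
		for i := range vals {
			ptrs[i] = &vals[i]
		}
		if err := rows.Scan(ptrs...); err != nil {
			return nil, err
		}
		rec := make(map[string]interface{}, len(names))
		for i, n := range names {
			rec[n] = vals[i]
		}
		out = append(out, rec)
	}
	return out, rows.Err()
}

func main() {
	// No driver is linked in this sketch, so only show the query.
	fmt.Println(SnapshotQuery("orders", []string{"id", "total"}))
}
```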



Data Sync Phase - via the Debezium CDC connector into a Kafka pipeline, or a forked Kafka source feed into main
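To make the sync phase concrete, here is a sketch of decoding one Debezium change event in Go. It assumes the JSON converter is used with schemas disabled, so the envelope fields (before, after, op, ts_ms) sit at the top level of the message. With schemas enabled they would be nested under a "payload" key instead. ChangeEvent, ParseChangeEvent, and DescribeOp are hypothetical names, and the sample message is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ChangeEvent holds the parts of the Debezium envelope used here.
type ChangeEvent struct {
	Before map[string]interface{} `json:"before"` // row state before the change, nil on insert
	After  map[string]interface{} `json:"after"`  // row state after the change, nil on delete
	Op     string                 `json:"op"`     // c, u, d, or r
	TsMs   int64                  `json:"ts_ms"`  // connector processing time in milliseconds
}

// ParseChangeEvent decodes one Kafka message value.
func ParseChangeEvent(data []byte) (ChangeEvent, error) {
	var ev ChangeEvent
	err := json.Unmarshal(data, &ev)
	return ev, err
}

// DescribeOp maps Debezium's one-letter operation codes to words.
func DescribeOp(op string) string {
	switch op {
	case "c":
		return "create"
	case "u":
		return "update"
	case "d":
		return "delete"
	case "r":
		return "snapshot read"
	default:
		return "unknown"
	}
}

func main() {
	// Made-up sample message for an insert into a hypothetical table.
	msg := []byte(`{"before":null,"after":{"id":1,"name":"a"},"op":"c","ts_ms":1574000000000}`)
	ev, err := ParseChangeEvent(msg)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(DescribeOp(ev.Op), ev.After["name"])
}
```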


DataSync Back to Master (results)

Do this via MySQL and SQL libraries in Go


Note: The goal of this part of the system is to provide snapshot data to compute containers located on the same physical node. The satellite databases would be read-only representations of portions of the main data set.



