jmenke blog: 3 DSLs - Resource - Execution - Data

In previous posts, I have talked about the 3 DSLs that comprise the system.

Recently, I have been focusing on the first two:

1. Resource DSL - SynchronizationSet, DataPond, etc.

2. Execution DSL - 4 Go patterns

Some planning still needs to be done on the third one, the Data DSL (the data feed for the DataPond).

Some suggested objects in the Data DSL are:

1. SyncJobDefinition - job level parameters

2. SyncSetDefinition - identifies "what" is being synced
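
A minimal sketch of how these two objects might look in Go. Every field name here, and the Validate method, is an assumption made for illustration; the post only fixes the two type names and their roles.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// SyncSetDefinition identifies "what" is being synced.
// All fields are hypothetical; only the type name comes from the post.
type SyncSetDefinition struct {
	Name    string   // logical name of the set
	Table   string   // source table in the master database
	Columns []string // columns to copy; empty means all columns
	Filter  string   // optional row filter, e.g. "region = 'EU'"
}

// SyncJobDefinition carries job-level parameters.
// All fields are hypothetical; only the type name comes from the post.
type SyncJobDefinition struct {
	JobID    string
	Interval time.Duration       // how often the job runs
	Sets     []SyncSetDefinition // what this job syncs
}

// Validate rejects definitions that could not be executed.
func (j SyncJobDefinition) Validate() error {
	if j.JobID == "" {
		return errors.New("job id is required")
	}
	if len(j.Sets) == 0 {
		return errors.New("job must reference at least one sync set")
	}
	for _, s := range j.Sets {
		if s.Table == "" {
			return fmt.Errorf("sync set %q has no table", s.Name)
		}
	}
	return nil
}

func main() {
	job := SyncJobDefinition{
		JobID:    "orders-snapshot",
		Interval: 5 * time.Minute,
		Sets:     []SyncSetDefinition{{Name: "open-orders", Table: "orders"}},
	}
	fmt.Println("valid:", job.Validate() == nil)
}
```

Keeping the "what" (the set) separate from the job-level parameters lets one job reference several sets without repeating scheduling details.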

Note:

This DSL could be implemented on top of several databases.

I have chosen to implement it on top of MySQL/MemSQL, because MemSQL has built-in support for Kafka and Debezium can sync a MySQL database with MemSQL through a MemSQL Kafka pipeline.

https://debezium.io/documentation/reference/0.10/connectors/mysql.html

Even though the first implementation of the Data DSL will use MemSQL components, an effort should be made to keep the interfaces generic and, where possible, not bound to MemSQL.
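
One way to keep the interfaces generic is to have callers depend only on a small Go interface and put the MemSQL code behind it. The sketch below is an assumption about what that could look like: SnapshotStore, Row, and the in-memory implementation are hypothetical names, not part of any existing design.

```go
package main

import (
	"errors"
	"fmt"
)

// Row is one record keyed by column name.
type Row map[string]interface{}

// SnapshotStore is a hypothetical storage-neutral interface.
// A MemSQL-backed type would satisfy it in the first implementation;
// nothing in the interface itself refers to MemSQL.
type SnapshotStore interface {
	Load(table string, rows []Row) error
	Read(table string) ([]Row, error)
}

// ErrUnknownTable is returned when a table was never loaded.
var ErrUnknownTable = errors.New("unknown table")

// memoryStore is a stand-in implementation, useful for tests.
type memoryStore struct {
	tables map[string][]Row
}

func newMemoryStore() *memoryStore {
	return &memoryStore{tables: make(map[string][]Row)}
}

// Load replaces the stored contents of table with a copy of rows.
func (m *memoryStore) Load(table string, rows []Row) error {
	m.tables[table] = append([]Row(nil), rows...)
	return nil
}

// Read returns a copy of the slice so callers cannot reorder or
// truncate the stored snapshot (the rows themselves are shared).
func (m *memoryStore) Read(table string) ([]Row, error) {
	rows, ok := m.tables[table]
	if !ok {
		return nil, ErrUnknownTable
	}
	return append([]Row(nil), rows...), nil
}

func main() {
	var store SnapshotStore = newMemoryStore()
	if err := store.Load("orders", []Row{{"id": 1}}); err != nil {
		fmt.Println("load failed:", err)
		return
	}
	rows, err := store.Read("orders")
	fmt.Println(len(rows), err)
}
```

Code written against SnapshotStore would not need to change if the backing database were swapped later.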

Two Phases: (update 11/19/2019: switching to a MySQL master)

Data Extraction Phase -

From a master MySQL database: must be initialized once, at DataPond creation time.

Do this via the MySQL and SQL libraries in Go.

http://programming.freeideas.cz/2017/02/17/golang-feed-data-from-mysql-into-memsql/
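
A rough sketch of the one-time extraction using Go's standard database/sql package. It assumes both databases are reachable through a MySQL-protocol driver that the caller registers (for example github.com/go-sql-driver/mysql), and that table and column names are trusted identifiers, since they are interpolated into the SQL text. The helper names buildInsert and copyTable are made up for this example.

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"
)

// buildInsert returns a parameterized single-row INSERT statement.
// table and cols must be trusted identifiers; they are not escaped.
func buildInsert(table string, cols []string) string {
	marks := strings.TrimSuffix(strings.Repeat("?,", len(cols)), ",")
	return fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s)",
		table, strings.Join(cols, ","), marks)
}

// copyTable streams every row of table from src (the master) to dst
// (the DataPond database) and returns the number of rows copied.
func copyTable(src, dst *sql.DB, table string, cols []string) (int, error) {
	query := fmt.Sprintf("SELECT %s FROM %s", strings.Join(cols, ","), table)
	rows, err := src.Query(query)
	if err != nil {
		return 0, err
	}
	defer rows.Close()

	stmt, err := dst.Prepare(buildInsert(table, cols))
	if err != nil {
		return 0, err
	}
	defer stmt.Close()

	// Scan each column into an interface{} so the copy is schema-agnostic.
	vals := make([]interface{}, len(cols))
	ptrs := make([]interface{}, len(cols))
	for i := range vals {
		ptrs[i] = &vals[i]
	}

	n := 0
	for rows.Next() {
		if err := rows.Scan(ptrs...); err != nil {
			return n, err
		}
		if _, err := stmt.Exec(vals...); err != nil {
			return n, err
		}
		n++
	}
	return n, rows.Err()
}

func main() {
	// Only the statement builder runs here; copyTable needs live
	// connections opened with sql.Open and a registered driver.
	fmt.Println(buildInsert("orders", []string{"id", "total"}))
}
```

Row-at-a-time inserts keep the sketch short; a real initial load would likely batch rows or use a bulk-load path instead.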

Data Sync Phase - via the Debezium CDC connector into a Kafka pipeline, or via a forked Kafka source feed into main
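
To make the sync phase concrete, here is a sketch of what applying Debezium change events to a snapshot involves. It assumes the Kafka message value is the bare Debezium envelope as JSON (that is, the converter is configured without the schema wrapper) and uses the envelope's op, before, and after fields. The applyChange function and the in-memory snapshot map are hypothetical; in the MemSQL setup described above, the pipeline itself would do the consuming.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// changeEvent mirrors the Debezium envelope fields used here.
type changeEvent struct {
	Op     string                 `json:"op"` // c=create, u=update, d=delete, r=snapshot read
	Before map[string]interface{} `json:"before"`
	After  map[string]interface{} `json:"after"`
}

// applyChange applies one event to a snapshot keyed by the value of
// keyCol, which is assumed to be the table's primary key column.
// Keys are formatted with fmt.Sprint; JSON numbers decode as float64,
// so very large integer keys would need json.Decoder.UseNumber instead.
func applyChange(snapshot map[string]map[string]interface{}, keyCol string, raw []byte) error {
	var ev changeEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return err
	}
	switch ev.Op {
	case "c", "r", "u":
		if ev.After == nil {
			return fmt.Errorf("op %q without after state", ev.Op)
		}
		snapshot[fmt.Sprint(ev.After[keyCol])] = ev.After
	case "d":
		if ev.Before == nil {
			return fmt.Errorf("op %q without before state", ev.Op)
		}
		delete(snapshot, fmt.Sprint(ev.Before[keyCol]))
	default:
		return fmt.Errorf("unsupported op %q", ev.Op)
	}
	return nil
}

func main() {
	snap := map[string]map[string]interface{}{}
	events := []string{
		`{"op":"c","before":null,"after":{"id":1,"total":10}}`,
		`{"op":"u","before":{"id":1,"total":10},"after":{"id":1,"total":25}}`,
	}
	for _, e := range events {
		if err := applyChange(snap, "id", []byte(e)); err != nil {
			fmt.Println("apply failed:", err)
			return
		}
	}
	fmt.Println(snap["1"]["total"])
}
```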

Data Sync Back to Master (results) -

Do this via the MySQL and SQL libraries in Go.

Note: The goal of this part of the system is to provide snapshot data to compute containers located on the same physical node. The satellite databases would be read-only representations of portions of the main data set.


Sunday, November 17, 2019

