Recently, I have been focusing on items 1 and 2:
1. Resource DSL - SynchronizationSet, DataPond, etc
2. Execution DSL - 4 go patterns
Some planning still needs to be done on the Data DSL (the data feed for the Data Pond).
Some suggested objects in the Data DSL are:
1. SyncJobDefinition - job level parameters
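As a rough sketch of what job-level parameters could look like in Go, something like the struct below might work. Every field name here (Name, SourceDSN, Tables, SyncInterval) and the Validate method are my own assumptions for illustration; none of them is a settled part of the Data DSL yet.

```go
package main

import (
	"errors"
	"time"
)

// SyncJobDefinition holds job-level parameters for one sync job.
// All fields are hypothetical placeholders, not a final design.
type SyncJobDefinition struct {
	Name         string        // unique job name
	SourceDSN    string        // connection string for the master MySQL database
	Tables       []string      // tables to copy into the Data Pond
	SyncInterval time.Duration // how often results are pushed back to the master
}

// Validate reports the first missing required parameter, or nil if
// the definition has a name, a source DSN, and at least one table.
func (d SyncJobDefinition) Validate() error {
	switch {
	case d.Name == "":
		return errors.New("sync job: name is required")
	case d.SourceDSN == "":
		return errors.New("sync job: source DSN is required")
	case len(d.Tables) == 0:
		return errors.New("sync job: at least one table is required")
	}
	return nil
}
```

A definition would be validated once, before the Data Pond is created, so that a bad job fails early instead of halfway through extraction.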
Note:
This DSL could be implemented on top of several databases.
I am choosing to implement it on top of MySQL/MemSQL because MemSQL has built-in support for Kafka, and Debezium can sync a MySQL database with MemSQL through a MemSQL Kafka pipeline.
Even though the first implementation of the Data DSL will be built with MemSQL components, an effort should be made to keep the interfaces generic and not bound to MemSQL where possible.
Two Phases: (11/19/2019 switching to MySQL master)
Data Extraction Phase -
From a master MySQL database: must be initialized once at DataPond creation time
Do this via MySQL and SQL libraries in Go
Data Sync Phase - via the Debezium CDC connector into a Kafka pipeline, or a forked Kafka source feed into main
DataSync Back to Master (results)
Do this via MySQL and SQL libraries in Go
Note: The goal of this part of the system is to provide snapshot data to compute containers located on the same physical node. The satellite databases would be read only representations of portions of the main data set.
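To make the Data Sync Phase a little more concrete, here is a minimal sketch of applying Debezium change events to a satellite copy of a table. It assumes the Kafka message value is the bare Debezium envelope (the JSON converter with schemas disabled), which carries "op", "before", and "after" fields, with op values "c" (create), "u" (update), "d" (delete), and "r" (snapshot read). The satellite type, the applyEvent method, and the key column handling are hypothetical, and the Kafka consumer itself is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// changeEvent holds the Debezium envelope fields this sketch uses.
type changeEvent struct {
	Op     string         `json:"op"`
	Before map[string]any `json:"before"`
	After  map[string]any `json:"after"`
}

// satellite is an in-memory stand-in for one satellite table,
// keyed by the string form of the primary key.
type satellite map[string]map[string]any

// keyOf extracts the primary key value from a row image.
func keyOf(row map[string]any, col string) (string, error) {
	v, ok := row[col]
	if !ok {
		return "", fmt.Errorf("change event has no %q column", col)
	}
	return fmt.Sprint(v), nil
}

// applyEvent decodes one change event and updates the satellite copy.
func (s satellite) applyEvent(raw []byte, keyColumn string) error {
	var ev changeEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return fmt.Errorf("decode change event: %w", err)
	}
	switch ev.Op {
	case "c", "u", "r": // create, update, snapshot read: store the new row image
		key, err := keyOf(ev.After, keyColumn)
		if err != nil {
			return err
		}
		s[key] = ev.After
	case "d": // delete: the key is only present in the old row image
		key, err := keyOf(ev.Before, keyColumn)
		if err != nil {
			return err
		}
		delete(s, key)
	default:
		return fmt.Errorf("unsupported op %q", ev.Op)
	}
	return nil
}
```

Compute containers on the node would only ever read from the satellite copy; the change stream is the single writer, which matches the read-only intent described in the note above.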