Monday, August 10, 2020

CDC Join and Comparison with MemSQL

It's looking like Hazelcast Jet might support many of the operations originally envisioned for the Data DSL out of the box.  This would greatly simplify this part of the equation.

https://jet-start.sh/docs/tutorials/cdc-join

If the Data DSL could delegate down into the Hazelcast Operator for setup and then use the Jet Engine for pulling data, it could provide all the features without re-inventing the wheel. 
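As a rough picture of what a CDC join does, here is a minimal sketch in plain Python. It is not the Jet API. The event shape and the `customers` / `orders` table names are assumptions made for illustration. It folds change events from two tables into one view keyed by customer id.

```python
def apply_change(view, event):
    """Fold one change event into a joined view.

    view maps customer_id -> {"customer": row or None, "orders": {order_id: row}}.
    The event layout (table / op / row) is a hypothetical shape for this sketch.
    """
    table, op, row = event["table"], event["op"], event["row"]
    # Both tables are keyed into the view by the customer id.
    cid = row["id"] if table == "customers" else row["customer_id"]
    entry = view.setdefault(cid, {"customer": None, "orders": {}})
    if table == "customers":
        entry["customer"] = None if op == "delete" else row
    elif op == "delete":
        entry["orders"].pop(row["id"], None)
    else:
        # Inserts and updates both overwrite the latest order row.
        entry["orders"][row["id"]] = row
    return view


def build_view(events):
    """Apply a sequence of change events in order and return the joined view."""
    view = {}
    for event in events:
        apply_change(view, event)
    return view
```

Because every event is applied in order, the view always reflects the latest state of both tables, which is the part the Jet pipeline would take over.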

The same DSLs exist in the overall picture:  Data, Resource, Routing, Execution - but maybe the heavy lifting for Data can be handled by Hazelcast?

It would still be possible to create isolated islands of compute with the Resource and Routing DSLs, with each island having its own Data configuration.  That was always the plan; with pipelines and CDC, it could now be easy to implement.

Update 8/11

MemSQL vs Hazelcast

For MemSQL

One of the key features of MemSQL was the ability to structure an append-only cache.  This would give lock-less updates and consistency as of a specific point in time.
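To make the append-only idea concrete, here is a minimal single-process sketch in Python. It is not MemSQL code, and the class and method names are hypothetical. Writes never overwrite an existing entry, so a read "as of" a given timestamp always returns the same answer. The sketch shows only the versioning, not the concurrency behavior.

```python
import bisect


class AppendOnlyCache:
    """Keeps every version of every key; nothing is updated in place."""

    def __init__(self):
        # key -> (sorted list of timestamps, list of values in the same order)
        self._versions = {}

    def append(self, key, ts, value):
        """Record a new version of key at timestamp ts."""
        stamps, values = self._versions.setdefault(key, ([], []))
        if stamps and ts < stamps[-1]:
            raise ValueError("timestamps must be non-decreasing per key")
        stamps.append(ts)
        values.append(value)

    def get_as_of(self, key, ts):
        """Return the latest value written at or before ts, or None."""
        stamps, values = self._versions.get(key, ([], []))
        i = bisect.bisect_right(stamps, ts)
        return values[i - 1] if i else None
```

Since old versions are never touched, a reader pinned to a timestamp does not need to coordinate with writers appending newer versions.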

As for Hazelcast

The latest release of Hazelcast Jet includes windowing functionality, which lets users evaluate stream processing jobs at regular time intervals, regardless of how many incoming messages the job is processing. Hazelcast Jet offers three types of windows:

Fixed/tumbling – time is partitioned into same-length, non-overlapping chunks. Each event belongs to exactly one window.

Sliding – windows have fixed length, but consecutive windows start a fixed time interval (step) apart, and the step can be smaller than the window length, so windows may overlap. Typically the window length is a multiple of the step.

Session – windows have varying sizes and are defined based on the data, which should carry some session identifier.
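The three window types can be sketched in a few lines of plain Python. This is not the Jet API. Timestamps are plain integers, and the inactivity `gap` used to split sessions is an assumption of this sketch.

```python
from collections import defaultdict


def tumbling_window(ts, size):
    """Return the (start, end) of the single tumbling window containing ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts, size, step):
    """Return every (start, end) sliding window containing ts, earliest first."""
    windows = []
    start = ts - ts % step  # latest window start at or before ts
    while start > ts - size:
        windows.append((start, start + size))
        start -= step
    return sorted(windows)


def session_windows(events, gap):
    """Group (session_id, ts) events into sessions split by inactivity gaps.

    Returns session_id -> list of (first_ts, last_ts) pairs.
    """
    by_id = defaultdict(list)
    for session_id, ts in events:
        by_id[session_id].append(ts)
    sessions = {}
    for session_id, stamps in by_id.items():
        stamps.sort()
        groups = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - groups[-1][-1] > gap:
                groups.append([ts])  # too long since last event: new session
            else:
                groups[-1].append(ts)
        sessions[session_id] = [(g[0], g[-1]) for g in groups]
    return sessions
```

With a window length of 10 and a step of 5, each event lands in exactly one tumbling window but in two sliding windows.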

What MemSQL can offer in addition:  It would appear that MemSQL can hold historical values.  More investigation is needed to see exactly how Hazelcast handles this, but for debugging purposes, being able to replay data might prove valuable.

It would also seem that reasoning about historical data might be easier in MemSQL: just select within the time window.
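As an illustration of "select within the time window", here is a small sketch that uses Python's built-in `sqlite3` as a stand-in for MemSQL. The `history` table and its columns are hypothetical. The half-open range (`>= start`, `< end`) is a choice made for this sketch.

```python
import sqlite3


def make_history(rows):
    """Create an in-memory history table and load (item_key, item_value, ts) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history (item_key TEXT, item_value TEXT, ts INTEGER)"
    )
    conn.executemany("INSERT INTO history VALUES (?, ?, ?)", rows)
    return conn


def select_window(conn, start_ts, end_ts):
    """Return all rows with start_ts <= ts < end_ts, oldest first."""
    cursor = conn.execute(
        "SELECT item_key, item_value, ts FROM history "
        "WHERE ts >= ? AND ts < ? ORDER BY ts",
        (start_ts, end_ts),
    )
    return cursor.fetchall()
```

Replaying a period for debugging then comes down to picking the start and end timestamps and feeding the returned rows back through the code under test.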

This could enable offline debugging and replication of environments for support.


