Monday, January 03, 2022

Looking under the hood at AWS - a path to data locality

Looking at Firecracker... apparently the same technology that drives Lambda and Fargate can be run under your own control instead of just handing control over to AWS.

In particular, the spin-up time for EKS clusters is very interesting:

Firekube pulls everything from Git, detects your operating system and can boot up a secure cluster of VMs from nothing in 2.5 minutes.

https://www.weave.works/oss/firekube/

The game changer here is that, since you control the node running the microVMs, maybe you regain control over data locality???

In past posts we have talked about the KBL = Kubernetes Based Lifeform.

The issue has been that it's slow to spin up reasonably sized nodes that would enable data locality for a set of co-located pods using node selectors (a KBL).  But now... instead of spinning up smaller machines in AWS, just spin up a monster capable of running many contexts!!! Now bringing up a new microVM to establish a sub-context can occur in milliseconds (Firecracker advertises boot times of roughly 125 ms)!!!  Contexts can be created/destroyed at will, and systems can dynamically adjust to the current needs of multiple coordinating contexts.
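To make the "co-located pods using node selectors" idea concrete, here is a minimal sketch that builds pod manifests for one KBL, all pinned to the same node. It assumes the standard `kubernetes.io/hostname` node label; the function name, the `kbl` label, and the example image names are made up for illustration.

```python
def kbl_pod_manifests(kbl_name, node_name, images):
    """Build one pod manifest per image, all pinned to the same node."""
    manifests = []
    for index, image in enumerate(images):
        manifests.append({
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {
                "name": f"{kbl_name}-{index}",
                # Hypothetical label used only to group the pods of one KBL.
                "labels": {"kbl": kbl_name},
            },
            "spec": {
                # Every pod of the KBL selects the same node, so they all
                # share that machine's local disks.
                "nodeSelector": {"kubernetes.io/hostname": node_name},
                "containers": [{"name": "worker", "image": image}],
            },
        })
    return manifests


pods = kbl_pod_manifests("kbl-a", "metal-1", ["example/ingest", "example/query"])
```

Each dictionary could be dumped to YAML or JSON and applied to a cluster; nothing here talks to Kubernetes directly.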

We have talked about KBLs talking to KBLs that spawn KBLs, etc... the microVM makes this possible by giving data locality a place in the primordial soup that will spawn more complex life forms.  We are just one step removed from any cloud provider... it only needs to provide us with the raw building blocks (the "metal" machines), kind of like big fish tanks where these automated entities can spawn and exist.

A friend of mine, Steven, has been explaining the metaverse to me.  The following ideas are a result of my conversations with him:

It's all starting to sound similar to the metaverse ???  

Reality could be segmented into groups of "metal" nodes which then provide the "primordial soup" for KBLs to exist.  KBLs would be aware of their location in a particular segment and there could be logic to migrate a KBL from one segment to another.  The KBLs would all have fast connections to their data (avoiding latency issues) via data locality, provided by storing data local to the KBL compute.

It would seem there is an upper limit to the amount of resources that could share a local context, at least at the aggregate level: that limit is the metal instance itself.  A single KBL would always have access to localized data.  A group of KBLs could share a localized space, but eventually this hits the machine limit.  In that case a scope of "rack" might be used... where data is positioned for sharing at the rack level.
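As a rough sketch of that scoping rule: if a group of KBLs fits on one metal instance, share at the node level, otherwise fall back to the rack. The function name, the use of GB as the only resource, and the "multi-rack" fallback are assumptions for illustration.

```python
def choose_scope(kbl_demands_gb, node_capacity_gb, nodes_per_rack):
    """Pick the smallest locality scope that can hold a group of KBLs."""
    total = sum(kbl_demands_gb)
    if total <= node_capacity_gb:
        return "node"  # everything fits on one metal instance
    if total <= node_capacity_gb * nodes_per_rack:
        return "rack"  # data is shared at the rack level
    return "multi-rack"  # assumed fallback, not discussed in the post


small_group = choose_scope([100, 200], node_capacity_gb=512, nodes_per_rack=10)
large_group = choose_scope([400, 400], node_capacity_gb=512, nodes_per_rack=10)
```

With these made-up numbers the first group stays on a single node and the second spills over to rack scope.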

As for fault tolerance... as discussed in previous posts, the RAID-like positioning of data could help to handle the use case of node failure.  Sync mechanisms would need to be in place to keep replica machines updated.  These copies could be on different racks to limit the blast radius of a node failure.
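A minimal sketch of the "copies on different racks" idea, assuming we already know which nodes live on which rack. The function name, the rack and node names, and the "first node per rack" choice are all hypothetical; a real placement would also look at free space and load.

```python
def place_replicas(primary_node, nodes_by_rack, copies):
    """Choose replica nodes, one per rack, never on the primary's rack."""
    primary_rack = next(
        rack for rack, nodes in nodes_by_rack.items() if primary_node in nodes
    )
    placements = []
    for rack in sorted(nodes_by_rack):
        if len(placements) == copies:
            break
        if rack == primary_rack or not nodes_by_rack[rack]:
            continue  # skip the primary's rack and empty racks
        # Simplest possible choice: the first node (by name) on the rack.
        placements.append((rack, sorted(nodes_by_rack[rack])[0]))
    if len(placements) < copies:
        raise ValueError("not enough racks to keep replicas apart")
    return placements


racks = {"rack-a": ["n1", "n2"], "rack-b": ["n3"], "rack-c": ["n4"]}
replicas = place_replicas("n1", racks, copies=2)
```

Losing any single rack then takes out at most one copy of the data.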

In fact it could be a data-only sync: the compute processes would not actually need to be running because of the fast spin-up time of Firecracker.  There could be spin-up on failure with local data, or maybe this could be the exception to the rule and the data would be synced from the network.  It depends on how fast you need to seed the data.
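The "how fast you need to seed the data" trade-off is just arithmetic, sketched below. The function names, the plan labels, and the numbers are assumptions; it ignores protocol overhead and disk speed.

```python
def seed_seconds(data_gb, bandwidth_gbit_per_s):
    """Time to copy data over the network (1 GB = 8 Gbit)."""
    return data_gb * 8 / bandwidth_gbit_per_s


def recovery_plan(data_gb, bandwidth_gbit_per_s, deadline_s):
    """Decide whether a network seed on failure is fast enough."""
    if seed_seconds(data_gb, bandwidth_gbit_per_s) <= deadline_s:
        return "seed-from-network"
    return "keep-local-replica"


# 100 GB over a 10 Gbit/s link: 100 * 8 / 10 = 80 seconds.
relaxed = recovery_plan(100, 10, deadline_s=120)
strict = recovery_plan(100, 10, deadline_s=30)
```

With a two-minute budget the network seed is enough; with a 30-second budget the data has to already be sitting next to where the microVM will boot.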
