Monday, January 25, 2021

Where is the Mandelbrot?


In the previous post I mentioned that a self-similarity pattern is starting to emerge, but where is the Mandelbrot (the self-similarity)?  The previous post showed a hierarchical pattern for data aggregation, but that is not the Mandelbrot pattern of self-similarity.  As it turns out, the connection to the Mandelbrot pattern lies in envisioning the large circles at each level as ponds of data.  The circles are black boxes: any compute can occur inside them.  The pattern for compute is hierarchical, but the pattern for data evolution is that of the Mandelbrot.  From the data perspective, zooming into the Mandelbrot is the process of taking raw data and transforming it.  This is just a way of visualizing the data flow and the patterns that are used to construct the system.  We don't actually have a Mandelbrot here... it's "like" a Mandelbrot.  Data from smaller Mandelbrot segments can be combined to make bigger Mandelbrot sections.
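To make the "smaller segments combine into bigger sections" idea concrete, here is a minimal sketch.  The names `Pond`, `leaf` and `combine` are made up for illustration, and the summary kept in each pond (a count and a total) is just an assumption; a real pond could hold anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pond:
    """A black-box circle: a summary of the data beneath it (hypothetical)."""
    level: int
    count: int
    total: float


def leaf(values):
    """Turn raw data into the smallest segment (level 0)."""
    return Pond(level=0, count=len(values), total=float(sum(values)))


def combine(ponds):
    """Merge same-level segments into one segment at the next level up."""
    ponds = list(ponds)
    levels = {p.level for p in ponds}
    if len(levels) != 1:
        raise ValueError("can only combine ponds from the same level")
    return Pond(
        level=levels.pop() + 1,
        count=sum(p.count for p in ponds),
        total=sum(p.total for p in ponds),
    )


leaves = [leaf([1, 2]), leaf([3]), leaf([4, 5, 6])]
root = combine([combine(leaves[:2]), combine(leaves[2:])])
```

The same `combine` is applied at every level, which is the self-similar part: a section is built from segments in exactly the way a segment is built from raw data.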



It's data that flows like the Mandelbrot.  Compute rearranges itself around the data - bringing compute to the data. 


As mentioned earlier, it will be necessary to have some redundancy in data distribution so that compute can be resilient to node failures - something like RAID.  This is needed because data locality plays a part in this design: instead of using data-center level databases, data is kept as close to the compute as possible.
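One way to picture that redundancy is plain replication, which is simpler than RAID-style parity and only stands in for it here.  This is a sketch under assumptions: `place_replicas` and `survivors` are hypothetical helpers, and round-robin placement is just the easiest policy to show.

```python
def place_replicas(shard_ids, nodes, copies=2):
    """Assign each shard to `copies` distinct nodes, round-robin."""
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")
    return {
        shard: [nodes[(i + k) % len(nodes)] for k in range(copies)]
        for i, shard in enumerate(shard_ids)
    }


def survivors(placement, failed_node):
    """Nodes that can still serve each shard after one node fails."""
    return {
        shard: [n for n in holders if n != failed_node]
        for shard, holders in placement.items()
    }


placement = place_replicas(["s0", "s1", "s2"], ["n0", "n1", "n2"])
after_failure = survivors(placement, "n1")
```

With two copies per shard, every shard still has at least one holder after any single node fails, so compute can be restarted next to a surviving copy.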

One of the keys to high performance compute is the concept of data locality.  If data can be kept on a node and new resources can be provisioned on that node, then you have the basis for a compute engine that can be data centered.  The key here is bringing the compute to the data, not the data to the compute.
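A locality-first placement decision might look roughly like the sketch below.  `pick_node`, the `holders` map (shard to nodes that store it) and the `free_slots` map (node to spare capacity) are all assumptions made up for illustration, not part of any real scheduler.

```python
def pick_node(shard, holders, free_slots):
    """Prefer a node that already holds the shard; otherwise fall back.

    Returns (node, is_local).
    """
    local = [n for n in holders.get(shard, []) if free_slots.get(n, 0) > 0]
    if local:
        return max(local, key=lambda n: free_slots[n]), True
    remote = [n for n, slots in free_slots.items() if slots > 0]
    if not remote:
        raise RuntimeError("no capacity on any node")
    return max(remote, key=lambda n: free_slots[n]), False


shard_holders = {"s0": ["n0", "n1"]}
local_choice = pick_node("s0", shard_holders, {"n0": 0, "n1": 2, "n2": 5})
remote_choice = pick_node("s0", shard_holders, {"n0": 0, "n1": 0, "n2": 5})
```

In the first call the work goes to `n1` even though `n2` has more free capacity, because `n1` already has the data.  Only when no holder has room does the data have to travel.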

In the previous blog post, a Compute Wheel pattern was suggested that continuously processes live data.  There can also be a corresponding pattern that does a single execution of a flow of operations.  In effect you are taking out the Compute Wheel and starting from the Compute Context - you still shard the execution across nodes, but you don't cycle.  You can aggregate similarly in both scenarios.  One ramification of using Compute Wheels is that when moving from one aggregation set (the whole structure) to another, BOTH sets are kept alive.  In the single execution model, transitioning from one aggregation set to the next allows removing the previous aggregation set.  So there is a windowed (single execution model) and a non-windowed Mandelbrot.
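The difference in what stays alive can be sketched as follows.  `AggregationSets` is a hypothetical name, and the choice to drop anything older than the previous set in the wheel case is an assumption; the post only says that both the old and the new set are kept alive during a transition.

```python
class AggregationSets:
    """Tracks which aggregation sets are alive (hypothetical sketch)."""

    def __init__(self, keep_previous):
        # keep_previous=True models the Compute Wheel case,
        # keep_previous=False models the single execution case.
        self.keep_previous = keep_previous
        self.alive = []

    def advance(self, new_set):
        """Move to a new aggregation set and return the sets still alive."""
        if self.keep_previous:
            # Assumption: only the immediately previous set is retained.
            self.alive = self.alive[-1:] + [new_set]
        else:
            self.alive = [new_set]
        return list(self.alive)


wheel = AggregationSets(keep_previous=True)
single = AggregationSets(keep_previous=False)
wheel_history = [wheel.advance(name) for name in ("A", "B", "C")]
single_history = [single.advance(name) for name in ("A", "B", "C")]
```

The aggregation logic itself is the same in both cases; only the retention policy changes.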

The pattern here is that the Compute Context is associated with a Node and can be wrapped in a Compute Wheel.  The Compute Context should not know about the Wheel.  The Wheel knows how to cycle through compute nodes.
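That one-way dependency can be sketched like this.  The class names follow the post, but the methods (`run`, `spin`) and the idea of a context holding a list of operations are assumptions made up for illustration.

```python
from itertools import cycle


class ComputeContext:
    """Runs a flow of operations on one node; holds no reference to a wheel."""

    def __init__(self, node, operations):
        self.node = node
        self.operations = operations

    def run(self, data):
        """Single execution: apply each operation in order."""
        for op in self.operations:
            data = op(data)
        return data


class ComputeWheel:
    """Wraps contexts and cycles through their nodes."""

    def __init__(self, contexts):
        self.contexts = list(contexts)

    def spin(self, batches):
        """Send each batch to the next context in turn."""
        turn = cycle(self.contexts)
        return [(ctx.node, ctx.run(batch)) for ctx, batch in zip(turn, batches)]


ops = [lambda xs: [x * 2 for x in xs], sum]
contexts = [ComputeContext("n0", ops), ComputeContext("n1", ops)]
wheel_results = ComputeWheel(contexts).spin([[1, 2], [3], [4, 5]])
single_result = contexts[0].run([1, 2])
```

Because the context never refers to the wheel, the same context works unchanged in the single execution model: you call `run` directly and skip the cycling.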

Also covered in earlier posts: within a compute instance (a Pod), you can have cycling similar to the Wheel, also using data locality.  The difference here is that all the data being processed is from the same time slice.  Containers might be able to be swapped out during a flow of operations.



