Wednesday, November 10, 2021

Notes on the Uber language of compute

 


Just as not every neuron in the brain fires at once… such is the design of the fabric of compute, and the goal of the Uber Language of Compute is to approximate such a system.

Images of neural activity show only segments being active at any one time.  This is like the fabric of compute: the entire fabric can be viewed as all the possible combinations of all calculations running at the same time.  The reality is that you don't need all parts of the fabric allocated at the same time.  There can be some optimization even without resorting to serverless compute.

The "player piano" for scheduling becomes the key to batch operations - provisioning base compute resources ahead of need and shutting them when they are no longer needed



Not actually all that complex… you just have to know the approximate load and spin up nodes with node selectors pinned to a segment of work… so a whole set of compute materializes like a neuron firing in the brain or lightning flashing across the sky.  Just as the neuron's flash and the lightning are ephemeral, so are the compute resources for any given task.
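One way to make that segment of compute "materialize" is a Kubernetes Job whose pods carry a nodeSelector, so they only land on nodes labeled for that slice of work. A rough sketch using the official kubernetes Python client; the job name, image, label, and namespace are made up, and it assumes a reachable cluster whose autoscaler creates nodes carrying that label.

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside the cluster

# Pods from this Job will only schedule onto nodes labeled for this slice of work,
# e.g. nodes created just-in-time with the (hypothetical) label workload-segment=risk-batch-7.
job = client.V1Job(
    metadata=client.V1ObjectMeta(name="risk-batch-7"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                node_selector={"workload-segment": "risk-batch-7"},
                containers=[client.V1Container(name="calc", image="example.com/calc:latest")],
                restart_policy="Never",
            )
        )
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
# When the work is done, deleting the Job and scaling the node pool back to zero
# makes the whole burst of compute as ephemeral as the lightning flash.
```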




As we look at the nature of compute from the perspective of the brain, we can turn to physics to see how changing data affects calculations.  What prevents time travel and establishes the arrow of time?  The 2nd Law of Thermodynamics.





In order to calculate correctly you need precise and consistent measurements.  How do we prevent entropy?  We want the same answer when we run the calculation tomorrow as we got today.  In order to do this there can be no entropy in the data.  This means we need to use snapshots.  The exact same data used at the beginning of the calculation needs to be used at the end of the calculation.  This enables precise, reproducible results.
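A tiny sketch of what "zero entropy" looks like in code: resolve "latest" to a concrete snapshot id once, at the start of the run, and have every stage read that same frozen snapshot. The snapshot store, ids, and data here are all stand-ins.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SnapshotRef:
    dataset: str
    snapshot_id: str  # immutable version, e.g. a content hash or timestamp

def resolve_latest(dataset: str) -> SnapshotRef:
    # In a real system this would ask the data store which version is current.
    return SnapshotRef(dataset=dataset, snapshot_id="2021-11-10T06:00:00Z")

def load(ref: SnapshotRef) -> list[float]:
    # Reads always target the pinned snapshot_id, never "whatever is there now".
    return [1.0, 2.0, 3.0]  # stand-in data

def run_calculation() -> float:
    prices = resolve_latest("prices")   # pin once, at the start of the run
    early = sum(load(prices))           # the beginning of the calculation...
    late = sum(load(prices))            # ...and the end see identical data
    assert early == late                # same answer today, tomorrow, anytime
    return early

print(run_calculation())
```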


Snapshots Enable Reproducibility: Limit Entropy to Zero



There is a speed limit to a problem... At some point you can't compute any faster with parallelization.  How do you deliver updates faster than this speed limit allows?  The secret is to give the appearance of speed by providing more frequent updates.  It's the next best thing.

 

Each domino represents a time slice of data.  Each time slice is fed into the system.  If you want updates every 10 minutes but your calculation takes an hour, then running more concurrent calculations over time-sliced data is the answer.  This ends up looking like a Ferris wheel: compute is provisioned on the wheel, and as calculations finish, new calculations are started using the existing resources with new data.
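A sketch of the Ferris wheel in plain Python: if each calculation takes roughly 60 minutes but a new time slice arrives every 10, you need about 6 "cars" running at once, and a fixed-size worker pool reuses the same resources as each calculation finishes. Durations are scaled down to seconds so the sketch actually runs.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CALC_TIME = 6     # stands in for a 60-minute calculation
SLICE_EVERY = 1   # stands in for a new 10-minute time slice
CARS = CALC_TIME // SLICE_EVERY   # 6 cars on the wheel keeps the cadence

def calculate(time_slice: int) -> str:
    time.sleep(CALC_TIME)                  # the long-running batch calculation
    return f"slice {time_slice} done"

with ThreadPoolExecutor(max_workers=CARS) as wheel:
    futures = []
    for time_slice in range(12):           # dominoes: one slice per tick
        futures.append(wheel.submit(calculate, time_slice))
        time.sleep(SLICE_EVERY)            # next slice arrives before the last finishes
    for f in futures:
        print(f.result())                  # after the first full ride, a result lands every tick
```

Adding or removing Ferris wheel cars on the fly amounts to changing `max_workers` (or, in the real system, the size of the node pool behind the workers).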

Physical compute instances are pegged to a particular car in the Ferris wheel of compute, and when not needed, the instances come down.  What is interesting is that you can add and remove Ferris wheel cars on the fly, devoting more or fewer resources to a processing chain.




Complex hierarchies of compute structures feeding off centralized data stores (located directly next to the compute resources) resemble a Mandelbrot pattern that repeats itself: large data stores cache information and pass it down to smaller data stores.  At each level, compute is provisioned next to the data.
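A toy version of that repeating pattern: each level looks in its own local store first and only falls back to the larger store above it, filling its cache on the way down so the data ends up sitting next to the compute that uses it. Store names, keys, and data are made up.

```python
class TieredStore:
    """Each level caches what it pulls from the larger store above it."""

    def __init__(self, name: str, parent: "TieredStore | None" = None):
        self.name = name
        self.parent = parent
        self.cache: dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        if key in self.cache:                 # data already sits next to this compute
            return self.cache[key]
        if self.parent is None:
            raise KeyError(key)
        value = self.parent.get(key)          # one hop up the hierarchy
        self.cache[key] = value               # pass a copy down to this level
        return value

central = TieredStore("central")
central.cache["positions/2021-11-10"] = b"..."      # the authoritative large store
regional = TieredStore("regional", parent=central)
local = TieredStore("rack-local", parent=regional)  # compute reads from here

print(local.get("positions/2021-11-10"))  # first read walks up; later reads stay local
```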



Generally, batch operations do not need low-latency data access... and systems can be designed without it... BUT a system that can handle both batch and real-time without changing structure would consolidate the design.  You might not need low latency for batch, it is true, but it can't hurt.


If you want to stay in your comfortable world of single-execution batch, you can choose the blue pill, but if you want to process a stream of data, take the red pill.  The point is, you are staying in the same code base... not switching to another compute paradigm.
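The red pill / blue pill point can be made concrete with an iterator: the same processing code consumes either a finite batch or an endless stream, because both are just iterables of records. The function and record shapes below are purely illustrative.

```python
import itertools
import random
from typing import Iterable, Iterator

def process(records: Iterable[dict]) -> Iterator[dict]:
    """One code path: works on a bounded batch or an unbounded stream."""
    for record in records:
        yield {**record, "value": record["value"] * 2}  # the actual business logic

def batch_source() -> list[dict]:
    return [{"id": i, "value": i} for i in range(5)]    # blue pill: finite snapshot

def stream_source() -> Iterator[dict]:
    i = 0
    while True:                                         # red pill: never-ending ticks
        yield {"id": i, "value": random.random()}
        i += 1

for out in process(batch_source()):                     # batch run
    print(out)

for out in itertools.islice(process(stream_source()), 3):  # same code on a stream
    print(out)
```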

Is this a good choice?  Are there any examples out there of companies seeing value in a single code base?

Google’s choice in this trade-off was not to invest heavily into serverless solutions. Google’s persistent containers solution, Borg, is advanced enough to offer most of the serverless benefits (like autoscaling, various frameworks for different types of applications, deployment tools, unified logging and monitoring tools, and more). The one thing missing is the more aggressive scaling (in particular, the ability to scale down to zero), but the vast majority of Google’s resource footprint comes from high-traffic services, and so it’s comparably cheap to overprovision the small services. 

Thus, the benefits of having one common unified architecture for all of these things outweigh the potential gains for having a separate serverless stack for a part of the workloads.


from https://learning.oreilly.com/library/view/software-engineering-at/9781492082781/ch25.html#level_of_abstraction_serverless

As discussed above, the ability to use advanced scheduling can get non-serverless applications closer to "scale to zero."  Also, the driving factor in the use case above is access to low-latency data, making it possible to position compute next to the data and avoid network hops when they are not needed.

If your workload is truly unpredictable, then it's possible serverless will offer startup advantages for truly ad-hoc compute needs.  But if your business can project the types of calculations that need to be performed to a large degree, and keeping a small amount of pre-provisioned compute will cover the ad-hoc needs... then a non-serverless design might be simpler than forcing all your code to conform to serverless patterns.
