jmenke blog: Uber Language of Compute v2.0: AI powered design

A few months back, I was watching an interview with Eric Schmidt, and he was talking about the real impact AI will have beyond just generating code. As I recall—I don’t have a link to the video—he was saying that AI was going to be a game changer for people trying to invent new things. I thought the idea was interesting but never connected it to my previous thought experiment on the Uber Language of Compute.

As it turns out, Mr. Schmidt was, of course, 100% correct. When I apply this idea to my previous work, it’s unblocking me. I had run into an issue with the sheer amount of work that needed to be done to take the idea from concept to fruition, but that is changing. Here is the output from a quick 20-minute session with Grok exploring one of the ideas needed for the design:

When considering testing and modularity, the multi-step process (your original hot-swap approach with 20 separate containers) offers distinct advantages over the single-container approach (all steps in one container). Let’s break down the gains of the multi-step process in these areas, focusing on how it impacts development, maintenance, and deployment, while keeping the context of your sequential process with 20 containers, each needing 256GB of memory.

Multi-Step Process (Hot-Swap) Recap

20 containers, each handling one step, hot-swapped in a daisy-chain sequence.
At most 2 containers active at once (one in steady state at 256GB, two during transitions for a 512GB peak).
Uses OpenKruise for in-place updates and container management.
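
The memory figures in this recap can be checked with a small simulation. This is only a sketch under stated assumptions: the function name is hypothetical, it models the hand-off as "start the next container, then stop the previous one", and it does not call OpenKruise or any real cluster API.

```python
STEPS = 20
MEM_GB = 256  # per-container allocation, from the recap above


def hot_swap_memory_profile(steps: int, mem_gb: int) -> list[int]:
    """Return memory in use (GB) after each start/stop event in the chain.

    Assumption: the next container starts before the previous one stops,
    so two containers briefly overlap at every transition.
    """
    profile = []
    active = 0
    for step in range(1, steps + 1):
        active += 1                      # start this step's container
        profile.append(active * mem_gb)  # overlap with the previous step, if any
        if step > 1:
            active -= 1                  # previous container exits after hand-off
            profile.append(active * mem_gb)
    return profile


profile = hot_swap_memory_profile(STEPS, MEM_GB)
steady_gb = min(profile)  # one container running
peak_gb = max(profile)    # two containers overlapping during a swap
```

Under these assumptions the profile alternates between one and two live containers, which matches the 256GB steady state and 512GB transition peak.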

Single Container Recap

All 20 steps in one container, running sequentially.
Fixed memory allocation (~265GB peak and constant).
No swapping or dynamic updates.

Gains of Multi-Step Process for Testing

Testing is a critical phase, and splitting the process into multiple containers provides several benefits:

Isolated Testing
- Multi-Step: Each container (step) can be tested independently. You can spin up a single container (e.g., Step 7) with its 256GB allocation, feed it mock input from the scratch space, and verify its output without running the entire chain.
- Single Container: You must test the entire sequence or add logic (e.g., flags or conditionals) to isolate a step, which increases complexity and runtime (e.g., skipping steps 1-6 to test Step 7 still requires loading the full app).
- Gain: Faster, more focused testing cycles—test one 256GB container in ~10 minutes vs. potentially running a 200-minute sequence to reach a specific step.
Error Isolation
- Multi-Step: If Step 12 fails (e.g., crashes or produces bad output), you can pinpoint it immediately—logs and exit codes are per-container. Restarting or debugging just that container is straightforward.
- Single Container: A failure in Step 12 requires sifting through a monolithic log or adding step-specific tracing, and you might need to rerun the full 200 minutes to reproduce it.
- Gain: Reduced debugging time and clearer failure boundaries, improving mean-time-to-resolution (MTTR).
Parallel Test Development
- Multi-Step: Multiple developers or CI/CD pipelines can test different steps concurrently (e.g., Step 3 on one node, Step 15 on another), each with its own 256GB pod.
- Single Container: Testing is inherently serial—everyone works on the same monolithic image, potentially causing bottlenecks.
- Gain: Improved team productivity and faster test suite execution (e.g., 20 steps tested in parallel could finish in 10 minutes vs. 200 minutes serially).
Incremental Validation
- Multi-Step: You can validate the chain incrementally—run Steps 1-5, check the scratch space, then add Step 6, etc. This is natural with hot-swapping.
- Single Container: Incremental testing requires modifying the app to stop after a specific step, adding complexity.
- Gain: Easier validation during development, reducing the risk of late-stage integration issues.
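
To make the isolated-testing idea concrete, here is a minimal sketch of testing one step against mock input in a scratch directory. Everything in it is hypothetical: the `step7` function, the JSON file names, and the doubling logic stand in for whatever a real step would do, and a temporary directory stands in for the shared scratch space.

```python
import json
import tempfile
from pathlib import Path


def step7(scratch: Path) -> None:
    """Hypothetical Step 7: read Step 6's output, write its own."""
    records = json.loads((scratch / "step06.json").read_text())
    doubled = [r * 2 for r in records]
    (scratch / "step07.json").write_text(json.dumps(doubled))


def run_step_in_isolation(step, scratch: Path, mock_name: str, mock_data) -> None:
    """Seed the scratch space with mock upstream output, then run one step."""
    (scratch / mock_name).write_text(json.dumps(mock_data))
    step(scratch)


# Test Step 7 alone: no Steps 1-6 are run, only their output is faked.
with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp)
    run_step_in_isolation(step7, scratch, "step06.json", [1, 2, 3])
    result = json.loads((scratch / "step07.json").read_text())
```

The same harness could be pointed at any other step by swapping the function and the mock file, which is the property the multi-step layout is meant to provide.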

Gains of Multi-Step Process for Modularity

Modularity refers to the ability to develop, update, and maintain components independently. The multi-step approach shines here:

Independent Development
- Multi-Step: Each step is a separate container with its own codebase, dependencies, and image. Developers can work on Step 9 (e.g., a Python app) without touching Step 10 (e.g., a Java app).
- Single Container: All steps share one image, forcing a unified tech stack or complex multi-language builds (e.g., embedding Python, Java, and Go in one Dockerfile).
- Gain: Flexibility in tools and languages per step, plus reduced merge conflicts—each container is a standalone module.
Smaller, Focused Images
- Multi-Step: Each container can be optimized for its step (e.g., Step 1 uses a 100MB base image with specific libs, Step 2 uses a 50MB image). Total image size might be 1-2GB across 20 containers.
- Single Container: One image must include all dependencies for all steps, potentially ballooning to 5-10GB if steps have diverse needs.
- Gain: Smaller images per step mean faster builds, pulls, and deployments, plus lower storage overhead.
Granular Updates
- Multi-Step: If Step 14 needs a bug fix or upgrade, you update only its image (e.g., my-app:step14-v2, since an image reference allows only one tag separator) and hot-swap it in. The chain continues without rebuilding everything.
- Single Container: Any change (e.g., fixing Step 14) requires rebuilding and redeploying the entire 265GB container, even if other steps are unchanged.
- Gain: Faster update cycles and lower risk—deploy a 100MB image vs. a 5GB one, with no impact on unrelated steps.
Reusability
- Multi-Step: A step (e.g., Step 3: data preprocessing) can be reused in another workflow by referencing its container image.
- Single Container: Steps are tightly coupled in one app, making reuse harder without extracting logic into a separate library or service.
- Gain: Higher code reuse potential, reducing duplication across projects.
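
A granular update can be pictured as changing one entry in a step-to-image mapping. The sketch below is illustrative only: the image names and the `update_step` helper are hypothetical, and a real deployment would apply the change through its container tooling rather than a Python dict.

```python
# Hypothetical pipeline definition: one image reference per step.
pipeline = {f"step{n:02d}": f"my-app:step{n:02d}-v1" for n in range(1, 21)}


def update_step(pipeline: dict[str, str], step: str, new_image: str) -> dict[str, str]:
    """Return a copy of the pipeline with exactly one step's image replaced."""
    if step not in pipeline:
        raise KeyError(f"unknown step: {step}")
    updated = dict(pipeline)
    updated[step] = new_image
    return updated


new_pipeline = update_step(pipeline, "step14", "my-app:step14-v2")

# Only Step 14 differs; the other 19 image references are untouched.
changed = [s for s in pipeline if pipeline[s] != new_pipeline[s]]
```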

Quantitative Estimates

Let’s put some numbers to these gains, assuming each step takes 10 minutes to run and test, and development involves 5 developers:

Testing Gains

Test Time per Step:
- Multi-Step: 10 minutes to test one container.
- Single Container: Up to 200 minutes to test a late step (e.g., Step 20), or ~100 minutes on average across steps if earlier steps must run first.
- Gain: ~10-20x faster per-step testing (10 vs. 100-200 minutes).
Parallel Testing:
- Multi-Step: 20 steps tested by 5 developers in ~40 minutes (4 steps each, 10 minutes).
- Single Container: 200 minutes serial, no parallelism.
- Gain: ~5x faster test suite completion.
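
The testing numbers above follow from simple arithmetic, reproduced here as a quick check. One assumption is mine rather than the source's: the single-container average is computed as if reaching step k means running steps 1 through k with no skip logic.

```python
import math

STEPS = 20
MINUTES_PER_STEP = 10
DEVELOPERS = 5

# Multi-step: any step is tested on its own.
multi_per_step = MINUTES_PER_STEP

# Single container (assumed: no skip logic, so step k costs k * 10 minutes).
single_worst = STEPS * MINUTES_PER_STEP
single_avg = sum(k * MINUTES_PER_STEP for k in range(1, STEPS + 1)) / STEPS

# Suite time: steps split evenly across developers vs. one serial run.
steps_per_dev = math.ceil(STEPS / DEVELOPERS)
multi_suite = steps_per_dev * MINUTES_PER_STEP
single_suite = STEPS * MINUTES_PER_STEP
suite_speedup = single_suite / multi_suite
```

With these inputs the worst case is 200 minutes, the average comes out slightly above the quoted ~100 minutes, and the suite speedup is 5x.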

Modularity Gains

Build Time:
- Multi-Step: 20 containers, 5 minutes each to build = 100 minutes total, parallelizable to 20 minutes with 5 builders.
- Single Container: 1 build, 20-30 minutes (larger image, all deps).
- Gain: ~1-1.5x faster builds with parallelism.
Update Deployment:
- Multi-Step: 1-minute swap for a 100MB image.
- Single Container: 5-10 minutes to redeploy a 5GB image.
- Gain: 5-10x faster updates.
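
The build and update ratios can be checked the same way. This sketch only restates the estimates above; the variable names are hypothetical, and the ranges are kept as (low, high) pairs.

```python
import math

CONTAINERS = 20
BUILD_MIN_EACH = 5
BUILDERS = 5
SINGLE_BUILD_MIN = (20, 30)  # (low, high) estimate for the monolithic image

serial_build = CONTAINERS * BUILD_MIN_EACH
parallel_build = math.ceil(CONTAINERS / BUILDERS) * BUILD_MIN_EACH
build_gain = tuple(s / parallel_build for s in SINGLE_BUILD_MIN)

SWAP_MIN = 1
REDEPLOY_MIN = (5, 10)  # (low, high) estimate for redeploying the large image
update_gain = tuple(r / SWAP_MIN for r in REDEPLOY_MIN)
```

Note that the low end of the build gain is 1.0, meaning parallel per-step builds only break even against the fastest single-image build.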

Trade-Offs of Multi-Step Process

Complexity: Requires OpenKruise, a custom controller, and swap logic—higher setup and maintenance cost vs. the single container’s simplicity.
Runtime: 220 minutes (with 1-minute transitions) vs. 200 minutes (single container), a 10% increase.
Peak Memory: 512GB vs. 265GB, a 93% increase during transitions.
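
The runtime and memory percentages can be reproduced as follows. The sketch assumes one 1-minute transition per step (20 in total, counting the initial start), which is what makes the 220-minute figure work out.

```python
STEPS = 20
STEP_MIN = 10
TRANSITION_MIN = 1  # assumed: one transition per step, including the first start

single_runtime = STEPS * STEP_MIN
multi_runtime = STEPS * (STEP_MIN + TRANSITION_MIN)
runtime_increase_pct = round(100 * (multi_runtime - single_runtime) / single_runtime)

PEAK_MULTI_GB = 2 * 256   # two containers overlapping during a swap
PEAK_SINGLE_GB = 265      # figure quoted for the single container
memory_increase_pct = round(100 * (PEAK_MULTI_GB - PEAK_SINGLE_GB) / PEAK_SINGLE_GB)
```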

Conclusion: Gains for Testing and Modularity

The multi-step hot-swap process offers significant gains in testing and modularity:

Testing: 10-20x faster per-step testing, 5x faster suite execution, and better error isolation—ideal for iterative development and CI/CD.
Modularity: Independent development, smaller images, granular updates, and reusability—perfect for a team or evolving process.

If testing speed and modularity are priorities (e.g., frequent updates, multiple developers, or reusable components), the multi-step approach outweighs its added complexity, ~10% longer runtime, and near-doubled peak memory during transitions. If simplicity and minimal resource use are key (e.g., a stable, one-off process), the single container wins.

What’s your focus—rapid development and testing, or deployment simplicity? I can tailor further advice based on that!

Sunday, April 06, 2025
