jmenke blog: Uber Language of Compute: Hot swapping containers in daisy chain

In about 10 minutes... Grok was able to advance the design of the hot-swapping portion a little bit...

Your idea of designing a system to "hot swap" containers within a Kubernetes pod using OpenKruise’s advanced features is intriguing and aligns well with some of the capabilities OpenKruise provides, particularly its support for in-place updates and sidecar management. Let’s break this down step-by-step to explore how you could implement this, leveraging OpenKruise, while considering memory footprint optimization and a daisy-chain process.

Understanding the Goal

You want to:

Avoid starting all containers in a pod simultaneously to reduce memory usage.
"Hot swap" containers in a sequence (daisy-chain style), keeping only two active at a time.
Share a pod’s scratch space (e.g., an emptyDir volume) across this chain for data handoff.

This approach could minimize the memory footprint by ensuring that only a subset of containers is running at any given time, with each container passing its output to the next via shared storage.

Leveraging OpenKruise Features

OpenKruise extends Kubernetes with advanced workload management capabilities that could support your design. Here’s how its features align with your idea:

In-Place Updates:
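- OpenKruise supports in-place updates. Its workloads (for example CloneSet and Advanced StatefulSet) can change a container’s image in an existing pod instead of recreating the pod, and the kubelet then restarts only that container. This is faster and less resource-intensive than the default Kubernetes workload behavior of replacing pods whenever the pod template changes.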
- You could use this to swap out one container for another in the sequence without disrupting the pod or the shared scratch space.
Container Launch Priority:
- OpenKruise’s Container Launch Priority feature lets you define the startup order of containers within a pod. If you annotate the pod with apps.kruise.io/container-launch-priority: Ordered, containers start sequentially in the order they appear in the spec.containers list.
- This covers the initial ordering of the chain (C1 before the placeholder). It governs pod startup only, so the later swaps still have to be sequenced by your own control logic.
SidecarSet for Sidecar Container Management:
- OpenKruise’s SidecarSet resource injects sidecar containers into pods when the pods are created, and manages them independently of the main workload. It cannot add containers to a pod that is already running, because Kubernetes does not allow that. Every slot in the chain therefore has to exist from the start.
- It also supports in-place image updates for the sidecars it has injected, which could be adapted to swap containers without pod recreation.
Pod Lifecycle Hooks:
- OpenKruise workloads such as CloneSet expose lifecycle hooks (preDelete and inPlaceUpdate). These hooks hold a pod in a PreparingDelete or PreparingUpdate state until a controller clears a label or finalizer. A custom controller could use the inPlaceUpdate hook to confirm that one container has finished handing off its data before the next image is swapped in.

Proposed Design

Here’s a high-level design for your hot-swapping system using OpenKruise:

Pod Setup

Define a pod with an emptyDir volume as the shared scratch space.
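Initially, include two containers: the first "active" container (C1) and a lightweight "placeholder" container (P) that does minimal work (e.g., sleeps). The placeholder should not simply exit, because under the pod’s default restartPolicy: Always the kubelet restarts any container that exits.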

Daisy-Chain Process

Step 1: Start with C1 and P
- C1 runs its task and writes output to the shared emptyDir volume.
- Use OpenKruise’s Container Launch Priority to ensure C1 starts before P.
Step 2: Swap P for C2
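- Once C1 completes (detected via a readiness probe or custom logic), patch the pod in place to replace P’s image with the second container’s image (C2). Kubernetes allows spec.containers[].image to be changed on a running pod and restarts only that container. OpenKruise’s in-place update automates the same kind of change at the workload level.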
- C2 starts, reads C1’s output from the scratch space, and begins its task.
Step 3: Swap C1 for C3
- When C2 is done, update the pod spec again to replace C1 (now idle) with the third container (C3).
- C3 reads C2’s output from the scratch space and proceeds.
Repeat
- Continue this pattern, alternating between the two container slots, swapping out the idle container for the next in the chain.
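The alternation in the steps above can be sketched as a small planning function. This is a minimal illustration, not an OpenKruise feature. The slot names container-1 and placeholder match the example Pod spec in this post, and the stage image names are hypothetical.

```python
from typing import List, Tuple

# Slot names match the example Pod spec; stage images are hypothetical.
SLOTS = ("container-1", "placeholder")


def swap_plan(stages: List[str]) -> List[Tuple[str, str]]:
    """Return the (slot, image) patches that walk a pod through `stages`.

    Stage 0 is already running in slot 0 when the pod starts, so it needs no
    patch. Every later stage i goes into slot i % 2, which is the slot NOT
    holding stage i - 1, so the previous stage is left alone while its
    successor starts and reads its output from the shared emptyDir.
    """
    return [(SLOTS[i % 2], image) for i, image in enumerate(stages) if i > 0]


if __name__ == "__main__":
    for slot, image in swap_plan(["my-app:c1", "my-app:c2", "my-app:c3", "my-app:c4"]):
        print(f"patch {slot} -> {image}")
```

Planning the swaps up front keeps the controller's job down to "wait, then apply the next patch".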

Memory Footprint Optimization

By keeping only two containers active (one working, one as a placeholder or in transition), you limit the memory footprint to the needs of those two containers at any given time.
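The placeholder container can be a minimal image (e.g., busybox-based) that just sleeps, consuming negligible resources until swapped. Only image can be changed on a running pod, so the sleep should come from the placeholder image’s default entrypoint rather than a command: override. An override would stay in effect after the next stage’s image is swapped into that slot.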
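To make the memory argument concrete, here is a small calculation over hypothetical per-stage memory figures in MiB. It assumes an idle container uses negligible memory. It also separates actual usage from what the scheduler reserves: requests are fixed per container, so each slot has to be sized for the largest stage it will host.

```python
from typing import Dict, List


def memory_profile(stage_mib: List[int]) -> Dict[str, int]:
    """Compare memory figures (MiB) for a chain of stages run in order.

    all_at_once:      every stage runs at the same time, so usage is the sum.
    two_active_peak:  only stage i and stage i + 1 overlap, so the worst case
                      is the largest adjacent pair (idle usage assumed ~0).
    two_slot_reserve: stage i runs in slot i % 2, and each slot's memory
                      request must fit the largest stage it ever hosts.
    """
    pairs = [a + b for a, b in zip(stage_mib, stage_mib[1:])]
    return {
        "all_at_once": sum(stage_mib),
        "two_active_peak": max(pairs, default=sum(stage_mib)),
        "two_slot_reserve": max(stage_mib[0::2], default=0)
        + max(stage_mib[1::2], default=0),
    }


if __name__ == "__main__":
    # Hypothetical per-stage figures, not measurements.
    print(memory_profile([1024, 128, 128, 1024]))
```

Take hypothetical stages of 1024, 128, 128 and 1024 MiB:

- All at once: 2304 MiB.
- Two-active peak: 1152 MiB.
- Two-slot reservation: 2048 MiB.

The scheduler-visible savings therefore depend on how the stages fall across the two slots, not just on how many run at once.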

Example Workflow

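```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hot-swap-pod
  annotations:
    apps.kruise.io/container-launch-priority: Ordered
spec:
  containers:
  - name: container-1
    image: my-app:c1
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  - name: placeholder
    # No `command` override: only `image` can be changed on a running Pod, so an
    # override would keep applying after my-app:c2 is swapped into this slot.
    # my-app:idle is a hypothetical minimal image whose ENTRYPOINT just sleeps.
    image: my-app:idle
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir: {}
```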

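Step 1: C1 runs, writes to /scratch, and signals completion, for example by writing a marker file and then idling. An exit code is a poor signal here: under the default restartPolicy: Always, the kubelet restarts a container that exits, which would rerun C1’s task.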
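Here is a stage-side sketch of file-based signaling. It assumes a hypothetical naming convention under /scratch: <stage>.out for the output and <stage>.done as the completion marker. The helper names are illustrative only. The output is written to a temporary file and renamed into place before the marker appears, so a successor that sees the marker never reads a half-written file.

```python
import os
import time
from pathlib import Path

# Hypothetical convention: <stage>.out holds the output, <stage>.done marks completion.


def publish_output(scratch: Path, stage: str, data: bytes) -> None:
    """Write this stage's output, then create its completion marker."""
    tmp = scratch / f".{stage}.out.tmp"
    tmp.write_bytes(data)
    os.replace(tmp, scratch / f"{stage}.out")  # atomic rename within the emptyDir
    (scratch / f"{stage}.done").touch()        # marker only after output is complete


def wait_for_input(scratch: Path, prev_stage: str,
                   timeout_s: float = 600.0, poll_s: float = 1.0) -> bytes:
    """Block until the previous stage's marker exists, then read its output."""
    deadline = time.monotonic() + timeout_s
    marker = scratch / f"{prev_stage}.done"
    while not marker.exists():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no completion marker from {prev_stage}")
        time.sleep(poll_s)
    return (scratch / f"{prev_stage}.out").read_bytes()


def idle_forever() -> None:
    """Sleep after finishing: an exited container would be restarted and rerun."""
    while True:
        time.sleep(3600)
```

A stage would then follow this pattern:

1. Call wait_for_input(Path("/scratch"), "c1").
2. Do its work.
3. Call publish_output(Path("/scratch"), "c2", result).
4. Call idle_forever().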

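Step 2: Patch the pod in place, for example with kubectl patch. A strategic merge patch matches containers by name, and OpenKruise’s in-place update performs the same kind of image change for its workloads: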

```yaml
spec:
  containers:
  - name: container-1
    image: my-app:c1  # Still present but idle
  - name: placeholder
    image: my-app:c2  # Swapped in
```
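Containers in a strategic merge patch are matched by name, so the patch for a swap only needs the container that changes. Here is a minimal sketch of building that body; the helper name is illustrative.

```python
import json
from typing import Any, Dict


def image_swap_patch(container: str, image: str) -> Dict[str, Any]:
    """Strategic-merge patch body that changes one container's image.

    Containers are matched by `name`, so the other container is untouched.
    `image` is the only field of an existing container that a running Pod
    lets you change; the kubelet restarts just that container.
    """
    return {"spec": {"containers": [{"name": container, "image": image}]}}


if __name__ == "__main__":
    print(json.dumps(image_swap_patch("placeholder", "my-app:c2")))
```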

Step 3: After C2 finishes, patch again:

```yaml
spec:
  containers:
  - name: container-1
    image: my-app:c3  # Swapped in
  - name: placeholder
    image: my-app:c2  # Idle or revert to placeholder
```

Implementation Considerations

Control Logic:
- You’ll need a controller or script to monitor container states (e.g., via Kubernetes events or probes) and trigger the in-place updates. OpenKruise doesn’t natively automate this daisy-chain sequence, so you’d extend it with custom logic.
OpenKruise Integration:
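- Use CloneSet or Advanced StatefulSet with in-place update strategies (InPlaceIfPossible) to manage the pod lifecycle. These strategies act on changes to the workload’s pod template, so each swap becomes a template edit that moves every replica to the next stage together.
- Alternatively, explore SidecarSet to update injected containers in place. It only injects containers at pod creation time, though, and is typically used for sidecars rather than primary app containers.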
Shared Scratch Space:
- The emptyDir volume persists for the lifetime of the pod, including restarts of individual containers such as those triggered by in-place image swaps. That makes it ideal for passing data between containers without external storage.
Resource Limits:
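- Set CPU/memory requests and limits on each container. The scheduler reserves the sum of both containers’ requests for the whole life of the pod, so size each slot for the largest stage it will host. Changing requests per stage would require in-place pod resizing (the InPlacePodVerticalScaling feature gate).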
Error Handling:
- If a container fails, ensure the system can revert to a placeholder or retry the swap without disrupting the chain.
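The control logic and error handling above can be combined into one loop. The sketch below assumes two hypothetical caller-supplied hooks:

- patch(container, image): performs one in-place swap. It could be a thin wrapper around the official Python client’s CoreV1Api.patch_namespaced_pod(name, namespace, body).
- is_done(i): reports whether stage i has signaled completion, for example by using kubectl exec to test for a marker file.

The placeholder image name is also hypothetical. Failed swaps are retried, and then the slot is reverted to the idle image.

```python
import time
from typing import Callable, List, Sequence

# Hypothetical hooks supplied by the caller:
#   patch(container, image) performs one in-place image swap (raises on error)
#   is_done(i) returns True once stage i has signalled completion
PatchFn = Callable[[str, str], None]
DoneFn = Callable[[int], bool]


def _wait(cond: Callable[[], bool], poll_s: float, timeout_s: float) -> None:
    deadline = time.monotonic() + timeout_s
    while not cond():
        if time.monotonic() >= deadline:
            raise TimeoutError("stage did not signal completion in time")
        time.sleep(poll_s)


def _swap_with_retry(patch: PatchFn, slot: str, image: str, fallback: str,
                     attempts: int, delay_s: float, log: List[str]) -> None:
    for attempt in range(1, attempts + 1):
        try:
            patch(slot, image)
            log.append(f"{slot} -> {image}")
            return
        except Exception as exc:  # any API or transport error counts as a failed attempt
            log.append(f"{slot} -> {image} failed ({attempt}/{attempts}): {exc}")
            if attempt < attempts:
                time.sleep(delay_s)
    # Every attempt failed: best-effort revert of the slot to the idle image
    # (if this patch also fails, its exception propagates instead).
    patch(slot, fallback)
    log.append(f"{slot} -> {fallback} (reverted)")
    raise RuntimeError(f"could not swap {image} into {slot}")


def run_chain(stages: Sequence[str], patch: PatchFn, is_done: DoneFn,
              slots: Sequence[str] = ("container-1", "placeholder"),
              idle_image: str = "my-app:idle",  # hypothetical placeholder image
              attempts: int = 3, poll_s: float = 5.0,
              timeout_s: float = 3600.0) -> List[str]:
    """Drive the daisy chain; stage 0 is assumed to be running in slots[0]."""
    log: List[str] = []
    if not stages:
        return log
    for i in range(1, len(stages)):
        # Wait for the previous stage, then swap the next one into the other slot.
        _wait(lambda: is_done(i - 1), poll_s, timeout_s)
        _swap_with_retry(patch, slots[i % 2], stages[i], idle_image,
                         attempts, poll_s, log)
    _wait(lambda: is_done(len(stages) - 1), poll_s, timeout_s)
    return log
```

Passing patch and is_done in as functions keeps the sequencing logic independent of how the cluster is reached, which also makes it easy to test without a cluster.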

Benefits

Memory Efficiency: Only two containers are active at once, reducing the peak memory footprint compared to starting all containers simultaneously.
Speed: In-place updates avoid pod recreation, minimizing downtime and resource churn.
Flexibility: The daisy-chain approach can scale to any number of containers in sequence, limited only by your control logic.

Challenges

Complexity: Managing the swap sequence requires additional orchestration beyond OpenKruise’s out-of-the-box features.
State Detection: Accurately detecting when a container completes its task (to trigger the next swap) may require custom probes or signaling mechanisms.
Kubernetes Compatibility: Ensure your cluster supports OpenKruise and any required feature gates (e.g., InPlacePodVerticalScaling if adjusting resources).
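For state detection from outside the pod, the pod’s status shows whether a swap has taken effect. The sketch below assumes the JSON object returned by kubectl get pod NAME -o json. It checks whether a slot reports the newly swapped image and is in the running state. Container runtimes may report a normalized image reference (for example with a docker.io/library/ prefix), so the image comparison is a heuristic rather than an exact match.

```python
from typing import Any, Dict


def swapped_and_running(pod: Dict[str, Any], container: str, image: str) -> bool:
    """True once `container` reports `image` and is in the running state."""
    for status in pod.get("status", {}).get("containerStatuses", []):
        if status.get("name") != container:
            continue
        reported = status.get("image", "")
        # Heuristic: accept an exact match or a registry-prefixed reference.
        image_ok = reported == image or reported.endswith("/" + image)
        return image_ok and "running" in status.get("state", {})
    return False
```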

Next Steps

Prototype: Set up a small proof-of-concept with OpenKruise installed in a Kubernetes cluster. Test with a simple chain of 3-4 containers.
Automate: Build a custom controller or use a script with kubectl patch to automate the swapping process based on container status.
Measure: Compare memory usage and performance against a baseline where all containers start at once.
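For the script route, here is a minimal sketch that shells out to kubectl patch. Passing an argument list avoids shell-quoting problems with the JSON patch. The pod name and default namespace are assumptions taken from the example above. kubectl set image pod/hot-swap-pod placeholder=my-app:c2 is an alternative for the same single-image change.

```python
import json
import subprocess
from typing import List


def kubectl_swap_cmd(pod: str, container: str, image: str,
                     namespace: str = "default") -> List[str]:
    """Build the `kubectl patch` argv for one in-place image swap."""
    patch = {"spec": {"containers": [{"name": container, "image": image}]}}
    return ["kubectl", "patch", "pod", pod, "-n", namespace,
            "--type", "strategic", "-p", json.dumps(patch)]


def swap(pod: str, container: str, image: str, namespace: str = "default") -> None:
    # check=True raises CalledProcessError if kubectl exits non-zero.
    subprocess.run(kubectl_swap_cmd(pod, container, image, namespace), check=True)
```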

This design leverages OpenKruise’s strengths effectively while pushing its capabilities into a creative use case. If you’d like, I can help refine the YAML configs or brainstorm the control logic further! What do you think about this approach?

Sunday, April 06, 2025
