Saturday, October 26, 2019

IPC, TCP, and POSIX Shared Memory

It seems that, using the tools available to us, we can create "compute units" with Kubernetes that share pinned CPU resources and can communicate efficiently. But which mechanism should carry that communication: IPC, the localhost loopback, or POSIX shared memory?





In the above diagram, a "QB" container would have a set of channels that provide the blueprint for a data flow, allowing multiple containers to collaborate on solving a problem. These containers would be pinned to a set of CPUs.

This explains how to allocate a socket of CPUs to a Kubernetes Pod.
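Assuming the mechanism behind that link is the kubelet CPU Manager (an assumption on my part), the mechanics are: with the static policy enabled on the kubelet, a container receives exclusive, pinned CPUs when its Pod is in the Guaranteed QoS class and it requests an integer number of CPUs. A minimal sketch of such a Pod spec (the name, image, and CPU count are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qb-worker                  # illustrative name
spec:
  containers:
  - name: worker
    image: example/worker:latest   # placeholder image
    resources:
      requests:
        cpu: "4"                   # integer CPU request...
        memory: "1Gi"
      limits:
        cpu: "4"                   # ...equal to the limit => Guaranteed QoS,
        memory: "1Gi"              # so the static CPU Manager pins 4 exclusive CPUs
```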


This explains how container-to-container communication is possible via IPC.


Looking for benchmarks, I came across this link:


It shows that TCP has higher latency and much lower throughput.

This benchmark: https://github.com/rigtorp/ipc-bench provides latency and throughput tests for TCP sockets, Unix Domain Sockets (UDS), and PIPEs.
Here are the results on a single-CPU 3.3 GHz Linux machine:

TCP average latency: 6 us

UDS average latency: 2 us

PIPE average latency: 2 us

TCP average throughput: 0.253702 million msg/s

UDS average throughput: 1.733874 million msg/s

PIPE average throughput: 1.682796 million msg/s
The link shows further information about a Redis benchmark comparing TCP and Unix domain sockets. The Redis report had one interesting condition: they note that the difference only matters when throughput is high.




Since containers in a Pod can share volumes (and the same network namespace), it would seem to follow that using the localhost loopback to send events from container to container should be acceptable in a compute system, as long as the only data transmitted is the messaging signal (not a high-throughput payload). Instead of passing all the data across the TCP link, only a signal that an event has happened would be passed, as in the sketch below.
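To make the idea concrete, here is a minimal Go sketch of loopback signaling, with both halves collapsed into one process for brevity; in a real Pod, the listener and dialer would be separate containers. The port number and one-byte message format are assumptions for illustration:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// "Worker" side: listen on the Pod-local loopback interface.
	// Port 9000 is an arbitrary choice for this sketch.
	ln, err := net.Listen("tcp", "127.0.0.1:9000")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	// "QB" side: dial the worker and send only the event signal,
	// not the data itself (the data stays in shared storage).
	go func() {
		conn, err := net.Dial("tcp", "127.0.0.1:9000")
		if err != nil {
			panic(err)
		}
		defer conn.Close()
		conn.Write([]byte{1}) // one byte: "an event has happened"
	}()

	conn, err := ln.Accept()
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	buf := make([]byte, 1)
	if _, err := conn.Read(buf); err != nil {
		panic(err)
	}
	fmt.Println("event signal received; now read the actual data from shared storage")
}
```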

This sounds a lot like Go channels. The QB (quarterback) container would orchestrate the process of signaling other containers, using a graph of channels to do so. Could channels also be used to control access to the "shared" data? It would seem that, just as channels can simplify synchronization for in-process calls, they could also orchestrate a series of distributed calls.
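To make the analogy concrete, here is a minimal in-process Go sketch of the QB pattern: a coordinator owns the channel graph (a simple linear pipeline here), workers block until signaled, and exclusive access to the shared buffer is granted stage by stage over the channels rather than with locks. All names and the pipeline shape are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// signal carries no payload: the "data" lives in shared storage and
// only the event notification travels over the channel.
type signal struct{}

func worker(id int, start <-chan signal, done chan<- signal, shared []byte, wg *sync.WaitGroup) {
	defer wg.Done()
	<-start               // block until the QB signals this stage
	shared[id] = byte(id) // safe: the QB grants exclusive access per stage
	fmt.Printf("worker %d processed its slice\n", id)
	done <- signal{} // tell the QB this stage is complete
}

func main() {
	const n = 3
	shared := make([]byte, n) // stand-in for data in shared memory or a volume

	start := make([]chan signal, n)
	done := make(chan signal)
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		start[i] = make(chan signal)
		wg.Add(1)
		go worker(i, start[i], done, shared, &wg)
	}

	// The QB walks its "channel graph": here, a simple linear pipeline.
	for i := 0; i < n; i++ {
		start[i] <- signal{}
		<-done // wait for stage i before signaling stage i+1
	}
	wg.Wait()
	fmt.Println("pipeline complete:", shared)
}
```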


It should be noted, though (as per the Mirantis link):

Containers in a Pod share the same IPC namespace, which means they can also communicate with each other using standard inter-process communications such as SystemV semaphores or POSIX shared memory.

It would seem that POSIX shared memory would be even faster than filesystem access for sharing data between containers, since the data lives in memory and would not require disk reads.

See:  https://www.softprayog.in/programming/interprocess-communication-using-posix-shared-memory-in-linux
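To illustrate (a sketch, not from the linked article): on Linux, shm_open("/qb-events", ...) amounts to creating a file on the /dev/shm tmpfs, so a Go process can use plain open/ftruncate/mmap via golang.org/x/sys/unix. For two containers in the same Pod to map the same object, they typically also need a shared mount at /dev/shm (for example, an emptyDir volume with medium: Memory). The object name and size here are illustrative:

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	const size = 4096
	// On Linux, POSIX shm_open("/qb-events", ...) is equivalent to
	// creating a file under /dev/shm, a RAM-backed tmpfs: no disk I/O.
	fd, err := unix.Open("/dev/shm/qb-events", unix.O_CREAT|unix.O_RDWR, 0600)
	if err != nil {
		panic(err)
	}
	defer unix.Close(fd)

	// Size the shared memory object.
	if err := unix.Ftruncate(fd, size); err != nil {
		panic(err)
	}

	// Map it into this process's address space; another container in the
	// same Pod that mounts the same /dev/shm can map the same object and
	// see writes immediately.
	data, err := unix.Mmap(fd, 0, size, unix.PROT_READ|unix.PROT_WRITE, unix.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer unix.Munmap(data)

	copy(data, []byte("event: stage-1 complete"))
	fmt.Println("wrote signal to shared memory")
}
```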

This brings up an interesting question: what about SSDs?

Related Links:

The Job object is not designed to support closely-communicating parallel processes, as commonly found in scientific computing. It does support parallel processing of a set of independent but related work items.

https://kubernetes.io/docs/tasks/job/fine-parallel-processing-work-queue/

https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#job-patterns


Note: In considering the QB scenario above, since the CPUs are pinned to a Pod's workload, CPU utilization might suffer if the workflow has stages where fewer containers are active. CPU utilization would be tied to the "breadth" of the channel graph at any given point. The positive effects of pinning CPUs to Pod workflows could be offset by this utilization problem.


Relative speed of operations 2019


