Saturday, May 09, 2020

Getting down to the Calculations

Options and more Options - which language provides the best starting point?




So far, we have been exploring how to break up problem sets into pieces and how to orchestrate a flow of operations.  All of that sits at the infrastructure level, the level of the Execution DSL.  There is no business logic so far.

The question becomes: where do the calculations get performed?  There are a few possible answers.

Logic in Java Containers:  Java is versatile and can call C++ / C via JNI.

Jorg Shad has some interesting information on this:  https://jaxenter.com/nobody-puts-java-container-139373.html

Logic in Go Containers:  Go can make ultra-small containers that spin up fast.  Go can call C via cgo:  https://golang.org/cmd/cgo/

Logic in Python Containers:  Python is a very popular tool for data science and can also integrate with C:  https://realpython.com/python-bindings-overview/
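As a rough illustration of what calling C from Python can look like, here is a minimal sketch using ctypes from the standard library.  It assumes a POSIX system (Linux or macOS) where the C standard library can be loaded, and the helper name c_strlen is made up for this example.  It is only one of several binding approaches covered in the article linked above.

```python
import ctypes
import ctypes.util

# Locate the C standard library. On some minimal Linux images this lookup
# returns None; CDLL(None) then falls back to symbols already loaded into
# the process, which include libc on POSIX systems (assumption: POSIX).
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path)

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Return the UTF-8 byte length of text, computed by C's strlen."""
    return libc.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("calculations"))
```

Declaring argtypes and restype up front lets ctypes check and convert the arguments instead of guessing, which matters more once the C functions take doubles or pointers.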

All of the languages above have their strong points, but all were developed as general-purpose languages from the start.  There is another option that can sometimes avoid having to drop down to C for speed:  Julia

Julia was designed for data science from the ground up.  Sounds interesting, right?  Maybe it's worth a look?

Here is info on running Julia in Kubernetes:

https://cloud4scieng.org/2018/12/13/julia-distributed-computing-in-the-cloud/

And here is info on how Julia solves the "two-language problem":

https://thebottomline.as.ucsb.edu/2018/10/julia-a-solution-to-the-two-language-programming-problem
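The two-language problem can be seen in miniature from Python itself.  The sketch below, with made-up function names, computes the same sum twice: once in an interpreted Python loop and once with the built-in sum, which CPython implements in C.  The timings will vary by machine, so none are quoted here; the point is only that the fast path is the one written in the other language.

```python
import timeit


def loop_sum(n: int) -> int:
    """Add up 0..n-1 one step at a time in the Python interpreter."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n: int) -> int:
    """Add up 0..n-1 using sum(), whose loop runs in C inside CPython."""
    return sum(range(n))


if __name__ == "__main__":
    n = 1_000_000
    # Both functions must agree before the timings mean anything.
    assert loop_sum(n) == builtin_sum(n)
    for fn in (loop_sum, builtin_sum):
        seconds = timeit.timeit(lambda: fn(n), number=10)
        print(f"{fn.__name__}: {seconds:.3f} s for 10 runs")
```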

It seems that people are saying you don't need to go down to the C level to get the work done:

In a case study from the Federal Reserve:  https://juliacomputing.com/case-studies/ny-fed.html

The Federal Reserve Bank of New York uses Julia to:

Estimate models 10x faster
Complete 'solve' test 11x faster
Reduce number of lines of code in half, saving time, increasing readability and reducing errors

It should be noted that Julia can also call C and Fortran:  https://docs.julialang.org/en/v1/manual/calling-c-and-fortran-code/

And Julia can call Python:  https://github.com/JuliaPy/PyCall.jl

And Julia can call Java:  https://juliainterop.github.io/JavaCall.jl/


A fast language designed from the ground up for data science that has built-in support for distributed programming and can run on Kubernetes with GPU support?  A language that could possibly integrate with legacy code bases via language interoperability features?  This is definitely worth a look.

A commercial solution based on Julia is here:

https://juliacomputing.com/products/juliarun



