Monday, July 06, 2020

Is intra-job data repositioning possible?

 Could the Data DSL be used for this?




Distributed Job A --> produces results.
Distributed Job B needs these results, but organized by different rules than the ones Job A used.

Can this data be moved so that Job B has the benefit of data locality?

Can this be done during the execution of Job A (not at the end)?

Could the Data DSL handle this data repositioning?  Would there be performance improvements?

Instead of A pushing data to a data source that is not local to B, can A push data directly to where B is located?
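
To make the question concrete, here is a minimal sketch of what "pushing directly to where B is located" might look like. It is not the Data DSL. It assumes Job B's organizing rule can be expressed as a key function and that keys map to nodes by simple hash partitioning. The names `owner_of`, `run_job_a`, the node list, and the record fields are all hypothetical, and the per-node lists stand in for real network sends.

```python
from collections import defaultdict
from zlib import crc32


def owner_of(key, nodes):
    """Pick the node Job B would read this key from (assumed hash partitioning)."""
    return nodes[crc32(key.encode("utf-8")) % len(nodes)]


def run_job_a(records, b_key, nodes):
    """Route each Job A result to the node that owns its Job B key as soon as it is produced."""
    inboxes = defaultdict(list)  # stand-in for per-node network buffers
    for record in records:
        # Hypothetical Job A work: compute a derived field.
        result = {**record, "total": record["qty"] * record["price"]}
        # Reposition during execution, not in a separate pass at the end.
        inboxes[owner_of(b_key(result), nodes)].append(result)
    return inboxes


nodes = ["node-0", "node-1", "node-2"]
records = [
    {"customer": "c1", "region": "eu", "qty": 2, "price": 3.0},
    {"customer": "c2", "region": "us", "qty": 1, "price": 5.0},
    {"customer": "c1", "region": "us", "qty": 4, "price": 1.5},
]
# Job A may have been organized by region; Job B wants the data by customer.
inboxes = run_job_a(records, lambda r: r["customer"], nodes)
```

With this routing, every result for a given customer ends up on a single node, so Job B could start there with local data. Whether that is faster than writing to a shared data source would depend on network cost and on how evenly the keys spread, which is the open performance question.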
