Could the Data DSL be used for this?
Distributed Job A --> produces results
Distributed Job B needs these results, but organized by different rules
Can this data be moved so that Job B has the benefit of data locality?
Can this be done during the execution of Job A (not at the end)?
Could the Data DSL handle this data repositioning? Would there be performance improvements?
Instead of A pushing data to a data source that is not local to B, can A push data directly to where B is located?
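To make the question concrete, here is a minimal sketch of the idea in plain Python. It does not use the Data DSL; every name in it (`partition_for_b`, `job_a`, `NUM_NODES`, the `customer_id` key) is hypothetical and only illustrates routing each result of A to the node B would read from, while A is still running.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def partition_for_b(record):
    """Hypothetical rule Job B uses to decide which node owns a record."""
    return record["customer_id"] % NUM_NODES


def job_a(inputs):
    """Stand-in for Job A: yields each result as soon as it is produced."""
    for customer_id, amount in inputs:
        yield {"customer_id": customer_id, "amount": amount * 2}


def run_a_with_repositioning(inputs):
    """Route each result to B's target node while A is still running."""
    node_stores = defaultdict(list)  # node id -> records held locally on that node
    for record in job_a(inputs):
        node_stores[partition_for_b(record)].append(record)
    return node_stores


def job_b_on_node(node_id, node_stores):
    """Stand-in for Job B: reads only the records local to its node."""
    return sum(r["amount"] for r in node_stores[node_id])


stores = run_a_with_repositioning([(1, 10), (2, 20), (4, 5), (3, 7)])
totals = {n: job_b_on_node(n, stores) for n in range(NUM_NODES)}
```

In a real cluster the append would be a network write to the target node, so whether this beats writing to a shared data source and reshuffling afterwards would still have to be measured.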