This repository has been archived by the owner on May 12, 2021. It is now read-only.

[PIO-138] Fix batchpredict for custom PersistentModel #447

Open
wants to merge 1 commit into base: develop

Conversation

mars (Member) commented Nov 17, 2017

Fixes PIO-138

Switches batch query processing from a Spark RDD to a Scala parallel collection. As a result, the pio batchpredict command changes in the following ways:

  • --query-partitions option is no longer available; parallelism is now managed by Scala's parallel collections (see the sketch after this list)
  • --input option is now read as a plain, local file
  • --output option is now written as a plain, local file
  • because the input & output files are no longer parallelized through Spark, memory limits may require that large batch jobs be split into multiple command runs.
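For illustration, here is a minimal sketch of that flow, assuming a hypothetical Engine handle and placeholder query/result types and helpers; it is not the actual pio batchpredict implementation:

```scala
// A minimal sketch of the parallel-collection flow described above, not the
// real pio batchpredict code. The Engine trait, Query/PredictedResult types,
// and the parse/format helpers are hypothetical stand-ins.
import scala.io.Source
import java.io.PrintWriter

object BatchPredictSketch {
  case class Query(user: String, num: Int)
  case class PredictedResult(itemScores: Seq[(String, Double)])

  // Hypothetical engine handle exposing a synchronous predict call.
  trait Engine { def predict(q: Query): PredictedResult }

  // Placeholder helpers so the sketch is self-contained; real code would use
  // a JSON library to parse queries and serialize results.
  def parseQuery(line: String): Query = Query(line.trim, 10)
  def formatResult(r: PredictedResult): String = r.itemScores.mkString(",")

  def runBatch(engine: Engine, inputPath: String, outputPath: String): Unit = {
    // Read the whole local input file into memory. Without Spark partitioning,
    // very large batches may need to be split across several command runs.
    val lines = Source.fromFile(inputPath).getLines().toVector

    // .par turns the Vector into a parallel collection (Scala 2.12 and earlier;
    // 2.13+ needs the scala-parallel-collections module), so predictions run
    // concurrently across the machine's cores instead of across a cluster.
    val results = lines.par.map { line =>
      formatResult(engine.predict(parseQuery(line)))
    }.seq

    // Write results as plain lines to the local output file.
    val out = new PrintWriter(outputPath)
    try results.foreach(r => out.println(r)) finally out.close()
  }
}
```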

This solves the root problem that certain custom PersistentModels, such as the ALS Recommendation template, may themselves contain RDDs, which cannot be nested inside the batch-queries RDD (see SPARK-5063).
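For context, the forbidden pattern looks roughly like the commented sketch below (the model field and RDD names are hypothetical): mapping over an RDD of queries while the model itself wraps an RDD nests RDD operations inside a task, which Spark rejects at runtime.

```scala
// Illustrative only, with hypothetical names: the nesting that SPARK-5063 forbids.
//
//   val predictions = queriesRDD.map { query =>
//     // model.userFeatures is itself an RDD (as in the ALS Recommendation
//     // template). Calling lookup() here runs an RDD action inside another
//     // RDD's task, so Spark fails at runtime with "RDD transformations and
//     // actions can only be invoked by the driver, not inside of other
//     // transformations" (SPARK-5063).
//     model.userFeatures.lookup(query.user)
//   }
```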

mars (Member, Author) commented Nov 17, 2017

I'm currently testing this change with various engines and large batches.

mars changed the title from "Fix batchpredict for custom PersistentModel" to "[PIO-138] Fix batchpredict for custom PersistentModel" on Nov 17, 2017
mars (Member, Author) commented Nov 18, 2017

Tested this new pio batchpredict with all three model types:

  • ✅ custom PersistentModel (ALS Recommendation)
  • ✅ built-in, default model serialization (Classification)
  • ✅ null model (Universal Recommender)

This PR is ready to go!

mars (Member, Author) commented Nov 18, 2017

BTW, I found that the performance of a large 250K-query batch on a single multi-core machine is equivalent to the previous Spark RDD-based implementation.

mars (Member, Author) commented Dec 14, 2017

This PR stalled due to @dszeto's concerns about removing the distributed processing capability from pio batchpredict. I agree that distributed batch processing is optimal, but I do not have a solution for the nested-RDD problem encountered with RDD-based persistent models.
