how to convert spark dataframe to mleap tensor[double] straightly #819

Open
mullerhai opened this issue Jun 29, 2022 · 3 comments
Comments

@mullerhai

Hi, I know how to convert an MLeap tensor to a TensorFlow tensor using our package, but I don't know how to convert a Spark DataFrame to an MLeap Tensor[Double]. I found the method TypeConverters.sparkToMleapValue(), but I don't know how to use it. Could you provide a tutorial for this? Thanks.

@mullerhai
Author

I tried using org.apache.spark.sql.mleap.TypeConverters:

  def sparkToMleapConverter(dataset: DataFrame,
                            field: StructField): (types.StructField, (Any) => Any) = {
    (sparkFieldToMleapField(dataset, field), sparkToMleapValue(field.dataType))
  }

but I cannot get a Tensor[Double] from the Spark DataFrame this way.

@jsleight
Contributor

jsleight commented Jul 1, 2022

As a caveat, converting from spark to mleap is kind of an unusual thing which we don't usually need to do. If you have a spark session and dataframe, then just do things with spark. mleap runtime is more for when you don't have a spark session (e.g., in a real time inference service).

That said, sparkToMleapConverter is the way to convert a single field if you really need to. If you need to convert the entire dataframe, then toSparkLeapFrame is probably easier. Take a look at the toSparkLeapFrame code to see how it uses sparkToMleapConverter. toSparkLeapFrame becomes available on a DataFrame once you add import ml.combust.mleap.spark.SparkSupport._.
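A minimal sketch of both routes described above, assuming a local SparkSession and a DataFrame with a vector column named features (the object name, method names, column names, and sample values are made up for illustration). It relies on toSparkLeapFrame keeping Spark's column order and on the converter returning a Tensor[Double] for a vector column, which is how the MLeap source reads at the time of this thread; check it against the MLeap version you use.

```scala
import ml.combust.mleap.spark.SparkSupport._ // adds toSparkLeapFrame to DataFrame
import ml.combust.mleap.tensor.Tensor
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.mleap.TypeConverters
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, not part of MLeap.
object DataFrameToTensor {

  // Whole-frame route: convert every column, then read one column as tensors.
  def tensorsViaLeapFrame(df: DataFrame, column: String): Array[Tensor[Double]] = {
    // Assumption: the leap frame keeps the same column order as the DataFrame.
    val idx = df.schema.fieldIndex(column)
    df.toSparkLeapFrame.dataset.map(_.getTensor[Double](idx)).collect()
  }

  // Single-field route: build the converter for one field and apply it on the driver.
  def tensorsViaConverter(df: DataFrame, column: String): Array[Tensor[Double]] = {
    val (_, convert) = TypeConverters.sparkToMleapConverter(df, df.schema(column))
    df.select(column).collect().map(row => convert(row.get(0)).asInstanceOf[Tensor[Double]])
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("df-to-tensor").getOrCreate()

    // Sample data; the features column has Spark's VectorUDT type.
    val df = spark.createDataFrame(Seq(
      (1, Vectors.dense(1.0, 2.0, 3.0)),
      (2, Vectors.dense(4.0, 5.0, 6.0))
    )).toDF("id", "features")

    tensorsViaLeapFrame(df, "features").foreach(t => println(t.toArray.mkString(", ")))
    tensorsViaConverter(df, "features").foreach(t => println(t.dimensions))

    spark.stop()
  }
}
```

The single-field route collects rows to the driver before converting, so it only suits small data; the whole-frame route keeps the conversion distributed until the final collect().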

Looking at the sparkFieldToMleapField code, the column needs to be a Spark VectorUDT, MatrixUDT, or an Array[VectorUDT] in order for it to be converted to an mleap tensor.
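If the DataFrame only has plain numeric columns, one way to meet that requirement is to pack them into a vector column first. The sketch below uses Spark's own VectorAssembler; the object name, helper name, and the column names x, y, and features are assumptions for illustration, not something the thread prescribes.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, not part of Spark or MLeap.
object AssembleFeatures {

  // Combine numeric input columns into a single VectorUDT column.
  def withVectorColumn(df: DataFrame, inputs: Array[String], output: String): DataFrame =
    new VectorAssembler()
      .setInputCols(inputs)
      .setOutputCol(output)
      .transform(df)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("assemble").getOrCreate()

    // Two plain Double columns; neither would map to an mleap tensor on its own.
    val raw = spark.createDataFrame(Seq((1.0, 2.0), (3.0, 4.0))).toDF("x", "y")

    val assembled = withVectorColumn(raw, Array("x", "y"), "features")
    assembled.printSchema() // the features column should now be reported as a vector type

    spark.stop()
  }
}
```

The assembled features column can then go through the field or frame conversion described in this comment.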

@mullerhai
Author

OK, thanks. Using our package, I can now generate tensorflow-java Tensor and NdArray values from a Spark DataFrame as expected!
