Error from many DataFrame methods after UDF called in DataFrame.WithColumn #1137

Open
dogulas-accip opened this issue Feb 8, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@dogulas-accip

I'm a long-time C# programmer but just getting my feet wet with .NET for Apache Spark. Following many "getting started" instructions and videos, I installed:

7-Zip
Java 8
I downloaded Apache Spark from https://spark.apache.org/downloads.html
.NET for Apache Spark v2.1.1
WinUtils.exe (I'm running this on Windows 10)

Problem:
When I call DataFrame.Show() after doing a DataFrame.WithColumn() using a UDF, I always get an error: [2023-02-07T15:45:31.3903664Z] [DESKTOP-H37P8Q0] [Error] [TaskRunner] [0] ProcessStream() failed with exception: System.ArgumentNullException: Value cannot be null. Parameter name: type

TestCases.csv looks like this:
TestCases.csv

OrderList.csv looks like this:
OrderList.csv

Here is the Program class of the TestSparkApp console project:
Program.cs.txt
and supporting classes:
Player.cs.txt
Collector.cs.txt

Here is the output of the above app:
TestSpartAppOutput.txt

Note that the same bug will appear executing many different methods on the DataFrame object but only after a call to the WithColumn method using a UDF. In this case, the code looks like this:

    // user-defined function
    Func<Column, Column, Column> GetSubst = Udf<string, string, int>(
        (strOrder, strPlayers) =>
        {
            return GetSubstance(strOrder, strPlayers);
        });

    // call the user-defined function and add a new column to the DataFrame
    ordersFrame = ordersFrame.WithColumn("substance", GetSubst(ordersFrame["names"], ordersFrame["players"]).Cast("Integer"));

    // *** This is where the error will be thrown, but if I comment it out, the same error will be thrown later
    // print out the data
    ordersFrame.Show(20, 20, false);

However, I've tried it with other UDFs followed by other DataFrame method calls, and I always get the same error. In the Main() function, you will see a later foreach loop. If I comment out the ordersFrame.Show() call and uncomment the contents of the loop, I get the same error when I access row.Values[0].ToString().
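A minimal sketch of the same pattern (a UDF applied through WithColumn, followed by Show) with no CSV files or helper classes, which may help narrow down whether the failure depends on the data or on the UDF call itself. The app name `UdfRepro`, the `TotalLength` UDF, and the inline one-row query are hypothetical stand-ins, not taken from the attached project, and this is not a fix for the error.

```csharp
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace UdfRepro // hypothetical project name
{
    internal static class Program
    {
        private static void Main()
        {
            SparkSession spark = SparkSession
                .Builder()
                .AppName("UdfRepro")
                .GetOrCreate();

            // One-row DataFrame built inline, so no input files are needed.
            DataFrame df = spark.Sql("SELECT 'a,b' AS names, 'x,y' AS players");

            // Hypothetical stand-in for GetSubstance: any (string, string) -> int function.
            Func<Column, Column, Column> TotalLength = Udf<string, string, int>(
                (names, players) => names.Length + players.Length);

            // Same shape as the failing code: WithColumn using a UDF, then Show.
            df = df.WithColumn("substance", TotalLength(df["names"], df["players"]));
            df.Show(20, 20, false);

            spark.Stop();
        }
    }
}
```

If this small program fails in the same way, the CSV contents and the Player and Collector classes are unlikely to be involved.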

I wonder if I have missed something in my installation?

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: n/a
  • Version: see above

dogulas-accip added the bug label Feb 8, 2023

@dogulas-accip
Author

Well, it has been 5 days and I'm getting crickets.

I noticed that other questions have gone without responses for long periods of time, and those that did get a response had to wait weeks, if not months.

Should I interpret this to mean that .NET for Apache Spark has been sunset and is no longer supported?
Is this a dead product that we should not incorporate into new development?

Thanks,
dogulas-accip
