Spark Connector has errored json/gson after certain batch size. #109

Open
lbakshi opened this issue Nov 14, 2023 · 1 comment

Comments


lbakshi commented Nov 14, 2023

Hi all,

I'm using the Spark connector to import nearly 200M records. I'd like to use bigger batches and make use of the asynchronous importing available from Weaviate version 1.22, but the Spark connector seems to have trouble handling batch sizes beyond 200. When I go beyond 200, I often see errors like the following:


reason=ExceptionFailure(io.weaviate.spark.WeaviateResultError,error getting result and no more retries left. Error from Weaviate: [WeaviateErrorMessage(message=java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $, throwable=com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $), WeaviateErrorMessage(message=Failed ids: 42946687-9c7b-5a99-b5a5-60f2216e894d,...

Any help would be appreciated!
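
For readers reproducing this, the following is a minimal PySpark sketch of where the batch size is set on the connector. It assumes the connector's documented `batchSize`, `scheme`, `host` and `className` write options; the helper names `weaviate_write_options` and `write_to_weaviate`, and the host and class values in the usage note, are hypothetical and not taken from this issue.

```python
def weaviate_write_options(batch_size, host, class_name, scheme="http"):
    """Build the option map passed to the Spark connector's writer.

    Option keys follow the connector's README; the values are placeholders.
    """
    return {
        "batchSize": str(batch_size),  # number of objects sent per batch request
        "scheme": scheme,
        "host": host,
        "className": class_name,
    }


def write_to_weaviate(df, options):
    """Append a Spark DataFrame to Weaviate using the given options.

    `df` is expected to be a pyspark.sql.DataFrame; nothing is imported here
    so the helper above can be inspected without a Spark installation.
    """
    (
        df.write.format("io.weaviate.spark.Weaviate")
        .options(**options)
        .mode("append")
        .save()
    )
```

With a real DataFrame, `write_to_weaviate(df, weaviate_write_options(200, "localhost:8080", "Article"))` would use the batch size of 200 mentioned above; raising the first argument is how the larger batches would be tried.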

trengrj (Member) commented Nov 16, 2023

@lbakshi are you testing this in WCS or in your self-hosted Weaviate cluster?

Expected BEGIN_OBJECT but was STRING indicates that the response isn't coming back as valid JSON. The most likely cause is a load balancer, or some other component between Weaviate and Spark, returning an error of its own.
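
To make that diagnosis concrete, here is a minimal Python sketch (not the connector's own code, which parses responses with Gson on the JVM) that labels a captured response body as a JSON object, a bare JSON string, or not JSON at all. The function name `classify_body` and the sample bodies are invented for illustration; a body in either of the last two categories is the kind of response that would not parse as the object the client expects.

```python
import json


def classify_body(body: str) -> str:
    """Return a rough label for what an HTTP response body contains."""
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        # Plain text or HTML, e.g. an error page from a proxy or load balancer.
        return "not JSON"
    if isinstance(parsed, dict):
        return "JSON object"
    if isinstance(parsed, str):
        return "JSON string"
    return "other JSON value"


# Hypothetical bodies, for illustration only.
samples = [
    '{"error": [{"message": "example"}]}',
    '"upstream request timeout"',
    "<html><body>502 Bad Gateway</body></html>",
]
for body in samples:
    print(classify_body(body), "<-", body)
```

Capturing the raw response for a failing batch (for example from the load balancer's logs) and checking it this way would show whether the error text originates from Weaviate or from something in front of it.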
