Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[STORM-3566] add serialVersionUID field to class which implement Serializable #3193

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

howardliu-cn
Copy link
Contributor

@howardliu-cn howardliu-cn commented Jan 14, 2020

[STORM-3566]add serialVersionUID field to class which implement Serializable interface.

@Ethanlm
Copy link
Contributor

Ethanlm commented Jan 14, 2020

Thanks for your contribution.
Please file a JIRA and then update this PR title and commit message to prefix with the JIRA number.

@howardliu-cn howardliu-cn changed the title add serialVersionUID field to class which implement Serializable [STORM-3566]add serialVersionUID field to class which implement Serializable Jan 15, 2020
@howardliu-cn
Copy link
Contributor Author

Thanks for your contribution.
Please file a JIRA and then update this PR title and commit message to prefix with the JIRA number.

I have file a JIRA. The link is https://issues.apache.org/jira/browse/STORM-3566.

@Ethanlm Ethanlm changed the title [STORM-3566]add serialVersionUID field to class which implement Serializable [STORM-3566] add serialVersionUID field to class which implement Serializable Jan 15, 2020
@Ethanlm
Copy link
Contributor

Ethanlm commented Jan 15, 2020

Please also update your commit message to prefix with the JIRA number. Thanks.
Other than that, I am +1

@Ethanlm
Copy link
Contributor

Ethanlm commented Jan 23, 2020

Sorry I have a second thought on this. Although serialVersionUID is recommended, we should not add it blindly.

serialVersionUID should be added where there is a need for backwards compatibility or interoperability.

For example, the user defined Bolt/Spout. User defines their own Bolt/Spout and submits the topology over the network to Nimbus. Eventually supervisors will get the assignment and deserialize bolt/spout and put it to work. serialVersionUID is recommended in this case because user might submit their topology from a JVM different than where the supervisor is running on (e.g. Oracle vs IBM java). In this case, if serialVersionUID is not set, a UID will be generated (say 1234L) and stored in the serialized object. When supervisor tries to deserialize it, because it's running on different JVM, it could generate a different UID for this object (say 3456L), then an InvalidClassException would be thrown. Hence we want to declare serialVersionUID explicitly to avoid this.

But examples like EventLoggerBolt, it's only accessed by nimbus or supervisor itself. Not declaring serialVersionUID should be just fine. It's almost definitely devs will forget to update serialVersionUID manually on the code they changes. Declare serialVersionUID explicitly might cause unnecessary trouble.

All I am trying to say here is we should think about this carefully and do it case by case.
I would like to hear some inputs from @srdo @kishorvpatil too. I feel like this is a more complicated issue than I originally thought.

Some useful links:

  1. https://www.vojtechruzicka.com/explicitly-declare-serialversionuid/
  2. https://community.sonarsource.com/t/serializable-classes-should-use-auto-generated-version-ids/1217

Thanks for your work on this.

@srdo
Copy link
Contributor

srdo commented Jan 24, 2020

@Ethanlm I'm not sure.

I think we should probably add the field for all bolts and classes serialized as part of bolt state, since these classes are likely to be serialized by the submitter JVM, and then deserialized by the supervisor/worker.

For other classes, we should probably look at whether they get serialized by one JVM, and then deserialized by another. If they are, we should add the field, and will then have to remember to change the value if we add/change fields in PRs.

Regarding whether we remember to update the field, I think there might be a compatibility issue in the current code. As it is now, I think we can't change state in any of the serializable classes that are not packaged in topology jars without breaking user code due to changed UID. This would for example include the bolts in storm-client. I think if we change a class there, we would run into issues with rolling upgrades, since e.g. Nimbus might be running 2.1.0, and the worker might be running 2.0.1, and the version of the class that is loaded depends on the Storm version instead of depending on the contents of the topology jar.

For classes that get packaged as part of the topology jar (e.g. any bolts in the externals or examples), we should be fine.

I'm not sure how to solve the compatibility issue (assuming it exists), without moving the serializable classes out of storm-client, and into another module that we then require topology jars to package.

@Ethanlm
Copy link
Contributor

Ethanlm commented Jan 27, 2020

@srdo Thanks. I agree with you

@howardliu-cn
Copy link
Contributor Author

@srdo @Ethanlm Thanks. Would you approve this PR? Or, Do I need to do some change?

@Ethanlm
Copy link
Contributor

Ethanlm commented Feb 10, 2020

@howardliu-cn I think we need to distinguish what classes should be added with serialVersionUID and what shouldn't because of the reasons we discussed above. The decision needs to be made case by case. serialVersionUID shouldn't be added blindly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants